• A Common Model, Separated by Two Disciplines

    466 Lexington Ave


    We're starting off the new year with a talk about both R and Stan. Thank you to New York Presbyterian for hosting.

    About the Talk: Factorization machines are a powerful, flexible, and interpretable tool for modeling interactions between variables. However, there is a dearth of implementations of these models that allow for principled uncertainty estimates, especially in R. This talk will discuss implementing Bayesian Factorization Machines in R and Stan, first introducing basic models and then extending them using techniques from modern hierarchical Bayesian modeling.

    About Adam: Adam Lauretig is the Senior Data Scientist at JUST Capital, where he works on Bayesian discrete-choice models, ranking methodology, and survey data analysis. Previously, he completed his Ph.D. in political methodology at The Ohio State University, where he developed Bayesian Word Embeddings to study decision-making in American foreign policy, and worked on a project with the World Bank examining the effects of democratization on discrimination in the Indonesian Civil Service.

    Pizza (https://bit.ly/pizzapoll) begins at 6:30, the talk starts at 7, and afterward we head to the local bar.

  • Totally Tidy Tuning Tools

    Rise New York


    We have Max Kuhn, the author of caret and the tidymodels suite of packages, speaking about the new tune package. Thank you to Rise NY for hosting us.

    About the Talk: Many machine learning models have hyperparameters whose values cannot be analytically estimated from the data, such as the number of neighbors in a nearest-neighbor classification model. The tidymodels collection of R packages is a tidyverse interface to modeling. This talk will introduce the new tune package and demonstrate the tidy interface to model tuning with examples.

    About Max: Max Kuhn is a software engineer at RStudio, where he is working on improving R's modeling capabilities. He was previously a Director of Nonclinical Statistics at Pfizer Global R&D in Connecticut, and he applied models in the pharmaceutical and diagnostic industries for over 18 years. Max has a Ph.D. in Biostatistics. He is the author of numerous R packages for machine learning and reproducible research and is an Associate Editor for the Journal of Statistical Software. He and Kjell Johnson wrote the book Applied Predictive Modeling, which won the American Statistical Association's Ziegel Award for the best book reviewed in Technometrics in 2015. Their latest book, Feature Engineering and Selection, was published in 2019.

    Pizza (https://bit.ly/pizzapoll) begins at 6:30, the talk starts at 7, and afterward we head to the local bar.

  • Simulating A Universe in Tidyverse: Using R to Generate Statistical Simulations

    The tidyverse meets simulations in this talk from Sebastian Teran Hidalgo. Thank you to Rise NY for hosting us.

    About the Talk: This talk will cover several tidyverse packages (dplyr, tidyr, purrr, ggplot2) and how to use them to create and work with statistical simulations. We will look at how to use simulations to gain statistical insights, run power analyses for A/B tests, and create aRt. The target audience is R users who want to start using simulations in their work, or those who have experience with simulations but not with the tidyverse.

    About Sebastian: Sebastian Teran Hidalgo is a data scientist at Vroom, an e-commerce start-up that sells used cars and delivers them to your door. Previously, Sebastian researched cancer genomics as a postdoc at Yale University. He holds a PhD in Biostatistics from UNC-Chapel Hill.

    Pizza (https://bit.ly/pizzapoll) begins at 6:30, the talk starts at 7, and afterward we head to the local bar.

  • A Matter of Time: A Brief History of Time Series in R and Beyond

    Visiting Nurse Service of New York


    We have more time series, this month from Jeff Ryan, the creator of the xts and quantmod packages. Thank you to the Visiting Nurse Service of New York for hosting us at the Daily Planet (https://bit.ly/2IryeJZ), I mean the Daily News Building.

    About the Talk: Time series are everywhere, and they are fundamentally different from all other data. Working with them effectively demands highly specialized tools, because dates, times, notation, calendars, time zones, and more must be carefully managed during even basic operations like subsetting, merging, aggregating, and dealing with missingness. If these are not handled properly, all of your analysis could be at risk. Specialized tools for time have been part of R since the beginning, but they have changed enormously over the last decade. In this talk we’ll explore where we started, what we have now, and what might be on the horizon. I'll try to show the whole story in code and share my unique perspective as someone who’s been lucky enough to be part of its history, and who is actively working on crafting its future for R and beyond.

    About Jeff: Jeff Ryan is a practitioner in the quantitative hedge fund space and a longtime contributor to R in both finance and data management. He introduced quantmod and xts in 2007 and co-founded the successful R/Finance conference, held annually since 2009. Over the years he's watched time series go from an esoteric topic to the hottest thing in databases, and he thinks this increased focus is only getting started. He's a bit opinionated, and he enjoys building simple but powerful tools that can stand the test of time. Jeff is currently working on what he believes will be the next big step in time series management.

    Pizza (https://bit.ly/pizzapoll) begins at 6:30, the talk starts at 7, and afterward we head to the local bar.

  • The ggplot glow-up: Making Lovely Data Visualizations in R

    33 Thomas Street


    It has been a long time since we have done a ggplot talk, so Luda will discuss packages that extend ggplot's functionality. Thank you to AT&T Labs for hosting us.

    About the Talk: This talk will cover using ggplot and other ggplot-friendly packages to make impressive data visualizations with R. We will look at how several well-liked Tidy Tuesday submissions were generated and use a few packages that allow for different types of visualizations, while also discussing ways to make even a simple bar chart more compelling.

    About Luda: Ludmila Janda is a Data Scientist at Amplify, a pioneer in K–12 education since 2000 and a leader in next-generation curriculum and assessment. Today, Amplify serves four million students in all 50 states. Luda’s work provides insights on student and teacher usage, student success, and Amplify’s broader impact. She is a proud R-Lady and has a Master’s in Public Policy from the University of North Carolina-Chapel Hill. Follow her on Twitter at @ludmila_janda.

    Pizza (bit.ly/pizzapoll) begins at 6:30, the talk starts at 7, and afterward we head to the local bar.

  • Training & Explaining: Demystifying Long Short-Term Memory (LSTM) Behaviors

    Turning again to deep learning, we have Bill Gold talking to us about LSTM models. Thank you to AT&T Labs for hosting us.

    About the Talk: Deep learning models can be highly effective at finding patterns in unstructured data, and they are profoundly redefining consumer products and business processes. However, deep learning models are black boxes, often with millions of activations. As a result, businesses that value explainability resist deep learning or do not leverage it; for example, some risk, compliance, and legal stakeholders resist using it. The topic of this presentation is training and explaining the behaviors, strengths, and gaps of Long Short-Term Memory (LSTM) models in deep learning and natural language processing (NLP).

    About Bill: Bill Gold’s career sits at the intersection of data science, management consulting, and technology. He has delivered $500MM+ in ROI to financial services, healthcare, and legal clients. Across the customer lifecycle, he has trained and deployed hundreds of models (some patented), commercial software, and data science infrastructure handling trillions of transactions and hundreds of terabytes of data, serving thousands of data scientists. Bill guest lectures at Columbia and NYU, is working toward an MS in Computer Science & Machine Learning from Georgia Tech (expected 2022), and has a BS in Electrical Engineering from Hofstra University.

    Pizza (bit.ly/pizzapoll) begins at 6:30, the talk starts at 7, and afterward we head to the local bar.

  • nullabor: Tools for Testing Whether What You See in Plots is Really There

    Continuing our ten-year celebration, we have Di Cook coming all the way from Australia. Thank you to Rise New York for hosting us this month.

    About the Talk: Have you ever been in a talk where the speaker flashed a plot onto the screen and said something like "you can see that..." but you couldn't see it? We will describe the tools in the nullabor package that allow you to check that what you see in a plot is really there. These include drawing samples from different null distributions and making lineups for the data plot, computing p-values and power, and numerically measuring differences between plots.

    About Di: Di Cook is Professor of Business Analytics in the Department of Econometrics and Business Statistics at Monash University in Melbourne, Australia. Her research focuses on data visualisation, primarily visualising high-dimensional spaces, and more recently, statistical inference for data plots. She is a contributing author of numerous R packages and the maintainer of a couple.

    Pizza (bit.ly/pizzapoll) begins at 6:30, the talk starts at 7, and afterward we head to the local bar.

  • Making Bayes Easier: A Tour Through rstanarm and the Stan Ecosystem in R

    We continue our tenth anniversary celebration with R Week! We have four days of R events, including a Meetup, a day of workshops, and two days of the fifth annual New York R Conference. Visit www.rstats.nyc for more information. First up, we have repeat Meetup speaker Jonah Gabry talking about Bayes in R using Stan. Thank you to Rise New York for hosting us this month.

    About the Talk: In this talk we will demonstrate the rstanarm interface to Stan, which emulates standard R model-fitting functions but uses Stan for the back-end estimation. As part of the demonstration, we will also take a tour of the broader Stan ecosystem in R, which consists of many packages that assist with various stages of the applied Bayesian workflow. Finally, we will briefly cover the rstantools package, which enables anyone to develop their own R packages interfacing with Stan and contribute to the growing number of R packages available for fitting complex Bayesian models.

    About Jonah: Jonah Gabry is a member of the Stan development team and a researcher at Columbia University working with Andrew Gelman on methods and software for Bayesian data analysis. He is co-author of the rstan and rstanarm R packages, which provide interfaces to Stan, as well as the bayesplot and shinystan packages for model visualization, the loo R package for approximate cross-validation and model comparison, and the rstantools package for assisting with Stan-based R package development.

    Pizza (bit.ly/pizzapoll) begins at 6:30, the talk starts at 7, and afterward we head to the local bar.

  • Whose Line Graph is it Anyway? Improvising an Exploratory Data Analysis with R

    This month marks the tenth anniversary of the meetup! We've grown from 21 people to over 10,000 in that time. To celebrate, we will have a series of events in the coming weeks, including meetups, workshops, and the Fifth Annual New York R Conference (www.rstats.nyc). First up we have David Robinson.

    About the Talk: The best approach to a technical presentation is careful planning and preparation. But where's the fun in that? In this talk, I'll demonstrate an exploratory data analysis in R on a dataset I've never seen before, chosen by a friend for its novelty value. I'll use tools such as dplyr and ggplot2 for data transformation and visualization, as well as other tidyverse packages as they're needed. I'll narrate my thought process to show how a data scientist thinks through a problem, and take suggestions from the audience at key points.

    About David: David Robinson is the Chief Data Scientist at DataCamp. He previously worked as a data scientist at Stack Overflow and received his PhD from Princeton University. He is the co-author with Julia Silge of the tidytext package and the O’Reilly book Text Mining with R, as well as the author of the broom and fuzzyjoin R packages and of the e-book Introduction to Empirical Bayes. He writes about R, statistics, and education on his blog Variance Explained, as well as on Twitter as @drob. This talk is inspired by his experience sharing screencasts of improvised data analyses on YouTube.

    Pizza (www.nyhackr.org/pizzapoll.html) begins at 6:30, the talk starts at 7, and afterward we head to the local bar.

  • Choosing a Deep Learning Library: There are a Lot of Them

    Once again, we bring you deep learning. This month's meetup is sponsored by the O'Reilly Artificial Intelligence Conference, taking place at the New York Hilton April 15-18. Use code UGNYHACKR for a 20% discount. You can also use code nyhackr for a 20% discount to the New York R Conference May 9-11.

    About the Talk: The rise of deep learning has led to innovations such as self-driving cars, AlphaGo (the Go AI that beat a world champion), voice assistants, and computational photography. As a result, there has been a surge in companies and researchers looking for the best deep learning library to experiment with, build their next app, or modernize their existing products in the age of AI. There are many deep learning toolkits, ranging from long-used, supported, and robust academic libraries to new, state-of-the-art, industry-backed platforms. In this talk, I’ll share lessons learned from my fortunate (unfortunate?) experience working with different deep learning libraries in production. I will focus on what to look for in a deep learning library, with a lens on building reliable, production-ready applications and services, or experiments that are easy to design and iterate on. This talk will help you gain an understanding of ALL of the deep learning libraries, from the popular (TensorFlow, PyTorch, Caffe) to the lesser-known (CNTK, Deeplearning4j, Core ML). You might even find that your best fit is using more than one!

    About Jesse: Jesse Brizzi is a Computer Vision Research Engineer at Curalate, a Philadelphia-based startup that leverages social content imagery and influencers/audiences to help its clients sell more effectively online. He owns the full machine learning software development pipeline, from the research and training of state-of-the-art deep learning models to the engineering, design, and deployment of those models and services into production environments that see millions of deep learning requests a day. Jesse earned his MS in Computer Science from Stony Brook University, where he specialized in Computer Vision and Machine Learning. He graduated from the University of South Florida with a BS in Computer Science. In his spare time, he likes powerlifting, gaming, transportation memes, and eating at international McDonald’s locations. Follow his work at www.jessebrizzi.com.

    Pizza (nyhackr.org/pizzapoll.html) begins at 6:30, the talk starts at 7, and afterward we head to the local bar.