• R Package Workshop

    1375 Broadway

    Sebastian's workshop will focus on R package development from the ground up. We will cover package structure, set-up, creating functions, documentation, testing, dependencies, sharing the package with others, package vignettes, hex stickers, and other topics. An example package will be developed during the workshop so attendees can follow along and create their own. Sebastian is a data scientist at Vroom, an ecommerce start-up that focuses on selling used cars that are delivered to your door. Previously, Sebastian was doing research on cancer genomics as a postdoc at Yale University. He has also been a strong advocate for R-Ladies and is an experienced CRAN package author. Emily Zabor and Erin Grand will also be serving as TAs for this workshop. Erin Grand works as a Data Scientist at Uncommon Schools where she maintains two R packages. Prior to Uncommon, she worked as a Data Scientist at Crisis Text Line and a software programmer at NASA. In the past, Erin researched star formation and taught introductory courses in astronomy and physics.

  • Building Pipelines with Text and Data

    WeWork

    We are excited this month to host two very active R-Ladies from our chapter: Amanda Dobbyn and Maryam Jahanshahi! Amanda's talk will focus on the package `drake`. `drake` is a package designed to make building and executing analysis pipelines easier and more reproducible. In this talk, Amanda will run through an example of how `drake` can be used to manage an analysis workflow. Amanda's Bio: Amanda is a data scientist at Earlybird Software, a cloud application and data consulting company. She was previously an organizer of R-Ladies Chicago before escaping that frozen tundra for this one. Maryam's talk will focus on designing and executing text processing pipelines. In this talk, Maryam will share lessons and takeaways on text processing and data management for natural language processing. Maryam's Bio: Maryam is a Research Scientist at TapRecruit, a software startup that helps companies improve the fairness and efficiency of their recruiting processes. In a past life, Maryam was a cancer biologist and a data journalist. In a future life, Maryam hopes to be a carpenter! Agenda: 6-6:30: Arrival and networking 6:30-7:30: Talks 7:30-8: Networking

  • R-Ladies Book Club: Algorithms to Live By

    Rizzoli Bookstore

    We’ll be reading Algorithms to Live By by Brian Christian and Tom Griffiths. https://www.rizzolibookstore.com/algorithms-live-computer-science-human-decisions-0

  • Git with Joyce

    Columbia University

    We are kicking off this year with an introduction to GitHub by Joyce Robbins. This is a hands-on workshop, so please bring your laptop to follow along and please be prompt to make sure the workshop kicks off on time :) About Joyce: Joyce Robbins, Ph.D., is Lecturer in Discipline in the Statistics Department at Columbia University, where she specializes in data visualization. She is a member of the R Forwards (R Foundation taskforce on women and other under-represented groups) teaching team, and was elected to serve as publications officer of the Statistical Graphics Section of the American Statistical Association beginning in 2018. Robbins received her doctorate in sociology from Columbia University, her M.A. in sociology and anthropology from Tel Aviv University and her B.S.E. in civil engineering and operations research from Princeton University.

  • R-Ladies New York End of Year Social

    AT&T NYC Data Science Research Center

    Happy Holidays! For the end of the year we are looking to celebrate how far R-Ladies New York has come and thank all of you for your incredible support. Here are the deets: We are doing this potluck style, so sign up here to bring something to the party: https://docs.google.com/spreadsheets/d/1ytU6boa7oDAM3gujH0NRIy9ezS03P4Q4BWx6v6StIcY/edit?usp=sharing Dress in casual or holiday gear - we welcome it all!

  • Reproducibility and Communication in R

    620 8th Ave, Manhattan, NY 10018

    This month it is our pleasure to have Noam Ross and our very own Ludmila Janda present on Reproducibility and Communication in R! Agenda: 6:30-6:45pm General networking 6:45-6:55pm R-Ladies New York Announcements 6:55-7:25pm Reproducibility in an Office World or: How I Learned to Stop Worrying and Love OpenXML by Noam Ross 7:25-7:55pm A Data Odyssey: Communicating Results With Coworkers by Ludmila Janda 7:55-8:00pm R-Ladies Community announcement 8:00-8:30pm Networking Title: Reproducibility in an Office World or: How I Learned to Stop Worrying and Love OpenXML Abstract: Many data scientists operate at the interface of two cultures with different tools and workflows - programmatic workflows (e.g., R Markdown) and WYSIWYG documents (e.g., Microsoft Word). The noisy interface between these can be an impediment to reproducibility, as well as a royal pain. I will discuss approaches I and others have tried in dealing with these issues, and why and how some have failed and succeeded. I'll also demonstrate some tools, including the packages officer and rvg and some rarely used features of rmarkdown that ease the flow when collaborating across the divide. Bio: Noam Ross is a Senior Research Scientist at EcoHealth Alliance, a non-profit in NYC that researches the connections between human and wildlife health. Noam builds models to understand and predict disease circulation in wildlife and spillover into people. Noam is also editor for software peer review at rOpenSci, a developer collective that builds R packages and catalyzes communities to enable open research and data. He has a Ph.D. in ecology from the University of California-Davis. Follow him on Twitter at @noamross. Title: A Data Odyssey: Communicating Results With Coworkers Abstract: Often, data scientists are tasked with communicating their results to people in the workplace who do not share the same technical background. Bridging the gaps between parties can be a perilous journey. In this talk, I will discuss how I have attempted to navigate this tricky terrain and provide some pointers for clear language use and data visualization choices. I’ll demonstrate how I pass on insights through visualizations using packages such as ggridges and ggalluvial and how I use rmarkdown to craft reports and take my coworkers on data adventures. Bio: Ludmila Janda is a Data Scientist at Amplify, a pioneer in K–12 education since 2000, leading the way in next-generation curriculum and assessment. Today, Amplify serves four million students in all 50 states. Luda’s work provides insights on student and teacher usage, student success, and Amplify’s broader impact. She is a proud RLady and has a Master’s in Public Policy from the University of North Carolina-Chapel Hill. Follow her on Twitter at @ludmila_janda.

  • R-Ladies Book Club: R Packages by Hadley Wickham

    Rizzoli Bookstore

    We’re reading R Packages by Hadley Wickham this quarter - bonus points if you give writing a package a try!

  • Parallel Computing in R

    OppenheimerFunds

    We're excited to host Jared Lander, Chief Data Scientist of Lander Analytics, the organizer of the New York Open Statistical Programming Meetup and the New York R Conference, and author of R for Everyone, to talk about parallel computing in R. Agenda: 6:15-7: Food & networking 7-7:10: Kick-off and announcements 7:10-7:45: Talk 7:45-8:30: Networking Talk: Everyone wants their code to run faster and there are numerous ways to achieve this goal. We start by looking at popular packages `dplyr`, `data.table` and `purrr` and the corresponding parallel implementations. We then turn our attention to writing simple C++ functions integrated into R, both sequentially and in parallel. We also build a `data.frame` aggregation function, starting sequentially, ending in parallel. Throughout this talk we see how to speed up code by running in parallel, locally and across nodes, in R and C++, all within the friendly confines of RStudio. About Jared: Jared Lander is the Chief Data Scientist of Lander Analytics, a data science consultancy based in New York City, the Organizer of the New York Open Statistical Programming Meetup and the New York R Conference, and an Adjunct Professor of Statistics at Columbia University. With a master's from Columbia University in statistics and a bachelor's from Muhlenberg College in mathematics, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fundraising to finance and humanitarian relief efforts. He specializes in data management, multilevel models, machine learning, generalized linear models and statistical computing. He is the author of R for Everyone: Advanced Analytics and Graphics, a book about R Programming geared toward Data Scientists and Non-Statisticians alike.

  • Learn OOP and discover cool finance packages in R

    AT&T NYC Data Science Research Center

    For this event we will explore how to do OOP in R and learn the various use cases of the PerformanceAnalytics package in R. Agenda: 6:30-6:45pm Introductions and Social 6:45-6:55pm R-Ladies New York Announcements 6:55-7:25pm Object Oriented Programming in R 7:25-7:55pm Performance Analytics with R 7:55-8:00pm Community Announcements 8:00-8:30pm Networking. Object Oriented Programming in R (Moved due to technical difficulties in August) In this talk, Soumya will provide an introduction to using object-oriented programming techniques in workflows and project development. She will primarily focus on building functions, S3 classes, as well as the more recent R6 classes. She will also provide the resources she used in her learning journey. Soumya is a quantitative analyst at the New York Federal Reserve, where she focuses on developing models and tools related to stress testing work. She is an organizer on the R-Ladies New York Board and on the committee for R in Finance. Performance Analytics in R Gabby will introduce the PerformanceAnalytics package: a tool for performance and risk analysis on financial data. She will cover how to work with financial time series data in R, and useful metrics such as annualized return, standard deviation, Sharpe ratios, etc. Link: https://cran.r-project.org/web/packages/PerformanceAnalytics/vignettes/PA-charts.pdf Gabby is a Financial Planning & Analysis associate at Two Sigma Investments. She learned R at Columbia in a data mining course and first employed it as an economic consultant analyzing the performance of loans underlying residential mortgage-backed securities at NERA Economic Consulting. She is an organizer on the R-Ladies New York Board.

  • Building Infrastructure with R

    The New York Times TimesCenter, 242 W 41st St, New York, NY 10036

    For this event we will explore how to build tools and infrastructure with R with Soumya Kalra and Emily Dodwell. Agenda: 6:15-6:30pm Introductions and Social 6:30-6:45pm NYT announcements (Data, Tech, HR) 6:45-6:55pm R-Ladies New York Announcements 6:55-7:25pm Object Oriented Programming in R 7:25-7:55pm Big Data in R with Small Prototypes: Scaling Research Workflows from [/Local] to [Cloud/Cluster] 7:55-8:00pm Community Announcements and News/Move to the 5th Floor for food/refreshments 8:00-8:15pm Networking. Object Oriented Programming in R In this talk, Soumya will provide an introduction to using object-oriented programming techniques in workflows and project development. She will primarily focus on building functions, S3 classes, as well as the more recent R6 classes. She will also provide the resources she used in her learning journey. Soumya is a quantitative analyst at the New York Federal Reserve, where she focuses on developing models and tools related to stress testing work. She is an organizer on the R-Ladies New York Board and on the committee for R in Finance. Big Data in R with Small Prototypes: Scaling Research Workflows from [/Local] to [Cloud/Cluster] In this talk motivated by a recent project, Emily will discuss the tools her team at AT&T Labs explored when their typical workflow to process a data set and build a machine learning model in R would not scale due to the size of the data. She will provide some considerations for the data scientist faced with such a challenge, as well as a brief introduction to sparklyr. Created by the RStudio team, this package provides a dplyr interface to Spark from R, and thereby enables straightforward access to Spark’s distributed machine learning algorithms in MLlib. Emily is a Senior Inventive Scientist in the Statistics Research Department at AT&T Labs, where she currently focuses on predictive modeling for advertising applications and the creation of interactive tools for data analysis and visualization. She is a member of R Forwards (https://forwards.github.io/), the R Foundation taskforce on women and other under-represented groups. Prior to joining AT&T Labs in 2015, Emily taught high school math for three years at Choate Rosemary Hall. She received her M.A. in statistics from Yale University and B.A. in mathematics from Smith College.
