R Stories from the Trenches
This talk will have two parts:
• In the first part I will reveal why I switched to R for most of my data munging, analysis, visualization and machine learning needs in 2006. While R has long been dominant in academia, by (accidentally) founding the R meetup in Los Angeles in 2009 I had the privilege of closely witnessing the rise of R in industry and how it became the most widely used tool in the field now called data science. I’m going to share a few use cases from past LA R meetup talks and from informal discussions with seasoned R users in LA working for companies such as Google, Netflix, Activision, etc.
• In the second, more technical part of the talk, I’m going to present a benchmark of tools for interactive data munging and show how, using R on a laptop, you can get results faster than with a Hadoop/Spark cluster on datasets with hundreds of millions of rows. I will also argue for the importance of data visualization and touch briefly on machine learning with R.
Szilard Pafka (LinkedIn (https://www.linkedin.com/in/szilard), Twitter (https://twitter.com/DataScienceLA)) studied physics in Budapest in the 90s and obtained a PhD using statistical methods to analyze the risk of financial portfolios. He then worked in a bank, quantifying and managing market risk. About a decade ago he moved to California to become the Chief Scientist of a credit card processing company, doing everything data (ETL, analysis, visualization, machine learning, etc.). He is also the founder/organizer of several data science meetups in Santa Monica, the epicenter of startups and tech companies in the Los Angeles area.
Schedule:
• 18:30: gate opens, pizza/soft drinks/beer served
• 19:00: talk starts promptly
• then: a few beers at a nearby pub with R folks.