DSPT#9 - Perks of a Data Wizard: mastering the "R" cane & opening Pandora's box


Put on your stylish robes, capes and staves (did you know Farfetch (https://www.farfetch.com) sells Dalaran Premium™ mage items? Go check it out!) and join us for another magical lesson from Data Science Portugal (https://www.facebook.com/datascienceportugal/)! 😎

Teaching you the most powerful spells in the Data Science tome of knowledge, this gathering will dazzle your mind and delight your ears with the tunes of Music Information Retrieval and Data Exploration with R!

Be one of the first to attend this event, prepare for this journey and meet fellow Data magicians, warlocks, spellcasters, tricksters... and Data Wizards in general! First timers: we...have...beer!

The ninth meetup of Data Science Portugal (https://www.facebook.com/datascienceportugal/) is going to take place on Wednesday, 19th April, 2017 at around 18:30, at Farfetch - Rua da Lionesa nº 446, Edifício G12,[masked] Leça do Balio - Porto.

=== SCHEDULE ===

The preliminary agenda for the meetup is the following:

• 18:30-18:50: Welcome and networking.

• 19:00-19:30: Talk 1: "Recommendation strategies for Music Discovery at Pandora" with Fabien Gouyon (https://www.linkedin.com/in/fgouyon/), Principal Scientist at Pandora (https://www.pandora.com/).

• 19:35-19:36: Group photo.

• 19:36-20:00: Networking / Coffee Break.

• 20:00-20:30: Talk 2: "On-going Deep Learning projects at Farfetch" with Hugo Pinto and Fábio Pinto (https://www.linkedin.com/in/f%C3%A1bio-pinto-62203448/), Data Scientists at Farfetch.

• 20:35-21:05: Talk 3: "Data wrangling in the “tidyverse” with R" with Jim Porzak (https://www.linkedin.com/in/jimporzak/), co-founder of the San Francisco Bay Area R User Group (https://www.meetup.com/R-Users/) and co-organizer of the East Bay R Beginners meetup (https://www.meetup.com/r-enthusiasts/).

• 21:10: Closing, hanging out and some beers.

Dinner is optional but it might be an excellent opportunity for networking.

Do you want to be a sponsor in future meetups? Please contact us at [masked]

See you there!

=== TALKS ===

Talk 1: Recommendation strategies for Music Discovery at Pandora.

Should we recommend what you like best, or what’s novel and you might like? This is a difficult question to answer in the context of recommending music to millions of listeners every day. In this seminar I will talk about our research on Recommender Systems and Music Information Retrieval at the Pandora internet radio, focusing on the trade-off between “exploration” and “exploitation” in music recommendation. I’ll show examples of the diverse types of data we work with, introduce the Music Genome Project, and provide insights on the design of ensembles of recommenders. And we’ll listen to some music, of course!

Short Bio:

Fabien Gouyon is Principal Scientist at the internet radio Pandora, where he does applied research on personalized music recommendation. He also currently serves as President of the International Society for Music Information Retrieval. His research interests are in Data Science and Music Information Retrieval, with over 90 papers published in peer-reviewed international journals and conferences. He received a PhD in Computer Science (2005) from the Pompeu Fabra University in Barcelona and did a postdoc at the Austrian Research Institute for Artificial Intelligence in Vienna [masked]). He started and led the Sound and Music Computing Group at INESC TEC in Porto [masked]).


Talk 2: On-going Deep Learning projects at Farfetch

Deep learning is a relatively new area of Machine Learning that has been achieving state-of-the-art results across several domains. As Data Scientists at Farfetch working directly with the Product team, it is our responsibility to embed disruptive technology in our products and enhance the customer experience. In this talk, we will provide an overview of several on-going (and yet to come) projects in which Deep Learning methods play a major role, such as modeling fashion sense to provide recommendations of complementary products, improving the search engine through language modeling, and enriching product descriptions with visual features.

Short Bio:

Hugo Pinto is a Data Scientist at Farfetch working with the Search Team in the areas of Natural Language Processing, Learning to Rank and Computer Vision. With a background in Physics and Applied Maths (Dynamical Systems, Optimization and Control), he has worked in recent years on Algorithmic Trading and Modeling in stock exchange markets and on several research projects as a statistical consultant. His main research interests lie in the fields of Statistical Learning and Modeling, Deep Learning, Probabilistic Graphical Models and Optimization Algorithms.

Fábio Pinto is a Data Scientist at Farfetch, a luxury fashion e-commerce platform, working on recommender systems to enhance the online user experience on the Farfetch website. He is currently a PhD student in Machine Learning at the Engineering Faculty of the University of Porto, after finishing his MSc degree in Data Analysis and Decision Support Systems and graduating in Economics at the University of Minho. In the past, he has collaborated on several R&D projects, particularly on retail and predictive maintenance in an industrial scenario. His main research interests lie in metalearning and automatic machine learning.


Talk 3: Data wrangling in the “tidyverse” with R.

There is a revolution in the R community called the “tidyverse”, instigated by Hadley Wickham and his students and co-workers at RStudio (Hadley is best known for his ggplot2 package, which transformed data visualization in R). The workhorse of the tidyverse is the dplyr package for efficiently doing data transformations. It is efficient in two senses: first, coding and debugging are very efficient; second, it is fast. Under the hood, most of the work is done in C++ when working on local data. Larger data sets can reside in relational DBMSs, to which dplyr pushes SQL to do most of the work. This is particularly effective when the data summarization is done on the server and dplyr only pulls down to R a very reduced set of data for final analysis and presentation. PostgreSQL and AWS Redshift are good targets for dplyr. This talk will be a quick introduction to dplyr (and friends like readr, tidyr, and tibble) with a few real-world examples from my recent projects. For a deep dive into tidyverse, dplyr, and much more, see the on-line version of Hadley Wickham & Garrett Grolemund’s R for Data Science at http://r4ds.had.co.nz/ .

Short Bio:

Jim Porzak is a (semi) retired data scientist living near Berkeley, California, specializing in data-driven customer insights. He has been using R since 2002, is co-founder of the San Francisco Bay Area R User Group and co-organizer of the East Bay R Beginners meetup. Recent clients include Lynda.com, One Medical, Leitersburg Cinemas, and Li & Fung. Before retiring he did customer insights / BI work at, or for, Minted.com, Ancestry.com, Responsys, Sun Microsystems, LA Times, Apple, and Chicago Sun Times, among others, starting in 2003 at Loyalty Matrix. Jim is an active author and speaker both in the US and Europe. See his past presentations at DS4CI.org/Archives.