Data Council meetup @ Criteo - ML at Dailymotion and Big Data with SQL

Hosted by Pete Soderling

Details

Join us for this Data Engineering and Science meetup, with talks from Dailymotion and Criteo!

Schedule:
6:30 pm : Welcome
7:00 pm : Intro
7:05 pm : Talk 1 + Q&A
7:40 pm : Talk 2 + Q&A
8:15 pm : Networking, food and drinks
10:00 pm : End

Talk 1: "Fixing the Big Data Development Cycle with SQL" by Guillaume Bort, Senior Staff Engineer, SRE Data Processing (http://guillaume.bort.fr/) & Arnaud Dufranne, Staff Engineer, SRE Data Processing (https://www.linkedin.com/in/arnaud-dufranne-1ab64259/) - Criteo

Talk 2: "Dailymotion's Machine Learning" by Germain Tanguy, Senior Data Engineer (https://www.linkedin.com/in/germain-tanguy/) - Dailymotion

Talk 1 abstract:

"We all know how hard Big Data stacks can be to build, use and maintain. Gartner estimates that 85% of big data projects are killed before production release. In this talk engineering leaders from Criteo's SRE Data Processing team will show how they are using SQL to address one of the biggest issues in data engineering, that of developer productivity.

Criteo has hundreds of PBs of data under management with over ~150K physical cores and ~1.5PB of main memory available for processing it. In addition to the pure scale of the system, there are 500+ developers from around the world interacting with the system directly, the vast majority of whom have at one point or another pushed data transformation code into production.

The unique challenges of truly huge scale, highly concurrent workloads, frequent releases and geographic distribution of users required an equally unique approach (and quite a lot of serious engineering and good old-fashioned elbow grease). One doesn't have to look very far back to realize that the RDBMS paradigm of a referentially transparent, lazily evaluated, declarative (and highly expressive) language executing on top of a separately optimizable and easily abstracted away run-time could reap huge benefits. With the advent of technologies like Hive, Spark-SQL and Presto, we are clearly not the first engineers to think of the problem in these terms, but we decided to see just how far we could push SQL by leveraging it in every nook and cranny of our data infrastructure.

In this talk we will take a deep dive into our declarative data processing platform, Big Dataflow, and show how it addresses the accidental complexity inherent in data engineering. Demo-effect notwithstanding, there will also be a demo!"

Talk 2 abstract:

"Wouldn’t it be great if we lived in a frictionless world where data engineers and data scientists built a perfect common ground for efficient exchanges? Unfortunately we’re not quite there yet, but we’ve been analyzing Dailymotion’s recent Data journey that focuses on how data engineers work with data scientists to improve production release.

The first part of the talk is about our machine learning blueprint and the common ground we found between data scientists and data engineers to maximize productivity. The second part of the talk illustrates the first part by describing a use case we implemented around channel categorization."

About Germain:
Germain is a senior data engineer at Dailymotion. He first worked in the analytics platform team, collecting and enriching event data and ensuring its accuracy, consistency and reliability. He is currently working in the content knowledge team, collaborating with data scientists to bring deep learning models to production at scale. He worked on improving collaboration between data scientists and data engineers to maximize productivity.

Data Council Paris Data Engineering & Science