DataTalks #1: Web traffic estimation and big ML workflows


Details
DataTalks (http://datahack-il.com/) #1: Web traffic estimation and big ML workflows
A rough schedule of DataHack (http://datahack-il.com/)'s first meetup:
• 18:00 - 18:15 - Gathering, snacks, mingling
• 18:15 - 18:20 - Opening the DataTalks meetup series
• 18:20 - 19:10 - First talk:
Roy Yadoo, SimilarWeb - Web traffic estimation as a meta-analysis challenge
• 19:10 - 19:20 - A short break
• 19:20 - 20:10 - Second talk:
Daniel Marcous, Google - Production-ready big ML workflows - From zero to hero
==== Talk #1 ====
Speaker: Roy Yadoo, SimilarWeb
Title: Web Traffic Estimation as a Meta-Analysis Challenge
Abstract: Every day, users around the world make over 10 billion visits to websites on their personal computers and mobile devices. Understanding the underlying patterns and behaviors is a central challenge in web research. At SimilarWeb, our goal is to measure and analyze the traffic of each website and mobile app in the digital world, with over 60 million sites and apps estimated daily. Our estimations rely on a variety of data sources, including our panel with millions of web users. Data sources in our panel can vary by size, bias and engagement. The challenge is to find a common truth among the noise, while considering additional business requirements, such as the competing objectives of accuracy vs. consistency.
In this meetup, I will present several approaches used at SimilarWeb for estimation, such as robust regressions, Bayesian estimators, outlier detection and others.
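As a toy illustration of the kind of robust estimation the talk covers — combining noisy per-source traffic estimates while discarding outliers — here is a minimal Python sketch using a MAD-based outlier filter. The data, function name, and threshold are purely illustrative assumptions, not SimilarWeb's actual method:

```python
import statistics

def robust_traffic_estimate(estimates, z_thresh=3.5):
    """Combine noisy per-source traffic estimates into one figure.

    Uses the median absolute deviation (MAD) to flag outlier
    sources, then returns the median of the remaining estimates.
    """
    med = statistics.median(estimates)
    mad = statistics.median(abs(x - med) for x in estimates)
    if mad == 0:
        return med  # all sources agree; nothing to filter
    # Modified z-score (Iglewicz & Hoaglin): 0.6745 * (x - med) / MAD
    kept = [x for x in estimates
            if abs(0.6745 * (x - med) / mad) <= z_thresh]
    return statistics.median(kept)

# Five hypothetical panel sources report daily visits; one is wildly biased.
sources = [980, 1020, 1005, 995, 5400]
print(robust_traffic_estimate(sources))
```

The median is already robust to a single bad source; the MAD filter additionally drops sources whose estimates are implausibly far from the consensus before the final aggregation.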
==== Talk #2 ====
Speaker: Daniel Marcous, Google
Title: Production-Ready BIG ML Workflows - from zero to hero
Abstract: Data science isn't an easy task to pull off. You start by exploring data and experimenting with models. Finally, you find some amazing insight!
What now? How do you turn a little experiment into a production-ready workflow? Better yet, how do you scale it from a small sample in R/Python to TBs of production data?
Building a BIG ML workflow, from zero to hero, is about the process you need to follow to get a production-ready workflow up and running.
Covering:
- Small-to-medium experimentation (R)
- Big data implementation (Spark MLlib + pipelines)
- Setting metrics and checks in place
- Ad hoc querying and exploring your results (Zeppelin)
- Pain points & lessons learned the hard way (is there any other way?)
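One of the steps above — setting metrics and checks in place — can be sketched as a simple quality gate that a workflow must pass before a model ships. This is a minimal Python sketch under assumed metric names and thresholds (`auc`, `coverage`), not the actual checks used in the talk:

```python
def failed_quality_gates(metrics, thresholds):
    """Return a list of gate failures (an empty list means ship it).

    metrics     -- measured values, e.g. {"auc": 0.91, "coverage": 0.87}
    thresholds  -- minimum acceptable value per metric name
    """
    return [f"{name}: {metrics.get(name, 0)} < {floor}"
            for name, floor in thresholds.items()
            if metrics.get(name, 0) < floor]

# Hypothetical gates: block deployment if AUC or coverage drops too low.
gates = {"auc": 0.85, "coverage": 0.90}
print(failed_quality_gates({"auc": 0.91, "coverage": 0.87}, gates))
```

In a real pipeline a step like this sits between model training and deployment, so a regression in any tracked metric stops the workflow instead of silently reaching production.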
