DataTalks #20: Data Science Work Flow - Challenges and Best Practices

Details
Registration using the link below is free but mandatory!
Agenda:
18:00 - 18:30 - Gathering, registration, snacks & mingling
18:30 - 19:15 - How Do We Do Massively Parallel Feature Extraction in WSC Sports Using Amazon SageMaker Batch Transform - Nir Dagani
19:15 - 20:00 - Mental Models for the Data Science Workflow - Guy Smoilovsky
Registration: http://bit.ly/DataTalks_20
The "correct" data science workflow is a work in progress. There are many technical problems, not all of which have good tools yet.
To make things more complicated, the number of services and tools is exploding rapidly, and extracting a coherent picture is difficult.
To discuss the challenges of the data science workflow (experiment tracking, data versioning, massive-scale feature extraction, reproducibility...) and hear from the experience of DAGsHub and WSC Sports, join us at this meetup.
⸻⸻⸻⸻⸻⸻⸻⸻⸻⸻⸻
How Do We Do Massively Parallel Feature Extraction in WSC Sports Using Amazon SageMaker Batch Transform - Nir Dagani
WSC Sports' AI-driven platform analyzes live sports broadcasts, identifies every event that occurs in the game, creates customized short-form video content, and publishes it to any digital platform.
We'll review the WSC research team's challenges and workflow. We'll dive deep into the system we've recently built for running massively parallel feature extraction over tens of thousands of video clips using DNNs, and explain how it reduced feature extraction time, which previously took a week. The solution is based on Amazon SageMaker Batch Transform and Docker containers.
⸻⸻⸻⸻⸻⸻⸻⸻⸻⸻⸻
Mental Models for the Data Science Workflow - Guy Smoilovsky
The "correct" data science workflow is a work in progress. There are many technical problems, not all of which have good tools yet.
To make things more complicated, the number of services and tools is exploding rapidly, and extracting a coherent picture is difficult.
It's a jungle out there.
At DAGsHub, we've interviewed data scientists, team leads, data engineers, and CTOs from over 100 companies in Israel and abroad, trying to get to the bottom of the workflow problems and the solutions people come up with. In this talk, we'd like to share:
- The common patterns we found
- More unique patterns, and how these divergences are closely linked to the type of problem you're trying to solve
- How data science is different from software development
- An overview of the popular tools for various parts of the workflow
- Useful techniques and ideas
- Effective collaboration with experiment tracking, reproducibility
- A case for better open source data science
- Memes, dog GIFs
⸻⸻⸻⸻⸻⸻⸻⸻⸻⸻⸻
