DataTalks #20: Data Science Work Flow - Challenges and Best Practices

Hosted By
Asaf V. and 3 others
Details

Data Science Work Flow - Challenges and Best Practices
Registration using the link below is free but mandatory!

๐—”๐—ด๐—ฒ๐—ป๐—ฑ๐—ฎ:
๐Ÿ• 18:00 - 18:30 - Gathering, registration, snacks & mingling
๐Ÿ”ถ 18:30 - 19:15 - How Do We Do Massively Parallel Feature Extraction in WSC Sports Using Amazon SageMaker Batch Transform - Nir Dagani
๐Ÿ”ท 19:15 - 20:00 - Mental Models for the Data Science Workflow - Guy Smoilovsky

๐—ฅ๐—ฒ๐—ด๐—ถ๐˜€๐˜๐—ฟ๐—ฎ๐˜๐—ถ๐—ผ๐—ป: http://bit.ly/DataTalks_20
๐—ฅ๐—ฒ๐—ด๐—ถ๐˜€๐˜๐—ฟ๐—ฎ๐˜๐—ถ๐—ผ๐—ป ๐˜‚๐˜€๐—ถ๐—ป๐—ด ๐˜๐—ต๐—ฒ ๐—น๐—ถ๐—ป๐—ธ ๐—ถ๐˜€ ๐—ณ๐—ฟ๐—ฒ๐—ฒ ๐—ฏ๐˜‚๐˜ ๐—บ๐—ฎ๐—ป๐—ฑ๐—ฎ๐˜๐—ผ๐—ฟ๐˜†!

The "correct" data science workflow is a work in progress. There are many technical problems, not all of which have good tools yet.

To make things more complicated, the number of services and tools is exploding rapidly, and extracting a coherent picture is difficult.

Join us at this meetup to discuss the challenges in the data science workflow (experiment tracking, data versioning, massive-scale feature extraction, reproducibility...) and to hear from the experience of DAGsHub and WSC Sports.

๐—ฅ๐—ฒ๐—ด๐—ถ๐˜€๐˜๐—ฟ๐—ฎ๐˜๐—ถ๐—ผ๐—ป: http://bit.ly/DataTalks_20
๐—ฅ๐—ฒ๐—ด๐—ถ๐˜€๐˜๐—ฟ๐—ฎ๐˜๐—ถ๐—ผ๐—ป ๐˜‚๐˜€๐—ถ๐—ป๐—ด ๐˜๐—ต๐—ฒ ๐—น๐—ถ๐—ป๐—ธ ๐—ถ๐˜€ ๐—ณ๐—ฟ๐—ฒ๐—ฒ ๐—ฏ๐˜‚๐˜ ๐—บ๐—ฎ๐—ป๐—ฑ๐—ฎ๐˜๐—ผ๐—ฟ๐˜†!

-----------
How Do We Do Massively Parallel Feature Extraction in WSC Sports Using Amazon SageMaker Batch Transform - Nir Dagani
WSC Sports' AI-driven platform analyzes live sports broadcasts, identifies each and every event that occurs in the game, creates customized short-form video content, and publishes it to any digital platform.
We'll review the WSC research team's challenges and workflow. We'll dive deep into the system we've recently built for running massively parallel feature extraction over tens of thousands of video clips using DNNs, and show how it reduced feature extraction time, which previously took a week. The solution is based on Amazon SageMaker Batch Transform and Docker containers.
-----------
Mental Models for the Data Science Workflow - Guy Smoilovsky

The "correct" data science workflow is a work in progress. There are many technical problems, not all of which have good tools yet.
To make things more complicated, the number of services and tools is exploding rapidly, and extracting a coherent picture is difficult.

It's a jungle out there.

At DAGsHub, we've interviewed data scientists, team leads, data engineers, and CTOs from over 100 companies in Israel and abroad, trying to get to the bottom of the workflow problems and the solutions people come up with. In this talk, we'd like to share:

  • The common patterns we found
  • More unique patterns, and how these divergences are closely linked to the type of problem you're trying to solve
  • How data science is different from software development
  • An overview of the popular tools for various parts of the workflow
  • Useful techniques and ideas
  • Effective collaboration through experiment tracking and reproducibility
  • A case for better open source data science
  • Memes, dog GIFs
-----------
DataHack - Data Science, Machine Learning & Statistics