DataTalks #20: Data Science Work Flow - Challenges and Best Practices

Hosted by Asaf V. and 3 others

DataHack - Data Science, Machine Learning & Statistics

Details

Data Science Work Flow - Challenges and Best Practices
𝗥𝗲𝗴𝗶𝘀𝘁𝗿𝗮𝘁𝗶𝗼𝗻 𝘂𝘀𝗶𝗻𝗴 𝘁𝗵𝗲 𝗹𝗶𝗻𝗸 𝗯𝗲𝗹𝗼𝘄 𝗶𝘀 𝗳𝗿𝗲𝗲 𝗯𝘂𝘁 𝗺𝗮𝗻𝗱𝗮𝘁𝗼𝗿𝘆!

𝗔𝗴𝗲𝗻𝗱𝗮:
🍕 18:00 - 18:30 - Gathering, registration, snacks & mingling
🔶 18:30 - 19:15 - How Do We Do Massively Parallel Feature Extraction in WSC Sports Using Amazon SageMaker Batch Transform - Nir Dagani
🔷 19:15 - 20:00 - Mental Models for the Data Science Workflow - Guy Smoilovsky

𝗥𝗲𝗴𝗶𝘀𝘁𝗿𝗮𝘁𝗶𝗼𝗻: http://bit.ly/DataTalks_20

The "correct" data science workflow is a work in progress. There are many technical problems, not all of which have good tools yet.

To make things more complicated, the number of services and tools is exploding rapidly, and extracting a coherent picture is difficult.

Join us at this meetup to discuss the challenges of the data science workflow (experiment tracking, data versioning, massive-scale feature extraction, reproducibility...) and to hear from the experience of DAGsHub and WSC Sports.

⸻⸻⸻⸻⸻⸻⸻⸻⸻⸻⸻
𝗛𝗼𝘄 𝗗𝗼 𝗪𝗲 𝗗𝗼 𝗠𝗮𝘀𝘀𝗶𝘃𝗲𝗹𝘆 𝗣𝗮𝗿𝗮𝗹𝗹𝗲𝗹 𝗙𝗲𝗮𝘁𝘂𝗿𝗲 𝗘𝘅𝘁𝗿𝗮𝗰𝘁𝗶𝗼𝗻 𝗶𝗻 𝗪𝗦𝗖 𝗦𝗽𝗼𝗿𝘁𝘀 𝗨𝘀𝗶𝗻𝗴 𝗔𝗺𝗮𝘇𝗼𝗻 𝗦𝗮𝗴𝗲𝗠𝗮𝗸𝗲𝗿 𝗕𝗮𝘁𝗰𝗵 𝗧𝗿𝗮𝗻𝘀𝗳𝗼𝗿𝗺 - 𝗡𝗶𝗿 𝗗𝗮𝗴𝗮𝗻𝗶
WSC Sports’ AI-driven platform analyzes live sports broadcasts, identifies every event that occurs in the game, creates customized short-form video content, and publishes it to any digital platform.
We’ll review the WSC research team’s challenges and workflow, then dive deep into the system we recently built for running massively parallel feature extraction over tens of thousands of video clips using DNNs. We’ll also show how it cut feature extraction time, which used to take a week. The solution is based on Amazon SageMaker Batch Transform and Docker containers.
⸻⸻⸻⸻⸻⸻⸻⸻⸻⸻⸻
𝗠𝗲𝗻𝘁𝗮𝗹 𝗠𝗼𝗱𝗲𝗹𝘀 𝗳𝗼𝗿 𝘁𝗵𝗲 𝗗𝗮𝘁𝗮 𝗦𝗰𝗶𝗲𝗻𝗰𝗲 𝗪𝗼𝗿𝗸𝗳𝗹𝗼𝘄 - 𝗚𝘂𝘆 𝗦𝗺𝗼𝗶𝗹𝗼𝘃𝘀𝗸𝘆

The "correct" data science workflow is a work in progress. There are many technical problems, not all of which have good tools yet.
To make things more complicated, the number of services and tools is exploding rapidly, and extracting a coherent picture is difficult.

It's a jungle out there.

At DAGsHub, we've interviewed data scientists, team leads, data engineers, and CTOs from over 100 companies in Israel and abroad, trying to get to the bottom of the workflow problems and the solutions people come up with. In this talk, we'd like to share:

The common patterns we found
More unique patterns, and how these divergences are closely linked to the type of problem you're trying to solve
How data science is different from software development
An overview of the popular tools for various parts of the workflow
Useful techniques and ideas
Effective collaboration with experiment tracking, reproducibility
A case for better open source data science
Memes, dog GIFs
⸻⸻⸻⸻⸻⸻⸻⸻⸻⸻⸻
