This meetup is for anyone interested in the movement of data and getting data from one place to another. We will be talking about cloud-based and open-source solutions, sharing our knowledge on how we've managed to achieve real-time streaming and the challenges we've faced along the way.
Hi All, to keep you going with your monthly fill of data engineering, we will be bringing you an online edition this month.
🏠Platform Host: DataEngBytes - https://www.youtube.com/dataengau
🍕Food and Drink: You 😊
💬 Join our Slack Group here: https://goo.gl/forms/DVNazDmNBg1FFm2X2
🎤 Alagappan Sethuraman, Software Engineer, Facebook
Data Discovery with Amundsen
Data scientists at Lyft spend approximately 25% of their time on data discovery, answering questions such as: Does this data exist? Who owns this data? What previous analysis exists? Can I trust this data? While data discovery is a prerequisite to delivering good analysis, it does not in itself bring value to the company. Reducing the time spent on data discovery enables data scientists to spend more time doing what they do best: building models and visualizations.
Amundsen is an open-source tool built at Lyft that aims to solve the data discovery problem. We index and serve metadata about data resources in a simple and intuitive interface. A user can run a search that will return a list of results sorted by relevance and popularity. Currently, we index tables and people, with plans to index dashboards and teams as well. At Lyft, we have reduced the time spent on data discovery by 75% from baseline.
Alagappan Sethuraman is a software engineer at Facebook. He is one of the founding engineers on the Amundsen project and is an active member of the OSS community. Previously, Alagappan led teams at Lyft to help scale its data platform, making it easy for its consumers to make quicker and better data-driven decisions.
Remember to bring along some great questions!