DataEng meetup, April Edition
Details
Hi All, to keep you going with your monthly fill of data engineering, we will be bringing you an online edition this month.
🏠Platform Host: DataEngBytes - https://www.youtube.com/dataengau
🍕Food and Drink: You 😊
💬 Join our Slack Group here: https://goo.gl/forms/DVNazDmNBg1FFm2X2
Speakers:
🎤 Gian Merlino, CTO and Co-Founder, Imply
Inside Apache Druid's Storage and Query Engine
Apache Druid is an open-source columnar database known for high performance at scale; its largest deployments comprise thousands of servers. But no matter the scale, high performance starts with good fundamentals. This talk will dive into those fundamentals by exploring the inner workings of a single data server. We’ll cover how Apache Druid stores data, what kinds of compression it uses, how it indexes data, how the storage engine is linked with the query processing engine, and how the system handles resource management and multithreading. Together, all these pieces enable Apache Druid to process billions of records per second on a single data server.
Gian Merlino is CTO and a co-founder of Imply, a San Francisco-based technology company, and the Apache Druid PMC chair. Previously, Gian led the data ingestion team at Metamarkets (now a part of Snapchat) and held senior engineering positions at Yahoo. He holds a BS in Computer Science from Caltech.
🎤 Molly Vorwerck, Head of Content & Community, Monte Carlo
🎤 Ryan Kearns, Data Scientist, Monte Carlo
The Next Frontier of Data Engineering
To keep pace with the speed of innovation in data, data engineers need to invest not only in the latest modeling and analytics tools, but also in technologies that can increase data trust and prevent broken pipelines. The solution? Data observability, the next frontier of data engineering. We'll discuss why data observability matters for building a better data quality strategy, and the tactics best-in-class organizations use to address it, including org structure, culture, and technology. More info: https://www.montecarlodata.com/data-observability-the-next-frontier-of-data-engineering/
Molly Vorwerck is the Head of Content and Community for Monte Carlo, a data reliability company backed by Accel, GGV, Redpoint, and other top Silicon Valley investors, and creator of the industry-leading Monte Carlo Data Observability Platform. Previously, she led the Tech Brand team at Uber, where she managed editorial strategy for the Uber Engineering Blog and Uber AI. She graduated from Stanford University with a B.A. in American Studies. When she’s not writing or thinking about data, she’s probably watching The Great British Baking Show or reading a murder mystery.
Ryan Kearns is a Stanford undergraduate studying Computer Science and Philosophy, currently taking a year off to work at Monte Carlo, where he is focused on building the company's anomaly detection algorithms. He is the instructor of the first-ever course on Data Observability through O'Reilly Safari. When not coding, you can find him on a run or planning his next road trip.
Remember to bring along some great questions!




