The Many Uses of Graph Databases & Building Scalable Pipelines for ML

QuantumBlack has agreed to sponsor our August Meetup
There will be a couple of talks and lots of networking, along with pizza and beer!
Please bring an ID to get into the building.

- - - -
6:00pm - 6:30pm
Networking and snacks
6:30pm - 7:00pm
"Stop it with the tables already!" talk by Zac Ernst
7:00pm - 7:30pm
"Building Scalable Data Pipelines for Machine Learning" talk by Harsha Gopu
7:30pm - 8:00pm
Q&A with the speakers and QuantumBlack

- - - -
Tables aren't always the solution! (Store your data in graphs, instead)

Zac Ernst

Tables aren't always the solution. They destroy important information, which we as data engineers then waste our time trying to reconstruct. But there is an alternative. Graph databases such as Neo4j can serve as a powerful integration layer, and can often take the place of tables. Using a graph as an integration layer can make unifying disparate data sources (almost) painless, providing a global view of all your data. I'll demo one approach to doing this using Neo4j, Kafka, and a simple DSL written in Python.

Speaker Bio:
Zac is a Principal Data Engineer at QuantumBlack. Before that, he was a data engineer at a couple of Chicago startups. Before going into tech, Zac was an associate professor of philosophy, specializing in logic and game theory. And before becoming a professor, he worked for Argonne National Laboratory, where he researched techniques in automated theorem proving.

- - - -
Building Scalable Data Pipelines for Machine Learning

Harsha Gopu

Companies now need to apply machine learning (ML) techniques to their data in order to remain relevant. Among the new challenges faced by data scientists is the need to get access to large data sets so that trained models can scale to run with production data.

Aside from dealing with larger data volumes, these pipelines need to be flexible in order to accommodate the variety of data and the high processing velocity required by new ML applications. Apache Airflow and Spark address these challenges by providing scalable orchestration and processing engines for big data workloads.

Speaker Bio:
Harsha is a Solutions Architect at Qubole, where he brings 20 years of experience in data analytics and engineering to help customers on their big data journey. He has subject matter expertise in building big data applications with Apache Spark, Presto, Hive, and other open-source technologies such as Airflow and Apache Zeppelin. Prior to Qubole, Harsha worked as a Data Architecture Lead, building big data applications for retail, transportation, and telecom organizations.

This event is sponsored by QuantumBlack, a McKinsey Company.