
Case Study: Machine Learning at Scale using Spark and Hive

Hosted by Dan L.

Details

We are pleased to announce that Alex Sadovsky from Oracle Data Cloud and Ashish Thusoo from Qubole will present a real-world case study for implementing machine learning techniques at scale using Spark and Hive.

Agenda:

• 6:00 – 6:30 - Socialize over food and drink

• 6:30 – 6:45 - Announcements, Upcoming Events

• 6:45 – 8:30 - Alex's and Ashish's presentations

• 8:30 – ??? - Continued socializing

Topics:

Alex Sadovsky will provide an overview of how he and his team use Spark and Hive to deliver cutting-edge machine learning solutions on top of massive, ever-growing big data sources. He will also explain how he uses Qubole, a big data platform that sits on top of cloud services, to create scalable, resource-optimized clusters capable of handling dynamic workloads. Alex will cover how data is aggregated and transformed using Qubole’s Hive services, then shift to the machine learning capabilities of running Spark MLlib-based modeling on the Qubole platform. Ultimately, the talk will provide insight into how the Oracle Data Cloud (ODC) builds models on billions of records, with tens of thousands of variables, to create cutting-edge audience targeting products.

Additionally, Ashish Thusoo from Qubole will discuss why big data belongs in the cloud, focusing on new ways in which enterprises consume software applications and infrastructure. With its emphasis on agility and flexibility, the cloud gives companies a way to keep up with change and stay competitive in today’s world of fast-moving technology. Ashish will review and compare big data deployments in the cloud versus on premises: How long should you plan for implementation? How easily can your infrastructure scale up or down? And what are the security and compliance risks and benefits of each approach?

About Alex Sadovsky:

Alex Sadovsky is the Director of Data Science for audience targeting products at the Oracle Data Cloud. Alex got an early start in Internet technologies by founding a web hosting LLC in high school. He further pursued this interest by earning a bachelor’s degree in Computer Science from the University of Michigan in 2007. Taking a detour to explore the interface between computers and biology, Alex earned a second bachelor’s degree in Molecular Biology while researching links between the then newly sequenced human genome and cardiovascular disease. Continuing to investigate biological data, Alex completed a PhD in Computational Neuroscience at the University of Chicago in 2013, with a research focus on how neural circuitry in the mammalian brain computes sensory information. Since joining Datalogix (now the Oracle Data Cloud) in 2014, Alex has focused on applying machine learning techniques in the big data landscape.

About Ashish Thusoo:

Before co-founding Qubole, Ashish ran Facebook’s Data Infrastructure team; under his leadership, the team built one of the largest data processing and analytics platforms in the world. This platform not only achieved the bold aim of making data accessible to analysts, engineers and data scientists but also drove the “big data” revolution. In the process of scaling Facebook’s big data infrastructure, he helped drive the creation of a host of tools, technologies and templates that are used industry-wide today, including the Apache Hive project.

Boulder/Denver BigData Meetup
Oracle Data Cloud (Westmoor Technology Park, Building #2)
10075 Westmoor Drive · Westminster, CO