
BDAM 12/05: Centralized Metadata; Build Apps w/o Pipelines; Docker & Kubernetes

Hosted by Priyanka Nambiar

Details

Big thanks to Google for hosting and sponsoring this meetup event!

AGENDA

6:00 - 6:30 - Socialize over food and beverages
6:30 - 8:00 - Tech Talks

TALKS

#1: Centralized Metadata for Multi Cloud Data Pipelines, by Rohit Sinha, Google

#2: Build Apps Without Pipelines: Shortest Path from Complex Data to Live Apps, by Anirudh Ramanathan, Rockset

#3: Modernizing Big Data Infrastructure with Docker and Kubernetes, by Mark Bayazit, Robin Systems

ABSTRACTS

#1: Centralized Metadata for Multi Cloud Data Pipelines, by Rohit Sinha, Google

Enterprises are increasingly adopting multi-cloud strategies to meet their business requirements. For ETL workloads, running data pipelines in different environments provides flexibility and higher resiliency. With this move, one of the key challenges is to have a centralized control plane that coordinates pipelines across all environments and collects all the metadata in a central place to provide a unified view of all business, technical and operational metadata. In this talk, we will discuss use-cases that can be enabled by centralized control and metadata capabilities for multi-cloud data pipelines.

#2: Build Apps Without Pipelines: Shortest Path from Complex Data to Live Apps, by Anirudh Ramanathan, Rockset

With the vast array of datasets available today, investment management firms have a significant opportunity to use non-traditional, alternative data to enhance their research. Efficiently combining and analyzing disparate data streams (real-time and semi-structured) to support investment decisions is critical to remaining competitive in this space.

In this talk, we will use Rockset, a serverless search and analytics engine, to demonstrate how developers can easily plug in alternative data and build apps on those data sets. Specifically, we will work with the JSON data stream from Twitter’s Firehose streams API and NASDAQ’s Company Lookup directory exported in CSV format. Using Rockset, we will demonstrate how these two data sets can be loaded, and immediately queried and joined using SQL, without any upfront data preparation or complex data pipelines.

Rockset was founded by former members of Facebook’s online data team, who helped create RocksDB, Facebook’s TAO, Unicorn, and HDFS.

#3: Modernizing Big Data Infrastructure with Docker and Kubernetes, by Mark Bayazit, Robin Systems

Docker is clearly the de facto choice for enterprises, and many are moving further with cluster orchestration using Kubernetes. Unfortunately, both were designed for stateless use cases, so running data-centric workloads like NoSQL, Big Data or RDBMS seems like a stretch. However, Robin has come to the rescue with a hyper-converged Kubernetes platform designed with stateful workloads in mind. In this talk we will demonstrate a use case: running your data-intensive Apps-as-a-Service (NoSQL, Big Data or even RDBMS) with an AppStore experience that is simpler, faster, and easier.

SPEAKER BIOS

  • Rohit Sinha is a software engineer at Google, where he works on CDAP (cdap.io), an open source Big Data application platform. Prior to Google, he worked at Cask, where he was responsible for building software fueling the next generation of Big Data applications.

  • Anirudh Ramanathan leads Product Engineering at Rockset. He is an Apache Spark committer and a Kubernetes maintainer. Prior to this, he was on the Kubernetes team at Google where he worked on Google Kubernetes Engine, core controllers, and founded SIG Big Data, a group focused on containerized Big Data and ML workloads (Apache Airflow, Kubeflow, JupyterHub and HDFS).

  • Mark Bayazit is a Senior Solutions Architect at Robin Systems. His long career in technology started as a DBA and has led him to be a crusader for improving IT processes and automation.

VENUE AND PARKING INFORMATION

Venue: Dan Bricklin Room, 111 Java Drive, Sunnyvale CA 94089

There is plenty of parking available around the building, which is also a short 10-minute Uber ride from the Sunnyvale Caltrain Station.

Big Data Application Meetup