Next Meetup

BDAM 12/05: Centralized Metadata; Build Apps w/o Pipelines; Docker & Kubernetes
Big thanks to Google for hosting and sponsoring this meetup event!

AGENDA

6:00 - 6:30 - Socialize over food and beverages
6:30 - 8:00 - Tech Talks

TALKS

Talk #1: Centralized Metadata for Multi Cloud Data Pipelines, by Rohit Sinha, Google
Talk #2: Build Apps Without Pipelines: Shortest Path from Complex Data to Live Apps, by Anirudh Ramanathan, Rockset
Talk #3: Modernizing Big Data Infrastructure with Docker and Kubernetes, by Mark Bayazit, Robin Systems

ABSTRACTS

Talk #1: Centralized Metadata for Multi Cloud Data Pipelines, by Rohit Sinha, Google

Enterprises are increasingly adopting multi-cloud strategies to meet their business requirements. For ETL workloads, running data pipelines in different environments provides flexibility and higher resiliency. With this move, a key challenge is building a centralized control plane that coordinates pipelines across all environments and collects their metadata in one place, providing a unified view of business, technical, and operational metadata. In this talk, we will discuss use cases that centralized control and metadata capabilities enable for multi-cloud data pipelines.

Talk #2: Build Apps Without Pipelines: Shortest Path from Complex Data to Live Apps, by Anirudh Ramanathan, Rockset

With the vast array of datasets available today, investment management firms have a significant opportunity to use non-traditional, alternative data to enhance their research. Efficiently combining and analyzing disparate data streams, both real-time and semi-structured, to support investment decisions is critical to remaining competitive in this space. In this talk, we will use Rockset, a serverless search and analytics engine, to demonstrate how developers can easily plug in alternative data and build apps on those data sets. Specifically, we will work with the JSON data stream from Twitter’s Firehose streams API and NASDAQ’s Company Lookup directory exported in CSV format. Using Rockset, we will demonstrate how these two data sets can be loaded, then immediately queried and joined using SQL, without any upfront data preparation or complex data pipelines. Rockset was founded by former members of Facebook’s online data team, who helped create RocksDB, Facebook’s TAO, Unicorn, and HDFS.

Talk #3: Modernizing Big Data Infrastructure with Docker and Kubernetes, by Mark Bayazit, Robin Systems

Docker is clearly a de facto choice for enterprises, and many are moving further with cluster orchestration using Kubernetes. Unfortunately, both were designed for stateless use cases, so running data-centric workloads like NoSQL, Big Data, or RDBMS seems like a stretch. However, Robin has come to the rescue with a Hyper-converged Kubernetes Platform designed with stateful workloads in mind. In this talk, we will demonstrate a use case for running your data-intensive Apps-as-a-Service (NoSQL, Big Data, or even RDBMS) with an AppStore experience: simpler, faster, easier.

SPEAKER BIOS

- Rohit Sinha is a software engineer at Google, where he works on the open source Big Data application platform CDAP (cdap.io). Prior to Google, he worked at Cask, where he was responsible for building software fueling the next generation of Big Data applications.

- Anirudh Ramanathan leads Product Engineering at Rockset. He is an Apache Spark committer and a Kubernetes maintainer. Prior to this, he was on the Kubernetes team at Google, where he worked on Google Kubernetes Engine and core controllers, and founded SIG Big Data, a group focused on containerized Big Data and ML workloads (Apache Airflow, Kubeflow, JupyterHub and HDFS).

- Mark Bayazit is a Senior Solutions Architect at Robin Systems. His long career in technology started as a DBA and has led him to be a crusader for improving IT processes and automation.

VENUE AND PARKING INFORMATION

There will be plenty of parking available around the building. The building is also a short 10-minute Uber ride from the Sunnyvale Caltrain Station.

Hamina Tech Talk Room, Google Cloud Building 4

1190 Bordeaux Drive, Building 4 · Sunnyvale, CA

What we're about

This is a group for everyone interested in building applications using Apache Hadoop and other open-source, big data technologies.

Come and learn how to apply big data technologies to solve real world problems!

Meetup topics focus on use cases, building end-to-end solutions, and making different technologies work together. Technical presentations come from open-source projects, open-source vendors, and open-source users building big data applications. Topics include:

• Describing the technology behind a specific use-case (e.g. HBase at Flipboard)

• Making the best use of a project/technology (e.g. Spark performance tuning)

• Integrating different technologies (e.g. Using Apache Kafka as a reliable distributed message queue)

• Introducing new projects/technologies in the space (e.g. Introducing Apache Flink; CDAP is now open-source!)

• Evolution of existing projects/technologies (e.g. What's new in Cassandra 2.0?)

All meetups are recorded, and videos and presentations of the meetups are available here: bdam.io

Please reach out to bigdataappmeetup@gmail.com if you are interested in speaking at, or hosting/sponsoring a future meetup!

See you at the next meetup!
