Streaming Data Pipelines with Containers


Details
We have two talks scheduled:
Talk 1: Streaming Data Pipelines with Containers
Abstract:
Pachyderm is a big data analytics platform deployed with Kubernetes and Docker. Pachyderm is inspired by the Hadoop ecosystem but shares no code with it. Instead, we leverage the container ecosystem to provide the broad functionality of Hadoop with the ease of use of Docker. In this talk, we’ll show you how to build streaming data workflows with Pachyderm. There are two bold new ideas in Pachyderm:
1. Version control for data -- view diffs of your data and incrementally process only the new data as it streams in.
2. Containers as the core primitive for computation -- each stage in your workflow can be written using any languages or libraries you want.
These ideas lead directly to a system that's much more powerful, flexible and easy to use. Pachyderm is open source, so check it out on GitHub.
Bio:
Joe Doliner is the founder and CEO of Pachyderm and an open source aficionado who has been building and running data infrastructure his entire career. Before Pachyderm, he was the first employee and lead engineer at RethinkDB and also did a stint running the Hadoop cluster at Airbnb. There he gained an appreciation for the vast collaboration and dependency management problems that still plague modern data-driven enterprises. He founded Pachyderm in 2014 to solve these issues.
Talk 2: Building analytics stack with Kubernetes, Spark, Cassandra and Kafka
Abstract:
Kubernetes is an open-source system for automating deployment, operations, and scaling of containerized applications. In this talk we will explore how Kubernetes is used to build a scalable (and ops-friendly) analytics stack with Kafka, Spark and Cassandra. We will also discuss some challenges and future developments.
Bio:
Sasha Klizhentas is a co-founder and CTO of Gravitational and has been working on systems programming, infrastructure and distributed applications for the last 10 years.
Before Gravitational, he was a founding engineer at Mailgun, a YC company acquired by Rackspace. Sasha is a co-author of open source projects like teleport (http://github.com/gravitational/teleport), vulcand (http://github.com/vulcand/vulcand) and flanker (https://github.com/mailgun/flanker).
