Data Pipelining and Refining at Scale



- 6:00pm - Arrival, mingling, pizza and refreshments
- 6:25pm - Welcome, Introductions and Presentation
- 8:00pm - Evening concludes



Confluent Kafka Python: Integrating Python Apps with your Data Pipeline

Data is the new oil and is a vital component of the success of every modern business. That data must be highly accessible without impacting your core services. With the advent of Apache Kafka and the ability to write messages at high throughput and consume messages in parallel, it is easier than ever to get your data to the right stakeholders.

In this session, I'll provide an introduction to the Apache Kafka protocol and explain how messages are written to and read from Apache Kafka. We'll then briefly discuss the message delivery guarantees and how fault tolerance plays a critical role.
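To make the core abstraction concrete ahead of the talk: a Kafka topic is a set of append-only, partitioned logs, and each consumer tracks its own read offset, which is what makes high-throughput writes and parallel reads cheap. The following is only a stdlib toy illustration of that idea, not the real broker protocol.

```python
import zlib

class ToyTopic:
    """Toy model of a Kafka topic: a set of append-only partition logs."""

    def __init__(self, num_partitions=2):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Messages with the same key land in the same partition,
        # which preserves per-key ordering (as in Kafka's default
        # key-based partitioner).
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1   # (partition, offset)

    def consume(self, partition, offset):
        # A consumer reads from an offset it manages itself, so many
        # consumers can read the same log in parallel without
        # interfering with writes or with each other.
        return self.partitions[partition][offset:]

topic = ToyTopic()
topic.produce("user-1", "login")
topic.produce("user-1", "click")
```

Because reads are just offset lookups into an immutable log, adding more consumers never slows down producers, which is the property the abstract refers to.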

We'll then dive into the Confluent Kafka Python library, an open source Kafka client for Python applications. We'll first give an introduction to the APIs and then dig into some of the best practices as you integrate and deploy it with your services. We will also cover some pitfalls and gotchas to avoid.
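As a preview of the kind of API usage the talk will cover, here is a minimal confluent-kafka producer sketch. The broker address and topic name are placeholders, and the Kafka calls are kept inside a function that is defined but not invoked, since they require `pip install confluent-kafka` and a reachable broker.

```python
import json

def producer_config(brokers):
    # Idempotence plus acks=all guards against duplicates and data
    # loss on retry; tune these librdkafka settings for your workload.
    return {
        "bootstrap.servers": brokers,
        "enable.idempotence": True,
        "acks": "all",
    }

def delivery_report(err, msg):
    # Invoked from Producer.poll()/flush() once per message. A common
    # gotcha: if you never call poll(), delivery callbacks never fire.
    if err is not None:
        print(f"delivery failed: {err}")
    else:
        print(f"delivered to {msg.topic()} [{msg.partition()}]")

def produce_one(brokers="localhost:9092", topic="events"):
    # Placeholder broker/topic; call this yourself against a real cluster.
    from confluent_kafka import Producer
    p = Producer(producer_config(brokers))
    p.produce(topic, json.dumps({"user": 1}).encode(), callback=delivery_report)
    p.poll(0)    # serve delivery callbacks
    p.flush(10)  # block until outstanding messages are delivered (or timeout)
```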

Presenter Bio: Mike Trienis loves building data products that scale. That means implementing simple solutions with minimal maintenance through automation and elegant designs. His software experience spans the full stack, from system-level deployment to application implementation. In particular, he has spent quite a bit of time working with streaming technologies such as Apache Kafka and Apache Spark.


Kafka Streams: the easiest way to start with stream processing

Stream processing is getting more and more important in our data-centric systems. In the world of Big Data, batch processing is not enough anymore - everyone needs interactive, real-time analytics for making critical business decisions, as well as for providing great features to customers.

There are many stream processing frameworks available nowadays, but the cost of provisioning infrastructure and maintaining distributed computations is usually very high. The framework may also force specific requirements on you, such as running HDFS or YARN.

Apache Kafka is the de facto standard for building data pipelines. Kafka Streams is a lightweight library (available since Kafka 0.10) that uses powerful Kafka abstractions internally and doesn't require any complex setup or special infrastructure - you just deploy it like any other regular application.

In this session I want to talk about the goals behind stream processing, basic techniques and some best practices. Then I'm going to explain the fundamental concepts behind Kafka and explore Kafka Streams syntax and streaming features. By the end of the session you'll be able to write stream processing applications in your domain, especially if you already use Kafka as your data pipeline.
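Kafka Streams itself is a JVM library, so as a rough stdlib Python analogue, here is the kind of processing its classic word-count topology performs: flat-map each record into words, group by word, and maintain a running count, emitting an update per input word (Kafka Streams would keep this state in a local, changelog-backed store and write the updates to an output topic).

```python
from collections import defaultdict

def word_count(stream):
    """Python analogue of a word-count stream topology."""
    counts = defaultdict(int)   # stand-in for a Streams state store
    updates = []                # stand-in for the output topic
    for line in stream:         # each record from the input topic
        for word in line.lower().split():
            counts[word] += 1
            updates.append((word, counts[word]))  # emit the updated count
    return updates

word_count(["hello streams", "hello kafka"])
```

The key idea this illustrates: a stream processor emits a new result per input record rather than recomputing over a batch, which is what makes the output continuously up to date.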

Presenter Bio: Yaroslav Tkachenko is a software engineer interested in distributed systems, microservices, functional programming, modern cloud infrastructure and DevOps practices. Currently Yaroslav is a Senior Software Engineer at Demonware (Activision), working on a large-scale data pipeline.

Prior to joining Demonware, Yaroslav held various leadership roles in multiple startups. He was responsible for designing, developing, delivering and maintaining platform services and cloud infrastructure for mission-critical systems.


Parking Tips:

Visitor parking is located on the west side of the building, beneath the overhang. The spots are labeled Radical Visitor Parking.

Security info (e.g., buzz code, or bring ID to check in at the lobby):

The elevator will be reserved to bring guests directly up to the 7th floor.

Onsite contact name and cell phone (for the night of the event):

Christina Zhang: [masked]