How to Structure your Data Pipeline and New Kafka Features You Might Not Know


Details
Schedule:
- 6:00pm - Arrival, mingling, pizza and refreshments
- 6:25pm - Welcome, Introductions and Presentation
- 8:00pm - Evening concludes
--------------------------------------------------------------------------------
Abstracts:
How to Structure your Data Pipeline
Structuring your data is an important part of your data pipeline: it lets analysts and business groups analyze the data freely, without the burden of communicating with every single group that generated it. Left unstructured, this becomes an N-squared communication overhead, since each of the N consumers of data would otherwise need to coordinate with each of the N producers.
In this session, we'll discuss strategies for structuring your data when it comes from many different sources in a variety of forms. We'll show you how to treat your schemas as first-class citizens by leveraging message envelopes and schema registries as part of your workflow.
We'll then discuss how structured data gives rise to rich use cases such as enrichment and routing that can provide additional operational and business value to your organization.
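(Not part of the abstract, just a rough illustration of the schema-registry idea: a minimal Java sketch, assuming Confluent's Avro serializer and Schema Registry, with made-up topic, field names and addresses. Each message is serialized against a registered schema, so consumers can resolve its structure without talking to the producing team.)

    import java.util.Properties;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class StructuredEventProducer {
        // Hypothetical envelope schema: metadata fields wrapped around the payload.
        private static final String ENVELOPE_SCHEMA =
            "{\"type\":\"record\",\"name\":\"EventEnvelope\",\"fields\":["
          + "{\"name\":\"event_id\",\"type\":\"string\"},"
          + "{\"name\":\"source\",\"type\":\"string\"},"
          + "{\"name\":\"payload\",\"type\":\"string\"}]}";

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");          // placeholder broker
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "io.confluent.kafka.serializers.KafkaAvroSerializer");
            props.put("schema.registry.url", "http://localhost:8081"); // placeholder registry

            Schema schema = new Schema.Parser().parse(ENVELOPE_SCHEMA);
            GenericRecord envelope = new GenericData.Record(schema);
            envelope.put("event_id", "42");
            envelope.put("source", "checkout-service");
            envelope.put("payload", "{\"order_id\": 1001}");

            // The Avro serializer registers the schema (if new) and embeds its id in
            // every message, so downstream consumers always know the structure.
            try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("structured-events", "42", envelope));
            }
        }
    }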
Presenter Bio: Mike Trienis loves building data products that scale. That means implementing simple solutions with minimal maintenance through automation and elegant design. His software experience spans the full stack, from system-level deployment to application implementation. In particular, he has spent quite a bit of time working with streaming technologies such as Apache Kafka and Apache Spark.
--------------------------------------------------------------------------------
Apache Kafka: New Features That You Might Not Know About
Over the last two years Apache Kafka has rapidly introduced new versions, going from 0.10.x to 2.x. It can be hard to keep up with all the updates, and many companies still run 0.10.x clusters (or even older ones).
Join this session to learn about exciting features introduced in Kafka 0.11, 1.0, 1.1 and 2.0, including, but not limited to, the new protocol and message headers, transactional support and exactly-once delivery semantics, as well as controller changes that make it possible to shut down even large clusters in seconds.
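(As a quick taste of the 0.11+ APIs involved, not code from the talk: a minimal Java sketch of an idempotent, transactional producer that also attaches a record header; the broker address, topic and transactional id are placeholders.)

    import java.nio.charset.StandardCharsets;
    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class TransactionalHeadersSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("enable.idempotence", "true");             // 0.11+: retries cannot duplicate
            props.put("transactional.id", "demo-producer-1");    // 0.11+: enables transactions

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.initTransactions();
                producer.beginTransaction();
                try {
                    ProducerRecord<String, String> record =
                        new ProducerRecord<>("demo-topic", "key", "value");
                    // 0.11+: headers carry metadata without touching the payload itself.
                    record.headers().add("trace-id", "abc123".getBytes(StandardCharsets.UTF_8));
                    producer.send(record);
                    // read_committed consumers only see the record once the transaction commits.
                    producer.commitTransaction();
                } catch (Exception e) {
                    producer.abortTransaction();
                    throw e;
                }
            }
        }
    }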
Presenter Bio: Yaroslav Tkachenko is a software engineer interested in distributed systems, microservices, functional programming, modern cloud infrastructure and DevOps practices. Currently Yaroslav is a Software Architect at Activision, working on a Big Data platform.
Prior to joining Activision, Yaroslav held various leadership roles in multiple startups. He was responsible for designing, developing, delivering and maintaining platform services and cloud infrastructure for mission-critical systems.
A lightning talk on some how-tos and lessons learned from building a pipeline data monitor using the Kafka Connect Elasticsearch connector and Kibana.
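(For context on the moving parts, a hypothetical sketch rather than the speaker's actual setup: a Java snippet registering a Kafka Connect Elasticsearch sink through Connect's REST API, so monitoring events on a Kafka topic land in Elasticsearch where Kibana can chart them. The connector name, topic, ports and config values are assumptions.)

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class RegisterMonitorSink {
        public static void main(String[] args) throws Exception {
            // Hypothetical connector config: sink a metrics topic into Elasticsearch.
            String connector = "{"
                + "\"name\": \"pipeline-monitor-sink\","
                + "\"config\": {"
                + "\"connector.class\": \"io.confluent.connect.elasticsearch.ElasticsearchSinkConnector\","
                + "\"topics\": \"pipeline-metrics\","
                + "\"connection.url\": \"http://localhost:9200\","
                + "\"key.ignore\": \"true\","
                + "\"schema.ignore\": \"true\""
                + "}}";

            // Kafka Connect exposes a REST API (port 8083 by default) for managing connectors.
            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(connector))
                .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }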
Presenter Bio: Yi Zhang is a software developer at Demonware working on building a scalable data pipeline. Prior to Demonware, she worked at NetApp building a distributed object storage system. She has a keen interest in data streaming, mining and analytics.
--------------------------------------------------------------------------------
Parking Tips:
Visitor parking is located on the west side of the building, beneath the overhang. The stalls are labeled Radical Visitor Parking.
Security info (e.g. buzz code, or bring ID to check in at the lobby):
The elevator will be booked so attendees can come directly up to the 7th floor.
Onsite contact name and cell phone (for the night of the event):
Christina Zhang: 778-883-4447
