
Stream Processing with Apache Kafka, Samza, and Flink

Hosted By
Adem Efe G. and 2 others

Details

This is a Hybrid Event with an in-person venue and an online Zoom link.

  • Venue [in-person -- starts at 5:30PM]: 700 E Middlefield Rd, Building 4, 1st Floor, Together (Meeting Room), Mountain View, CA 94043
  • Zoom [online -- starts at 6:00PM]: https://linkedin.zoom.us/j/98567639887

5:30 - 6:00: Networking [in-person only] (with catered food)
6:00 - 6:05: Welcome
6:05 - 6:40: Proton: One Single Binary to Tackle Streaming and Historical Analytics
Ken Chen & Ting Wang, Timeplus
Proton is a unified streaming and historical analytics engine built on top of the ClickHouse code base and shipped as a single binary. It is the core engine that powers the Timeplus core product and is open source under the Apache 2.0 license: https://github.com/timeplus-io/proton
In this talk, I will cover its technical internals, such as watermarking, streaming query state management, its internal streaming store, and how it connects historical data with live streams. Core features such as tumble, hop, and session window processing, streaming joins, aggregation, and the newly designed materialized view will be presented as well.

  • Ken Chen is Chief Architect at Timeplus and has over 15 years of experience in the industry. Prior to Timeplus, he worked at AWS, Splunk, and Dell EMC, focusing mainly on storage infrastructure and time series data storage, processing, and analytics. He is passionate about distributed and database technologies.
  • Ting Wang is co-founder and CEO of Timeplus and is passionate about making streaming analytics faster and more actionable. He has 20+ years of experience, previously leading product and engineering teams at Splunk and SAP as VP of Engineering and developing various industry-leading data platforms.

6:40 - 7:15: Navigating Automatic Scaling in Pubsub Systems
Nick Garvey, LinkedIn
Running a distributed Pubsub system comes with its share of challenges, especially when grappling with repetitive day-to-day operations. In this session, we delve into the journey of streamlining our operational workflow by leveraging the power of Cruise Control for Apache Kafka and implementing intelligent automated workflows.

  • Nick Garvey is a Site Reliability Engineer at LinkedIn.

7:15 - 7:50: Handling Growing Kafka Scale at LinkedIn
Adem Efe Gencer, LinkedIn
Kafka has been widely adopted for streaming data with high throughput, low latency, and fault tolerance; thousands of LinkedIn products in production have some dependency on it. However, the rising adoption of Kafka raises challenges. First, growing cluster sizes, the increasing volume and diversity of client traffic, and the need for a sustainable way to carry out infrastructure maintenance through the hardware lifecycle add overhead to managing the system. Next, the adverse effects of metadata pressure in Kafka clusters worsen as the number of unused topics grows, yielding significant resource waste, performance degradation, and missed deadlines. Finally, not knowing the scale limits along various dimensions, and not having a plan for each, makes it difficult to gauge readiness for the time ahead. In this presentation, we will discuss how we address these challenges to handle the growing scale of Kafka at LinkedIn.

  • Adem Efe Gencer manages the server and related services of the Data Streaming Platform at LinkedIn. Before that, Efe developed Apache Kafka and the ecosystem around it, including Cruise Control, and supported their operation at LinkedIn. He holds a PhD in computer science from Cornell University, where his research focused on improving the scalability of blockchain technologies. As an active member of the distributed systems community, Efe regularly volunteers as a PC member for top-tier journals and conferences and is the current Chair of LinkedIn’s Data Infra peer review process.