4 Meetups and a Conference, or the Road to Current 2024 (final session)

Hosted By
Alice R.

Details

Hello Streamers!

Join us for our farewell to the summer series with a Kafka and Analytics meetup with Elijah Meeks of Confluent and Neha Pawar of StarTree. Space is limited.

Please join us for our fourth and final meetup in this exciting series, an IN-PERSON Apache Kafka® meetup on Thursday, August 22nd from 5:30pm.

Venue:
Confluent-Streams Cafe (across from the Front Desk)
899 W Evelyn Ave
Mountain View, CA

***Please note:*** **All attendees will be required to sign an NDA upon arrival at the meetup.**

***
🗓 Agenda:

  • 5:30pm: Doors Open
  • 6:00pm-6:45pm: Neha Pawar, Founding Engineer at StarTree
  • 6:45pm-7:30pm: Elijah Meeks, Principal Engineer I, Confluent
  • 7:30pm-8:00pm: RAFFLE/Additional Q&A and Networking

***
💡 Speaker for first talk:
Neha Pawar, Founding Engineer at StarTree

Title of Talk:
Speed of Apache Pinot at the Cost of Cloud Object Storage with Tiered Storage in StarTree Cloud

Abstract:
For real-time analytics, you need systems that can provide ultra-low latency (milliseconds) and extremely high throughput (thousands of queries per second). One such system is Apache Pinot, which is excellent for real-time analytics use cases like user-facing analytics and personalization.

Pinot users love its speed and want to use it for all their use cases: internal analytics, ad hoc analytics, and reporting. Such use cases typically require retaining data for long periods.
You can of course do that today, but storing large amounts of data in a system like Pinot can get expensive because storage and compute are tightly coupled. As the total data volume grows, more resources (compute + storage) must be provisioned, whether or not the additional compute is actually used, resulting in a high cost to serve.

One option for users is to introduce decoupled systems for historical data analytics. Such systems use cloud object storage, which reduces the cost. But that pushes your latencies into the tens-of-seconds range and adds the overhead of maintaining and operating a new system and federating queries.

To address these challenges, we added Tiered Storage for Apache Pinot in StarTree Cloud, which gives you the speed of Apache Pinot at the cost of cloud storage! In this talk, we will dive deep into how we built an abstraction in Apache Pinot that makes it agnostic to where the data is located. We'll talk about how we're able to query data in the cloud directly (without downloading the entire data set, as lazy loading does) with sub-second latencies in StarTree Cloud.

Bio:
Neha Pawar is a Founding Engineer at StarTree (https://www.startree.ai/), which aims to democratize data for all users by providing real-time, user-facing analytics. Prior to this, she was part of LinkedIn's Data Analytics Infrastructure org for 5 years, working on Apache Pinot & ThirdEye. She is passionate about big data technologies and real-time analytics databases. Neha is an Apache Pinot PMC member and committer. She has made numerous impactful contributions to Apache Pinot, with a focus on real-time streaming integrations and ingestion.
***
💡 Speaker for second talk:
Elijah Meeks, Principal Engineer I, Confluent

Title of Talk:
Visualization in Motion: How to Create Effective Data Visualization with Real-Time Data

Abstract:
Is your data visualization optimized for your real-time data? Likely not. Every company needs a real-time data strategy, but even those that have one often neglect to invest in charting solutions that can handle that data. It's easy enough to show throughput on a line chart or track offsets visually, but are those the most effective methods for observing, analyzing, and diagnosing real-time data?

In this session, we’ll start by taking a look at common strategies and technologies for visualizing real-time data. From there, we’ll switch gears and see where we can do better by showcasing more effective forms of data visualization for your Kafka data streams within the context of their schemas and broader workflows. You’ll learn how to use traditional visualizations more effectively, see how to bring new methods like time-inflected distributions into your toolkit, and explore where you can deploy new metrics that encode properties like trajectories and anomalies for greater impact.

***
DISCLAIMER
BY ATTENDING THIS EVENT IN PERSON, you acknowledge that risk includes possible exposure to and illness from infectious diseases including COVID-19, and accept responsibility for this, if it occurs.
NOTE: We are unable to cater for any attendees under the age of 21.
***

COVID-19 safety measures

The event will be held outdoors.
The event host is instituting the above safety measures for this event. Meetup is not responsible for ensuring, and will not independently verify, that these precautions are followed.
Bay Area Apache Kafka® Meetup