
Big Data Scala: Kafka and OhmData

  • Apr 30, 2014 · 6:00 PM

We have an amazing Big Data meetup covering two key pieces of any data flow: high-performance queues and high-performance databases.

Jay Kreps: I ♥ Log: Real-time Data and Apache Kafka

This talk will discuss how logs and stream processing can form a backbone for data flow, ETL, and real-time data processing. It will describe the challenges and lessons learned as LinkedIn built out its real-time data subscription and processing infrastructure. It will also discuss the role of real-time processing and its relationship to offline processing frameworks such as MapReduce.

Jay is a Principal Staff Engineer at LinkedIn, where he is the lead architect for online data infrastructure. He is among the original authors of several open-source projects, including a distributed key-value store called Project Voldemort, a messaging system called Kafka, and a stream processing system called Samza.

OhmData open-sources C5, a simple, reliable, scalable, HBase-compatible database. Ryan Rawson and Alex Newman, the founders, will talk about the architecture and future of big databases.

C5 is a simple, reliable, and scalable open-source database which improves on HBase in every way. It is optimized for fast failover and can be used in production for both OLTP and analytics, eliminating a whole class of pipelines and bottlenecks. It is the first HBase successor, fully HBase API-compatible, and developed from the ground up to run on cloud installs with little to no maintenance and tuning. Failover happens instantly, which makes it a viable option for APIs, and all the data is accessible to Hadoop analytics flows right away. C5 takes full advantage of high-speed IO (SSDs) and is simple to grow to hundreds or thousands of nodes as needed.

Alex and Ryan are veterans of big data OSS from before it was called big data, and of Amazon, Google, Cloudera, StumbleUpon, and Drawn to Scale. Ryan is a core committer to HBase.

Doors open at 6, talks begin at 6:30.

Note: we need a video sponsor for this talk. The sponsor's logo will precede the recording. Please contact [masked] if you want to help more folks learn about using Scala for data pipelines!
