
Spark Ecosystem & Spark Streaming Fundamentals

Hosted By
Denny L. and 4 others

Details

We are excited to announce that Mike Olson and Hari Shreedharan will be speaking at the March 2015 Seattle Spark Meetup! This will be a joint Seattle Spark Meetup and Pacific Northwest Cloudera User Group session.

Agenda:

6:00pm - 6:30pm: Networking

6:30pm - 6:45pm: Welcome and Introductions, hosted by Anikate Singh from Concur

6:45pm - 7:15pm: Cloudera's Investments into the Spark Ecosystem with Mike Olson

7:15pm - 7:45pm: Spark Streaming Fundamentals by Hari Shreedharan

Attendees will receive a 15% discount code for Cloudera Public Training!

Spark Streaming Fundamentals:

Abstract

Apache Spark is a flexible, scalable, and fault-tolerant data processing framework that specializes in processing large amounts of data. Spark Streaming builds on the core library to consume data in real time from ingest systems such as Apache Kafka, Apache Flume, and Amazon Kinesis, and processes the incoming data in micro-batches every few seconds.
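
To make the micro-batch idea concrete, here is a minimal sketch in plain Python. It does not use the Spark API; the function names, the sample events, and the 2-second interval are all hypothetical and chosen only for illustration. It groups timestamped records into fixed-width time windows and runs a word count on each window, which is roughly the model the abstract describes.

```python
from collections import Counter, defaultdict


def micro_batches(events, batch_interval):
    """Group (timestamp_seconds, record) pairs into fixed-width time windows.

    Illustrative helper only, not part of Spark. Windows that receive
    no records are skipped in this sketch.
    """
    batches = defaultdict(list)
    for timestamp, record in events:
        # Every record with a timestamp in [k * interval, (k + 1) * interval)
        # lands in batch k.
        batch_index = int(timestamp // batch_interval)
        batches[batch_index].append(record)
    # Return the batches in time order.
    return [batches[index] for index in sorted(batches)]


def word_counts(batch):
    """Count whitespace-separated words across all records in one batch."""
    counts = Counter()
    for line in batch:
        counts.update(line.split())
    return dict(counts)


# Hypothetical input: (arrival time in seconds, text line).
events = [
    (0.5, "spark streaming"),
    (1.2, "spark"),
    (2.1, "kafka flume"),
    (3.9, "spark kafka"),
]

# Process each 2-second micro-batch independently.
results = [word_counts(batch) for batch in micro_batches(events, batch_interval=2.0)]
for index, counts in enumerate(results):
    print(f"batch {index}: {counts}")
```

Each batch is processed as a small, self-contained job, so the same batch-style logic can be reused on a continuous stream. A real deployment would read from a source such as Kafka or Flume rather than from an in-memory list.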

In this talk, we will cover the basics of Spark and Spark Streaming. We will discuss Spark Streaming's basic programming framework and how it can be used to process data in real time. We will also discuss recent advances in Spark Streaming: the design of several new features that have improved performance and eliminated any possibility of data loss.

Spark Ecosystem:

Abstract

Cloudera has been an active sponsor of, and participant in, the UC Berkeley AMPLab since 2009, and was involved in some of the earliest design discussions for Spark. Matei Zaharia spent two summers as an intern at Cloudera while in graduate school, and we continued to monitor the progress of the project over his years at Berkeley. In 2013, after the formation of Databricks, we negotiated a reseller relationship with Databricks and brought Spark into the Cloudera product, as yet another execution engine, alongside MapReduce, Impala, HBase and others.

We were the first Hadoop vendor to see the potential of Spark for the Hadoop ecosystem and to pull it into our offering. In the years since, most other vendors have followed suit. We are, however, still the major commercial distributor for Spark, and the only company with a Hadoop-based platform to be actively involved in Spark, with contributors and committers on staff. We work closely with the global Spark community to enhance the software and to integrate it with the security, multi-tenancy, data governance and other services in our enterprise big data platform.

In this talk, I will describe Cloudera's strategic commitment to Spark, our practical investment in the community, and how our customers are using the software. While I will touch on some technical features of Spark that make it valuable to Cloudera, this will not be primarily a technical talk.

Seattle Spark+AI Meetup
Concur Technologies
601 108th Ave Ne # 1000 · Bellevue, WA