Real-time Streaming and Data Pipelines with Apache Kafka + Testing Storm components with Groovy and Spock
This time we will have two talks: a lightning talk on testing Storm components BDD-style with Spock and Groovy, presented by Eugene Dvorkin, architect at WebMD, and the main talk on Apache Kafka, an open-source, distributed publish-subscribe messaging system, presented by Joe Stein, Apache Kafka committer and member of the PMC.
6:00 - Networking, Pizza, Drinks
6:30 - Introduction to the Storm User Group, book raffle
6:40 - Testing Storm components with Groovy and Spock
7:00 - Real-time streaming and data pipelines with Apache Kafka
8:00 - Closing remarks
8:05 - Meeting ends
Please note we will start at 6:00 PM.
Testing Storm components BDD-style with Spock and Groovy - Eugene Dvorkin (http://www.linkedin.com/in/eugenedvorkin/) (@edvorkin (https://twitter.com/edvorkin)) of WebMD will talk about using Groovy and Spock to write unit tests for Storm. Spock and Groovy let developers write tests that are highly readable and easy to create. As an example of Behavior-Driven Development, an approach applicable to any Java-based project, we will write a unit test for a Storm bolt.
Real-time streaming and data pipelines with Apache Kafka - Joe Stein (http://www.linkedin.com/in/charmalloc) (@allthingshadoop (https://twitter.com/allthingshadoop)), Apache Kafka committer, will talk about how to get started with Apache Kafka, how replication works, and more! Storm is a great system for real-time analytics and stream processing, but to get data into Storm you first need to collect your data streams with consistency and availability at high loads and large volumes. Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.
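The idea of publish-subscribe "rethought as a distributed commit log" can be illustrated with a minimal, single-process sketch, assuming a simplified model: producers append records to an ordered log, and each consumer tracks its own offset and reads forward from it. The names `CommitLog`, `Consumer`, `append`, `read`, and `poll` are hypothetical and are not Kafka's client API; replication, partitioning, and disk persistence are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """Hypothetical in-memory append-only log (not Kafka's API)."""
    records: list = field(default_factory=list)

    def append(self, value):
        """Append a record and return the offset it was written at."""
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset, max_records=10):
        """Return records starting at `offset`; reading never removes them."""
        return self.records[offset:offset + max_records]


@dataclass
class Consumer:
    """Each consumer owns its position in the log, independent of others."""
    log: CommitLog
    offset: int = 0

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)  # advance past what was just consumed
        return batch


log = CommitLog()
for event in ["click", "view", "purchase"]:
    log.append(event)

fast = Consumer(log)
slow = Consumer(log)
print(fast.poll())               # ['click', 'view', 'purchase']
print(slow.poll(max_records=1))  # ['click']
print(slow.poll())               # ['view', 'purchase']
```

In this sketch, reads do not delete records, so consumers at different positions can share one log, and a consumer can replay history simply by resetting its offset.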
Apache Kafka http://kafka.apache.org/
* Fast *
A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients.
* Scalable *
Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of coordinated consumers.
* Durable *
Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact.
* Distributed by Design *
Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.
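The partitioning described under "Scalable" can be sketched as follows, assuming a simple scheme in which each message key is hashed to pick one of N partitions, so all messages with the same key land in the same partition and keep their relative order. This is not Kafka's own partitioner: CRC32 from Python's standard library stands in for whatever hash a real producer uses, and `partition_for` and `NUM_PARTITIONS` are hypothetical names.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for one topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition using a stable hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Each partition is an independent ordered list of (key, value) messages.
partitions = defaultdict(list)
messages = [("user-1", "login"), ("user-2", "login"),
            ("user-1", "click"), ("user-1", "logout")]

for key, value in messages:
    partitions[partition_for(key)].append((key, value))

for p in sorted(partitions):
    print(p, partitions[p])
```

In this model each partition could live on a different broker and be read by a different member of a consumer group, which is how a stream can grow beyond what one machine handles while per-key ordering is preserved.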
About Joe Stein
Joe Stein (http://www.linkedin.com/in/charmalloc) is an Apache Kafka committer and PMC member, and the Founder and Principal Architect of Big Data Open Source Security LLC (http://www.stealth.ly/).
DataTorrent (https://www.datatorrent.com/) - DataTorrent is the most powerful real-time computation platform for big data. With unmatched performance, linear scalability and built-in fault tolerance, DataTorrent supports today’s most demanding big data streaming applications, enabling enterprises to monitor, analyze and act on massive amounts of data in real time. As a native Hadoop platform, DataTorrent allows you to leverage your existing Hadoop infrastructure for real-time computations alongside your batch operations.
WebMD - for hosting and sponsoring the event. We are hiring talented developers and DevOps engineers (https://careers-webmd.icims.com/jobs/search?ss=1&searchKeyword=&searchLocation=12781-12816-New+York&searchCategory=8730&searchPositionType=2049&searchRadius=20&searchZip=).
Please contact the organizers if you want to promote your product or service at this meetup.
Propose a Talk
We are always looking for new, interesting talks. It can be a lightning talk (10-15 min) or a full presentation (up to 1 hour including Q&A). Please check our talk proposal page (http://www.meetup.com/New-York-City-Storm-User-Group/pages/General_Guidelines_for_talk_proposal/) for general guidelines and reasons why you should present at this meetup.