Past Meetup

[PAID] PANCAKE STACK: Spark +TensorFlow +Kafka +Streaming Recommendation Engine


166 people went



RSVP Here:

Note that we added a second event for South Bay here (Follow Eventbrite RSVP Link):


$319 with 20% Advanced Apache Spark Meetup discount

(Cost covers office space, food, coffee, power, wifi, air conditioning, audio, video, weekend staff, and cloud instances)

Food and Drink

• Breakfast: 3D-Printed Pancakes <-- No WAY!? YES WAY!

• Lunch: Pizza for all dietary types

• All Day: Coffee and Water



Building a Complete, End-to-End, Streaming Data Analytics Pipeline and Recommendation Engine with the PANCAKE STACK!!




Apache Arrow

Apache NiFi

Apache Cassandra


Apache Kafka



Apache Spark





140 Character Summary

Developer of SMACK Stack, Chris Fregly Follows Up With PANCAKE STACK! Global Workshops #ApacheSpark, #TensorFlow

Instructor Bio

Chris Fregly is a Principal Data Solutions Engineer for the newly-formed IBM Spark Technology Center, an Apache Spark Contributor, a Netflix Open Source Committer, and the Original Developer of the SMACK Stack.

Chris is also the founder of the global Advanced Apache Spark Meetup and author of the upcoming book, Advanced Spark @

Previously, Chris was a Data Solutions Engineer at Databricks and a Streaming Data Engineer at Netflix.

When Chris isn’t contributing to Spark and other Open Source projects, he’s creating book chapters, slides, and demos to share with his peers through meetups, webinars, workshops, and conferences throughout the world. And he feels very fortunate to have this unique opportunity!

Workshop Description

The goal of this workshop is to build an end-to-end, streaming analytics and recommendation pipeline on your local machine using Docker and the latest streaming analytics tools.

First, we create a data pipeline to interactively analyze, approximate, and visualize streaming data using modern tools such as Apache Spark, NiFi, Kafka, Zeppelin, IPython, and ElasticSearch.

Next, we extend our pipeline to use streaming data to generate personalized recommendation models using popular machine learning, graph, and natural language processing techniques such as collaborative filtering, clustering, and topic modeling.

Lastly, we productionize our pipeline and serve live recommendations to our users!
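
To give a taste of the collaborative filtering mentioned above, here is a minimal sketch in plain Python. It is not the workshop's code: the toy ratings and the helper names `cosine` and `recommend` are made up for illustration, and the workshop agenda lists Spark ML for this step rather than hand-rolled loops. The idea is simply to score items a user has not seen by the ratings of similar users, where similarity is a normalized dot product.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not workshop data.
ratings = {
    "alice": {"spark": 5.0, "kafka": 4.0, "cassandra": 1.0},
    "bob": {"spark": 4.0, "kafka": 5.0, "nifi": 2.0},
    "carol": {"cassandra": 5.0, "nifi": 4.0},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    shared = set(u) & set(v)
    dot = sum(u[item] * v[item] for item in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(user, ratings, top_n=2):
    """Rank items the user has not rated by similarity-weighted ratings."""
    seen = ratings[user]
    scores = {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine(seen, their_ratings)
        for item, rating in their_ratings.items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    for name in ratings:
        print(name, recommend(name, ratings))
```

This user-based variant is only one of several flavors of collaborative filtering; it is shown here because it needs nothing beyond the dot product listed in the prerequisites.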

You'll Learn How To

• Create a complete, end-to-end streaming data analytics pipeline

• Interactively analyze, approximate, and visualize streaming data

• Generate machine learning, graph & NLP recommendation models

• Productionize our ML models to serve recommendations in real-time

• Perform a hybrid on-premise and cloud deployment using Docker

• Customize this workshop environment to your specific use cases

Target Audience

• Data Scientists and Analysts interested in learning more about the streaming data pipelines that power their real-time machine learning models and visualizations

• Data Engineers interested in building more intuition about machine learning, graph processing, natural language processing, statistical approximation techniques, and visualizations

• Anyone interested in learning the practical applications of a modern, streaming data analytics and recommendations pipeline

• Anyone who wants to try 3D-Printed PANCAKES!!

Prerequisites


• Basic familiarity with Unix/Linux commands

• Experience in SQL, Java, Scala, Python, or R

• Basic familiarity with linear algebra concepts like dot product and matrix multiply

• Laptop with a modern browser and SSH capabilities (Mac OS X, Windows, or Linux)

Note: We provide each attendee with a cloud instance to access from their laptop.

At the end of the workshop, you will be able to save your work and copy it locally to your laptop to use at home or at the office!

Agenda (Full Day)

Part 1 (Analytics and Visualizations)

• Analytics and Visualizations (Live Demo!)

• Verify Environment Setup (Docker Machine)

• Notebooks (Zeppelin, Jupyter/IPython)

• Interactive Data Analytics (Spark SQL, Hive, Presto)

• Graph Analytics (Spark Graph, NetworkX, TitanDB)

• Time-series Analytics (Cassandra)

• Visualizations (Kibana, Matplotlib, D3)

• Approximate Queries (Spark, Redis, Algebird)

• Workflow Management (Airflow)

Part 2 (Streaming and Recommendations)

• Streaming and Recommendations (Live Demo!)

• Streaming (NiFi, Kafka, Spark Streaming, Flink)

• Cluster-based Recommendation (Spark ML, Scikit-Learn)

• Graph-based Recommendation (Spark ML, Spark Graph)

• Collaborative Filtering-based Recommendation (Spark ML)

• NLP-based Recommendation (CoreNLP, NLTK)

• Geo-based Recommendation (ElasticSearch)

• Hybrid On-Premise+Cloud Auto-scale Deploy (Docker)

• Customize and Save Environment for Your Use Cases
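
For a flavor of the Approximate Queries topic in Part 1, here is a small count-min sketch in plain Python. This is an illustrative assumption about the kind of structure that topic covers, not the workshop's code and not the API of Spark, Redis, or Algebird; the class name `CountMinSketch` and its `width` and `depth` parameters are hypothetical. The sketch trades exact counts for fixed memory: it may overestimate how often a key was seen, but it never underestimates.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency counter that can overestimate but never underestimate."""

    def __init__(self, width=1000, depth=5):
        self.width = width  # counters per row
        self.depth = depth  # number of independent hash rows
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        # Derive one column per row by salting the key with the row number.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, key, count=1):
        for row, col in self._cells(key):
            self.table[row][col] += count

    def estimate(self, key):
        # Collisions only inflate counters, so the smallest one is the best guess.
        return min(self.table[row][col] for row, col in self._cells(key))


if __name__ == "__main__":
    sketch = CountMinSketch()
    for event in ["spark", "kafka", "spark", "nifi", "spark"]:
        sketch.add(event)
    print("spark seen about", sketch.estimate("spark"), "times")
```

A wider table lowers the chance of collisions, and more rows lower the chance that every row collides at once; both cost memory, which is the trade-off approximate queries are built around.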

And once again, the PANCAKE STACK! :)

Here is the registration link again:

See you all soon!!