This is a developer-centric meetup focused on Apache Spark, Apache Flink, Apache Kafka, Apache Mesos, the related Typesafe and Twitter OSS stacks, and broader distributed Data Science and Machine Learning. We're open to all OSS developers, vendors, consultants, and startups, whether they use the tools or build and support them, and we welcome them to attend, present, and organize.
How it may complement the original Spark Users group, now the Bay Area Spark Meetup: we cover Spark in its end-to-end ecosystem (Mesos, Akka, Kafka, Cassandra, etc.), with a focus on what works for the final goals of the whole pipeline. We will teach you how to use Scala for Spark to make you more effective, and consider devops options so you can get to production faster. We'll invite projects relevant to or inspired by Apache Spark, such as Apache Storm, Apache Flink, and others, and will focus on putting useful OSS together as a system.
This is a crosspost from Bay Area AI; please register at
Machine Learning (ML) pipelines are the key building block for productionizing ML code. However, pipelines are often developed as "silos": features tend not to be easily re-used across pipelines or even within the same pipeline. Silos lead to duplicated work, features that are re-implemented unnecessarily, and, in the worst case, correctness problems, for example when the features used for training and serving have inconsistent implementations. The Feature Store solves the problem of siloed and ad-hoc machine learning pipelines by providing a data layer where feature engineering is separated from the use of features to generate training data. That is, the Feature Store should provide a clean API separating Data Engineering from Data Science.
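To make the separation concrete, here is a minimal sketch in plain Python. It is not the Hopsworks Feature Store API: the class `InMemoryFeatureStore`, its methods `write_feature` and `read_features`, and the sample order data are all hypothetical names invented for illustration. It only shows the idea that features are computed and written once, then read by name for both training and serving.

```python
from typing import Dict, List


class InMemoryFeatureStore:
    """Hypothetical toy feature store (illustration only, not a real library API)."""

    def __init__(self) -> None:
        # feature name -> {entity id -> feature value}
        self._columns: Dict[str, Dict[int, float]] = {}

    def write_feature(self, name: str, values: Dict[int, float]) -> None:
        """Data Engineering side: publish a computed feature under a name."""
        self._columns[name] = dict(values)

    def read_features(self, names: List[str], entity_ids: List[int]) -> List[List[float]]:
        """Data Science side: select features by name, one row per entity."""
        return [[self._columns[n][e] for n in names] for e in entity_ids]


# Made-up raw data: customer id -> order amounts.
raw_orders = {1: [20.0, 40.0], 2: [10.0], 3: [5.0, 5.0, 20.0]}

# Feature engineering runs once and writes to the store.
store = InMemoryFeatureStore()
store.write_feature("order_count", {c: float(len(o)) for c, o in raw_orders.items()})
store.write_feature("avg_order_value", {c: sum(o) / len(o) for c, o in raw_orders.items()})

# Training and serving read the same stored values, so the two paths
# cannot drift apart through separate re-implementations.
selected = ["order_count", "avg_order_value"]
training_rows = store.read_features(selected, [1, 2, 3])
serving_row = store.read_features(selected, [2])[0]
```

Because both consumers go through `read_features`, a feature's definition lives in one place; a real Feature Store adds storage, versioning, and metadata on top of this idea.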
In this talk, we will introduce the world's first open-source Feature Store, built on Hopsworks, Apache Spark, and Apache Hive, and targeting both TensorFlow/Keras and PyTorch. We will walk through an end-to-end pipeline to show how the Feature Store works and how it serves as a natural interface between Data Engineers and Data Scientists. We will also show how you can write the entire ML pipeline in Python only (if you so choose).
Jim Dowling is the CEO of Logical Clocks AB, as well as an Associate Professor at KTH Royal Institute of Technology in Stockholm. He is the lead architect of Hops, the world's fastest and most scalable Hadoop distribution and the first Hadoop platform with support for GPUs as a resource. He is a regular speaker at AI industry conferences, and blogs at O'Reilly on AI.