Introducing Scio, a Scala API for Google Cloud Dataflow

Hosted By
Andrew R K.

Details

(Note: Special thanks to our friends at NYC Data Engineering (https://www.meetup.com/NYC-Data-Engineering/) for partnering with us on this event!)

Talk 1 - Scio, a new Scala API for Google Cloud Dataflow

Neville Li, Spotify

https://github.com/spotify/scio

About the talk:

Learn about Scio, a Scala API for Google Cloud Dataflow (incubated as Apache Beam). Apache Beam offers a simple, unified programming model for both batch and streaming data processing, while Scio brings it much closer to the high-level APIs many data engineers are familiar with, e.g. Spark and Scalding. Neville will cover the design and implementation of the framework, including features like type-safe BigQuery macros, the REPL, and serialization. There will also be a live coding demo.

Bio:

Neville is a software engineer at Spotify who works mainly on data infrastructure and tools for machine learning and advanced analytics. In the past few years he has been driving the adoption of Scala and new data tools for music recommendation, including Scalding, Spark, Storm and Parquet. Before that, he worked on search quality at Yahoo! and old-school distributed systems like MPI.

Talk 2 - Google engineer on Google Cloud Platform (TBD)

Schedule

6:30pm - Doors open & food/drinks
7:00pm - Talks
8:00pm - Q&A
8:15pm - Additional socializing w/ speakers & other awesome data engineers

New York Hadoop User group
Spotify NYC
45 W 18th St. 7th floor · New York, NY