Past Meetup

Spark + Kubernetes (Google Guy), Tensorflow Serving, Performance Tuning, Airflow

368 people went

AdRoll

972 Mission St, San Francisco, CA 94103

How to find us

1st Floor - Ask front desk for AdRoll

Details

Talk 1: Meetup and Technology Updates (Chris Fregly (https://linkedin.com/in/cfregly), Research Scientist @PipelineIO (http://pipeline.io))

• We hit 5000+ Members!!!!

• 1000+ Github (https://github.com/fluxcapacitor/pipeline) Stars!, 6500+ DockerHub Pulls!!

• 50+ Community Events Across the Globe in 2016!!

• Big Data Spain 2016 Keynote Talk: Recent Advancements in ML/AI Data Pipelines (Video (https://www.youtube.com/watch?v=QPI_RtIrO7g))

• Tensorflow v0.12: HDFS Support and lots of API changes/deprecations

• Tensorflow v1.0a is now available!!

• Optimizing a trained Tensorflow AI Model to prepare for production serving (Blog (https://petewarden.com/2016/12/30/rewriting-tensorflow-graphs-with-the-gtt/))

• http://vegas-viz.org : Matplotlib for Scala + Spark (NetflixOSS) (Video (https://www.youtube.com/watch?v=EYisuAkSpns))

• Upcoming O'Reilly Training: High-Performance Tensorflow in Production (Chris Fregly (https://linkedin.com/in/cfregly), Research Scientist @ PipelineIO (http://pipeline.io/))

• Airflow + Kubernetes + Continuous Deployment!!

• Updated AWS + GPU + Docker Environment (https://github.com/fluxcapacitor/pipeline/wiki/AWS-GPU-TensorFlow-Docker)!!

• Finally Using Kubernetes Labels like a BOSS!!

• Next 2 Meetups (Feb (https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/233979595/) and Mar (https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/233979047/)): Lots of Streaming, Lots of Tensorflow, Lots of Streaming + Tensorflow!

• Demos!!

*****

Talk 2: Low-Level CPU Performance Profiling Examples using Apache Spark, Apache Arrow, and Columnar Databases (Tanel Poder (https://www.linkedin.com/in/tanelpoder), Founder and Chief Engineer @ Gluent (http://gluent.com/))

• In this session, Tanel Poder will demonstrate low-level performance tools such as Linux "perf stat" to measure the memory access traffic and CPU efficiency of different workloads, data structures, and programming paradigms.

• We will use a columnar database, a few variations of a Spark job and Apache Arrow data structure iteration as examples.

• The goals of this session are to emphasize the importance of choosing a data structure that suits the task (such as a columnar structure for scanning) and to show that modern CPUs and performance tools give you good visibility into the "CPU-friendliness" of your code.
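To make the point about columnar structures for scanning concrete, here is a minimal sketch (not taken from the talk) that stores the same records in a row layout and in a columnar layout, then sums one field. The function names and the three-field record are hypothetical, chosen only for illustration. Python's interpreter overhead hides most cache effects, so treat this as a picture of the two layouts rather than a benchmark.

```python
from array import array


def make_rows(n):
    """Row layout: one tuple per record (id, price, quantity)."""
    return [(i, float(i % 100), i % 7) for i in range(n)]


def make_columns(n):
    """Columnar layout: one contiguous typed array per field."""
    return {
        "id": array("q", range(n)),
        "price": array("d", (float(i % 100) for i in range(n))),
        "quantity": array("q", (i % 7 for i in range(n))),
    }


def sum_price_rows(rows):
    # Visits every record object even though only one field is needed.
    return sum(row[1] for row in rows)


def sum_price_columns(columns):
    # Scans a single contiguous array of doubles and ignores other fields.
    return sum(columns["price"])


if __name__ == "__main__":
    n = 100_000
    rows = make_rows(n)
    columns = make_columns(n)
    # Both layouts hold the same data, so the scans must agree.
    assert sum_price_rows(rows) == sum_price_columns(columns)
    print("row-layout and columnar scans agree")
```

In the row layout the scan has to visit every record to read one field, while the columnar scan reads one densely packed array. Running each variant separately under "perf stat" is one way to compare instruction and cache-miss counts, assuming the tool is available on your Linux machine.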

Speaker:

• Tanel Poder is a co-founder of Gluent, a startup that liberates enterprise data and makes it useful across the whole enterprise.

• Despite holding a CEO title, he has been an advanced OS & database systems performance geek for over 20 years and now hopes to bring some of that skill to the Spark/Big Data world too.

*****

Talk 3: Spark on Kubernetes (Anirudh Ramanathan (https://www.linkedin.com/in/anirudhrx), Software Engineer, Kubernetes @ Google)

Abstract: Engineers across several organizations are working on support for Kubernetes as a cluster scheduler backend within Spark. While designing this, we have encountered several challenges in translating Spark to use idiomatic Kubernetes constructs natively. This talk covers our high-level design decisions and the current state of our work.

Speaker:

Anirudh Ramanathan is a software engineer on the Kubernetes team at Google. His focus is on running stateful and batch workloads. Previously, he worked on GGC (Google Global Cache) and, prior to that, on the infrastructure team at NVIDIA.

Related Links

http://www.slideshare.net/SparkSummit/spark-summit-eu-talk-by-luca-canali

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/VectorizedHashMapGenerator.scala

http://www.grpc.io/docs/quickstart/java.html

http://www.grpc.io/blog/principles

https://github.com/LogNet/grpc-spring-boot-starter/blob/master/README.adoc

https://medium.com/applied-engineering-reporting-from-the-front/http-load-balancing-on-grpc-services-e3d702db05d7#.x4bw1oa9j

https://medium.com/seldon-open-source-machine-learning/seldon-1-4-adds-grpc-c3812d0f653b#.1rkh4gye7

http://docs.seldon.io/grpc.html