[SF] High Performance Spark +Internals +Operations +Committers +Ask Me Anything

Name: [SF] High Performance Spark +Internals +Operations +Committers +Ask Me Anything
Start: 2016-04-21T17:30:00-07:00
End: 2016-04-21T21:00:00-07:00
Location: Galvanize

Hosted by Chris F.

AI Performance Engineering Meetup (San Francisco, Global)

Details

6-Part Agenda Focused on Spark Internals, Operations, Performance Tuning, and Best Practices + Ask Me Anything

Doors open at 5:30pm!

Intros, Book Raffles, and the $300 Amazon Gift Card giveaway begin @ 5:45pm sharp!!

Note: You must be signed up at http://advancedspark.com to be eligible for the $300 Amazon Gift Card. The Gift Card will be emailed to you.

5:45pm-6:00pm: Introductions, Announcements, and Book Raffles!!! (Chris Fregly)

6:00-6:30pm: Holden Karau (Spark Top Contributor, Author, IBM Spark Tech Center)

6:30-7:00pm: Anya Bida (Spark Ops Expert, Alpine Data Labs)

7:00-7:30pm: Umar Farooq Minhas, PhD (Spark Perf Expert, IBM Research)

7:30-8:00pm: Mark Hamstra (Spark PMC/Committer, ClearStory)

8:00-8:30pm: Mark Grover (Spark Contributor, Author, Cloudera)

8:30-9:00pm: Ask Me Anything (All Speakers!!)

Talk 1: Spark Internals with Holden Karau, Top Spark Contributor and Book Author

Speaker: Holden Karau (https://www.linkedin.com/in/holdenkarau)
Principal Software Engineer @ IBM Spark Technology Center
Author of Fast Data Processing with Spark @ Packt
Author of Learning Spark (http://shop.oreilly.com/product/0636920028512.do) @ O'Reilly
Author of upcoming High Performance Spark (http://shop.oreilly.com/product/0636920046967.do) @ O'Reilly

Discussing excerpts from her upcoming book, High Performance Spark (O'Reilly), Holden will give us a deep dive into Spark internals - as well as provide tips and tricks on writing high performance Spark applications.

Talk 2: Spark Operations

Speaker: Anya Bida, Operations Engineer @ Alpine Data
Running into roadblocks? Anya will present a few best practices for running Spark applications reliably.

http://techsuppdiva.github.io/

https://spark-summit.org/east-2016/events/spark-tuning-for-enterprise-system-administrators/

Talk 3: Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics

Speaker: Umar Farooq Minhas, PhD
Researcher @ IBM
PhD Computer Science @ University of Waterloo

We will present the results of an in-depth experimental study that evaluates the major architectural components in the MapReduce and Spark frameworks, including shuffle, execution model, and caching. Our experiments show that Spark is about 2.5x, 5x, and 5x faster than MapReduce for Word Count, k-means, and PageRank, respectively. For the Sort workload, MapReduce is 2x faster than Spark. For each of these workloads, we present a detailed analysis and discuss the causes of the performance differences, attributing them to the architectural components we study.

Short Bio: Dr. Minhas is currently working as a Research Staff Member at the IBM Almaden Research Center. In his current role, he leads and co-leads various efforts focused on cognitive computing, resource provisioning, scheduling, storage, and next generation platforms. He holds a Ph.D. and an M.S. degree from the University of Waterloo, where he specialized in implementing highly available, fault tolerant, and scalable database systems suited for cloud computing environments.

http://www.vldb.org/pvldb/vol8/p2110-shi.pdf

Talk 4: Top 5 Mistakes When Writing Spark Applications

Speaker: Mark Grover (https://www.linkedin.com/in/grovermark)
Spark Contributor and Software Engineer @ Cloudera
Co-author of Hadoop Application Architectures (http://shop.oreilly.com/product/0636920033196.do)

We're happy to bring one of the BEST talks from Spark Summit East NYC 2016 back to the Bay Area!

This session will go over the top K things that we’ve seen in the field that prevent people from getting the most out of their Spark clusters. When some of these issues are addressed, it is not uncommon to see the same job running 10x or 100x faster with the same clusters, the same data, just a different approach.

Don't miss this! This talk is packed with advanced Spark tuning and configuration tips/tricks.

http://www.slideshare.net/markgrover/top-5-mistakes-when-writing-spark-applications-61072412

Talk 5: Spark Internals with Mark Hamstra, Spark PMC/Committer!!

Speaker: Mark Hamstra
Spark PMC/Committer, Software Engineer @ ClearStory

Free-form Spark Internals Q&A with Spark User and Dev List Legend, Mark Hamstra!

Talk 6: Ask Me Anything

All Speakers

Relevant Links

http://techblog.netflix.com/2016/04/saving-13-million-computational-minutes.html

http://www.brendangregg.com/blog/2015-05-15/ebpf-one-small-step.html

https://jaceklaskowski.gitbooks.io/mastering-apache-spark/

http://www.virdata.com/tuning-spark/

http://blog.madhukaraphatak.com/extending-spark-api/

http://0x0fff.com/spark-architecture/

http://0x0fff.com/spark-architecture-shuffle/

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala

http://www.vldb.org/pvldb/vol8/p2110-shi.pdf

http://techsuppdiva.github.io/

https://spark-summit.org/east-2016/events/spark-tuning-for-enterprise-system-administrators/
