An Evening with Chris Fregly - Spark Author / Contributor

Name: An Evening with Chris Fregly - Spark Author / Contributor
Start: 2016-01-19T18:30:00-06:00
End: 2016-01-19T20:45:00-06:00
Location: HomeAway North @ Domain

Hosted by Austin Data Geeks

Details

Chris Fregly (https://www.linkedin.com/in/cfregly) is coming to Austin for Data Day Texas (http://datadaytexas.com/speakers) and has agreed to stay a day and spend an evening with us. This is a joint meetup with the Austin Spark Meetup (https://www.meetup.com/austin-spark-meetup/events/226678379/).

Talks

Chris is going to give two presentations with a 15-minute break in between. Expect an information wind tunnel.

Spark Core Performance Improvements

This will be a code-level deep dive into the recent Spark Core performance improvements that evolved out of the 100TB Daytona GraySort Challenge (Spark 1.1 & 1.2) and Project Tungsten (Spark 1.5 & 1.6). These improvements cover all key resources, including disk I/O, network I/O, CPU, and memory. Through a series of demos and slides, we'll dive deep into the key performance improvements and see how they embrace mechanical sympathy by reducing disk seeks, minimizing garbage collection, increasing CPU cache locality, and saturating the network controller, destroying world performance records along the way.

Approximations and Probabilistic Data Structures with Spark

A code-level deep dive into approximations and probabilistic data structures such as Count-Min Sketch, HyperLogLog, and Bloom filters within Spark Core, Spark Streaming, Spark ML, Spark SQL, BlinkDB, Twitter's Algebird, and Redis.
Abstract
First, we'll discuss the core fundamentals that make up these clever and mathematically sound probabilistic data structures and algorithms.
Next, we'll discuss various use cases where approximations are useful, and often required!
Lastly, we'll demo these gems in action using Spark Core, Spark Streaming, Spark ML, Spark SQL, BlinkDB, Twitter's Algebird, and Redis.
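
For readers new to the topic, here is a minimal sketch of one of the structures named above, a Bloom filter, in plain Python. It is not taken from the talk and does not use Spark, Algebird, or Redis; the class name, method names, and default sizes are all illustrative assumptions. It shows the core trade-off: a small fixed-size bit array answers "have I seen this item?" with possible false positives but no false negatives.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter (hypothetical names and sizes).

    Membership checks may return false positives, never false negatives.
    """

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; a real filter packs bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True only if every position for this item has been set.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for word in ["spark", "tungsten", "algebird"]:
    bf.add(word)

print(bf.might_contain("spark"))   # True: added items are always found
print(bf.might_contain("hadoop"))  # usually False; a false positive is possible
```

Count-Min Sketch and HyperLogLog follow the same spirit: hash each item into a small fixed-size structure and accept a bounded error in exchange for far less memory.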

Speaker Bio

Chris Fregly is a Principal Data Solutions Engineer for the newly formed IBM Spark Technology Center, an Apache Spark contributor, a Netflix Open Source committer, the organizer of the global Advanced Apache Spark Meetup, and the author of the upcoming book Advanced Spark. Previously, Chris was a Data Solutions Engineer at Databricks and a Streaming Data Engineer at Netflix.

When Chris isn’t contributing to Spark and other open source projects, he’s creating book chapters, slides, and demos to share knowledge with his peers at meetups and conferences throughout the world.

Agenda

6:30 Networking

7:00 Featured talks (Don't be late!)

A special thanks to the folks at HomeAway (http://homeaway.com/) for not only hosting but also picking up the food and beverage tab for this meetup. Check out their latest job openings (http://www.homeaway.com/info/about-us/careers-1/career-opportunities).
