UPDATE: Data Science at Scale with Spark

Hosted by Randy Kirk

Details

MBDUG Members:

Final Update 4/4/16: Dean just sent the link to his slides. If you want to download them so you can follow along more easily during the session, please do! It promises to be a great session with a great group of members. See you Tuesday night!

http://deanwampler.github.io/polyglotprogramming/papers/DataScienceAtScaleWithSpark.pdf

Update 3/31/16: One of our upcoming speakers, Mike Segel, was called out of state for work next week. Look for us to bring him up for a future meetup!

Thus, our April 5 meetup will have Dean Wampler as scheduled, presenting Data Science at Scale with Spark. Dean is a founding member of the Chicago Hadoop Users Group, which in many ways inspired the formation of our own local group right here in Milwaukee.

Also at our April 5 meetup, your new “organizer team” for the MBDUG will share plans for the group and get feedback from attending members, both for a May meetup and for the group overall.

Update 3/24/16: Free parking is available after 5pm behind our building in MSOE Parking Lot B off N Milwaukee St. just north of E State St. To enter the building, walk around to the 1020 N. Broadway Street entrance and take the elevator to the third floor. The elevator will be unlocked from 5:00pm to 6:15pm. See Tech Center Directions and Map under Files for full details.

Come as early as 5pm for soda, beer, salad, and pizza. Presentations begin at 6pm and run until about 7pm, with more time for networking afterward.

Data Science at Scale with Spark

Apache Spark has been blessed as the replacement for MapReduce in Hadoop environments. It also runs in other deployment modes. Spark provides better performance and better user productivity, and it supports a wider range of application scenarios than MapReduce, including event stream processing, ad hoc SQL queries, graph representations and algorithms, and iterative algorithms such as those commonly used in machine learning.

This talk discusses Spark from a Data Science perspective: its strengths and weaknesses, the Scala, Java, Python, and R APIs it offers for common analytics problems, what's missing, and what's planned. We'll look at the SQL support and stream processing, touch on the support for machine learning and graph processing, and highlight the productivity advantages of Spark.

Bio

Dean Wampler, Ph.D., is the Big Data Architect and member of the Office of the CTO at Lightbend. He leads Lightbend's development of data-centric products and services. He works with clients on specific Fast Data challenges. Dean is a frequent conference speaker, O'Reilly author, and the leader of Chicago's Scala and Spark user groups.

~~~~~~~

MBDUG Organization Update

On the heels of our member survey taken about a month ago, we have six new organizers joining the two original organizers who remained. In late March, these organizers came together on a conference call and over a dinner to plan out the future of the Milwaukee Big Data Users Group. At our April 5 meetup, we plan to share some of our early thoughts for the group overall, as well as plans for meetups in general and the specific meetup being planned for May.

Data Driven MKE
Direct Supply
1020 N. Broadway, 3rd Floor · Milwaukee, WI