The Data Scientists' Guide to Apache Spark

Name: The Data Scientists' Guide to Apache Spark
Start: 2015-10-12T18:30:00-07:00
End: 2015-10-12T21:00:00-07:00
Location: Galvanize

Hosted by Tammy L.

SF Data Science

Details

Data scientists increasingly find themselves working with large and complex data in their day-to-day work. The standard toolset of a data scientist, however, has not evolved to meet this need. A divide currently exists between the tools of engineers (such as Java and Hadoop), which were developed to handle production tasks, and those of data scientists (Python and R), which facilitate rapid prototyping and modeling.

While tooling for dealing with data at scale has improved considerably with the development of higher-level abstractions such as Pig, Hive, Spark, and Scalding, many data scientists have not adopted these tools in their workflow. Part of this is due to a lack of awareness and part to the availability of resources. Because most of these tools are written in languages that data scientists may not be comfortable with (Java, Scala), there is a perceived high barrier to entry.

This talk will teach best practices for using Spark in the context of a practicing data scientist’s standard workflow. By using Spark’s APIs for Python and R to present practical applications, it aims to make the technology more accessible and lower the barrier to entry.

Prerequisites:
Intermediate

What To Bring:
Laptop

Meet Your Instructor:
Jonathan Dinu is currently the VP of Academic Excellence at Galvanize. Previously, he founded Zipfian Academy, which was recently acquired by Galvanize. He first discovered his love of all things data while studying Computer Science and Physics at UC Berkeley. In a former life, he worked for Alpine Data Labs developing distributed machine learning algorithms for predictive analytics on Hadoop.

Jonathan has always had a passion for sharing the things he has learned in the most creative ways he can. At Galvanize, he gets to combine his two favorite things: humans and code. When he is not working with students, you can find him blogging about data, visualization, and education at http://hopelessoptimism.com
