PySpark Presented by Tim Hopper

Name: PySpark Presented by Tim Hopper
Start: 2015-02-19T18:30:00-05:00
End: 2015-02-19T21:30:00-05:00
Location: Bronto Software Inc.

Hosted by Melinda T.

Research Triangle Analysts

Details

Apache Spark is a next-generation cluster computing framework and data processing engine. By combining Spark's primitive operations in a functional style, the user can perform complex computations on large datasets. Though similar to Hadoop MapReduce, Spark relies much more heavily on RAM, keeping intermediate data in memory instead of writing it to HDFS, and has been shown to run up to 100x faster than Hadoop for some applications. This talk will introduce Spark in general and then present PySpark, the Python wrapper around core Spark, as a tool both for rapid, interactive analytics and for robust production data pipelines. Finally, we will look at MLlib, Spark's distributed machine learning library.
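To make "combining primitive operations in a functional style" concrete, here is a minimal single-machine sketch of a word count built by chaining three operations in the spirit of Spark's flatMap, map, and reduceByKey. The helpers `flat_map`, `map_`, and `reduce_by_key` are hypothetical plain-Python stand-ins, not PySpark functions. They only mimic the shape of the computation and do none of the distribution or in-memory caching that Spark provides.

```python
from collections import defaultdict
from functools import reduce
from itertools import chain
from operator import add

# Hypothetical local stand-ins for three Spark primitives. These are NOT
# PySpark functions, just plain-Python analogues that run on one machine.
def flat_map(f, records):
    # Apply f to each record and flatten the resulting iterables into one list.
    return list(chain.from_iterable(f(r) for r in records))

def map_(f, records):
    # Apply f to each record, producing exactly one output per input.
    return [f(r) for r in records]

def reduce_by_key(f, pairs):
    # Group (key, value) pairs by key, then fold each group's values with f.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(f, values) for key, values in groups.items()}

lines = ["spark is fast", "spark is lazy", "hadoop is on disk"]

# Word count as a chain of primitives. In PySpark the analogous pipeline is
# sc.parallelize(lines).flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(add)
words = flat_map(str.split, lines)
pairs = map_(lambda w: (w, 1), words)
counts = reduce_by_key(add, pairs)
print(counts["spark"], counts["is"])  # prints: 2 3
```

Each step is a small, side-effect-free function applied to a whole collection, which is what lets Spark split the same pipeline across a cluster.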

Bio: Tim Hopper is a software engineer at Parse.ly, a web analytics startup. He has a master's degree in operations research from North Carolina State University.
