PySpark, IPython Notebook and SparkSQL as an environment for data science

Name: PySpark, IPython Notebook and SparkSQL as an environment for data science
Start: 2015-05-28T19:00:00-05:00
End: 2015-05-28T22:00:00-05:00
Location: OWS-150 (Owens Science Hall), University of St. Thomas

Hosted by Ryan B.

Twin Cities Spark and Hadoop User Group

Details

Abstract: Data science on Hadoop can be a daunting journey because you generally span multiple tools and interfaces. Furthermore, while there are people out there doing data science, worked examples are few and far between.

As part of the Social Security Act, the Centers for Medicare & Medicaid Services has begun to publish data detailing the relationship between physicians and medical institutions. This data has been analyzed cursorily in the press, but an in-depth outlier and Benford's law analysis hasn't been attempted (to my knowledge).

Casey will present a demo using Spark and Hive to do the above analysis without leaving IPython notebook.
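
For readers unfamiliar with Benford's law, here is a minimal sketch of the first-digit check it implies. This is not the speaker's code: it uses plain Python rather than Spark or Hive, the function names are hypothetical, and the sample values are synthetic powers of two standing in for real amounts. Benford's law predicts that the leading digit d appears with probability log10(1 + 1/d).

```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit probabilities under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(value):
    """Return the leading nonzero digit of a number, or None if there is none."""
    for ch in str(abs(value)):
        if ch in "123456789":
            return int(ch)
    return None


def first_digit_frequencies(values):
    """Observed share of each leading digit (1-9) among the given values."""
    digits = [d for d in map(first_digit, values) if d is not None]
    if not digits:
        return {}
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def chi_square_statistic(values):
    """Pearson chi-square distance between observed and Benford-expected counts."""
    digits = [d for d in map(first_digit, values) if d is not None]
    counts = Counter(digits)
    n = len(digits)
    expected = benford_expected()
    return sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p) for d, p in expected.items()
    )


if __name__ == "__main__":
    # Synthetic stand-in data (an assumption, not the published dataset):
    # powers of two are known to follow Benford's law closely.
    sample = [2 ** n for n in range(100)]
    observed = first_digit_frequencies(sample)
    for d, p in benford_expected().items():
        print(d, round(p, 3), round(observed[d], 3))
    print("chi-square:", round(chi_square_statistic(sample), 3))
```

A large chi-square value relative to a Benford-like baseline would flag a set of amounts for closer inspection; it is a screening signal, not proof of anything on its own.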

Speaker: Casey Stella is a Principal Architect at Hortonworks and focuses on issues around data science, especially natural language processing at scale. He has domain knowledge in medical/clinical informatics and in oil/gas data analysis and signal processing at scale.

Food: Pizza and drinks, first come, first served, starting at 6:30 PM.

Map: http://bit.ly/RCtaTI
