Apache Spark Lightning Talks (1.17)

Details
NOTE: RSVP to this event via Eventbrite. Meetup RSVPs will not be counted.
This event will feature a series of brief, engaging lightning talks with data scientists discussing Apache Spark.
Speakers keep their presentations under 20 minutes and allow 15 minutes for audience questions.
Who are these talks for?
These lightning talks are for anyone with a strong personal or professional interest in data science, data engineering, and/or Apache Spark. Beginners are welcome!
Why Spark?
Apache Spark is a powerful, open source processing engine for data distributed across large clusters. Spark is optimized for speed and ease of use; it uses in-memory caching to run distributed algorithms up to 100x faster than MapReduce. Spark can be used for batch processing and for processing data in near real time.
Washington Supreme Court Opinions
David Valpey, Data Scientist
Our legal system depends on knowledge of what came before. Anyone working within our legal system must navigate a large amount of data. David describes his use of Python, Apache Spark, NumPy, NLTK, and BeautifulSoup to extract key features of Washington State Supreme Court opinions and navigate them by similarity.
David is a data scientist with a background in computer science and linguistics.
Album Recommendation Web App
Sal Khan, Data Scientist
Pandora and other popular music services recommend individual songs based on their musical characteristics. Because he prefers listening to entire albums, Sal built an album recommendation system using Spark ML. He also deployed it as a live web application using Flask and uWSGI.
Sal is a data scientist with a background in consulting and business analytics.
Mining Reviews for Product Features
Rob Dalton, Data Scientist
Amazon and other online shopping sites provide information on product quality in the form of customer reviews. Rob Dalton has built a PySpark application that mines Amazon product reviews, extracting the most criticized features and the most praised features of each product. This tool can help customers save time spent reading full reviews, and it can help companies identify potential product defects.
Rob is a data scientist with a background in management consulting and web development.
__________________________
About our Sponsor
Galvanize is the premier dynamic learning community for technology. With campuses located in booming technology sectors throughout the country, Galvanize provides a community for each of the following:
Education - part-time and full-time training in web development, data science, and data engineering
Workspace - whether you’re a freelancer, startup, or established business, we provide beautiful spaces with a community dedicated to supporting your company’s growth
Networking - events in the tech industry happen constantly on our campuses, ranging from popular Meetups to multi-day international conferences
To learn more about Galvanize, visit galvanize.com (http://galvanize.com/).
NOTE: RSVP to this event via Eventbrite. Meetup RSVPs will not be counted. (https://www.eventbrite.com/e/seattle-data-science-apache-spark-lightning-talks-117-tickets-30947542934?aff=DS)
