RVA Data Hackers: Analyzing Event Streams using Spark and GraphX w/ Myles Baker


Details
Analyzing Wikipedia Event Streams using Spark 2.0 and GraphX
Apache Spark (http://spark.apache.org/) is fast becoming the industry standard for large-scale data processing. If you’ve been following the Spark world, you know a lot has happened since Spark’s 1.0 release in 2014. A SQL-like interface, an extensive machine-learning library, an enhanced data frame similar to those in R and pandas, and a distributed graph processing framework were all added in just the last two years. The Spark project is moving fast, and there’s a lot to learn about this exciting framework for real-time big data analytics.
One of the most exciting use cases for Spark is analyzing event streams in real time. What’s an event stream? Think about data that is published continuously, such as Twitter feeds or stock market tickers. How do you detect events as they’re happening? Event stream analysis is how we understand trends and detect anomalies in this real-time data. This use case is going to be incredibly important as more and more event streams come online with the Internet of Things (IoT), and Spark is uniquely positioned to be the analytics engine for real-time big data analysis.
This talk will showcase some of the new features of Spark 2.0 and the power of the GraphX (http://spark.apache.org/graphx/) processing framework. In a live data science demonstration, we’ll analyze clickstream (https://en.wikipedia.org/wiki/Clickstream) data from the Wikimedia Foundation to show what Spark can do, how you might use Spark on your own projects, and why you should be excited about using it for your next big data project.
A GitHub repository with (offline) instructions for the demo will be made available during the meetup.
About Myles
Myles Baker is a senior data scientist with experience designing and analyzing large-scale distributed processing applications in the healthcare, aerospace engineering, finance, and telecommunications industries. He specializes in predictive analytics and machine learning models, but is also a hands-on big data engineer and architect. Mr. Baker received a B.S. in Applied Mathematics from Baylor University and an M.S. in Computer Science, specializing in Computational Operations Research, from the College of William and Mary.
Win a Raspberry Pi 2
To get you going with your own IoT projects (and to have a little old-school fun), tonight’s sponsor, CapTech Consulting (http://www.captechconsulting.com/), is going to raffle off a Raspberry Pi 2 (https://www.raspberrypi.org/products/raspberry-pi-2-model-b/) loaded with game system emulators (and possibly a few ROMs) during the meetup. The emulated platforms include the Nintendo Entertainment System, Super NES, Game Boy Advance, PlayStation, Nintendo 64, Sega Genesis, Atari, Commodore 64, and more. If you’ve been hankering to play Super Mario Bros. or The Legend of Zelda again, join us for your chance to win.
About Data Hackers
RVA Data Hackers is a community of programmers who meet regularly to develop skills and learn about the tools and techniques of Big Data. We discuss how to find, organize, understand and serve data sets large and small. We'll cover anything related to 'big data' -- machine learning, artificial intelligence and architectures to scale Big Data for the Internet. If you're a programmer interested in machine learning algorithms and managing big data, this group is for you. Topics vary from basic concepts to demonstrations of real-world implementations and everything in between. Our mission is to foster a local community of experienced, practicing experts. We're here to have fun, share and learn about an exciting field of computer science.
Our Sponsors
RVA Data Hackers is sponsored by UpJump (http://upjump.com), CapTech Consulting (http://captechconsulting.com), Richmond Analytics (http://www.richmondanalytics.com) and 804RVA (http://www.804rva.com).
