Talks on Apache Spark - PySpark and MLlib

Details

Talk #1: Programming in Spark using PySpark

This session covers how to use the PySpark interface to develop
Spark applications, from loading and ingesting data to applying
transformations. It covers working with different data sources,
applying transformations, and Python best practices for developing
Spark apps. The demo covers integrating Apache Spark apps, in-memory
processing capabilities, working with notebooks, and integrating
analytics tools into Spark applications.

Speaker Bio: Mostafa Elzoghbi
Mostafa Elzoghbi is a Microsoft Sr. Technical Evangelist based in DC. Mostafa specializes in Azure architecture and development, Office 365 development and administration, .NET development, and enterprise product development. He is an experienced architect of cloud-based solutions and a data nerd. Prior to joining Microsoft, Mostafa was awarded a Microsoft Most Valuable Professional (MVP) award for five consecutive years. Mostafa holds an M.Sc. in Computer Science and several Microsoft certifications. Mostafa can be reached on Twitter @MostafaElzoghbi (https://twitter.com/MostafaElzoghbi) and at http://www.mostafaelzoghbi.com/

Talk #2: Machine Learning in the Cloud with Spark and MLlib

This session expands on using PySpark to develop Spark applications that use supervised and unsupervised machine learning algorithms. It covers data selection, feature engineering, transformations, and the data science principles necessary to construct (train), validate (test), and interpret machine learning models. Examples of ML topics covered include classification, regression, decision trees/random forests, clustering, and topic modeling. This session provides the groundwork for successfully selecting and applying machine learning models at scale.

Speaker Bio: Dr. Bartley Richardson

Dr. Bartley Richardson has nearly a decade of experience in Data Science, Cloud Computing, Software Development, and Machine Learning. He has served as both Department Chair and Visiting Professor at two universities and has published over 10 articles in journals and conference proceedings. He is currently serving as a data scientist, technical lead, and principal investigator on multiple government-sponsored projects, including one DARPA research program at Sotera Defense Solutions. Dr. Richardson’s skills include sequence techniques, temporal analysis, quick-looks, visual analytics, feature engineering, correlation across overlapping datasets, natural language processing techniques as applied to synthetic/constructed languages, and Bayesian networks. He has extensive experience with supervised/unsupervised machine learning, ensemble methods, and deep learning, and with various cyber tools and data types, including PCAP, Netflow, user/host events, raw user/device logs, Wireshark/Tshark, Scapy, Snort, and Moloch.

Agenda:
6:00pm-6:30pm Networking and Food
6:30pm-7:15pm Talk #1: Programming in Spark using PySpark
7:30pm-8:15pm Talk #2: Machine Learning in the Cloud with Spark and MLlib
8:30pm-9:00pm Networking

By NOVA Data Science Meetup and GWU Data Science Program