
Two Great Talks on Spark

Hosted By
Tammy L.

Details

*Note: To expedite check-in at Galvanize, please register here in advance (https://www.eventbrite.com/e/using-spark-graphx-and-zeppelin-to-analyze-clickstream-data-tickets-19892346544).

Agenda:
6:00 pm to 6:40 pm - Food, Drink, Networking
6:40 pm to 7:20 pm - Intro to Spark
7:30 pm to 8:45 pm - Using Spark, GraphX and Zeppelin to analyze clickstream data
9:00 pm - Closing Time

Talk 1: Intro to Spark
An introductory talk on Apache Spark. Aaron Merlob will give a high-level overview of how Spark works, describe several of its most important basic functions, and walk through a handful of code samples. After this talk, audience members should understand enough about Spark to build a simple application.
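For readers new to Spark, here is a rough sketch of what a few commonly taught transformations do. It assumes the talk covers something like flatMap, filter, map and reduceByKey; the abstract does not name the functions. The sketch uses plain Python lists on one machine rather than Spark, so it shows only the semantics. Real Spark RDDs are partitioned across a cluster and evaluated lazily. In PySpark the same pipeline would look roughly like `sc.textFile(path).flatMap(lambda l: l.split()).filter(lambda w: len(w) > 3).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`, where `sc` and `path` are placeholders.

```python
from collections import defaultdict
from functools import reduce


def reduce_by_key(pairs, func):
    """Local stand-in for RDD.reduceByKey: group values by key, then fold each group with func."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(func, values) for key, values in groups.items()}


# Hypothetical input lines, standing in for a text file loaded into Spark.
lines = ["spark makes big data simple", "big data needs spark"]

words = [w for line in lines for w in line.split()]  # like flatMap(split)
long_words = [w for w in words if len(w) > 3]        # like filter(len > 3)
pairs = [(w, 1) for w in long_words]                 # like map(w -> (w, 1))
counts = reduce_by_key(pairs, lambda a, b: a + b)    # like reduceByKey(+)

print(counts)
```

The key design point carried over from Spark is that reduceByKey only needs an associative combining function, which is what lets Spark combine partial counts on each partition before shuffling.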

Meet the Speaker:
Aaron Merlob is an instructor for Galvanize's 12-week Data Engineering course (http://www.galvanize.com/courses/data-engineering/).

Talk 2: Using Spark, GraphX and Zeppelin to analyze clickstream data
GraphX is Spark's library for graphs and graph-parallel computation, which allows efficient traversal and search over graph data. As a canonical case, the talk uses the Wikipedia clickstream data for February 2015 to develop a specialized graph search for pages that are only a few hops away and within a probability threshold. Visualizations of these searches can be displayed within the Zeppelin notebook. Additional examples will also be discussed.
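To make the "only a few hops away and within a probability threshold" idea concrete, here is a small plain-Python sketch of one way such a search could work. It is not GraphX code and not the speaker's implementation. It assumes each edge is a hypothetical (source page, target page, click count) triple, that a hop's probability is that edge's clicks divided by all clicks leaving the source page, and that a path's probability is the product of its hop probabilities. In GraphX the same idea would more likely be expressed with graph-parallel operators such as its Pregel API.

```python
from collections import defaultdict


def transition_probs(edges):
    """Convert (src, dst, clicks) triples into per-source click probabilities."""
    totals = defaultdict(int)
    for src, _, clicks in edges:
        totals[src] += clicks
    probs = defaultdict(dict)
    for src, dst, clicks in edges:
        probs[src][dst] = clicks / totals[src]
    return probs


def nearby_pages(edges, start, max_hops, threshold):
    """Return {page: best path probability} for pages reachable from start
    in at most max_hops clicks whose best path probability is >= threshold."""
    probs = transition_probs(edges)
    best = {start: 1.0}
    frontier = {start: 1.0}
    for _ in range(max_hops):
        next_frontier = {}
        for page, p in frontier.items():
            for nxt, q in probs.get(page, {}).items():
                candidate = p * q
                # Probabilities only shrink along a path, so prune below the threshold;
                # re-expand a page only if this path beats its best probability so far.
                if candidate >= threshold and candidate > best.get(nxt, 0.0):
                    best[nxt] = candidate
                    next_frontier[nxt] = candidate
        frontier = next_frontier
    del best[start]
    return best


# Hypothetical click counts (source page, target page, clicks).
clicks = [
    ("Main_Page", "Spark", 60),
    ("Main_Page", "Hadoop", 40),
    ("Spark", "GraphX", 50),
    ("Spark", "Scala", 50),
    ("Hadoop", "HBase", 100),
    ("GraphX", "Pregel", 100),
]

print(nearby_pages(clicks, "Main_Page", max_hops=2, threshold=0.35))
```

Because every hop probability is at most 1, a path that drops below the threshold can never recover, which is what makes the early pruning safe.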

Spark SQL is a Spark module for structured data processing. The talk will also cover DataFrames, one of the basic building blocks of Spark SQL. Topics include using Spark RDDs and DataFrames, writing functions to clean and structure the data, generating reports with Spark SQL, and running the queries and visualizing the results in a Zeppelin notebook.
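As a rough, local illustration of the clean-then-report workflow described above, the sketch below uses Python's built-in sqlite3 module as a stand-in for Spark SQL. The raw rows, the `clicks` table, and its `prev`, `curr` and `n` columns are hypothetical. In Spark, the cleaned rows would instead become a DataFrame, be registered as a temporary view (for example with `createOrReplaceTempView`), and be queried with `spark.sql(...)`.

```python
import sqlite3

# Hypothetical raw clickstream lines: "previous page,current page,click count".
raw = [
    "Main_Page,Apache_Spark,120",
    "Google,Apache_Spark,300",
    "Apache_Spark,GraphX,45",
    "bad line",
    "Google,Apache_Zeppelin,x",
    " Google , GraphX , 30 ",
]


def clean(lines):
    """Drop malformed rows, trim whitespace, and convert counts to integers."""
    rows = []
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3 or not parts[2].isdigit():
            continue
        rows.append((parts[0], parts[1], int(parts[2])))
    return rows


def top_targets(rows, limit=3):
    """Report the most-clicked target pages, using SQLite in place of Spark SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clicks (prev TEXT, curr TEXT, n INTEGER)")
    conn.executemany("INSERT INTO clicks VALUES (?, ?, ?)", rows)
    report = conn.execute(
        "SELECT curr, SUM(n) AS total FROM clicks "
        "GROUP BY curr ORDER BY total DESC LIMIT ?",
        (limit,),
    ).fetchall()
    conn.close()
    return report


print(top_targets(clean(raw)))
```

Keeping the cleaning step separate from the SQL step mirrors the split the abstract describes: functions structure the data first, and SQL then handles the aggregation and reporting.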

Meet the Speaker:
Sudhakar Thota, Sr. Software Engineer at Spark Technology Center on the Spark Enablement Team, IBM.

"I'm a simple, strategic, multidisciplinary personality with an eye for innovation to make the world a better place. I’ve worked with a gamut of clients (Wells Fargo, CA Systems, MapR, to name a few), and although my skill set is diversified, my greatest expertise lies in the worlds of huge data, semi-structured social media data, and mission-critical systems.

I started my career as an Electrical Engineer, graduated in computer science, and transformed into an Oracle DB Architect. New challenges with new data forms grew my knack for Big Data. Database engines small to big (Oracle, MySQL, MongoDB, Hadoop, HBase, etc.) are the tools of my data management. My wish is to combine my knowledge and experience in these areas to deliver the best creative work to my employer’s clients and their audiences. I love to learn, and I also love coffee." - Sudhakar

- BS: Electrical Engineering (NIT)
- MS: Computer Science (JNTU)


What To Bring:
Your thinking cap! Feel free to bring your laptop or paper and pen.

SF Data Science
Galvanize
44 Tehama Street · San Francisco, CA