
Real-Time Analysis of Uber Data using Apache APIs: Kafka, Spark, and HBase

Hosted by Slim B. and Jay V.

Details

Please join us for an introduction to building a distributed machine learning pipeline for real-time analysis of Uber data using Apache APIs: Kafka, Spark, and HBase. Our speaker is Carol McDonald (https://www.linkedin.com/in/caroljmcdonald/) from MapR Technologies (https://mapr.com/).

Schedule:

6:30 pm - 6:50 pm: Networking, pizza and drinks

6:50 pm - 6:55 pm: Welcome by Black Duck Software (https://www.blackducksoftware.com/)

6:55 pm - 7:00 pm: Introduction by Hexstream (http://www.hexstream.com/)

7:00 pm - 8:00 pm: Talk by Carol McDonald from MapR Technologies (https://mapr.com/)

7:30 pm - 8:00 pm: More networking!

Sponsors:

• Black Duck Software (https://www.blackducksoftware.com/) is hosting the event.

• Hexstream (http://www.hexstream.com/) is providing pizza and beverages.

Description:

In this talk we will look at a solution that combines real-time data streams with machine learning to analyze and visualize popular Uber trip locations in New York City. You will see the end-to-end process required to build this application using Apache APIs for Kafka, Spark, and HBase.

According to Gartner, by 2020 smart cities will be using about 1.39 billion connected cars, IoT sensors, and devices. Analyzing behavior patterns within cities will enable traffic optimization, better planning decisions, and smarter advertising. You may be excited about exploiting data streams to gain actionable insights from continuously produced data in real time, but you may find it difficult to conceptualize how to implement such a solution. In this talk, we will walk you through an architecture that combines data streaming with machine learning to enrich Uber trip data and to analyze and visualize the most popular pick-up and drop-off locations by date and time, so that driver positioning and pricing could be optimized according to demand.

The presentation will consist of four sections:

• Introduction to Spark machine learning for developers

• Kafka and Spark Streaming

• Real-time dashboard using a microservices framework

• Using the Spark HBase connector for parallel writes and reads

Bio:


Carol McDonald is a solutions architect at MapR focusing on big data, Apache Kafka, Apache HBase, Apache Drill, Apache Spark, and machine learning in healthcare, finance, and telecom.

Previously, Carol worked as a Technology Evangelist for Sun and as an architect/developer on a large health information exchange, a large loan application for a leading bank, pharmaceutical applications for Roche, telecom applications for HP, messaging applications for IBM, and SIGINT applications for the NSA.

Carol holds an MS in computer science from the University of Tennessee and a BS in geology from Vanderbilt University.

Boston Area Advanced Analytics Meetup
Black Duck Software
800 District Ave., Suite 201 · Burlington, MA