Document Classification on Apache Spark

Details

Tutorials, demos and walk-throughs that apply machine learning algorithms to perfectly manicured datasets are everywhere. But they don’t reflect the real-life situations of those with big opportunities to find big value. What happens when your dataset is massive and unformatted, such as the internet search history for…everyone? And once you have built some very good models, now what?

This session takes place at the busy intersection of Big Data, Machine Learning and Business Problems. Through a tour of Apache Spark and its quickly maturing set of machine learning tools, you’ll see how to 1) generate powerful modeling features, 2) apply the appropriate ML algorithms and 3) turn the resulting models into value. You’ll also leave with the code to reproduce the results and start creating your own value.

Presented by Joe Blue, Data Scientist at MapR

In his role as Data Scientist at MapR, Joe assists customers in solving their big data problems, making efficient use of the Hadoop ecosystem to generate tangible results. Recent projects include debit card fraud and breach detection, lead generation from social data, customer matching through record linkage, lookalike modeling from browser history, and real-time product recommendations.

Prior to MapR, Joe was the Chief Scientist for Optum (a division of UnitedHealth) and the principal innovator in analytics for healthcare. As a Sr. Fellow with OptumLabs, he applied machine learning concepts to healthcare issues such as disease prediction from co-morbidities, estimation of PMPY (member cost), physician scoring and treatment pathways. As a leader in the Payment Integrity business, he built anomaly detection engines responsible for saving $100M annually in claim overpayments.

The event will be recorded, and the recording will be posted after editing.

Parking & Security Instructions:

Ponce City Market parking: Once you arrive onsite, please park in guest parking and bring your license plate number with you (we recommend taking a photo of your plate). When you check in on the 6th floor, our receptionist will help you register your vehicle for Cardlytics validated paid parking.

Food will be provided by D.B.A. BBQ

Agenda:

6:30-7:15PM Socialize

7:15-8:00PM Presentation

8:00-8:30PM Q&A

Big Data ATL