Jul 21, 2016 · 6:30 PM
This month, we will be joining Tapad for an evening of talks on the intersection of Scala and Data Engineering.
Talk 1 - KNN with Apache Flink
Dan Blazevski, Insight Data Engineering
About the Talk:
We will present some recent progress on Apache Flink's machine learning library, focusing on a new implementation of the k-nearest neighbors (kNN) algorithm for Flink. In the spirit of the Kappa Architecture, Apache Flink is a distributed batch and stream processing tool that treats batch as a special case of stream processing. We will discuss a few ways, both exact and approximate, to do distributed kNN queries, focusing on using quadtrees to spatially partition the training set and using z-value-based hashing to reduce dimensionality.
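For readers unfamiliar with z-values, the following is a minimal single-machine sketch of the general idea, not the Flink implementation presented in the talk. It assumes 2D points with coordinates in [0, 1); the function names (`interleave_bits`, `z_value`, `approx_knn`) and the `window` parameter are hypothetical, chosen only for illustration. Each point is mapped to a single integer by interleaving the bits of its grid coordinates, so that points close together in space tend to be close together in the sorted order, and the exact distance is then computed only for a small window of candidates.

```python
import bisect
import math


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Morton (z-order) code: bit i of x goes to position 2i, bit i of y to 2i + 1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_value(point, bits: int = 16) -> int:
    """Quantize a point in [0, 1) x [0, 1) onto a grid and return its z-value."""
    scale = (1 << bits) - 1
    gx = int(point[0] * scale)
    gy = int(point[1] * scale)
    return interleave_bits(gx, gy, bits)


def approx_knn(training, query, k, window=None):
    """Approximate kNN: look only at points whose z-values are near the query's.

    `window` (an assumption of this sketch) is how many z-order neighbors to
    inspect on each side of the query; it defaults to 2 * k.
    """
    if window is None:
        window = 2 * k
    ordered = sorted(training, key=z_value)
    keys = [z_value(p) for p in ordered]
    # Find where the query would sit in the z-ordered training set.
    pos = bisect.bisect_left(keys, z_value(query))
    lo = max(0, pos - window)
    hi = min(len(ordered), pos + window)
    candidates = ordered[lo:hi]
    # Rank the candidates by exact Euclidean distance and keep the best k.
    return sorted(candidates, key=lambda p: math.dist(p, query))[:k]
```

The result is approximate because two points can be close in space yet far apart along the z-order curve; widening `window` trades speed for accuracy.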
Dan Blazevski loves distributed computing. After completing his PhD in Mathematics at UT Austin, he worked in academic and lab settings at ETH Zurich and Oak Ridge National Laboratory on computational physics and engineering. Although he still occasionally misses the good ol' days of Fortran and MPI, he's pretty excited to have made the transition to industry as a Data Engineering Insight Fellow in 2015, where he started working on Flink, and he now helps lead the Fellows program in NYC.
Talk 2 - How to find connected components efficiently at scale using a modified hash-to-min algorithm with map-reduce
Yael Elmatad, Tapad
Yael Elmatad is a Data Scientist at Tapad. Prior to Tapad, Dr. Elmatad was a Faculty Fellow and Assistant Professor in the NYU Physics Department, specializing in the use of high-performance computing to study model space parameter optimization. Dr. Elmatad holds a PhD in Physical Chemistry from the University of California and a BS in Mathematics, Computer Science and Hebrew Language from New York University.
6:30pm - Doors open
7:00pm - Talk 1 and Q&A
7:30pm - Talk 2 and Q&A
8:00pm - Socializing with speakers and attendees
8:30pm - Close