Scala & Data!

  • Jul 21, 2016 · 6:30 PM

This month, we will be joining Tapad for an evening of talks on the intersection of Scala and Data Engineering.

Talk 1 - KNN with Apache Flink
Dan Blazevski, Insight Data Engineering

About the Talk:
We will present recent progress on Apache Flink's machine learning library, focusing on a new implementation of the k-nearest neighbors (kNN) algorithm for Flink. In the spirit of the Kappa Architecture, Apache Flink is a distributed batch and stream processing tool that treats batch as a special case of stream processing. We will discuss a few ways, both exact and approximate, to run distributed kNN queries, focusing on using quadtrees to spatially partition the training set and z-value-based hashing to reduce dimensionality.
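As a rough illustration of the z-value idea mentioned in the abstract, here is a minimal single-machine Python sketch. It is not Flink's implementation and not the speaker's code. It assumes 2-D points in the unit square, and the names `interleave_bits`, `z_value`, `approx_knn`, and the `window` parameter are hypothetical. The sketch maps each point to a one-dimensional z-value (Morton code) by interleaving the bits of its grid coordinates, sorts the training set by that key, and runs an exact distance check only on a small window of neighbors along the curve.

```python
from bisect import bisect_left
from math import dist  # Python 3.8+


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Morton code: bit i of x lands at bit 2i, bit i of y at bit 2i + 1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_value(point, bits: int = 16) -> int:
    """Quantize a point in [0, 1] x [0, 1] to a grid and return its z-value."""
    scale = (1 << bits) - 1
    return interleave_bits(int(point[0] * scale), int(point[1] * scale), bits)


def approx_knn(train, query, k: int, window: int = 8):
    """Approximate kNN: look only at points near the query in z-order.

    `window` (an assumption of this sketch) is how many sorted neighbors
    to inspect on each side of the query's position on the curve.
    """
    ordered = sorted(train, key=z_value)
    keys = [z_value(p) for p in ordered]
    pos = bisect_left(keys, z_value(query))
    candidates = ordered[max(0, pos - window):pos + window]
    # Exact distances are computed only for the candidate window.
    return sorted(candidates, key=lambda p: dist(p, query))[:k]


if __name__ == "__main__":
    points = [(0.1, 0.1), (0.9, 0.9), (0.2, 0.1), (0.5, 0.5)]
    print(approx_knn(points, (0.12, 0.1), k=2))
```

The result is approximate because two points that are close in space can be far apart along the z-order curve. A larger `window` trades speed for accuracy.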

Dan Blazevski loves distributed computing. After completing his PhD in Mathematics at UT Austin, he worked in academic and lab settings at ETH Zurich and Oak Ridge National Laboratory in computational physics and engineering. Although he still occasionally misses the good ol' days of Fortran and MPI, he's pretty excited to have made the transition to industry as a Data Engineering Insight Fellow in 2015, where he started working on Flink. He now helps lead the Fellows program in NYC.

Talk 2 - How to find connected components efficiently at scale using a modified hash-to-min algorithm with map-reduce
Yael Elmatad, Tapad


Yael Elmatad is a Data Scientist at Tapad. Prior to Tapad, Dr. Elmatad was a Faculty Fellow and Assistant Professor in the NYU Physics Department, specializing in the use of high-performance computing to study model space parameter optimization. Dr. Elmatad holds a PhD in Physical Chemistry from University of California, and a BS in Mathematics, Computer Science and Hebrew Language from New York University.
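For readers unfamiliar with the hash-to-min idea named in the Talk 2 title, here is a minimal in-memory Python sketch of the plain (unmodified) algorithm. The talk's modifications are not described on this page, so none are shown. The function name `hash_to_min` is hypothetical, and the loop body only imitates one map-reduce round: the "map" step emits keyed messages, and the "reduce" step unions the messages that share a key. It assumes node labels are comparable and that every node appears in at least one edge.

```python
from collections import defaultdict


def hash_to_min(edges):
    """Label each node with the smallest node id in its connected component."""
    # Initial cluster of a node: the node itself plus its direct neighbors.
    clusters = defaultdict(set)
    for u, v in edges:
        clusters[u].update((u, v))
        clusters[v].update((u, v))
    clusters = dict(clusters)

    while True:
        inbox = defaultdict(set)
        # "Map": each node sends its whole cluster to the cluster's minimum,
        # and sends that minimum to every member of the cluster.
        for cluster in clusters.values():
            smallest = min(cluster)
            inbox[smallest] |= cluster
            for member in cluster:
                inbox[member].add(smallest)
        # "Reduce": a node's new cluster is the union of what it received.
        new_clusters = dict(inbox)
        if new_clusters == clusters:  # nothing changed, so we have converged
            break
        clusters = new_clusters

    return {node: min(cluster) for node, cluster in clusters.items()}


if __name__ == "__main__":
    print(hash_to_min([(1, 2), (2, 3), (4, 5)]))
```

At convergence, the smallest node of each component holds the full component, and every other node holds only that smallest id, which serves as the component label.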

6:30pm - Doors open
7:00pm - Talk 1 and Q&A
7:30pm - Talk 2 and Q&A
8:00pm - Socializing with speakers and attendees
8:30pm - Close


  • Nathaniel G.

    Fascinating talks last night!

    July 22

  • Mark C.

    Wow, what an amazing presentation!

    July 21

  • Ian R.

    will a video recording be available for those who can't make it?

    July 21

    • Toby M.

      Yes, we will record the event.

      July 21

    • Ian R.

      Thanks much

      July 21

Our Sponsors

  • Hakka Labs

    Growing the largest community of data engineers and data scientists

  • Spotify

    Big thanks to Spotify for helping support & host NYC Data Eng!
