Scalable and Flexible Machine Learning With Scala @ LinkedIn


Machine learning (ML) turns data into predictions about the real world in an almost magical fashion. In this talk we'll show why Scala is a great language for machine learning practitioners and show the audience of Scala programmers how easy it is to start performing machine learning magic themselves.

Machine learning enthusiasts have historically preferred high-level programming languages due to their ability to concisely describe models and algorithms, but this has often come at the price of performance and production readiness. For example, the ease of rapid prototyping and syntactic sugar of Python have made it a popular choice for ML developers. Scala combines this flexibility with the strengths of the JVM such as performance and seamless interoperability with the mature ecosystem of existing Java software. In this talk we will show how Scala DSLs can improve the effectiveness of machine learning practitioners and even make machine learning capabilities accessible to people without a Scala background.

Scalding provides a Scala DSL for Hadoop that lets code written against regular Scala collections run as Hadoop jobs with almost no modification. We will show how machine learning code expressed as operations over Scala collections can therefore be run immediately on giant compute clusters over terabytes of data. We will also briefly discuss how Scalding works in order to show how DSLs can be designed in Scala.
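To give a flavor of what "machine learning code written in terms of operations over Scala collections" can look like, here is a minimal sketch of a nearest-centroid classifier that uses only the standard library. It is not taken from the talk: the names Example, CentroidSketch, centroids and predict are hypothetical, chosen for illustration. The training step is nothing more than groupBy, map and reduce, which is the style of code a collection-like DSL such as Scalding is meant to accommodate.

```scala
// Hypothetical labeled training example: a class label plus numeric features.
final case class Example(label: String, features: Vector[Double])

object CentroidSketch {
  // Element-wise sum of two vectors of equal length.
  def add(a: Vector[Double], b: Vector[Double]): Vector[Double] =
    a.zip(b).map { case (x, y) => x + y }

  // "Training": the mean feature vector (centroid) of each label,
  // expressed purely as collection operations.
  def centroids(data: Seq[Example]): Map[String, Vector[Double]] =
    data.groupBy(_.label).map { case (label, examples) =>
      val total = examples.map(_.features).reduce(add)
      label -> total.map(_ / examples.size)
    }

  // Squared Euclidean distance between two vectors of equal length.
  def squaredDistance(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // "Prediction": the label whose centroid is closest to x.
  def predict(model: Map[String, Vector[Double]], x: Vector[Double]): String =
    model.minBy { case (_, centroid) => squaredDistance(centroid, x) }._1
}
```

For example, CentroidSketch.predict(CentroidSketch.centroids(data), Vector(0.5, 1.5)) returns the label of the closest centroid in data. Because the training logic never leaves the group, map and reduce vocabulary, moving it onto a cluster is mostly a matter of swapping the in-memory Seq for a distributed collection type; how closely the two line up in practice depends on the DSL.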

About the Speakers
Chris and Vitaly are jointly presenting this talk.

Chris Severs works in the Search Science applied research group at eBay. Chris fell in love with Scala at first sight and has been one of the main drivers of Scala adoption at eBay. He has contributed to the Scalding and Scoobi open source projects and authored an addition to Scalding to provide support for Apache Avro. Prior to joining eBay he was a postdoctoral researcher at The Mathematical Sciences Research Institute in Berkeley and then at Reykjavík University in Iceland.

Vitaly Gordon is a senior data scientist on the LinkedIn Product Data Science team, where he develops data products that most of you use every day. Prior to LinkedIn, Vitaly founded the data science team at LivePerson and worked in the elite 8200 unit (the Israeli equivalent of the NSA), leading a team of researchers in developing algorithms to fight terrorism. His contributions have been recognized through a number of awards, including the “Life Source” award, given each year to the work deemed to have the highest impact in saving lives. Vitaly holds a B.Sc. in Computer Science and an MBA from the Israel Institute of Technology.

As usual, the schedule for the event is:

6:30 - networking, food

7:00 - announcements

7:15 - Chris and Vitaly

9:00 - more networking

We will give away two copies of IntelliJ IDEA Ultimate 12 for personal use and two copies of JRebel. Two e-copies of "Monadic Design Patterns", published by Artima, will also be given away. If you did not register with your full name, you will not be eligible to win anything. You can edit your name by going to Members / My Profile / Edit Profile. Tweet to #scalasv to be eligible.