Scala: The Unpredicted Lingua Franca for Data Science


Details
About the lecture: Until relatively recently, the languages of choice for data scientists to manipulate and extract meaning from data were primarily Python, R or Matlab. The limitations of these languages, however, led to divergence within data science communities and to subsequent surges of interest in languages offering a similar set of functionality.
Although it was foreseen that data scientists could draw on elements of these communities, an unexpected development changed both the amount of available data and the distributed technologies needed to handle it. These distributed technologies advanced rapidly on a convenient, easy-to-deploy platform: the JVM. In this talk, guest lecturer Andy Petrella will show how data scientists are now part of heterogeneous teams that face many problems and must work together toward a global solution. This endeavor brings a new responsibility to be both agile and productive so that their work can be integrated into the platform. For this reason, technologies like Apache Spark are increasingly important and are gaining traction across different communities. Even though some attachment to the legacy languages remains, all the creativity and new processes for analyzing the data have to be done in Scala.

Based on this development, the second part of Andy’s talk will introduce and summarize new methodologies and scientific advances in machine learning with Scala as the main language. He’ll demonstrate that, with the right tooling, Scala gives data scientists interactivity, live reactivity, charting capabilities and robustness, elements still missing from the legacy languages. The examples will be shown in a fully productive and reproducible environment that combines the Spark Notebook and Docker.