The Polyglot Data Scientist & Learning Algorithms with Neural Turing Machines

Details

This month we have a double whammy from Europe with Jeroen Janssens from the Netherlands (and formerly NY) discussing how to do data science in multiple languages and Tristan Deleu from France talking about external memory for neural networks.

Thank you to eBay (http://www.ebaynyc.com/) for hosting us and providing pizza. And thank you to Snowflake (http://www.snowflake.net/) for sponsoring this month's meetup.

About The Polyglot Data Scientist:

It’s generally good practice to stick to one programming language or one computing environment. The code will most likely be more consistent, more stable, and easier to maintain. However, sometimes, especially for exploratory data science projects, it can be more effective or efficient to mix and match. For instance, consider the situation where you want to make use of a fast machine-learning library. It turns out that this library is written in C++, but you work in R, and there are no language bindings available yet. Or consider the situation where you previously solved a certain problem in R, and now you need to solve that same problem again for a Python-based project.

In this talk I discuss three approaches to becoming a polyglot data scientist. First, I demonstrate Beaker Notebook, which allows you to use multiple languages (Python, R, JavaScript, Julia, etc.) in one notebook. Second, I show several ways of combining programming languages (e.g., how to load R data into MATLAB, how to use a MATLAB package in Python, and how to call Python functions from R). This list of combinations is not exhaustive, but it will give you a good idea of the possibilities. Third, I explain how to write your own reusable command-line tools and how to employ them directly from Python and R. The command line is language agnostic, which means that you can combine tools written in just about any language. With a few simple steps, it’s possible to turn your existing code into a command-line tool.
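As a rough illustration of that third approach (not taken from the talk materials), here is a minimal Python sketch that calls a standard command-line tool from a script; the file name data.csv is just a placeholder.

    import subprocess

    # Count the lines in a (hypothetical) CSV file by calling the
    # standard Unix tool `wc -l` from Python. Because the command line
    # is language agnostic, the same call works regardless of which
    # language the tool itself is written in.
    result = subprocess.run(
        ["wc", "-l", "data.csv"],  # data.csv is a placeholder file name
        capture_output=True,       # capture stdout/stderr instead of printing
        text=True,                 # decode bytes to str
        check=True,                # raise if the tool exits with an error
    )

    line_count = int(result.stdout.split()[0])
    print(f"data.csv has {line_count} lines")

The same pattern applies from R (e.g., via system2), which is what makes command-line tools a convenient glue layer between languages.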

About Jeroen:

Jeroen Janssens is an assistant professor of data science at Tilburg University. As an independent consultant and trainer, Jeroen helps organizations make sense of their data. Previously, he was a data scientist at Elsevier in Amsterdam and the startups YPlan and Outbrain in New York City. Jeroen holds a PhD in machine learning from Tilburg University and an MSc in artificial intelligence from Maastricht University. He is the author of Data Science at the Command Line, published by O’Reilly Media. He blogs at jeroenjanssens.com (http://jeroenjanssens.com/) and tweets as @jeroenhjanssens (https://twitter.com/jeroenhjanssens).

About Learning Algorithms with Neural Turing Machines:

The idea of adding an external memory to neural networks has become increasingly popular in Deep Learning over the past two years. In this talk, I will present one of the earliest examples of this family of models: the Neural Turing Machine (NTM). I will introduce NTM-Lasagne (https://github.com/snipsco/ntm-lasagne), a library based on the open-source project Lasagne (https://github.com/Lasagne/Lasagne), to create Neural Turing Machine components as part of your Deep Learning models. I will also show you how this model works, its early applications to algorithm learning, as well as recent developments in language understanding.
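For a flavor of how the external memory is accessed, here is a short NumPy sketch of the content-based addressing step described in the original NTM paper: the controller emits a key, which is compared against every memory row and turned into attention weights. This is an illustrative sketch under my own naming, not the NTM-Lasagne API.

    import numpy as np

    def content_addressing(memory, key, beta):
        """Content-based addressing: compare a key against each memory
        row with cosine similarity, sharpen with key strength beta,
        and normalize into attention weights with a softmax.

        memory: (N, M) array, N memory slots of width M
        key:    (M,) key vector emitted by the controller
        beta:   positive scalar key strength
        """
        eps = 1e-8
        # Cosine similarity between the key and each memory row.
        dot = memory @ key
        norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + eps
        similarity = dot / norms

        # Sharpen and normalize into weights over the N slots.
        scores = beta * similarity
        scores -= scores.max()          # numerical stability
        weights = np.exp(scores)
        return weights / weights.sum()

    # Tiny usage example with random memory and key (illustrative only).
    rng = np.random.default_rng(0)
    memory = rng.normal(size=(8, 4))    # 8 slots, width 4
    key = rng.normal(size=4)
    w = content_addressing(memory, key, beta=5.0)
    read_vector = w @ memory            # weighted read from memory
    print(w.round(3), read_vector.round(3))

In the full model these weights (combined with location-based shifts) drive differentiable read and write heads, which is what lets the whole machine be trained end to end.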

About Tristan:

Tristan is a Research Fellow at Snips (https://snips.ai/), a French startup developing privacy-preserving Artificial Intelligence for connected devices. He works on probabilistic models and Deep Learning applied to Natural Language Processing. Tristan holds an MSc in Machine Learning from the École Normale Supérieure.

Pizza (http://www.jaredlander.com/2012/09/pizza-polls/) begins at 6:30, the talks start at 7, and then we'll head to a local bar.