To start the new year we have Jeroen Janssens from YPlan discussing how the command line can be used for data science.
About the talk:
We data scientists love to create exciting data visualizations and insightful models. However, before we get to that point, usually much effort goes into obtaining, scrubbing, and exploring the required data.
The *nix command line, although invented decades ago, remains a powerful environment for such data science tasks. It provides a read-eval-print loop (REPL) that is often much more convenient for exploratory data analysis than the edit-compile-run-debug cycle associated with scripts or even programs. Even if you're already comfortable processing data with, for example, R or Python, being able to also leverage the power of the command line can make you a more efficient data scientist.
In this one-hour presentation we'll look at the following subjects:
• Essential concepts of the *nix command line;
• Setting up an efficient environment;
• Filters such as cut, grep, sed, and awk;
• Scraping websites using curl, scrape, xml2json, and jq;
• Managing your data science workflow using drake;
• Parallelizing and distributing data-intensive pipelines; and
• Turning one-liners and existing code into reusable command-line tools.
The main goal of this presentation is to give you an understanding of why, when, and how you could use the command line for your next data science project.
Jeroen Janssens is a senior data scientist at YPlan, tonight's going-out app, where he's responsible for making event recommendations more personal. Jeroen holds an M.Sc. in Artificial Intelligence from Maastricht University and a Ph.D. in Machine Learning from Tilburg University. He is authoring a book called "Data Science at the Command Line", which will be published by O'Reilly in summer 2014. Jeroen enjoys biking the Brooklyn Bridge, building tools, and blogging at http://jeroenjanssens.com. He can be found on Twitter @jeroenhjanssens.
As per usual, pizza begins at 6:30, the speaker at 7, and then the bar whenever he finishes.