The time has come for a presentation on applied machine learning. Recorded Future will talk about the innovative way they have made use of MLlib, a part of the Spark stack. Apache Spark is a framework for distributed computation. It powers a stack of high-level tools including Spark SQL, MLlib for machine learning, GraphX, and Spark Streaming. We will prepare a short intro to support vector machines (SVMs) in case some of you are not familiar with them.
A machine learning pipeline for event detection on Spark
Every day we send more than 144 billion emails and 340 million tweets, and every minute we make more than 2 million Google search queries. What these have in common is that they consist of text. Text-based communication is a crucial part of our everyday experience. Event detection and event argument extraction consist of trying to structure the wealth of unstructured data about events that is available on the web. In order to detect events and their arguments, we employ a multitude of tools developed by the natural language processing community. We will describe a pipeline for event detection and argument extraction, give some background on natural language processing, and describe state-of-the-art methods.
Daniel Langkilde works as a Machine Learning Engineer at Recorded Future, and is also a visiting scholar in the AMPLab at Berkeley. Before this he studied Engineering Mathematics at Chalmers University of Technology, Sweden. He is broadly interested in information retrieval, natural language processing and machine learning.
Recorded Future is a startup company headquartered in Cambridge, MA, with offices in Arlington, VA, and Gothenburg, Sweden. Their team includes computer scientists, statisticians, linguists, and technical business people with deep expertise in areas such as intelligence and security. They’re committed to organizing the web in a radically new and useful way.