Text Classification using Spark Machine Learning


Details
Text classification assigns text documents to a fixed number of predefined categories. Its applications range from email spam detection to tailoring news feed content to user preferences.
In this session, we will explore how to perform text classification using Spark’s Machine Learning Library (MLlib). We will see how MLlib provides a set of high-level APIs for constructing, evaluating, and tuning a machine learning workflow. We will explore how Spark represents a workflow as a pipeline, which consists of a sequence of stages to be run in a specific order. The pipeline for our text classification use case will use transformer stages to prepare the raw text documents for classification and estimator stages to learn a machine learning model that can be used to classify documents. We will also show how to tune the model for the best fit.
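For a rough feel of the pipeline idea before the session, here is a plain-Python sketch of the transformer/estimator pattern. It does not use Spark at all: the Toy* classes, the dictionary-based rows, and the tiny spam/ham data set are assumptions made up for illustration, and they only mimic the fit/transform flow rather than reproduce MLlib's actual API.

```python
import math
from collections import Counter, defaultdict


class ToyTokenizer:
    """Transformer stage: adds a 'words' field to every row."""

    def transform(self, rows):
        return [dict(row, words=row["text"].lower().split()) for row in rows]


class ToyNaiveBayes:
    """Estimator stage: fit() learns from labeled rows and returns a model."""

    def fit(self, rows):
        label_counts = Counter()
        word_counts = defaultdict(Counter)
        for row in rows:
            label_counts[row["label"]] += 1
            word_counts[row["label"]].update(row["words"])
        vocab = {w for counts in word_counts.values() for w in counts}
        return ToyNaiveBayesModel(label_counts, word_counts, vocab)


class ToyNaiveBayesModel:
    """Fitted model: a transformer that adds a 'prediction' field."""

    def __init__(self, label_counts, word_counts, vocab):
        self.label_counts = label_counts
        self.word_counts = word_counts
        self.vocab = vocab

    def _score(self, label, words):
        # Log prior plus log likelihoods with add-one (Laplace) smoothing.
        total_docs = sum(self.label_counts.values())
        total_words = sum(self.word_counts[label].values())
        score = math.log(self.label_counts[label] / total_docs)
        for w in words:
            count = self.word_counts[label][w]
            score += math.log((count + 1) / (total_words + len(self.vocab)))
        return score

    def transform(self, rows):
        return [
            dict(row, prediction=max(self.label_counts,
                                     key=lambda lbl: self._score(lbl, row["words"])))
            for row in rows
        ]


class ToyPipeline:
    """Runs stages in order; estimators are fitted, transformers just applied."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)  # estimator becomes a fitted model
            rows = stage.transform(rows)
            fitted.append(stage)
        return ToyPipelineModel(fitted)


class ToyPipelineModel:
    """Fitted pipeline: every stage is now a transformer."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


train = [
    {"text": "win a free prize now", "label": "spam"},
    {"text": "free money claim your prize", "label": "spam"},
    {"text": "meeting agenda for monday", "label": "ham"},
    {"text": "lunch on monday with the team", "label": "ham"},
]
model = ToyPipeline([ToyTokenizer(), ToyNaiveBayes()]).fit(train)
predictions = model.transform([
    {"text": "claim your free prize"},
    {"text": "team meeting monday"},
])
print([p["prediction"] for p in predictions])  # ['spam', 'ham']
```

The point of the sketch is the shape of the workflow: fitting the pipeline turns each estimator into a fitted model, so the resulting pipeline model consists only of transformers and can be applied to new, unlabeled documents in one call.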
In the session, we'll walk through a detailed demo of text classification running in a Spark notebook.
Although a document classification use case will be specifically explored, many of the principles demonstrated in the session can be employed in a variety of other machine learning use cases.
There is no charge for this event. It's just an opportunity for us to explore the use case and learn together. Pizza and drinks will be served starting at 6 PM. The technical session will start around 6:20.
