Details

The goal of text classification is to assign text documents to a fixed number of predefined categories. Its applications range from email spam detection to tailoring news feed content to each user's preferences.

In this session, we will explore how to perform text classification using Spark’s Machine Learning Library (MLlib). We will see how MLlib provides a set of high-level APIs for constructing, evaluating and tuning a machine learning workflow. We will explore how Spark represents a workflow as a pipeline, which consists of a sequence of stages to be run in a specific order. The pipeline for our text classification use case will utilize transformer stages to prepare the raw text documents for classification and estimator stages to learn a machine learning model that can be used to classify documents. Tuning the model for best fit will also be illustrated.
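To make the pipeline idea concrete ahead of the session, here is a minimal plain-Python sketch of a transformer stage feeding an estimator stage. It does not use Spark at all: the `Tokenizer`, `NaiveBayes`, `NaiveBayesModel`, `Pipeline` and `PipelineModel` classes below are hypothetical stand-ins written for this illustration, not MLlib's API, and the choice of a naive Bayes classifier and the tiny spam/ham dataset are assumptions of ours rather than what the demo will necessarily use.

```python
import math
from collections import Counter, defaultdict


class Tokenizer:
    """Transformer stage: adds a lowercase 'words' list to every row."""

    def transform(self, rows):
        return [dict(row, words=row["text"].lower().split()) for row in rows]


class NaiveBayesModel:
    """Fitted model. It is itself a transformer that adds a 'prediction'."""

    def __init__(self, log_prior, word_counts, totals, vocab, smoothing):
        self.log_prior = log_prior
        self.word_counts = word_counts
        self.totals = totals
        self.vocab = vocab
        self.smoothing = smoothing

    def _score(self, label, words):
        denom = self.totals[label] + self.smoothing * len(self.vocab)
        score = self.log_prior[label]
        for word in words:
            if word in self.vocab:  # words never seen in training are skipped
                count = self.word_counts[label][word]
                score += math.log((count + self.smoothing) / denom)
        return score

    def transform(self, rows):
        out = []
        for row in rows:
            scores = {lbl: self._score(lbl, row["words"]) for lbl in self.log_prior}
            out.append(dict(row, prediction=max(scores, key=scores.get)))
        return out


class NaiveBayes:
    """Estimator stage: fit() learns from labeled rows and returns a model."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing

    def fit(self, rows):
        label_counts = Counter(row["label"] for row in rows)
        word_counts = defaultdict(Counter)
        for row in rows:
            word_counts[row["label"]].update(row["words"])
        vocab = {w for counts in word_counts.values() for w in counts}
        totals = {lbl: sum(c.values()) for lbl, c in word_counts.items()}
        log_prior = {lbl: math.log(n / len(rows)) for lbl, n in label_counts.items()}
        return NaiveBayesModel(log_prior, word_counts, totals, vocab, self.smoothing)


class PipelineModel:
    """Fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Runs stages in order; estimators are fitted, transformers are applied."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)  # estimator becomes a fitted transformer
            rows = stage.transform(rows)
            fitted.append(stage)
        return PipelineModel(fitted)


# Made-up training documents, purely for illustration.
train = [
    {"text": "win cash prize now", "label": "spam"},
    {"text": "cheap prize offer click now", "label": "spam"},
    {"text": "meeting agenda for monday", "label": "ham"},
    {"text": "lunch on monday with the team", "label": "ham"},
]

model = Pipeline([Tokenizer(), NaiveBayes(smoothing=1.0)]).fit(train)
predictions = model.transform(
    [{"text": "claim your cash prize"}, {"text": "team meeting monday"}]
)
labels = [p["prediction"] for p in predictions]
print(labels)  # ['spam', 'ham']
```

The point of the sketch is the shape, not the classifier: fitting the pipeline turns each estimator into a fitted transformer, so the resulting model can be applied to new documents with a single `transform` call. Tuning, in this toy setting, would amount to refitting with different `smoothing` values and comparing accuracy on held-out documents.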

In the session, we'll walk through a detailed demo of text classification running in a Spark notebook.

Although a document classification use case will be specifically explored, many of the principles demonstrated in the session can be employed in a variety of other machine learning use cases.

There is no charge for this event. It's just an opportunity for us to explore the use case and learn together. Pizza and drinks will be served starting at 6 PM. The technical session will start around 6:20.

Sponsors

Cognitive Classes.ai

Free on-line courses. Join 100,000+. Put your career on the right track.

IBM

Hands on experiences with IBM big data solutions
