Spark Meetup November@Pivotal


Details
Real-time website classification using Apache Spark
Amith from Pivotal will present an end-to-end application that solves the website categorisation problem. The presentation shows how Spark has been incorporated into a modern microservice architecture.
When building an app that has a Data Science component as part of the overall solution, we normally see that the Data Science component does not fit into the development and deployment model used for the rest of the application and often involves manual intervention, such as:
1) Build the model in a different environment.
2) Ship the model so that it can be "productionised" in a completely different environment.
Much of the problem stems from the fact that these models have no interface through which other systems can use them, and they are often built, tested and worked on in isolation.
This talk presents some architectural ideas to bridge this gap and shows how Data Science can fit seamlessly into the overall solution. We will go through the end-to-end design and architecture of an app that classifies website URLs into categories using Apache Spark's MLlib. We will also cover ideas on how to continuously enhance the model as more training data arrives.
Bio: Amith is a software enthusiast currently working as a Solutions Architect with the Data team at Pivotal Software. Amith likes to research new technologies to provide elegant solutions to business problems.
https://www.linkedin.com/in/amithnambiar
