Globally Scalable Web Document Classification Using Word2Vec

Main Talk:

Extracting information from unstructured web documents is a common problem for many applications, and determining which category a document belongs to can be especially challenging at planetary scale.

In this talk, we will show how SmartNews achieves globally scalable, real-time web document classification using new machine learning techniques, especially Word2Vec's extended distributed representation model. We will also discuss the pros and cons of using distributed representations from a real-world, operational standpoint, as well as new classification approaches being used in Japan.


Kohei Nakaji is a software engineer at SmartNews, one of Japan’s hottest startups, with 10M+ users worldwide. The SmartNews news discovery platform uniquely uses machine learning to extract, categorize, target, rank, and deliver culturally relevant news to 150+ countries. Kohei’s research and engineering focus is machine learning and natural language processing.

Lightning Talk Title: ND4J: A scientific computing framework for the JVM

Lightning Talk Speaker: Adam Gibson

Lightning Talk Description:

In this talk, we will present the ND4J framework with an iScala notebook. Combined with Spark's DataFrames, this is making real data science viable in Scala. ND4J is "NumPy for Java." It works with multiple architectures (or backends), which allows for runtime-neutral scientific computing as well as chip-specific optimizations -- all while writing the same code. Algorithm developers and scientific engineers can write code for a Spark, Hadoop, or Flink cluster while keeping the underlying computations platform-agnostic. A modern runtime for the JVM with the capability to work with GPUs lets engineers leverage the best parts of the production ecosystem without having to pick which scientific library to use.


Due to technical difficulties, Adam didn't get a chance to present at the last meetup, so we're having him speak at this one.

Tentative Schedule:

6:30pm - 7:00pm -- socializing

7:00pm - 7:15pm -- lightning talk

7:20pm - 8:20pm -- main talk

8:20pm - 9:00pm -- socializing