NLP & ML in Scala

  • Jun 18, 2014 · 6:30 PM

ANNOUNCEMENT: Scala By the Bay registration is now open!

We're putting together a great meetup about Natural Language Processing and Machine Learning in Scala.

David Hall of Berkeley will talk about Scala NLP technologies such as Breeze and Epic.

Andrew McCallum will present FACTORIE, a probabilistic reasoning framework using Scala to achieve high expressiveness and high performance for machine learning tasks.

Andrew McCallum, FACTORIE: A Scala Library for Machine Learning, NLP and Knowledge Base Construction 

Practitioners in natural language processing, information integration, computer vision and other areas have achieved great empirical success using graphical models with repeated, relational structure.  As researchers explore increasingly complex structures, there has been growing interest in new programming languages or toolkits that make it easier to implement such models in a flexible, yet scalable way.

Our contribution to this goal is FACTORIE, a Scala library that combines (1) a focus on factor graphs as a lingua franca for statistical modeling, (2) speed and scalability, with demonstrated success on problems with billions of variables and factors, as well as distributed processing, (3) object-oriented definitions of random variables, factors, inference and learning methods, enabling easy modification through subclassing as well as straightforward descent through layers of abstraction, and (4) flexibility, supporting multiple modeling and inference paradigms.

In this talk I will introduce FACTORIE, explain its basic organizational structure, describe its modular approach to inference and learning, relate it to several other toolkits (such as GraphLab, scikit-learn, and alternative NLP toolkits), introduce its extensive natural language processing facilities, show several code examples, give a live demo, and answer your questions.

Andrew McCallum is a Professor and Director of the Information Extraction and Synthesis Laboratory in the School of Computer Science at the University of Massachusetts Amherst. This summer he is a Visiting Research Scientist at Google. He has published over 250 papers in many areas of AI, including natural language processing, machine learning, data mining and reinforcement learning, and his work has received over 35,000 citations. He obtained his PhD from the University of Rochester in 1995 with Dana Ballard and held a postdoctoral fellowship at CMU with Tom Mitchell and Sebastian Thrun. In the early 2000s he was Vice President of Research and Development at WhizBang Labs, a 170-person start-up company that used machine learning for information extraction from the Web. He is an AAAI Fellow and the recipient of the UMass Chancellor's Award for Research and Creative Activity, the UMass NSM Distinguished Research Award, the UMass Lilly Teaching Fellowship, and research awards from Google, IBM and Microsoft. He was the General Chair for the International Conference on Machine Learning (ICML) 2012, and is president-elect of the International Machine Learning Society, as well as a member of the editorial board of the Journal of Machine Learning Research. For the past ten years, McCallum has been active in research on statistical machine learning applied to text, especially information extraction, entity resolution, semi-supervised learning, topic models, and social network analysis.  Work on probabilistic programming can be found at  Work on open peer review can be found at  McCallum's web page is

David Hall, ScalaNLP Epic

I'll introduce ScalaNLP Epic, which is a natural language processing library with models available for eight languages. I'll show how to use the library, and then drill down into how you can extend the system to build your own models while introducing some of the theory of machine learning for natural language processing. Along the way, I'll describe the relevant parts of the Breeze numerical computing library, and how Breeze and Scala make building these kinds of systems easier. 

David Hall is a Ph.D. student in EECS at UC Berkeley, where he works with Professor Dan Klein. He is the creator of the Breeze, Epic, and Puck libraries. His research interests are in natural language processing and machine learning, particularly syntactic parsing and computational historical linguistics. He has a B.S. and M.S. from Stanford University, both in Symbolic Systems. He is the recipient of the 2012 Google Ph.D. Fellowship in Natural Language Processing, the 2011 EECS Outstanding Graduate Student Instructor award, and a distinguished paper award at EMNLP 2012.

We need a video sponsor for this event.  We record every meetup ourselves and publish it on  We include sponsor logos in the recordings.  If you are in the NLP/ML space, this is a great opportunity to connect with the community. Contact [masked] for sponsorship.


  • Soma Shekar O.

    Is the talk recorded?

    June 20, 2014

    • Steve C.

      Couldn't make it to this talk. Would love to see the recordings on functional tv. Thanks!

      June 23, 2014

    • Sze Ki P.

      It would be nice to just upload the talk as is, even in parts, at this point.

      August 30, 2014

  • Joseph T.

    As I mentioned, UPSHOT is hiring engineers who want to work on or learn NLP using Scala. We are building a semantic parser for English.

    You can reach me at joseph at upshotdata dot com

    Thanks again for the wonderful presentations.

    June 19, 2014

  • David H.

    Hi everyone. I uploaded my slides here: Thanks for coming! And thanks to Tagged for hosting!

    June 18, 2014

    • Jonathan E.

      Thanks for posting the slide deck. Breeze and Epic look great and I am definitely going to look for ways to include them in my projects.

      June 19, 2014

  • Louis C.

    Top-notch, highly useful talks. Many thanks to the speakers as well as the organizers and Tagged for putting it together :)

    June 19, 2014

  • sherry-lynn l.

    very interesting presentation

    June 18, 2014

  • Kim

    Any news on recording?

    June 18, 2014

  • Spondon S.

    Hi folks! Sorry, can't make it to this talk! Would love it if the speakers could share their slides here. Thanks!

    June 18, 2014

  • A former member

    Full details for both talks published -- David Hall will present Epic, and Andrew McCallum will present FACTORIE!

    June 16, 2014

  • David H.

    Hi everyone, I'm working on the talk and would like to solicit opinions on direction. Which of the following formats sounds best?

    1) Focus on Breeze (which is a more general API similar to numpy and scipy)

    2) Pick out an NLP task (probably named entity recognition), show how to solve that task in Epic, and then show how that solution is implemented (which will go through a lot of the APIs in Breeze as well as Epic).

    3) Focus on using pre-trained NLP models to do something interesting.

    4) Something else?

    I'm leaning towards (2), but would love to hear your thoughts!

    June 7, 2014

    • David H.

      #2 it is. Thanks everyone!

      June 7, 2014

    • Vineel Y.

      David, question answering could probably be a topic of discussion here.

      June 14, 2014

Our Sponsors

  • AI By the Bay

    Review of AI technology and strategy, March 6-8, San Francisco

  • Twitter

    Awesome venue, food and drinks for our meetups!
