ANNOUNCEMENT: Scala By the Bay registration is now open!
We're putting together a great meetup about Natural Language Processing and Machine Learning in Scala.
David Hall of Berkeley will talk about Scala NLP technologies such as Breeze and Epic.
Andrew McCallum will present FACTORIE, a probabilistic reasoning framework using Scala to achieve high expressiveness and high performance for machine learning tasks.
Andrew McCallum, FACTORIE: A Scala Library for Machine Learning, NLP and Knowledge Base Construction
Practitioners in natural language processing, information integration, computer vision and other areas have achieved great empirical success using graphical models with repeated, relational structure. As researchers explore increasingly complex structures, there has been growing interest in new programming languages or toolkits that make it easier to implement such models in a flexible, yet scalable way.
Our contribution to this goal is FACTORIE, a Scala library that combines (1) a focus on factor graphs as a lingua franca for statistical modeling, (2) speed and scalability, with demonstrated success on problems with billions of variables and factors, and distributed processing, (3) object-oriented definitions of random variables, factors, inference and learning methods, enabling easy modification through subclassing as well as straightforward descent through layers of abstraction, (4) flexibility, supporting multiple modeling and
In this talk I will introduce FACTORIE, explain its basic organizational structure, describe its modular approach to inference and learning, relate it to several other toolkits (such as GraphLab, scikit-learn, and alternative NLP toolkits), introduce its extensive natural language processing facilities, show several code examples, give a live demo, and answer your questions.
Andrew McCallum is a Professor and Director of the Information Extraction and Synthesis Laboratory in the School of Computer Science at the University of Massachusetts Amherst. This summer he is a Visiting Research Scientist at Google. He has published over 250 papers in many areas of AI, including natural language processing, machine learning, data mining and reinforcement learning, and his work has received over 35,000 citations. He received his PhD from the University of Rochester in 1995, working with Dana Ballard, and held a postdoctoral fellowship at CMU with Tom Mitchell and Sebastian Thrun. In the early 2000s he was Vice President of Research and Development at WhizBang Labs, a 170-person start-up company that used machine learning for information extraction from the Web. He is an AAAI Fellow and the recipient of the UMass Chancellor's Award for Research and Creative Activity, the UMass NSM Distinguished Research Award, the UMass Lilly Teaching Fellowship, and research awards from Google, IBM and Microsoft. He was the General Chair of the International Conference on Machine Learning (ICML) 2012, and is president-elect of the International Machine Learning Society, as well as a member of the editorial board of the Journal of Machine Learning Research. For the past ten years, McCallum has been active in research on statistical machine learning applied to text, especially information extraction, entity resolution, semi-supervised learning, topic models, and social network analysis. Work on probabilistic programming can be found at http://factorie.cs.umass.edu. Work on open peer review can be found at http://openreview.net. McCallum's web page is http://www.cs.umass.edu/~mccallum.
David Hall, ScalaNLP Epic
I'll introduce ScalaNLP Epic, which is a natural language processing library with models available for eight languages. I'll show how to use the library, and then drill down into how you can extend the system to build your own models while introducing some of the theory of machine learning for natural language processing. Along the way, I'll describe the relevant parts of the Breeze numerical computing library, and how Breeze and Scala make building these kinds of systems easier.
David Hall is a Ph.D. student in EECS at UC Berkeley, where he works with Professor Dan Klein. He is the creator of the Breeze, Epic, and Puck libraries. His research interests are in natural language processing and machine learning, particularly syntactic parsing and computational historical linguistics. He has a B.S. and an M.S. from Stanford University, both in Symbolic Systems. He is the recipient of the 2012 Google Ph.D. Fellowship in Natural Language Processing, the 2011 EECS Outstanding Graduate Student Instructor award, and a distinguished paper award at EMNLP 2012.
We need a video sponsor for this event. We record every meetup ourselves and publish it on functional.tv, and we include sponsor logos in the recordings. If you are in the NLP/ML space, this is a great opportunity to connect with the community. Contact [masked] for sponsorship.