7:00 An Introduction to ANTLR
7:45 Automatically Linking Structured and Unstructured Data
8:30 Open Discussion
An Introduction to ANTLR
President, Jazillian Inc
Automatic machine translation and language parsing. ANTLR (ANother Tool for Language Recognition) is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages. The SPARQL query language grammar for ANTLR v3 was recently updated to version 1.1 and provides an implementation of the W3C SPARQL grammar specification.
Automatically Linking Structured and Unstructured Data: Connecting Databases to Text
President, Alias-i Inc
Natural language processing for text analytics, text data mining and search. LingPipe is a state-of-the-art suite of natural language processing tools written in Java that performs tokenization, sentence detection, named entity detection, coreference resolution, classification, clustering, part-of-speech tagging, general chunking, and fuzzy dictionary matching. These general tools support a range of applications.
Breck will discuss the thorny problem of linking entities in a database to text mentions of those entities.
The challenges are:
- The John Smith problem: You have a text mention of "John Smith" and many possible John Smiths in the database. How do you pick the right one?
- The name variant problem: Your database has an incomplete list of aliases for a gene. Serpina3 has the alias 'ACT', but it is also called 'AACT' in the literature, and your database does not know that.
- The new entity problem: You want to discover new performers when they show up in your entertainment text sources. Those new performers are not in your database yet, so how do you handle them?
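
To make the three challenges concrete, here is a minimal Python sketch. It is not LingPipe code and does not reflect the approach in the talk: it uses the standard library's difflib as a stand-in for fuzzy dictionary matching, and the DATABASE contents, entity ids, function names, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical toy database: entity id -> known aliases (illustrative only).
DATABASE = {
    "gene:serpina3": ["Serpina3", "ACT"],
    "person:john_smith_actor": ["John Smith"],
    "person:john_smith_senator": ["John Smith"],
}


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidates(mention, threshold=0.8):
    """Return (entity_id, best_score) pairs scoring at least threshold, best first."""
    scored = []
    for entity_id, aliases in DATABASE.items():
        best = max(similarity(mention, alias) for alias in aliases)
        if best >= threshold:
            scored.append((entity_id, best))
    return sorted(scored, key=lambda pair: -pair[1])


def link(mention, threshold=0.8):
    """Classify a mention as 'linked', 'ambiguous', or 'new_entity'."""
    found = candidates(mention, threshold)
    if not found:
        # New entity problem: nothing in the database is close enough.
        return ("new_entity", [])
    top_score = found[0][1]
    tied = [entity_id for entity_id, score in found if score == top_score]
    if len(tied) > 1:
        # John Smith problem: several entities match equally well.
        return ("ambiguous", tied)
    # Name variant problem: a near match to a known alias is accepted.
    return ("linked", tied)


if __name__ == "__main__":
    for mention in ["AACT", "John Smith", "Lady Gaga"]:
        print(mention, link(mention))
```

The sketch only flags the ambiguous case; actually choosing among the John Smiths would need evidence from the surrounding text, which string matching alone cannot supply. The threshold also trades off the last two problems: set it too low and new entities get linked to existing records, too high and unseen name variants are reported as new.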
Breck will discuss how you can approach these problems using the LingPipe suite of tools in the context of entertainment news and bioinformatics.