The New York Times Company invites you, the New York Semantic Web Meetup, to join Evan Sandhaus, Semantic Technologist, at The New York Times Building. Evan will present his work on The New York Times Annotated Corpus.
The New York Times Annotated Corpus is a collection of over 1.8 million articles published by The New York Times between January 1, 1987 and July 19, 2007, each annotated with rich metadata. With over 650,000 individually written summaries and 1.5 million manually tagged articles, it is my hope that The New York Times Annotated Corpus will prove to be a valuable resource for a number of natural language processing research areas, including document summarization, document categorization, and automatic content extraction.
The New York Times is releasing the data in conjunction with the Linguistic Data Consortium. Details for obtaining the corpus can be found on their website at:
The New York Times
Marco Neumann, KONA
Introduction to R&D at the New York Times
Gregg Fenton - Director of Emerging Platforms at The New York Times
The New York Times Corpus
Evan Sandhaus - Semantic Technologist
Kristi Reilly - Information Architect
The use of Microformats in Production at the New York Times
Andrei Scheinkman - Software Engineer