CLD: Software for Language Documentation

Dr. Steven Abney, Professor of Linguistics at the University of Michigan, will present software he created for language documentation.

"I'd like to describe an application for language documentation,
specifically, entry of text and audio, transcription, and
translation. It provides tight integration of audio, text, and
lexicon; in fact, the lexicon is automatically constructed from the

In some ways the real question is not what CLD is, but why I wrote it.
I describe a linguistic approach that I call 'inductive general
grammar,' or, more casually, 'linguistics with a computational
attitude," in which the big question is how one can automatically
learn a complete language. The first order of business is the
collection of a large training set -- standard operating procedure in
computational linguistics but a novel idea for linguistics. In
this case, though, the items in the training set are entire languages.

The current rate of language loss gives the matter urgency. The
Universal Dependencies treebanks are a terrific resource, but they
only touch 1% of the world's languages. How can we accelerate the
collection of data? The idea behind CLD is to do so via a mutually
beneficial collaboration with speaker communities. Transmission to
the next generation is a major issue, and CLD aims to provide a
self-study complement to immersion learning.

This will not be the typical A2D-NLP talk: it will be mostly about
linguistics and a user interface. But the next steps pose some
fascinating challenges for computational linguistics, such as
automatic phonetic transcription and automatic translation-lexicon