Increasing The Pace Of Annotations – AI With A Human In The Loop


Details
We know that dataset design has a huge impact on the quality of our models. We also know that obtaining enough high-quality labeled data is difficult, time-consuming, and often expensive. When it comes to manual labeling, it is essential to put the effort where it counts.

In this talk, we present a system we developed at Gong for interactive labeling and simultaneous training of text classifiers. To make every label count, we combine unsupervised retrieval with active learning: active learning methods reduce sample complexity by selecting which samples to label, while sentence embeddings enable effective retrieval based on semantic similarity, increasing the signal-to-noise ratio by surfacing a pool of samples that are likely to belong to the positive class. Finally, efficient sampling methods boost diversity in the resulting dataset.

This system is used internally to build our text classifiers. Moreover, it is efficient enough in low-resource settings that it lets our users, who are typically not data-savvy, build their own classifiers. The framework can also be applied in other domains, such as medicine.
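The three ingredients above can be sketched in a few lines of NumPy. This is a minimal illustration, not Gong's implementation: it assumes sentence embeddings are already computed (e.g., by any sentence-encoder model), retrieval is plain cosine similarity, active learning is uncertainty sampling near the decision boundary, and diversity is greedy farthest-point selection. All function names are hypothetical.

```python
import numpy as np

def retrieve_pool(query_emb, corpus_embs, k):
    # Unsupervised retrieval: return indices of the k samples whose
    # embeddings are most cosine-similar to the query embedding.
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    return np.argsort(-(c @ q))[:k]

def uncertainty_sample(probs, n):
    # Active learning (uncertainty sampling): pick the n samples whose
    # predicted positive-class probability is closest to 0.5.
    return np.argsort(np.abs(probs - 0.5))[:n]

def diversify(embs, candidates, n):
    # Diversity sampling: greedy farthest-point selection over the
    # candidate pool, so the labeled batch covers the embedding space.
    chosen = [candidates[0]]
    while len(chosen) < n:
        # Distance from each candidate to its nearest already-chosen point.
        dists = np.min(
            np.linalg.norm(embs[candidates][:, None] - embs[chosen][None], axis=-1),
            axis=1,
        )
        chosen.append(candidates[int(np.argmax(dists))])
    return chosen

# Toy end-to-end run with random "embeddings".
rng = np.random.default_rng(0)
corpus = rng.normal(size=(100, 8))        # 100 sentences, 8-dim embeddings
pool = retrieve_pool(corpus[0], corpus, 20)  # retrieve a positive-leaning pool
probs = rng.uniform(size=20)              # stand-in classifier scores for the pool
uncertain = pool[uncertainty_sample(probs, 10)]
batch = diversify(corpus, list(uncertain), 5)  # 5 diverse samples to label next
```

In the interactive loop, the human labels `batch`, the classifier is retrained on the accumulated labels, and the retrieve–score–diversify cycle repeats, so each round of labeling targets the most informative examples.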
