
SEA: Preventing Harmful Language Generation

Hosted by Maartje ter H. and Maurits B.

Details

*** IMPORTANT: You will be able to view the Zoom link once you 'attend' the meetup on this page. ***

** 17:00 - 17:30 - Emily Sheng, University of Southern California **

Reducing Harms in Language Generation

Technology for natural language generation (NLG) has advanced rapidly, spurred by advances in pre-training large models on massive amounts of data and by the need for intelligent agents to communicate naturally. While these techniques can effectively generate fluent text, they can also produce undesirable societal biases that have a disproportionately negative impact on already marginalized populations. In this talk, I emphasize the need for techniques to make language generation applications more fair and inclusive.

Specifically, I focus on ad hominem attacks in dialogue responses, which are those that target some feature of a person's character instead of the position the person is maintaining. These attacks are harmful because they propagate implicit biases and diminish a person's credibility. To this end, we propose categories of ad hominems, compose an annotated dataset, and build a classifier to analyze human and dialogue system responses to English Twitter posts. We specifically compare responses to Twitter topics about marginalized communities (#BlackLivesMatter, #MeToo) versus other topics (#Vegan, #WFH), because the abusive language of ad hominems could further amplify the skew of power away from marginalized populations. Furthermore, we propose a constrained decoding technique that uses salient n-gram similarity as a soft constraint for top-k sampling to reduce the amount of ad hominems generated. Our results indicate that 1) responses from both humans and DialoGPT contain more ad hominems for discussions around marginalized communities, 2) different quantities of ad hominems in the training data can influence the likelihood of generating ad hominems, and 3) we can use constrained decoding techniques to reduce ad hominems in generated dialogue responses.
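To give a flavor of the constrained decoding idea, the following is a minimal sketch, not the authors' implementation: the salient n-gram set, the penalty weight, and the word-level matching (which only approximates BPE tokens) are all placeholder assumptions; only the overall pattern of softly down-weighting top-k candidates that would complete a flagged n-gram follows the abstract.

```python
# Sketch: top-k sampling with a soft n-gram penalty, applied to DialoGPT.
# SALIENT_NGRAMS and PENALTY are hypothetical; the real method derives
# salient n-grams from annotated ad hominem data.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")
model.eval()

SALIENT_NGRAMS = {("you", "are", "stupid"), ("you", "idiot")}  # assumed
PENALTY = 5.0  # soft-constraint strength (assumed value)
TOP_K = 40

def completes_salient_ngram(prev_words, candidate):
    """True if appending `candidate` to recent words forms a salient n-gram."""
    for ngram in SALIENT_NGRAMS:
        if candidate == ngram[-1] and tuple(prev_words[-(len(ngram) - 1):]) == ngram[:-1]:
            return True
    return False

def penalized_top_k_sample(logits, prev_words):
    """Top-k sampling with a soft logit penalty instead of a hard ban."""
    topk_logits, topk_ids = torch.topk(logits, TOP_K)
    for i, tok_id in enumerate(topk_ids.tolist()):
        word = tokenizer.decode([tok_id]).strip().lower()
        if completes_salient_ngram(prev_words, word):
            topk_logits[i] -= PENALTY  # discourage, but do not forbid
    probs = torch.softmax(topk_logits, dim=-1)
    return topk_ids[torch.multinomial(probs, num_samples=1)].item()

prompt_ids = tokenizer.encode("Why do you disagree?" + tokenizer.eos_token)
generated = list(prompt_ids)
words = []  # crude word history; BPE tokens only approximate words
for _ in range(30):
    with torch.no_grad():
        logits = model(torch.tensor([generated])).logits[0, -1]
    next_id = penalized_top_k_sample(logits, words)
    if next_id == tokenizer.eos_token_id:
        break
    generated.append(next_id)
    words.append(tokenizer.decode([next_id]).strip().lower())

print(tokenizer.decode(generated[len(prompt_ids):]))
```

Because the penalty is subtracted from logits rather than setting candidates to negative infinity, flagged continuations remain possible when no fluent alternative exists, which is what makes the constraint "soft".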

** 17:30 - 18:00 - Emily Dinan, Facebook AI Research **

Safety for Open-Domain Dialogue Agents

Models trained on large unlabeled corpora of human interactions will learn patterns and mimic behaviors therein, including offensive or otherwise toxic behavior and unwanted biases. In this talk, I will discuss some recent work investigating methods for mitigating these issues in the context of open-domain generative dialogue models. Among other methods, I will introduce a new human-and-model-in-the-loop framework both for training safer models and for evaluating them. Finally, I will discuss some limitations of this work, as well as next steps for this line of research.
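To make the human-and-model-in-the-loop idea concrete, here is a minimal sketch of an iterative adversarial safety round (in the spirit of a "build it, break it, fix it" loop). The toy dataset, the scikit-learn classifier, and the input() stand-in for crowdworkers are illustrative assumptions, not the actual pipeline described in the talk.

```python
# Sketch: each round, humans try to "break" the current safety classifier
# with offensive messages it mislabels as safe; successful attacks are
# added to the training data and the classifier is retrained ("fixed").
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative seed data; 0 = safe, 1 = offensive.
texts = ["you are wonderful", "have a nice day", "you are an idiot", "shut up, loser"]
labels = [0, 0, 1, 1]

for round_idx in range(3):
    # "Build it": train a safety classifier on the current data.
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    clf.fit(texts, labels)

    # "Break it": a human (here, via input()) writes an offensive message
    # aiming to slip past the current classifier.
    attack = input(f"Round {round_idx}: write an offensive message: ")
    if clf.predict([attack])[0] == 0:
        # "Fix it": the successful attack becomes a new offensive
        # training example for the next round.
        texts.append(attack)
        labels.append(1)
```

Each round hardens the classifier against exactly the failure modes humans could still find, which is why the loop can be used both to train safer models and to evaluate how robust a given model already is.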
