Paris NLP Season 5 Meetup #5

Julien LAUNAY - LightOn


Extreme-scale models with billions of parameters (GPT-3, T5, etc.) have garnered increased interest over the past year, demonstrating unique few-shot learning abilities.
We introduce PAGnol, a collection of French GPT models. Using scaling laws, we efficiently train PAGnol-XL (1.5B parameters) with the same computational budget as CamemBERT, a model 13 times smaller. PAGnol-XL is the largest model trained to date for the French language.
We also explore some of the motivations behind extreme-scale models: from the surprising finding in scaling-law research that larger models are more efficient to train, to recent work on prompt tuning that questions the established fine-tuning paradigm.

