
Llama 3.1 Release Reading Group

Hosted By
Cosmin N.

Details

# The Llama 3 Herd of Models

Abstract:
Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.

This Wednesday we're holding an emergency reading group to go over the Llama 3.1 paper. We'll cover the main advances in model architecture, data cleaning, pre-training and post-training recipes, infrastructure improvements, and more. It's going to be an informal discussion: skim the paper ahead of time and feel free to contribute during the meeting. If you'd like to experiment with the released weights beforehand, see the sketch below.
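For anyone who wants to try the model before the discussion, here is a minimal sketch using the Hugging Face transformers library. It assumes you have accepted the model license on the Hub and are authenticated; the Hub ID for the 8B-Instruct variant is assumed here, since the 405B model needs multi-GPU hardware.

```python
# Minimal sketch: generating text with a Llama 3.1 checkpoint via
# Hugging Face transformers. Assumes Hub authentication and license
# acceptance; the 8B-Instruct model ID below is an assumption made
# for practicality (the 405B model won't fit on a single GPU).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision weights to save memory
    device_map="auto",           # place layers on available devices
)

prompt = "Summarize the Llama 3 paper in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```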

Cosmin Negruseri (currently a founder at a stealth startup and previously a Staff ML Engineer on Pinterest Search) will lead the conversation. The paper is brand new, so we'll all share our early thoughts.

Resources:
Blog link

Deep Learning Study Group (San Francisco)