
Llama 3.1 Reading Group - Round 2 and Apple Foundation Models

Hosted by Cosmin N.

Details

# The Llama 3 Herd of Models - Round 2 and Apple Foundation Models

This Thursday we're doing a second round of review of the Llama 3.1 paper. We'll cover the post-training part in more depth. In particular, we'll look into math, code, reasoning, and tool use, and dive into the papers cited in the respective sections.

Apple also released an LLM training paper this week that covers a similar methodology.

Preparation:
There will not be a prepared presentation; this will be more of a structured conversation.
Spend some time (an hour or two) ahead of the meeting diving into the resources so we can have a productive conversation.

Resources:
The Llama 3 Herd of Models (Llama 3.1 paper). https://arxiv.org/abs/2407.21783
Apple Intelligence Foundation Language Models (Apple). https://machinelearning.apple.com/papers/apple_intelligence_foundation_language_models.pdf
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence. https://arxiv.org/abs/2406.11931
Let's Verify Step by Step (OpenAI). https://arxiv.org/pdf/2305.20050
ReAct: Synergizing Reasoning and Acting in Language Models. https://arxiv.org/pdf/2210.03629
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving (Microsoft). https://arxiv.org/pdf/2309.17452
Math-Shepherd: Verify and Reinforce LLMs Step-by-Step Without Human Annotations. https://arxiv.org/abs/2312.08935
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning. https://arxiv.org/pdf/2405.00451
ToolVerifier. https://arxiv.org/abs/2402.14158
API-Bank. https://arxiv.org/pdf/2304.08244
Self-Refine: Iterative Refinement with Self-Feedback. https://arxiv.org/abs/2303.17651
Generating Sequences by Learning to Self-Correct. https://arxiv.org/pdf/2211.00053
WebGPT: Browser-Assisted Question-Answering with Human Feedback. https://arxiv.org/pdf/2112.09332
Toolformer: Language Models Can Teach Themselves to Use Tools. https://arxiv.org/abs/2302.04761
DataComp-LM: In Search of the Next Generation of Training Sets for Language Models. https://arxiv.org/pdf/2406.11794
GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers. https://arxiv.org/pdf/2402.19255
Transformers Can Do Arithmetic with the Right Embeddings. https://arxiv.org/pdf/2405.17399
Common 7B Language Models Already Possess Strong Math Capabilities. https://arxiv.org/pdf/2403.04706
PAL: Program-Aided Language Models. https://proceedings.mlr.press/v202/gao23f/gao23f.pdf
Evaluating Large Language Models Trained on Code. https://arxiv.org/abs/2107.03374
Unsupervised Evaluation of Code LLMs with Round-Trip Correctness. https://arxiv.org/abs/2402.08699
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models (Cohere). https://arxiv.org/pdf/2404.18796
Training Verifiers to Solve Math Word Problems. https://arxiv.org/abs/2110.14168

Deep Learning Study Group (San Francisco)