Generative AI Paper Reading: Log-Linear Attention


Details
Join us for a paper discussion on "Log-Linear Attention," presented by Evelyn.
We will explore a new attention mechanism that balances efficiency and expressiveness for long-sequence modeling.
Featured Paper:
"Log-Linear Attention" (Guo et al., 2024)
arXiv Paper
Discussion Topics:
Motivation & Background
- Standard softmax attention in Transformers requires quadratic compute and a linearly growing KV cache, which limits scalability for long sequences
- Linear attention and state-space models run in linear time with constant memory, but compress all past context into a fixed-size hidden state (RNN-like), limiting context modeling
- Need for an approach that is both efficient and expressive, especially for long-context tasks
Log-Linear Attention Mechanism
- Maintains a set of hidden states that grows logarithmically with sequence length (vs. a single fixed-size state in linear attention)
- Uses Fenwick tree–based (hierarchical) partitioning to summarize past context at multiple temporal scales (see the toy decoding sketch after this list)
- Enables O(T log T) training compute and O(log T) per-token compute and memory during decoding; supports parallel, matmul-rich training
- Generalizes existing linear attention models and can be applied to architectures like Mamba-2 and Gated DeltaNet
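As a rough illustration of the multi-scale state idea, the sketch below decodes a single query against Fenwick-tree segments of the prefix. It is a toy NumPy version only: the function names and the scalar per-level weights `lambdas` are simplified stand-ins (the paper uses learned, data-dependent gating), and segment summaries are recomputed from scratch here rather than maintained incrementally.

```python
import numpy as np

def fenwick_segments(t):
    """Decompose the prefix [0, t) into Fenwick-tree segments.

    Segment lengths are the powers of two in the binary representation
    of t, e.g. t = 13 -> [(0, 8), (8, 12), (12, 13)].
    """
    segments, end = [], t
    while end > 0:
        length = end & (-end)               # lowest set bit
        segments.append((end - length, end))
        end -= length
    return list(reversed(segments))

def log_linear_decode_step(q, K, V, lambdas):
    """Toy decode step for one query at position t = len(K).

    q: (d,); K, V: (t, d) past keys/values; lambdas: per-segment scalar
    weights (a simplified stand-in for the paper's gating). Each segment
    is summarized by a linear-attention state S_seg = K_seg^T V_seg, so
    the query combines only O(log t) fixed-size states.
    """
    d = K.shape[1]
    out = np.zeros(d)
    for level, (s, e) in enumerate(fenwick_segments(len(K))):
        S = K[s:e].T @ V[s:e]               # (d, d) summary of one segment
        out += lambdas[level] * (q @ S)
    return out

# Example: position 13 touches 3 segments of lengths 8, 4, 1
rng = np.random.default_rng(0)
K, V = rng.normal(size=(13, 16)), rng.normal(size=(13, 16))
q = rng.normal(size=16)
o = log_linear_decode_step(q, K, V, lambdas=np.ones(4))
print(o.shape, len(fenwick_segments(13)))   # (16,) 3
```

Because the prefix [0, t) splits into at most about log2(t) segments (one per set bit of t), each decoding step combines only a logarithmic number of fixed-size states, which is where the O(log T) decoding compute and memory come from.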
Implementation & Algorithm
- Chunkwise parallel scan algorithm for efficient training (a toy sketch of the underlying chunkwise pattern follows this list)
- Hierarchical masking matrix structure (quasi-H matrix) enables low-rank, blockwise computation
- Custom Triton kernel implementation outperforms FlashAttention-2 for long sequences
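For context, here is a minimal sketch of plain chunkwise linear attention, the quadratic-within-chunk / recurrent-across-chunk pattern that chunkwise-parallel kernels build on. This is not log-linear attention itself (it keeps a single running state rather than a hierarchy of Fenwick-level states), and the function name and default chunk size are illustrative assumptions.

```python
import numpy as np

def chunkwise_linear_attention(Q, K, V, chunk=64):
    """Unnormalized causal linear attention computed chunk by chunk.

    Inter-chunk contributions come from a running state S = sum(k v^T)
    over all previous chunks; intra-chunk contributions are a small,
    causally masked matmul. Shapes: Q, K, V are (T, d).
    """
    T, d = Q.shape
    S = np.zeros((d, d))                              # cross-chunk state
    out = np.zeros_like(V)
    for s in range(0, T, chunk):
        e = min(s + chunk, T)
        q, k, v = Q[s:e], K[s:e], V[s:e]
        causal = np.tril(np.ones((e - s, e - s)))
        out[s:e] = q @ S + (causal * (q @ k.T)) @ v   # inter- + intra-chunk terms
        S = S + k.T @ v                               # fold this chunk into the state
    return out
```

Log-linear attention replaces the single running state S with a small set of Fenwick-level states, so the inter-chunk term becomes a weighted combination of O(log T) summaries while the intra-chunk block remains a dense, matmul-friendly computation.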
Performance Benchmarks
| Model/Variant | Training Compute | Per-Token Decoding Compute | Decoding Memory (State) |
| ------------- | ---------------- | -------------------------- | ----------------------- |
| FlashAttention-2 (softmax) | O(T²) | O(T) | O(T) |
| Mamba-2 | O(T) | O(1) | O(1) |
| Log-Linear Mamba-2 | O(T log T) | O(log T) | O(log T) |
| Gated DeltaNet | O(T) | O(1) | O(1) |
| Log-Linear Gated DeltaNet | O(T log T) | O(log T) | O(log T) |
- Log-linear variants consistently outperform linear counterparts on synthetic recall tasks, language modeling (perplexity), and long-context retrieval
- Improved per-position loss and recall on "Needle-In-A-Haystack" and real-world benchmarks at long sequence lengths
Implementation Challenges
- Efficient hierarchical memory management for chunked computation
- Balancing expressiveness (multi-scale context) with computational cost
- Integrating log-linear attention into diverse model architectures
Key Technical Features
- Logarithmic growth of hidden states with sequence length (a quick numeric check follows this list)
- Matmul-friendly parallelization for hardware efficiency
- Less than 3% parameter increase over baseline models
- Compatible with modern accelerators (GPU/TPU) and existing linear attention frameworks
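As a back-of-the-envelope check of the logarithmic-growth claim, the snippet below counts how many Fenwick-style segments (and hence multi-scale states) a decode step near position T would touch; the exact state count in the paper's models may differ, so treat the numbers as illustrative only.

```python
# The number of Fenwick segments covering a prefix of length t equals the
# popcount of t, which is at most ~log2(t). Compare against the t cached
# key/value pairs softmax attention would need at the same position.
for T in (1_024, 65_536, 1_048_576):
    n_states = bin(T - 1).count("1")
    print(f"T={T:>9,}: {n_states:2d} multi-scale states vs. {T - 1:,} KV pairs")
```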
Future Directions
- Applying log-linear attention to other state-space and convolutional models
- Further optimizing hierarchical memory structures for even longer contexts
- Exploring applications in domains requiring efficient long-sequence modeling (e.g., genomics, document understanding)
---
Silicon Valley Generative AI has two meeting formats.
1. Paper Reading - Every second week we meet to discuss machine learning papers. This is a collaboration between Silicon Valley Generative AI and Boulder Data Science.
2. Talks - Once a month we meet to have someone present on a topic related to generative AI. Speakers range from industry leaders, researchers, startup founders, and subject matter experts to anyone with an interest in a topic who would like to share. Topics vary from technical to business-focused: how the latest generative models work and how they can be used, applications and adoption of generative AI, demos of projects, startup pitches, and legal or ethical topics. The talks are meant to be inclusive and aimed at a more general audience than the paper readings.
If you would like to be a speaker, please contact:
Matt White

Every 2 weeks on Monday