[Paper Reading]: DeepSeek-OCR: Contexts Optical Compression

Name: [Paper Reading]: DeepSeek-OCR: Contexts Optical Compression
Start: 2025-11-05T19:00:00-08:00
End: 2025-11-05T21:00:00-08:00

Hosted by Samantha D.

Meet the group

SupportVectors: Generative AI, LLMs, Machine Learning

Details

This week, we will walk through and discuss the paper: DeepSeek-OCR: Contexts Optical Compression [https://www.arxiv.org/pdf/2510.18234]

Abstract of the Paper:
We present DeepSeek-OCR as an initial investigation into the feasibility of compressing long contexts via optical 2D mapping. DeepSeek-OCR consists of two components: DeepEncoder and DeepSeek3B-MoE-A570M as the decoder. Specifically, DeepEncoder serves as the core engine, designed to maintain low activations under high-resolution input while achieving high compression ratios to ensure an optimal and manageable number of vision tokens. Experiments show that when the number of text tokens is within 10 times that of vision tokens (i.e., a compression ratio < 10x), the model can achieve decoding (OCR) precision of 97%. Even at a compression ratio of 20x, the OCR accuracy still remains at about 60%. This shows considerable promise for research areas such as historical long-context compression and memory forgetting mechanisms in LLMs. Beyond this, DeepSeek-OCR also demonstrates high practical value. On OmniDocBench, it surpasses GOT-OCR2.0 (256 tokens/page) using only 100 vision tokens, and outperforms MinerU2.0 (6000+ tokens per page on average) while utilizing fewer than 800 vision tokens. In production, DeepSeek-OCR can generate training data for LLMs/VLMs at a scale of 200k+ pages per day (a single A100-40G). Codes and model weights are publicly accessible at https://github.com/deepseek-ai/DeepSeek-OCR
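For anyone who wants a concrete handle on the "compression ratio" mentioned in the abstract before the session, here is a minimal sketch. It assumes the ratio is simply the number of text tokens divided by the number of vision tokens, as the abstract's parenthetical suggests. The function name and the example token counts are hypothetical and are not taken from the paper or its code.

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Return how many text tokens each vision token stands in for.

    Assumption: ratio = text tokens / vision tokens, read off the
    abstract's "(i.e., a compression ratio < 10x)" remark.
    """
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens


# Hypothetical page: 900 text tokens encoded into 100 vision tokens.
ratio = compression_ratio(900, 100)
print(f"{ratio:.1f}x")  # under the 10x regime the abstract describes
```

By this reading, a page whose text needs fewer than ten times as many tokens as its image encoding falls in the regime where the abstract reports about 97% decoding precision.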

-----------------
We are a group of applied AI practitioners and enthusiasts who have formed a collective learning community. Every Wednesday evening at 7 PM PST, we hold a research paper reading seminar on an AI topic. One member carefully explains the paper, making it accessible to a broader audience. We then follow the reading with an informal discussion and socializing.
You are welcome to join in person or over Zoom. SupportVectors is an AI training lab located in Fremont, CA, close to Tesla and easily accessible by road and BART. The weekly sessions are followed by snacks, soft drinks, and informal discussions.
If you want to attend by Zoom, the Zoom registration link will be visible once you RSVP. Note that we have had to change the Zoom link and add security to it to prevent Zoom bombing.
