DeepSeek-AI Releases New 3-Billion-Parameter OCR Model

DeepSeek-OCR, a new vision-language model, specializes in processing documents efficiently through optical compression.

Olivia Sharp

A Specialized Vision-Language Model

Chinese research lab DeepSeek-AI released DeepSeek-OCR, a new 3-billion-parameter vision-language model built around what the company calls "contexts optical compression." The model specializes in efficiently processing images of documents, such as PDFs and scans, by compressing their visual details into a small number of "vision tokens."

This approach lets a language model extract text, layouts, and formulas with high accuracy while using fewer computational resources than comparable systems. According to the company's technical report, the system achieves about 97% decoding precision when the compression ratio stays below 10x, that is, when a page holds fewer than ten times as many text tokens as the vision tokens used to encode it.
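The compression ratio compares the number of text tokens on a page with the number of vision tokens used to represent that page. The following is a minimal sketch of that arithmetic, assuming the ratio is a plain quotient of token counts. The function names `compression_ratio` and `vision_tokens_needed` and the page sizes are hypothetical illustrations and are not taken from DeepSeek-OCR's code.

```python
import math


def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Text tokens on a page divided by the vision tokens that encode it (hypothetical helper)."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens


def vision_tokens_needed(text_tokens: int, max_ratio: float) -> int:
    """Smallest vision-token budget that keeps the ratio at or below max_ratio (hypothetical helper)."""
    if max_ratio <= 0:
        raise ValueError("max_ratio must be positive")
    return math.ceil(text_tokens / max_ratio)


# Hypothetical page: 1,000 text tokens encoded as 100 vision tokens.
print(f"{compression_ratio(1000, 100):.1f}x")  # prints "10.0x"

# Hypothetical page with 2,500 text tokens, capped at a 10x ratio.
print(vision_tokens_needed(2500, 10))  # prints 250
```

In this framing, a higher ratio means fewer vision tokens per page and therefore less compute. The figures reported above indicate that accuracy holds up well only while the ratio stays under roughly 10x.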

Technical Components

The DeepSeek-OCR model consists of two primary components that …
