DeepSeek-AI Releases New 3-Billion-Parameter OCR Model
The newly released vision-language model, DeepSeek-OCR, specializes in efficiently processing documents through optical compression.
A Specialized Vision-Language Model
Chinese research lab DeepSeek-AI released DeepSeek-OCR, a new 3-billion-parameter vision-language model designed for "contexts optical compression." The model specializes in efficiently processing images of documents, such as PDFs and scans, by compressing their visual details into a small number of "vision tokens."
This approach allows a language model to extract text, layouts, and formulas with high accuracy while using fewer computational resources than comparable systems. According to the company's technical report, the system can achieve 97% decoding precision at a 10x compression ratio.
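These figures can be read as a simple ratio of token counts. The short sketch below is illustrative only; the token counts are hypothetical examples, not taken from DeepSeek-AI's report. It shows how a 10x compression ratio relates the text tokens a page would normally require to the smaller number of vision tokens that stand in for them.

```python
# Minimal sketch (not DeepSeek-AI's code): how the reported compression
# ratio relates a page's text tokens to the vision tokens replacing them.

def compression_ratio(num_text_tokens: int, num_vision_tokens: int) -> float:
    """Ratio of original text tokens to the vision tokens that encode them."""
    return num_text_tokens / num_vision_tokens

# Hypothetical numbers: a page whose contents would normally occupy about
# 1,000 text tokens is compressed into about 100 vision tokens, giving the
# roughly 10x ratio at which the report cites ~97% decoding precision.
text_tokens = 1000
vision_tokens = 100
print(f"Compression ratio: {compression_ratio(text_tokens, vision_tokens):.0f}x")  # -> 10x
```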
Technical Components
The DeepSeek-OCR model consists of two primary components that …