DeepSeek is experimenting with an OCR model, demonstrating that compressed images are easier on GPU memory than large numbers of text tokens.
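To make the memory argument concrete, here is a rough, purely illustrative back-of-the-envelope sketch in Python. The patch size, token-compression factor, and characters-per-token figures are assumed values for illustration, not DeepSeek's published numbers: a page rendered as a single image can enter the model as a few hundred vision tokens, while the same page as raw text would occupy considerably more text tokens, and every token adds to attention and KV-cache cost on the GPU.

```python
# Illustrative arithmetic only; patch size, compression factor, and
# characters-per-token are assumed values, not DeepSeek's published figures.

def vision_token_count(width_px: int, height_px: int,
                       patch: int = 16, compression: int = 16) -> int:
    """Tokens after a ViT-style patch grid plus an assumed 16x token compressor."""
    patches = (width_px // patch) * (height_px // patch)
    return patches // compression

def text_token_count(num_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough text-token estimate at roughly four characters per token."""
    return int(num_chars / chars_per_token)

# A dense A4 page: roughly 3,000 characters, rendered at 1024x1024 pixels.
print("as text tokens:  ", text_token_count(3000))          # ~750
print("as vision tokens:", vision_token_count(1024, 1024))  # 256
```

Under these assumptions the image route needs only about a third as many tokens per page, which is the kind of memory saving the model is meant to demonstrate.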
Many company documents exist as PDFs and are often scanned, making it challenging to convert them to text while preserving their complex structure. Even though the task sounds simple, such documents can frequently be turned into text only with considerable effort; images, tables, and graphics are common sources of errors, which has prompted a surge in OCR software built on large language models (LLMs).
Chinese AI developer DeepSeek, best known for its reasoning model R1, is now releasing an experimental OCR model under the MIT license.
DeepSeek's entry into the OCR field may come as a surprise, since optical character recognition has not been the company's core competence until now.