Unlimited OCR: One-shot long-horizon parsing(github.com)
475 points by ingve 1 day ago | 108 comments
tl;dr: Baidu released Unlimited-OCR, a document parsing model that extends DeepSeek-OCR for one-shot long-horizon parsing of single images, multi-page documents, and PDFs up to 32k tokens. It supports two image configurations (gundam and base) and ships with both Hugging Face Transformers and SGLang inference paths, including a batch script with concurrent requests against an OpenAI-compatible API. The model is available on Hugging Face and ModelScope, with an accompanying arXiv paper.
HN Discussion:
  • Technical appreciation for the architectural approach to solving KV cache memory issues in long document OCR
  • Enthusiasm for multi-page single-pass VLM OCR with interest in the attention mechanism design
  • Skepticism about OCR hallucinations and whether this model avoids inventing artifacts
  • Questioning how this compares to other OCR tools like Infinity Parser 2 or Mistral's offering
  • Curiosity about why companies open-source genuinely valuable software like this