Update model card: Correct `library_name`, add paper/code/project links, and sync with GitHub README
#2 · by nielsr (HF Staff) · opened
README.md CHANGED

````diff
@@ -1,6 +1,10 @@
 ---
+language:
+- en
+- zh
+- multilingual
+library_name: transformers
 license: mit
-library_name: dots_ocr
 pipeline_tag: image-text-to-text
 tags:
 - image-to-text
@@ -11,10 +15,6 @@ tags:
 - formula
 - transformers
 - custom_code
-language:
-- en
-- zh
-- multilingual
 ---
 
 <div align="center">
@@ -27,14 +27,17 @@ language:
 dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model
 </h1>
 
-[](https://huggingface.co/rednote-hilab/dots.ocr)
-
+[](https://huggingface.co/papers/2512.02498)
+[](https://github.com/rednote-hilab/dots.ocr)
+[](https://dotsocr.xiaohongshu.com)
+[](https://huggingface.co/rednote-hilab/dots.ocr)
+[](https://github.com/rednote-hilab/dots.ocr/blob/master/assets/blog.md)
 
 
 <div align="center">
-<a href="https://dotsocr.xiaohongshu.com" target="_blank" rel="noopener noreferrer"><strong>🖥️ Live Demo</strong></a> |
 <a href="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/wechat.jpg" target="_blank" rel="noopener noreferrer"><strong>💬 WeChat</strong></a> |
-<a href="https://www.xiaohongshu.com/user/profile/683ffe42000000001d021a4c" target="_blank" rel="noopener noreferrer"><strong>📕 rednote</strong></a>
+<a href="https://www.xiaohongshu.com/user/profile/683ffe42000000001d021a4c" target="_blank" rel="noopener noreferrer"><strong>📕 rednote</strong></a> |
+<a href="https://x.com/rednotehilab" target="_blank" rel="noopener noreferrer"><strong>🐦 X</strong></a>
 </div>
 
 </div>
````
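A side note on the `library_name: transformers` fix: together with the existing `custom_code` tag, it tells the Hub that this checkpoint loads through Transformers using remote code. Below is a minimal sketch of what that implies, assuming the repo's custom code registers with the standard auto classes; the dtype and device settings are illustrative assumptions, not the card's official snippet.

```python
# Minimal sketch, assuming the repo's custom code (hence the `custom_code` tag)
# registers with the standard Transformers auto classes.
# trust_remote_code=True is required to execute that custom modeling code.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "rednote-hilab/dots.ocr"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference on a capable GPU
    device_map="auto",
    trust_remote_code=True,      # runs the custom dots.ocr modeling code
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
```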
````diff
@@ -138,6 +141,7 @@ print(output_text)
 
 
 ## News
+* ```2025.10.31 ``` 🚀 We release [dots.ocr.base](https://huggingface.co/rednote-hilab/dots.ocr.base), a foundation VLM focused on OCR tasks and the base model of [dots.ocr](https://github.com/rednote-hilab/dots.ocr). Try it out!
 * ```2025.07.30 ``` 🚀 We release [dots.ocr](https://github.com/rednote-hilab/dots.ocr), a multilingual document parsing model based on a 1.7B LLM, with SOTA performance.
 
 
````
````diff
@@ -433,7 +437,6 @@ print(output_text)
 <td>0.100</td>
 <td>0.185</td>
 </tr>
-<tr>
 
 <td rowspan="5"><strong>General<br>VLMs</strong></td>
 <td>GPT4o</td>
````
````diff
@@ -1113,28 +1116,23 @@ pip install -e .
 > 💡**Note:** Please use a directory name without periods (e.g., `DotsOCR` instead of `dots.ocr`) for the model save path. This is a temporary workaround pending our integration with Transformers.
 ```shell
 python3 tools/download_model.py
+
+# with modelscope
+python3 tools/download_model.py --type modelscope
 ```
 
 
 ## 2. Deployment
 ### vLLM inference
-We highly recommend using
-The [Docker Image](https://hub.docker.com/r/rednotehilab/dots.ocr) is based on the official vllm image. You can also follow [Dockerfile](https://github.com/rednote-hilab/dots.ocr/blob/master/docker/Dockerfile) to build the deployment environment by yourself.
+We highly recommend using vLLM for deployment and inference. All of our evaluation results are based on vLLM 0.9.1 via out-of-tree model registration. **Since vLLM version 0.11.0, dots.ocr has been officially integrated into vLLM with verified performance**, so you can use the vLLM Docker image directly (e.g., `vllm/vllm-openai:v0.11.0`) to deploy the model server.
 
 ```shell
-#
-
-export hf_model_path=./weights/DotsOCR # Path to your downloaded model weights, Please use a directory name without periods (e.g., `DotsOCR` instead of `dots.ocr`) for the model save path. This is a temporary workaround pending our integration with Transformers.
-export PYTHONPATH=$(dirname "$hf_model_path"):$PYTHONPATH
-sed -i '/^from vllm\.entrypoints\.cli\.main import main$/a\
-from DotsOCR import modeling_dots_ocr_vllm' `which vllm` # If you downloaded model weights by yourself, please replace `DotsOCR` by your model saved directory name, and remember to use a directory name without periods (e.g., `DotsOCR` instead of `dots.ocr`)
-
-# launch vllm server
-CUDA_VISIBLE_DEVICES=0 vllm serve ${hf_model_path} --tensor-parallel-size 1 --gpu-memory-utilization 0.95 --chat-template-content-format string --served-model-name model --trust-remote-code
+# Launch vLLM model server
+vllm serve rednote-hilab/dots.ocr --trust-remote-code --async-scheduling --gpu-memory-utilization 0.95
 
-#
-
-#
+# vLLM API Demo
+# See dots_ocr/model/inference.py for details on parameter and prompt settings
+# that help achieve the best output quality.
 python3 ./demo/demo_vllm.py --prompt_mode prompt_layout_all_en
 ```
 
````
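For readers who don't open demo/demo_vllm.py: once `vllm serve` is up, the model is reachable over vLLM's OpenAI-compatible API. A hedged sketch follows, assuming the default port 8000 and a model name matching the one passed to `vllm serve`; the prompt string here is illustrative, not one of the repo's prompt modes.

```python
# Hedged sketch: querying the vLLM server launched above via its
# OpenAI-compatible API. Port, model name, and prompt text are assumptions;
# demo/demo_vllm.py and dots_ocr/model/inference.py are the authoritative
# client and prompt settings.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Encode a local document image as a base64 data URI for the chat API.
with open("demo/demo_image1.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="rednote-hilab/dots.ocr",  # name passed to `vllm serve`
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            {"type": "text", "text": "Parse the layout and text of this document."},
        ],
    }],
)
print(resp.choices[0].message.content)
```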
````diff
@@ -1226,6 +1224,10 @@ print(output_text)
 
 </details>
 
+### Hugging Face inference with CPU
+Please refer to [CPU inference](https://github.com/rednote-hilab/dots.ocr/issues/1#issuecomment-3148962536).
+
+
 ## 3. Document Parse
 **Based on vLLM server**, you can parse an image or a PDF file using the following commands:
 ```bash
````
````diff
@@ -1234,7 +1236,7 @@
 # Parse a single image
 python3 dots_ocr/parser.py demo/demo_image1.jpg
 # Parse a single PDF
-python3 dots_ocr/parser.py demo/demo_pdf1.pdf --
+python3 dots_ocr/parser.py demo/demo_pdf1.pdf --num_thread 64  # try a bigger num_thread for a PDF with a large number of pages
 
 # Layout detection only
 python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_layout_only_en
````
````diff
@@ -1246,6 +1248,9 @@ python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_ocr
 python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_grounding_ocr --bbox 163 241 1536 705
 
 ```
+**Based on Transformers**, you can parse an image or a PDF file using the same commands as above; just add `--use_hf true`.
+
+> Notice: Transformers is slower than vLLM. If you want to use demo/* with Transformers, just add `use_hf=True` in `DotsOCRParser(.., use_hf=True)`.
 
 <details>
 <summary><b>Output Results</b></summary>
````
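On the `--use_hf true` addition: the programmatic equivalent suggested by the `DotsOCRParser(.., use_hf=True)` hint might look like the sketch below. Only the `use_hf` flag is confirmed by the card; the import path and the `parse_file` method name are assumptions to verify against dots_ocr/parser.py.

```python
# Hedged sketch of the Transformers-backend parse described above.
# Only use_hf=True is confirmed by the card; the import path and the
# parse_file method name are assumptions (see dots_ocr/parser.py).
from dots_ocr.parser import DotsOCRParser  # assumed import path

parser = DotsOCRParser(use_hf=True)  # Transformers backend; slower than vLLM
results = parser.parse_file(
    "demo/demo_image1.jpg",
    prompt_mode="prompt_layout_all_en",  # same prompt modes as the CLI
)
print(results)
```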