Update model card: Correct `library_name`, add paper/code/project links, and sync with GitHub README

#2
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +29 -24
README.md CHANGED
@@ -1,6 +1,10 @@
1
  ---
 
 
 
 
 
2
  license: mit
3
- library_name: dots_ocr
4
  pipeline_tag: image-text-to-text
5
  tags:
6
  - image-to-text
@@ -11,10 +15,6 @@ tags:
11
  - formula
12
  - transformers
13
  - custom_code
14
- language:
15
- - en
16
- - zh
17
- - multilingual
18
  ---
19
 
20
  <div align="center">
@@ -27,14 +27,17 @@ language:
27
  dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model
28
  </h1>
29
 
30
- [![Blog](https://img.shields.io/badge/Blog-View_on_GitHub-333.svg?logo=github)](https://github.com/rednote-hilab/dots.ocr/blob/master/assets/blog.md)
 
 
31
  [![HuggingFace](https://img.shields.io/badge/HuggingFace%20Weights-black.svg?logo=HuggingFace)](https://huggingface.co/rednote-hilab/dots.ocr)
 
32
 
33
 
34
  <div align="center">
35
- <a href="https://dotsocr.xiaohongshu.com" target="_blank" rel="noopener noreferrer"><strong>🖥️ Live Demo</strong></a> |
36
  <a href="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/wechat.jpg" target="_blank" rel="noopener noreferrer"><strong>💬 WeChat</strong></a> |
37
- <a href="https://www.xiaohongshu.com/user/profile/683ffe42000000001d021a4c" target="_blank" rel="noopener noreferrer"><strong>📕 rednote</strong></a>
 
38
  </div>
39
 
40
  </div>
@@ -138,6 +141,7 @@ print(output_text)
138
 
139
 
140
  ## News
 
141
  * ```2025.07.30 ``` 🚀 We release [dots.ocr](https://github.com/rednote-hilab/dots.ocr) — a multilingual document parsing model built on a 1.7B LLM, with SOTA performance.
142
 
143
 
@@ -433,7 +437,6 @@ print(output_text)
433
  <td>0.100</td>
434
  <td>0.185</td>
435
  </tr>
436
- <tr>
437
 
438
  <td rowspan="5"><strong>General<br>VLMs</strong></td>
439
  <td>GPT4o</td>
@@ -1113,28 +1116,23 @@ pip install -e .
1113
  > 💡**Note:** Please use a directory name without periods (e.g., `DotsOCR` instead of `dots.ocr`) for the model save path. This is a temporary workaround pending our integration with Transformers.
1114
  ```shell
1115
  python3 tools/download_model.py
 
 
 
1116
  ```
1117
 
1118
 
1119
  ## 2. Deployment
1120
  ### vLLM inference
1121
- We highly recommend using vllm for deployment and inference. All of our evaluations results are based on vllm version 0.9.1.
1122
- The [Docker Image](https://hub.docker.com/r/rednotehilab/dots.ocr) is based on the official vllm image. You can also follow [Dockerfile](https://github.com/rednote-hilab/dots.ocr/blob/master/docker/Dockerfile) to build the deployment environment by yourself.
1123
 
1124
  ```shell
1125
- # You need to register model to vllm at first
1126
- python3 tools/download_model.py
1127
- export hf_model_path=./weights/DotsOCR # Path to your downloaded model weights, Please use a directory name without periods (e.g., `DotsOCR` instead of `dots.ocr`) for the model save path. This is a temporary workaround pending our integration with Transformers.
1128
- export PYTHONPATH=$(dirname "$hf_model_path"):$PYTHONPATH
1129
- sed -i '/^from vllm\.entrypoints\.cli\.main import main$/a\
1130
- from DotsOCR import modeling_dots_ocr_vllm' `which vllm` # If you downloaded model weights by yourself, please replace `DotsOCR` by your model saved directory name, and remember to use a directory name without periods (e.g., `DotsOCR` instead of `dots.ocr`)
1131
-
1132
- # launch vllm server
1133
- CUDA_VISIBLE_DEVICES=0 vllm serve ${hf_model_path} --tensor-parallel-size 1 --gpu-memory-utilization 0.95 --chat-template-content-format string --served-model-name model --trust-remote-code
1134
 
1135
- # If you get a ModuleNotFoundError: No module named 'DotsOCR', please check the note above on the saved model directory name.
1136
-
1137
- # vllm api demo
1138
  python3 ./demo/demo_vllm.py --prompt_mode prompt_layout_all_en
1139
  ```
1140
 
@@ -1226,6 +1224,10 @@ print(output_text)
1226
 
1227
  </details>
1228
 
 
 
 
 
1229
  ## 3. Document Parse
1230
  **Based on the vLLM server**, you can parse an image or a PDF file using the following commands:
1231
  ```bash
@@ -1234,7 +1236,7 @@ print(output_text)
1234
  # Parse a single image
1235
  python3 dots_ocr/parser.py demo/demo_image1.jpg
1236
  # Parse a single PDF
1237
- python3 dots_ocr/parser.py demo/demo_pdf1.pdf --num_threads 64 # try bigger num_threads for pdf with a large number of pages
1238
 
1239
  # Layout detection only
1240
  python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_layout_only_en
@@ -1246,6 +1248,9 @@ python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_ocr
1246
  python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_grounding_ocr --bbox 163 241 1536 705
1247
 
1248
  ```
 
 
 
1249
 
1250
  <details>
1251
  <summary><b>Output Results</b></summary>
 
1
  ---
2
+ language:
3
+ - en
4
+ - zh
5
+ - multilingual
6
+ library_name: transformers
7
  license: mit
 
8
  pipeline_tag: image-text-to-text
9
  tags:
10
  - image-to-text
 
15
  - formula
16
  - transformers
17
  - custom_code
 
 
 
 
18
  ---
19
 
20
  <div align="center">
 
27
  dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model
28
  </h1>
29
 
30
+ [![Paper](https://img.shields.io/badge/Paper-2512.02498-b31b1b.svg)](https://huggingface.co/papers/2512.02498)
31
+ [![Code](https://img.shields.io/badge/GitHub-Code-keygen.svg?logo=github)](https://github.com/rednote-hilab/dots.ocr)
32
+ [![Project Page](https://img.shields.io/badge/Project_Page-Live_Demo-blue)](https://dotsocr.xiaohongshu.com)
33
  [![HuggingFace](https://img.shields.io/badge/HuggingFace%20Weights-black.svg?logo=HuggingFace)](https://huggingface.co/rednote-hilab/dots.ocr)
34
+ [![Blog](https://img.shields.io/badge/Blog-View_on_GitHub-333.svg?logo=github)](https://github.com/rednote-hilab/dots.ocr/blob/master/assets/blog.md)
35
 
36
 
37
  <div align="center">
 
38
  <a href="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/wechat.jpg" target="_blank" rel="noopener noreferrer"><strong>💬 WeChat</strong></a> |
39
+ <a href="https://www.xiaohongshu.com/user/profile/683ffe42000000001d021a4c" target="_blank" rel="noopener noreferrer"><strong>📕 rednote</strong></a> |
40
+ <a href="https://x.com/rednotehilab" target="_blank" rel="noopener noreferrer"><strong>🐦 X</strong></a>
41
  </div>
42
 
43
  </div>
 
141
 
142
 
143
  ## News
144
+ * ```2025.10.31 ``` 🚀 We release [dots.ocr.base](https://huggingface.co/rednote-hilab/dots.ocr.base), a foundation VLM focused on OCR tasks and the base model of [dots.ocr](https://github.com/rednote-hilab/dots.ocr). Try it out!
145
  * ```2025.07.30 ``` 🚀 We release [dots.ocr](https://github.com/rednote-hilab/dots.ocr) — a multilingual document parsing model built on a 1.7B LLM, with SOTA performance.
146
 
147
 
 
437
  <td>0.100</td>
438
  <td>0.185</td>
439
  </tr>
 
440
 
441
  <td rowspan="5"><strong>General<br>VLMs</strong></td>
442
  <td>GPT4o</td>
 
1116
  > 💡**Note:** Please use a directory name without periods (e.g., `DotsOCR` instead of `dots.ocr`) for the model save path. This is a temporary workaround pending our integration with Transformers.
1117
  ```shell
1118
  python3 tools/download_model.py
1119
+
1120
+ # Or download via ModelScope
1121
+ python3 tools/download_model.py --type modelscope
1122
  ```
1123
 
1124
 
1125
  ## 2. Deployment
1126
  ### vLLM inference
1127
+ We highly recommend using vLLM for deployment and inference. All of our evaluation results are based on vLLM 0.9.1 via out-of-tree model registration. **Since vLLM v0.11.0, dots.ocr has been officially integrated into vLLM with verified performance**, so you can use the official vLLM Docker image directly (e.g., `vllm/vllm-openai:v0.11.0`) to deploy the model server.
 
1128
 
1129
  ```shell
1130
+ # Launch vLLM model server
1131
+ vllm serve rednote-hilab/dots.ocr --trust-remote-code --async-scheduling --gpu-memory-utilization 0.95
 
 
 
 
 
 
 
1132
 
1133
+ # vLLM API Demo
1134
+ # See dots_ocr/model/inference.py for details on parameter and prompt settings
1135
+ # that help achieve the best output quality.
1136
  python3 ./demo/demo_vllm.py --prompt_mode prompt_layout_all_en
1137
  ```
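
If you want to call the server directly instead of going through `demo/demo_vllm.py`, below is a minimal sketch using the OpenAI-compatible API that vLLM exposes. It assumes the default port 8000 and that the served model name defaults to the repo id; the prompt string here is only a placeholder, and the prompts shipped in `dots_ocr` (e.g. `prompt_layout_all_en`) should be used for best results.

```python
import base64
from openai import OpenAI

# vLLM exposes an OpenAI-compatible endpoint; no real API key is needed.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("demo/demo_image1.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="rednote-hilab/dots.ocr",  # must match the served model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            # Placeholder prompt; use the prompts from dots_ocr for real parsing.
            {"type": "text", "text": "Parse the layout and text of this document."},
        ],
    }],
    max_tokens=8192,
    temperature=0.1,
)
print(response.choices[0].message.content)
```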
1138
 
 
1224
 
1225
  </details>
1226
 
1227
+ ### Hugging Face inference with CPU
1228
+ Please refer to [CPU inference](https://github.com/rednote-hilab/dots.ocr/issues/1#issuecomment-3148962536).
1229
+
1230
+
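
For reference, here is a minimal sketch of what CPU loading typically looks like, assuming the same Transformers pipeline shown earlier in this card (full precision, no FlashAttention, everything kept on the CPU); the linked issue remains the authoritative recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_path = "rednote-hilab/dots.ocr"

# CPU loading: float32 weights, no FlashAttention (GPU-only), device_map pinned to CPU.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float32,
    device_map="cpu",
    attn_implementation="sdpa",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

# From here the flow matches the GPU example above, except inputs stay on the CPU
# (inputs = inputs.to("cpu")); generation is noticeably slower than vLLM or GPU inference.
```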
1231
  ## 3. Document Parse
1232
  **Based on the vLLM server**, you can parse an image or a PDF file using the following commands:
1233
  ```bash
 
1236
  # Parse a single image
1237
  python3 dots_ocr/parser.py demo/demo_image1.jpg
1238
  # Parse a single PDF
1239
+ python3 dots_ocr/parser.py demo/demo_pdf1.pdf --num_thread 64  # try a bigger --num_thread for PDFs with many pages
1240
 
1241
  # Layout detection only
1242
  python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_layout_only_en
 
1248
  python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_grounding_ocr --bbox 163 241 1536 705
1249
 
1250
  ```
1251
+ **Based on Transformers**, you can parse an image or a PDF file with the same commands as above; just add `--use_hf true`.
1252
+
1253
+ > Note: Transformers inference is slower than vLLM. If you want to use `demo/*` with Transformers, just pass `use_hf=True` to the parser, i.e. `DotsOCRParser(..., use_hf=True)`.
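
If you drive the parser from Python rather than the CLI, the same switch applies; a hedged sketch follows. The constructor flag comes from the note above, while the module path and the parse call are illustrative — check `dots_ocr/parser.py` for the actual entry point and arguments.

```python
# Illustrative only: the exact import path and parse method may differ; see dots_ocr/parser.py.
from dots_ocr.parser import DotsOCRParser

parser = DotsOCRParser(use_hf=True)  # run Transformers locally instead of a vLLM server

# Hypothetical call mirroring how demo/* drives the parser.
results = parser.parse_file("demo/demo_image1.jpg", prompt_mode="prompt_layout_all_en")
print(results)
```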
1254
 
1255
  <details>
1256
  <summary><b>Output Results</b></summary>