nielsr (HF Staff) committed
Commit e6a6435 · verified · 1 parent: ca7fe48

Update model card: Correct `library_name`, add paper/code/project links, and sync with GitHub README


This PR significantly improves the model card for `rednote-hilab/dots.ocr` by:

* **Updating `library_name`**: Changed the `library_name` in the metadata from `dots_ocr` to `transformers`. This is crucial because the model is loaded through `transformers.AutoModelForCausalLM` and `transformers.AutoProcessor`, which enables the "How to use" widget on the Hub for easier adoption (a minimal loading sketch follows after this description).
* **Adding prominent links**: Introduced new badges at the top for the paper, GitHub repository, and live demo (project page) for better discoverability. The existing live demo link in the text has been replaced by the badge. The `X` (Twitter) link from the GitHub README has also been added.
* **Syncing content with GitHub README**:
  * Updated the "News" section with the latest release information.
  * Revised the "Download Model Weights" section to include the ModelScope option.
  * Refreshed the "vLLM inference" instructions under "Deployment" to reflect the official vLLM integration (v0.11.0+) and the simplified usage (see the client-side sketch after this description).
  * Added a new "Hugging Face inference with CPU" section.
  * Updated the "Document Parse" section with the correct `--num_thread` argument and instructions for Transformers-based parsing.

These changes ensure the model card is up-to-date, more accurate, and more user-friendly, providing clearer guidance for researchers and users.
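For reference, here is a minimal loading sketch along the lines the "How to use" widget will now suggest. It is a sketch under assumptions (bfloat16 weights, a CUDA-capable GPU, `trust_remote_code`), not the model card's official example; the card's own usage section remains authoritative for prompts and vision preprocessing.

```python
# Hedged sketch: load dots.ocr with the Transformers auto classes named in this PR.
# The dtype/device settings below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "rednote-hilab/dots.ocr"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # the repo ships custom modeling code (see the `custom_code` tag)
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
print(type(model).__name__, type(processor).__name__)
```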
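Likewise, once the simplified `vllm serve` command from the updated Deployment section is running, the model can be queried through vLLM's OpenAI-compatible API. The snippet below is a hedged client sketch: the served model name, port, and prompt text are assumptions, and `demo/demo_vllm.py` in the repository remains the authoritative client with the recommended prompt modes.

```python
# Hedged sketch: call the vLLM OpenAI-compatible server started with
#   vllm serve rednote-hilab/dots.ocr --trust-remote-code ...
# The served model name defaults to the model path unless --served-model-name is set.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("demo/demo_image1.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="rednote-hilab/dots.ocr",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            {"type": "text", "text": "Parse the layout of this document image."},  # placeholder prompt
        ],
    }],
)
print(response.choices[0].message.content)
```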

Files changed (1)
  1. README.md +29 -24
README.md CHANGED
@@ -1,6 +1,10 @@
  ---
+ language:
+ - en
+ - zh
+ - multilingual
+ library_name: transformers
  license: mit
- library_name: dots_ocr
  pipeline_tag: image-text-to-text
  tags:
  - image-to-text
@@ -11,10 +15,6 @@ tags:
  - formula
  - transformers
  - custom_code
- language:
- - en
- - zh
- - multilingual
  ---

  <div align="center">
@@ -27,14 +27,17 @@ language:
  dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model
  </h1>

- [![Blog](https://img.shields.io/badge/Blog-View_on_GitHub-333.svg?logo=github)](https://github.com/rednote-hilab/dots.ocr/blob/master/assets/blog.md)
+ [![Paper](https://img.shields.io/badge/Paper-2512.02498-b31b1b.svg)](https://huggingface.co/papers/2512.02498)
+ [![Code](https://img.shields.io/badge/GitHub-Code-keygen.svg?logo=github)](https://github.com/rednote-hilab/dots.ocr)
+ [![Project Page](https://img.shields.io/badge/Project_Page-Live_Demo-blue)](https://dotsocr.xiaohongshu.com)
  [![HuggingFace](https://img.shields.io/badge/HuggingFace%20Weights-black.svg?logo=HuggingFace)](https://huggingface.co/rednote-hilab/dots.ocr)
+ [![Blog](https://img.shields.io/badge/Blog-View_on_GitHub-333.svg?logo=github)](https://github.com/rednote-hilab/dots.ocr/blob/master/assets/blog.md)


  <div align="center">
- <a href="https://dotsocr.xiaohongshu.com" target="_blank" rel="noopener noreferrer"><strong>🖥️ Live Demo</strong></a> |
  <a href="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/wechat.jpg" target="_blank" rel="noopener noreferrer"><strong>💬 WeChat</strong></a> |
- <a href="https://www.xiaohongshu.com/user/profile/683ffe42000000001d021a4c" target="_blank" rel="noopener noreferrer"><strong>📕 rednote</strong></a>
+ <a href="https://www.xiaohongshu.com/user/profile/683ffe42000000001d021a4c" target="_blank" rel="noopener noreferrer"><strong>📕 rednote</strong></a> |
+ <a href="https://x.com/rednotehilab" target="_blank" rel="noopener noreferrer"><strong>🐦 X</strong></a>
  </div>

  </div>
@@ -138,6 +141,7 @@ print(output_text)


  ## News
+ * ```2025.10.31 ``` 🚀 We release [dots.ocr.base](https://huggingface.co/rednote-hilab/dots.ocr.base), a foundation VLM focused on OCR tasks and the base model of [dots.ocr](https://github.com/rednote-hilab/dots.ocr). Try it out!
  * ```2025.07.30 ``` 🚀 We release [dots.ocr](https://github.com/rednote-hilab/dots.ocr), — a multilingual documents parsing model based on 1.7b llm, with SOTA performance.


@@ -433,7 +437,6 @@ print(output_text)
  <td>0.100</td>
  <td>0.185</td>
  </tr>
- <tr>

  <td rowspan="5"><strong>General<br>VLMs</strong></td>
  <td>GPT4o</td>
@@ -1113,28 +1116,23 @@ pip install -e .
  > 💡**Note:** Please use a directory name without periods (e.g., `DotsOCR` instead of `dots.ocr`) for the model save path. This is a temporary workaround pending our integration with Transformers.
  ```shell
  python3 tools/download_model.py
+
+ # with modelscope
+ python3 tools/download_model.py --type modelscope
  ```


  ## 2. Deployment
  ### vLLM inference
- We highly recommend using vllm for deployment and inference. All of our evaluations results are based on vllm version 0.9.1.
- The [Docker Image](https://hub.docker.com/r/rednotehilab/dots.ocr) is based on the official vllm image. You can also follow [Dockerfile](https://github.com/rednote-hilab/dots.ocr/blob/master/docker/Dockerfile) to build the deployment environment by yourself.
+ We highly recommend using vLLM for deployment and inference. All of our evaluation results are based on vLLM 0.9.1 via out-of-tree model registration. **Since vLLM version 0.11.0, dots.ocr has been officially integrated into vLLM with verified performance**, so you can use the vLLM Docker image directly (e.g., `vllm/vllm-openai:v0.11.0`) to deploy the model server.

  ```shell
- # You need to register model to vllm at first
- python3 tools/download_model.py
- export hf_model_path=./weights/DotsOCR # Path to your downloaded model weights, Please use a directory name without periods (e.g., `DotsOCR` instead of `dots.ocr`) for the model save path. This is a temporary workaround pending our integration with Transformers.
- export PYTHONPATH=$(dirname "$hf_model_path"):$PYTHONPATH
- sed -i '/^from vllm\.entrypoints\.cli\.main import main$/a\
- from DotsOCR import modeling_dots_ocr_vllm' `which vllm` # If you downloaded model weights by yourself, please replace `DotsOCR` by your model saved directory name, and remember to use a directory name without periods (e.g., `DotsOCR` instead of `dots.ocr`)
-
- # launch vllm server
- CUDA_VISIBLE_DEVICES=0 vllm serve ${hf_model_path} --tensor-parallel-size 1 --gpu-memory-utilization 0.95 --chat-template-content-format string --served-model-name model --trust-remote-code
+ # Launch vLLM model server
+ vllm serve rednote-hilab/dots.ocr --trust-remote-code --async-scheduling --gpu-memory-utilization 0.95

- # If you get a ModuleNotFoundError: No module named 'DotsOCR', please check the note above on the saved model directory name.
-
- # vllm api demo
+ # vLLM API demo
+ # See dots_ocr/model/inference.py for details on the parameter and prompt settings
+ # that help achieve the best output quality.
  python3 ./demo/demo_vllm.py --prompt_mode prompt_layout_all_en
  ```

@@ -1226,6 +1224,10 @@ print(output_text)

  </details>

+ ### Hugging Face inference with CPU
+ Please refer to [CPU inference](https://github.com/rednote-hilab/dots.ocr/issues/1#issuecomment-3148962536).
+
+
  ## 3. Document Parse
  **Based on vLLM server**, you can parse an image or a pdf file using the following commands:
  ```bash
@@ -1234,7 +1236,7 @@ print(output_text)
  # Parse a single image
  python3 dots_ocr/parser.py demo/demo_image1.jpg
  # Parse a single PDF
- python3 dots_ocr/parser.py demo/demo_pdf1.pdf --num_threads 64 # try bigger num_threads for pdf with a large number of pages
+ python3 dots_ocr/parser.py demo/demo_pdf1.pdf --num_thread 64 # try a bigger num_thread for a pdf with a large number of pages

  # Layout detection only
  python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_layout_only_en
@@ -1246,6 +1248,9 @@ python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_ocr
  python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_grounding_ocr --bbox 163 241 1536 705

  ```
+ **Based on Transformers**, you can parse an image or a pdf file with the same commands as above; just add `--use_hf true`.
+
+ > Notice: Transformers is slower than vLLM. To use demo/* with Transformers, just pass `use_hf=True` to `DotsOCRParser(.., use_hf=True)`.

  <details>
  <summary><b>Output Results</b></summary>