Update model card: Correct `library_name`, add paper/code/project links, and sync with GitHub README
This PR significantly improves the model card for `rednote-hilab/dots.ocr` by:
* **Updating `library_name`**: Changed the `library_name` in the metadata from `dots_ocr` to `transformers`. This matters because the model is loaded with `transformers.AutoModelForCausalLM` and `transformers.AutoProcessor`, so the correct value enables the "How to use" widget on the Hub for easier adoption (a hedged loading sketch follows this change list).
* **Adding prominent links**: Introduced new badges at the top for the paper, GitHub repository, and live demo (project page) for better discoverability. The existing live demo link in the text has been replaced by the badge. The `X` (Twitter) link from the GitHub README has also been added.
* **Syncing content with GitHub README**:
* Updated the "News" section with the latest release information.
* Revised the "Download Model Weights" section to include the ModelScope option.
* Refreshed the "vLLM inference" instructions under "Deployment" to reflect official vLLM integration (v0.11.0+) and the simplified `vllm serve` usage (a client-side sketch appears below, after the summary).
* Added a new "Huggingface inference with CPU" section.
* Updated the "Document Parse" section with the correct `--num_thread` argument and instructions for Transformers-based parsing.
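To make the `library_name` change concrete, here is a minimal loading sketch. It is not the card's official snippet: it only exercises the `AutoModelForCausalLM` / `AutoProcessor` entry points named above, the dtype and device settings are assumptions, and the full preprocessing/generation flow should be taken from the card's existing Hugging Face inference section.

```python
# Hedged sketch: loading dots.ocr through the standard transformers Auto classes.
# dtype/device choices are assumptions; generation follows the card's own example.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "rednote-hilab/dots.ocr"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 as a common GPU default
    device_map="auto",
    trust_remote_code=True,      # the repo ships custom modeling code (custom_code tag)
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
# Image preprocessing and generation follow the card's "Hugging Face inference" example.
```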
These changes ensure the model card is up-to-date, more accurate, and more user-friendly, providing clearer guidance for researchers and users.
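For the refreshed vLLM instructions, the sketch below shows one plausible way to query the server started with `vllm serve rednote-hilab/dots.ocr ...` through vLLM's OpenAI-compatible API. The host/port, served model name, and prompt text are assumptions; the repository's real prompts and request parameters live in `demo/demo_vllm.py` and `dots_ocr/model/inference.py`.

```python
# Hedged sketch: calling the vLLM server via its OpenAI-compatible endpoint.
# Assumes the default localhost:8000 address and the repo id as served model name;
# the exact layout-parsing prompts are defined in the dots.ocr GitHub repository.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("demo/demo_image1.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="rednote-hilab/dots.ocr",  # adjust if --served-model-name is set
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            {"type": "text",
             "text": "Parse the document layout and output JSON."},  # placeholder prompt
        ],
    }],
    max_tokens=4096,
)
print(response.choices[0].message.content)
```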
@@ -1,6 +1,10 @@
 ---
+language:
+- en
+- zh
+- multilingual
+library_name: transformers
 license: mit
-library_name: dots_ocr
 pipeline_tag: image-text-to-text
 tags:
 - image-to-text
@@ -11,10 +15,6 @@ tags:
 - formula
 - transformers
 - custom_code
-language:
-- en
-- zh
-- multilingual
 ---
 
 <div align="center">
@@ -27,14 +27,17 @@ language:
 dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model
 </h1>
 
-[](https://huggingface.co/rednote-hilab/dots.ocr)
+[](https://huggingface.co/papers/2512.02498)
+[](https://github.com/rednote-hilab/dots.ocr)
+[](https://dotsocr.xiaohongshu.com)
 [](https://huggingface.co/rednote-hilab/dots.ocr)
+[](https://github.com/rednote-hilab/dots.ocr/blob/master/assets/blog.md)
 
 
 <div align="center">
-<a href="https://dotsocr.xiaohongshu.com" target="_blank" rel="noopener noreferrer"><strong>🖥️ Live Demo</strong></a> |
 <a href="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/wechat.jpg" target="_blank" rel="noopener noreferrer"><strong>💬 WeChat</strong></a> |
-<a href="https://www.xiaohongshu.com/user/profile/683ffe42000000001d021a4c" target="_blank" rel="noopener noreferrer"><strong>📕 rednote</strong></a>
+<a href="https://www.xiaohongshu.com/user/profile/683ffe42000000001d021a4c" target="_blank" rel="noopener noreferrer"><strong>📕 rednote</strong></a> |
+<a href="https://x.com/rednotehilab" target="_blank" rel="noopener noreferrer"><strong>🐦 X</strong></a>
 </div>
 
 </div>
@@ -138,6 +141,7 @@ print(output_text)
 
 
 ## News
+* ```2025.10.31 ``` 🚀 We release [dots.ocr.base](https://huggingface.co/rednote-hilab/dots.ocr.base), foundation VLM focus on OCR tasks, also the base model of [dots.ocr](https://github.com/rednote-hilab/dots.ocr). Try it out!
 * ```2025.07.30 ``` 🚀 We release [dots.ocr](https://github.com/rednote-hilab/dots.ocr), — a multilingual documents parsing model based on 1.7b llm, with SOTA performance.
 
 
@@ -433,7 +437,6 @@ print(output_text)
 <td>0.100</td>
 <td>0.185</td>
 </tr>
-<tr>
 
 <td rowspan="5"><strong>General<br>VLMs</strong></td>
 <td>GPT4o</td>
@@ -1113,28 +1116,23 @@ pip install -e .
 > 💡**Note:** Please use a directory name without periods (e.g., `DotsOCR` instead of `dots.ocr`) for the model save path. This is a temporary workaround pending our integration with Transformers.
 ```shell
 python3 tools/download_model.py
+
+# with modelscope
+python3 tools/download_model.py --type modelscope
 ```
 
 
 ## 2. Deployment
 ### vLLM inference
-We highly recommend using
-The [Docker Image](https://hub.docker.com/r/rednotehilab/dots.ocr) is based on the official vllm image. You can also follow [Dockerfile](https://github.com/rednote-hilab/dots.ocr/blob/master/docker/Dockerfile) to build the deployment environment by yourself.
+We highly recommend using vLLM for deployment and inference. All of our evaluations results are based on vLLM 0.9.1 via out-of-tree model registration. **Since vLLM version 0.11.0, Dots OCR has been officially integrated into vLLM with verified performance** and you can use vLLM docker image directly (e.g, `vllm/vllm-openai:v0.11.0`) to deploy the model server.
 
 ```shell
-#
-
-export hf_model_path=./weights/DotsOCR # Path to your downloaded model weights, Please use a directory name without periods (e.g., `DotsOCR` instead of `dots.ocr`) for the model save path. This is a temporary workaround pending our integration with Transformers.
-export PYTHONPATH=$(dirname "$hf_model_path"):$PYTHONPATH
-sed -i '/^from vllm\.entrypoints\.cli\.main import main$/a\
-from DotsOCR import modeling_dots_ocr_vllm' `which vllm` # If you downloaded model weights by yourself, please replace `DotsOCR` by your model saved directory name, and remember to use a directory name without periods (e.g., `DotsOCR` instead of `dots.ocr`)
-
-# launch vllm server
-CUDA_VISIBLE_DEVICES=0 vllm serve ${hf_model_path} --tensor-parallel-size 1 --gpu-memory-utilization 0.95 --chat-template-content-format string --served-model-name model --trust-remote-code
+# Launch vLLM model server
+vllm serve rednote-hilab/dots.ocr --trust-remote-code --async-scheduling --gpu-memory-utilization 0.95
 
-#
-
-#
+# vLLM API Demo
+# See dots_ocr/model/inference.py for details on parameter and prompt settings
+# that help achieve the best output quality.
 python3 ./demo/demo_vllm.py --prompt_mode prompt_layout_all_en
 ```
 
@@ -1226,6 +1224,10 @@ print(output_text)
 
 </details>
 
+### Hugginface inference with CPU
+Please refer to [CPU inference](https://github.com/rednote-hilab/dots.ocr/issues/1#issuecomment-3148962536)
+
+
 ## 3. Document Parse
 **Based on vLLM server**, you can parse an image or a pdf file using the following commands:
 ```bash
@@ -1234,7 +1236,7 @@ print(output_text)
 # Parse a single image
 python3 dots_ocr/parser.py demo/demo_image1.jpg
 # Parse a single PDF
-python3 dots_ocr/parser.py demo/demo_pdf1.pdf --
+python3 dots_ocr/parser.py demo/demo_pdf1.pdf --num_thread 64  # try bigger num_threads for pdf with a large number of pages
 
 # Layout detection only
 python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_layout_only_en
@@ -1246,6 +1248,9 @@ python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_ocr
 python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_grounding_ocr --bbox 163 241 1536 705
 
 ```
+**Based on Transformers**, you can parse an image or a pdf file using the same commands above, just add `--use_hf true`.
+
+> Notice: transformers is slower than vllm, if you want to use demo/* with transformers, just add `use_hf=True` in `DotsOCRParser(..,use_hf=True)`
 
 <details>
 <summary><b>Output Results</b></summary>