Update model card: Correct `library_name`, add paper/code/project links, and sync with GitHub README
#2 · by nielsr (HF Staff) · opened
README.md CHANGED

````diff
@@ -1,6 +1,10 @@
 ---
+language:
+- en
+- zh
+- multilingual
+library_name: transformers
 license: mit
-library_name: dots_ocr
 pipeline_tag: image-text-to-text
 tags:
 - image-to-text
@@ -11,10 +15,6 @@ tags:
 - formula
 - transformers
 - custom_code
-language:
-- en
-- zh
-- multilingual
 ---
 
 <div align="center">
@@ -27,14 +27,17 @@ language:
 dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model
 </h1>
 
-[](https://huggingface.co/rednote-hilab/dots.ocr)
-
+[](https://huggingface.co/papers/2512.02498)
+[](https://github.com/rednote-hilab/dots.ocr)
+[](https://dotsocr.xiaohongshu.com)
+[](https://huggingface.co/rednote-hilab/dots.ocr)
+[](https://github.com/rednote-hilab/dots.ocr/blob/master/assets/blog.md)
 
 
 <div align="center">
-<a href="https://dotsocr.xiaohongshu.com" target="_blank" rel="noopener noreferrer"><strong>🖥️ Live Demo</strong></a> |
 <a href="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/wechat.jpg" target="_blank" rel="noopener noreferrer"><strong>💬 WeChat</strong></a> |
-<a href="https://www.xiaohongshu.com/user/profile/683ffe42000000001d021a4c" target="_blank" rel="noopener noreferrer"><strong>📕 rednote</strong></a>
+<a href="https://www.xiaohongshu.com/user/profile/683ffe42000000001d021a4c" target="_blank" rel="noopener noreferrer"><strong>📕 rednote</strong></a> |
+<a href="https://x.com/rednotehilab" target="_blank" rel="noopener noreferrer"><strong>🐦 X</strong></a>
 </div>
 
 </div>
````
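A side note on the `library_name: transformers` fix: together with the existing `custom_code` tag, it tells the Hub that this checkpoint loads through Transformers using remote code. Below is a minimal sketch of what that implies, assuming the repo's custom code registers with the standard auto classes; the dtype and device settings are illustrative assumptions, not the card's official snippet.

```python
# Minimal sketch, assuming the repo's custom code (hence the `custom_code` tag)
# registers with the standard Transformers auto classes.
# trust_remote_code=True is required to execute that custom modeling code.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "rednote-hilab/dots.ocr"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference on a capable GPU
    device_map="auto",
    trust_remote_code=True,      # runs the custom dots.ocr modeling code
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
```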
````diff
@@ -138,6 +141,7 @@ print(output_text)
 
 
 ## News
+* ```2025.10.31 ``` 🚀 We release [dots.ocr.base](https://huggingface.co/rednote-hilab/dots.ocr.base), a foundation VLM focused on OCR tasks and the base model of [dots.ocr](https://github.com/rednote-hilab/dots.ocr). Try it out!
 * ```2025.07.30 ``` 🚀 We release [dots.ocr](https://github.com/rednote-hilab/dots.ocr), a multilingual document parsing model based on a 1.7B LLM, with SOTA performance.
 
 
````
````diff
@@ -433,7 +437,6 @@ print(output_text)
 <td>0.100</td>
 <td>0.185</td>
 </tr>
-<tr>
 
 <td rowspan="5"><strong>General<br>VLMs</strong></td>
 <td>GPT4o</td>
````
````diff
@@ -1113,28 +1116,23 @@ pip install -e .
 > 💡**Note:** Please use a directory name without periods (e.g., `DotsOCR` instead of `dots.ocr`) for the model save path. This is a temporary workaround pending our integration with Transformers.
 ```shell
 python3 tools/download_model.py
+
+# with modelscope
+python3 tools/download_model.py --type modelscope
 ```
 
 
 ## 2. Deployment
 ### vLLM inference
-We highly recommend using
-The [Docker Image](https://hub.docker.com/r/rednotehilab/dots.ocr) is based on the official vllm image. You can also follow [Dockerfile](https://github.com/rednote-hilab/dots.ocr/blob/master/docker/Dockerfile) to build the deployment environment by yourself.
+We highly recommend using vLLM for deployment and inference. All of our evaluation results are based on vLLM 0.9.1 via out-of-tree model registration. **Since vLLM version 0.11.0, dots.ocr has been officially integrated into vLLM with verified performance**, so you can use the vLLM Docker image directly (e.g., `vllm/vllm-openai:v0.11.0`) to deploy the model server.
 
 ```shell
-#
-
-export hf_model_path=./weights/DotsOCR # Path to your downloaded model weights, Please use a directory name without periods (e.g., `DotsOCR` instead of `dots.ocr`) for the model save path. This is a temporary workaround pending our integration with Transformers.
-export PYTHONPATH=$(dirname "$hf_model_path"):$PYTHONPATH
-sed -i '/^from vllm\.entrypoints\.cli\.main import main$/a\
-from DotsOCR import modeling_dots_ocr_vllm' `which vllm` # If you downloaded model weights by yourself, please replace `DotsOCR` by your model saved directory name, and remember to use a directory name without periods (e.g., `DotsOCR` instead of `dots.ocr`)
-
-# launch vllm server
-CUDA_VISIBLE_DEVICES=0 vllm serve ${hf_model_path} --tensor-parallel-size 1 --gpu-memory-utilization 0.95 --chat-template-content-format string --served-model-name model --trust-remote-code
+# Launch vLLM model server
+vllm serve rednote-hilab/dots.ocr --trust-remote-code --async-scheduling --gpu-memory-utilization 0.95
 
-#
-
-#
+# vLLM API Demo
+# See dots_ocr/model/inference.py for details on parameter and prompt settings
+# that help achieve the best output quality.
 python3 ./demo/demo_vllm.py --prompt_mode prompt_layout_all_en
 ```
 
````
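For readers who don't open demo/demo_vllm.py: once `vllm serve` is up, the model is reachable over vLLM's OpenAI-compatible API. A hedged sketch follows, assuming the default port 8000 and a model name matching the one passed to `vllm serve`; the prompt string here is illustrative, not one of the repo's prompt modes.

```python
# Hedged sketch: querying the vLLM server launched above via its
# OpenAI-compatible API. Port, model name, and prompt text are assumptions;
# demo/demo_vllm.py and dots_ocr/model/inference.py are the authoritative
# client and prompt settings.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Encode a local document image as a base64 data URI for the chat API.
with open("demo/demo_image1.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="rednote-hilab/dots.ocr",  # name passed to `vllm serve`
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            {"type": "text", "text": "Parse the layout and text of this document."},
        ],
    }],
)
print(resp.choices[0].message.content)
```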
````diff
@@ -1226,6 +1224,10 @@ print(output_text)
 
 </details>
 
+### Hugging Face inference with CPU
+Please refer to [CPU inference](https://github.com/rednote-hilab/dots.ocr/issues/1#issuecomment-3148962536).
+
+
 ## 3. Document Parse
 **Based on vLLM server**, you can parse an image or a PDF file using the following commands:
 ```bash
````
````diff
@@ -1234,7 +1236,7 @@
 # Parse a single image
 python3 dots_ocr/parser.py demo/demo_image1.jpg
 # Parse a single PDF
-python3 dots_ocr/parser.py demo/demo_pdf1.pdf --
+python3 dots_ocr/parser.py demo/demo_pdf1.pdf --num_thread 64  # try a bigger num_thread for a PDF with a large number of pages
 
 # Layout detection only
 python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_layout_only_en
````
````diff
@@ -1246,6 +1248,9 @@ python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_ocr
 python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_grounding_ocr --bbox 163 241 1536 705
 
 ```
+**Based on Transformers**, you can parse an image or a PDF file using the same commands as above; just add `--use_hf true`.
+
+> Notice: Transformers is slower than vLLM. If you want to use demo/* with Transformers, just add `use_hf=True` in `DotsOCRParser(.., use_hf=True)`.
 
 <details>
 <summary><b>Output Results</b></summary>
````
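On the `--use_hf true` addition: the programmatic equivalent suggested by the `DotsOCRParser(.., use_hf=True)` hint might look like the sketch below. Only the `use_hf` flag is confirmed by the card; the import path and the `parse_file` method name are assumptions to verify against dots_ocr/parser.py.

```python
# Hedged sketch of the Transformers-backend parse described above.
# Only use_hf=True is confirmed by the card; the import path and the
# parse_file method name are assumptions (see dots_ocr/parser.py).
from dots_ocr.parser import DotsOCRParser  # assumed import path

parser = DotsOCRParser(use_hf=True)  # Transformers backend; slower than vLLM
results = parser.parse_file(
    "demo/demo_image1.jpg",
    prompt_mode="prompt_layout_all_en",  # same prompt modes as the CLI
)
print(results)
```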