nielsr (HF Staff) committed
Commit e6a6435 · verified · 1 parent: ca7fe48

Update model card: Correct `library_name`, add paper/code/project links, and sync with GitHub README


This PR significantly improves the model card for `rednote-hilab/dots.ocr` by:

* **Updating `library_name`**: Changed the `library_name` in the metadata from `dots_ocr` to `transformers`. This is crucial because the model is loaded through `transformers.AutoModelForCausalLM` and `transformers.AutoProcessor`, which enables the "How to use" widget on the Hub for easier adoption (a minimal loading sketch follows after this description).
* **Adding prominent links**: Introduced new badges at the top for the paper, GitHub repository, and live demo (project page) for better discoverability. The existing live demo link in the text has been replaced by the badge. The `X` (Twitter) link from the GitHub README has also been added.
* **Syncing content with GitHub README**:
  * Updated the "News" section with the latest release information.
  * Revised the "Download Model Weights" section to include the ModelScope option.
  * Refreshed the "vLLM inference" instructions under "Deployment" to reflect the official vLLM integration (v0.11.0+) and the simplified usage (see the client-side sketch after this description).
  * Added a new "Hugging Face inference with CPU" section.
  * Updated the "Document Parse" section with the correct `--num_thread` argument and instructions for Transformers-based parsing.

These changes ensure the model card is up-to-date, more accurate, and more user-friendly, providing clearer guidance for researchers and users.
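For reference, here is a minimal loading sketch along the lines the "How to use" widget will now suggest. It is a sketch under assumptions (bfloat16 weights, a CUDA-capable GPU, `trust_remote_code`), not the model card's official example; the card's own usage section remains authoritative for prompts and vision preprocessing.

```python
# Hedged sketch: load dots.ocr with the Transformers auto classes named in this PR.
# The dtype/device settings below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "rednote-hilab/dots.ocr"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # the repo ships custom modeling code (see the `custom_code` tag)
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
print(type(model).__name__, type(processor).__name__)
```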
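Likewise, once the simplified `vllm serve` command from the updated Deployment section is running, the model can be queried through vLLM's OpenAI-compatible API. The snippet below is a hedged client sketch: the served model name, port, and prompt text are assumptions, and `demo/demo_vllm.py` in the repository remains the authoritative client with the recommended prompt modes.

```python
# Hedged sketch: call the vLLM OpenAI-compatible server started with
#   vllm serve rednote-hilab/dots.ocr --trust-remote-code ...
# The served model name defaults to the model path unless --served-model-name is set.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("demo/demo_image1.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="rednote-hilab/dots.ocr",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            {"type": "text", "text": "Parse the layout of this document image."},  # placeholder prompt
        ],
    }],
)
print(response.choices[0].message.content)
```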

Files changed (1)
  1. README.md +29 -24
README.md CHANGED
@@ -1,6 +1,10 @@
  ---
+ language:
+ - en
+ - zh
+ - multilingual
+ library_name: transformers
  license: mit
- library_name: dots_ocr
  pipeline_tag: image-text-to-text
  tags:
  - image-to-text
@@ -11,10 +15,6 @@ tags:
  - formula
  - transformers
  - custom_code
- language:
- - en
- - zh
- - multilingual
  ---

  <div align="center">
@@ -27,14 +27,17 @@ language:
  dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model
  </h1>

- [![Blog](https://img.shields.io/badge/Blog-View_on_GitHub-333.svg?logo=github)](https://github.com/rednote-hilab/dots.ocr/blob/master/assets/blog.md)
+ [![Paper](https://img.shields.io/badge/Paper-2512.02498-b31b1b.svg)](https://huggingface.co/papers/2512.02498)
+ [![Code](https://img.shields.io/badge/GitHub-Code-keygen.svg?logo=github)](https://github.com/rednote-hilab/dots.ocr)
+ [![Project Page](https://img.shields.io/badge/Project_Page-Live_Demo-blue)](https://dotsocr.xiaohongshu.com)
  [![HuggingFace](https://img.shields.io/badge/HuggingFace%20Weights-black.svg?logo=HuggingFace)](https://huggingface.co/rednote-hilab/dots.ocr)
+ [![Blog](https://img.shields.io/badge/Blog-View_on_GitHub-333.svg?logo=github)](https://github.com/rednote-hilab/dots.ocr/blob/master/assets/blog.md)


  <div align="center">
- <a href="https://dotsocr.xiaohongshu.com" target="_blank" rel="noopener noreferrer"><strong>🖥️ Live Demo</strong></a> |
  <a href="https://raw.githubusercontent.com/rednote-hilab/dots.ocr/master/assets/wechat.jpg" target="_blank" rel="noopener noreferrer"><strong>💬 WeChat</strong></a> |
- <a href="https://www.xiaohongshu.com/user/profile/683ffe42000000001d021a4c" target="_blank" rel="noopener noreferrer"><strong>📕 rednote</strong></a>
+ <a href="https://www.xiaohongshu.com/user/profile/683ffe42000000001d021a4c" target="_blank" rel="noopener noreferrer"><strong>📕 rednote</strong></a> |
+ <a href="https://x.com/rednotehilab" target="_blank" rel="noopener noreferrer"><strong>🐦 X</strong></a>
  </div>

  </div>
@@ -138,6 +141,7 @@ print(output_text)


  ## News
+ * ```2025.10.31 ``` 🚀 We release [dots.ocr.base](https://huggingface.co/rednote-hilab/dots.ocr.base), a foundation VLM focused on OCR tasks and the base model of [dots.ocr](https://github.com/rednote-hilab/dots.ocr). Try it out!
  * ```2025.07.30 ``` 🚀 We release [dots.ocr](https://github.com/rednote-hilab/dots.ocr), — a multilingual documents parsing model based on 1.7b llm, with SOTA performance.


@@ -433,7 +437,6 @@ print(output_text)
  <td>0.100</td>
  <td>0.185</td>
  </tr>
- <tr>

  <td rowspan="5"><strong>General<br>VLMs</strong></td>
  <td>GPT4o</td>
@@ -1113,28 +1116,23 @@ pip install -e .
  > 💡**Note:** Please use a directory name without periods (e.g., `DotsOCR` instead of `dots.ocr`) for the model save path. This is a temporary workaround pending our integration with Transformers.
  ```shell
  python3 tools/download_model.py
+
+ # with modelscope
+ python3 tools/download_model.py --type modelscope
  ```


  ## 2. Deployment
  ### vLLM inference
- We highly recommend using vllm for deployment and inference. All of our evaluations results are based on vllm version 0.9.1.
- The [Docker Image](https://hub.docker.com/r/rednotehilab/dots.ocr) is based on the official vllm image. You can also follow [Dockerfile](https://github.com/rednote-hilab/dots.ocr/blob/master/docker/Dockerfile) to build the deployment environment by yourself.
+ We highly recommend using vLLM for deployment and inference. All of our evaluation results are based on vLLM 0.9.1 via out-of-tree model registration. **Since vLLM version 0.11.0, dots.ocr has been officially integrated into vLLM with verified performance**, so you can use the vLLM Docker image directly (e.g., `vllm/vllm-openai:v0.11.0`) to deploy the model server.

  ```shell
- # You need to register model to vllm at first
- python3 tools/download_model.py
- export hf_model_path=./weights/DotsOCR # Path to your downloaded model weights, Please use a directory name without periods (e.g., `DotsOCR` instead of `dots.ocr`) for the model save path. This is a temporary workaround pending our integration with Transformers.
- export PYTHONPATH=$(dirname "$hf_model_path"):$PYTHONPATH
- sed -i '/^from vllm\.entrypoints\.cli\.main import main$/a\
- from DotsOCR import modeling_dots_ocr_vllm' `which vllm` # If you downloaded model weights by yourself, please replace `DotsOCR` by your model saved directory name, and remember to use a directory name without periods (e.g., `DotsOCR` instead of `dots.ocr`)
-
- # launch vllm server
- CUDA_VISIBLE_DEVICES=0 vllm serve ${hf_model_path} --tensor-parallel-size 1 --gpu-memory-utilization 0.95 --chat-template-content-format string --served-model-name model --trust-remote-code
+ # Launch vLLM model server
+ vllm serve rednote-hilab/dots.ocr --trust-remote-code --async-scheduling --gpu-memory-utilization 0.95

- # If you get a ModuleNotFoundError: No module named 'DotsOCR', please check the note above on the saved model directory name.
-
- # vllm api demo
+ # vLLM API demo
+ # See dots_ocr/model/inference.py for details on the parameter and prompt settings
+ # that help achieve the best output quality.
  python3 ./demo/demo_vllm.py --prompt_mode prompt_layout_all_en
  ```

@@ -1226,6 +1224,10 @@ print(output_text)

  </details>

+ ### Hugging Face inference with CPU
+ Please refer to [CPU inference](https://github.com/rednote-hilab/dots.ocr/issues/1#issuecomment-3148962536).
+
+
  ## 3. Document Parse
  **Based on vLLM server**, you can parse an image or a pdf file using the following commands:
  ```bash
@@ -1234,7 +1236,7 @@ print(output_text)
  # Parse a single image
  python3 dots_ocr/parser.py demo/demo_image1.jpg
  # Parse a single PDF
- python3 dots_ocr/parser.py demo/demo_pdf1.pdf --num_threads 64 # try bigger num_threads for pdf with a large number of pages
+ python3 dots_ocr/parser.py demo/demo_pdf1.pdf --num_thread 64 # try a bigger num_thread for a pdf with a large number of pages

  # Layout detection only
  python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_layout_only_en
@@ -1246,6 +1248,9 @@ python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_ocr
  python3 dots_ocr/parser.py demo/demo_image1.jpg --prompt prompt_grounding_ocr --bbox 163 241 1536 705

  ```
+ **Based on Transformers**, you can parse an image or a pdf file with the same commands as above; just add `--use_hf true`.
+
+ > Notice: Transformers is slower than vLLM. To use demo/* with Transformers, just pass `use_hf=True` to `DotsOCRParser(.., use_hf=True)`.

  <details>
  <summary><b>Output Results</b></summary>