2-bit quantization

#2
by adnanPBI - opened

Can you provide a short guide to converting the PaddleOCR-VL half-billion-parameter model into a 2-bit (Q2) quantized model in ONNX format?

You can find the command for generating a Q2 GGUF file in https://github.com/Liyulingyue/CreativeProjects/blob/main/PaddleOCR-VL-GGUF/README.md.
The README uses Q4_K_M to generate a Q4-quantized model; you only need to replace Q4_K_M with Q2_K to produce a Q2 GGUF file instead.
Then you can try using fst2onnx to convert the GGUF file into ONNX format.
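For reference, the two-step flow described above can be sketched with llama.cpp's standard tooling; the model directory and output filenames below are placeholders, and the only change from the linked README is the quant-type argument (Q4_K_M → Q2_K):

```shell
# Assumed paths (hypothetical); adjust to your llama.cpp checkout and model.
MODEL_DIR=./PaddleOCR-VL-0.5B              # HF-format model directory
F16_GGUF=paddleocr-vl-0.5b-f16.gguf

# 1. Convert the HF model to an f16 GGUF (llama.cpp conversion script):
python convert_hf_to_gguf.py "$MODEL_DIR" --outfile "$F16_GGUF"

# 2. Quantize; the README uses Q4_K_M here — swap in Q2_K for 2-bit:
./llama-quantize "$F16_GGUF" paddleocr-vl-0.5b-Q2_K.gguf Q2_K
```

Note that Q2_K is a mixed 2-bit scheme, so expect a noticeable accuracy drop compared with Q4_K_M; it is worth spot-checking OCR output quality before committing to the ONNX conversion step.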

Thanks for your reply.

Just let me know whether the 2-bit (Q2) quantized PaddleOCR-VL 0.5B model will work on a mobile CPU.

I tested the Q4 GGUF model on an RDK X5 (8× Cortex-A55 @ 1.5 GHz, 4 GB RAM version); it took 64 s for a 64×64 image. So I expect the Q2 model to work well on a mobile CPU.
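A rough back-of-envelope size comparison supports this: at 0.5B parameters the weight data roughly halves going from Q4 to Q2. The bits-per-weight figures below are approximate effective rates for llama.cpp's K-quants (an assumption, not measured on this model):

```python
# Back-of-envelope GGUF weight-size estimate for a 0.5B-parameter model.
# Assumed effective bits-per-weight (approximate llama.cpp K-quant rates):
#   Q4_K_M ~ 4.85 bpw, Q2_K ~ 2.63 bpw.
PARAMS = 0.5e9


def gguf_size_gib(bits_per_weight: float, params: float = PARAMS) -> float:
    """Rough weight-only size in GiB (ignores metadata and KV cache)."""
    return params * bits_per_weight / 8 / 2**30


q4 = gguf_size_gib(4.85)  # ~0.28 GiB
q2 = gguf_size_gib(2.63)  # ~0.15 GiB
print(f"Q4_K_M ~ {q4:.2f} GiB, Q2_K ~ {q2:.2f} GiB")
```

Both fit comfortably in 4 GB of RAM, and since decode on a small CPU is largely memory-bandwidth-bound, the smaller Q2 weights should also run somewhat faster than the Q4 model, though not necessarily proportionally.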
