2-bit quantization

#2
by adnanPBI - opened

Can you provide a short guide to converting the PaddleOCR-VL half-billion-parameter model into a 2-bit (Q2) quantized model in ONNX format?

You can find the command for generating a Q2 GGUF file in https://github.com/Liyulingyue/CreativeProjects/blob/main/PaddleOCR-VL-GGUF/README.md.
The README uses Q4_K_M to generate a Q4-quantized model; you only need to replace Q4_K_M with Q2_K to produce a Q2 GGUF file instead.
Then you can try using fst2onnx to convert the GGUF file into ONNX format.
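For reference, the two-step flow described above can be sketched with llama.cpp's standard tooling; the model directory and output filenames below are placeholders, and the only change from the linked README is the quant-type argument (Q4_K_M → Q2_K):

```shell
# Assumed paths (hypothetical); adjust to your llama.cpp checkout and model.
MODEL_DIR=./PaddleOCR-VL-0.5B              # HF-format model directory
F16_GGUF=paddleocr-vl-0.5b-f16.gguf

# 1. Convert the HF model to an f16 GGUF (llama.cpp conversion script):
python convert_hf_to_gguf.py "$MODEL_DIR" --outfile "$F16_GGUF"

# 2. Quantize; the README uses Q4_K_M here — swap in Q2_K for 2-bit:
./llama-quantize "$F16_GGUF" paddleocr-vl-0.5b-Q2_K.gguf Q2_K
```

Note that Q2_K is a mixed 2-bit scheme, so expect a noticeable accuracy drop compared with Q4_K_M; it is worth spot-checking OCR output quality before committing to the ONNX conversion step.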

Thanks for your reply.

Just let me know whether the 2-bit (Q2) quantized PaddleOCR-VL 0.5B model will work on a mobile CPU.

I tested the Q4 GGUF model on an RDK X5 (8× Cortex-A55 @ 1.5 GHz, 4 GB RAM version); it took 64 s for a 64×64 image. So I expect the Q2 model to work well on a mobile CPU.
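A rough back-of-envelope size comparison supports this: at 0.5B parameters the weight data roughly halves going from Q4 to Q2. The bits-per-weight figures below are approximate effective rates for llama.cpp's K-quants (an assumption, not measured on this model):

```python
# Back-of-envelope GGUF weight-size estimate for a 0.5B-parameter model.
# Assumed effective bits-per-weight (approximate llama.cpp K-quant rates):
#   Q4_K_M ~ 4.85 bpw, Q2_K ~ 2.63 bpw.
PARAMS = 0.5e9


def gguf_size_gib(bits_per_weight: float, params: float = PARAMS) -> float:
    """Rough weight-only size in GiB (ignores metadata and KV cache)."""
    return params * bits_per_weight / 8 / 2**30


q4 = gguf_size_gib(4.85)  # ~0.28 GiB
q2 = gguf_size_gib(2.63)  # ~0.15 GiB
print(f"Q4_K_M ~ {q4:.2f} GiB, Q2_K ~ {q2:.2f} GiB")
```

Both fit comfortably in 4 GB of RAM, and since decode on a small CPU is largely memory-bandwidth-bound, the smaller Q2 weights should also run somewhat faster than the Q4 model, though not necessarily proportionally.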
