2-bit quantization
#2
by adnanPBI - opened
Can you provide a small guide for turning the PaddleOCR-VL half-billion-parameter model into a 2-bit (Q2) quantized model in ONNX format?
You can find the command for generating a Q2 gguf file in https://github.com/Liyulingyue/CreativeProjects/blob/main/PaddleOCR-VL-GGUF/README.md.
The README.md uses Q4_K_M to generate a Q4 quantized model; you only need to replace Q4_K_M with Q2_K to produce a Q2 gguf file.
Then you can try using fst2onnx to convert the gguf file into ONNX format.
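For reference, the substitution described above could look like the following sketch, assuming the linked README drives the standard llama.cpp `llama-quantize` tool; the file names here are placeholders, not from the README:

```shell
# Quantize an f16 gguf export of PaddleOCR-VL 0.5B down to 2-bit Q2_K.
# (Hypothetical paths; llama-quantize comes from a llama.cpp build.)
./llama-quantize paddleocr-vl-0.5b-f16.gguf paddleocr-vl-0.5b-q2_k.gguf Q2_K
```

The only change from the README's Q4 command is the final quant-type argument, `Q4_K_M` → `Q2_K`.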
Thanks for your reply.
Just let me know whether the 2-bit Q2 quantized PaddleOCR-VL 0.5B model will work on a mobile CPU.
I tested the Q4 gguf model on an RDK X5 (8× A55 @ 1.5 GHz, 4 GB RAM version); it took 64 s for a 64×64 image. So I guess the Q2 model can run well on a mobile CPU.
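As a rough size check supporting that guess (the bits-per-weight figures below are approximate llama.cpp values, not measured on PaddleOCR-VL: ~4.85 bpw for Q4_K_M, ~2.5625 bpw for Q2_K), a 0.5B-parameter model's weights drop well under 200 MB at Q2, so fitting it in a mobile device's RAM is plausible:

```python
# Back-of-envelope weight-file sizes for a 0.5B-parameter model.
# bpw values are approximate llama.cpp figures, not measured numbers.
PARAMS = 0.5e9  # parameter count of PaddleOCR-VL 0.5B

def size_mb(bits_per_weight: float) -> float:
    """Approximate bytes occupied by the quantized weights, in MB."""
    return PARAMS * bits_per_weight / 8 / 1e6

print(f"Q4_K_M ~ {size_mb(4.85):.0f} MB")
print(f"Q2_K   ~ {size_mb(2.5625):.0f} MB")
```

Note this estimates weight storage only; activation memory and any f16/f32 tensors kept unquantized add to the real footprint.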