marksverdhei/GLM-4.7-Flash-FP8
Note: If my PR to vLLM isn't merged yet you might have to use my fork. Cheers! 🤗
./llama-server -m /models/mistralai_Mistral-Small-3.2-24B-Instruct-2506-Q4_K_M.gguf --jinja --chat-template-file /models/Mistral-Small-3.2-24B-Instruct-2506.jinja
I don't love the period in the name since I don't like using it for purposes other than the file extension
I don't love the underscore either for what it's worth, but period feels wrong haha
A hyphen (-) is probably ideal, but those are already used in both author and model names, so the distinction between the two becomes blurred
author_model-name
No, it does not include the XS. The reason Q4_0 and IQ4_NL work, I think, is that they don't do any clever packing of the scaling factors; that's why K quants and IQ4_XS (which is like NL but with some K quant logic) don't work yet
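For anyone scripting around this, here's a rough Python sketch of the rule as stated in this thread: pull the quant tag off a GGUF filename and check whether it's one of the two types said to work with online repacking. The set below only mirrors the comments here (Q4_0 and IQ4_NL), not anything read out of llama.cpp itself, and both function names are made up for illustration. It also assumes the quant tag is the last hyphen-separated piece of the filename, as in the author_model-name-QUANT.gguf files above.

```python
from pathlib import Path

# Assumption: taken from this thread only, not from llama.cpp's source.
# Q4_0 and IQ4_NL are reported to work; K quants and IQ4_XS are not (yet).
ONLINE_REPACK_QUANTS = {"Q4_0", "IQ4_NL"}


def quant_from_filename(filename: str) -> str:
    """Return the quant tag, assuming it is the last '-'-separated part of the name."""
    stem = Path(filename).stem  # drops only the trailing ".gguf"
    return stem.rsplit("-", 1)[-1]


def supports_online_repack(quant: str) -> bool:
    """True if this thread reports the quant type as working with online repacking."""
    return quant.upper() in ONLINE_REPACK_QUANTS


if __name__ == "__main__":
    path = "/models/mistralai_Mistral-Small-3.2-24B-Instruct-2506-Q4_K_M.gguf"
    quant = quant_from_filename(path)
    print(quant, supports_online_repack(quant))
```

If more types gain repacking support later, the set is the only thing that would need updating.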
oh, yeah, of course.. I added all the ARM quants but then not Q4_0 which is now the only one that would work haha..
I'll go and make a Q4_0 for it I suppose! Just this once
Don't love adding more formats but if your results are accurate it does seem worth including
I've updated it to "Legacy format, offers online repacking for ARM and AVX CPU inference." It is still legacy overall, but with the online repacking it's worth considering for speed
I'm hoping that IQ4_NL gets a few more packing options in the near future
hell yeah. Wish we could still offline compile. I get why it's not sustainable in the future, but until there's better support and more options it would be nice to keep it around
oh right sorry, forgot to include that PR, i'll add it above but it's here:
https://github.com/ggerganov/llama.cpp/pull/10541
I think the inference engines will just need to update to the newer versions and they'll get the repacking logic for free, if that's what you meant then yes
This makes perfect sense, average users definitely don't need to be uploading that much stuff privately, great for testing but if it's not worth releasing publicly it's not worth storing on servers for free :)
Great update !