If you're having issues with this model, use an ONNX version or convert the model yourself.

#34
by brentrynn - opened

This model works best with ONNX for inference. If you’re having issues, try running an ONNX version; it seems to work a lot better. I’m posting this here because a lot of people are struggling to get this model working.

However, converting the model yourself won’t work out of the box unless you patch the build script to allow the conversion. I’m uploading my scripts so anyone can use them. You’ll also need to clean up all the arrays in the JSON files (basically, just convert nested arrays to flat arrays).
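For the JSON cleanup, here is a minimal sketch of what "convert nested arrays to flat arrays" could look like in Python. It is not the script from my repo, just an illustration: the function names are made up for this example, and it assumes every nested array in the file should be flattened, so check which files and keys actually need it before overwriting anything.

```python
import json
from pathlib import Path


def flatten(values):
    """Recursively flatten nested lists into one flat list, e.g. [[1, 2], [3]] -> [1, 2, 3]."""
    flat = []
    for item in values:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def flatten_arrays(node):
    """Walk a parsed JSON value and flatten every nested list found inside it."""
    if isinstance(node, dict):
        return {key: flatten_arrays(value) for key, value in node.items()}
    if isinstance(node, list):
        # After flatten() no item is a list, but items may still be dicts to walk.
        return [flatten_arrays(item) for item in flatten(node)]
    return node


def flatten_json_file(path):
    """Rewrite one JSON file in place with all nested arrays flattened (hypothetical helper)."""
    path = Path(path)
    data = json.loads(path.read_text(encoding="utf-8"))
    path.write_text(json.dumps(flatten_arrays(data), indent=2), encoding="utf-8")
```

You would call `flatten_json_file(...)` on each JSON file that needs the fix. Keep a backup of the originals, since this overwrites the file.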

I had to rebuild with this command to make inference work:

python -m onnxruntime_genai.models.builder \
    -m Phi \
    -o models \
    -p fp16 \
    -e cuda \
    --extra_options num_attention_heads=24 num_key_value_heads=8

Not sure if this will help anyone else, but you can find all the scripts and fixes on my GitHub:

https://github.com/brentrynn/phi-onnx

I should add that there is an ONNX model on here that you can use, and it works out of the box.
