Use text and image encoder separately with onnxruntime

#57 by Frayin

Problem

Hey, thanks for sharing the model. I want to use your CLIP model with onnxruntime on a CPU, but the model seems to be exported with both text and image inputs.
I want to use the text and image encoders separately for inference (the way encode_text and encode_image can be called separately), so I tried to export them myself, but the model fails to export to ONNX format.

What I tried

- Standard torch.onnx.export() with various configurations
- Dynamo-based export (dynamo=True)
- Different opset versions (11, 12, 14)
- Static shapes (no dynamic_axes)
- Custom wrapper classes to isolate the text encoder (roughly as sketched below)
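For the wrapper attempt, the sketch below is roughly what I tried. The model id, the text_model attribute, and its forward signature are my guesses at the internals, so treat them as placeholders:

```python
import torch
from transformers import AutoModel

# Load the full CLIP model once (assumed repo id; adjust to the one you use).
clip_model = AutoModel.from_pretrained("jinaai/jina-clip-v2", trust_remote_code=True).eval()

class TextEncoderWrapper(torch.nn.Module):
    """Exposes only the text tower so it can be exported on its own."""
    def __init__(self, clip_model):
        super().__init__()
        self.text_model = clip_model.text_model  # hypothetical attribute name

    def forward(self, input_ids, attention_mask):
        # Hypothetical call into the text tower; the real signature may differ.
        return self.text_model(input_ids=input_ids, attention_mask=attention_mask)

wrapper = TextEncoderWrapper(clip_model)
dummy_ids = torch.ones(1, 77, dtype=torch.long)
dummy_mask = torch.ones(1, 77, dtype=torch.long)

# This is the call that dies with the IndexError shown below.
torch.onnx.export(
    wrapper,
    (dummy_ids, dummy_mask),
    "text_encoder.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["text_embeddings"],
    opset_version=14,
)
```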

Error

All export attempts fail with:
IndexError: Argument passed to at() was not in the map.
This occurs during TorchScript's peephole optimization pass (_C._jit_pass_peephole).

Question

The only reason I'm doing all of this is that I want to use the text and image encoders separately with onnxruntime, so if you could point me to a way to achieve that, that would be great. Otherwise, could you share some insight on how to export the text and image encoders separately to ONNX? Thank you very much.
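For reference, this is the kind of workflow I'm hoping to end up with; the file names, input names, and shapes below are just placeholders:

```python
import numpy as np
import onnxruntime as ort

# Two independent sessions, each loading only one encoder on CPU.
text_session = ort.InferenceSession("text_encoder.onnx", providers=["CPUExecutionProvider"])
image_session = ort.InferenceSession("image_encoder.onnx", providers=["CPUExecutionProvider"])

# Text side: tokenizer output converted to int64 numpy arrays.
text_embeddings = text_session.run(None, {
    "input_ids": np.ones((1, 77), dtype=np.int64),
    "attention_mask": np.ones((1, 77), dtype=np.int64),
})[0]

# Image side: preprocessed pixel values.
image_embeddings = image_session.run(None, {
    "pixel_values": np.zeros((1, 3, 224, 224), dtype=np.float32),
})[0]
```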

@Frayin did you manage to solve it? I am facing the same issue.

Facing the same issue here. We don't really want to load the full model in memory, and we need to export our own ONNX files for text and vision. Did you manage to solve it?

@thibaut-orn I have managed to compile them separately. Essentially, you want to run everything once to download all the files, then find jinaai/jina-embeddings-v3 and change its config.json to use jinaai/xlm-roberta-flash-implementation-onnx instead of jinaai/xlm-roberta-flash-implementation. There is some explanation on the former project's page about why the latter cannot be exported to ONNX, which could account for your problems. You should also use the hf CLI to download the project and apply some manual patches according to the errors torch produces on export (the project appears to be quite far behind the standard implementation).

After all that, you want to export everything with float32 precision, as ONNX Runtime has REALLY bad support for lower precisions (I just found out that it doesn't even support bfloat16 matrix multiplication, which is probably why the official ONNX release only contains a float32 version). The exported model's outputs are not normalized, so if you want normalized vectors you have to add a custom wrapper module around the encoder so that the exported ONNX graph produces normalized vectors instead of (or along with) the raw ones.
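A minimal sketch of what I mean by the normalization wrapper. The Linear layer below is only a stand-in for the real text or vision tower, and the names and shapes are placeholders:

```python
import torch
import torch.nn.functional as F

class NormalizedEncoder(torch.nn.Module):
    """Wraps an encoder so the exported ONNX graph emits L2-normalized vectors."""
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, *args):
        embeddings = self.encoder(*args)
        return F.normalize(embeddings, p=2, dim=-1)

# Stand-in encoder just to show the export call; swap in the real tower,
# cast to float32 as discussed above.
toy_encoder = torch.nn.Linear(16, 8)
wrapped = NormalizedEncoder(toy_encoder).float().eval()

torch.onnx.export(
    wrapped,
    (torch.randn(1, 16),),
    "normalized_encoder.onnx",
    input_names=["inputs"],
    output_names=["embeddings"],
    opset_version=17,
)
```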

Exporting directly with torch_tensorrt would probably produce a working TensorRT model on NVIDIA GPUs, but anyway I will try it first.
