ObjEmbed
ObjEmbed: Towards Universal Multimodal Object Embeddings
ObjEmbed is a multimodal embedding model that decomposes an input image into multiple regional embeddings, each corresponding to an individual object, along with global embeddings. It is designed to bridge the gap between global image-text alignment and fine-grained region-phrase alignment.
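To make the distinction between global and regional embeddings concrete, the following is a minimal sketch of how a text query might be scored against both kinds of embedding. It does not use the ObjEmbed API: the function names `cosine` and `score_image`, the toy 3-dimensional vectors, and the choice of cosine similarity in a shared image-text space are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score_image(text_emb, global_emb, region_embs):
    """Score a text query against one image.

    Returns (global_score, best_region_index, best_region_score):
    the global score reflects image-level alignment, while the best
    region score reflects alignment with a single object.
    """
    global_score = cosine(text_emb, global_emb)
    region_scores = [cosine(text_emb, r) for r in region_embs]
    best = max(range(len(region_scores)), key=region_scores.__getitem__)
    return global_score, best, region_scores[best]


# Hypothetical toy vectors standing in for real model outputs.
text_emb = [1.0, 0.0, 0.0]
global_emb = [1.0, 1.0, 0.0]
region_embs = [
    [0.0, 1.0, 0.0],  # object 0
    [2.0, 0.0, 0.0],  # object 1
    [1.0, 1.0, 1.0],  # object 2
]

global_score, best_idx, best_score = score_image(text_emb, global_emb, region_embs)
print(round(global_score, 3), best_idx, round(best_score, 3))
```

In this toy setup the query matches one object more closely than it matches the image as a whole, which is the kind of case where per-object embeddings add information beyond a single global vector.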
ObjEmbed enjoys three key properties:
If you find our work helpful for your research, please consider citing our paper:
@article{fu2026objembed,
title={ObjEmbed: Towards Universal Multimodal Object Embeddings},
author={Fu, Shenghao and Su, Yukun and Rao, Fengyun and LYU, Jing and Xie, Xiaohua and Zheng, Wei-Shi},
journal={arXiv preprint arXiv:2602.01753},
year={2026}
}