GIT: A Generative Image-to-text Transformer for Vision and Language
Paper: arXiv:2205.14100
GIT (Generative Image-to-text Transformer) is a vision-language model for tasks such as image and video captioning and question answering.
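As a rough illustration of the captioning use case, the sketch below generates a caption for a single image. It assumes the Hugging Face `transformers` library (with PyTorch) and the `microsoft/git-base-coco` checkpoint, neither of which is named in the text above. The helper name `caption_image` and the constant `DEFAULT_CHECKPOINT` are hypothetical, introduced only for this example.

```python
# Assumption: this checkpoint name is not given in the text above; swap in
# whichever GIT checkpoint you actually intend to use.
DEFAULT_CHECKPOINT = "microsoft/git-base-coco"


def caption_image(image, checkpoint=DEFAULT_CHECKPOINT, max_length=50):
    """Return a generated caption for one PIL image (hypothetical helper)."""
    # Imported lazily so that defining the helper does not require the
    # heavy dependencies until it is actually called.
    from transformers import AutoModelForCausalLM, AutoProcessor

    # Downloads the processor and weights on first use.
    processor = AutoProcessor.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)

    # Convert the image to the pixel tensor the model expects.
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # Generate caption tokens autoregressively, then decode them to text.
    generated_ids = model.generate(pixel_values=pixel_values, max_length=max_length)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

A call might look like `caption_image(Image.open("photo.jpg"))`, where `Image` comes from Pillow and `photo.jpg` is a placeholder path. Loading the model inside the function keeps the sketch short; for repeated calls it would be more sensible to load the processor and model once and reuse them.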