utter-project/TowerVideo-9B
UTTER - Unified Transcription and Translation for Extended Reality
Video-Text-to-Text
Transformers
Safetensors
18 languages
llava_onevision
image-to-text
multimodal
multilingual
vlm
translation
arxiv: 2510.21849
License: cc-by-nc-sa-4.0
Commit History: TowerVideo-9B/README.md (main)
Update README.md (b5cc963, verified), GuilhermeNunes committed on Oct 28
Update README.md (36f0ca3, verified), GuilhermeNunes committed on Oct 23
Update README.md (7904306, verified), GuilhermeNunes committed on Oct 15
Update README.md (6207e88, verified), GuilhermeNunes committed on Oct 15
Update README.md (006a7b1, verified), SaulSantos committed on Oct 14
initial commit (3d270ab, verified), SaulSantos committed on Oct 14