Add Python example
README.md (CHANGED)

@@ -14,19 +14,41 @@ A framework designed to generate semantically rich image captions.

- 🚀 **Demo**: Try out a [demo](https://huggingface.co/spaces/noamrot/FuseCap) of our BLIP-based model trained using FuseCap, hosted on Hugging Face Spaces.

+#### Running the model
+
+Our BLIP-based model can be run using the following code (a batched variant is sketched after the diff):
+
+```python
+import requests
+from PIL import Image
+from transformers import BlipProcessor, BlipForConditionalGeneration
+import torch
+
+# Use a GPU when available, otherwise fall back to the CPU
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+processor = BlipProcessor.from_pretrained("noamrot/FuseCap")
+model = BlipForConditionalGeneration.from_pretrained("noamrot/FuseCap").to(device)
+
+# Fetch an example image from the demo Space
+img_url = 'https://huggingface.co/spaces/noamrot/FuseCap/resolve/main/bike.jpg'
+raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
+
+# Condition the caption on a short prompt and decode with beam search
+text = "a picture of "
+inputs = processor(raw_image, text, return_tensors="pt").to(device)
+
+out = model.generate(**inputs, num_beams=3)
+print(processor.decode(out[0], skip_special_tokens=True))
+```
+
## Upcoming Updates

-The official codebase and trained models for this project will be released soon.
+The official codebase, datasets and trained models for this project will be released soon.

## BibTeX

``` Citation
-@
-
-
-
-
-archivePrefix={arXiv},
-primaryClass={cs.CV}
+@article{rotstein2023fusecap,
+  title={FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions},
+  author={Rotstein, Noam and Bensaid, David and Brody, Shaked and Ganz, Roy and Kimmel, Ron},
+  journal={arXiv preprint arXiv:2305.17718},
+  year={2023}
}
```
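
For several images, the same `processor` and `model` can caption a whole batch in one `generate` call. The following is a minimal sketch rather than part of the README above: it reuses `processor`, `model`, `device`, and `raw_image` from the snippet in the diff, and assumes the standard `transformers` batching behavior of `BlipProcessor` (list inputs with `padding=True`) and `batch_decode`.

```python
# Batched captioning sketch; run after the snippet above so that
# processor, model, device, and raw_image are already defined.
images = [raw_image, raw_image]            # substitute your own list of PIL images
prompts = ["a picture of "] * len(images)  # one prompt per image

# padding=True pads the tokenized prompts to a common length
inputs = processor(images=images, text=prompts, return_tensors="pt", padding=True).to(device)
out = model.generate(**inputs, num_beams=3)

for caption in processor.batch_decode(out, skip_special_tokens=True):
    print(caption)
```

Beam search with `num_beams=3` mirrors the single-image example; omitting it falls back to greedy decoding, which is faster but often yields flatter captions.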