A custom Modular Diffusers block that uses Florence-2 for image annotation tasks like segmentation, object detection, and captioning.
```py
import torch
from diffusers import ModularPipeline
from diffusers.utils import load_image

# Load the block
image_annotator = ModularPipeline.from_pretrained(
    "diffusers/Florence2-image-Annotator",
    trust_remote_code=True
)
image_annotator.load_components(torch_dtype=torch.bfloat16)
image_annotator.to("cuda")

# Load an image
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg")
image = image.resize((1024, 1024))

# Generate a segmentation mask
output = image_annotator(
    image=image,
    annotation_task="<REFERRING_EXPRESSION_SEGMENTATION>",
    annotation_prompt="the car",
    annotation_output_type="mask_image",
)
output.mask_image[0].save("car-mask.png")
```
The block can also be inserted into an existing inpainting workflow so the mask is generated automatically:

```py
import torch
from diffusers import ModularPipeline
from diffusers.utils import load_image

# Load the annotator
image_annotator = ModularPipeline.from_pretrained(
    "diffusers/Florence2-image-Annotator",
    trust_remote_code=True
)

# Get an inpainting workflow and insert the annotator as the first block
# repo_id = ..  # any pipeline that supports inpainting works (SDXL, Flux, Qwen, ...)
inpaint_blocks = ModularPipeline.from_pretrained(repo_id).blocks.get_workflow("inpainting")
inpaint_blocks.sub_blocks.insert("image_annotator", image_annotator.blocks, 0)

# Initialize the combined pipeline
pipe = inpaint_blocks.init_pipeline()
pipe.load_components(torch_dtype=torch.float16, device="cuda")

# Example inputs: the annotator produces the mask the inpainting blocks
# consume, so no manual mask is passed
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg"
).resize((1024, 1024))
prompt = "a shiny red sports car"  # example prompt

# Inpaint with automatic mask generation
output = pipe(
    prompt=prompt,
    image=image,
    annotation_task="<REFERRING_EXPRESSION_SEGMENTATION>",
    annotation_prompt="the car",
    annotation_output_type="mask_image",
    num_inference_steps=30,
    output="images"
)
output[0].save("inpainted-car.png")
```
The block supports the following Florence-2 annotation tasks:

| Task | Description |
|---|---|
| `<OD>` | Object detection |
| `<REFERRING_EXPRESSION_SEGMENTATION>` | Segment specific objects based on text |
| `<CAPTION>` | Generate an image caption |
| `<DETAILED_CAPTION>` | Generate a detailed caption |
| `<MORE_DETAILED_CAPTION>` | Generate a very detailed caption |
| `<DENSE_REGION_CAPTION>` | Caption different regions of the image |
| `<CAPTION_TO_PHRASE_GROUNDING>` | Ground phrases to image regions |
| `<OPEN_VOCABULARY_DETECTION>` | Detect objects from an open vocabulary |
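Switching tasks only changes the call arguments. A minimal sketch, assuming the `image_annotator` pipeline and `image` from the first example; the prompt text and output filename are illustrative:

```py
# Ground a free-text phrase and draw bounding boxes on the image
output = image_annotator(
    image=image,
    annotation_task="<CAPTION_TO_PHRASE_GROUNDING>",
    annotation_prompt="the car",
    annotation_output_type="bounding_box",
)
# annotated image lands in `image` (see the outputs table below);
# indexing mirrors the `mask_image[0]` access above
output.image[0].save("car-boxes.png")
```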
Supported `annotation_output_type` values:

| Type | Description |
|---|---|
| `mask_image` | Black-and-white mask image |
| `mask_overlay` | Mask overlaid on the original image |
| `bounding_box` | Bounding boxes drawn on the image |
Inputs:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `image` | `PIL.Image` | Yes | - | Image to annotate |
| `annotation_task` | `str` | No | `<REFERRING_EXPRESSION_SEGMENTATION>` | Task to perform |
| `annotation_prompt` | `str` | Yes | - | Text prompt for the task |
| `annotation_output_type` | `str` | No | `mask_image` | Output format |
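Since `annotation_task` and `annotation_output_type` have defaults, only `image` and `annotation_prompt` are strictly required. A sketch of the shortest possible call, assuming the block is loaded as above:

```py
# Falls back to the defaults: <REFERRING_EXPRESSION_SEGMENTATION> + mask_image
output = image_annotator(image=image, annotation_prompt="the car")
output.mask_image[0].save("car-mask.png")
```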
Outputs:

| Parameter | Type | Description |
|---|---|---|
| `mask_image` | `PIL.Image` | Generated mask (when the output type is `mask_image`) |
| `image` | `PIL.Image` | Annotated image (when the output type is `mask_overlay` or `bounding_box`) |
| `annotations` | `dict` | Raw annotation predictions |
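Which image field is populated depends on `annotation_output_type`, while `annotations` carries the raw Florence-2 predictions per the table above. A sketch, assuming the segmentation call from the first example and attribute access in the same style as `output.mask_image`:

```py
# The mask lands in `mask_image`; the raw predictions stay in `annotations`
output = image_annotator(
    image=image,
    annotation_task="<REFERRING_EXPRESSION_SEGMENTATION>",
    annotation_prompt="the car",
    annotation_output_type="mask_image",
)
print(output.annotations)  # raw prediction dict from Florence-2
```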
This block uses the following components from `florence-community/Florence-2-base-ft`:

- `image_annotator`: `Florence2ForConditionalGeneration`
- `image_annotator_processor`: `AutoProcessor`