A custom Modular Diffusers block that uses Florence-2 for image annotation tasks like segmentation, object detection, and captioning.
```py
import torch
from diffusers import ModularPipeline
from diffusers.utils import load_image

# Load the block
image_annotator = ModularPipeline.from_pretrained(
    "diffusers/Florence2-image-Annotator",
    trust_remote_code=True
)
image_annotator.load_components(torch_dtype=torch.bfloat16)
image_annotator.to("cuda")

# Load an image
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg")
image = image.resize((1024, 1024))

# Generate a segmentation mask
output = image_annotator(
    image=image,
    annotation_task="<REFERRING_EXPRESSION_SEGMENTATION>",
    annotation_prompt="the car",
    annotation_output_type="mask_image",
)
output.mask_image[0].save("car-mask.png")
```
The block can also be inserted into an existing inpainting workflow so the mask is generated automatically:

```py
import torch
from diffusers import ModularPipeline
from diffusers.utils import load_image

# Load the annotator
image_annotator = ModularPipeline.from_pretrained(
    "diffusers/Florence2-image-Annotator",
    trust_remote_code=True
)

# Get an inpainting workflow and insert the annotator as the first block
# repo_id = ..  # any pipeline that supports inpainting works (SDXL, Flux, Qwen, ...)
inpaint_blocks = ModularPipeline.from_pretrained(repo_id).blocks.get_workflow("inpainting")
inpaint_blocks.sub_blocks.insert("image_annotator", image_annotator.blocks, 0)

# Initialize the combined pipeline
pipe = inpaint_blocks.init_pipeline()
pipe.load_components(torch_dtype=torch.float16, device="cuda")

# Example inputs: the annotator produces the mask the inpainting blocks
# consume, so no manual mask is passed
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg"
).resize((1024, 1024))
prompt = "a shiny red sports car"  # example prompt

# Inpaint with automatic mask generation
output = pipe(
    prompt=prompt,
    image=image,
    annotation_task="<REFERRING_EXPRESSION_SEGMENTATION>",
    annotation_prompt="the car",
    annotation_output_type="mask_image",
    num_inference_steps=30,
    output="images"
)
output[0].save("inpainted-car.png")
```
The block supports the following Florence-2 annotation tasks:

| Task | Description |
|---|---|
| `<OD>` | Object detection |
| `<REFERRING_EXPRESSION_SEGMENTATION>` | Segment specific objects based on text |
| `<CAPTION>` | Generate an image caption |
| `<DETAILED_CAPTION>` | Generate a detailed caption |
| `<MORE_DETAILED_CAPTION>` | Generate a very detailed caption |
| `<DENSE_REGION_CAPTION>` | Caption different regions of the image |
| `<CAPTION_TO_PHRASE_GROUNDING>` | Ground phrases to image regions |
| `<OPEN_VOCABULARY_DETECTION>` | Detect objects from an open vocabulary |
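Switching tasks only changes the call arguments. A minimal sketch, assuming the `image_annotator` pipeline and `image` from the first example; the prompt text and output filename are illustrative:

```py
# Ground a free-text phrase and draw bounding boxes on the image
output = image_annotator(
    image=image,
    annotation_task="<CAPTION_TO_PHRASE_GROUNDING>",
    annotation_prompt="the car",
    annotation_output_type="bounding_box",
)
# annotated image lands in `image` (see the outputs table below);
# indexing mirrors the `mask_image[0]` access above
output.image[0].save("car-boxes.png")
```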
Supported `annotation_output_type` values:

| Type | Description |
|---|---|
| `mask_image` | Black-and-white mask image |
| `mask_overlay` | Mask overlaid on the original image |
| `bounding_box` | Bounding boxes drawn on the image |
Inputs:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `image` | `PIL.Image` | Yes | - | Image to annotate |
| `annotation_task` | `str` | No | `<REFERRING_EXPRESSION_SEGMENTATION>` | Task to perform |
| `annotation_prompt` | `str` | Yes | - | Text prompt for the task |
| `annotation_output_type` | `str` | No | `mask_image` | Output format |
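Since `annotation_task` and `annotation_output_type` have defaults, only `image` and `annotation_prompt` are strictly required. A sketch of the shortest possible call, assuming the block is loaded as above:

```py
# Falls back to the defaults: <REFERRING_EXPRESSION_SEGMENTATION> + mask_image
output = image_annotator(image=image, annotation_prompt="the car")
output.mask_image[0].save("car-mask.png")
```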
Outputs:

| Parameter | Type | Description |
|---|---|---|
| `mask_image` | `PIL.Image` | Generated mask (when the output type is `mask_image`) |
| `image` | `PIL.Image` | Annotated image (when the output type is `mask_overlay` or `bounding_box`) |
| `annotations` | `dict` | Raw annotation predictions |
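Which image field is populated depends on `annotation_output_type`, while `annotations` carries the raw Florence-2 predictions per the table above. A sketch, assuming the segmentation call from the first example and attribute access in the same style as `output.mask_image`:

```py
# The mask lands in `mask_image`; the raw predictions stay in `annotations`
output = image_annotator(
    image=image,
    annotation_task="<REFERRING_EXPRESSION_SEGMENTATION>",
    annotation_prompt="the car",
    annotation_output_type="mask_image",
)
print(output.annotations)  # raw prediction dict from Florence-2
```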
This block uses the following components from `florence-community/Florence-2-base-ft`:

- `image_annotator`: `Florence2ForConditionalGeneration`
- `image_annotator_processor`: `AutoProcessor`