PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling

This model was presented in the paper PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling.

🌟 Overview

PaCo-RL is a reinforcement learning framework for consistent image generation. It targets the difficulty of preserving identities, styles, and logical coherence across multiple images, as required in storytelling and character design applications.

Key Components

PaCo-Reward: A pairwise consistency evaluator with task-aware instructions and chain-of-thought (CoT) reasoning.
PaCo-GRPO: Efficient RL optimization with resolution-decoupled training and log-tamed multi-reward aggregation.

Example Usage

import torch
from diffusers import FluxPipeline
from peft import PeftModel

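# Load the FLUX.1-dev base pipeline in bfloat16 on the GPU
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
    device_map="cuda"
)

# Attach the PaCo-RL LoRA adapter to the FLUX transformer
pipe.transformer = PeftModel.from_pretrained(
    pipe.transformer,
    "X-GenGroup/PaCo-FLUX.1-dev-Lora"
)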
main_prompt = "THREE-PANEL Images with a 1x3 grid layout Joker-themed posters inspired by Joaquin Phoenix's portrayal, unified through minimalist aesthetics. All posters use a minimalist style with bold outlines, textured muted green backgrounds, grunge effects, distressed yellow/red/blue/green accents, and include the header 'JOAQUIN PHOENIX' in small white capitals."
sub_prompts = [
    "[LEFT]: A poster dominated by oversized, distressed yellow 'JOKER' text spanning the upper half. The letters have jagged edges and subtle cracks, contrasting sharply against the muted green grunge background. Minimal supporting elements ensure the title commands full visual attention.",
    "[MIDDLE]: A poster symmetrically framed by 'OCTOBER 4' on the left and 'PUT ON A HAPPY FACE' on the right in crisp white text. Both phrases are aligned vertically with balanced spacing, flanking a central void filled only with faint grunge textures. Red and blue accents subtly underline the text blocks.",
    "[RIGHT]: A poster centered on a stylized profile of the Joker's face with an exaggerated, sharp-edged smile. White base makeup contrasts with vivid red lips and blue triangular eye accents. His dark green hair merges with the background, while a red suit collar and yellow vest peek from below, rendered in flat minimalist shapes."
]
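# Combine the shared description with the per-panel prompts
prompt = main_prompt + " " + " ".join(sub_prompts)

# Generate a single 1536x512 image containing the 1x3 grid
image = pipe(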
    prompt,
    height=512,
    width=1536,
    guidance_scale=3.5,
    num_inference_steps=20,
    max_sequence_length=512,
    generator=torch.Generator("cuda").manual_seed(42)
).images[0]
image.save("joker_posters.png")
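
Because the example above requests a 1x3 grid in a single 1536x512 image, it can be convenient to cut the result into individual panels. The following is a minimal sketch, not part of the PaCo-RL release: the helper name `panel_boxes` is hypothetical, and it assumes the generated panels line up with equal thirds of the image width, which the model does not guarantee.

```python
def panel_boxes(width, height, columns=3):
    """Return (left, upper, right, lower) crop boxes for a 1 x `columns` grid.

    Hypothetical helper: assumes the panels are equal-width vertical strips.
    """
    if columns <= 0 or width % columns != 0:
        raise ValueError("width must divide evenly into the requested columns")
    panel_width = width // columns
    return [
        (i * panel_width, 0, (i + 1) * panel_width, height)
        for i in range(columns)
    ]


# Crop boxes for the 1536x512 output generated above
boxes = panel_boxes(1536, 512)

# With the PIL image returned by the pipeline:
# for name, box in zip(("left", "middle", "right"), boxes):
#     image.crop(box).save(f"joker_{name}.png")
```

Each box follows the `(left, upper, right, lower)` convention accepted by PIL's `Image.crop`, so the commented loop would write three 512x512 files when run after the generation step.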

🎁 Model Zoo

Model	Type	HuggingFace
PaCo-Reward-7B	Reward Model	🤗 Link
PaCo-Reward-7B-Lora	Reward Model (LoRA)	🤗 Link
PaCo-FLUX.1-dev	T2I Model (LoRA)	🤗 Link
PaCo-FLUX.1-Kontext-dev	Image Editing Model (LoRA)	🤗 Link
PaCo-QwenImage-Edit	Image Editing Model (LoRA)	🤗 Link

⭐ Citation

@misc{ping2025pacorladvancingreinforcementlearning,
      title={PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling}, 
      author={Bowen Ping and Chengyou Jia and Minnan Luo and Changliang Xia and Xin Shen and Zhuohang Dang and Hangwei Qian},
      year={2025},
      eprint={2512.04784},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.04784}, 
}

⭐ Star us on GitHub if you find PaCo-RL helpful!
