A major recent paper on erasing is OmniEraser [2]. They open-sourced an evaluation dataset [3] (and I'm using it for the evaluation of our LBM-Eraser).
It's not a big dataset (70 samples), but the pairs are good quality, and that's what matters!
When repurposing a T2I model into a pure I2I model, there's always that orphaned text path: what do we do with it?
You can reuse it as learnable embeddings in multi-task setups [2], freeze it on an empty text prompt, or distill or prune the corresponding weights.
In LBM, they take a clever route: zeroing [3] and reshaping [4] the text-related cross-attentions into self-attentions. This gives you fresh weights for I2I computation, nicely integrated into your SD architecture.
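To make the idea concrete, here is a toy numpy sketch of that conversion. All names and shapes are hypothetical (the real model operates on SD latents with multi-head attention; see [3] and [4] for the actual procedure): the text-side K/V projections are replaced by image-sized ones, and the output projection is zeroed so the new self-attention branch is a no-op at initialization and doesn't disturb the pretrained backbone.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_text = 320, 768  # hypothetical image/text feature dims

# Pretrained cross-attention: queries from image tokens,
# keys/values from text tokens (d_text-dimensional).
W_q = rng.standard_normal((d_model, d_model)) * 0.02
W_k_text = rng.standard_normal((d_text, d_model)) * 0.02  # orphaned
W_v_text = rng.standard_normal((d_text, d_model)) * 0.02  # orphaned

# "Reshape": K/V must now accept image tokens, so they get
# image-sized (d_model x d_model) weights instead of text-sized ones.
W_k = rng.standard_normal((d_model, d_model)) * 0.02
W_v = rng.standard_normal((d_model, d_model)) * 0.02

# "Zero": output projection starts at zero, so the converted layer
# contributes nothing at step 0 and is learned from scratch.
W_out = np.zeros((d_model, d_model))

def self_attn(x: np.ndarray) -> np.ndarray:
    """Single-head self-attention over image tokens x: (n, d_model)."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    a = np.exp(q @ k.T / np.sqrt(d_model))
    a /= a.sum(axis=-1, keepdims=True)
    return (a @ v) @ W_out

x = rng.standard_normal((16, d_model))  # 16 image tokens
out = self_attn(x)  # all zeros at init, thanks to the zeroed W_out
```

The zeroed output projection is the same trick used by ControlNet-style adapters: new capacity is added without changing the network's function until training moves the weights.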
In the LBM paper, the noise and the conditioning image are merged into a single composite image.
Unlike other inpainting methods (which typically grey-mask the missing area), LBM replaces the masked region with uniformly sampled random pixels.
Intuitively, since LBM is trained from a text-to-image (T2I) model, those random pixels act as a strong signal to the pretrained model, essentially saying: "This is where you can do your generative magic."
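A minimal sketch of that compositing step, in pixel space for readability (LBM itself works on SD latents, and the sizes here are arbitrary): the masked region is filled with uniformly sampled random values rather than a flat grey.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 64, 64
image = rng.random((H, W, 3))        # conditioning image, values in [0, 1]
mask = np.zeros((H, W), dtype=bool)  # True = region to erase/regenerate
mask[16:48, 16:48] = True

# Instead of grey-masking the hole, fill it with uniform random pixels:
# pure noise where the model should generate, clean pixels elsewhere.
noise = rng.random((H, W, 3))
composite = np.where(mask[..., None], noise, image)
```

The conditioning pixels outside the mask are passed through untouched, so the model only has "freedom" exactly where the random fill is.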
We (finegrain) trained this new model in partnership with Nfinite, using some of their synthetic data, and the resulting model is incredibly accurate. It's all open source under the MIT license (finegrain/finegrain-box-segmenter), complete with a test set tailored for e-commerce (finegrain/finegrain-product-masks-lite). Have fun experimenting with it!