I did two fine-tunings of the model: Starting f

"Collapse" of regularization image class in fine-tuned models about dreambooth-stable-diffusion HOT 5 OPEN

xavierxiao commented on August 26, 2024

"Collapse" of regularization image class in fine-tuned models

from dreambooth-stable-diffusion.

Comments (5)

robertsehlke commented on August 26, 2024

Not too surprisingly, the "man" and "woman" classes seem to be entangled. Here is what I get for "a photo of a woman" with the three checkpoints:

[Images not preserved: samples from the original checkpoint, checkpoint (1), and checkpoint (2).]


XavierXiao commented on August 26, 2024

Well, I think you should indeed use the original model to produce the reg images, since fine-tuning does some mysterious things to the model (remember, the model is also fine-tuned on the reg images). I have some thoughts: given that SD cannot generate realistic photos of humans, why not use some random images of humans obtained online for regularization? Maybe you can try that as well. I feel like the original SD model will always produce some faint black-and-white human images for prompts like "photo of a man/woman", so an external, diverse set of human photos may serve as better regularization.


robertsehlke commented on August 26, 2024

The first round of regularization images (from the untuned model) are pretty good/usable, so that should be fine.

I was just wondering why generating images with the regularization class noun after fine-tuning leads to such strong drift/collapse for the noun, especially since you mentioned in the README that the paper appears to generate regularization images on the fly.

I'll try using more regularization images. On a closer reading, the paper does mention that ∼200 × N "a [class noun]" samples are generated, with N being the size of the subject dataset. So we're looking at a recommended 800-1000 images.
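For reference, here is the arithmetic behind that estimate as a rough sketch. It assumes the ∼200 × N figure quoted from the paper and a subject set of 4 to 5 images (the N implied by the 800-1000 range). The function name is made up for illustration and is not part of this repo.

```python
def recommended_reg_images(num_subject_images: int, samples_per_image: int = 200) -> int:
    """Estimate the regularization set size as roughly 200 x N.

    num_subject_images: N, the number of subject training images.
    samples_per_image: multiplier taken from the ~200 x N figure (an approximation).
    """
    if num_subject_images < 1:
        raise ValueError("need at least one subject image")
    return samples_per_image * num_subject_images


if __name__ == "__main__":
    # Subject sets of 4 and 5 images are assumed here for illustration.
    for n in (4, 5):
        print(f"N={n}: about {recommended_reg_images(n)} regularization images")
```

By that estimate, a run with only a few dozen regularization images is far below what the paper describes.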


TingTingin commented on August 26, 2024

Well, I think you should indeed use the original model to produce the reg images, since fine-tuning does some mysterious things to the model (remember, the model is also fine-tuned on the reg images). I have some thoughts: given that SD cannot generate realistic photos of humans, why not use some random images of humans obtained online for regularization? Maybe you can try that as well. I feel like the original SD model will always produce some faint black-and-white human images for prompts like "photo of a man/woman", so an external, diverse set of human photos may serve as better regularization.

Have you tested that? Does it produce better results?


robertsehlke commented on August 26, 2024

I've now tried it with more regularization images (~300, mix of curated images from the original model + internet photos, at the default 800 steps) - it seems to help a little bit with preserving diversity, but the class prior is still degraded.

Pretty impressive how well the ad hoc regularization works to generate/edit one intended new concept, but this issue limits it a bit. Not sure if they completely solved it in the Dreambooth paper either (though they're clearly aware) or just staved it off with far more regularization images.
