Comments (16)
I summarized the experience above and trained big-lama as follows. If I made any mistakes, please correct me.
1. Modified pytorch_lightning/trainer/connectors/checkpoint_connector.py line 106:
https://github.com/PyTorchLightning/pytorch-lightning/blob/f9f4853f3663404362c7de8614a504b0403c25b8/pytorch_lightning/trainer/connectors/checkpoint_connector.py#L106
from:

    # restore training state
    self.restore_training_state(checkpoint)

to:

    # restore training state
    try:
        self.restore_training_state(checkpoint)
    except KeyError:
        rank_zero_warn(
            "File at `resume_from_checkpoint` Trying to restore training state but checkpoint contains only the model."
        )
2. Modified lama-main/saicinpainting/training/trainers/base.py line 109 from:

    if self.config.losses.get("resnet_pl", {"weight": 0})['weight'] > 0:
        self.loss_resnet_pl = ResNetPL(**self.config.losses.resnet_pl)

to:

    if self.config.losses.get("sege_pl", {"weight": 0})['weight'] > 0:
        self.loss_sege_pl = ResNetPL(**self.config.losses.sege_pl)

3. Ran:

    python bin/train.py -cn big-lama location=my_dataset data.batch_size=10 +trainer.kwargs.resume_from_checkpoint=abspath\\to\\big-lama-with-discr-remove-loss_segm_pl.ckpt
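A note on why the change in step 2 silences the "Missing key" errors (my reading, not from the authors): the big-lama config defines losses.resnet_pl but no losses.sege_pl, so the .get() fallback of {"weight": 0} means the ResNetPL module is never instantiated and its parameters are no longer expected in the state_dict - the rename effectively disables the perceptual loss rather than remapping it.

```python
# Sketch of the lookup behavior; the `losses` dict below is a hypothetical
# stand-in for self.config.losses in the big-lama config.
losses = {"resnet_pl": {"weight": 30}}

# The renamed key is absent, so the fallback weight of 0 is returned
# and the loss module is never built:
assert losses.get("sege_pl", {"weight": 0})["weight"] == 0

# The original key would have enabled it:
assert losses.get("resnet_pl", {"weight": 0})["weight"] > 0
```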
https://drive.google.com/file/d/1YTiKZ1hQnKvTEbXIxFXjGg61pBAch_N7/view?usp=sharing
model shared by @Liang-Sen
from lama.
@windj007
Thanks for your reply.
I just removed "loss_segm_pl" from the checkpoint and it worked.
Sharing the stripped checkpoint here:
https://drive.google.com/file/d/1YTiKZ1hQnKvTEbXIxFXjGg61pBAch_N7/view?usp=sharing
Could you also share the training log or time of big-lama? Thanks so much.
Is the big-lama model trained on places-challenge dataset?
Not exactly Places Challenge - it was trained on a subset of 157 categories from Places Challenge. Please refer to the supplementary material for the exact list of these categories.
Does it perform significantly better than a big-lama trained on places2-standard?
The difference is quite noticeable to the naked eye, but the improvement from standard to the challenge subset is smaller than that from the most important contributions of our paper (e.g. masks, architecture, and segm-pl).
Could you also share the training log or time of big-lama? Thanks so much.
It took approximately 12 days to train this big-lama on 8xV100 32GB with a total batch size of 120 (8 GPUs x 15 samples).
Is it possible to release the full checkpoints of the big-lama model, so we can finetune it on other data?
I've just uploaded the full checkpoint to https://disk.yandex.ru/d/wJ2Ee0f1HvasDQ (subfolder big-lama-with-discr) - unlike the other checkpoints, this one includes the discriminator and SegmPL weights.
Please share your experience with finetuning - does it help, and how much?
Thanks so much! That is super helpful!
I'll close this issue for now - feel free to reopen if you have any issues with fine-tuning.
Hello,
I am having some issues loading big-lama-with-discr for finetuning. Please correct me if I am wrong, but I notice that the SegmPL weights are stored as loss_segm_pl.impl...
in the .ckpt, while the current trainer loads them as loss_resnet_pl.impl...
https://github.com/saic-mdal/lama/blob/ede702b19b027ad2c0380419b2b71a90fe90a14f/saicinpainting/training/trainers/base.py#L110
After modifying this, I get the following error:
KeyError: 'Trying to restore training state but checkpoint contains only the model. This is probably due to `ModelCheckpoint.save_weights_only` being set to `True`.'
@yzhouas did you have any success with this? I am wondering if it is just me.
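Since the checkpoint was apparently saved weights-only, a quick way to confirm is to inspect its top-level keys before resuming. This is a hedged sketch, not code from the repo; in PyTorch Lightning 1.2.x a full checkpoint carries optimizer and scheduler state next to 'state_dict':

```python
# Check whether a Lightning checkpoint contains resumable training state.
# A checkpoint written with ModelCheckpoint(save_weights_only=True) holds
# only the model weights, which is why restore_training_state() raises KeyError.
def has_training_state(ckpt: dict) -> bool:
    return "optimizer_states" in ckpt and "lr_schedulers" in ckpt

# Usage (path is a placeholder):
#   import torch
#   ckpt = torch.load("path/to/big-lama-with-discr/best.ckpt", map_location="cpu")
#   print(sorted(ckpt.keys()), has_training_state(ckpt))
```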
Apparently, this is a known issue in PyTorch Lightning, and for the suggested PyTorch Lightning 1.2.9 the problem seems to be here:
    # restore training state
    self.restore_training_state(checkpoint)

So, a very ugly hack would be to bypass it as:

    # restore training state
    try:
        self.restore_training_state(checkpoint)
    except KeyError:
        rank_zero_warn(
            "File at `resume_from_checkpoint` Trying to restore training state but checkpoint contains only the model."
        )
Hi @affromero !
Yeah, I forgot that we changed the name of this variable after training Big-LaMa... Another possible solution is to just strip loss_segm_pl.impl...
from the checkpoint altogether - it is initialized from a fixed ade20k checkpoint anyway.
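The stripping workaround can be sketched like this (my own sketch, assuming the standard Lightning checkpoint layout with a top-level 'state_dict'; the loss_segm_pl. prefix is the one named above, and the paths are placeholders):

```python
# Drop the frozen segmentation-perceptual-loss weights from the checkpoint;
# they are re-initialized from the fixed ade20k checkpoint anyway, so the
# stripped file loads without the loss_segm_pl/loss_resnet_pl name clash.
def strip_prefix(state_dict: dict, prefix: str = "loss_segm_pl.") -> dict:
    return {k: v for k, v in state_dict.items() if not k.startswith(prefix)}

# Usage (paths are placeholders):
#   import torch
#   ckpt = torch.load("big-lama-with-discr/best.ckpt", map_location="cpu")
#   ckpt["state_dict"] = strip_prefix(ckpt["state_dict"])
#   torch.save(ckpt, "big-lama-with-discr/best-stripped.ckpt")
```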
Trying to restore training state but checkpoint contains only the model.
I have not faced this issue yet. Have you resolved it?
Hi @windj007,
I looked into the supplementary material but was not able to find which categories from Places Challenge were used for training Big-LaMa. Could you please list these categories? Also, why didn't you use the entire Places Challenge for training Big-LaMa?
Thank you
Hi @windj007 ,
I am having the same issue loading big-lama-with-discr for finetuning; please correct me if I made any mistakes.
I ran this command:
python bin/train.py -cn big-lama location=my_dataset data.batch_size=10 +trainer.kwargs.resume_from_checkpoint=path\\to\\big-lama-with-discr\\best.ckpt
and got this error message:
RuntimeError: Error(s) in loading state_dict for DefaultInpaintingTrainingModule:
Missing key(s) in state_dict: "loss_resnet_pl.impl.conv1.weight", "loss_resnet_pl......
Unexpected key(s) in state_dict: "loss_segm_pl.impl.conv1.weight", "loss_segm_pl.impl....
I modified base.py line 109 from:

    if self.config.losses.get("resnet_pl", {"weight": 0})['weight'] > 0:
        self.loss_resnet_pl = ResNetPL(**self.config.losses.resnet_pl)

to:

    if self.config.losses.get("sege_pl", {"weight": 0})['weight'] > 0:
        self.loss_sege_pl = ResNetPL(**self.config.losses.sege_pl)
The Missing key error disappeared, but I still get the Unexpected key error message:
Unexpected key(s) in state_dict: "loss_segm_pl.impl.conv1.weight", "loss_segm_pl.impl....
Do you have any suggestions for this?
@marcelsan The list is there, on page 5.
why haven't you used the entire Places Challenge for training Big-Lama?
Bigger datasets need bigger models - and smaller models work better when the dataset is more focused. And Big-LaMa is not that big in terms of the number of trainable parameters.
The Missing key error disappeared, but I still get the Unexpected key error message:
The quick solution is a couple of comments above:
Another possible solution is to just strip loss_segm_pl.impl... from the checkpoint altogether - anyway it is initialized from a fixed ade20k checkpoint.
I should have fixed and re-uploaded the checkpoint, but have not found time yet...
@Liang-Sen thank you!