Hi, I am in the process of retraining the encoder and was wondering w

Pausing for testing during training (pixel2style2pixel issue, closed)

eladrich commented on May 27, 2024

Pausing for testing during training

from pixel2style2pixel.

Comments (7)

yuval-alaluf commented on May 27, 2024

OK, that makes more sense. You (most likely) won't be able to run training and inference on the same GPU.
Can I ask why you are running inference.py during training? During training, we log the input, target, and output images for both the training and test images during each validation interval. This should help you follow the training progress.
Moreover, during training we log the training and test losses to TensorBoard, which will help you understand how the model performs on the test data over time. This should help you determine which checkpoint seems to be the best. I would then run inference.py on the best checkpoint you found.

yuval-alaluf commented on May 27, 2024

All good.
Regarding your question, by default we run for 500,000 steps, but that is more than we used, and more than you will probably need.
We plot the training and test losses during training using TensorBoard. I would recommend connecting to TensorBoard to see when the model stops improving. When you see that the model has converged (i.e., the test losses stop decreasing), you can stop training and use the best checkpoint obtained to perform inference.
I hope that helps answer your question.
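The "stop when the test loss stops decreasing" advice can be made concrete with a simple patience check. Below is a minimal sketch, assuming you have already copied the test loss recorded at each validation step (for example, read off TensorBoard) into two lists. The function name best_and_converged and its patience/min_delta parameters are hypothetical and not part of pixel2style2pixel; the loss values in the usage example are made up for illustration.

```python
from typing import Sequence, Tuple


def best_and_converged(
    steps: Sequence[int],
    test_losses: Sequence[float],
    patience: int = 3,
    min_delta: float = 0.0,
) -> Tuple[int, bool]:
    """Return (best_step, converged) for a series of validation results.

    best_step: the last validation step at which the test loss improved on
    the previous best by more than min_delta.
    converged: True if at least `patience` validations have been recorded
    since best_step without such an improvement.
    """
    if not steps or len(steps) != len(test_losses):
        raise ValueError("steps and test_losses must be non-empty and equal length")
    best_idx = 0
    for i, loss in enumerate(test_losses):
        # Only count it as an improvement if it beats the best by > min_delta.
        if loss < test_losses[best_idx] - min_delta:
            best_idx = i
    # How many validations have happened since the best one.
    since_best = len(test_losses) - 1 - best_idx
    return steps[best_idx], since_best >= patience


if __name__ == "__main__":
    # Hypothetical validation steps and test losses, for illustration only.
    steps = [5000, 10000, 15000, 20000, 25000, 30000]
    losses = [0.50, 0.42, 0.39, 0.395, 0.391, 0.392]
    print(best_and_converged(steps, losses))  # (15000, True)
```

Here the loss last improved at step 15,000, and three further validations brought no improvement, so the sketch would suggest stopping and running inference.py on the checkpoint from step 15,000. Raising patience or min_delta makes the rule more or less conservative.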

yuval-alaluf commented on May 27, 2024

Hi. If I understand you correctly, calling self.validate() in the Coach class results in an out-of-memory error?
If that is the case, I wouldn't recommend pausing training, or terminating and restarting it, because the current code does not save the state of the optimizer.
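To illustrate why restarting without the optimizer state is not the same as continuing training, here is a toy example. It is not pSp's optimizer or training code: it is a from-scratch scalar Adam minimizing f(x) = x², with an arbitrarily chosen starting point and learning rate. Resuming with the saved moment estimates and step counter reproduces the uninterrupted run exactly, while resuming with a fresh optimizer takes different steps from the same parameter value.

```python
import math
from dataclasses import dataclass


@dataclass
class AdamState:
    m: float = 0.0  # running estimate of the gradient mean (first moment)
    v: float = 0.0  # running estimate of the squared gradient (second moment)
    t: int = 0      # number of updates so far, used for bias correction


def adam_step(x, grad, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to scalar x, mutating `state` in place."""
    state.t += 1
    state.m = beta1 * state.m + (1 - beta1) * grad
    state.v = beta2 * state.v + (1 - beta2) * grad * grad
    m_hat = state.m / (1 - beta1 ** state.t)
    v_hat = state.v / (1 - beta2 ** state.t)
    return x - lr * m_hat / (math.sqrt(v_hat) + eps)


def train(x, steps, state):
    """Minimize f(x) = x**2, whose gradient is 2x, for `steps` updates."""
    for _ in range(steps):
        x = adam_step(x, 2.0 * x, state)
    return x


x0 = 5.0
# 1) One uninterrupted run of 20 updates.
uninterrupted = train(x0, 20, AdamState())

# 2) Stop after 10 updates, then resume with the same optimizer state.
state = AdamState()
x_mid = train(x0, 10, state)
resumed_with_state = train(x_mid, 10, state)

# 3) Stop after 10 updates, then resume with a fresh optimizer (state lost).
resumed_fresh = train(x_mid, 10, AdamState())

if __name__ == "__main__":
    print("uninterrupted:     ", uninterrupted)
    print("resumed with state:", resumed_with_state)
    print("resumed fresh:     ", resumed_fresh)
```

Runs 1 and 2 end at exactly the same value, while run 3 ends elsewhere because its moment estimates and bias correction restart from scratch. In a real model the discrepancy can be much larger than in this one-parameter toy.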
If you find that you are unable to run validation during training, I would recommend setting the save interval to, say, 5000 and the validation interval to the maximum number of steps you're training for. In doing so, you will not run validation during training, but you will save a checkpoint every 5000 steps. After training, you will have a set of checkpoints that you can evaluate using inference.py and our metrics scripts.
(5000 is just an example; feel free to change it depending on how often you think you should save checkpoints.)
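The checkpoint-only schedule suggested above can be sanity-checked by listing which steps would trigger a save and which would trigger validation. This is a sketch under an assumption: that the training loop fires an event when step % interval == 0 or when step == max_steps, counting steps from 1. The actual Coach loop may differ in details (for example, whether step 0 also triggers validation). In pSp the relevant settings are presumably the --save_interval, --val_interval and --max_steps training options; confirm both the option names and the trigger rule against the training script before relying on this. The helper name trigger_steps is hypothetical.

```python
from typing import List


def trigger_steps(max_steps: int, interval: int) -> List[int]:
    """Steps in 1..max_steps at which an event fires, assuming the rule
    `step % interval == 0 or step == max_steps` (an assumption, see above)."""
    if max_steps < 1 or interval < 1:
        raise ValueError("max_steps and interval must be positive")
    # Every multiple of `interval` up to and including max_steps.
    steps = list(range(interval, max_steps + 1, interval))
    # The final step always fires, even if it is not a multiple of interval.
    if not steps or steps[-1] != max_steps:
        steps.append(max_steps)
    return steps


if __name__ == "__main__":
    max_steps = 20000
    saves = trigger_steps(max_steps, interval=5000)
    vals = trigger_steps(max_steps, interval=max_steps)
    print("checkpoints saved at:", saves)  # [5000, 10000, 15000, 20000]
    print("validation runs at:", vals)     # [20000]
```

Under the assumed rule, setting the validation interval equal to max_steps leaves a single validation run at the very last step, while checkpoints are still written every 5000 steps.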

spamfold3r commented on May 27, 2024

I'm not sure whether this is the same thing as what you're saying, but I am trying to run inference.py in a different terminal instance whilst training is happening, which results in an error relating to insufficient CUDA memory.

I'm just trying to test the checkpoints using inference.py, but am unsure of how to do this. I am using the recommended settings listed in the training section of the documentation.

spamfold3r commented on May 27, 2024

Ahhhhhh I see, thank you. I was not aware of the visualizations throughout the training process, hence why I thought of running inference.py. My apologies.

As a newcomer, I have another question: at what point should training stop? Will it stop at a designated point, or is it a matter of monitoring the scores and terminating it when you see fit?

spamfold3r commented on May 27, 2024

Yes, that's just what I was after! Once again, thank you for your help. 😎

yuval-alaluf commented on May 27, 2024

Happy to help!
