Comments (7)
Ok, that makes more sense. You (most likely) won't be able to run training and inference on the same GPU.
Can I ask why you are running inference.py during training? During training we output logs on the training and test images showing the input, target, and output images at each validation interval. This should help you follow the training progress.
Moreover, during training we output the training and testing logs using TensorBoard, which will help you understand how the model performs on the test data over time. This should help you determine which checkpoint seems to be best. I would then run inference.py on the best checkpoint you found.
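To make the "pick the best checkpoint" step concrete, here is a minimal sketch. It assumes you have noted the test loss at each saved step (for example by reading it off the TensorBoard curves); the step numbers and loss values below are made up for illustration and are not results from this project.

```python
# Hypothetical (step -> test loss) values, e.g. read off the TensorBoard test curves.
test_losses = {
    5000: 0.412,
    10000: 0.371,
    15000: 0.356,
    20000: 0.359,
    25000: 0.363,
}


def best_checkpoint_step(losses):
    """Return the training step whose logged test loss is lowest."""
    return min(losses, key=losses.get)


best_step = best_checkpoint_step(test_losses)
print(best_step)
```

The checkpoint saved at that step is the one I would pass to inference.py.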
from pixel2style2pixel.
All good.
Regarding your question, by default we run for 500,000 steps, but that is more than we used and more than you will probably need.
We plot the training and test losses during training using tensorboard. I would recommend connecting to tensorboard to see when the model stops improving. When you see the model has converged (i.e. the test losses stop decreasing), you can stop training and use the best checkpoint obtained to perform inference.
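If you prefer a rule of thumb over eyeballing the curve, the check can be sketched roughly as follows. This is only an illustration, not part of the training code: the function name, the `patience` and `min_delta` parameters, and the loss values are all assumptions.

```python
def has_converged(test_losses, patience=3, min_delta=0.0):
    """Return True if the last `patience` evaluations did not improve on the
    best earlier test loss by more than `min_delta`."""
    if len(test_losses) <= patience:
        return False  # not enough history to judge yet
    best_before = min(test_losses[:-patience])
    recent_best = min(test_losses[-patience:])
    return recent_best >= best_before - min_delta


# Hypothetical test losses, one per validation interval.
still_improving = [0.50, 0.40, 0.35, 0.34, 0.33, 0.32]
plateaued = [0.50, 0.40, 0.35, 0.36, 0.355, 0.37]

print(has_converged(still_improving))
print(has_converged(plateaued))
```

A larger `patience` makes the check less sensitive to noise in the test loss, at the cost of training a little longer before stopping.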
I hope that helps answer your question.
from pixel2style2pixel.
Hi. If I understand you correctly, calling self.validate() in Coach() results in an out-of-memory error?
If that is the case, I wouldn't recommend pausing training or terminating and restarting because the current code does not keep the state of the optimizer.
If you find that you are unable to run validation during training, I would recommend setting the save interval to, say, 5000 and the validation interval to the maximum number of steps you're training for. In doing so, you will not run validation during training, but you will save a checkpoint every 5000 steps. After training, you will have a set of checkpoints that you can validate using inference.py and our metrics scripts.
(5000 was used as an example; feel free to change it depending on how often you think you should save checkpoints.)
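To show the effect of those two settings, here is a small sketch. It assumes each event fires whenever the step count is a multiple of its interval, which is a simplification for illustration and not a copy of the training loop; `event_steps` and the value of `max_steps` are hypothetical.

```python
def event_steps(max_steps, interval):
    """Steps in 1..max_steps at which an event with the given interval fires,
    assuming a simple 'step is a multiple of interval' rule."""
    return list(range(interval, max_steps + 1, interval))


max_steps = 20000          # hypothetical training length
save_interval = 5000       # example value from above
val_interval = max_steps   # pushes validation out to the final step

save_steps = event_steps(max_steps, save_interval)
val_steps = event_steps(max_steps, val_interval)

print(save_steps)
print(val_steps)
```

Under that assumption, checkpoints are written every 5000 steps, while validation can fire at most once, on the very last step, so it never interrupts training along the way.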
from pixel2style2pixel.
I'm not sure whether this is the same thing as what you're saying, but I am trying to run inference.py in a different terminal instance whilst training is happening, which results in an error relating to insufficient CUDA memory.
I'm just trying to test the checkpoints using inference.py, but am unsure of how to do this. I am using the recommended settings listed in the training section of the documentation.
from pixel2style2pixel.
Ahhhhhh I see, thank you. I was not aware of the visualizations throughout the training process, hence why I thought of running inference.py. My apologies.
As a newcomer, another question - at what point should training stop? Will it stop at a designated point or is it a matter of monitoring the scores and terminating when you see fit?
from pixel2style2pixel.
Yes that's just what I was after! Once again, thank you for your help. 😎
from pixel2style2pixel.
Happy to help!
from pixel2style2pixel.