Comments (6)
Are you training on the GPU? I'm not sure what "stuck" means here, but it sounds like training could just be proceeding very slowly, and TF using the CPU for training is a very common reason for that.
from google-research.
I second Jon's analysis of this: it looks like the CPU is being used here, which can be extremely slow and make training appear stuck. After waiting a long time (say, an hour), do you see anything written to your model directory?
I would recommend training with a GPU if that is possible, as it will be much faster. The last log line relates to Intel's OpenMP thread mapping, probably because Intel's MKL-DNN is being used. However, I have not seen those logs during training before, and I see no reason why they would cause stalling.
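One rough way to check whether a GPU is visible at all is sketched below. The helper name describe_gpu_visibility is hypothetical and not part of this repository, and the check assumes an NVIDIA GPU. tf.config.list_physical_devices only exists in newer TensorFlow releases, so the sketch skips that step when TensorFlow or the function is missing.

```python
import os
import shutil


def describe_gpu_visibility():
    """Collect a few hints about whether training can use a GPU.

    Hypothetical helper for illustration; it only reads the environment
    and does not change any TensorFlow settings.
    """
    info = {
        # True if the NVIDIA driver tool is on PATH (suggests a driver is installed).
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        # An empty string or "-1" here hides every GPU from TensorFlow.
        "cuda_visible_devices": os.environ.get("CUDA_VISIBLE_DEVICES"),
        # Filled in below only if TensorFlow can be imported and queried.
        "tf_gpus": None,
    }
    try:
        import tensorflow as tf
    except ImportError:
        return info

    # Older TensorFlow releases do not expose this function, so look it up defensively.
    list_devices = getattr(getattr(tf, "config", None), "list_physical_devices", None)
    if list_devices is not None:
        info["tf_gpus"] = [device.name for device in list_devices("GPU")]
    return info


if __name__ == "__main__":
    for key, value in describe_gpu_visibility().items():
        print(f"{key}: {value}")
```

If tf_gpus comes back as an empty list, TensorFlow is most likely falling back to the CPU, which would match the slow training described above.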
from google-research.
I just tried training with the current code, and it seems to produce model checkpoints as output. @aasharma90, can you confirm that model checkpoints aren't being produced when you run this? It's a little confusing because training doesn't print loss/epoch statements, but that seems to be a visualization issue, not a correctness issue.
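A quick way to confirm whether checkpoints are being written is to list the model directory. The sketch below assumes the usual Estimator file layout (model.ckpt-<step>.index files for checkpoints and events.out.tfevents.* files for TensorBoard); the helper name summarize_model_dir and the example path are hypothetical.

```python
import os
import re

# Estimator checkpoints are normally saved as model.ckpt-<step>.{index,meta,data-*}.
_CKPT_INDEX = re.compile(r"model\.ckpt-(\d+)\.index$")


def summarize_model_dir(model_dir):
    """Return the checkpoint steps and event files found directly in model_dir."""
    steps = set()
    event_files = []
    for name in os.listdir(model_dir):
        match = _CKPT_INDEX.match(name)
        if match:
            steps.add(int(match.group(1)))
        elif name.startswith("events.out.tfevents."):
            event_files.append(name)
    return {
        "checkpoint_steps": sorted(steps),
        "event_files": sorted(event_files),
    }


if __name__ == "__main__":
    # Hypothetical path; replace with the value passed as the model directory.
    model_dir = "/tmp/model_dir"
    if os.path.isdir(model_dir):
        print(summarize_model_dir(model_dir))
    else:
        print(f"{model_dir} does not exist yet")
```

Running this a few minutes apart shows whether the highest checkpoint step is advancing. If it is, training is progressing and only the logging is silent.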
from google-research.
To print the loss in the terminal during training, add tf.logging.set_verbosity(tf.logging.INFO) to set a high enough verbosity to see the training metrics. You can add this line right before the call to tf.estimator.train_and_evaluate(...) in train.py.
By default, Estimator will log every 100 steps. You can change this by modifying the config in train.py:
config = tf.estimator.RunConfig(FLAGS.model_dir, log_step_count_steps=[num of steps])
You may find it easier to use TensorBoard to visualize training progress, which can be done by running tensorboard --logdir=[path to model dir] in a separate terminal during or after training, and opening the printed URL in a web browser.
from google-research.
Thanks Tim. A CL is in flight; I'll close this issue once it has landed.
from google-research.
Hi @timothybrooks and @jonbarron,
Regarding your questions:
- I thought the default setting would be to run on the GPU? Sorry, I'm very new to TF, so I'm not very familiar with this. Could you please let me know how that can be done? You can have a look at the command I used for training in my original post above.
- I added Tim's suggestions in train.py:
...
config = tf.estimator.RunConfig(FLAGS.model_dir, log_step_count_steps=1)
...
tf.logging.set_verbosity(tf.logging.INFO)
Training is still at the same point I mentioned.
- Launching tensorboard, I can see the model graph, but I cannot see the training profile.
from google-research.