Comments (13)
Hi @pedropalb,
This fix is included in the newly released Trains v0.14.1 :)
$ pip install trains==0.14.1
from clearml.
Hi @pedropalb, we were able to reproduce the bug.
Working on a solution :) I'll update here once an RC is available.
BTW:
Awesome workaround :)
If you just want to disable the Tensorboard auto-magic you can do (ref):
task = Task.init('examples', 'Keras callbacks', auto_connect_frameworks={'tensorflow': False, })
If I understand you correctly, you have a single Tensorboard file, with two loss graphs, one "val" one "train".
This scenario should have had a single output graph in TRAINS ("loss") with two series, one "val" one "train", or two graphs ("val" and "train") with a single series per graph, named "loss".
If this is not what you are getting, could you please send a short code snippet to reproduce the problem?
Thanks for the quick reply.
In our scenario we have a single experiment with two tensorboard event writer instances, and two different file-names.
One for train and one for validation.
We track both of them simultaneously.
In TensorBoard two "events" files are created, one for each file-name.
The reason for the problem is that all the scalars, plots and graphs have exactly the same names for train and validation. This is done on purpose.
The problem is that the latest call to EventWriter overrides the previous one, because the exact same hierarchy names are used for both train and validation.
TrainsAI does not use the events file name, so the overriding happens.
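The collision described above can be sketched with a minimal model of a scalar store keyed only by (title, series, iteration), ignoring the event file name. The names and store structure here are illustrative, not Trains internals:

```python
# Minimal sketch of the collision: two writers (train/validation) report
# scalars under identical (title, series) keys; a store that ignores the
# event file name keeps only the last write.
scalars = {}

def report_scalar(title, series, value, iteration):
    # The key omits the writer/file name, so same-named metrics collide.
    scalars[(title, series, iteration)] = value

# The train writer logs "loss" for iteration 0...
report_scalar("loss", "loss", 0.9, iteration=0)
# ...then the validation writer logs "loss" for the same iteration.
report_scalar("loss", "loss", 1.2, iteration=0)

# Only the validation value survives; the train point was overridden.
print(scalars[("loss", "loss", 0)])  # → 1.2
```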
Is that clearer?
@daganesh if I understand correctly, you are creating two EventWriters, writing two protobuf log files (Tensorboard log files).
Is this correct?
If this is indeed the case, then currently Trains does not support multiple tensorboard writers in the same experiment. If this is a popular use case, we will add it.
Correct.
I do believe this is a popular use case.
Please refer also to the mnist code sample in your example projects.
You could see two event writers, one for train and one for validation.
In this example you can also see the problem:
- I guess you meant to comment out the train writer, but in one place it was not commented out properly.
- So you get two active writers simultaneously.
- The result is that in the server plots you see, for every 100 plot points:
  - 99 training sample points
  - 1 validation point (the 100th), which overrides the train point.
@daganesh thank you for noticing that this bug is also evident in the examples.
If you like, you can already test the fix in the latest release-candidate :)
$ pip install trains==0.10.3rc0
The implemented solution logic is: if you only have a single Tensorboard writer, everything stays the same.
If you have an additional writer, using the same combination of title/series, then it will have a prefix/suffix based on the logdir.
You can test it with this example code
In this example, since the first writer used is the "testing" writer, the "train" writer will get a "train" prefix.
We didn't want to force prefixes on both, since writers are not always created before they are used, so there is no way to know in advance that there will be two Tensorboard writers...
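A rough sketch of the described logic: the first writer to use a (title, series) pair keeps the bare title; a later writer reusing the same pair gets a logdir-based prefix. The class and method names are assumptions for illustration, not the actual Trains implementation:

```python
class ScalarRouter:
    """Sketch: first writer keeps bare titles; later writers that reuse a
    (title, series) pair get their logdir prepended as a prefix."""

    def __init__(self):
        # (title, series) -> logdir of the first writer to use it
        self.owner = {}

    def resolve(self, logdir, title, series):
        key = (title, series)
        first = self.owner.setdefault(key, logdir)
        if first == logdir:
            return title            # single writer: everything stays the same
        return f"{logdir}/{title}"  # extra writer: prefix based on logdir

router = ScalarRouter()
# First writer used is "testing", so its titles stay bare...
print(router.resolve("testing", "loss", "loss"))  # → loss
# ...and the second writer's title gets the "train" prefix.
print(router.resolve("train", "loss", "loss"))    # → train/loss
```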
Thanks for the quick fix.
We'll give it a try
Hi @daganesh,
The fix is already part of the latest RC, let me know if it works for you as well.
$ pip install trains==0.10.3rc2
Closing due to lack of activity.
Hi,
I'm using the Keras API of Tensorflow 2.1. I monitor my model using the TensorBoard callback. In its current implementation, the validation and train metrics have the same name but are written by two different writers, so the plots are grouped by name. For example, the validation and train loss curves are plotted in the same figure under the name epoch_loss; the validation and train accuracy curves are plotted in the same figure under the name epoch_accuracy; and so on.
In trains 0.14.0, only one curve is plotted for each name. I suppose the validation curves are overwriting the train curves. Is that a bug? Or did I miss some configuration to get the same behaviour as Tensorboard?
Thanks!
As a workaround, I set auto_connect_frameworks=False and created a Keras callback as follows:
import tensorflow as tf

class TrainsLogger(tf.keras.callbacks.Callback):
    def __init__(self, logger, *args, **kwargs):
        super(TrainsLogger, self).__init__(*args, **kwargs)
        self.logger = logger

    def on_epoch_end(self, epoch, logs=None):
        relevant_metric_names = ['loss',
                                 'precision',
                                 'sensitivity',
                                 'specificity',
                                 'balanced_accuracy']
        for name in relevant_metric_names:
            self.logger.report_scalar(name, 'train', logs[f'{name}'], epoch)
            self.logger.report_scalar(name, 'validation', logs[f'val_{name}'], epoch)
But unfortunately, this way I lose the awesome auto-magical advantage! To keep part of it, I could keep auto_connect_frameworks=True, but then the results section becomes a mess with duplicates unless I click "view full screen" and deselect them one by one, which is not convenient.
Hi @pedropalb, as promised, an RC with a fix is out. Could you verify?
$ pip install trains==0.14.1rc2