Comments (13)
Hi @pedropalb,
This fix is included in the newly released Trains v0.14.1 :)
$ pip install trains==0.14.1
from clearml.
Hi @pedropalb, we were able to reproduce the bug.
Working on a solution :) I'll update here once an RC is available.
BTW:
Awesome workaround :)
If you just want to disable the Tensorboard auto-magic you can do (ref):
task = Task.init('examples', 'Keras callbacks', auto_connect_frameworks={'tensorflow': False, })
If I understand you correctly, you have a single Tensorboard file, with two loss graphs, one "val" one "train".
This scenario should have had a single output graph in TRAINS ("loss") with two series, one "val" one "train", or two graphs ("val" and "train") with a single series per graph, named "loss".
If this is not what you are getting, could you please send a short code snippet to reproduce the problem?
Thanks for the quick reply.
In our scenario we have a single experiment with two tensorboard event writer instances, and two different file-names.
One for train and one for validation.
We track both of them simultaneously.
In TensorBoard two "events" files are created, one for each file-name.
The reason for the problem is that all the scalars, plots and graphs have exactly the same names for train and validation. This is done on purpose.
The problem is that the latest call to EventWriter overrides the previous one, because the exact same hierarchy names are used for both train and validation.
TrainsAI does not use the events file name, so the overriding happens.
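The collision described above can be sketched with a minimal model of a scalar store keyed only by (title, series, iteration), ignoring the event file name. The names and store structure here are illustrative, not Trains internals:

```python
# Minimal sketch of the collision: two writers (train/validation) report
# scalars under identical (title, series) keys; a store that ignores the
# event file name keeps only the last write.
scalars = {}

def report_scalar(title, series, value, iteration):
    # The key omits the writer/file name, so same-named metrics collide.
    scalars[(title, series, iteration)] = value

# The train writer logs "loss" for iteration 0...
report_scalar("loss", "loss", 0.9, iteration=0)
# ...then the validation writer logs "loss" for the same iteration.
report_scalar("loss", "loss", 1.2, iteration=0)

# Only the validation value survives; the train point was overridden.
print(scalars[("loss", "loss", 0)])  # → 1.2
```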
Is that clearer?
@daganesh if I understand correctly, you are creating two EventWriters, writing two protobuf log files (Tensorboard log files).
Is this correct?
If this is indeed the case, then currently Trains does not support multiple tensorboard writers in the same experiment. If this is a popular use case, we will add it.
Correct.
I do believe this is a popular use case.
Please refer also to the mnist code sample in your example projects.
You could see two event writers, one for train and one for validation.
In this example you can also see the problem:
- I guess you meant to comment out the train writer, but in one place it was not commented out properly.
- So you get two active writers simultaneously.
- The result is that in the server plots you see, for every 100 plot points:
  - 99 training sample points
  - 1 validation point (the 100th), which overrides the train point.
@daganesh thank you for noticing that this bug is also evident in the examples.
If you like, you can already test the fix in the latest release-candidate :)
$ pip install trains==0.10.3rc0
The implemented solution logic is: if you only have a single Tensorboard writer, everything stays the same.
If you have an additional writer, using the same combination of title/series, then it will have a prefix/suffix based on the logdir.
You can test it with this example code
In this example, since the first writer used is the "testing" writer, the "train" writer will get a "train" prefix.
We didn't want to force prefixes on both, since writers are not always created before they are used, so there is no way to know in advance that there will be two Tensorboard writers...
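A rough sketch of the described logic: the first writer to use a (title, series) pair keeps the bare title; a later writer reusing the same pair gets a logdir-based prefix. The class and method names are assumptions for illustration, not the actual Trains implementation:

```python
class ScalarRouter:
    """Sketch: first writer keeps bare titles; later writers that reuse a
    (title, series) pair get their logdir prepended as a prefix."""

    def __init__(self):
        # (title, series) -> logdir of the first writer to use it
        self.owner = {}

    def resolve(self, logdir, title, series):
        key = (title, series)
        first = self.owner.setdefault(key, logdir)
        if first == logdir:
            return title            # single writer: everything stays the same
        return f"{logdir}/{title}"  # extra writer: prefix based on logdir

router = ScalarRouter()
# First writer used is "testing", so its titles stay bare...
print(router.resolve("testing", "loss", "loss"))  # → loss
# ...and the second writer's title gets the "train" prefix.
print(router.resolve("train", "loss", "loss"))    # → train/loss
```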
Thanks for the quick fix.
We'll give it a try
Hi @daganesh,
The fix is already part of the latest RC, let me know if it works for you as well.
$ pip install trains==0.10.3rc2
Closing due to lack of activity.
Hi,
I'm using the Keras API of Tensorflow 2.1. I monitor my model using the TensorBoard callback. In its current implementation, the validation and train metrics have the same name but are written by two different writers, so the plots are grouped by name. For example, the validation and train loss curves are plotted in the same figure under the name epoch_loss; the validation and train accuracy curves are plotted in the same figure under the name epoch_accuracy; and so on.
In trains 0.14.0, only one curve is plotted for each name. I suppose the validation curves are overwriting the train curves. Is that a bug? Or did I miss some configuration to get the same behaviour as Tensorboard?
Thanks!
As a workaround, I set auto_connect_frameworks=False and created a Keras callback as follows:
import tensorflow as tf

class TrainsLogger(tf.keras.callbacks.Callback):
    def __init__(self, logger, *args, **kwargs):
        super(TrainsLogger, self).__init__(*args, **kwargs)
        self.logger = logger

    def on_epoch_end(self, epoch, logs=None):
        relevant_metric_names = ['loss',
                                 'precision',
                                 'sensitivity',
                                 'specificity',
                                 'balanced_accuracy']
        for name in relevant_metric_names:
            self.logger.report_scalar(name, 'train', logs[f'{name}'], epoch)
            self.logger.report_scalar(name, 'validation', logs[f'val_{name}'], epoch)
But unfortunately, this way I lose the awesome auto-magical advantage! To keep part of it, I could keep auto_connect_frameworks=True, but then the results section becomes a mess with duplicates unless I click "view full screen" and deselect them one by one, which is not convenient.
Hi @pedropalb, as promised, an RC with a fix is out. Could you verify?
$ pip install trains==0.14.1rc2