
Comments (13)

bmartinn commented on July 29, 2024

Hi @pedropalb,

This fix is included in the newly released Trains v0.14.1 :)

$ pip install trains==0.14.1

from clearml.

bmartinn commented on July 29, 2024

Hi @pedropalb, we were able to reproduce the bug.
Working on a solution :) I'll update here once an RC is available.

BTW:
Awesome workaround :)
If you just want to disable the Tensorboard auto-magic you can do (ref):

task = Task.init('examples', 'Keras callbacks', auto_connect_frameworks={'tensorflow': False, })


bmartinn commented on July 29, 2024

If I understand you correctly, you have a single Tensorboard file, with two loss graphs, one "val" one "train".

This scenario should have had a single output graph in TRAINS ("loss") with two series, one "val" one "train", or two graphs ("val" and "train") with a single series per graph, named "loss".

If this is not what you are getting, could you please send a short code snippet to reproduce the problem?


daganesh commented on July 29, 2024

Thanks for the quick reply.
In our scenario we have a single experiment with two tensorboard event writer instances, and two different file-names.
One for train and one for validation.
We track both of them simultaneously.
In TensorBoard two "events" files are created, one for each file-name.

The reason for the problem is that all the scalars, plots and graphs have exactly the same names for train and validation. This is done on purpose.

The problem is that the latest call to the EventWriter overrides the previous one, because exactly the same hierarchy names are used for both train and validation.
TrainsAI does not use the events file-name, so the overriding happens.
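The collision can be sketched in plain Python (hypothetical names; this only illustrates the keying described above, not Trains' actual internals):

```python
# Hypothetical illustration: a backend that keys scalars only by
# (title, series, step) and ignores the events file name entirely.
store = {}

def report_scalar(title, series, value, step, event_file):
    # event_file is not part of the key, so two writers that use
    # identical names overwrite each other's points.
    store[(title, series, step)] = value

# The train writer logs step 100, then the validation writer logs the
# same title/series at the same step from a different events file.
report_scalar("loss", "loss", 0.42, 100, "events.out.train")
report_scalar("loss", "loss", 0.57, 100, "events.out.validation")

print(store[("loss", "loss", 100)])  # only the validation value survives
```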

Is that clearer?


bmartinn commented on July 29, 2024

@daganesh if I understand correctly, you are creating two EventWriters, writing two protobuf log files (Tensorboard log files).
Is this correct?
If this is indeed the case, then currently Trains does not support multiple tensorboard writers in the same experiment. If this is a popular use case, we will add it.


daganesh commented on July 29, 2024

Correct.
I do believe this is a popular use case.
Please also refer to the mnist code sample in your example projects.
There you can see two event writers, one for train and one for validation.
In this example you can also see the problem:

  • I guess you meant to comment out the train writer, but in one place it was not commented out properly.
  • So you get two active writers simultaneously.
  • The result is that, in the server plots, for every 100 plot points you see:
    • 99 training sample points
    • 1 validation point (the 100th), which overrides the train point.


bmartinn commented on July 29, 2024

@daganesh thank you for noticing this bug is also evident in the examples.

If you like, you can already test the fix in the latest release-candidate :)
$ pip install trains==0.10.3rc0

The implemented solution logic is: if you have only a single Tensorboard writer, everything stays the same.
If an additional writer uses the same combination of title/series, its titles get a prefix/suffix based on the logdir.

You can test it with this example code.
In this example, since the first writer used is the "testing" writer, the "train" writer will get a "train" prefix.
We didn't want to force prefixes on both writers, since Tensorboard writers are not always created before they are used, so there is no way to know in advance that there will be two of them...
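A minimal sketch of that rule (hypothetical helper names, not Trains' actual implementation): the first writer keeps the plain title, and any later writer that reuses a title/series pair gets a prefix taken from its logdir.

```python
import os

# seen maps (title, series) -> logdir of the first writer to use it.
seen = {}

def resolve_title(title, series, logdir):
    key = (title, series)
    first = seen.setdefault(key, logdir)
    if first == logdir:
        return title  # first (or same) writer: title stays unchanged
    # A second writer reused the name: prefix with its logdir's basename.
    return f"{os.path.basename(logdir)} {title}"

# The "testing" writer logs first, so it keeps the plain title;
# the "train" writer then gets the "train" prefix.
print(resolve_title("loss", "loss", "/tmp/logs/testing"))  # loss
print(resolve_title("loss", "loss", "/tmp/logs/train"))    # train loss
```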


daganesh commented on July 29, 2024

Thanks for the quick fix.
We'll give it a try.


bmartinn commented on July 29, 2024

Hi @daganesh,
The fix is already part of the latest RC, let me know if it works for you as well.
$ pip install trains==0.10.3rc2


bmartinn commented on July 29, 2024

Closing due to lack of activity.


pedropalb commented on July 29, 2024

Hi,

I'm using the Keras API of Tensorflow 2.1. I monitor my model using the TensorBoard callback. In its current implementation, the validation and train metrics have the same name but are written by two different writers, so the plots are grouped by name. For example, the validation and train loss curves are plotted in the same figure under the name epoch_loss; the validation and train accuracy curves are plotted in the same figure under the name epoch_accuracy; and so on.

In trains 0.14.0, only one curve is plotted for each name. I suppose the validation curves are overwriting the train curves. Is that a bug? Or did I miss some configuration needed to get the same behaviour as Tensorboard?

Thanks!


pedropalb commented on July 29, 2024

As a workaround, I set auto_connect_frameworks=False and created a Keras callback as follows:

import tensorflow as tf


class TrainsLogger(tf.keras.callbacks.Callback):
    def __init__(self, logger, *args, **kwargs):
        super(TrainsLogger, self).__init__(*args, **kwargs)
        self.logger = logger  # a trains Logger instance

    def on_epoch_end(self, epoch, logs=None):
        relevant_metric_names = ['loss',
                                 'precision',
                                 'sensitivity',
                                 'specificity',
                                 'balanced_accuracy']

        # Report each metric under one title, with a 'train' and a
        # 'validation' series, so both curves share a single plot.
        for name in relevant_metric_names:
            self.logger.report_scalar(name, 'train', logs[f'{name}'], epoch)
            self.logger.report_scalar(name, 'validation', logs[f'val_{name}'], epoch)

But unfortunately, this way I lose the awesome automagical advantage! To keep part of it, I could keep auto_connect_frameworks=True, but then the results section becomes a mess of duplicates unless I click "view full screen" and deselect them one by one, which is not convenient.


bmartinn commented on July 29, 2024

Hi @pedropalb, as promised, an RC with the fix is out. Could you verify?

$ pip install trains==0.14.1rc2

