
Comments (8)

ricardorei commented on June 5, 2024

Thanks for reporting that issue.

That problem came from updating the PyTorch Lightning version. In the older version, the on_fit_end() callback function only received two positional arguments. I thought I had solved that before updating the Lightning dependencies... I'll fix it today!
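For anyone hitting the same error in their own callbacks, here is a minimal sketch of a hook that tolerates both call conventions. It assumes the older Lightning release called on_fit_end(trainer) and the newer one calls on_fit_end(trainer, pl_module). The class name ReportCallback is hypothetical. In a real project it would subclass pytorch_lightning.Callback, which is left out here so the snippet runs without Lightning installed.

```python
from typing import Any, List, Optional, Tuple


class ReportCallback:  # hypothetical; would subclass pytorch_lightning.Callback in practice
    """Callback whose on_fit_end accepts either the old or the new argument list."""

    def __init__(self) -> None:
        self.calls: List[Tuple[Any, Optional[Any]]] = []

    def on_fit_end(self, trainer: Any, pl_module: Optional[Any] = None) -> None:
        # Older Lightning: on_fit_end(trainer)            -> pl_module stays None
        # Newer Lightning: on_fit_end(trainer, pl_module) -> both are provided
        self.calls.append((trainer, pl_module))


cb = ReportCallback()
cb.on_fit_end("trainer")            # old-style call
cb.on_fit_end("trainer", "module")  # new-style call
print(cb.calls)  # [('trainer', None), ('trainer', 'module')]
```

Giving pl_module a default of None keeps the hook backward compatible. The alternative is pinning the Lightning version in the package requirements.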

from comet.

ricardorei commented on June 5, 2024

I released version 0.0.6.post1, which should solve that... let me know if it works!

Regards


ZordoC commented on June 5, 2024

Hey!

This time the model trained successfully according to the logs!

Epoch 2: 100%|██████████| 25000/25000 [1:16:41<00:00,  5.43it/s, loss=0.056, v_num=4-35, pearson=0.924, kendall=0.81, spearman=0.946, avg_loss=0.0621] 
                                                              
Training Report Experiment:
         train_loss_step  train_loss  ...  train_avg_loss  train_loss_epoch
Epoch 0         0.183138    0.183138  ...        0.099132               NaN
Epoch 1         0.006920    0.006920  ...        0.101763          0.107044
Epoch 2         0.001943    0.001943  ...        0.065580          0.067810

[3 rows x 12 columns]

Everything looks good, but when I inspect the experiments folder:

[Screenshot, 2020-11-25 14:59:52: contents of the experiments folder]

It seems something is missing (the metadata from the CSV).

Whenever I try to load the model:

Python 3.6.9 (default, Oct  8 2020, 12:12:24) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from comet.models import load_checkpoint
>>> model  = load_checkpoint("events.out.tfevents.1606298119.ip-172-31-41-58.27572.0")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/comet/lib/python3.6/site-packages/comet/models/__init__.py", line 135, in load_checkpoint
    checkpoint, hparams=hparams
  File "/home/ubuntu/comet/lib/python3.6/site-packages/pytorch_lightning/core/saving.py", line 132, in load_from_checkpoint
    checkpoint = pl_load(checkpoint_path, map_location=lambda storage, loc: storage)
  File "/home/ubuntu/comet/lib/python3.6/site-packages/pytorch_lightning/utilities/cloud_io.py", line 32, in load
    return torch.load(f, map_location=map_location)
  File "/home/ubuntu/comet/lib/python3.6/site-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/ubuntu/comet/lib/python3.6/site-packages/torch/serialization.py", line 692, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\x18'.

I assume that's the correct way to load the model, right? If not, could you provide an example?

Best

Jose


ricardorei commented on June 5, 2024

Actually, events.out.tfevents.1606298119.ip-172-31-41-58.27572.0 is a TensorBoard log file, not the checkpoint file. The checkpoint file should end with .ckpt. From your ls output, it looks like Lightning has not saved any checkpoint...
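Here is a minimal sketch for finding the checkpoint instead of the TensorBoard event file. It assumes the training run writes its output somewhere under a local experiments/ directory. The helper names find_checkpoints and is_tensorboard_log are hypothetical and not part of COMET. They only look at file names: anything ending in .ckpt is a candidate, and events.out.tfevents.* files are ignored.

```python
from pathlib import Path
from typing import List


def find_checkpoints(root: str) -> List[Path]:
    """Return every *.ckpt file under root, newest first (by modification time)."""
    base = Path(root)
    if not base.is_dir():
        return []
    # rglob searches recursively, so nested version_*/checkpoints folders are found too.
    ckpts = [p for p in base.rglob("*.ckpt") if p.is_file()]
    return sorted(ckpts, key=lambda p: p.stat().st_mtime, reverse=True)


def is_tensorboard_log(path: str) -> bool:
    """TensorBoard event files are named events.out.tfevents.<timestamp>.<host>..."""
    return Path(path).name.startswith("events.out.tfevents")


if __name__ == "__main__":
    found = find_checkpoints("experiments")  # hypothetical output folder
    if not found:
        print("No .ckpt file found: no checkpoint was saved during training.")
    else:
        print("Latest checkpoint:", found[0])
```

If a .ckpt path is found, that path (not the events.out.tfevents.* file) is what would be passed to load_checkpoint from comet.models, as in the session above. An empty result matches the situation described here, where Lightning never wrote a checkpoint.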


ricardorei commented on June 5, 2024

[Screenshot, 2020-11-25 18:48:48]

I released another post-release, version 0.0.6.post2, which should have that fixed.

The problem was that the new Lightning version deprecated the filepath parameter of ModelCheckpoint and changed the behaviour of the period parameter. Together, these two changes meant the ModelCheckpoint callback no longer saved any checkpoints.
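Here is a minimal sketch of how an old-style filepath template could be mapped onto the newer dirpath and filename arguments of ModelCheckpoint. The helper split_filepath is hypothetical and is neither COMET nor Lightning API. It also assumes that ModelCheckpoint appends the .ckpt extension itself, so an explicit suffix is stripped to avoid saving best.ckpt.ckpt.

```python
import os
from typing import Dict

CKPT_SUFFIX = ".ckpt"  # assumed to be appended by ModelCheckpoint when saving


def split_filepath(filepath: str) -> Dict[str, str]:
    """Hypothetical helper: turn an old-style filepath template into
    dirpath/filename keyword arguments for a newer ModelCheckpoint."""
    dirpath, filename = os.path.split(filepath)
    # Drop an explicit .ckpt suffix so it is not appended twice.
    if filename.endswith(CKPT_SUFFIX):
        filename = filename[: -len(CKPT_SUFFIX)]
    return {"dirpath": dirpath, "filename": filename}


print(split_filepath("experiments/run/{epoch}-{pearson:.3f}"))
# {'dirpath': 'experiments/run', 'filename': '{epoch}-{pearson:.3f}'}
```

With Lightning installed, the result would be unpacked as ModelCheckpoint(**split_filepath(path), ...). The change to period is a separate matter and should be checked against the documentation of the installed Lightning version.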

Thanks again! All bug reports are welcome, especially now at the beginning 😃


ZordoC commented on June 5, 2024

No problem! I'll close the issue.

If there's anything I can help with, I'm interested! Maybe I could write some examples/docs on how to train a model? Would you be up for that? I've wanted to contribute to an OSS project for a while :-)

Thanks!


ricardorei commented on June 5, 2024

Yep, that would be awesome! If, for example, you write a tutorial on how to train a system, we can add it to the documentation!


ZordoC commented on June 5, 2024

Okay, I will do that :-)!

Best

