Comments (8)
Thanks for reporting that issue.
That problem came from updating the PyTorch Lightning version. In the older version, the on_fit_end()
callback function only received 2 positional arguments. I thought I had solved that before updating the Lightning dependencies... I'll fix that today!
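For readers who hit the same kind of error, here is a minimal sketch of how a hook written for an older call convention breaks when the caller starts passing one more positional argument. It uses plain Python only; the class names and call_hook_new_style are hypothetical stand-ins, not Lightning or COMET code, and the exact argument that Lightning added is an assumption.

```python
class OldStyleCallback:
    # Hook written against the older convention: self plus one argument.
    def on_fit_end(self, trainer):
        return "old hook ran"


class TolerantCallback:
    # Accepts any extra positional/keyword arguments a newer caller may add.
    def on_fit_end(self, trainer, *args, **kwargs):
        return f"hook ran with {len(args)} extra positional argument(s)"


def call_hook_new_style(callback, trainer, module):
    # Hypothetical stand-in for a framework that now passes one more argument.
    return callback.on_fit_end(trainer, module)


try:
    call_hook_new_style(OldStyleCallback(), "trainer", "module")
    old_result = "no error"
except TypeError as err:
    # The old signature cannot absorb the extra positional argument.
    old_result = f"TypeError: {err}"

new_result = call_hook_new_style(TolerantCallback(), "trainer", "module")
print(old_result)
print(new_result)
```

Accepting *args and **kwargs is only one way to stay compatible across versions; matching the new signature exactly is stricter and usually easier to read.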
from comet.
I released version 0.0.6.post1, which solves that... tell me if it works!
Regards
from comet.
Hey!
This time the model trained successfully according to the logs!
Epoch 2: 100%|██████████| 25000/25000 [1:16:41<00:00, 5.43it/s, loss=0.056, v_num=4-35, pearson=0.924, kendall=0.81, spearman=0.946, avg_loss=0.0621]
Training Report Experiment:
train_loss_step train_loss ... train_avg_loss train_loss_epoch
Epoch 0 0.183138 0.183138 ... 0.099132 NaN
Epoch 1 0.006920 0.006920 ... 0.101763 0.107044
Epoch 2 0.001943 0.001943 ... 0.065580 0.067810
[3 rows x 12 columns]
All looks good, but when inspecting the experiments folder:
It seems like something is missing (the metadata from the csv).
Whenever I try to load the model:
Python 3.6.9 (default, Oct 8 2020, 12:12:24)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from comet.models import load_checkpoint
>>> model = load_checkpoint("events.out.tfevents.1606298119.ip-172-31-41-58.27572.0")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/comet/lib/python3.6/site-packages/comet/models/__init__.py", line 135, in load_checkpoint
    checkpoint, hparams=hparams
  File "/home/ubuntu/comet/lib/python3.6/site-packages/pytorch_lightning/core/saving.py", line 132, in load_from_checkpoint
    checkpoint = pl_load(checkpoint_path, map_location=lambda storage, loc: storage)
  File "/home/ubuntu/comet/lib/python3.6/site-packages/pytorch_lightning/utilities/cloud_io.py", line 32, in load
    return torch.load(f, map_location=map_location)
  File "/home/ubuntu/comet/lib/python3.6/site-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/ubuntu/comet/lib/python3.6/site-packages/torch/serialization.py", line 692, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\x18'.
I guess that's the correct way of loading the model, right? If not, could you provide an example?
Best
Jose
from comet.
Actually, events.out.tfevents.1606298119.ip-172-31-41-58.27572.0 is a TensorBoard file, not the checkpoint file! The checkpoint file should end with .ckpt. From your ls, it looks like Lightning has not saved any checkpoint...
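A quick way to check this before calling load_checkpoint is to scan the experiment folder for .ckpt files. The sketch below is only an illustration using the standard library; find_checkpoints and describe are hypothetical helper names, not part of COMET, and the folder layout is an assumption.

```python
from pathlib import Path


def find_checkpoints(experiment_dir):
    """Return every *.ckpt file under experiment_dir, sorted by path."""
    root = Path(experiment_dir)
    return sorted(p for p in root.rglob("*.ckpt") if p.is_file())


def describe(path):
    """Classify a file by name: checkpoint, TensorBoard event file, or other."""
    name = Path(path).name
    if name.endswith(".ckpt"):
        return "checkpoint"
    if name.startswith("events.out.tfevents"):
        return "tensorboard event file"
    return "other"


if __name__ == "__main__":
    # "experiments" is a placeholder; point it at your own run directory.
    checkpoints = find_checkpoints("experiments")
    if not checkpoints:
        print("No .ckpt file found: nothing to pass to load_checkpoint.")
    for ckpt in checkpoints:
        print(ckpt)
```

If the list comes back empty, the training run did not save a checkpoint, so there is no file that load_checkpoint can read.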
from comet.
I released another post-release version, 0.0.6.post2, that should have that fixed.
The problem was that the new Lightning version deprecated the filepath parameter of ModelCheckpoint and changed the behaviour of the period parameter. These two changes made the ModelCheckpoint callback useless.
Thanks once again! All bug reports are welcome, especially now at the beginning 😃
from comet.
No problems! I'll close the issue.
If there is anything I can help with, I'm interested! Maybe I could write some examples/docs on how to train a model? Would you be up for that? I've been interested in contributing to an OSS project for a while :-)
Thanks!
from comet.
Yep, that would be awesome! If, for example, you write a tutorial on how to train a system, we can add it to the documentation!
from comet.
Okay, I will do that :-)
Best
from comet.