Thank you for sharing your great work! I have a trouble when I learn to use this t

How to resume downstream training (s3prl/s3prl)

andi611 commented on June 26, 2024

Hi, the --ckpt is for loading the pre-trained model and not downstream ckpt.
However, I've added the function you need in this update: 91be1dc

Now you can use the following command to resume downstream training:
python run_downstream.py --resume=./result/result_transformer_cpc_phone/exp_632/states-70000.ckpt
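
The difference between the two flags can be illustrated with a small sketch. This is not the actual argument parser of run_downstream.py: the functions `build_parser` and `describe_mode` are hypothetical, and only the flag names (--run, --upstream, --ckpt, --resume) are taken from the commands in this thread.

```python
import argparse


def build_parser():
    # Hypothetical parser; only the flag names come from the thread.
    parser = argparse.ArgumentParser(description="sketch of a downstream runner CLI")
    parser.add_argument("--run", default=None, help="downstream task name")
    parser.add_argument("--upstream", default=None, help="upstream model name")
    parser.add_argument("--ckpt", default=None, help="pre-trained upstream checkpoint")
    parser.add_argument("--resume", default=None, help="downstream checkpoint to continue from")
    return parser


def describe_mode(argv):
    """Report which kind of run the given arguments ask for."""
    args = build_parser().parse_args(argv)
    if args.resume is not None:
        # A downstream checkpoint takes priority: continue an interrupted run.
        return "resume downstream training from " + args.resume
    if args.ckpt is not None:
        # Only upstream weights were given: start a fresh downstream run.
        return "start downstream training with upstream weights from " + args.ckpt
    return "start downstream training without a checkpoint"


if __name__ == "__main__":
    print(describe_mode(["--resume=./result/result_transformer_cpc_phone/exp_632/states-70000.ckpt"]))
```

The point of the sketch is only that --ckpt and --resume name different files: the first is the pre-trained upstream model, the second is a saved state of the downstream run.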

from s3prl.

liu-x-p commented on June 26, 2024

@andi611 Thank you for the reply! I will update the code and try that.

joram5 commented on June 26, 2024

Hi Andy,

I refreshed my code to get the --resume option, but I am getting the following failure.

I am running:

python run_downstream.py --resume=./result/result_transformer_cpc_phone/exp_632/states-8000.ckpt

I get the following failure:

[run_downstream] - getting upstream model: transformer
[Transformer] - Pre-trained weights loaded!
[Transformer] - Number of parameters: 85087488
Traceback (most recent call last):
  File "run_downstream.py", line 248, in <module>
    main()
  File "run_downstream.py", line 231, in main
    train_loader, dev_loader, test_loader = get_all_dataloaders(args, config['dataloader'])
KeyError: 'dataloader'

What can be the problem?

Regards
Joram

andi611 commented on June 26, 2024

Hi,

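I found that this error is caused by a bug in our code.
As a workaround for your current situation, please change this line (https://github.com/andi611/Self-Supervised-Speech-Pretraining-and-Representation-Learning/blob/9d83f757aa9454d72979e626ebf8124913b3cf5f/run_downstream.py#L101) to the following: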

config = yaml.load(open('./result/result_transformer_cpc_phone/exp_632/downstream.yaml', 'r'), Loader=yaml.FullLoader)

I've pushed a permanent fix for this bug in commit d2aafd8. You can refresh your code once again after you have resumed your current ckpt with the temporary fix above.
The permanent fix will allow future ckpts to run without error with --resume.

I hope this helps,
Andy
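
The workaround above hard-codes the path of downstream.yaml. A minimal sketch of a more general version follows, assuming (as in this thread) that downstream.yaml is stored in the same experiment directory as the states-*.ckpt file. The helpers `config_path_for` and `require_sections` are hypothetical names and are not part of s3prl; the YAML file itself would still be read with yaml.load as shown above.

```python
from pathlib import Path


def config_path_for(ckpt_path, config_name="downstream.yaml"):
    """Return the path of the config file stored next to a checkpoint.

    Assumption: the config lives in the same directory as the checkpoint.
    """
    return Path(ckpt_path).parent / config_name


def require_sections(config, sections=("dataloader",)):
    """Raise a descriptive error if an expected top-level section is missing."""
    missing = [name for name in sections if name not in config]
    if missing:
        raise KeyError("config is missing section(s): " + ", ".join(missing))
    return config


if __name__ == "__main__":
    ckpt = "./result/result_transformer_cpc_phone/exp_632/states-8000.ckpt"
    print(config_path_for(ckpt))
    # A config without a 'dataloader' section fails early, with a clearer message
    # than the bare KeyError in the traceback above.
    try:
        require_sections({"optimizer": {}})
    except KeyError as err:
        print(err)
```

Checking the loaded config for the sections the script needs makes a wrong or incomplete config file easier to spot than a KeyError raised deep inside main().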

joram5 commented on June 26, 2024

Thanks Andy,

Resume is working fine now.

Regards,
Joram

joram5 commented on June 26, 2024

Hi Andy,

The resume is running very well now. I have two further questions, if that is OK with you:

1. After each resume there is a drop in accuracy (ACC). Is this expected behaviour? [image attachment]
2. I have run 340000 iterations so far. Does the convergence rate shown above seem reasonable? Should I try to increase the learning rate?

Further info: I started with the following command line:

python run_downstream.py --run=phone_linear --upstream=transformer --ckpt=../S3PRL/mockingjay/fmllrLarge960-T-libri/states-1000000.ckpt

Resume command lines are of the form:

python run_downstream.py --resume=./result/result_transformer_cpc_phone/exp_632/states-324000.ckpt

Thank you,
Regards,
Joram

andi611 commented on June 26, 2024

Hi,

Since I have never used the --resume function, my experience on this subject is limited.
However, my co-worker tells me that the "drop" (a vertical line?) you see may be caused by a mismatch in the time step of the TensorBoard log. If so, it is an error in how the training log is displayed, and it does not really affect the model's performance.
We've updated this line (commit 6690f64), which I think will resolve the mismatched timestep issue.

For your second question: I am having trouble seeing the image you provided, but I believe the convergence rate is reasonable. I've run this task many times (without --resume), and the results are pretty stable.

Phone classification over 100 hrs of training data should take a while for the model to converge (on training set). However, the performance on the test set is already very high even after the first 10000 steps of training.
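
The timestep mismatch mentioned above can be illustrated with a small sketch. It is not the code from the commit: it only shows the general idea that, after a resume, the logging step should continue from the checkpoint's step instead of restarting at zero. The helpers `step_from_checkpoint` and `logged_steps` are hypothetical, and reading the step from the file name is an assumption made for the example; a real implementation would more likely store the step inside the checkpoint.

```python
import re


def step_from_checkpoint(ckpt_name):
    """Parse the step count from a name like 'states-324000.ckpt' (assumed naming)."""
    match = re.search(r"states-(\d+)\.ckpt$", ckpt_name)
    if match is None:
        raise ValueError("cannot parse a step from " + repr(ckpt_name))
    return int(match.group(1))


def logged_steps(start_step, num_updates, log_every):
    """Return the global steps at which a metric would be logged."""
    steps = []
    for local_step in range(1, num_updates + 1):
        # Continue counting from the checkpoint instead of restarting at 0.
        global_step = start_step + local_step
        if global_step % log_every == 0:
            steps.append(global_step)
    return steps


if __name__ == "__main__":
    start = step_from_checkpoint("./result/result_transformer_cpc_phone/exp_632/states-324000.ckpt")
    print(logged_steps(start, num_updates=3000, log_every=1000))
    # Without the offset, the same updates would be logged at steps that
    # overlap the start of the earlier run.
    print(logged_steps(0, num_updates=3000, log_every=1000))
```

If the resumed run logged from step 0 again, its points would be drawn on top of the early part of the curve, which can look like a sudden drop even though the model itself is unchanged.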

joram5 commented on June 26, 2024

Thanks Andy,

Regards,
Joram
