Hello, I am executing python train.py --dataset owndataset --vit_name R50-ViT-B_16

train error: after a time, NaN or Inf found in input tensor.

beckschen commented on August 24, 2024

Training error: after a while, training fails with "NaN or Inf found in input tensor."

Comments (2)

Beckschen commented on August 24, 2024

Hello, thanks for your question. May I ask what learning rate you used with batch_size=12?

andife commented on August 24, 2024

The following is the startup output:

(base) user@pc1:~/project_TransUNet/TransUNet$ python train.py --dataset owndataset --vit_name R50-ViT-B_16 --batch_size 12 --max_iterations 1000 --max_epochs 350
Namespace(base_lr=0.005, batch_size=12, dataset='Owndataset', deterministic=1, exp='TU_Owndataset224', img_size=224, is_pretrain=True, list_dir='./lists/lists_Owndataset', max_epochs=350, max_iterations=1000, n_gpu=1, n_skip=3, num_classes=2, root_path='../data/Owndataset/train_npz', seed=1234, vit_name='R50-ViT-B_16', vit_patches_size=16)
The length of train set is: 234
20 iterations per epoch. 7000 max iterations
0%| | 0/350 [00:00<?, ?it/s]iteration 1 : loss : 0.541960, loss_ce: 0.527893
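
The "20 iterations per epoch. 7000 max iterations" line can be reproduced by hand, which also suggests why the --max_iterations 1000 passed on the command line does not show up in the log. The following is only a sketch: it assumes the data loader keeps the last partial batch and that the trainer recomputes the total as max_epochs times the number of batches per epoch. The helper name iterations_summary is hypothetical and not part of the repository.

```python
import math


def iterations_summary(num_samples, batch_size, max_epochs):
    """Return (iterations per epoch, total iterations) for a training run."""
    # Assumption: the last, smaller batch is kept (drop_last=False),
    # so the number of batches per epoch is rounded up.
    iters_per_epoch = math.ceil(num_samples / batch_size)
    # Assumption: the total is derived from max_epochs, not from --max_iterations.
    max_iterations = max_epochs * iters_per_epoch
    return iters_per_epoch, max_iterations


# Values taken from the log above: 234 samples, batch_size=12, max_epochs=350.
iters_per_epoch, max_iterations = iterations_summary(234, 12, 350)
print(f"{iters_per_epoch} iterations per epoch. {max_iterations} max iterations")
```

With 234 samples and a batch size of 12, this gives 20 batches per epoch and 350 * 20 = 7000 iterations, matching the log.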

I realized that I had changed train.py so that it matches test.py.

I added the following lines:
< if args.batch_size != 24 and args.batch_size % 6 == 0:
<     args.base_lr *= args.batch_size / 24
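
To make the effect of those two lines concrete, here is a small standalone sketch of the same scaling rule. The function name scale_base_lr and the reference_batch_size parameter are hypothetical, introduced only for illustration. The starting value of 0.01 is an assumption, since the thread only shows the resulting base_lr=0.005.

```python
def scale_base_lr(base_lr, batch_size, reference_batch_size=24):
    """Scale the learning rate linearly with batch size, as in the added lines."""
    # Same condition as the snippet above: only rescale when the batch size
    # differs from the reference (24) and is a multiple of 6.
    if batch_size != reference_batch_size and batch_size % 6 == 0:
        return base_lr * (batch_size / reference_batch_size)
    return base_lr


# Assumption: the unscaled learning rate was 0.01 (not stated in the thread).
print(scale_base_lr(0.01, 12))
```

Under that assumption, a batch size of 12 halves the learning rate, which would be consistent with the base_lr=0.005 shown in the Namespace output.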

=> Maybe it would be better if testing did not require specifying every training parameter again, and instead needed only the checkpoint directory holding the values for that specific model? What prompted my change was that, because train.py and test.py treat the learning rate differently, the same command-line arguments did not let me run a training and then a test.
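
One possible shape for that suggestion, sketched with the standard library only: write the parsed arguments to a JSON file in the checkpoint directory at training time and read them back at test time. This is not how the repository currently works. The helper names save_run_config and load_run_config and the file name run_config.json are hypothetical.

```python
import argparse
import json
import os
import tempfile


def save_run_config(args, snapshot_dir, filename="run_config.json"):
    """Write the argparse Namespace next to the checkpoints (training side)."""
    os.makedirs(snapshot_dir, exist_ok=True)
    path = os.path.join(snapshot_dir, filename)
    with open(path, "w") as f:
        json.dump(vars(args), f, indent=2, sort_keys=True)
    return path


def load_run_config(snapshot_dir, filename="run_config.json"):
    """Rebuild the Namespace from the checkpoint directory (test side)."""
    with open(os.path.join(snapshot_dir, filename)) as f:
        return argparse.Namespace(**json.load(f))


# Round trip with a few of the values shown in the log above.
train_args = argparse.Namespace(
    base_lr=0.005, batch_size=12, img_size=224, vit_name="R50-ViT-B_16"
)
with tempfile.TemporaryDirectory() as snapshot_dir:
    save_run_config(train_args, snapshot_dir)
    test_args = load_run_config(snapshot_dir)

print(test_args == train_args)
```

With something like this, the test script would need only the checkpoint directory, and any learning-rate adjustment made during training would be recorded once rather than recomputed from the command line.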
