Thank you for sharing your great work!! When I ran train.py with my

ValueError: `val_check_interval` (25000) must be less than or equal to the number of the training batches (3006)

advimman commented on August 21, 2024

ValueError: `val_check_interval` (25000) must be less than or equal to the number of the training batches (3006)



cohimame commented on August 21, 2024

Hello!
Sorry for the late answer!

Suppose we have a training dataset of 10000 images, 2 GPUs, and batch_size=5 images per GPU (in the .yaml config, batch_size is defined per GPU). Then:

total_batch_size == 10 (2 gpu x 5 imgs)
train_batches == 1000 (10000 imgs / total_batch_size)

limit_train_batches <= 1000
val_check_interval <= 1000
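The arithmetic above can be sketched in Python. This is a minimal sketch, not code from the lama repo: it assumes DDP training with a sampler that splits the dataset evenly across GPUs (padding if needed) and a DataLoader with drop_last=False, so a last partial batch still counts. The helper name num_train_batches is hypothetical.

```python
import math


def num_train_batches(num_images: int, num_gpus: int, batch_size_per_gpu: int) -> int:
    """Training batches per epoch for each GPU process (hypothetical helper).

    Assumes the sampler splits the dataset evenly across GPUs (padding if
    needed) and the DataLoader keeps the last partial batch (drop_last=False).
    """
    images_per_gpu = math.ceil(num_images / num_gpus)
    return math.ceil(images_per_gpu / batch_size_per_gpu)


if __name__ == "__main__":
    batches = num_train_batches(num_images=10000, num_gpus=2, batch_size_per_gpu=5)
    print(f"train_batches = {batches}")
    # Neither trainer value may exceed this number
    print(f"limit_train_batches <= {batches}")
    print(f"val_check_interval <= {batches}")
```

If the DataLoader uses drop_last=True, the second division rounds down instead of up.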


cohimame commented on August 21, 2024

Hi!
Thank you for your appreciation of our work!

[2021-12-07 11:28:26,947][main][CRITICAL] - Training failed due to val_check_interval (25000) must be less than or equal to the number of the training batches (3006). If you want to disable validation set limit_val_batches to 0.0 instead.:

With the current batch_size and 12022 training images, our training procedure produces 3006 training batches per epoch, while the config asks it to run validation every 25000 batches.
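The failing check can be reproduced with a small sketch. It approximates how PyTorch Lightning treats an integer limit_train_batches (it caps the number of batches per epoch) and then compares val_check_interval against the result. The helper names are hypothetical, and the total batch size of 4 is an assumption chosen only because ceil(12022 / 4) = 3006 matches the log; the actual batch configuration is not shown in the thread.

```python
import math


def effective_train_batches(dataloader_len: int, limit_train_batches: int) -> int:
    # An integer limit_train_batches can only lower the batch count, never raise it
    return min(dataloader_len, limit_train_batches)


def val_check_interval_ok(val_check_interval: int, num_training_batches: int) -> bool:
    # The condition behind the ValueError in the log above
    return val_check_interval <= num_training_batches


if __name__ == "__main__":
    # Assumed total batch size of 4: 12022 images then give 3006 batches
    dataloader_len = math.ceil(12022 / 4)
    for limit, interval in [(25000, 25000), (3006, 3006)]:
        n = effective_train_batches(dataloader_len, limit)
        status = "ok" if val_check_interval_ok(interval, n) else "ValueError"
        print(f"limit_train_batches={limit}, val_check_interval={interval}: "
              f"{n} batches -> {status}")
```

This shows why limit_train_batches: 25000 alone does not cause more batches to exist: the cap has no effect above the real batch count, so only lowering val_check_interval to at most 3006 removes the error.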

The simplest way to fix this error is to open
lama/configs/training/trainer/any_gpu_large_ssim_ddp_final.yaml
and edit the following parameters:

# @package _group_
kwargs:
  ...
  limit_train_batches: 3006 # <- was 25000
  val_check_interval: 3006 # or less, before it was ${trainer.kwargs.limit_train_batches}

Let us know if it works!


naoki7090624 commented on August 21, 2024

Thank you for the quick response!

I edited these parameters and ran it again.

# @package _group_
kwargs:
  ...
  limit_train_batches: 3006
  val_check_interval: 3006

But I got the following error.

ValueError: `val_check_interval` (3006) must be less than or equal to the number of the training batches (1203). If you want to disable validation set `limit_val_batches` to 0.0 instead.

So I changed it to val_check_interval: 1203, and it worked fine.
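A sketch of how the second number can arise and how to pick a value that always fits. The thread does not say what changed between the two runs; a total batch size of 10 (for example 2 GPUs x 5 images per GPU, an assumption) is one configuration that turns 12022 images into 1203 batches. The helper name pick_val_check_interval is hypothetical.

```python
import math


def pick_val_check_interval(desired: int, num_images: int, total_batch_size: int) -> int:
    """Clamp a desired val_check_interval to the number of training batches.

    Assumes drop_last=False, so the last partial batch still counts.
    """
    num_batches = math.ceil(num_images / total_batch_size)
    return min(desired, num_batches)


if __name__ == "__main__":
    # Assumed total batch size of 10: ceil(12022 / 10) = 1203
    print(pick_val_check_interval(desired=3006, num_images=12022, total_batch_size=10))
```

Because val_check_interval counts batches, the value has to be recomputed whenever the dataset size, GPU count, or per-GPU batch_size changes; a value smaller than the batch count simply runs validation more than once per epoch.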

I am not sure whether these values are appropriate. How can I calculate appropriate values for limit_train_batches and val_check_interval?


naoki7090624 commented on August 21, 2024

I got it! Thank you!!
