When I fine-tune the SAM model sam_h, the process is killed at about 2% of epoch 0.

What about the GPU's memory? (segment-anything-finetuner, open)

skyfallsss commented on June 23, 2024

What about the GPU's memory usage?

from segment-anything-finetuner.

Comments (3)

nikolaydyankov commented on June 23, 2024

Memory is a big issue for me too. I'm training the smallest model with a batch size of 1 and both encoders frozen, and it still uses 22 GB of VRAM. If the model could be trained on 640x640 images instead of 1024x1024, that would be a huge memory saver.
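For a rough sense of what 640x640 input could save, here is a back-of-the-envelope sketch. It assumes the 16-pixel patch size of SAM's ViT image encoder and only counts patch tokens; it is not a measurement of real VRAM use. The helper names `num_tokens` and `relative_cost` are made up for this illustration.

```python
# Rough scaling estimate only; actual memory depends on the model,
# optimizer state, and framework overhead.
PATCH_SIZE = 16  # assumed patch size of SAM's ViT image encoder


def num_tokens(image_side: int, patch_size: int = PATCH_SIZE) -> int:
    """Number of patch tokens for a square image of the given side length."""
    if image_side % patch_size != 0:
        raise ValueError("image side must be a multiple of the patch size")
    return (image_side // patch_size) ** 2


def relative_cost(side_small: int, side_large: int) -> dict:
    """Ratio of small-image cost to large-image cost under two scaling rules."""
    ratio = num_tokens(side_small) / num_tokens(side_large)
    return {
        "tokens": ratio,                 # per-token activations scale linearly
        "global_attention": ratio ** 2,  # full attention maps scale quadratically
    }


ratios = relative_cost(640, 1024)
print(num_tokens(1024), num_tokens(640))  # 4096 1600
print(ratios)
```

Under these assumptions, per-token activations would shrink to about 39% and full attention maps to about 15% of their 1024x1024 size. The real saving sits somewhere in between and would need to be measured.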


skyfallsss commented on June 23, 2024

I ran into a problem: training stops after a few epochs.


skyfallsss commented on June 23, 2024

The errors are as follows:
/home/kemosheng/.local/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:636: UserWarning: ModelCheckpoint(monitor='val_per_mask_iou') not found in the returned metrics: ['loss', 'loss_focal', 'loss_dice', 'loss_iou', 'train_per_mask_iou']. HINT: Did you call self.log('val_per_mask_iou', value) in the LightningModule?
  warning_cache.warn(m)
Traceback (most recent call last):
  File "finetune.py", line 414, in <module>
    main()
  File "finetune.py", line 398, in main
    trainer.fit(model)
  File "/home/kemosheng/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 737, in fit
    self._call_and_handle_interrupt(
  File "/home/kemosheng/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 682, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/kemosheng/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 772, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/kemosheng/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1194, in _run
    self._dispatch()
  File "/home/kemosheng/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1274, in _dispatch
    self.training_type_plugin.start_training(self)
  File "/home/kemosheng/.local/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "/home/kemosheng/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1284, in run_stage
    return self._run_train()
  File "/home/kemosheng/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1314, in _run_train
    self.fit_loop.run()
  File "/home/kemosheng/.local/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/home/kemosheng/.local/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 234, in advance
    self.epoch_loop.run(data_fetcher)
  File "/home/kemosheng/.local/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/home/kemosheng/.local/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 199, in advance
    self.update_lr_schedulers("step", update_plateau_schedulers=False)
  File "/home/kemosheng/.local/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 441, in update_lr_schedulers
    self._update_learning_rates(
  File "/home/kemosheng/.local/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 505, in _update_learning_rates
    lr_scheduler["scheduler"].step()
  File "/home/kemosheng/anaconda3/envs/sam/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 161, in step
    values = self.get_lr()
  File "/home/kemosheng/anaconda3/envs/sam/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 260, in get_lr
    return [base_lr * lmbda(self.last_epoch)
  File "/home/kemosheng/anaconda3/envs/sam/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 260, in <listcomp>
    return [base_lr * lmbda(self.last_epoch)
  File "finetune.py", line 299, in warmup_step_lr
    if steps >= milestone * self.trainer.estimated_stepping_batches:
AttributeError: 'Trainer' object has no attribute 'estimated_stepping_batches'
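
The `training_type_plugin` frames in the traceback suggest an older PyTorch Lightning release. As far as I know, `Trainer.estimated_stepping_batches` only appeared in version 1.6, so upgrading Lightning is probably the simplest fix. If upgrading is not an option, the following is a minimal sketch of a fallback. The helper `total_stepping_batches` and its `batches_per_epoch` argument are hypothetical, and it assumes the trainer exposes `max_epochs` and an integer `accumulate_grad_batches`.

```python
import math
from types import SimpleNamespace


def total_stepping_batches(trainer, batches_per_epoch: int):
    """Estimate the total number of optimizer steps for a training run.

    Uses the trainer's own estimate when the attribute exists; otherwise
    falls back to a manual count (an assumption, not Lightning's formula).
    """
    built_in = getattr(trainer, "estimated_stepping_batches", None)
    if built_in is not None:
        return built_in
    # Fallback for trainers without the attribute.
    accumulate = getattr(trainer, "accumulate_grad_batches", 1) or 1
    steps_per_epoch = math.ceil(batches_per_epoch / accumulate)
    return steps_per_epoch * trainer.max_epochs


# Stand-in objects instead of real Trainer instances, for illustration only.
old_trainer = SimpleNamespace(max_epochs=10, accumulate_grad_batches=2)
new_trainer = SimpleNamespace(max_epochs=10, estimated_stepping_batches=1234)

print(total_stepping_batches(old_trainer, batches_per_epoch=101))
print(total_stepping_batches(new_trainer, batches_per_epoch=101))
```

In `warmup_step_lr`, the comparison could then use `milestone * total_stepping_batches(self.trainer, n)`, where `n` is the length of the training dataloader. This has not been tested against the repository's code.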
