When I try to train the model, I cannot find the file "/mnt/nfs/scratch1/pchakrabarty/

initial weight download about detectron-self-train HOT 6 CLOSED

arunirc commented on August 15, 2024

initial weight download

from detectron-self-train.

Comments (6)

PCJohn commented on August 15, 2024

All other models are trained by finetuning the baseline model (starting from the bdd_peds.pth checkpoint).
The baseline model is trained "from scratch", but it does use the pretrained ResNet initialization. You'll have to download that pretrained backbone model.

See the section "Download Pretrained Backbone Model" in INSTALL.md: https://github.com/AruniRC/detectron-self-train/blob/master/INSTALL.md

PCJohn commented on August 15, 2024

This is the path to the baseline model.
You can download it from this location: http://maxwell.cs.umass.edu/self-train/models/bdd_ped_models/bdd_baseline/bdd_peds.pth

The link is in the table in the "Models" section in the README
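
As a minimal sketch of fetching that checkpoint from Python: the URL is the one quoted above, while the helper names and the `data/models` destination directory are assumptions made for illustration, not paths the repository requires.

```python
import os
import urllib.request

# URL quoted in the comment above.
BDD_PEDS_URL = (
    "http://maxwell.cs.umass.edu/self-train/models/"
    "bdd_ped_models/bdd_baseline/bdd_peds.pth"
)


def checkpoint_path(url, dest_dir):
    """Return the local path where the file at `url` would be stored."""
    return os.path.join(dest_dir, os.path.basename(url))


def download_checkpoint(url=BDD_PEDS_URL, dest_dir="data/models"):
    """Download `url` into `dest_dir` unless the file is already there.

    `dest_dir` is a hypothetical location; point it at wherever your
    training script expects the checkpoint.
    """
    os.makedirs(dest_dir, exist_ok=True)
    path = checkpoint_path(url, dest_dir)
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path
```

Calling `download_checkpoint()` returns the local path, which can then be substituted for the missing path in the training script.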

liyunsheng13 commented on August 15, 2024

Do you mean that the baseline model is used as the initialization for training the other models? For the baseline model itself, I cannot find any initialization model in the training script. Is this an oversight, or is it handled somewhere else in the source code? I would expect that training the baseline model needs at least the pretrained ResNet as initialization, but I cannot find where that happens.
I trained the baseline model with your code for 70000 iterations and only got 10 mIoU, which is worse than the reported result (~15). Do you think this is caused by random initialization?

AruniRC commented on August 15, 2024

@liyunsheng13 it may be a good first step to make sure you have installed everything correctly and the inference demo is working: https://github.com/AruniRC/detectron-self-train#inference-demo

If the demo is working and giving you the expected detection output, then the training scripts should work properly. If there is any further confusion please let us know.

BTW, the line that loads the ImageNet-pretrained ResNet weights for training the BDD baseline is in the config YAML: https://github.com/AruniRC/detectron-self-train/blob/master/configs/baselines/bdd100k.yaml#L7

liyunsheng13 commented on August 15, 2024

Hi, when I use the training script "bdd_source_and_HP18k.sh", I find that NUM_GPU is 1. Is there a typo here? I thought it would be 4 or 8. If I use 1, I get an assertion error. Could you let me know how many GPUs you used and the batch size per GPU, both for "bdd_source_and_HP18k.sh" and for the baseline results? It seems that you used 8 GPUs with batch size = 1 for the baseline results, which leaves me a little confused.

AruniRC commented on August 15, 2024

Hi @liyunsheng13,

The Detectron train_net_step script scales the learning rate and other settings based on (a) the number of GPUs available and (b) the NUM_GPU specified in the training config YAML.
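
To illustrate the kind of rescaling described here, the following is a rough sketch of a linear scaling rule. It is not the repository's actual code: the function name, its parameters, and the exact rule are assumptions made for illustration.

```python
def rescale_for_gpus(base_lr, config_num_gpus, ims_per_gpu,
                     visible_gpus, batch_size=None):
    """Hypothetical linear-scaling sketch, not the repository's code.

    base_lr, config_num_gpus, ims_per_gpu: values as written in the YAML.
    visible_gpus: number of GPUs actually visible at run time.
    batch_size: total batch size requested at run time; defaults to the
        batch size implied by the YAML.
    Returns (scaled_lr, images_per_gpu_at_run_time).
    """
    original_batch = config_num_gpus * ims_per_gpu
    if batch_size is None:
        batch_size = original_batch
    # The total batch must split evenly across the visible GPUs.
    if batch_size % visible_gpus != 0:
        raise ValueError("batch size must be divisible by the GPU count")
    new_ims_per_gpu = batch_size // visible_gpus
    # Linear scaling rule: the learning rate follows the total batch size.
    scaled_lr = base_lr * batch_size / original_batch
    return scaled_lr, new_ims_per_gpu
```

Under this assumed rule, a YAML written for 8 GPUs with 2 images each keeps its learning rate when run on 4 GPUs with the same total batch, and halves it when the total batch is halved.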

When we trained, we kept the YAML unchanged and set the number of GPUs at run time (this ensures that the learning rate scaling is handled correctly inside the code). On a cluster, this is set with the Slurm option --gres=gpu:1, which makes 1 GPU visible. Similarly, on a local machine, we had to set CUDA_VISIBLE_DEVICES to 1. If this solves your assertion error, let us know, and we will update the README accordingly.
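
For the local-machine case, a small sketch of restricting a training run to one GPU from Python follows. The helper name is hypothetical and the script path in the usage comment is a placeholder. Note that CUDA_VISIBLE_DEVICES takes device indices rather than a count, so a value of 1 exposes only the device with index 1.

```python
import os


def single_gpu_env(gpu_index=0, base_env=None):
    """Return a copy of the environment exposing exactly one GPU.

    gpu_index is a CUDA device index, not a GPU count. The variable
    must be set before the training process initializes CUDA.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


# Example usage (the script path is a placeholder, not a real path):
#   import subprocess
#   subprocess.run(["bash", "path/to/train_script.sh"],
#                  env=single_gpu_env(0), check=True)
```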

Also, the baseline BDD detector used a standard training pipeline, and we used 4 to 8 GPUs. For all other models (HP, HP-cons, etc.) we used a single GPU. Note: the config YAMLs are unchanged; only the run-time settings are changed.

I'll tag @PCJohn for any additional comments, and to confirm that 1 GPU was used when calling the training script on BDD.
