frank-xwang / ride-longtailrecognition Goto Github PK

[ICLR 2021 Spotlight] Code release for "Long-tailed Recognition by Routing Diverse Distribution-Aware Experts."

License: MIT License

Python 100.00%

ride-longtailrecognition's Issues

[Conceptual question] Training Stage 1 only is better than full training when # experts is 4

Hi, thank you for sharing your code. I have a question regarding the performance. From my understanding, there are several methods that boost performance such as diversity loss and router.

I had assumed that the diversity loss is the main contributor to the performance gain, through an ensemble effect. I therefore tested how much the router contributes when there are four experts (in Table 4, the comparison is conducted with two experts).

When I ran python train.py -c "configs/config_imbalance_cifar100_ride.json" --reduce_dimension 1 --num_experts 4, the test performance was 49.58.

{'loss': 2.396228114891052, 'accuracy': 0.4958, 'many_class_num': 35, 'medium_class_num': 35, 'few_class_num': 30, 'many_shot_acc': 0.68428576, 'medium_shot_acc': 0.51285714, 'few_shot_acc': 0.256}

On the other hand, if I trained EA as well by python train.py -c "configs/config_imbalance_cifar100_ride_ea.json" -r saved/models/Imbalance_CIFAR100_LT_RIDE/0110_143024/model_best.pth --reduce_dimension 1 --num_experts 4, the test performance was just 49.1 which is the same as the reported score on Table 4 (with distillation version).

{'loss': 2.6827446435928346, 'accuracy': 0.4914, 'top_k_acc': 0.7724, 'many_class_num': 35, 'medium_class_num': 35, 'few_class_num': 30, 'many_shot_acc': 0.6851429, 'medium_shot_acc': 0.5065715, 'few_shot_acc': 0.24766666}

Is the purpose of EA only to reduce computation (i.e., GFLOPs)?

Thank you so much.

Hello, when can the code be released?

Can you release the resulting model from the first phase of training?

This is excellent work on long-tailed recognition. I have been following your work and code recently and attempting to reproduce the results in the paper, but I do not have enough computational resources to run the experiments. Could you release the resulting models from the first stage of training?

The diversity loss has no effect?

Hi, authors! Thank you for such inspiring work and for open-sourcing the code! I ran into a problem when using your code: the diversity loss does not seem to have much effect.

I ran "RIDE Without Distill (Stage 1)" with 3 experts on CIFAR100-LT using your config and got a validation accuracy of 47.8%. As an ablation, I set "additional_diversity_factor"=0.45 (the original setting is -0.45) and got a validation accuracy of 48.0%, which is 0.2% higher than 47.8%. I did not change anything else in your code. Could you help me figure out the problem?

Thanks a lot!

[Error] RIDE Expert Assignment Module Training (Stage 2)

Thanks for your great work. I get an error when I try to use the model from the model zoo on the iNaturalist dataset. The model uses a ResNet50 backbone with 4 experts and distillation. When I use it as the pretrained model to initialize stage 2, it produces the following errors. It seems that the parameters of some layers are not loaded correctly.
My command line is:
python train.py -c "configs/config_iNaturalist_resnet50_ride_ea.json" -r afs/RIDE/RIDE_model/imagenet_4experts_distill/checkpoint-epoch5.pth --reduce_dimension 1 --num_experts 4

The reported errors are as follows:

Loading checkpoint: ./RIDE/RIDE_model/imagenet_4experts_distill/checkpoint-epoch5.pth ...
Traceback (most recent call last):
File "/root/workspace/env_run/utils/util.py", line 59, in load_state_dict
own_state["module."+name].copy_(param)
RuntimeError: The size of tensor a (64) must match the size of tensor b (128) at non-singleton dimension 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 110, in <module>
main(config)
File "train.py", line 73, in main
lr_scheduler=lr_scheduler)
File "/root/workspace/env_run/trainer/trainer.py", line 14, in __init__
super().__init__(model, criterion, metric_ftns, optimizer, config)
File "/root/workspace/env_run/base/base_trainer.py", line 59, in __init__
self._resume_checkpoint(config.resume, state_dict_only=state_dict_only)
File "/root/workspace/env_run/base/base_trainer.py", line 211, in _resume_checkpoint
load_state_dict(self.model, state_dict)
File "/root/workspace/env_run/utils/util.py", line 63, in load_state_dict
print("Error in copying parameter {}, source shape: {}, destination shape: {}".format(name, param.shape, own_state[name].shape))
KeyError: 'backbone.layer1.0.conv1.weight'
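
Reading the traceback, two things appear to happen: a parameter in the checkpoint has a different first dimension from the model's (64 versus 128), and the error handler then looks up `own_state[name]` without the `module.` prefix used on the copy line, which raises the `KeyError` and hides the original size mismatch. One possible cause of the mismatch, not confirmed in this thread, is that the config or flags differ from those used to train the checkpoint. A minimal sketch of a shape check that tolerates the prefix is given here; the helper name, the dict layout, and the example shapes are hypothetical and not part of the repository:

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes, prefix="module."):
    """Return (name, checkpoint_shape, model_shape) for every parameter that cannot be copied.

    Both arguments map parameter names to shape tuples, e.g. built with
    {k: tuple(v.shape) for k, v in state_dict.items()} (assumed layout).
    """
    problems = []
    for name, src_shape in checkpoint_shapes.items():
        # A wrapped model (e.g. DataParallel) prepends "module." to every key.
        if name in model_shapes:
            key = name
        elif prefix + name in model_shapes:
            key = prefix + name
        else:
            problems.append((name, src_shape, None))  # missing in the model
            continue
        if model_shapes[key] != src_shape:
            problems.append((name, src_shape, model_shapes[key]))
    return problems


# Illustrative shapes only; the real ones come from the checkpoint and the model.
checkpoint = {"backbone.layer1.0.conv1.weight": (64, 64, 1, 1)}
model = {"module.backbone.layer1.0.conv1.weight": (128, 64, 1, 1)}
print(find_shape_mismatches(checkpoint, model))
```

Running a check like this before copying lists every mismatched layer at once, instead of stopping at the first failing `copy_`.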

[Error] An error that may cause the "reweight" flag to fail

Hi, I got some errors in the class RIDELoss() in loss.py.
It seems that if I set the "reweight" flag of loss to be false in the configuration file, there would be an AttributeError which says 'RIDELoss' object has no attribute 'per_cls_weights_base'. I think you may set self.per_cls_weights_base to be None when self.reweight_epoch == -1 in the function _hook_before_epoch() to solve this problem.
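
The suggested fix can be sketched as follows. This is a simplified, hypothetical stand-in and not the repository's `RIDELoss`: only the attribute names `per_cls_weights_base` and `reweight_epoch` and the method `_hook_before_epoch` come from the report above, while the class name, the constructor signature, `per_cls_weights_enabled`, and the inverse-frequency weighting are assumptions made for illustration.

```python
class RIDELossSketch:
    """Hypothetical stand-in for RIDELoss that models only the reweighting state."""

    def __init__(self, cls_num_list, reweight=True, reweight_epoch=-1):
        # With the flag off, reweighting is disabled regardless of reweight_epoch.
        self.reweight_epoch = reweight_epoch if reweight else -1
        # Define the attributes up front so later reads never raise AttributeError.
        self.per_cls_weights_base = None
        self.per_cls_weights_enabled = None
        if self.reweight_epoch != -1:
            # Placeholder weighting (inverse class frequency); the repository may differ.
            total = sum(cls_num_list)
            n_cls = len(cls_num_list)
            self.per_cls_weights_enabled = [total / (n_cls * n) for n in cls_num_list]

    def _hook_before_epoch(self, epoch):
        if self.reweight_epoch == -1:
            self.per_cls_weights_base = None  # the fix proposed in this issue
        elif epoch >= self.reweight_epoch:
            self.per_cls_weights_base = self.per_cls_weights_enabled
        else:
            self.per_cls_weights_base = None
```

Initializing the attribute in the constructor as well as in the hook means the loss can be called even if the hook was never run.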

CIFAR 10 support?

Thanks for your contribution. Do you have any plans to support the CIFAR-10 dataset?

about multi-experts network

Hello, did you come up with the term "multi-experts network", or does it come from other sources? If it comes from other sources, could you please point me to the relevant papers? If you coined the term, could you please explain it? I think the paper gives little introduction to it.

LDAM loss implementation issue

Hi Xwang,
I was redirected here from another issue:
Vanint/SADE-AgnosticLT#5 (comment)

about temperature_mean

I found that there is a factor of $temperature_mean * temperature_mean$ in front of the KL-divergence loss, which is not included in the paper. What is its purpose?
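
For context, a factor of $T^2$ in front of the distillation term is the usual convention from Hinton et al.'s knowledge distillation formulation, and `temperature_mean * temperature_mean` may play that role here (an assumption, since the code is not quoted in this thread). With student logits $z$, teacher logits $v$, $q = \mathrm{softmax}(z/T)$ and $p = \mathrm{softmax}(v/T)$, the gradient of the soft-target loss shrinks as $T$ grows:

```latex
\frac{\partial}{\partial z_i}\,\mathrm{KL}\!\left(p \,\|\, q\right)
  = \frac{1}{T}\left(q_i - p_i\right)
  \approx \frac{1}{N T^{2}}\left(z_i - v_i\right)
  \quad \text{(large } T\text{, zero-mean logits, } N \text{ classes)}
```

Multiplying the loss by $T^2$ keeps its gradient magnitude roughly comparable to that of the hard-label loss when the temperature changes.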

About routing loss

From the results in Table 4 of the original paper, the effect of the routing loss does not seem significant when the number of experts is 2; the performance even decreases. What, then, is the benefit of adding the routing loss?

How to implement cRT and τ-norm with this framework?

Hi, @TonyLianLong :

Thanks for your contribution. Are there any instructions for implementing cRT and τ-norm? Or could you kindly provide the hyperparameters of cRT and τ-norm used in your paper?

Could you release the code that measure the bias and variance?

Thanks!

The mismatch between code and paper

Hello,
Thanks for your great work! I noticed a possible bug: your paper maximizes the diversity loss, while your code minimizes it. Or am I misunderstanding something? I look forward to your reply!
Best Regards,
Feng
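
A note on the sign convention, assuming the diversity term enters the minimized objective through a single scalar factor (the config key `additional_diversity_factor`, reported in another issue on this page with an original setting of -0.45): maximizing a divergence $D$ is the same as minimizing it with a negative coefficient, so the two descriptions need not conflict. The symbols $\lambda$, $D$, and $\mathcal{L}_{\text{cls}}$ below are illustrative and not copied from the paper or the code.

```latex
\min_{\theta}\; \mathcal{L}_{\text{cls}}(\theta) + \lambda\, D(\theta), \quad \lambda = -0.45
\;\;\Longleftrightarrow\;\;
\min_{\theta}\; \mathcal{L}_{\text{cls}}(\theta) - 0.45\, D(\theta)
```

Under this reading, lowering the total loss pushes $D$ upward, which matches "maximizing the diversity loss" in the paper's wording.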

[Conceptual question] Are RIDE and EA used at the same time?

I checked that the loss used in the ride_ea.json configuration file is CrossEntropyLoss, while the loss used in the ride.json configuration file is RIDELoss. So it seems that the paper proposes RIDELoss and the EA module as separate ways to improve performance, rather than combining the two.

[Error] RuntimeError: invalid argument 5: k not in range for dimension at C:/w/b/windows/pytorch/aten/src\THC/generic/THCTensorTopK.cu:26

This error is triggered by top_choices_num in stage 2 of training when I train on my own data with 37 categories. The error goes away when top_choices_num is set to 30. I tried to understand this parameter from the code but could not. I would be grateful if you could explain it. Your work is very meaningful for the long-tail problem; thank you for sharing.
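
The error message says a top-k selection was requested with k larger than the size of the dimension it runs over, which fits the observation that 30 works with 37 categories. A minimal sketch of clamping k, assuming `top_choices_num` is passed as k to a top-k over the class dimension (not confirmed in this thread), is shown here in plain Python; both function names are hypothetical and 50 is only an illustrative value.

```python
def clamp_top_choices(top_choices_num, num_classes):
    """Largest valid k for a top-k over num_classes scores."""
    if num_classes < 1:
        raise ValueError("num_classes must be positive")
    return max(1, min(top_choices_num, num_classes))


def top_k_indices(scores, k):
    """Indices of the k largest scores, highest first (pure-Python stand-in for a tensor top-k)."""
    k = clamp_top_choices(k, len(scores))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


print(clamp_top_choices(50, 37))  # k is reduced to the number of available classes
print(top_k_indices([0.1, 0.7, 0.2], k=5))
```

For a custom dataset, any `top_choices_num` no larger than the number of classes should avoid the range error under this assumption.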

[Conceptual question] Question about self-distillation

Thanks for your excellent code! It's very easy to get started!

What is the meaning of --distill_checkpoint path_to_checkpoint? Do I need to pre-train another model and use it for distillation?

I have simply trained a ResNet50 (2 experts) on ImageNet-LT without Self-distillation, and the top-1 accuracy is 53.264%, which is 1% lower than that in the paper. Will it be helpful if I use self-distillation?

How to understand Formula 13 and 14 in your paper?

Thank you for your work on the long-tailed recognition problem. Your work is excellent, and I want to use RIDE in my research. However, I am confused by Formulas 13 and 14. For example, what does γ mean, and how is it computed? Likewise, what does α mean, and how is it computed?

The mismatch between your LDAM implementation and the original one

I found that your implementation differs slightly from the original implementation at https://github.com/kaidic/LDAM-DRW. There is also an issue about this in the original repository: kaidic/LDAM-DRW#13. I don't know which one is better or correct.

Question about Diversity loss

Thanks for your great work! I am confused about why the diversity loss works. The paper says that it is a regularization term to encourage complementary decisions. Why do we need complementary decisions from the experts? Don't we want the same correct answer from each? If the experts' decisions are diverse, how can we ensure that the final output is correct?

[Error] n_gpu model encountered errors

When I use n_gpu=1, everything is OK.
When I use n_gpu=4 in training, the run fails with the error below:

Traceback (most recent call last):
File "train.py", line 110, in <module>
main(config)
File "train.py", line 75, in main
trainer.train()
File "/home/xxx/workspace/Oracle/RIDE_IR/base/base_trainer.py", line 76, in train
result = self._train_epoch(epoch)
File "/home/xxx/workspace/Oracle/RIDE_IR/trainer/trainer.py", line 133, in _train_epoch
"logits": self.real_model.backbone.logits
File "/home/xxx/.conda/envs/pytorch_jhon/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1131, in __getattr__
type(self).__name__, name))
AttributeError: 'ResNet_s' object has no attribute 'logits'

When I tried to fix this by adding "self.logits = []" to the ResNet_s __init__ in ride_resenet_cifa.py, another error occurred, as below:

Have you ever encountered errors like this, and could you offer any help? Thanks so much.
My conda torch related packages version are as below:
ffmpeg 4.3 hf484d3e_0 pytorch
pytorch 1.9.0 py3.7_cuda10.2_cudnn7.6.5_0 pytorch
torchaudio 0.9.0 py37 pytorch
torchvision 0.10.0 py37_cu102 pytorch
python 3.7.11 h12debd9_0 defaults

When will the pretrained models be released?

Hi! You have provided MODEL_ZOO.md. However, I found that the links on the page cannot be clicked. I am waiting for the release of the pretrained models.

Thanks in advance.

Request for more results

It seems that currently only top-1 accuracy is reported after training on imagenet-LT, and I also see the script to compute top-k accuracy. I am wondering how can I get the performance on many-shot, medium-shot and few-shot?
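
A minimal sketch of such a breakdown, given predictions, labels, and per-class training counts, is shown here. The function name and signature are hypothetical, and the thresholds (more than 100 training images for many-shot, fewer than 20 for few-shot) follow a common long-tailed recognition convention that is assumed rather than taken from this repository.

```python
def shot_accuracy(preds, labels, train_class_count, many_thr=100, few_thr=20):
    """Mean per-class accuracy within the many/medium/few-shot groups.

    train_class_count[c] is the number of training images of class c.
    Thresholds follow a common long-tail convention (assumed, not confirmed here).
    """
    correct, total = {}, {}
    for p, y in zip(preds, labels):
        total[y] = total.get(y, 0) + 1
        correct[y] = correct.get(y, 0) + int(p == y)

    groups = {"many": [], "medium": [], "few": []}
    for c, n in total.items():
        count = train_class_count[c]
        if count > many_thr:
            name = "many"
        elif count < few_thr:
            name = "few"
        else:
            name = "medium"
        groups[name].append(correct[c] / n)

    # None marks a group with no test samples.
    return {g: (sum(a) / len(a) if a else None) for g, a in groups.items()}


# Toy data: class 0 is many-shot, class 1 medium-shot, class 2 few-shot.
result = shot_accuracy(
    preds=[0, 0, 1, 2],
    labels=[0, 1, 1, 2],
    train_class_count={0: 500, 1: 50, 2: 5},
)
print(result)
```

Averaging per class first, then per group, keeps frequent test classes from dominating a group's score.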

The ImageNet-LT dataset is not available

We downloaded ImageNet-LT from Google Drive, but when we run 'configs/config_imagenet_lt_resnet10_ride.json', we find that the code expects 'train', 'test', and 'val' folders in the ImageNet-LT dataset, and the download does not contain them.

the position of experts

Thanks for your great code! According to your paper, the experts are located in layer3 and layer4 of resnet. Do you conduct some ablation study of the position of the experts, such as experts only in layer4 or in layer2, layer3, and layer4?

where is the routing module in the code?

Thanks for your excellent work, but I went through the code and could not find the routing module. Could you point me to it?

Mismatched hyper-parameter settings for

Hi @TonyLianLong,

I noticed that the hyper-parameters reported in your paper are the same as LDAM's:

weight decay is 2e-4
lr decay steps are 120 and 160

but in your config files, they are changed into:

weight decay is 5e-4
lr decay steps are 160, 180

I am quite confused, could you please explain it?

[GPU Utilization] DataLoader iteration speed quite low at the start of every epoch

Hi, there
Thanks for your great job!
During my training run (ResNeXt50 on ImageNet-LT), I used 8 A100 GPUs (total batch size: 1024). At the start of every epoch, the dataloader gets stuck for around 30 seconds. Even after I changed the code to DDP mode, it still gets stuck for 30 seconds. I wonder whether there is a problem related to ImageNetLTDataLoader?

Best,

Question on the function of the normalized linear layer

Hi, I just read the released code and found the implementation of the normalized linear layer and the scale factor. I wonder if there exists a special purpose for this design, like training stability?

[Error] A Python exception is encountered at stage 2

Hi, thanks for your great work.

After training Stage 1 and obtaining the checkpoint, I started running the code for stage 2. However, there is an error "RuntimeError: grad can be implicitly created only for scalar outputs" in self._train_epoch(epoch). A search suggests that this happens because the loss is not a scalar. I have no idea what is wrong, since the code is packaged very well. I look forward to your reply.

About the change of network structure

I observed that in the experiments you used more experts, such as 3 or 4, and in this case the number of parameters and the computation of the base network increase. I think you should add a further comparative experiment to show whether the overall performance gain comes mainly from your proposed training method or from the increase in network parameters. For example, when comparing with other methods, you should keep the computation, the number of parameters, and the network structure consistent. I noticed that your ResNet structure is different from the original one, although you claim that the computation is consistent, and I suspect that the change in network structure may also have a significant impact on overall performance. In short, when comparing with a baseline method, the baseline should use the same network structure as yours, so that the effect of the structural change can be excluded and we can conclude that the loss functions proposed in your paper are meaningful and valid.

The training accuracy of imagenet data set is not up to standard

I conducted the three expert training according to the readme method, and the training result of ImageNet-LT can only reach about 51.8%. May I ask why?
