aimagelab / vkd

PyTorch code for ECCV 2020 paper: "Robust Re-Identification by Multiple Views Knowledge Distillation"

License: MIT License

Python 100.00%
deep-learning re-id knowledge-distillation eccv-2020

vkd's Introduction

Robust Re-Identification by Multiple Views Knowledge Distillation

This repository contains PyTorch code for the ECCV 2020 paper "Robust Re-Identification by Multiple Views Knowledge Distillation" [arXiv].

VKD - Overview

@inproceedings{porrello2020robust,    
    title={Robust Re-Identification by Multiple Views Knowledge Distillation},
    author={Porrello, Angelo and Bergamini, Luca and Calderara, Simone},
    booktitle={European Conference on Computer Vision},
    pages={93--110},
    year={2020},
    organization={Springer}
}

Installation Note

Tested with Python 3.6.8 on Ubuntu (17.04, 18.04).

  • Set up an empty pip environment
  • Install the required packages with pip install -r requirements.txt
  • Install torch 1.3.1 with pip install torch==1.3.1+cu92 torchvision==0.4.2+cu92 -f https://download.pytorch.org/whl/torch_stable.html
  • Place the datasets in ./datasets/ (please note you may need to request some of them from their respective authors)
  • Run scripts from commands.txt

Please note that if you run the code from PyCharm (or another IDE), you may need to manually set the working directory to PROJECT_PATH.
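If you prefer to handle this in code rather than through the IDE settings, a minimal sketch (assuming PROJECT_PATH points to your local checkout of the repository; the variable name here is purely illustrative) could be placed at the top of the entry script:

import os
import sys

# Illustrative only: point the process at the repository root so that
# relative paths such as ./datasets and ./logs resolve as the scripts expect.
PROJECT_PATH = "/path/to/VKD"  # adjust to your checkout
os.chdir(PROJECT_PATH)
sys.path.insert(0, PROJECT_PATH)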

VKD Training (MARS [1])

Data preparation

  • Create the folder ./datasets/mars
  • Download the dataset from here
  • Unzip the data and place the two folders (bbox_train and bbox_test) inside the mars folder
  • Download metadata from here
  • Place them in a folder named info under the same path
  • You should end up with the following structure:
PROJECT_PATH/datasets/mars/
|-- bbox_train/
|-- bbox_test/
|-- info/

Teacher-Student Training

First step: the backbone network is trained for the standard Video-To-Video setting. In this stage, each training example comprises N images drawn from the same tracklet (N=8 by default; you can change it through the argument --num_train_images).

# To train ResNet-50 on MARS (teacher, first step) run:
python ./tools/train_v2v.py mars --backbone resnet50 --num_train_images 8 --p 8 --k 4 --exp_name base_mars_resnet50 --first_milestone 100 --step_milestone 100
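To illustrate what "N images drawn from the same tracklet" means in practice, here is a minimal sketch of per-tracklet frame sampling. It is not the repository's own data loader; the helper name is hypothetical.

import random

def sample_frames(tracklet_paths, num_train_images=8):
    """Illustrative sketch: pick num_train_images frames from a single tracklet.

    Frames are sampled without replacement when the tracklet is long enough,
    otherwise with replacement, so every training example has a fixed number of views.
    """
    if len(tracklet_paths) >= num_train_images:
        return random.sample(tracklet_paths, num_train_images)
    return random.choices(tracklet_paths, k=num_train_images)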

Second step: we appoint the resulting network as the teacher and freeze its parameters. Then, a new network with the role of the student is instantiated. We feed N views (i.e. images captured from multiple cameras) as input to the teacher and ask the student to mimic the same outputs from fewer frames (M=2 by default, --num_student_images).

# To train a ResVKD-50 (student) run:
python ./tools/train_distill.py mars ./logs/base_mars_resnet50 --exp_name distill_mars_resnet50 --p 12 --k 4 --step_milestone 150 --num_epochs 500
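As a rough sketch of the underlying idea (the student, fed M frames, mimicking the outputs the teacher produces from N views), one could write a distillation objective along the following lines. This is illustrative only: the function and variable names are hypothetical, and the actual loss terms used by train_distill.py may differ (the paper combines several of them).

import torch.nn.functional as F

def view_distillation_loss(student_emb, teacher_emb,
                           student_logits, teacher_logits, temperature=4.0):
    """Illustrative distillation sketch: match embeddings and softened logits."""
    # The student's embedding (from fewer frames) should match the teacher's.
    emb_term = F.mse_loss(student_emb, teacher_emb)
    # Standard knowledge-distillation term on temperature-softened logits.
    kd_term = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction='batchmean',
    ) * temperature ** 2
    return emb_term + kd_term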

Model Zoo

We provide a set of pre-trained checkpoints through two zip files (baseline.zip containing the weights of the teacher networks, distilled.zip those of the students). For instance, to evaluate ResNet-50 and ResVKD-50 on MARS, proceed as follows:

  • Download baseline.zip from here and distilled.zip from here (~4.8 GB)
  • Unzip the two archives inside the PROJECT_PATH/logs folder
  • Then, you can evaluate both networks using the eval.py script:
python ./tools/eval.py mars ./logs/baseline_public/mars/base_mars_resnet50 --trinet_chk_name chk_end
python ./tools/eval.py mars ./logs/distilled_public/mars/selfdistill/distill_mars_resnet50 --trinet_chk_name chk_di_1

You should end up with the following results on MARS (see Tab. 1 of the paper for VeRi-776 and Duke-Video-ReID):

Backbone | top1 I2V | mAP I2V | top1 V2V | mAP V2V
ResNet-34 | 80.81 | 70.74 | 86.67 | 78.03
ResVKD-34 | 82.17 | 73.68 | 87.83 | 79.50
ResNet-50 | 82.22 | 73.38 | 87.88 | 81.13
ResVKD-50 | 83.89 | 77.27 | 88.74 | 82.22
ResNet-101 | 82.78 | 74.94 | 88.59 | 81.66
ResVKD-101 | 85.91 | 77.64 | 89.60 | 82.65

Backbone | top1 I2V | mAP I2V | top1 V2V | mAP V2V
ResNet-50bam | 82.58 | 74.11 | 88.54 | 81.19
ResVKD-50bam | 84.34 | 78.13 | 89.39 | 83.07

Backbone | top1 I2V | mAP I2V | top1 V2V | mAP V2V
DenseNet-121 | 82.68 | 74.34 | 89.75 | 81.93
DenseVKD-121 | 84.04 | 77.09 | 89.80 | 82.84

Backbone | top1 I2V | mAP I2V | top1 V2V | mAP V2V
MobileNet-V2 | 78.64 | 67.94 | 85.96 | 77.10
MobileVKD-V2 | 83.33 | 73.95 | 88.13 | 79.62

Teacher-Student Explanations

As discussed in the main paper, we leveraged Grad-CAM [2] to highlight the input regions considered paramount for predicting the identity. We performed the same analysis for the teacher network as well as for the student one: as can be seen, the latter pays more attention to the subject of interest than its teacher does.

Model Explanation

You can draw the heatmaps with the following command:

python -u ./tools/save_heatmaps.py mars <path-to-teacher-net> --chk_net1 <teacher-checkpoint-name> <path-to-student-net> --chk_net2 <student-checkpoint-name> --dest_path <output-dir>
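For readers unfamiliar with the technique, a generic Grad-CAM sketch follows. It is not the repository's save_heatmaps.py implementation; the model, layer, and argument names are placeholders, and a re-ID backbone may return embeddings rather than class logits.

import torch.nn.functional as F

def grad_cam(model, target_layer, image, class_idx):
    """Generic Grad-CAM sketch: weight the target layer's activations by the
    spatially pooled gradients of the chosen class score."""
    feats = []

    def hook(module, inputs, output):
        output.retain_grad()  # keep the gradient of this intermediate activation
        feats.append(output)

    handle = target_layer.register_forward_hook(hook)
    score = model(image.unsqueeze(0))[0, class_idx]  # scalar score for the target class
    model.zero_grad()
    score.backward()
    handle.remove()

    act = feats[0]                                     # (1, C, h, w) activation map
    weights = act.grad.mean(dim=(2, 3), keepdim=True)  # gradients pooled per channel
    cam = F.relu((weights * act).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode='bilinear', align_corners=False)
    cam = cam.squeeze()
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalise to [0, 1]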

References

  1. Zheng, L., Bie, Z., Sun, Y., Wang, J., Su, C., Wang, S., Tian, Q.: MARS: A video benchmark for large-scale person re-identification. In: European Conference on Computer Vision (2016)
  2. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: Visual explanations from deep networks via gradient-based localization. In: IEEE International Conference on Computer Vision, pp. 618-626 (2017)

vkd's People

Contributors

angpo

vkd's Issues

About multi-camera Multi-shot test

Thank you for your great work!
In Figure 3, you use multi-shot, multi-camera feature fusion for testing. I would like to ask about this multi-camera feature fusion: do you exclude from the gallery all the cameras of the same ID that participate in the fusion, or do you remove just one camera?

About training and testing details

Hi!
There are very few details about training and testing in your article. Do you have any supplementary materials? If not, could you describe them to me in detail? Thanks!
Yours

RuntimeError: result type Long can't be cast to the desired output type Bool

Hi, guys:

First of all, thank you for your outstanding work. However, when I was training the teacher network, I encountered the following problem.

python train_v2v.py mars --backbone resnet50 --num_train_images 8 --p 4 --k 4 --exp_name base_mars_resnet50 --first_milestone 100 --step_milestone 100

=> MARS loaded
Dataset statistics:

subset | # ids | # tracklets | # images

train | 625 | 8298 | 509914
query | 626 | 1980 | 114493
gallery | 622 | 9330 | 543216

total | 1251 | 19608 | 1167623

number of images per tracklet: 2 ~ 920, average 59.5

EXP_NAME: base_mars_resnet50
Traceback (most recent call last):
File "train_v2v.py", line 125, in
main()
File "train_v2v.py", line 96, in main
triplet_loss_batch = triplet_loss(embeddings, y)
File "/home/fei/code/VKD/model/loss.py", line 208, in call
return super(OnlineTripletLoss, self).call(*args, **kwargs)
File "/home/fei/anaconda3/envs/VKD/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/fei/code/VKD/model/loss.py", line 146, in forward
negative_mask = same_id_mask ^ 1
RuntimeError: result type Long can't be cast to the desired output type Bool

How can I fix it?
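Not an official fix, but this error usually means a boolean mask is being XOR-ed with the integer 1, which newer PyTorch versions disallow. Assuming same_id_mask is a boolean tensor, a common workaround in model/loss.py would be:

# negative_mask = same_id_mask ^ 1   # fails: Long result cannot be cast to Bool
negative_mask = ~same_id_mask        # logical negation of a boolean mask
# or, keeping the XOR form with a boolean operand:
# negative_mask = same_id_mask ^ torch.ones_like(same_id_mask, dtype=torch.bool)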

Heatmap code?

Can you upload the code that draws the heatmaps in your paper? Thanks.

Structure of DukeMTMC-VideoReID

Hello!

I would like to transform a dataset into the DukeMTMC-VideoReID structure used in this project and just plug it in. However, it seems that the structure of Duke used in this project is not the same as https://github.com/Yu-Wu/DukeMTMC-VideoReID. Could you provide more details on how you structured the Duke dataset before feeding it to the network?

Thank you!

which network is used for evaluation?

I am confused by the evaluation code:
for idx_iteration in range(args.num_generations):
    print(f'starting generation {idx_iteration+1}')
    print('#'*100)
    teacher_net = d_trainer(teacher_net, student_net)
    d_trainer.evaluate(teacher_net)
    teacher_net.teacher_mode()

    student_net = deepcopy(teacher_net)
    saver.save_net(student_net, f'chk_di_{idx_iteration + 1}')

    student_net.reinit_layers(args.reinit_l4, args.reinit_l3)

Do you use the student network or the teacher network for evaluation?

what is the meaning of several parameters?

What is the meaning of p and k in the training command line?
python ./tools/train_distill.py mars ./logs/base_mars_resnet50 --exp_name distill_mars_resnet50 --p 12 --k 4 --step_milestone 150 --num_epochs 500

And what is the relationship between these two parameters and the N and M in your paper?

Error on training: stack expects a non-empty TensorList

Hello,
I am trying to reproduce the results on Google Colab. I followed the instructions, but maybe I have made a mistake:
I have all the needed files on Google Drive and, therefore, the dataset structure looks like this: VKD-master/datasets/mars, containing 3 folders (info, bbox_train, bbox_test).
However, when I start running the training command, I get the following error:
"RuntimeError: stack expects a non-empty TensorList"
I have attached a file with the entire log. Also, when I tried the pre-trained model, the same error occurred.
Any change I make brings me back to the same error. Please guide me towards a solution. Any idea is welcome.
Thank you,
Anca
(attachments: error_lic, path_lic)

inputs

I'm sorry, I have little knowledge about video re-ID.
What form does the input take? Is it a sequential input?

Checkpoint zip is broken

(base) ➜  Downloads unzip distilled.zip 
Archive:  distilled.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of distilled.zip or
        distilled.zip.zip, and cannot find distilled.zip.ZIP, period.

(base) ➜  Downloads
~ 7z x distilled.zip 

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,16 CPUs 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz (A0671),ASM,AES-NI)

Scanning the drive for archives:
1 file, 2079948095 bytes (1984 MiB)

Extracting archive: distilled.zip

ERRORS:
Headers Error
Unconfirmed start of archive


WARNINGS:
There are data after the end of archive

--
Path = distilled.zip
Type = zip
ERRORS:
Headers Error
Unconfirmed start of archive
WARNINGS:
There are data after the end of archive
Physical Size = 91999574
Tail Size = 1987948521

ERROR: CRC Failed : distilled_public/duke/crossdistill/distill_duke_resnet101_to_resnet34/chk/chk_di_1
                                                                            
Sub items Errors: 1

Archives with Errors: 1

Warnings: 1

Open Errors: 1

Sub items Errors: 1
(base) ➜  Downloads sudo apt-get install fastjar

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  chromium-codecs-ffmpeg-extra fonts-open-sans gir1.2-goa-1.0 libceres1 libfwupdplugin1 libpython2-dev libqrcodegencpp1 librlottie0-1 libxcb-screensaver0 libxxhash0 python3-cached-property python3-docker
  python3-dockerpty python3-docopt python3-jsonschema python3-pyrsistent python3-texttable python3-websocket
Use 'sudo apt autoremove' to remove them.
The following NEW packages will be installed:
  fastjar
0 upgraded, 1 newly installed, 0 to remove and 295 not upgraded.
Need to get 66,7 kB of archives.
After this operation, 175 kB of additional disk space will be used.
0% [Working]
Get:1 http://ru.archive.ubuntu.com/ubuntu focal/universe amd64 fastjar amd64 2:0.98-6build1 [66,7 kB]
Fetched 66,7 kB in 6s (12,1 kB/s)       
Selecting previously unselected package fastjar.
(Reading database ... 445500 files and directories currently installed.)
Preparing to unpack .../fastjar_2%3a0.98-6build1_amd64.deb ...
Unpacking fastjar (2:0.98-6build1) ...
Setting up fastjar (2:0.98-6build1) ...
Processing triggers for install-info (6.7.0.dfsg.2-5) ...
Processing triggers for man-db (2.9.1-1) ...
(base) ➜  Downloads jar xvf distilled.zip 
  inflated: distilled_public/duke/crossdistill/distill_duke_resnet101_to_mobilenet/chk/chk_di_1
  inflated: distilled_public/duke/crossdistill/distill_duke_resnet101_to_mobilenet/params/hparams.json
  inflated: distilled_public/duke/crossdistill/distill_duke_resnet101_to_mobilenet/params/params.json
Error inflating file! (-3)

Animal Re-ID Details

Hi Authors, Thanks for the nice work and sharing the GitHub code!

I am trying to reproduce the results of Table 7 for the Amur Tiger animal re-identification task. The commands.txt of the released code has commands for the Image-To-Video and Video-To-Video settings. It would be a great help to know how to use the repository for the Image-To-Image setting.

Also, in the tools/ subdirectory there are two training files, train_v2v.py and train_distill.py. Which of the two should I use for the Image-To-Image setting? If possible, please let me know the hyperparameters used for it.

Thanks again!

eval.py size mismatch (distilled / student part)

Hello, I'm confused about the eval part. First, when I use "python ./tools/eval.py mars ./logs/distilled_public/mars/selfdistill/distill_mars_resnet50 --trinet_chk_name chk_di_1", it shows the results as in the table (top1, mAP, ...).
But when I want to evaluate ResNet-34 and change the command to "python ./tools/eval.py mars ./logs/distilled_public/mars/selfdistill/distill_mars_resnet34 --trinet_chk_name chk_di_1",
it shows a size mismatch error. Can anyone help me with this problem? Thanks a lot!

Here is the error message:

Traceback (most recent call last):
File "/home/kingsman/.local/share/JetBrains/Toolbox/apps/PyCharm-C/ch-0/203.7148.72/plugins/python-ce/helpers/pydev/pydevd.py", line 1477, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/kingsman/.local/share/JetBrains/Toolbox/apps/PyCharm-C/ch-0/203.7148.72/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/kingsman/VKD-master/tools/eval.py", line 219, in
main()
File "/home/kingsman/VKD-master/tools/eval.py", line 210, in main
net.load_state_dict(state_dict)
File "/home/kingsman/anaconda3/envs/yolact/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for TriNet:
Missing key(s) in state_dict: "backbone.features_layers.1.0.0.conv3.weight", "backbone.features_layers.1.0.0.bn3.weight", "backbone.features_layers.1.0.0.bn3.bias", "backbone.features_layers.1.0.0.bn3.running_mean", "backbone.features_layers.1.0.0.bn3.running_var", "backbone.features_layers.1.0.0.downsample.0.weight", "backbone.features_layers.1.0.0.downsample.1.weight", "backbone.features_layers.1.0.0.downsample.1.bias", "backbone.features_layers.1.0.0.downsample.1.running_mean", "backbone.features_layers.1.0.0.downsample.1.running_var", "backbone.features_layers.1.0.1.conv3.weight", "backbone.features_layers.1.0.1.bn3.weight", "backbone.features_layers.1.0.1.bn3.bias", "backbone.features_layers.1.0.1.bn3.running_mean", "backbone.features_layers.1.0.1.bn3.running_var", "backbone.features_layers.1.0.2.conv3.weight", "backbone.features_layers.1.0.2.bn3.weight", "backbone.features_layers.1.0.2.bn3.bias", "backbone.features_layers.1.0.2.bn3.running_mean", "backbone.features_layers.1.0.2.bn3.running_var", "backbone.features_layers.2.0.0.conv3.weight", "backbone.features_layers.2.0.0.bn3.weight", "backbone.features_layers.2.0.0.bn3.bias", "backbone.features_layers.2.0.0.bn3.running_mean", "backbone.features_layers.2.0.0.bn3.running_var", "backbone.features_layers.2.0.1.conv3.weight", "backbone.features_layers.2.0.1.bn3.weight", "backbone.features_layers.2.0.1.bn3.bias", "backbone.features_layers.2.0.1.bn3.running_mean", "backbone.features_layers.2.0.1.bn3.running_var", "backbone.features_layers.2.0.2.conv3.weight", "backbone.features_layers.2.0.2.bn3.weight", "backbone.features_layers.2.0.2.bn3.bias", "backbone.features_layers.2.0.2.bn3.running_mean", "backbone.features_layers.2.0.2.bn3.running_var", "backbone.features_layers.2.0.3.conv3.weight", "backbone.features_layers.2.0.3.bn3.weight", "backbone.features_layers.2.0.3.bn3.bias", "backbone.features_layers.2.0.3.bn3.running_mean", "backbone.features_layers.2.0.3.bn3.running_var", "backbone.features_layers.3.0.0.conv3.weight", "backbone.features_layers.3.0.0.bn3.weight", "backbone.features_layers.3.0.0.bn3.bias", "backbone.features_layers.3.0.0.bn3.running_mean", "backbone.features_layers.3.0.0.bn3.running_var", "backbone.features_layers.3.0.1.conv3.weight", "backbone.features_layers.3.0.1.bn3.weight", "backbone.features_layers.3.0.1.bn3.bias", "backbone.features_layers.3.0.1.bn3.running_mean", "backbone.features_layers.3.0.1.bn3.running_var", "backbone.features_layers.3.0.2.conv3.weight", "backbone.features_layers.3.0.2.bn3.weight", "backbone.features_layers.3.0.2.bn3.bias", "backbone.features_layers.3.0.2.bn3.running_mean", "backbone.features_layers.3.0.2.bn3.running_var", "backbone.features_layers.3.0.3.conv3.weight", "backbone.features_layers.3.0.3.bn3.weight", "backbone.features_layers.3.0.3.bn3.bias", "backbone.features_layers.3.0.3.bn3.running_mean", "backbone.features_layers.3.0.3.bn3.running_var", "backbone.features_layers.3.0.4.conv3.weight", "backbone.features_layers.3.0.4.bn3.weight", "backbone.features_layers.3.0.4.bn3.bias", "backbone.features_layers.3.0.4.bn3.running_mean", "backbone.features_layers.3.0.4.bn3.running_var", "backbone.features_layers.3.0.5.conv3.weight", "backbone.features_layers.3.0.5.bn3.weight", "backbone.features_layers.3.0.5.bn3.bias", "backbone.features_layers.3.0.5.bn3.running_mean", "backbone.features_layers.3.0.5.bn3.running_var", "backbone.features_layers.4.0.0.conv3.weight", "backbone.features_layers.4.0.0.bn3.weight", "backbone.features_layers.4.0.0.bn3.bias", 
"backbone.features_layers.4.0.0.bn3.running_mean", "backbone.features_layers.4.0.0.bn3.running_var", "backbone.features_layers.4.0.1.conv3.weight", "backbone.features_layers.4.0.1.bn3.weight", "backbone.features_layers.4.0.1.bn3.bias", "backbone.features_layers.4.0.1.bn3.running_mean", "backbone.features_layers.4.0.1.bn3.running_var", "backbone.features_layers.4.0.2.conv3.weight", "backbone.features_layers.4.0.2.bn3.weight", "backbone.features_layers.4.0.2.bn3.bias", "backbone.features_layers.4.0.2.bn3.running_mean", "backbone.features_layers.4.0.2.bn3.running_var".
size mismatch for backbone.features_layers.1.0.0.conv1.weight: copying a param with shape torch.Size([64, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 1, 1]).
size mismatch for backbone.features_layers.1.0.1.conv1.weight: copying a param with shape torch.Size([64, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 256, 1, 1]).
size mismatch for backbone.features_layers.1.0.2.conv1.weight: copying a param with shape torch.Size([64, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 256, 1, 1]).
size mismatch for backbone.features_layers.2.0.0.conv1.weight: copying a param with shape torch.Size([128, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 256, 1, 1]).
size mismatch for backbone.features_layers.2.0.0.downsample.0.weight: copying a param with shape torch.Size([128, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 256, 1, 1]).
size mismatch for backbone.features_layers.2.0.0.downsample.1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for backbone.features_layers.2.0.0.downsample.1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for backbone.features_layers.2.0.0.downsample.1.running_mean: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for backbone.features_layers.2.0.0.downsample.1.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for backbone.features_layers.2.0.1.conv1.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 512, 1, 1]).
size mismatch for backbone.features_layers.2.0.2.conv1.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 512, 1, 1]).
size mismatch for backbone.features_layers.2.0.3.conv1.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 512, 1, 1]).
size mismatch for backbone.features_layers.3.0.0.conv1.weight: copying a param with shape torch.Size([256, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
size mismatch for backbone.features_layers.3.0.0.downsample.0.weight: copying a param with shape torch.Size([256, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 512, 1, 1]).
size mismatch for backbone.features_layers.3.0.0.downsample.1.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for backbone.features_layers.3.0.0.downsample.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for backbone.features_layers.3.0.0.downsample.1.running_mean: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for backbone.features_layers.3.0.0.downsample.1.running_var: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for backbone.features_layers.3.0.1.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1]).
size mismatch for backbone.features_layers.3.0.2.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1]).
size mismatch for backbone.features_layers.3.0.3.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1]).
size mismatch for backbone.features_layers.3.0.4.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1]).
size mismatch for backbone.features_layers.3.0.5.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1]).
size mismatch for backbone.features_layers.4.0.0.conv1.weight: copying a param with shape torch.Size([512, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1]).
size mismatch for backbone.features_layers.4.0.0.downsample.0.weight: copying a param with shape torch.Size([512, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([2048, 1024, 1, 1]).
size mismatch for backbone.features_layers.4.0.0.downsample.1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for backbone.features_layers.4.0.0.downsample.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for backbone.features_layers.4.0.0.downsample.1.running_mean: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for backbone.features_layers.4.0.0.downsample.1.running_var: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for backbone.features_layers.4.0.1.conv1.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 2048, 1, 1]).
size mismatch for backbone.features_layers.4.0.2.conv1.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 2048, 1, 1]).
size mismatch for classifier.bottleneck.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for classifier.bottleneck.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for classifier.bottleneck.running_mean: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for classifier.bottleneck.running_var: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for classifier.classifier.weight: copying a param with shape torch.Size([625, 512]) from checkpoint, the shape in current model is torch.Size([625, 2048]).
python-BaseException
terminate called without an active exception

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

About trained models

Hi, thanks for your code. Could you provide a trained ResVKD-50bam model on the DukeMTMC-VideoReID dataset?

GPUs required?

What GPUs were used to train this? I would like to know what is the minimum recommended setup.

I am running this with a GeForce RTX 2080 Ti and, following the instructions, when I run
python ./tools/train_v2v.py mars --backbone resnet50 --num_train_images 8 --p 8 --k 4 --exp_name base_mars_resnet50 --first_milestone 100 --step_milestone 100

I get the following error:
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 10.76 GiB total capacity; 9.69 GiB already allocated; 29.75 MiB free; 197.14 MiB cached)

I have nothing else running and my GPU is 100% idle.
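Not an official recommendation, but the simplest way to fit the default configuration on a smaller card is usually to shrink the batch, e.g. by lowering --p and/or --num_train_images (the values below are only an example):

python ./tools/train_v2v.py mars --backbone resnet50 --num_train_images 4 --p 4 --k 4 --exp_name base_mars_resnet50 --first_milestone 100 --step_milestone 100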
