aimagelab / vkd

72 stars · 9 watchers · 15 forks · 1.6 MB

PyTorch code for ECCV 2020 paper: "Robust Re-Identification by Multiple Views Knowledge Distillation"

License: MIT License

Language: Python 100.00%
Topics: deep-learning, re-id, knowledge-distillation, eccv-2020

vkd's Issues

Structure of DukeMTMC-VideoReID

Hello!

I would like to convert a dataset to the DukeMTMC-VideoReID structure used in this project so I can just plug it in. However, the Duke structure used here does not seem to match https://github.com/Yu-Wu/DukeMTMC-VideoReID. Could you provide more details on how you structured the Duke dataset before feeding it to the network?

Thank you!
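For anyone comparing layouts, here is a minimal sketch that prints the first levels of a dataset directory so the two structures can be compared side by side (the root path below is hypothetical; point it at your local copy):

import os

def print_tree(root, max_depth=2, depth=0):
    # Recursively list sub-directories up to max_depth levels.
    if depth > max_depth or not os.path.isdir(root):
        return
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.isdir(path):
            print('    ' * depth + name + '/')
            print_tree(path, max_depth, depth + 1)

print_tree('./datasets/duke')  # hypothetical path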

Checkpoint zip is broken

(base) ➜  Downloads unzip distilled.zip 
Archive:  distilled.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of distilled.zip or
        distilled.zip.zip, and cannot find distilled.zip.ZIP, period.

(base) ➜  Downloads 7z x distilled.zip

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,16 CPUs 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz (A0671),ASM,AES-NI)

Scanning the drive for archives:
1 file, 2079948095 bytes (1984 MiB)

Extracting archive: distilled.zip

ERRORS:
Headers Error
Unconfirmed start of archive


WARNINGS:
There are data after the end of archive

--
Path = distilled.zip
Type = zip
ERRORS:
Headers Error
Unconfirmed start of archive
WARNINGS:
There are data after the end of archive
Physical Size = 91999574
Tail Size = 1987948521

ERROR: CRC Failed : distilled_public/duke/crossdistill/distill_duke_resnet101_to_resnet34/chk/chk_di_1
                                                                            
Sub items Errors: 1

Archives with Errors: 1

Warnings: 1

Open Errors: 1

Sub items Errors: 1
(base) ➜  Downloads sudo apt-get install fastjar

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  chromium-codecs-ffmpeg-extra fonts-open-sans gir1.2-goa-1.0 libceres1 libfwupdplugin1 libpython2-dev libqrcodegencpp1 librlottie0-1 libxcb-screensaver0 libxxhash0 python3-cached-property python3-docker
  python3-dockerpty python3-docopt python3-jsonschema python3-pyrsistent python3-texttable python3-websocket
Use 'sudo apt autoremove' to remove them.
The following NEW packages will be installed:
  fastjar
0 upgraded, 1 newly installed, 0 to remove and 295 not upgraded.
Need to get 66,7 kB of archives.
After this operation, 175 kB of additional disk space will be used.
0% [Working]
Get:1 http://ru.archive.ubuntu.com/ubuntu focal/universe amd64 fastjar amd64 2:0.98-6build1 [66,7 kB]
Fetched 66,7 kB in 6s (12,1 kB/s)       
Selecting previously unselected package fastjar.
(Reading database ... 445500 files and directories currently installed.)
Preparing to unpack .../fastjar_2%3a0.98-6build1_amd64.deb ...
Unpacking fastjar (2:0.98-6build1) ...
Setting up fastjar (2:0.98-6build1) ...
Processing triggers for install-info (6.7.0.dfsg.2-5) ...
Processing triggers for man-db (2.9.1-1) ...
(base) ➜  Downloads jar xvf distilled.zip 
  inflated: distilled_public/duke/crossdistill/distill_duke_resnet101_to_mobilenet/chk/chk_di_1
  inflated: distilled_public/duke/crossdistill/distill_duke_resnet101_to_mobilenet/params/hparams.json
  inflated: distilled_public/duke/crossdistill/distill_duke_resnet101_to_mobilenet/params/params.json
Error inflating file! (-3)
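For what it's worth, the 7z output above (Physical Size = 91999574 against a ~2 GB file, plus "There are data after the end of archive") points to a truncated or corrupted download rather than a packaging problem. A minimal check with Python's standard zipfile module, assuming the file sits in the current directory:

import zipfile

path = 'distilled.zip'
if not zipfile.is_zipfile(path):
    print('No valid end-of-central-directory record: file is likely truncated.')
else:
    with zipfile.ZipFile(path) as zf:
        bad = zf.testzip()  # first member with a CRC mismatch, or None
        print('First corrupt member:', bad)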

About the multi-camera multi-shot test

Thank you for your great work!
In Figure 3, you use multi-shot, multi-camera feature fusion for testing. I would like to ask about this fusion: do you exclude from the gallery all cameras of the same ID that participate in the fusion, or do you remove just one camera?

Animal Re-ID Details

Hi authors, thanks for the nice work and for sharing the GitHub code!

I am trying to reproduce the Amur Tiger results in Table 7 for the animal re-identification task. The command.txt in the released code has commands for the Image-to-Video and Video-to-Video settings. It would be a great help to know how to use the repository for the Image-to-Image setting.

Also, the tools/ subdirectory contains two training scripts, train_v2v.py and train_distill.py. Which of the two should I use for the Image-to-Image setting? If possible, please also share the hyperparameters used for it.

Thanks again!

What is the meaning of several parameters?

What is the meaning of p and k in the following training command line?
python ./tools/train_distill.py mars ./logs/base_mars_resnet50 --exp_name distill_mars_resnet50 --p 12 --k 4 --step_milestone 150 --num_epochs 500

And what is the relationship between these two parameters and the N and M in your paper?
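Not an official answer, but in most metric-learning re-ID codebases --p and --k control PK sampling: each batch contains P identities with K tracklets each, so the batch holds P*K tracklets. A toy sketch of that sampling scheme, under that assumption:

import random
from collections import defaultdict

def pk_batch(labels, p=12, k=4):
    # Group sample indices by identity, then draw P identities
    # and K samples per identity (with replacement) for one batch.
    by_id = defaultdict(list)
    for idx, pid in enumerate(labels):
        by_id[pid].append(idx)
    pids = random.sample(list(by_id), p)
    return [i for pid in pids for i in random.choices(by_id[pid], k=k)]

labels = [i // 10 for i in range(200)]  # toy dataset: 20 ids, 10 samples each
print(len(pk_batch(labels)))           # 48 = 12 * 4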

About training and testing details

Hi!
There are very few details about training and testing in your article. Do you have any supplementary materials? If not, could you describe them to me in detail? Thanks!
Yours

eval.py size mismatch (distill/student part)

Hello, I'm confused about the eval part. First, when I use "python ./tools/eval.py mars ./logs/distilled_public/mars/selfdistill/distill_mars_resnet50 --trinet_chk_name chk_di_1", it shows the results as in the table (top-1, mAP, ...).
But when I want to evaluate resnet34 and change the command to "python ./tools/eval.py mars ./logs/distilled_public/mars/selfdistill/distill_mars_resnet34 --trinet_chk_name chk_di_1",
it shows a size-mismatch error. Can anyone help me with this problem? Thanks a lot!

Here is the error message:

Traceback (most recent call last):
  File "/home/kingsman/.local/share/JetBrains/Toolbox/apps/PyCharm-C/ch-0/203.7148.72/plugins/python-ce/helpers/pydev/pydevd.py", line 1477, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/kingsman/.local/share/JetBrains/Toolbox/apps/PyCharm-C/ch-0/203.7148.72/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/kingsman/VKD-master/tools/eval.py", line 219, in <module>
    main()
  File "/home/kingsman/VKD-master/tools/eval.py", line 210, in main
    net.load_state_dict(state_dict)
  File "/home/kingsman/anaconda3/envs/yolact/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for TriNet:
Missing key(s) in state_dict: "backbone.features_layers.1.0.0.conv3.weight", "backbone.features_layers.1.0.0.bn3.weight", "backbone.features_layers.1.0.0.bn3.bias", "backbone.features_layers.1.0.0.bn3.running_mean", "backbone.features_layers.1.0.0.bn3.running_var", "backbone.features_layers.1.0.0.downsample.0.weight", "backbone.features_layers.1.0.0.downsample.1.weight", "backbone.features_layers.1.0.0.downsample.1.bias", "backbone.features_layers.1.0.0.downsample.1.running_mean", "backbone.features_layers.1.0.0.downsample.1.running_var", "backbone.features_layers.1.0.1.conv3.weight", "backbone.features_layers.1.0.1.bn3.weight", "backbone.features_layers.1.0.1.bn3.bias", "backbone.features_layers.1.0.1.bn3.running_mean", "backbone.features_layers.1.0.1.bn3.running_var", "backbone.features_layers.1.0.2.conv3.weight", "backbone.features_layers.1.0.2.bn3.weight", "backbone.features_layers.1.0.2.bn3.bias", "backbone.features_layers.1.0.2.bn3.running_mean", "backbone.features_layers.1.0.2.bn3.running_var", "backbone.features_layers.2.0.0.conv3.weight", "backbone.features_layers.2.0.0.bn3.weight", "backbone.features_layers.2.0.0.bn3.bias", "backbone.features_layers.2.0.0.bn3.running_mean", "backbone.features_layers.2.0.0.bn3.running_var", "backbone.features_layers.2.0.1.conv3.weight", "backbone.features_layers.2.0.1.bn3.weight", "backbone.features_layers.2.0.1.bn3.bias", "backbone.features_layers.2.0.1.bn3.running_mean", "backbone.features_layers.2.0.1.bn3.running_var", "backbone.features_layers.2.0.2.conv3.weight", "backbone.features_layers.2.0.2.bn3.weight", "backbone.features_layers.2.0.2.bn3.bias", "backbone.features_layers.2.0.2.bn3.running_mean", "backbone.features_layers.2.0.2.bn3.running_var", "backbone.features_layers.2.0.3.conv3.weight", "backbone.features_layers.2.0.3.bn3.weight", "backbone.features_layers.2.0.3.bn3.bias", "backbone.features_layers.2.0.3.bn3.running_mean", "backbone.features_layers.2.0.3.bn3.running_var", "backbone.features_layers.3.0.0.conv3.weight", "backbone.features_layers.3.0.0.bn3.weight", "backbone.features_layers.3.0.0.bn3.bias", "backbone.features_layers.3.0.0.bn3.running_mean", "backbone.features_layers.3.0.0.bn3.running_var", "backbone.features_layers.3.0.1.conv3.weight", "backbone.features_layers.3.0.1.bn3.weight", "backbone.features_layers.3.0.1.bn3.bias", "backbone.features_layers.3.0.1.bn3.running_mean", "backbone.features_layers.3.0.1.bn3.running_var", "backbone.features_layers.3.0.2.conv3.weight", "backbone.features_layers.3.0.2.bn3.weight", "backbone.features_layers.3.0.2.bn3.bias", "backbone.features_layers.3.0.2.bn3.running_mean", "backbone.features_layers.3.0.2.bn3.running_var", "backbone.features_layers.3.0.3.conv3.weight", "backbone.features_layers.3.0.3.bn3.weight", "backbone.features_layers.3.0.3.bn3.bias", "backbone.features_layers.3.0.3.bn3.running_mean", "backbone.features_layers.3.0.3.bn3.running_var", "backbone.features_layers.3.0.4.conv3.weight", "backbone.features_layers.3.0.4.bn3.weight", "backbone.features_layers.3.0.4.bn3.bias", "backbone.features_layers.3.0.4.bn3.running_mean", "backbone.features_layers.3.0.4.bn3.running_var", "backbone.features_layers.3.0.5.conv3.weight", "backbone.features_layers.3.0.5.bn3.weight", "backbone.features_layers.3.0.5.bn3.bias", "backbone.features_layers.3.0.5.bn3.running_mean", "backbone.features_layers.3.0.5.bn3.running_var", "backbone.features_layers.4.0.0.conv3.weight", "backbone.features_layers.4.0.0.bn3.weight", "backbone.features_layers.4.0.0.bn3.bias", 
"backbone.features_layers.4.0.0.bn3.running_mean", "backbone.features_layers.4.0.0.bn3.running_var", "backbone.features_layers.4.0.1.conv3.weight", "backbone.features_layers.4.0.1.bn3.weight", "backbone.features_layers.4.0.1.bn3.bias", "backbone.features_layers.4.0.1.bn3.running_mean", "backbone.features_layers.4.0.1.bn3.running_var", "backbone.features_layers.4.0.2.conv3.weight", "backbone.features_layers.4.0.2.bn3.weight", "backbone.features_layers.4.0.2.bn3.bias", "backbone.features_layers.4.0.2.bn3.running_mean", "backbone.features_layers.4.0.2.bn3.running_var".
size mismatch for backbone.features_layers.1.0.0.conv1.weight: copying a param with shape torch.Size([64, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 1, 1]).
size mismatch for backbone.features_layers.1.0.1.conv1.weight: copying a param with shape torch.Size([64, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 256, 1, 1]).
size mismatch for backbone.features_layers.1.0.2.conv1.weight: copying a param with shape torch.Size([64, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 256, 1, 1]).
size mismatch for backbone.features_layers.2.0.0.conv1.weight: copying a param with shape torch.Size([128, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 256, 1, 1]).
size mismatch for backbone.features_layers.2.0.0.downsample.0.weight: copying a param with shape torch.Size([128, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 256, 1, 1]).
size mismatch for backbone.features_layers.2.0.0.downsample.1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for backbone.features_layers.2.0.0.downsample.1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for backbone.features_layers.2.0.0.downsample.1.running_mean: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for backbone.features_layers.2.0.0.downsample.1.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for backbone.features_layers.2.0.1.conv1.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 512, 1, 1]).
size mismatch for backbone.features_layers.2.0.2.conv1.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 512, 1, 1]).
size mismatch for backbone.features_layers.2.0.3.conv1.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 512, 1, 1]).
size mismatch for backbone.features_layers.3.0.0.conv1.weight: copying a param with shape torch.Size([256, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
size mismatch for backbone.features_layers.3.0.0.downsample.0.weight: copying a param with shape torch.Size([256, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 512, 1, 1]).
size mismatch for backbone.features_layers.3.0.0.downsample.1.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for backbone.features_layers.3.0.0.downsample.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for backbone.features_layers.3.0.0.downsample.1.running_mean: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for backbone.features_layers.3.0.0.downsample.1.running_var: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for backbone.features_layers.3.0.1.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1]).
size mismatch for backbone.features_layers.3.0.2.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1]).
size mismatch for backbone.features_layers.3.0.3.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1]).
size mismatch for backbone.features_layers.3.0.4.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1]).
size mismatch for backbone.features_layers.3.0.5.conv1.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1]).
size mismatch for backbone.features_layers.4.0.0.conv1.weight: copying a param with shape torch.Size([512, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1]).
size mismatch for backbone.features_layers.4.0.0.downsample.0.weight: copying a param with shape torch.Size([512, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([2048, 1024, 1, 1]).
size mismatch for backbone.features_layers.4.0.0.downsample.1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for backbone.features_layers.4.0.0.downsample.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for backbone.features_layers.4.0.0.downsample.1.running_mean: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for backbone.features_layers.4.0.0.downsample.1.running_var: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for backbone.features_layers.4.0.1.conv1.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 2048, 1, 1]).
size mismatch for backbone.features_layers.4.0.2.conv1.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 2048, 1, 1]).
size mismatch for classifier.bottleneck.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for classifier.bottleneck.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for classifier.bottleneck.running_mean: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for classifier.bottleneck.running_var: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for classifier.classifier.weight: copying a param with shape torch.Size([625, 512]) from checkpoint, the shape in current model is torch.Size([625, 2048]).
python-BaseException
terminate called without an active exception

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
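The missing conv3/bn3 keys and the 3x3-vs-1x1 conv1 shapes above are the classic signature of a BasicBlock checkpoint (ResNet-18/34) being loaded into a Bottleneck model (ResNet-50). A minimal, hedged way to inspect what a checkpoint actually contains before building the model (the path is illustrative, and the saved object may be nested differently in this repo):

import torch

state = torch.load('chk_di_1', map_location='cpu')
state = state.get('state_dict', state) if isinstance(state, dict) else state
for name, tensor in list(state.items())[:5]:
    print(name, tuple(tensor.shape))
# 3x3 conv1 weights and no conv3 keys => BasicBlock (ResNet-18/34);
# 1x1 conv1 plus conv3/bn3 keys => Bottleneck (ResNet-50 and up).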

which network is used for evaluation?

I am confused by the evaluation code:

for idx_iteration in range(args.num_generations):
    print(f'starting generation {idx_iteration + 1}')
    print('#' * 100)
    teacher_net = d_trainer(teacher_net, student_net)
    d_trainer.evaluate(teacher_net)
    teacher_net.teacher_mode()

    student_net = deepcopy(teacher_net)
    saver.save_net(student_net, f'chk_di_{idx_iteration + 1}')

    student_net.reinit_layers(args.reinit_l4, args.reinit_l3)

Do you use student network or teacher network for evaluation?

RuntimeError: result type Long can't be cast to the desired output type Bool

Hi, guys:

First of all, thank you for your outstanding work. However, while training the teacher network, I encountered the following problem.

python train_v2v.py mars --backbone resnet50 --num_train_images 8 --p 4 --k 4 --exp_name base_mars_resnet50 --first_milestone 100 --step_milestone 100

=> MARS loaded
Dataset statistics:

subset | # ids | # tracklets | # images

train | 625 | 8298 | 509914
query | 626 | 1980 | 114493
gallery | 622 | 9330 | 543216

total | 1251 | 19608 | 1167623

number of images per tracklet: 2 ~ 920, average 59.5

EXP_NAME: base_mars_resnet50
Traceback (most recent call last):
  File "train_v2v.py", line 125, in <module>
    main()
  File "train_v2v.py", line 96, in main
    triplet_loss_batch = triplet_loss(embeddings, y)
  File "/home/fei/code/VKD/model/loss.py", line 208, in __call__
    return super(OnlineTripletLoss, self).__call__(*args, **kwargs)
  File "/home/fei/anaconda3/envs/VKD/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/fei/code/VKD/model/loss.py", line 146, in forward
    negative_mask = same_id_mask ^ 1
RuntimeError: result type Long can't be cast to the desired output type Bool

How can I fix it?
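A workaround commonly reported for this error (untested against this exact repo): on recent PyTorch versions, XOR-ing a Bool tensor with the Python int 1 promotes the result to Long, which cannot be cast back to Bool, giving the error above. Replacing the XOR at model/loss.py line 146 with a boolean negation keeps the dtype; a minimal sketch:

import torch

same_id_mask = torch.tensor([True, False, True])

# Fails on recent PyTorch: `same_id_mask ^ 1` promotes the result to Long.
# Equivalent negation that stays in Bool dtype:
negative_mask = ~same_id_mask
print(negative_mask)  # tensor([False,  True, False])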

Error on training: stack expects a non-empty TensorList

Hello,
I am trying to reproduce the results on Google Colab. I followed the instructions, but I may have made a mistake:
I have all the needed files on Google Drive, so the dataset structure looks like this: VKD-master/datasets/mars, containing 3 folders (info, bbox_train, bbox_test).
However, when I run the training command, I get the following error:
"RuntimeError: stack expects a non-empty TensorList"
I have attached the entire log. The same error also occurred when I tried the pre-trained model.
Every change I make brings me back to this same error. Please guide me towards a solution; any idea is welcome.
Thank you,
Anca
(Attachments: error_lic, path_lic)
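For what it's worth, an empty TensorList from torch.stack usually means the data loader found no frames at the path it expected. A quick sanity check of the layout described above (the root is an assumption based on this report):

import os

root = './VKD-master/datasets/mars'  # assumed from the description above
for sub in ('info', 'bbox_train', 'bbox_test'):
    path = os.path.join(root, sub)
    n = len(os.listdir(path)) if os.path.isdir(path) else 0
    print(f"{path}: {'OK' if n else 'MISSING/EMPTY'} ({n} entries)")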

Heatmap code?

Could you upload the code that draws the heatmaps in your paper? Thanks.

GPUs required?

What GPUs were used to train this? I would like to know the minimum recommended setup.

I am running this on a GeForce RTX 2080 Ti and, following the instructions, when I run
python ./tools/train_v2v.py mars --backbone resnet50 --num_train_images 8 --p 8 --k 4 --exp_name base_mars_resnet50 --first_milestone 100 --step_milestone 100

I get the following error:
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 10.76 GiB total capacity; 9.69 GiB already allocated; 29.75 MiB free; 197.14 MiB cached)

I have nothing else running and my GPU is 100% idle.
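Assuming --p and --k follow the PK-sampling reading above, a quick back-of-the-envelope for this command shows why 11 GB can run out, and why lowering --p or --k shrinks the batch:

# Rough batch accounting for the command above, assuming p = identities,
# k = tracklets per identity, num_train_images = frames per tracklet.
p, k, frames = 8, 4, 8
print(p * k, 'tracklets/batch;', p * k * frames, 'frames/batch')  # 32; 256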

About trained models

Hi! Thanks for your code. Could you provide a trained model on the DukeMTMC-VID dataset based on ResVKD-50bam?

inputs

I'm sorry, I have little knowledge about video re-ID.
What form does the input take? Is it a sequential input?
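Not specific to this repo, but as a general convention in video re-ID the input is a tracklet: an ordered sequence of frames of one person, batched as a 5-D tensor. A hedged illustration:

import torch

# Common video re-ID convention (not confirmed for this codebase):
# (batch, frames, channels, height, width).
clip = torch.randn(4, 8, 3, 256, 128)  # 4 tracklets of 8 frames each
print(clip.shape)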
