epiphqny / vistr

[CVPR2021 Oral] End-to-End Video Instance Segmentation with Transformers

Home Page: https://arxiv.org/abs/2011.14503

License: Apache License 2.0

Python 65.50% C++ 0.30% Cuda 30.91% C 3.29%
cvpr2021 video-instance-segmentation transformers instance-segmentation

vistr's Introduction

VisTR: End-to-End Video Instance Segmentation with Transformers

This is the official implementation of the VisTR paper (https://arxiv.org/abs/2011.14503).

Installation

We provide instructions for installing dependencies via conda. First, clone the repository locally:

git clone https://github.com/Epiphqny/vistr.git

Then, install PyTorch 1.6 and torchvision 0.7:

conda install pytorch==1.6.0 torchvision==0.7.0

Install pycocotools and the YouTube-VIS fork of the COCO API:

conda install cython scipy
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
pip install git+https://github.com/youtubevos/cocoapi.git#"egg=pycocotools&subdirectory=PythonAPI"

Compile the DCN module (requires GCC >= 5.3 and CUDA >= 10.0):

cd models/dcn
python setup.py build_ext --inplace

Preparation

Download and extract the 2019 version of the YouTubeVIS train and val images, with annotations, from CodaLab or YouTubeVIS. We expect the directory structure to be the following:

VisTR
├── data
│   ├── train
│   ├── val
│   ├── annotations
│   │   ├── instances_train_sub.json
│   │   ├── instances_val_sub.json
├── models
...

Download the COCO-pretrained DETR models (Google Drive, BaiduYun, passcode: alge) and save them to the pretrained path.

Training

Training the model requires a GPU with at least 32 GB of memory; we performed our experiments on 32 GB V100 cards. (The training resolution is limited by GPU memory; if you have a larger-memory GPU and want to run the experiment at higher resolution, please contact me, thanks very much.)

To train baseline VisTR on a single node with 8 GPUs for 18 epochs, run (use --backbone resnet50 for the R50 variant):

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --backbone resnet101 --ytvos_path /path/to/ytvos --masks --pretrained_weights /path/to/pretrained_path

Inference

python inference.py --masks --model_path /path/to/model_weights --save_path /path/to/results.json

Models

We provide baseline VisTR models and plan to include more in the future. AP is computed on the YouTubeVIS dataset by submitting the result json file to the CodaLab system, and the reported FPS measures pure model inference time (excluding data loading and post-processing).

name    backbone   FPS    mask AP   model            result json zip
VisTR   R50        69.9   36.2      vistr_r50.pth    vistr_r50.zip
VisTR   R101       57.7   40.1      vistr_r101.pth   vistr_r101.zip
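
For reference, a minimal sketch of how pure model inference time might be measured, assuming a CUDA device, a preprocessed 36-frame clip tensor img, and that FPS is frames per second over the clip; time_inference is a hypothetical helper, not part of the repo:

import time
import torch

@torch.no_grad()
def time_inference(model, img, warmup=3, runs=10):
    # Time only the forward pass, excluding data loading and
    # post-processing (hypothetical helper, not part of the repo).
    model.eval()
    for _ in range(warmup):
        model(img)                      # warm up CUDA kernels
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(runs):
        model(img)
    torch.cuda.synchronize()            # wait for all kernels to finish
    seconds_per_clip = (time.time() - start) / runs
    return 36 / seconds_per_clip        # FPS over a 36-frame clip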

License

VisTR is released under the Apache 2.0 license. Please see the LICENSE file for more information.

Acknowledgement

We would like to thank the DETR open-source project for its awesome work; part of our code is modified from it.

Citation

Please consider citing our paper in your publications if the project helps your research. The BibTeX reference is as follows.

@inproceedings{wang2020end,
  title={End-to-End Video Instance Segmentation with Transformers},
  author={Wang, Yuqing and Xu, Zhaoliang and Wang, Xinlong and Shen, Chunhua and Cheng, Baoshan and Shen, Hao and Xia, Huaxia},
  booktitle =  {Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}

vistr's People

Contributors

epiphqny, wxinlong


vistr's Issues

python inference.py --masks --model_path vistr_r50.pth

File "inference.py", line 236, in
main(args)
File "inference.py", line 172, in main
model.load_state_dict(state_dict)
File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1044, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for VisTRsegm:
Missing key(s) in state_dict: "vistr.backbone.0.body.layer3.6.conv1.weight",

Precomputed results?

Thanks for the work!
Besides the pretrained model, can the precomputed results (jsons) also be made available?

I don't have access to a V100, and I would really appreciate having the precomputed results.
Thanks again!

Error in loading mask

Hi, thank you for your contributions to VIS. I got this error when loading the dataset, occurring at line 69; it looks like something goes wrong when loading the masks. Has anyone else encountered the same error?
[screenshot of the error]

VisTR's forward output doesn't match the code in inference.py

error is here:

ssh://[email protected]:22/home/reid/anaconda3/envs/vistr/bin/python -u /home/reid/ldz_project/vistr/inference.py --model_path ckpt/vistr_r101.pth
Process video:  0
Traceback (most recent call last):
  File "/home/reid/ldz_project/vistr/inference.py", line 244, in <module>
    main(args)
  File "/home/reid/ldz_project/vistr/inference.py", line 209, in main
    logits, boxes, masks = outputs['pred_logits'].softmax(-1)[0,:,:-1], outputs['pred_boxes'][0], outputs['pred_masks'][0]
KeyError: 'pred_masks'

Process finished with exit code 1

the code in VisTR

def forward(self, samples: NestedTensor):
        ...
        out = {'pred_logits': outputs_class[-1], 'pred_boxes': outputs_coord[-1]}
        if self.aux_loss:
            out['aux_outputs'] = self._set_aux_loss(outputs_class, outputs_coord)
        return out

the code in inference.py:

# inference time is calculated for this operation
outputs = model(img)
# end of model inference
logits, boxes, masks = outputs['pred_logits'].softmax(-1)[0,:,:-1], outputs['pred_boxes'][0], outputs['pred_masks'][0]
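
A guess at the cause, sketched as a runnable toy (these classes are stand-ins, not the repo's code): if VisTR follows DETR's pattern, 'pred_masks' is added by a separate segmentation wrapper that is only built when the --masks flag is passed, so the command above, which omits --masks, gets a model that returns only logits and boxes:

import torch
from torch import nn

class TinyVisTR(nn.Module):
    # toy stand-in for the detection-only model
    def forward(self, x):
        return {"pred_logits": torch.randn(1, 360, 42),
                "pred_boxes": torch.rand(1, 360, 4)}

class TinyVisTRsegm(nn.Module):
    # toy stand-in for a segmentation wrapper; holding the base model as a
    # sub-module would also explain the "vistr."-prefixed checkpoint keys
    # seen in other issues
    def __init__(self, vistr):
        super().__init__()
        self.vistr = vistr
    def forward(self, x):
        out = self.vistr(x)
        out["pred_masks"] = torch.rand(1, 360, 75, 100)   # dummy masks
        return out

print("pred_masks" in TinyVisTR()(None))                  # False -> KeyError
print("pred_masks" in TinyVisTRsegm(TinyVisTR())(None))   # True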

About visualization of results

I would like to know how to convert results.json into a visualized video or images. Could you give some code for reference?
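
For anyone looking for a starting point: a minimal sketch that overlays one predicted mask on one frame, assuming results.json follows the YouTube-VIS submission format (a list of dicts with video_id, category_id, score, and per-frame RLE segmentations) and that pycocotools is installed. The paths and the video/frame choice are illustrative:

import json
import numpy as np
from PIL import Image
from pycocotools import mask as mask_utils

RESULTS_JSON = "results.json"                  # hypothetical paths
FRAME_PATH = "data/val/some_video/00000.jpg"   # one frame of video_id 1

with open(RESULTS_JSON) as f:
    results = json.load(f)

# take the highest-scoring instance of video 1 as an example
inst = max((r for r in results if r["video_id"] == 1), key=lambda r: r["score"])

frame = np.array(Image.open(FRAME_PATH).convert("RGB")).astype(np.float32)
rle = inst["segmentations"][0]          # RLE for frame 0; None if absent
if rle is not None:
    m = mask_utils.decode(rle).astype(bool)
    frame[m] = 0.5 * frame[m] + 0.5 * np.array([255.0, 0.0, 0.0])  # red tint

Image.fromarray(frame.astype(np.uint8)).save("frame0_vis.png")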

Can't understand this snippet of code

in transformer.py

115  if self.norm is not None:
116      output = self.norm(output)
117      if self.return_intermediate:
118          intermediate.pop()            # why is pop needed here, and what does it pop?
119          intermediate.append(output)
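
For context, the surrounding loop in a DETR-style decoder looks roughly like the sketch below (reconstructed from memory of the DETR codebase, so details may differ): inside the loop each intermediate output is appended already normed, and after the loop the final output is normed again, so pop() discards the last loop entry and re-appends its normed equivalent.

import torch
from torch import nn

class MiniDecoder(nn.Module):
    # Minimal reconstruction of a DETR-style decoder loop (illustrative,
    # not the repo's actual class).
    def __init__(self, layers, norm, return_intermediate=True):
        super().__init__()
        self.layers = layers
        self.norm = norm
        self.return_intermediate = return_intermediate

    def forward(self, output):
        intermediate = []
        for layer in self.layers:
            output = layer(output)
            if self.return_intermediate:
                intermediate.append(self.norm(output))   # normed per layer
        if self.norm is not None:
            output = self.norm(output)
            if self.return_intermediate:
                intermediate.pop()            # drop the last loop entry ...
                intermediate.append(output)   # ... and re-append it normed
        return torch.stack(intermediate) if self.return_intermediate else output

dec = MiniDecoder(nn.ModuleList([nn.Linear(8, 8) for _ in range(3)]), nn.LayerNorm(8))
print(dec(torch.randn(2, 8)).shape)   # torch.Size([3, 2, 8])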

Too many iterations in ONE EPOCH?

Dear authors,

Thanks for your great open-source work. I have a question regarding the training:

In each epoch, the number of iterations equals the number of images. However, in each iteration the input is a whole video, which contains 36 images. That is, on average, one image is trained on 36 times within the same epoch. In general, I think the common way is to set the number of iterations equal to the number of videos, such that each image is only seen once per epoch. I am wondering why there are so many iterations in one epoch. Is it specially set for the method, or just a convenient implementation?

Thanks a lot! Looking forward to hearing from you.
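
A toy illustration of the indexing convention this question describes, under the assumption that the dataset is indexed per annotated image while each item returns the full clip (all names here are made up for illustration, not the repo's classes):

from torch.utils.data import Dataset

class ClipPerImage(Dataset):
    # Hypothetical sketch, not the repo's dataset class.
    def __init__(self, videos, num_frames=36):
        # one entry per (video, frame) pair -> len == number of images
        self.index = [(v, f) for v in videos for f in range(num_frames)]
        self.num_frames = num_frames

    def __len__(self):
        return len(self.index)   # iterations per epoch == number of images

    def __getitem__(self, idx):
        video, _ = self.index[idx]
        # the whole clip is returned no matter which frame was indexed,
        # so each frame is effectively seen ~36 times per epoch
        return [f"{video}/frame_{f:05d}.jpg" for f in range(self.num_frames)]

ds = ClipPerImage(["video_a", "video_b"])
print(len(ds))       # 72 iterations per epoch (2 videos x 36 frames)
print(len(ds[0]))    # 36: every item is a full clip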

about position embedding

[screenshot of the position embedding code]

Are the inputs and outputs of the position encoding tensors? What are the dimensions of the input and output of the position encoding?
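
Not an official answer, but a self-contained sketch of a DETR-style sine position encoding extended to the temporal axis, which makes the shapes concrete; VisTR's actual implementation may differ in detail. The input is the boolean padding mask of the stacked clip, the output a positional tensor with the model's channel dimension:

import torch

def sine_position_encoding_3d(mask, num_pos_feats=128):
    # mask: bool tensor (B, T, H, W), True at padded positions.
    # returns: float tensor (B, 3 * num_pos_feats, T, H, W).
    not_mask = ~mask
    t_embed = not_mask.cumsum(1, dtype=torch.float32)   # frame index
    y_embed = not_mask.cumsum(2, dtype=torch.float32)   # row index
    x_embed = not_mask.cumsum(3, dtype=torch.float32)   # column index

    dim_t = torch.arange(num_pos_feats, dtype=torch.float32)
    dim_t = 10000 ** (2 * (dim_t // 2) / num_pos_feats)

    def encode(e):
        pos = e[..., None] / dim_t                      # (B, T, H, W, F)
        return torch.stack((pos[..., 0::2].sin(),
                            pos[..., 1::2].cos()), dim=5).flatten(4)

    pos = torch.cat((encode(t_embed), encode(y_embed), encode(x_embed)), dim=4)
    return pos.permute(0, 4, 1, 2, 3)

pos = sine_position_encoding_3d(torch.zeros(1, 36, 25, 32, dtype=torch.bool))
print(pos.shape)   # torch.Size([1, 384, 36, 25, 32])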

Does anybody know how to convert the model to an ONNX file?

I can't do it; it outputs:

for i in range(frame_masks.size(1)):
Traceback (most recent call last):
  File "inference.py", line 250, in <module>
    main(args)
  File "inference.py", line 208, in main
    torch.onnx.export(model, img.squeeze(0), "vistr.onnx", export_params=True, verbose=True, opset_version=11, do_constant_folding=True,
  File "/root/anaconda3/lib/python3.8/site-packages/torch/onnx/__init__.py", line 203, in export
    return utils.export(model, args, f, export_params, verbose, training,
  File "/root/anaconda3/lib/python3.8/site-packages/torch/onnx/utils.py", line 86, in export
    _export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "/root/anaconda3/lib/python3.8/site-packages/torch/onnx/utils.py", line 526, in _export
    graph, params_dict, torch_out = _model_to_graph(model, args, verbose, input_names,
  File "/root/anaconda3/lib/python3.8/site-packages/torch/onnx/utils.py", line 382, in _model_to_graph
    graph = _optimize_graph(graph, operator_export_type,
  File "/root/anaconda3/lib/python3.8/site-packages/torch/onnx/utils.py", line 188, in _optimize_graph
    graph = torch._C._jit_pass_onnx(graph, operator_export_type)
  File "/root/anaconda3/lib/python3.8/site-packages/torch/onnx/__init__.py", line 241, in _run_symbolic_function
    return utils._run_symbolic_function(*args, **kwargs)
  File "/root/anaconda3/lib/python3.8/site-packages/torch/onnx/utils.py", line 791, in _run_symbolic_function
    return symbolic_fn(g, *inputs, **attrs)
  File "/root/anaconda3/lib/python3.8/site-packages/torch/onnx/symbolic_helper.py", line 129, in wrapper
    return fn(g, *args)
  File "/root/anaconda3/lib/python3.8/site-packages/torch/onnx/symbolic_opset11.py", line 672, in flatten
    output_dims[start_dim] = output_dims[start_dim] * input_dims[i]
IndexError: list index out of range

How long does one training epoch take?

I wonder how long one training epoch takes with 8 V100 GPUs.
I am using 4 V100 GPUs and it takes ~20 hours per epoch, so would it be about 10 hours per epoch on 8 V100s?

Question about Classes Index

Thank you for sharing this great work! I have a question about your class indexing. YouTubeVIS has 40 classes and an empty class is added in VisTR, thus 41 classes in total, right? Then I noticed that in your implementation, loss_labels() is computed by

target_classes = torch.full(src_logits.shape[:2], self.num_classes, dtype=torch.int64, device=src_logits.device)
target_classes[idx] = target_classes_o
loss_ce = F.cross_entropy(src_logits.transpose(1, 2), target_classes, self.empty_weight)

Since YTVIS's classes run from 1 to 40 and I didn't find any re-indexing operation in your code, I am confused by the empty class here. Do you use 40 to represent the empty class (you set self.num_classes=40)? Thank you.
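
For reference, a self-contained toy showing the mechanics in question: unmatched queries are filled with the index self.num_classes, so whatever value num_classes holds is the "empty" slot. The shapes and the value 40 below are illustrative, not a statement about the repo's actual setting:

import torch
import torch.nn.functional as F

num_classes = 40                      # toy value; the question above is
                                      # exactly about whether this is 40 or 41
src_logits = torch.randn(2, 5, num_classes + 1)   # (batch, queries, classes)
empty_weight = torch.ones(num_classes + 1)
empty_weight[num_classes] = 0.1       # down-weight the empty class

# every query defaults to the empty index ...
target_classes = torch.full((2, 5), num_classes, dtype=torch.long)
# ... except matched queries, which get their ground-truth labels
idx = (torch.tensor([0, 1]), torch.tensor([2, 4]))
target_classes[idx] = torch.tensor([7, 23])

loss_ce = F.cross_entropy(src_logits.transpose(1, 2), target_classes, empty_weight)
print(loss_ce)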

track id

Hi there,
Thanks for releasing the code.
I had a very simple question. Does VisTR require tracking IDs?

Inference with Tracking

@Epiphqny - Excellent work. We would like to test this out for instance segmentation and tracking. Your inference.py file says that this code does video segmentation only, no tracking. The code also seems similar to the code used by DETR for segmentation.
How do we do instance tracking along with segmentation? Could you please share thoughts or code for that?

Question about instance queries

Thank you for sharing this great work! I have a question about the instance queries. Are instance categories and instance queries arranged in order? In other words, is the final output category of the first instance query in frame 1 the same as that of the first instance query in frame 2? If they are the same, how is that achieved? (I can't find a description of this in the paper.) Thank you!

Having trouble running my own dataset

Hello! I'm very interested in your excellent work (VisTR), and I'm trying to run VisTR on my own dataset with four classes. I set num_frames to 3 and the number of instances predicted per image to 10, but the result is that the 10 instances in every image are the same. When I run on your dataset with 40 classes, the 10 instances are not the same: they correspond to the classes appearing in the image, and sometimes only two instances are output, which is a good result. Can you help me with this problem? Thanks very much!

category_id

Thanks a lot for your work. Now your model outputs 42 labels, i.e. 0-41, and you select 0-40. Since 1-40 are the categories in the annotations, does 0 mean background? And what does 41 mean?

The visualization of decoder attention_weight

[screenshot of the attention maps]
I want to visualize the attention weights of the decoder module. I take the output of multihead_attn in the last layer of the decoder, but the shape is (bs, 360, 36*h*w), where h*w is the size of the feature map. I don't understand why there are 36 different attention weights for the same instance of the same frame, as the picture shows. Can you explain what this means?
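
One possible reading of those shapes (an assumption, not confirmed by the authors): the 360 queries are 36 frames x 10 instances, and the 36*h*w keys are 36 frames x the feature map, so each query carries one attention map per frame:

import torch

bs, T, Q, h, w = 1, 36, 10, 25, 32
attn = torch.rand(bs, T * Q, T * h * w)    # (bs, 360, 36*h*w) as reported

attn = attn.view(bs, T, Q, T, h, w)        # (bs, 36, 10, 36, h, w)
# attn[:, t, q, t'] = where query q of frame t attends within frame t';
# every query attends over all 36 frames, hence 36 maps per instance/frame
maps_for_one_query = attn[:, 0, 0]         # (bs, 36, h, w)
print(maps_for_one_query.shape)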

An error about inference

python inference.py --model_path ckpt/vistr_r101.pth

It seems that the key sets of the model and vistr_r101.pth are different.
Even if I replace the 'vistr.' prefix in the keys, the error is still there.

error is here:

Traceback (most recent call last):
  File "/home/reid/ldz_project/vistr/inference.py", line 237, in <module>
    main(args)
  File "/home/reid/ldz_project/vistr/inference.py", line 173, in main
    model.load_state_dict(state_dict)
  File "/home/reid/anaconda3/envs/vistr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1045, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for VisTR:
	Missing key(s) in state_dict: "transformer.encoder.layers.0.self_attn.in_proj_weight", "transformer.encoder.layers.0.self_attn.in_proj_bias", "transformer.encoder.layers.0.self_attn.out_proj.weight", "transformer.encoder.layers.0.self_attn.out_proj.bias", [... every parameter of transformer.encoder.layers.0-5 and transformer.decoder.layers.0-5, plus "transformer.decoder.norm.*", "class_embed.*", "bbox_embed.layers.0-2.*", "query_embed.weight", "input_proj.*", and all "backbone.0.body.*" parameters through layer4; list condensed for readability ...] "backbone.0.body.layer4.2.bn3.running_mean", "backbone.0.body.layer4.2.bn3.running_var".
	Unexpected key(s) in state_dict: "vistr.transformer.encoder.layers.0.self_attn.in_proj_weight", "vistr.transformer.encoder.layers.0.self_attn.in_proj_bias", [... the same keys as above, each carrying a "vistr." prefix; the log is truncated here in the original ...]
"vistr.backbone.0.body.layer3.6.bn2.running_var", "vistr.backbone.0.body.layer3.6.conv3.weight", "vistr.backbone.0.body.layer3.6.bn3.weight", "vistr.backbone.0.body.layer3.6.bn3.bias", "vistr.backbone.0.body.layer3.6.bn3.running_mean", "vistr.backbone.0.body.layer3.6.bn3.running_var", "vistr.backbone.0.body.layer3.7.conv1.weight", "vistr.backbone.0.body.layer3.7.bn1.weight", "vistr.backbone.0.body.layer3.7.bn1.bias", "vistr.backbone.0.body.layer3.7.bn1.running_mean", "vistr.backbone.0.body.layer3.7.bn1.running_var", "vistr.backbone.0.body.layer3.7.conv2.weight", "vistr.backbone.0.body.layer3.7.bn2.weight", "vistr.backbone.0.body.layer3.7.bn2.bias", "vistr.backbone.0.body.layer3.7.bn2.running_mean", "vistr.backbone.0.body.layer3.7.bn2.running_var", "vistr.backbone.0.body.layer3.7.conv3.weight", "vistr.backbone.0.body.layer3.7.bn3.weight", "vistr.backbone.0.body.layer3.7.bn3.bias", "vistr.backbone.0.body.layer3.7.bn3.running_mean", "vistr.backbone.0.body.layer3.7.bn3.running_var", "vistr.backbone.0.body.layer3.8.conv1.weight", "vistr.backbone.0.body.layer3.8.bn1.weight", "vistr.backbone.0.body.layer3.8.bn1.bias", "vistr.backbone.0.body.layer3.8.bn1.running_mean", "vistr.backbone.0.body.layer3.8.bn1.running_var", "vistr.backbone.0.body.layer3.8.conv2.weight", "vistr.backbone.0.body.layer3.8.bn2.weight", "vistr.backbone.0.body.layer3.8.bn2.bias", "vistr.backbone.0.body.layer3.8.bn2.running_mean", "vistr.backbone.0.body.layer3.8.bn2.running_var", "vistr.backbone.0.body.layer3.8.conv3.weight", "vistr.backbone.0.body.layer3.8.bn3.weight", "vistr.backbone.0.body.layer3.8.bn3.bias", "vistr.backbone.0.body.layer3.8.bn3.running_mean", "vistr.backbone.0.body.layer3.8.bn3.running_var", "vistr.backbone.0.body.layer3.9.conv1.weight", "vistr.backbone.0.body.layer3.9.bn1.weight", "vistr.backbone.0.body.layer3.9.bn1.bias", "vistr.backbone.0.body.layer3.9.bn1.running_mean", "vistr.backbone.0.body.layer3.9.bn1.running_var", "vistr.backbone.0.body.layer3.9.conv2.weight", "vistr.backbone.0.body.layer3.9.bn2.weight", "vistr.backbone.0.body.layer3.9.bn2.bias", "vistr.backbone.0.body.layer3.9.bn2.running_mean", "vistr.backbone.0.body.layer3.9.bn2.running_var", "vistr.backbone.0.body.layer3.9.conv3.weight", "vistr.backbone.0.body.layer3.9.bn3.weight", "vistr.backbone.0.body.layer3.9.bn3.bias", "vistr.backbone.0.body.layer3.9.bn3.running_mean", "vistr.backbone.0.body.layer3.9.bn3.running_var", "vistr.backbone.0.body.layer3.10.conv1.weight", "vistr.backbone.0.body.layer3.10.bn1.weight", "vistr.backbone.0.body.layer3.10.bn1.bias", "vistr.backbone.0.body.layer3.10.bn1.running_mean", "vistr.backbone.0.body.layer3.10.bn1.running_var", "vistr.backbone.0.body.layer3.10.conv2.weight", "vistr.backbone.0.body.layer3.10.bn2.weight", "vistr.backbone.0.body.layer3.10.bn2.bias", "vistr.backbone.0.body.layer3.10.bn2.running_mean", "vistr.backbone.0.body.layer3.10.bn2.running_var", "vistr.backbone.0.body.layer3.10.conv3.weight", "vistr.backbone.0.body.layer3.10.bn3.weight", "vistr.backbone.0.body.layer3.10.bn3.bias", "vistr.backbone.0.body.layer3.10.bn3.running_mean", "vistr.backbone.0.body.layer3.10.bn3.running_var", "vistr.backbone.0.body.layer3.11.conv1.weight", "vistr.backbone.0.body.layer3.11.bn1.weight", "vistr.backbone.0.body.layer3.11.bn1.bias", "vistr.backbone.0.body.layer3.11.bn1.running_mean", "vistr.backbone.0.body.layer3.11.bn1.running_var", "vistr.backbone.0.body.layer3.11.conv2.weight", "vistr.backbone.0.body.layer3.11.bn2.weight", "vistr.backbone.0.body.layer3.11.bn2.bias", 
"vistr.backbone.0.body.layer3.11.bn2.running_mean", "vistr.backbone.0.body.layer3.11.bn2.running_var", "vistr.backbone.0.body.layer3.11.conv3.weight", "vistr.backbone.0.body.layer3.11.bn3.weight", "vistr.backbone.0.body.layer3.11.bn3.bias", "vistr.backbone.0.body.layer3.11.bn3.running_mean", "vistr.backbone.0.body.layer3.11.bn3.running_var", "vistr.backbone.0.body.layer3.12.conv1.weight", "vistr.backbone.0.body.layer3.12.bn1.weight", "vistr.backbone.0.body.layer3.12.bn1.bias", "vistr.backbone.0.body.layer3.12.bn1.running_mean", "vistr.backbone.0.body.layer3.12.bn1.running_var", "vistr.backbone.0.body.layer3.12.conv2.weight", "vistr.backbone.0.body.layer3.12.bn2.weight", "vistr.backbone.0.body.layer3.12.bn2.bias", "vistr.backbone.0.body.layer3.12.bn2.running_mean", "vistr.backbone.0.body.layer3.12.bn2.running_var", "vistr.backbone.0.body.layer3.12.conv3.weight", "vistr.backbone.0.body.layer3.12.bn3.weight", "vistr.backbone.0.body.layer3.12.bn3.bias", "vistr.backbone.0.body.layer3.12.bn3.running_mean", "vistr.backbone.0.body.layer3.12.bn3.running_var", "vistr.backbone.0.body.layer3.13.conv1.weight", "vistr.backbone.0.body.layer3.13.bn1.weight", "vistr.backbone.0.body.layer3.13.bn1.bias", "vistr.backbone.0.body.layer3.13.bn1.running_mean", "vistr.backbone.0.body.layer3.13.bn1.running_var", "vistr.backbone.0.body.layer3.13.conv2.weight", "vistr.backbone.0.body.layer3.13.bn2.weight", "vistr.backbone.0.body.layer3.13.bn2.bias", "vistr.backbone.0.body.layer3.13.bn2.running_mean", "vistr.backbone.0.body.layer3.13.bn2.running_var", "vistr.backbone.0.body.layer3.13.conv3.weight", "vistr.backbone.0.body.layer3.13.bn3.weight", "vistr.backbone.0.body.layer3.13.bn3.bias", "vistr.backbone.0.body.layer3.13.bn3.running_mean", "vistr.backbone.0.body.layer3.13.bn3.running_var", "vistr.backbone.0.body.layer3.14.conv1.weight", "vistr.backbone.0.body.layer3.14.bn1.weight", "vistr.backbone.0.body.layer3.14.bn1.bias", "vistr.backbone.0.body.layer3.14.bn1.running_mean", "vistr.backbone.0.body.layer3.14.bn1.running_var", "vistr.backbone.0.body.layer3.14.conv2.weight", "vistr.backbone.0.body.layer3.14.bn2.weight", "vistr.backbone.0.body.layer3.14.bn2.bias", "vistr.backbone.0.body.layer3.14.bn2.running_mean", "vistr.backbone.0.body.layer3.14.bn2.running_var", "vistr.backbone.0.body.layer3.14.conv3.weight", "vistr.backbone.0.body.layer3.14.bn3.weight", "vistr.backbone.0.body.layer3.14.bn3.bias", "vistr.backbone.0.body.layer3.14.bn3.running_mean", "vistr.backbone.0.body.layer3.14.bn3.running_var", "vistr.backbone.0.body.layer3.15.conv1.weight", "vistr.backbone.0.body.layer3.15.bn1.weight", "vistr.backbone.0.body.layer3.15.bn1.bias", "vistr.backbone.0.body.layer3.15.bn1.running_mean", "vistr.backbone.0.body.layer3.15.bn1.running_var", "vistr.backbone.0.body.layer3.15.conv2.weight", "vistr.backbone.0.body.layer3.15.bn2.weight", "vistr.backbone.0.body.layer3.15.bn2.bias", "vistr.backbone.0.body.layer3.15.bn2.running_mean", "vistr.backbone.0.body.layer3.15.bn2.running_var", "vistr.backbone.0.body.layer3.15.conv3.weight", "vistr.backbone.0.body.layer3.15.bn3.weight", "vistr.backbone.0.body.layer3.15.bn3.bias", "vistr.backbone.0.body.layer3.15.bn3.running_mean", "vistr.backbone.0.body.layer3.15.bn3.running_var", "vistr.backbone.0.body.layer3.16.conv1.weight", "vistr.backbone.0.body.layer3.16.bn1.weight", "vistr.backbone.0.body.layer3.16.bn1.bias", "vistr.backbone.0.body.layer3.16.bn1.running_mean", "vistr.backbone.0.body.layer3.16.bn1.running_var", "vistr.backbone.0.body.layer3.16.conv2.weight", 
"vistr.backbone.0.body.layer3.16.bn2.weight", "vistr.backbone.0.body.layer3.16.bn2.bias", "vistr.backbone.0.body.layer3.16.bn2.running_mean", "vistr.backbone.0.body.layer3.16.bn2.running_var", "vistr.backbone.0.body.layer3.16.conv3.weight", "vistr.backbone.0.body.layer3.16.bn3.weight", "vistr.backbone.0.body.layer3.16.bn3.bias", "vistr.backbone.0.body.layer3.16.bn3.running_mean", "vistr.backbone.0.body.layer3.16.bn3.running_var", "vistr.backbone.0.body.layer3.17.conv1.weight", "vistr.backbone.0.body.layer3.17.bn1.weight", "vistr.backbone.0.body.layer3.17.bn1.bias", "vistr.backbone.0.body.layer3.17.bn1.running_mean", "vistr.backbone.0.body.layer3.17.bn1.running_var", "vistr.backbone.0.body.layer3.17.conv2.weight", "vistr.backbone.0.body.layer3.17.bn2.weight", "vistr.backbone.0.body.layer3.17.bn2.bias", "vistr.backbone.0.body.layer3.17.bn2.running_mean", "vistr.backbone.0.body.layer3.17.bn2.running_var", "vistr.backbone.0.body.layer3.17.conv3.weight", "vistr.backbone.0.body.layer3.17.bn3.weight", "vistr.backbone.0.body.layer3.17.bn3.bias", "vistr.backbone.0.body.layer3.17.bn3.running_mean", "vistr.backbone.0.body.layer3.17.bn3.running_var", "vistr.backbone.0.body.layer3.18.conv1.weight", "vistr.backbone.0.body.layer3.18.bn1.weight", "vistr.backbone.0.body.layer3.18.bn1.bias", "vistr.backbone.0.body.layer3.18.bn1.running_mean", "vistr.backbone.0.body.layer3.18.bn1.running_var", "vistr.backbone.0.body.layer3.18.conv2.weight", "vistr.backbone.0.body.layer3.18.bn2.weight", "vistr.backbone.0.body.layer3.18.bn2.bias", "vistr.backbone.0.body.layer3.18.bn2.running_mean", "vistr.backbone.0.body.layer3.18.bn2.running_var", "vistr.backbone.0.body.layer3.18.conv3.weight", "vistr.backbone.0.body.layer3.18.bn3.weight", "vistr.backbone.0.body.layer3.18.bn3.bias", "vistr.backbone.0.body.layer3.18.bn3.running_mean", "vistr.backbone.0.body.layer3.18.bn3.running_var", "vistr.backbone.0.body.layer3.19.conv1.weight", "vistr.backbone.0.body.layer3.19.bn1.weight", "vistr.backbone.0.body.layer3.19.bn1.bias", "vistr.backbone.0.body.layer3.19.bn1.running_mean", "vistr.backbone.0.body.layer3.19.bn1.running_var", "vistr.backbone.0.body.layer3.19.conv2.weight", "vistr.backbone.0.body.layer3.19.bn2.weight", "vistr.backbone.0.body.layer3.19.bn2.bias", "vistr.backbone.0.body.layer3.19.bn2.running_mean", "vistr.backbone.0.body.layer3.19.bn2.running_var", "vistr.backbone.0.body.layer3.19.conv3.weight", "vistr.backbone.0.body.layer3.19.bn3.weight", "vistr.backbone.0.body.layer3.19.bn3.bias", "vistr.backbone.0.body.layer3.19.bn3.running_mean", "vistr.backbone.0.body.layer3.19.bn3.running_var", "vistr.backbone.0.body.layer3.20.conv1.weight", "vistr.backbone.0.body.layer3.20.bn1.weight", "vistr.backbone.0.body.layer3.20.bn1.bias", "vistr.backbone.0.body.layer3.20.bn1.running_mean", "vistr.backbone.0.body.layer3.20.bn1.running_var", "vistr.backbone.0.body.layer3.20.conv2.weight", "vistr.backbone.0.body.layer3.20.bn2.weight", "vistr.backbone.0.body.layer3.20.bn2.bias", "vistr.backbone.0.body.layer3.20.bn2.running_mean", "vistr.backbone.0.body.layer3.20.bn2.running_var", "vistr.backbone.0.body.layer3.20.conv3.weight", "vistr.backbone.0.body.layer3.20.bn3.weight", "vistr.backbone.0.body.layer3.20.bn3.bias", "vistr.backbone.0.body.layer3.20.bn3.running_mean", "vistr.backbone.0.body.layer3.20.bn3.running_var", "vistr.backbone.0.body.layer3.21.conv1.weight", "vistr.backbone.0.body.layer3.21.bn1.weight", "vistr.backbone.0.body.layer3.21.bn1.bias", "vistr.backbone.0.body.layer3.21.bn1.running_mean", 
"vistr.backbone.0.body.layer3.21.bn1.running_var", "vistr.backbone.0.body.layer3.21.conv2.weight", "vistr.backbone.0.body.layer3.21.bn2.weight", "vistr.backbone.0.body.layer3.21.bn2.bias", "vistr.backbone.0.body.layer3.21.bn2.running_mean", "vistr.backbone.0.body.layer3.21.bn2.running_var", "vistr.backbone.0.body.layer3.21.conv3.weight", "vistr.backbone.0.body.layer3.21.bn3.weight", "vistr.backbone.0.body.layer3.21.bn3.bias", "vistr.backbone.0.body.layer3.21.bn3.running_mean", "vistr.backbone.0.body.layer3.21.bn3.running_var", "vistr.backbone.0.body.layer3.22.conv1.weight", "vistr.backbone.0.body.layer3.22.bn1.weight", "vistr.backbone.0.body.layer3.22.bn1.bias", "vistr.backbone.0.body.layer3.22.bn1.running_mean", "vistr.backbone.0.body.layer3.22.bn1.running_var", "vistr.backbone.0.body.layer3.22.conv2.weight", "vistr.backbone.0.body.layer3.22.bn2.weight", "vistr.backbone.0.body.layer3.22.bn2.bias", "vistr.backbone.0.body.layer3.22.bn2.running_mean", "vistr.backbone.0.body.layer3.22.bn2.running_var", "vistr.backbone.0.body.layer3.22.conv3.weight", "vistr.backbone.0.body.layer3.22.bn3.weight", "vistr.backbone.0.body.layer3.22.bn3.bias", "vistr.backbone.0.body.layer3.22.bn3.running_mean", "vistr.backbone.0.body.layer3.22.bn3.running_var", "vistr.backbone.0.body.layer4.0.conv1.weight", "vistr.backbone.0.body.layer4.0.bn1.weight", "vistr.backbone.0.body.layer4.0.bn1.bias", "vistr.backbone.0.body.layer4.0.bn1.running_mean", "vistr.backbone.0.body.layer4.0.bn1.running_var", "vistr.backbone.0.body.layer4.0.conv2.weight", "vistr.backbone.0.body.layer4.0.bn2.weight", "vistr.backbone.0.body.layer4.0.bn2.bias", "vistr.backbone.0.body.layer4.0.bn2.running_mean", "vistr.backbone.0.body.layer4.0.bn2.running_var", "vistr.backbone.0.body.layer4.0.conv3.weight", "vistr.backbone.0.body.layer4.0.bn3.weight", "vistr.backbone.0.body.layer4.0.bn3.bias", "vistr.backbone.0.body.layer4.0.bn3.running_mean", "vistr.backbone.0.body.layer4.0.bn3.running_var", "vistr.backbone.0.body.layer4.0.downsample.0.weight", "vistr.backbone.0.body.layer4.0.downsample.1.weight", "vistr.backbone.0.body.layer4.0.downsample.1.bias", "vistr.backbone.0.body.layer4.0.downsample.1.running_mean", "vistr.backbone.0.body.layer4.0.downsample.1.running_var", "vistr.backbone.0.body.layer4.1.conv1.weight", "vistr.backbone.0.body.layer4.1.bn1.weight", "vistr.backbone.0.body.layer4.1.bn1.bias", "vistr.backbone.0.body.layer4.1.bn1.running_mean", "vistr.backbone.0.body.layer4.1.bn1.running_var", "vistr.backbone.0.body.layer4.1.conv2.weight", "vistr.backbone.0.body.layer4.1.bn2.weight", "vistr.backbone.0.body.layer4.1.bn2.bias", "vistr.backbone.0.body.layer4.1.bn2.running_mean", "vistr.backbone.0.body.layer4.1.bn2.running_var", "vistr.backbone.0.body.layer4.1.conv3.weight", "vistr.backbone.0.body.layer4.1.bn3.weight", "vistr.backbone.0.body.layer4.1.bn3.bias", "vistr.backbone.0.body.layer4.1.bn3.running_mean", "vistr.backbone.0.body.layer4.1.bn3.running_var", "vistr.backbone.0.body.layer4.2.conv1.weight", "vistr.backbone.0.body.layer4.2.bn1.weight", "vistr.backbone.0.body.layer4.2.bn1.bias", "vistr.backbone.0.body.layer4.2.bn1.running_mean", "vistr.backbone.0.body.layer4.2.bn1.running_var", "vistr.backbone.0.body.layer4.2.conv2.weight", "vistr.backbone.0.body.layer4.2.bn2.weight", "vistr.backbone.0.body.layer4.2.bn2.bias", "vistr.backbone.0.body.layer4.2.bn2.running_mean", "vistr.backbone.0.body.layer4.2.bn2.running_var", "vistr.backbone.0.body.layer4.2.conv3.weight", "vistr.backbone.0.body.layer4.2.bn3.weight", 
"vistr.backbone.0.body.layer4.2.bn3.bias", "vistr.backbone.0.body.layer4.2.bn3.running_mean", "vistr.backbone.0.body.layer4.2.bn3.running_var", "bbox_attention.q_linear.weight", "bbox_attention.q_linear.bias", "bbox_attention.k_linear.weight", "bbox_attention.k_linear.bias", "mask_head.lay1.weight", "mask_head.lay1.bias", "mask_head.gn1.weight", "mask_head.gn1.bias", "mask_head.lay2.weight", "mask_head.lay2.bias", "mask_head.gn2.weight", "mask_head.gn2.bias", "mask_head.lay3.weight", "mask_head.lay3.bias", "mask_head.gn3.weight", "mask_head.gn3.bias", "mask_head.lay4.weight", "mask_head.lay4.bias", "mask_head.gn4.weight", "mask_head.gn4.bias", "mask_head.gn5.weight", "mask_head.gn5.bias", "mask_head.conv_offset.weight", "mask_head.conv_offset.bias", "mask_head.dcn.weight", "mask_head.adapter1.weight", "mask_head.adapter1.bias", "mask_head.adapter2.weight", "mask_head.adapter2.bias", "mask_head.adapter3.weight", "mask_head.adapter3.bias", "insmask_head.0.weight", "insmask_head.0.bias", "insmask_head.1.weight", "insmask_head.1.bias", "insmask_head.3.weight", "insmask_head.3.bias", "insmask_head.4.weight", "insmask_head.4.bias", "insmask_head.6.weight", "insmask_head.6.bias", "insmask_head.7.weight", "insmask_head.7.bias", "insmask_head.9.weight", "insmask_head.9.bias". 

Process finished with exit code 1

Custom Backbone

Hello, I am very impressed with your work, so I want to train your model from scratch with a custom backbone for my research. I would appreciate it if you could provide the requirements for a custom backbone, along with guidelines on how to integrate one. Best!
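
As a starting point, here is a minimal sketch of the DETR-style backbone contract that VisTR appears to inherit (names below are illustrative assumptions, not the repository API): the backbone maps an image batch to a coarse feature map and exposes num_channels so the 1x1 input projection can be sized.

import torch
import torch.nn as nn

class CustomBackbone(nn.Module):
    def __init__(self, num_channels: int = 256):
        super().__init__()
        self.num_channels = num_channels          # read by the model to size the 1x1 input projection
        self.body = nn.Sequential(                # stand-in for a real feature extractor
            nn.Conv2d(3, num_channels, kernel_size=7, stride=32, padding=3),
            nn.ReLU(inplace=True),
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # The real code additionally carries a padding mask (NestedTensor) and
        # pairs the features with positional encodings via DETR's Joiner.
        return self.body(images)

feats = CustomBackbone()(torch.randn(2, 3, 224, 224))
print(feats.shape)  # torch.Size([2, 256, 7, 7])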

torch.nn.modules.module.ModuleAttributeError: 'VisTRsegm' object has no attribute 'module'

Hi, thanks for sharing your great work!
When I run python main.py --pretrained_weights 384_coco_r50.pth --backbone resnet50 --num_frames 2 --mask on one GPU (I only have one), this error appears at the line model.module.load_state_dict(checkpoint, strict=False). The checkpoint, named '384_coco_r50.pth', was downloaded from your drive. Could you please help fix this? And could you share one demo video sample so we can reproduce the code more easily? Thanks!

About the results and the loss convergence

I ran your code on 8 V100 GPUs with ResNet-50 as the backbone, but the loss convergence seems unstable after the first few epochs: the AP of the last epoch is 0.33, while the AP of the 12th epoch is 0.35.
Could you provide the loss curve and the AP of every epoch? It would be a very helpful reference for me.
Thanks.

category_id

I notice that your category_id ranges from 0 to 39, while MaskTrackRCNN uses 1 to 40, which confuses me.
I ran inference on the validation set with the weights you trained and deleted all results with "category_id" = 0 from results.json, but the score doesn't change. Could you please tell me why?
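
For anyone debugging the same thing, here is a hypothetical check (file names are assumptions) that counts and removes category_id == 0 predictions; if no prediction actually carries that id, deleting them cannot change the score.

import json

with open("results.json") as f:          # hypothetical path to the inference output
    results = json.load(f)

zero_cat = [r for r in results if r.get("category_id") == 0]
print(len(zero_cat), "of", len(results), "predictions have category_id == 0")

kept = [r for r in results if r.get("category_id") != 0]
with open("results_filtered.json", "w") as f:
    json.dump(kept, f)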

Dice Loss

Hello, could you let me know the value of the dice coefficient (or dice loss) during training on the updated codebase? Thank you.
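
For reference, a minimal sketch of the dice loss as typically used in DETR-style segmentation heads (the updated codebase may differ in details such as normalization):

import torch

def dice_loss(inputs: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # inputs: raw mask logits (N, H*W); targets: binary masks (N, H*W)
    inputs = inputs.sigmoid()
    numerator = 2 * (inputs * targets).sum(-1)
    denominator = inputs.sum(-1) + targets.sum(-1)
    return (1 - (numerator + 1) / (denominator + 1)).mean()

pred = torch.randn(3, 64)                 # three predicted masks, flattened
gt = (torch.rand(3, 64) > 0.5).float()
print(dice_loss(pred, gt).item())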

ModuleNotFoundError: No module named 'pycocotools.ytvos'

I have done this:

git clone https://github.com/youtubevos/cocoapi.git

cd cocoapi/PythonAPI

python setup.py build_ext --inplace

python setup.py build_ext install

but I still get No module named 'pycocotools.ytvos'. I then tried the pip commands from the README:

pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
pip install git+https://github.com/youtubevos/cocoapi.git#"egg=pycocotools&subdirectory=PythonAPI"

Looking in indexes: https://repo.huaweicloud.com/repository/pypi/simple, https://pypi.ngc.nvidia.com
Collecting git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI
Cloning https://github.com/cocodataset/cocoapi.git to /tmp/pip-req-build-x2a4p4jp
Running command git clone -q https://github.com/cocodataset/cocoapi.git /tmp/pip-req-build-x2a4p4jp
fatal: unable to access 'https://github.com/cocodataset/cocoapi.git/': gnutls_handshake() failed: The TLS connection was non-properly terminated.
WARNING: Discarding git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI. Command errored out with exit status 128: git clone -q https://github.com/cocodataset/cocoapi.git /tmp/pip-req-build-x2a4p4jp Check the logs for full command output.
ERROR: Command errored out with exit status 128: git clone -q https://github.com/cocodataset/cocoapi.git /tmp/pip-req-build-x2a4p4jp Check the logs for full command output.
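
The TLS failure above is a network-side problem, but once the clone succeeds, installation order also matters: both packages install under the pycocotools name, and only the youtubevos fork ships the ytvos module, so the fork should be installed last. A hedged sanity check:

import pycocotools
print("pycocotools location:", pycocotools.__file__)

try:
    from pycocotools.ytvos import YTVOS  # fork-only module
    print("ytvos module found")
except ModuleNotFoundError:
    print("ytvos missing: reinstall the youtubevos fork of cocoapi last")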

Invalid file type (application/json)

Hello, when trying to submit the results.json file to CodaLab I get the error message "Invalid file type (application/json)". Has anyone else run into the same problem? Is there another way to get the AP?
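
A hedged workaround, based on how CodaLab competitions typically accept submissions: upload a .zip archive containing results.json rather than the raw JSON file.

import zipfile

with zipfile.ZipFile("results.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("results.json")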

About GPU memory

Thank you for your inspiring work.
As you mentioned in the README, at least 32 GB of GPU memory is required, which is impractical for many researchers.
With that in mind, I have a question:
is it possible to train VisTR on 3 TITAN X GPUs with 12 GB of memory each?

Why is the input size d × (T · H · W)?

Some questions:

  1. In Section 3.1, you use d × (T · H · W) as the input size. Why not T × (d · H · W)?
  2. The CNN feature is reshaped to B × C × THW, yet in the code the input token size for the transformer encoder is HW × B × C. This seems confusing (see the reshape sketch after this list).
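
For reference, a minimal sketch (sizes hypothetical, not the repository code) of how a clip feature map is flattened into sequence-first transformer tokens, which is where the d × (T · H · W) view comes from:

import torch

B, C, T, H, W = 1, 384, 2, 10, 18   # hypothetical sizes
feat = torch.randn(B, C, T, H, W)   # CNN clip features
tokens = feat.flatten(2)            # B x C x (T*H*W)
tokens = tokens.permute(2, 0, 1)    # (T*H*W) x B x C -- PyTorch transformers expect sequence first
print(tokens.shape)                 # torch.Size([360, 1, 384])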

Can this method be used for RGB → Seg semantic segmentation prediction?

Thank you very much for open-sourcing this! I am a meteorology student. I have read the paper at a high level; it mentions a sequence of video frames t-n, ..., t-1, t, t+1, ..., t+n, where frame t is annotated, so information both before and after time t can be used for segmentation.

Therefore, I would like to ask: can this model segment the frame at time t using only the frames before t? For example, given radar echoes at time t with annotated convection types, could one train a supervised model and then use a sequence of radar echoes to predict the convection-type segmentation of a radar image at some future time?

Error in inference after re-training the model

After re-training the model, the following error occurs in inference.py.

Training params: --backbone resnet101, --pretrained_weights 384_coco_r101.pth

Error:

Traceback (most recent call last):
File "inference.py", line 258, in <module>
main(args)
File "inference.py", line 190, in main
model.load_state_dict(state_dict)
File "/home/omkarthawakar/anaconda3/envs/vistr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1045, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for VisTR:
Missing key(s) in state_dict: "transformer.encoder.layers.3.self_attn.in_proj_weight", "transformer.encoder.layers.3.self_attn.in_proj_bias", ... [remaining keys elided: every parameter of transformer encoder layers 3-5 and decoder layers 3-5] ... "transformer.decoder.layers.5.norm3.weight", "transformer.decoder.layers.5.norm3.bias".

I used the default DeformConv2d from torchvision in segmentation.py at training time:

from torchvision.ops import DeformConv2d as DeformConv
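
A hedged diagnostic sketch (the checkpoint path is a placeholder): the missing keys cover encoder/decoder layers 3-5, which suggests the re-trained checkpoint was saved from a model with fewer transformer layers than inference.py builds. Listing the layer indices present in the checkpoint makes the mismatch explicit.

import torch

ckpt = torch.load("path/to/retrained_checkpoint.pth", map_location="cpu")  # placeholder path
state_dict = ckpt.get("model", ckpt)

def layer_ids(prefix):
    return sorted({int(k.split(".")[3]) for k in state_dict if k.startswith(prefix)})

print("encoder layers:", layer_ids("transformer.encoder.layers"))
print("decoder layers:", layer_ids("transformer.decoder.layers"))
# If these stop at 2, rebuild the model with a matching depth (the
# --enc_layers/--dec_layers arguments inherited from DETR are an assumption
# here), or load non-strictly: model.load_state_dict(state_dict, strict=False)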

bbox and mask

I tried to visualize the predictions and found that the mask does not lie entirely inside the bounding box. Should it be like that, or is my visualization wrong?

Long training time

Using 4 V100 (32 GB) GPUs to train the model, one epoch takes about a day. How can I improve the speed from the algorithm side?

The youtubevis valid dataset

I wanted to test the results of your model, but I found that there is no segmentation information in the labels of the YouTube-VIS valid dataset.

For training time

Thank you for your work.
I tried to re-train the code, but it takes a week for 18 epochs with the ResNet-50 backbone on 8 V100 GPUs. Does it need to be trained this long? As far as I know, training DETR for 300 epochs takes only several days.

about instances_train_sub.json of dataset

Thanks for this interesting work. I have a question about the dataset: the labels I downloaded include "train.json" but not instances_train_sub.json. Where can I download the sub json?

Num_queries hard coded

Hi,

Thank you for sharing your work! I'm trying to replicate results on a 12 GB GPU by reducing the num_frames and num_queries parameters. However, I came across the following error:

outputs_seg_masks = outputs_seg_masks.reshape(1,360,outputs_seg_masks.size(-2),outputs_seg_masks.size(-1))
RuntimeError: shape '[1, 360, 75, 76]' is invalid for input of size 1710000

I pinpointed the issue to Line 126, where I think 360 should be replaced with self.vistr.num_queries. Could you correct this in your release? (A sketch of the fix follows at the end of this issue.)

Also, can you explain what 24 denotes in Line 115?

Thanks!
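
A hedged sketch of the suggested fix, assuming the hard-coded 360 equals the default 36 frames times 10 instance queries per frame (note that 1710000 / (75 · 76) = 300, i.e. the reduced run produced 300 queries):

import torch

num_frames, ins_per_frame = 30, 10        # hypothetical reduced settings
num_queries = num_frames * ins_per_frame  # 300 -- matches 1710000 / (75*76) in the error above

h, w = 75, 76
outputs_seg_masks = torch.randn(num_queries, 1, h, w)
outputs_seg_masks = outputs_seg_masks.reshape(1, num_queries, h, w)  # was: reshape(1, 360, ...)
print(outputs_seg_masks.shape)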
