yangsenius / transpose

PyTorch Implementation for "TransPose: Keypoint localization via Transformer", ICCV 2021.

Home Page: https://github.com/yangsenius/TransPose/releases/download/paper/transpose.pdf

License: MIT License

Python 9.32% Makefile 0.01% Cuda 15.02% C++ 0.01% Jupyter Notebook 75.47% Cython 0.19%
keypoint localization pose-estimation transformer

transpose's People

Contributors

satpalsr, senyang-ml, yangsenius


transpose's Issues

What parts of the CNN model did you retain?

Thank you for your excellent work. I have a question after reading the paper and the code:
Your paper says: "Backbone. Many common CNNs can be taken as the backbone. For better comparisons, we choose two typical CNN architectures: ResNet [25] and HRNet [51]. We only retain the initial several parts of the original ImageNet pretrained CNNs to extract feature from images. We name them ResNet-S and HRNet-S, the parameters numbers of which are only about 5.5% and 25% of the original CNNs."
But after reading the code, I still couldn't tell which parts of the CNN model you have retained. Can you explain in detail?
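
(For reference: judging from the forward pass quoted in a later issue on this page, TransPose-R keeps only the early ResNet stages plus a 1x1 reduction conv. Below is a minimal sketch of such a stem using a torchvision ResNet-50; it is an illustration under that assumption, not the repo's exact code.)

import torch
import torchvision

# Sketch (assumption, not the repo's code): keep only the stem and the first two
# stages of an ImageNet-pretrained ResNet, then reduce the channels to the
# Transformer dimension d, as the forward pass quoted later on this page suggests.
resnet = torchvision.models.resnet50()  # load ImageNet weights in practice
d = 256                                 # Transformer dimension of TransPose-R

stem = torch.nn.Sequential(
    resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool,  # stride 4
    resnet.layer1,                                          # stride 4
    resnet.layer2,                                          # stride 8
)
reduce = torch.nn.Conv2d(512, d, kernel_size=1)  # ResNet-50 layer2 outputs 512 channels

feat = reduce(stem(torch.randn(1, 3, 256, 192)))
print(feat.shape)  # torch.Size([1, 256, 32, 24]), i.e. 1/8 of the input resolution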

Code for visualization

Thanks for the great work.
Could you please share the code you used to visualize the attention maps, or did I miss it somewhere?
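
(In case it helps others: a rough, hypothetical sketch of how such an attention map could be plotted, assuming you have already captured the encoder's attention weights, e.g. with a forward hook. The tensor name attn and the shapes below are assumptions, not the repo's API.)

import torch
import matplotlib.pyplot as plt

# Hypothetical: attn holds the attention weights for one image with shape
# (H*W, H*W), where H x W is the feature-map grid; random values stand in here.
H, W = 64, 48
attn = torch.rand(H * W, H * W)

# Pick a query location (e.g. a predicted keypoint position on the grid).
y, x = 20, 24
heat = attn[y * W + x].reshape(H, W)                 # that query's attention over all locations
heat = (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)

plt.imshow(heat.numpy(), cmap="jet")
plt.title(f"Attention of grid location ({y}, {x})")
plt.colorbar()
plt.show()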

Question about the cosine annealing learning rate decay

Hi, thanks for sharing your study.
While I was reading your paper, I wondered why you were using the cosine annealing scheduler.

  1. Is there any special reason why you chose this scheduler?
  2. Did you train all other compared models with this scheduler in COCO and MPII experiments?

I'm just asking because this scheduler is unfamiliar to me in the human pose estimation domain.
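
(For context, cosine annealing is a one-liner in PyTorch; a minimal sketch with illustrative values, not necessarily the paper's exact schedule.)

import torch

model = torch.nn.Conv2d(3, 17, kernel_size=1)              # stand-in for the pose model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Cosine annealing: the lr follows half a cosine period from the initial value
# down to eta_min over T_max epochs (T_max and eta_min here are illustrative).
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200, eta_min=1e-5)

for epoch in range(200):
    # train_one_epoch(model, optimizer, loader)  # actual training step goes here
    scheduler.step()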

Request: TransPose-H-A6

Great work. Thanks for making your work publicly available and accessible to all. Would it be possible to make the TransPose-H-A6 model accessible via torch hub, just for easy testing purposes?

How to prepare target heatmaps?

I want to use the pre-trained model and fine-tune it for another application, but I can't find a reference for the heatmap preparation code. Is it similar to taking an image and converting the (x, y) coordinates to heatmaps, as in CNN-based pose estimation? The model output is (48, 64) but my input images are 256x192. A reference would be helpful. Thanks.
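
(Not an official answer, but the Gaussian-target code quoted later on this page suggests the usual top-down recipe: scale (x, y) from the input resolution to the heatmap resolution, then draw an unnormalized Gaussian. A minimal sketch, assuming a 256x192 input and a 64x48 heatmap.)

import numpy as np

def make_gaussian_heatmap(x, y, input_hw=(256, 192), heatmap_hw=(64, 48), sigma=2):
    """Sketch of a single-keypoint target heatmap (assumed to mirror the
    generate_target code quoted later on this page)."""
    in_h, in_w = input_hw
    hm_h, hm_w = heatmap_hw
    mu_x = x * hm_w / in_w          # 192 -> 48 is a factor of 4
    mu_y = y * hm_h / in_h          # 256 -> 64 is a factor of 4
    xs = np.arange(hm_w, dtype=np.float32)
    ys = np.arange(hm_h, dtype=np.float32)[:, None]
    return np.exp(-((xs - mu_x) ** 2 + (ys - mu_y) ** 2) / (2 * sigma ** 2))

hm = make_gaussian_heatmap(100.0, 120.0)   # (x, y) measured in the 256x192 input image
print(hm.shape, hm.max())                  # (64, 48), peak value 1.0 at the keypoint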

undefined symbol: cudaSetupArgument

Hi,
after running make, I still get this error when I run demo.ipynb:
ModuleNotFoundError: No module named 'nms.cpu_nms'
How can I deal with it?

Target heatmaps

When I prepare target heatmaps for the coordinates, mapping an image from 256 * 256 to 64 * 48, the minimum heatmap values are extremely small, e.g. in the 1.3e-35 range (a problem), while the maximum is around 0.99 (fine). Did you encounter this phenomenon while preparing heatmaps?

Is there a way to avoid this? Thanks

About n_head

“For ResNet-S based, d=256, then n_heads = 8 = 256 // 32. For HRNet-S based, d=96, then n_heads = 1 = 96 // 96.”
The divisors for ResNet and HRNet are 32 and 96 respectively. What is the meaning of these values (32 and 96)? Is 96 used for both HRNet-S-W32 and W48?

Transfer to other Datasets

Hi,

Can TransPose be transferred to other datasets, e.g. MPII and PoseTrack? If yes, how should I perform the fine-tuning?

Many thanks in advance!

UserWarning from transpose_r.py:333 when loading the pretrained models from torch hub

The below message occurs while loading the pretrained models from torch

Using cache found in /root/.cache/torch/hub/yangsenius_TransPose_main
/root/.cache/torch/hub/yangsenius_TransPose_main/lib/models/transpose_r.py:333: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
dim_t = temperature ** (2 * (dim_t // 2) / one_direction_feats)

Load pretrained weights from url: https://github.com/yangsenius/TransPose/releases/download/Hub/tp_r_256x192_enc4_d256_h1024_mh8.pth
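
(If the warning bothers you, the replacement suggested by the message itself would look roughly like this for the expression at transpose_r.py:333; the value of one_direction_feats below is illustrative.)

import torch

temperature = 10000
one_direction_feats = 128                                   # illustrative value
dim_t = torch.arange(one_direction_feats, dtype=torch.float32)

# Before (triggers the warning): dim_t = temperature ** (2 * (dim_t // 2) / one_direction_feats)
dim_t = temperature ** (2 * torch.div(dim_t, 2, rounding_mode='floor') / one_direction_feats)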

Unfair comparison with SimpleBaseline and SimpleBaseline-darkpose

Nice pioneering work!
But it is not fair to compare TransPose-R with the original SimpleBaseline-Res, as the original SimpleBaseline uses a weaker augmentation strategy and a shorter training schedule. This can lead to a gap as large as 1.5 AP.
I look forward to a fair comparison between them for reference.

no modules named nms.cpu & nms.gpu

when running python tools/test.py --cfg experiments/coco/transpose_r/TP_R_256x192_d256_h1024_enc4_mh8.yaml TEST.USE_GT_BBOX True

File "../lib/nms/nms.py", line 13, in
from .cpu_nms import cpu_nms
ModuleNotFoundError: No module named 'nms.cpu_nms'

Only nms.cpu.c and nms.gpu.c exist in the nms folder, not nms.cpu and nms.gpu.

Performance at higher resolutions, like 384*288?

In your paper, only results for 256*192 images are presented, and TransPose-H-A6 performs better than HRNet-W32 + DarkPose on the COCO validation set according to your paper (75.8 AP vs 75.6 AP).

Hence, I am curious: does it still perform better than HRNet if the resolution is increased (e.g. 384*288)?

It would be appreciated if you could share experimental results at different resolutions.

TokenPose

Would you release the code of TokenPose?

position embedding

Hi, first of all, thank you for making this work open-source.
I notice that the position embeddings are added to the sequence at each Transformer layer.
But in BERT or ViT, this PE addition is done only once, before the sequence is sent to the Transformer encoder.
I wonder what the motivation for this design is.
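
(To make the contrast concrete, a schematic sketch of the per-layer convention described above, with the BERT/ViT alternative noted in a comment; this is illustrative code, not the repo's encoder.)

import torch
import torch.nn as nn

class EncoderLayerWithPE(nn.Module):
    """Schematic: the positional embedding is re-added to the queries/keys at
    every layer, instead of once before layer 0 as in BERT/ViT."""
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x, pos):
        q = k = x + pos                              # PE injected at this layer
        out, _ = self.attn(q, k, value=x)
        return self.norm(x + out)

seq_len, batch, d = 192, 1, 256                      # illustrative; TransPose uses H*W tokens
x = torch.randn(seq_len, batch, d)
pos = torch.randn(seq_len, batch, d)
for layer in [EncoderLayerWithPE() for _ in range(4)]:
    x = layer(x, pos)                                # BERT/ViT would instead do x = x + pos once, before the stack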

Higher accuracy when running test than in README

Hi, is the README accuracy outdated?
I ran the test model (TP_R_256x192_d256_h1024_enc4_mh8) as-is with the latest PyTorch:

Test: [0/794] Time 1.056 (1.056) Loss 0.0001 (0.0001) Accuracy 0.977 (0.977)
Test: [100/794] Time 0.077 (0.092) Loss 0.0003 (0.0003) Accuracy 0.898 (0.866)
Test: [200/794] Time 0.082 (0.086) Loss 0.0004 (0.0003) Accuracy 0.841 (0.869)
Test: [300/794] Time 0.078 (0.084) Loss 0.0002 (0.0003) Accuracy 0.965 (0.866)
Test: [400/794] Time 0.078 (0.083) Loss 0.0002 (0.0003) Accuracy 0.866 (0.866)
Test: [500/794] Time 0.082 (0.082) Loss 0.0002 (0.0003) Accuracy 0.947 (0.868)
Test: [600/794] Time 0.080 (0.082) Loss 0.0006 (0.0003) Accuracy 0.841 (0.867)
Test: [700/794] Time 0.078 (0.081) Loss 0.0004 (0.0003) Accuracy 0.918 (0.867)
=> writing results json to output/coco/transpose_r/TP_R_256x192_d256_h1024_enc4_mh8/results/keypoints_val2017_results_0.json
Loading and preparing results...
DONE (t=0.33s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type keypoints
DONE (t=2.71s).
Accumulating evaluation results...
DONE (t=0.12s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.751
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.926
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.826
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.719
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.796
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.778
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.932
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.841
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.743
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.830

Arch AP Ap .5 AP .75 AP (M) AP (L) AR AR .5 AR .75 AR (M) AR (L)
transpose_r 0.751 0.926 0.826 0.719 0.796 0.778 0.932 0.841 0.743 0.830

From README:

Model Input size FPS* GFLOPs AP Ap .5 AP .75 AP (M) AP (L) AR AR .5 AR .75 AR (M) AR (L)
TransPose-R-A3 256x192 141 8.0 0.717 0.889 0.788 0.680 0.786 0.771 0.930 0.836 0.727 0.835
TransPose-R-A4 256x192 138 8.9 0.726 0.891 0.799 0.688 0.798 0.780 0.931 0.845 0.735 0.844

Unable to load the model 'tp_r_256x192_enc4_d256_h1024_mh8.pth'

There is an error:

=> loading model from models/pytorch/transpose_coco/tp_r_256x192_enc4_d256_h1024_mh8.pth
Traceback (most recent call last):
File "/data3/wutianyi/anaconda3/envs/ipertest/lib/python3.7/tarfile.py", line 187, in nti
n = int(s.strip() or "0", 8)
ValueError: invalid literal for int() with base 8: 'v2\nq\x03((X'

It seems that there is something wrong with the pth file.

TransPoseV2?

What's the difference between TransPose and TransPoseV2, which can be found in the repo release?

Pose Score

Hi, can you tell me how to calculate the pose score using this model?

Fine-tuning model doesn't learn while training

I am using the parameters below, taken from the main repo, but the fine-tuned model doesn't seem to learn from the new data.

criterion = torch.nn.MSELoss(reduction="mean")
pretrain_part = [param for name, param in model.named_parameters() if 'final_layer' not in name]
optimizer = torch.optim.Adam([{'params': pretrain_part, 'lr': 1e-5},
                              {'params': model.final_layer.parameters(), 'lr': 1e-4}])

model loading

#Load pre-trained model and fine-tune
model = torch.hub.load('yangsenius/TransPose:main', 
                       'tpr_a4_256x192',
                       pretrained=True)
for param in model.parameters():
    param.requires_grad = False

model.final_layer = torch.nn.Sequential(torch.nn.Conv2d(256, 18, 1))
model = model.to(device)

#Get model summary
summary(model,
          input_size=(3, 256, 192),
          batch_size=1
          )

Loss:
Epoch:0, time22.861s, loss0.49620020389556885
Epoch:1, time23.730s, loss0.28467278741300106
Epoch:2, time23.186s, loss0.19562865188345313
Epoch:3, time23.089s, loss0.15849934425204992
Epoch:4, time23.006s, loss0.1431811167858541
Epoch:5, time22.985s, loss0.1366584706120193
Epoch:6, time23.165s, loss0.13364548794925213
Epoch:7, time23.167s, loss0.13212952483445406
Epoch:8, time22.938s, loss0.13107420597225428
Epoch:9, time22.993s, loss0.13025714457035065
Epoch:10, time22.988s, loss0.12950784899294376
Epoch:11, time22.960s, loss0.12873529829084873
Epoch:12, time22.971s, loss0.12812337931245565
Epoch:13, time23.206s, loss0.12742456002160907
Epoch:14, time23.180s, loss0.12706445762887597

How to test on our personal images?

Hi, thanks for your amazing work! I want to detect the keypoints in my own images, but I don't know how to do it. Can you give me some guidance on how to test on my own images?

Thanks again for your work!
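
(A rough single-image sketch using the torch.hub entry mentioned elsewhere on this page. Assumptions: ImageNet normalization, a photo already cropped around one person, and naive argmax decoding without the repo's post-processing; 'person.jpg' is a placeholder path.)

import torch
import torchvision.transforms as T
from PIL import Image

model = torch.hub.load('yangsenius/TransPose:main', 'tpr_a4_256x192', pretrained=True)
model.eval()

preprocess = T.Compose([
    T.Resize((256, 192)),                                    # model input size (H, W)
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),  # assumed ImageNet stats
])

img = Image.open('person.jpg').convert('RGB')                # ideally a crop around a single person
x = preprocess(img).unsqueeze(0)                             # [1, 3, 256, 192]

with torch.no_grad():
    heatmaps = model(x)                                      # [1, 17, 64, 48] COCO keypoint heatmaps

W = heatmaps.shape[-1]
scores, idx = heatmaps[0].flatten(1).max(dim=1)              # peak of each keypoint heatmap
ys = torch.div(idx, W, rounding_mode='floor')
xs = idx % W
keypoints = torch.stack([xs, ys], dim=1) * 4                 # heatmap (64x48) -> input (256x192) coords
print(keypoints, scores)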

Use different backbones

Hello,

I want to replace ResNet in TransPose-R with different backbones. How should I do it correctly? Should I import the backbone and replace the following code with the forward pass of the new backbone?

x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.maxpool(x)
x = self.layer1(x)
x = self.layer2(x)
x = self.reduce(x)

Thanks in advance!
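
(One possible way to think about it, sketched with a hypothetical MobileNetV2 stem: the replacement backbone just has to produce a feature map that can be projected to d = 256 channels at the stride the positional embedding and deconv head expect, roughly 1/8 of the input for TransPose-R. Illustration only, not the repo's code.)

import torch
import torchvision

d = 256
# Hypothetical cut point: the first 7 blocks of MobileNetV2 give 32 channels at stride 8.
backbone = torchvision.models.mobilenet_v2().features[:7]
reduce = torch.nn.Conv2d(32, d, kernel_size=1)               # plays the role of self.reduce

def forward_features(x):
    x = backbone(x)        # replaces conv1 ... layer2 in the snippet above
    return reduce(x)       # [N, 256, H/8, W/8], what the Transformer encoder consumes

print(forward_features(torch.randn(1, 3, 256, 192)).shape)   # torch.Size([1, 256, 32, 24])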

COCO test-dev

Training on the COCO train2017 dataset: python tools/train.py --cfg experiments/coco/transpose_r/TP_R_256x192_d256_h1024_enc4_mh8.yaml
Testing on the COCO val2017 dataset: python tools/test.py --cfg experiments/coco/transpose_r/TP_R_256x192_d256_h1024_enc4_mh8.yaml TEST.USE_GT_BBOX True
How to get the result of COCO test-dev?

Pretrained model loss stuck around a point? How many epochs to train the model?

How many epochs was the TransPose-H-A4 pre-trained model fine-tuned on the MPII dataset to reach the benchmarks in the paper?
I am using the following parameters, similar to the paper, but on a dataset with 10K images here:
model_tp = torch.hub.load('yangsenius/TransPose:main',
                          'tph_a4_256x192',
                          pretrained=True)
model_tp.final_layer = torch.nn.Sequential(torch.nn.Conv2d(96, 18, kernel_size=1))

# Load parameters
model = model_tp.to(device)
pretrain_part = [param for name, param in model.named_parameters() if 'final_layer' not in name]
optimizer = torch.optim.Adam([{'params': pretrain_part, 'lr': 1e-5},
                              {'params': model.final_layer.parameters(), 'lr': 1e-4}])
criterion = torch.nn.MSELoss(reduction="mean")

Any suggestion to improve this situation would be helpful. Thanks.
I am trying to fine-tune it, but the loss doesn't decrease:
Training model
Epoch:0, loss2.804723664186895, time taken:539.878s
Epoch:1, loss2.263692114269361, time taken:542.564s
Epoch:2, loss1.8802592728752643, time taken:542.661s
Epoch:3, loss1.5531523590907454, time taken:543.041s
Epoch:4, loss1.3379272652091458, time taken:543.445s
Epoch:5, loss1.1180460024625063, time taken:538.449s
Epoch:6, loss0.9673018065514043, time taken:534.550s
Epoch:7, loss0.8572808737517335, time taken:538.618s
Epoch:8, loss0.7790990431094542, time taken:535.940s
Epoch:9, loss0.7243237162474543, time taken:536.291s
Epoch:10, loss0.6794152171351016, time taken:535.745s
Epoch:11, loss0.6420647234190255, time taken:532.800s
Epoch:12, loss0.6094503253116272, time taken:531.308s
Epoch:13, loss0.5824214839958586, time taken:530.418s
Epoch:14, loss0.5580684408778325, time taken:530.618s
Epoch:15, loss0.538073766452726, time taken:531.255s
Epoch:16, loss0.5198041790281422, time taken:531.875s
Epoch:17, loss0.5046796562382951, time taken:529.682s
Epoch:18, loss0.49001771898474544, time taken:529.585s
Epoch:19, loss0.4768067048571538, time taken:530.031s
Epoch:20, loss0.46674167667515576, time taken:534.574s
Epoch:21, loss0.45518148655537516, time taken:532.242s
Epoch:22, loss0.4449854488193523, time taken:532.336s
Epoch:23, loss0.4369037283177022, time taken:533.899s
Epoch:24, loss0.4278696861874778, time taken:532.454s
Epoch:25, loss0.4207416394201573, time taken:538.248s
Epoch:26, loss0.41212902366532944, time taken:541.508s
Epoch:27, loss0.4052599307906348, time taken:540.419s
Epoch:28, loss0.3998840279818978, time taken:541.615s
Epoch:29, loss0.3926734702545218, time taken:541.612s
Epoch:30, loss0.3866453653026838, time taken:541.235s
Epoch:31, loss0.38077057831105776, time taken:540.944s
Epoch:32, loss0.37572325009386986, time taken:540.582s
Epoch:33, loss0.3709150122012943, time taken:540.616s
Epoch:34, loss0.36646912069409154, time taken:540.807s
Epoch:35, loss0.3614582328009419, time taken:541.298s
Epoch:36, loss0.35673171386588365, time taken:537.836s
Epoch:37, loss0.3524343741883058, time taken:538.538s
Epoch:38, loss0.34845523245166987, time taken:539.272s

Question about TransformerEncoder input

Hi, first off thank you for this great work!

I'm trying to add the Transformer part of your work to a Mask R-CNN model. Using a Swin backbone, the RPN gives me n bounding box proposals, and for each proposal I have the bbox features of shape [n, 256, 14, 14] extracted by the backbone. Now, for each bbox, I would like to get the keypoints with:

    def forward(self, x):
        # x = self.conv1(x)
        # x = self.bn1(x)
        # x = self.relu(x)
        # x = self.maxpool(x)

        # x = self.layer1(x)
        # x = self.layer2(x)
        # x = self.reduce(x)

        n, c, h, w = x.shape
        x = x.flatten(2).permute(2, 0, 1)
        x = self.global_encoder(x, pos=self.pos_embedding)
        x = x.permute(1, 2, 0).contiguous().view(n, c, h, w)
        x = self.deconv_layers(x)
        x = self.final_layer(x)

        return x

I'm having an error at the line self.global_encoder(x, pos=self.pos_embedding):
The size of tensor a (196) must match the size of tensor b (1024) at non-singleton dimension 0
The x input to self.global_encoder(x, pos=self.pos_embedding) has shape [256, n, 196], which seems wrong? I tried with shape [n, 256, 196] but it doesn't work either. What am I missing?
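
(Not an authoritative fix, but the shape arithmetic of that snippet gives [h*w, n, c] = [196, n, 256] after the permute, and the mismatch with 1024 suggests the positional embedding was built for a different sequence length than your 14x14 RoI grid. A minimal shape sketch with a rebuilt, learnable embedding; the real model may use a sine embedding instead, and nn.TransformerEncoderLayer stands in for the repo's encoder.)

import torch
import torch.nn as nn

n, c, h, w = 8, 256, 14, 14
x = torch.randn(n, c, h, w)

x = x.flatten(2).permute(2, 0, 1)                        # [h*w, n, c] = [196, 8, 256]
pos_embedding = nn.Parameter(torch.randn(h * w, 1, c))   # rebuilt for the 14x14 grid (learnable, for illustration)

encoder_layer = nn.TransformerEncoderLayer(d_model=c, nhead=8)   # stand-in for the repo's encoder
out = encoder_layer(x + pos_embedding)
print(out.shape)                                         # torch.Size([196, 8, 256])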

Get the confidence Score of each keypoint

Hi there. Is it possible to get a confidence score for each predicted keypoint, for both the pre-trained model and a custom-trained model?
What modifications are required to get the probability or confidence score along with each predicted keypoint?
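
(Not the maintainer's answer, but a common convention in top-down pose pipelines is to use the peak value of each predicted heatmap as that keypoint's confidence; a sketch using the torch.hub model, with a random tensor standing in for a preprocessed crop.)

import torch

model = torch.hub.load('yangsenius/TransPose:main', 'tpr_a4_256x192', pretrained=True)
model.eval()

x = torch.randn(1, 3, 256, 192)                  # stand-in for a preprocessed person crop
with torch.no_grad():
    heatmaps = model(x)                          # [1, K, 64, 48]

# Assumed convention (not necessarily the repo's exact scoring): the heatmap
# peak serves as the per-keypoint confidence; averaging them gives a pose score.
confidences, _ = heatmaps.flatten(2).max(dim=2)  # [1, K]
pose_score = confidences.mean()
print(confidences, pose_score)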

Multiple GPUs training

Hello. I have some questions regarding training with multiple GPUs.

GPU setting in config files

In README

We trained our different models on different hardware platforms: 2 x RTX2080Ti GPUs (TP-R-A3, TP-R-A4), 4 x TiTan XP GPUs (TP-H-S, TP-H-A4), and 4 x Tesla P40 GPUs (TP-H-A6).

However, it seems that this note does not match some config files in folder TransPose/experiments/coco/:

  • In TransPose/experiments/coco/transpose_r/TP_R_256x192_d256_h1024_enc4_mh8.yaml, only 1 GPU is used (instead of 2)
    line 7:

    GPUS: (0,)
    
  • In TransPose/experiments/coco/transpose_h/TP_H_w32_256x192_stage3_1_4_d64_h128_relu_enc4_mh1.yaml, 2 GPUs instead of 4

  • In TP_H_w48_256x192_stage3_1_4_d64_h128_relu_enc4_mh1.yaml, TP_H_w48_256x192_stage3_1_4_d96_h192_relu_enc4_mh1.yaml, TP_H_w48_256x192_stage3_1_4_d96_h192_relu_enc5_mh1.yaml, only 1 GPU instead of 4

Maybe the GPU setting is not correct in these config files?

Scaling the batch size and learning rate

As mentioned in #11,

From my experience, the performances of transpose-r models are very sensitive to the initial learning rate. I did not train transpose-r-a4 on 4 or 8 GPUs. I suggest you increase the initial learning rate a little bit at such conditions (with larger batchsize).

Currently I can use 4 RTX2080Ti GPUs for training. Do you have any suggestions for scaling the batch size and learning rate for multi-GPU training?

Many Thanks in advance!

Only 1 GPU is used for training

I noticed that only 1 GPU is used to train TransPose-R-A4, with lr=0.0001.
Should I change the lr if I want to use 4 or 8 GPUs?
Or just keep it the same?
Thanks for your reply.

MPII test

How is the Total metric for MPII obtained? In the code, I only see the Ubody metric.

How to generate coco keypoint format (json file)?

Hi, nice repository.
I'm curious how you generate a json file from the COCO keypoint dataset. Is there conversion code?
You linked to this repository, but there I found the dataset has already been converted to json format.
In that repository I also found coco.py; is this the code used to generate the json file?

Thanks

The meaning of Head?

Nice work. I have a question: what does "head" mean in the pipeline?

"Head" has several meanings. For example, one is the multi-head attention in the Transformer, which captures the relationships between different patches; but my understanding is that here it refers to the output of the deconvolution, such as a 64*48 matrix used to calculate the heatmap loss. Is that right?

Heatmaps generation (Jointsdataset.py)

def generate_target(self, joints, joints_vis):
        '''
        :param joints:  [num_joints, 3]
        :param joints_vis: [num_joints, 3]
        :return: target, target_weight(1: visible, 0: invisible)
        '''
        target_weight = np.ones((self.num_joints, 1), dtype=np.float32)
        target_weight[:, 0] = joints_vis[:, 0]

        assert self.target_type == 'gaussian', \
            'Only support gaussian map now!'

        if self.target_type == 'gaussian':
            target = np.zeros((self.num_joints,
                               self.heatmap_size[1],
                               self.heatmap_size[0]),
                              dtype=np.float32)

            tmp_size = self.sigma * 3

            for joint_id in range(self.num_joints):
                target_weight[joint_id] = \
                    self.adjust_target_weight(joints[joint_id], target_weight[joint_id], tmp_size)
                
                if target_weight[joint_id] == 0:
                    continue

                mu_x = joints[joint_id][0]
                mu_y = joints[joint_id][1]
                
                x = np.arange(0, self.heatmap_size[0], 1, np.float32)
                y = np.arange(0, self.heatmap_size[1], 1, np.float32)
                y = y[:, np.newaxis]

                v = target_weight[joint_id]
                if v > 0.5:
                    target[joint_id] = np.exp(- ((x - mu_x) ** 2 + (y - mu_y) ** 2) / (2 * self.sigma ** 2))

        if self.use_different_joints_weight:
            target_weight = np.multiply(target_weight, self.joints_weight)

        return target, target_weight

Hi, in the above code for dataset preparation, what is the input format for joints and joints_vis?
Is joints = [[x1, y1, v1], ..., [xn, yn, vn]], where v is the visibility (0 = not present, 1 = visible)?
Is joints_vis also [[x1, y1, v1], ..., [xn, yn, vn]]? I plan to apply it to another dataset where I have the format [[x1, y1, v1], ..., [xn, yn, vn]]. Thanks
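
(Hedged guess based on the HRNet-style dataset code this repo appears to follow, not a confirmed answer: joints usually carries only the coordinates and joints_vis carries the visibility repeated, so an annotation in [[x, y, v], ...] form would be split roughly like this.)

import numpy as np

annotation = [[120.0, 80.0, 2], [131.0, 75.0, 1], [0.0, 0.0, 0]]   # [[x, y, v], ...]

num_joints = len(annotation)
joints = np.zeros((num_joints, 3), dtype=np.float32)
joints_vis = np.zeros((num_joints, 3), dtype=np.float32)
for i, (x, y, v) in enumerate(annotation):
    joints[i, 0:2] = (x, y)              # third column stays 0
    vis = 1.0 if v > 0 else 0.0          # assumed: any labeled keypoint (v=1 or 2) counts
    joints_vis[i, 0:2] = (vis, vis)      # generate_target reads joints_vis[:, 0] as target_weight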

Implementation of MPII set

Hello,
I am very happy to see such excellent work. How can I use this project to train and test on MPII? I adjusted some parameters, but the results are not very good. Have you done any work on this dataset?

In addition, what is the basis for determining the number of heads for different models?

Thank you!

Static quantization of TransPose model in PyTorch? NotImplementedError: Could not run ‘aten::add.out’

I am trying to do static quantization of the TransPose-A4 model in PyTorch, but I run into the error below. It comes from the aten::add.out operation. How can I modify the pre-trained model to make it suitable for quantization? Thanks

---> 8 model_static_quantized(x).shape

7 frames

/root/.cache/torch/hub/yangsenius_TransPose_main/lib/models/transpose_h.py in forward(self, x)
99 residual = self.downsample(x)
100
→ 101 out += residual
102 out = self.relu(out)
103

NotImplementedError: Could not run ‘aten::add.out’ with arguments from the ‘QuantizedCPU’ backend. This could be because the operator doesn’t exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit Internal Login for possible resolutions. ‘aten::add.out’ is only available for these backends: [CPU, CUDA, Meta, MkldnnCPU, SparseCPU, SparseCUDA, SparseCsrCPU, SparseCsrCUDA, BackendSelect, Python, Named, Conjugate, Negative, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradLazy, AutogradXPU, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, UNKNOWN_TENSOR_TYPE_ID, Autocast, Batched, VmapMode].
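
(Not something the repo provides, but the usual eager-mode workaround for this error is to replace the in-place tensor add in the residual block with nn.quantized.FloatFunctional, which static quantization can convert. A schematic sketch of a rewritten block, assuming you also wrap the model with QuantStub/DeQuantStub and fuse conv+bn as usual.)

import torch
import torch.nn as nn

class QuantFriendlyBasicBlock(nn.Module):
    """Schematic rewrite of the residual add from transpose_h.py (around line 101):
    'out += residual' becomes a FloatFunctional add that static quantization understands."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.skip_add = nn.quantized.FloatFunctional()

    def forward(self, x):
        residual = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.skip_add.add(out, residual)   # replaces 'out += residual'
        return self.relu(out)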

About the image patch

Thank you for your excellent work. I have a question from reading the code:
In ViT, to handle 2D images, they reshape the image x ∈ R^(H×W×C) into a sequence of flattened 2D patches x_p ∈ R^(N×(P²·C)), where N = HW/P² and (P, P) is the resolution of each image patch.
In this method, the image x ∈ R^(H×W×C) is reshaped into a sequence of flattened features x_p ∈ R^(C×(HW)), and then the embedding is performed. Is the resolution of each image patch effectively (1, 1)?
What are the benefits of this setup?
