yangsenius / transpose

PyTorch Implementation for "TransPose: Keypoint localization via Transformer", ICCV 2021.

Home Page: https://github.com/yangsenius/TransPose/releases/download/paper/transpose.pdf

License: MIT License

Python 9.32% Makefile 0.01% Cuda 15.02% C++ 0.01% Jupyter Notebook 75.47% Cython 0.19%
keypoint localization pose-estimation transformer

transpose's People

Contributors

satpalsr, senyang-ml, yangsenius


transpose's Issues

What parts of the CNN model did you retain?

Thank you for your excellent work. I have a question after reading the paper and the code:
Your paper says: "Backbone. Many common CNNs can be taken as the backbone. For better comparisons, we choose two typical CNN architectures: ResNet [25] and HRNet [51]. We only retain the initial several parts of the original ImageNet pretrained CNNs to extract feature from images. We name them ResNet-S and HRNet-S, the parameters numbers of which are only about 5.5% and 25% of the original CNNs."
But after reading the code, I still couldn't tell which parts of the CNN model you have retained. Can you explain in detail?
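
(For reference: judging from the forward pass quoted in a later issue on this page, TransPose-R keeps only the early ResNet stages plus a 1x1 reduction conv. Below is a minimal sketch of such a stem using a torchvision ResNet-50; it is an illustration under that assumption, not the repo's exact code.)

import torch
import torchvision

# Sketch (assumption, not the repo's code): keep only the stem and the first two
# stages of an ImageNet-pretrained ResNet, then reduce the channels to the
# Transformer dimension d, as the forward pass quoted later on this page suggests.
resnet = torchvision.models.resnet50()  # load ImageNet weights in practice
d = 256                                 # Transformer dimension of TransPose-R

stem = torch.nn.Sequential(
    resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool,  # stride 4
    resnet.layer1,                                          # stride 4
    resnet.layer2,                                          # stride 8
)
reduce = torch.nn.Conv2d(512, d, kernel_size=1)  # ResNet-50 layer2 outputs 512 channels

feat = reduce(stem(torch.randn(1, 3, 256, 192)))
print(feat.shape)  # torch.Size([1, 256, 32, 24]), i.e. 1/8 of the input resolution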

Code for visualization

Thanks for the great work.
Could you please share the code you used to visualize the attention maps, or did I miss it somewhere?
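
(In case it helps others: a rough, hypothetical sketch of how such an attention map could be plotted, assuming you have already captured the encoder's attention weights, e.g. with a forward hook. The tensor name attn and the shapes below are assumptions, not the repo's API.)

import torch
import matplotlib.pyplot as plt

# Hypothetical: attn holds the attention weights for one image with shape
# (H*W, H*W), where H x W is the feature-map grid; random values stand in here.
H, W = 64, 48
attn = torch.rand(H * W, H * W)

# Pick a query location (e.g. a predicted keypoint position on the grid).
y, x = 20, 24
heat = attn[y * W + x].reshape(H, W)                 # that query's attention over all locations
heat = (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)

plt.imshow(heat.numpy(), cmap="jet")
plt.title(f"Attention of grid location ({y}, {x})")
plt.colorbar()
plt.show()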

Question about the cosine annealing learning rate decay

Hi, thanks for sharing your study.
While I was reading your paper, I wondered why you were using the cosine annealing scheduler.

  1. Is there any special reason why you chose this scheduler?
  2. Did you train all other compared models with this scheduler in COCO and MPII experiments?

I'm just asking because this scheduler is unfamiliar to me in the human pose estimation domain.
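
(For context, cosine annealing is a one-liner in PyTorch; a minimal sketch with illustrative values, not necessarily the paper's exact schedule.)

import torch

model = torch.nn.Conv2d(3, 17, kernel_size=1)              # stand-in for the pose model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Cosine annealing: the lr follows half a cosine period from the initial value
# down to eta_min over T_max epochs (T_max and eta_min here are illustrative).
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200, eta_min=1e-5)

for epoch in range(200):
    # train_one_epoch(model, optimizer, loader)  # actual training step goes here
    scheduler.step()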

Request: TransPose-H-A6

Great work. Thanks for making your work publicly available and accessible to all. Would it be possible to make the TransPose-H-A6 model accessible via torch hub, just for easy testing purposes?

How to prepare target heatmaps?

I want to use the pre-trained model and fine-tune it for another application, but I can't find a reference for the heatmap preparation code. Is it similar to taking an image and converting the (x, y) coordinates to heatmaps, as in CNN-based pose estimation? The model output is (48, 64) but my input images are 256x192. A reference would be helpful. Thanks.
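
(Not an official answer, but the Gaussian-target code quoted later on this page suggests the usual top-down recipe: scale (x, y) from the input resolution to the heatmap resolution, then draw an unnormalized Gaussian. A minimal sketch, assuming a 256x192 input and a 64x48 heatmap.)

import numpy as np

def make_gaussian_heatmap(x, y, input_hw=(256, 192), heatmap_hw=(64, 48), sigma=2):
    """Sketch of a single-keypoint target heatmap (assumed to mirror the
    generate_target code quoted later on this page)."""
    in_h, in_w = input_hw
    hm_h, hm_w = heatmap_hw
    mu_x = x * hm_w / in_w          # 192 -> 48 is a factor of 4
    mu_y = y * hm_h / in_h          # 256 -> 64 is a factor of 4
    xs = np.arange(hm_w, dtype=np.float32)
    ys = np.arange(hm_h, dtype=np.float32)[:, None]
    return np.exp(-((xs - mu_x) ** 2 + (ys - mu_y) ** 2) / (2 * sigma ** 2))

hm = make_gaussian_heatmap(100.0, 120.0)   # (x, y) measured in the 256x192 input image
print(hm.shape, hm.max())                  # (64, 48), peak value 1.0 at the keypoint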

undefined symbol: cudaSetupArgument

Hi,
after running make, I still get this error when I run demo.ipynb:
ModuleNotFoundError: No module named 'nms.cpu_nms'
How can I deal with it?

Target heatmaps

When I prepare target heatmaps for the coordinates, mapping an image from 256 * 256 to 64 * 48, the minimum heatmap values are extremely small, e.g. in the 1.3e-35 range (a problem), while the maximum is around 0.99 (fine). Did you encounter this phenomenon while preparing heatmaps?

Is there a way to avoid this? Thanks

About n_head

“For ResNet-S based, d=256, then n_heads = 8 = 256 // 32. For HRNet-S based, d=96, then n_heads = 1 = 96 // 96.”
The divisors for ResNet and HRNet are 32 and 96 respectively. What is the meaning of these values (32 and 96)? Is 96 used for both HRNet-S-W32 and W48?

Transfer to other Datasets

Hi,

Can TransPose be transferred to other datasets, e.g. MPII and PoseTrack? If yes, how should I perform the fine-tuning?

Many thanks in advance!

UserWarning from transpose_r.py:333 when loading the pretrained models from torch hub

The below message occurs while loading the pretrained models from torch

Using cache found in /root/.cache/torch/hub/yangsenius_TransPose_main
/root/.cache/torch/hub/yangsenius_TransPose_main/lib/models/transpose_r.py:333: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
dim_t = temperature ** (2 * (dim_t // 2) / one_direction_feats)

Load pretrained weights from url: https://github.com/yangsenius/TransPose/releases/download/Hub/tp_r_256x192_enc4_d256_h1024_mh8.pth
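
(If the warning bothers you, the replacement suggested by the message itself would look roughly like this for the expression at transpose_r.py:333; the value of one_direction_feats below is illustrative.)

import torch

temperature = 10000
one_direction_feats = 128                                   # illustrative value
dim_t = torch.arange(one_direction_feats, dtype=torch.float32)

# Before (triggers the warning): dim_t = temperature ** (2 * (dim_t // 2) / one_direction_feats)
dim_t = temperature ** (2 * torch.div(dim_t, 2, rounding_mode='floor') / one_direction_feats)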

Unfair comparison with SimpleBaseline and SimpleBaseline-darkpose

Nice pioneering work!
But it is not fair to compare TransPose-R with the original SimpleBaseline-Res, as the original SimpleBaseline uses a weaker augmentation strategy and a shorter training schedule. This can lead to a gap as large as 1.5 AP.
I look forward to a fair comparison between them for reference.

no modules named nms.cpu & nms.gpu

when running python tools/test.py --cfg experiments/coco/transpose_r/TP_R_256x192_d256_h1024_enc4_mh8.yaml TEST.USE_GT_BBOX True

File "../lib/nms/nms.py", line 13, in
from .cpu_nms import cpu_nms
ModuleNotFoundError: No module named 'nms.cpu_nms'

Only nms.cpu.c and nms.gpu.c exist in the nms folder, not nms.cpu and nms.gpu.

Performance at higher resolutions, like 384*288?

In your paper, only results for 256*192 images are presented, and TransPose-H-A6 performs better than HRNet-W32 + DarkPose on the COCO validation set according to your paper (75.8 AP vs 75.6 AP).

Hence, I am curious: does it still perform better than HRNet if the resolution is increased (e.g. 384*288)?

It would be appreciated if you could share experimental results at different resolutions.

TokenPose

Would you release the code of TokenPose?

position embedding

Hi, first of all, thank you for making this work open-source.
I notice that the position embeddings are added to the sequence at each Transformer layer.
But in BERT or ViT, this PE addition is done only once, before the sequence is sent to the Transformer encoder.
I wonder what the motivation for this design is.
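
(To make the contrast concrete, a schematic sketch of the per-layer convention described above, with the BERT/ViT alternative noted in a comment; this is illustrative code, not the repo's encoder.)

import torch
import torch.nn as nn

class EncoderLayerWithPE(nn.Module):
    """Schematic: the positional embedding is re-added to the queries/keys at
    every layer, instead of once before layer 0 as in BERT/ViT."""
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x, pos):
        q = k = x + pos                              # PE injected at this layer
        out, _ = self.attn(q, k, value=x)
        return self.norm(x + out)

seq_len, batch, d = 192, 1, 256                      # illustrative; TransPose uses H*W tokens
x = torch.randn(seq_len, batch, d)
pos = torch.randn(seq_len, batch, d)
for layer in [EncoderLayerWithPE() for _ in range(4)]:
    x = layer(x, pos)                                # BERT/ViT would instead do x = x + pos once, before the stack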

Higher accuracy when running test than in README

Hi, is the README accuracy outdated?
I ran the test model (TP_R_256x192_d256_h1024_enc4_mh8) as-is with the latest PyTorch:

Test: [0/794] Time 1.056 (1.056) Loss 0.0001 (0.0001) Accuracy 0.977 (0.977)
Test: [100/794] Time 0.077 (0.092) Loss 0.0003 (0.0003) Accuracy 0.898 (0.866)
Test: [200/794] Time 0.082 (0.086) Loss 0.0004 (0.0003) Accuracy 0.841 (0.869)
Test: [300/794] Time 0.078 (0.084) Loss 0.0002 (0.0003) Accuracy 0.965 (0.866)
Test: [400/794] Time 0.078 (0.083) Loss 0.0002 (0.0003) Accuracy 0.866 (0.866)
Test: [500/794] Time 0.082 (0.082) Loss 0.0002 (0.0003) Accuracy 0.947 (0.868)
Test: [600/794] Time 0.080 (0.082) Loss 0.0006 (0.0003) Accuracy 0.841 (0.867)
Test: [700/794] Time 0.078 (0.081) Loss 0.0004 (0.0003) Accuracy 0.918 (0.867)
=> writing results json to output/coco/transpose_r/TP_R_256x192_d256_h1024_enc4_mh8/results/keypoints_val2017_results_0.json
Loading and preparing results...
DONE (t=0.33s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type keypoints
DONE (t=2.71s).
Accumulating evaluation results...
DONE (t=0.12s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.751
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.926
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.826
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.719
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.796
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.778
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.932
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.841
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.743
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.830

Arch AP Ap .5 AP .75 AP (M) AP (L) AR AR .5 AR .75 AR (M) AR (L)
transpose_r 0.751 0.926 0.826 0.719 0.796 0.778 0.932 0.841 0.743 0.830

From README:

Model Input size FPS* GFLOPs AP Ap .5 AP .75 AP (M) AP (L) AR AR .5 AR .75 AR (M) AR (L)
TransPose-R-A3 256x192 141 8.0 0.717 0.889 0.788 0.680 0.786 0.771 0.930 0.836 0.727 0.835
TransPose-R-A4 256x192 138 8.9 0.726 0.891 0.799 0.688 0.798 0.780 0.931 0.845 0.735 0.844

Unable to load the model 'tp_r_256x192_enc4_d256_h1024_mh8.pth'

There is an error:

=> loading model from models/pytorch/transpose_coco/tp_r_256x192_enc4_d256_h1024_mh8.pth
Traceback (most recent call last):
File "/data3/wutianyi/anaconda3/envs/ipertest/lib/python3.7/tarfile.py", line 187, in nti
n = int(s.strip() or "0", 8)
ValueError: invalid literal for int() with base 8: 'v2\nq\x03((X'

It seems that there is something wrong with the pth file.

TransPoseV2?

What's the difference between TransPose and TransPoseV2, which can be found in the repo release?

Pose Score

Hi, can you tell me how to calculate the pose score using this model?

Fine-tuning model doesn't learn while training

I am using the parameters below, taken from the main repo, but the fine-tuned model doesn't seem to learn from the new data.

criterion = torch.nn.MSELoss(reduction="mean")
pretrain_part = [param for name, param in model.named_parameters() if 'final_layer' not in name]
optimizer = torch.optim.Adam([{'params': pretrain_part, 'lr': 1e-5},
                              {'params': model.final_layer.parameters(), 'lr': 1e-4}])

model loading

#Load pre-trained model and fine-tune
model = torch.hub.load('yangsenius/TransPose:main', 
                       'tpr_a4_256x192',
                       pretrained=True)
for param in model.parameters():
    param.requires_grad = False

model.final_layer = torch.nn.Sequential(torch.nn.Conv2d(256, 18, 1))
model = model.to(device)

#Get model summary
summary(model,
          input_size=(3, 256, 192),
          batch_size=1
          )

Loss:
Epoch:0, time22.861s, loss0.49620020389556885
Epoch:1, time23.730s, loss0.28467278741300106
Epoch:2, time23.186s, loss0.19562865188345313
Epoch:3, time23.089s, loss0.15849934425204992
Epoch:4, time23.006s, loss0.1431811167858541
Epoch:5, time22.985s, loss0.1366584706120193
Epoch:6, time23.165s, loss0.13364548794925213
Epoch:7, time23.167s, loss0.13212952483445406
Epoch:8, time22.938s, loss0.13107420597225428
Epoch:9, time22.993s, loss0.13025714457035065
Epoch:10, time22.988s, loss0.12950784899294376
Epoch:11, time22.960s, loss0.12873529829084873
Epoch:12, time22.971s, loss0.12812337931245565
Epoch:13, time23.206s, loss0.12742456002160907
Epoch:14, time23.180s, loss0.12706445762887597

How to test on our personal images?

Hi, thanks for your amazing work! I want to detect the keypoints in my own images, but I don't know how to do it. Can you give me some guidance on how to test on my own images?

Thanks again for your work!
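
(A rough single-image sketch using the torch.hub entry mentioned elsewhere on this page. Assumptions: ImageNet normalization, a photo already cropped around one person, and naive argmax decoding without the repo's post-processing; 'person.jpg' is a placeholder path.)

import torch
import torchvision.transforms as T
from PIL import Image

model = torch.hub.load('yangsenius/TransPose:main', 'tpr_a4_256x192', pretrained=True)
model.eval()

preprocess = T.Compose([
    T.Resize((256, 192)),                                    # model input size (H, W)
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),  # assumed ImageNet stats
])

img = Image.open('person.jpg').convert('RGB')                # ideally a crop around a single person
x = preprocess(img).unsqueeze(0)                             # [1, 3, 256, 192]

with torch.no_grad():
    heatmaps = model(x)                                      # [1, 17, 64, 48] COCO keypoint heatmaps

W = heatmaps.shape[-1]
scores, idx = heatmaps[0].flatten(1).max(dim=1)              # peak of each keypoint heatmap
ys = torch.div(idx, W, rounding_mode='floor')
xs = idx % W
keypoints = torch.stack([xs, ys], dim=1) * 4                 # heatmap (64x48) -> input (256x192) coords
print(keypoints, scores)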

Use different backbones

Hello,

I want to replace ResNet in TransPose-R with different backbones. How should I do it correctly? Should I import the backbone and replace the following code with the forward pass of the new backbone?

x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.maxpool(x)
x = self.layer1(x)
x = self.layer2(x)
x = self.reduce(x)

Thanks in advance!
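
(One possible way to think about it, sketched with a hypothetical MobileNetV2 stem: the replacement backbone just has to produce a feature map that can be projected to d = 256 channels at the stride the positional embedding and deconv head expect, roughly 1/8 of the input for TransPose-R. Illustration only, not the repo's code.)

import torch
import torchvision

d = 256
# Hypothetical cut point: the first 7 blocks of MobileNetV2 give 32 channels at stride 8.
backbone = torchvision.models.mobilenet_v2().features[:7]
reduce = torch.nn.Conv2d(32, d, kernel_size=1)               # plays the role of self.reduce

def forward_features(x):
    x = backbone(x)        # replaces conv1 ... layer2 in the snippet above
    return reduce(x)       # [N, 256, H/8, W/8], what the Transformer encoder consumes

print(forward_features(torch.randn(1, 3, 256, 192)).shape)   # torch.Size([1, 256, 32, 24])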

COCO test-dev

Training on the COCO train2017 dataset: python tools/train.py --cfg experiments/coco/transpose_r/TP_R_256x192_d256_h1024_enc4_mh8.yaml
Testing on the COCO val2017 dataset: python tools/test.py --cfg experiments/coco/transpose_r/TP_R_256x192_d256_h1024_enc4_mh8.yaml TEST.USE_GT_BBOX True
How to get the result of COCO test-dev?

Pretrained model loss stuck around a point? How many epochs to train the model?

How many epochs was the TransPose-H-A4 pre-trained model fine-tuned on the MPII dataset to reach the benchmarks in the paper?
I am using the following parameters, similar to the paper, but on a dataset with 10K images here:
model_tp = torch.hub.load('yangsenius/TransPose:main',
                          'tph_a4_256x192',
                          pretrained=True)
model_tp.final_layer = torch.nn.Sequential(torch.nn.Conv2d(96, 18, kernel_size=1))

# Load parameters
model = model_tp.to(device)
pretrain_part = [param for name, param in model.named_parameters() if 'final_layer' not in name]
optimizer = torch.optim.Adam([{'params': pretrain_part, 'lr': 1e-5},
                              {'params': model.final_layer.parameters(), 'lr': 1e-4}])
criterion = torch.nn.MSELoss(reduction="mean")

Any suggestion to improve this situation would be helpful. Thanks.
I am trying to fine-tune it, but the loss doesn't decrease:
Training model
Epoch:0, loss2.804723664186895, time taken:539.878s
Epoch:1, loss2.263692114269361, time taken:542.564s
Epoch:2, loss1.8802592728752643, time taken:542.661s
Epoch:3, loss1.5531523590907454, time taken:543.041s
Epoch:4, loss1.3379272652091458, time taken:543.445s
Epoch:5, loss1.1180460024625063, time taken:538.449s
Epoch:6, loss0.9673018065514043, time taken:534.550s
Epoch:7, loss0.8572808737517335, time taken:538.618s
Epoch:8, loss0.7790990431094542, time taken:535.940s
Epoch:9, loss0.7243237162474543, time taken:536.291s
Epoch:10, loss0.6794152171351016, time taken:535.745s
Epoch:11, loss0.6420647234190255, time taken:532.800s
Epoch:12, loss0.6094503253116272, time taken:531.308s
Epoch:13, loss0.5824214839958586, time taken:530.418s
Epoch:14, loss0.5580684408778325, time taken:530.618s
Epoch:15, loss0.538073766452726, time taken:531.255s
Epoch:16, loss0.5198041790281422, time taken:531.875s
Epoch:17, loss0.5046796562382951, time taken:529.682s
Epoch:18, loss0.49001771898474544, time taken:529.585s
Epoch:19, loss0.4768067048571538, time taken:530.031s
Epoch:20, loss0.46674167667515576, time taken:534.574s
Epoch:21, loss0.45518148655537516, time taken:532.242s
Epoch:22, loss0.4449854488193523, time taken:532.336s
Epoch:23, loss0.4369037283177022, time taken:533.899s
Epoch:24, loss0.4278696861874778, time taken:532.454s
Epoch:25, loss0.4207416394201573, time taken:538.248s
Epoch:26, loss0.41212902366532944, time taken:541.508s
Epoch:27, loss0.4052599307906348, time taken:540.419s
Epoch:28, loss0.3998840279818978, time taken:541.615s
Epoch:29, loss0.3926734702545218, time taken:541.612s
Epoch:30, loss0.3866453653026838, time taken:541.235s
Epoch:31, loss0.38077057831105776, time taken:540.944s
Epoch:32, loss0.37572325009386986, time taken:540.582s
Epoch:33, loss0.3709150122012943, time taken:540.616s
Epoch:34, loss0.36646912069409154, time taken:540.807s
Epoch:35, loss0.3614582328009419, time taken:541.298s
Epoch:36, loss0.35673171386588365, time taken:537.836s
Epoch:37, loss0.3524343741883058, time taken:538.538s
Epoch:38, loss0.34845523245166987, time taken:539.272s

Question about TransformerEncoder input

Hi, first off thank you for this great work!

I'm trying to add the Transformer part of your work to a Mask R-CNN model. Using a Swin backbone, the RPN gives me n bounding box proposals, and for each proposal I have the bbox features of shape [n, 256, 14, 14] extracted by the backbone. Now, for each bbox, I would like to get the keypoints with:

    def forward(self, x):
        # x = self.conv1(x)
        # x = self.bn1(x)
        # x = self.relu(x)
        # x = self.maxpool(x)

        # x = self.layer1(x)
        # x = self.layer2(x)
        # x = self.reduce(x)

        n, c, h, w = x.shape
        x = x.flatten(2).permute(2, 0, 1)
        x = self.global_encoder(x, pos=self.pos_embedding)
        x = x.permute(1, 2, 0).contiguous().view(n, c, h, w)
        x = self.deconv_layers(x)
        x = self.final_layer(x)

        return x

I'm having an error at the line self.global_encoder(x, pos=self.pos_embedding):
The size of tensor a (196) must match the size of tensor b (1024) at non-singleton dimension 0
The x input to self.global_encoder(x, pos=self.pos_embedding) has shape [256, n, 196], which seems wrong? I tried with shape [n, 256, 196] but it doesn't work either. What am I missing?
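
(Not an authoritative fix, but the shape arithmetic of that snippet gives [h*w, n, c] = [196, n, 256] after the permute, and the mismatch with 1024 suggests the positional embedding was built for a different sequence length than your 14x14 RoI grid. A minimal shape sketch with a rebuilt, learnable embedding; the real model may use a sine embedding instead, and nn.TransformerEncoderLayer stands in for the repo's encoder.)

import torch
import torch.nn as nn

n, c, h, w = 8, 256, 14, 14
x = torch.randn(n, c, h, w)

x = x.flatten(2).permute(2, 0, 1)                        # [h*w, n, c] = [196, 8, 256]
pos_embedding = nn.Parameter(torch.randn(h * w, 1, c))   # rebuilt for the 14x14 grid (learnable, for illustration)

encoder_layer = nn.TransformerEncoderLayer(d_model=c, nhead=8)   # stand-in for the repo's encoder
out = encoder_layer(x + pos_embedding)
print(out.shape)                                         # torch.Size([196, 8, 256])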

Get the confidence Score of each keypoint

Hi there. Is it possible to get a confidence score for each predicted keypoint, for both the pre-trained model and a custom-trained model?
What modifications are required to get the probability or confidence score along with each predicted keypoint?
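
(Not the maintainer's answer, but a common convention in top-down pose pipelines is to use the peak value of each predicted heatmap as that keypoint's confidence; a sketch using the torch.hub model, with a random tensor standing in for a preprocessed crop.)

import torch

model = torch.hub.load('yangsenius/TransPose:main', 'tpr_a4_256x192', pretrained=True)
model.eval()

x = torch.randn(1, 3, 256, 192)                  # stand-in for a preprocessed person crop
with torch.no_grad():
    heatmaps = model(x)                          # [1, K, 64, 48]

# Assumed convention (not necessarily the repo's exact scoring): the heatmap
# peak serves as the per-keypoint confidence; averaging them gives a pose score.
confidences, _ = heatmaps.flatten(2).max(dim=2)  # [1, K]
pose_score = confidences.mean()
print(confidences, pose_score)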

Multiple GPUs training

Hello. I have some questions regarding training with multiple GPUs.

GPU setting in config files

In README

We trained our different models on different hardware platforms: 2 x RTX2080Ti GPUs (TP-R-A3, TP-R-A4), 4 x TiTan XP GPUs (TP-H-S, TP-H-A4), and 4 x Tesla P40 GPUs (TP-H-A6).

However, it seems that this note does not match some config files in folder TransPose/experiments/coco/:

  • In TransPose/experiments/coco/transpose_r/TP_R_256x192_d256_h1024_enc4_mh8.yaml, only 1 GPU is used (instead of 2)
    line 7:

    GPUS: (0,)
    
  • In TransPose/experiments/coco/transpose_h/TP_H_w32_256x192_stage3_1_4_d64_h128_relu_enc4_mh1.yaml, 2 GPUs instead of 4

  • In TP_H_w48_256x192_stage3_1_4_d64_h128_relu_enc4_mh1.yaml, TP_H_w48_256x192_stage3_1_4_d96_h192_relu_enc4_mh1.yaml, TP_H_w48_256x192_stage3_1_4_d96_h192_relu_enc5_mh1.yaml, only 1 GPU instead of 4

Maybe the GPU setting is not correct in these config files?

Scaling the batch size and learning rate

As mentioned in #11,

From my experience, the performances of transpose-r models are very sensitive to the initial learning rate. I did not train transpose-r-a4 on 4 or 8 GPUs. I suggest you increase the initial learning rate a little bit at such conditions (with larger batchsize).

Currently I can use 4 RTX2080Ti GPUs for training. Do you have any suggestions for scaling the batch size and learning rate for multi-GPU training?

Many Thanks in advance!

Only 1 GPU is used for training

I noticed that only 1 GPU is used to train TransPose-R-A4, with lr=0.0001.
Should I change the lr if I want to use 4 or 8 GPUs?
Or just keep it the same?
Thanks for your reply.

MPII test

How is the Total metric for MPII obtained? In the code, I only see the Ubody metric.

How to generate coco keypoint format (json file)?

Hi, nice repository.
I'm curious how you generate a json file from the COCO keypoint dataset. Is there conversion code?
You linked to this repository, but there I found the dataset has already been converted to json format.
In that repository I also found coco.py; is this the code used to generate the json file?

Thanks

The meaning of Head?

Nice work. I have a question: what does "head" mean in the pipeline?

"Head" has several meanings. For example, one is the multi-head attention in the Transformer, which captures the relationships between different patches; but my understanding is that here it refers to the output of the deconvolution, such as a 64*48 matrix used to calculate the heatmap loss. Is that right?

Heatmaps generation (Jointsdataset.py)

def generate_target(self, joints, joints_vis):
        '''
        :param joints:  [num_joints, 3]
        :param joints_vis: [num_joints, 3]
        :return: target, target_weight(1: visible, 0: invisible)
        '''
        target_weight = np.ones((self.num_joints, 1), dtype=np.float32)
        target_weight[:, 0] = joints_vis[:, 0]

        assert self.target_type == 'gaussian', \
            'Only support gaussian map now!'

        if self.target_type == 'gaussian':
            target = np.zeros((self.num_joints,
                               self.heatmap_size[1],
                               self.heatmap_size[0]),
                              dtype=np.float32)

            tmp_size = self.sigma * 3

            for joint_id in range(self.num_joints):
                target_weight[joint_id] = \
                    self.adjust_target_weight(joints[joint_id], target_weight[joint_id], tmp_size)
                
                if target_weight[joint_id] == 0:
                    continue

                mu_x = joints[joint_id][0]
                mu_y = joints[joint_id][1]
                
                x = np.arange(0, self.heatmap_size[0], 1, np.float32)
                y = np.arange(0, self.heatmap_size[1], 1, np.float32)
                y = y[:, np.newaxis]

                v = target_weight[joint_id]
                if v > 0.5:
                    target[joint_id] = np.exp(- ((x - mu_x) ** 2 + (y - mu_y) ** 2) / (2 * self.sigma ** 2))

        if self.use_different_joints_weight:
            target_weight = np.multiply(target_weight, self.joints_weight)

        return target, target_weight

Hi, in the above code for dataset preparation, what is the input format for joints and joints_vis?
Is joints = [[x1, y1, v1], ..., [xn, yn, vn]], where v is the visibility (0 = not present, 1 = visible)?
Is joints_vis also [[x1, y1, v1], ..., [xn, yn, vn]]? I plan to apply it to another dataset where I have the format [[x1, y1, v1], ..., [xn, yn, vn]]. Thanks
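
(Hedged guess based on the HRNet-style dataset code this repo appears to follow, not a confirmed answer: joints usually carries only the coordinates and joints_vis carries the visibility repeated, so an annotation in [[x, y, v], ...] form would be split roughly like this.)

import numpy as np

annotation = [[120.0, 80.0, 2], [131.0, 75.0, 1], [0.0, 0.0, 0]]   # [[x, y, v], ...]

num_joints = len(annotation)
joints = np.zeros((num_joints, 3), dtype=np.float32)
joints_vis = np.zeros((num_joints, 3), dtype=np.float32)
for i, (x, y, v) in enumerate(annotation):
    joints[i, 0:2] = (x, y)              # third column stays 0
    vis = 1.0 if v > 0 else 0.0          # assumed: any labeled keypoint (v=1 or 2) counts
    joints_vis[i, 0:2] = (vis, vis)      # generate_target reads joints_vis[:, 0] as target_weight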

Implementation of MPII set

Hello,
I am very happy to see such excellent work. How can I use this project to train and test on MPII? I adjusted some parameters, but the results are not very good. Have you done any work on this dataset?

In addition, what is the basis for determining the number of heads for different models?

Thank you!

Static quantization of TransPose model in PyTorch? NotImplementedError: Could not run ‘aten::add.out’

I am trying to do static quantization of the TransPose-A4 model in PyTorch, but I run into the error below. It comes from the aten::add.out operation. How can I modify the pre-trained model to make it suitable for quantization? Thanks

---> 8 model_static_quantized(x).shape

7 frames

/root/.cache/torch/hub/yangsenius_TransPose_main/lib/models/transpose_h.py in forward(self, x)
99 residual = self.downsample(x)
100
→ 101 out += residual
102 out = self.relu(out)
103

NotImplementedError: Could not run ‘aten::add.out’ with arguments from the ‘QuantizedCPU’ backend. This could be because the operator doesn’t exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit Internal Login for possible resolutions. ‘aten::add.out’ is only available for these backends: [CPU, CUDA, Meta, MkldnnCPU, SparseCPU, SparseCUDA, SparseCsrCPU, SparseCsrCUDA, BackendSelect, Python, Named, Conjugate, Negative, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradLazy, AutogradXPU, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, UNKNOWN_TENSOR_TYPE_ID, Autocast, Batched, VmapMode].
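
(Not something the repo provides, but the usual eager-mode workaround for this error is to replace the in-place tensor add in the residual block with nn.quantized.FloatFunctional, which static quantization can convert. A schematic sketch of a rewritten block, assuming you also wrap the model with QuantStub/DeQuantStub and fuse conv+bn as usual.)

import torch
import torch.nn as nn

class QuantFriendlyBasicBlock(nn.Module):
    """Schematic rewrite of the residual add from transpose_h.py (around line 101):
    'out += residual' becomes a FloatFunctional add that static quantization understands."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.skip_add = nn.quantized.FloatFunctional()

    def forward(self, x):
        residual = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.skip_add.add(out, residual)   # replaces 'out += residual'
        return self.relu(out)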

About the image patch

Thank you for your excellent work. I have a question from reading the code:
In ViT, to handle 2D images, they reshape the image x ∈ R^(H×W×C) into a sequence of flattened 2D patches x_p ∈ R^(N×(P²·C)), where N = HW/P² and (P, P) is the resolution of each image patch.
In this method, the image x ∈ R^(H×W×C) is reshaped into a sequence of flattened features x_p ∈ R^(C×(HW)), and then the embedding is performed. Is the resolution of each image patch effectively (1, 1)?
What are the benefits of this setup?
