
elody-07 / awr-adaptive-weighting-regression


Code for the paper "AWR: Adaptive Weighting Regression for 3D Hand Pose Estimation", accepted at AAAI 2020.

License: MIT License

Language: Python 100.00%
Topics: aaai2020, hand-pose-estimation

awr-adaptive-weighting-regression's People

Contributors

elody-07


awr-adaptive-weighting-regression's Issues

train.py not provided

Hi,

Do you have any plans to provide train.py?
Also, are you planning to publish the pre-trained model for MSRA?

Thanks

Questions about parts of the code

While reading the code I ran into a few points of confusion, and I hope the author can find time to answer them. Thanks!

util/feature_tool.py

Line 54: offset_ht = F.softmax(offset_ht * 30, dim = -1) # (B, jt_num, F*F)
Why is there a factor of 30 in this softmax?

Line 55: dis = kernel_size - offset_ht * kernel_size # (B, jt_num, F*F)
offset_ht is already the predicted heatmap, so why apply the operation kernel_size - offset_ht * kernel_size?

I don't know much about 3D pose estimation; this is the first time I have looked closely at related papers and code.
As I understand it, kernel_size - ||offset_vec_gt|| * kernel_size (Eq. 3 in the paper) is used to generate heatmap_gt, i.e. the closer a point is to joint_gt, the larger its heatmap_gt value. If that is correct, then at test time the predicted offset_ht should be directly usable, and I don't see why the operation kernel_size - offset_ht * kernel_size is needed.

There is also Line 27: offset_norm = offset / dis.unsqueeze(2) # (B, jt_num, 3, F, F)
This should be computing the ground-truth offset; can the offset be normalized like this? Suppose coord_jt = (0.1, 0.2, 0.2) and gt_jt = (0.2, 0.3, 0.4). Then offset = gt_jt - coord_jt = (0.1, 0.1, 0.2), dis = sqrt(0.1² + 0.1² + 0.2²) ≈ 0.24, and offset_norm = offset / dis ≈ (0.42, 0.42, 0.84). The problem is: if the network predicts offset_pred = (0.42, 0.42, 0.84), how do you de-normalize it back to (0.1, 0.1, 0.2)?

I look forward to the author's corrections. Thanks!
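
For reference, here is a minimal sketch of how a dense-offset formulation like the paper's can combine the predicted heatmap and the predicted unit offset direction: the heatmap is converted back into a per-pixel distance (the kernel_size - offset_ht * kernel_size step), that distance rescales the unit direction (which de-normalizes offset_norm), and a temperature-sharpened softmax weights each pixel's vote. The names pixel_uvd, offset_unit and heatmap are illustrative, and keeping the pre-softmax heatmap separate from the softmax weights is an assumption for clarity, not necessarily identical to the released code.

    import torch
    import torch.nn.functional as F

    def aggregate_joints(pixel_uvd, offset_unit, heatmap, kernel_size=0.4, temperature=30.0):
        """
        pixel_uvd:   (B, J, 3, N)  normalized uvd coordinate of every pixel
        offset_unit: (B, J, 3, N)  predicted unit direction from each pixel towards the joint
        heatmap:     (B, J, N)     predicted closeness value in [0, 1], roughly 1 - distance / kernel_size
        """
        # A temperature sharpens the softmax so that pixels close to the joint dominate
        # the weighted sum; with temperature 1 the weights would be almost uniform
        # because the heatmap values only span a small range.
        weight = F.softmax(heatmap * temperature, dim=-1)            # (B, J, N)

        # Invert the heatmap definition (Eq. 3): if heatmap ~ 1 - distance / kernel_size,
        # then distance = kernel_size - heatmap * kernel_size.
        distance = kernel_size - heatmap * kernel_size               # (B, J, N)

        # Each pixel votes for the joint position: its own coordinate plus the unit
        # direction rescaled by the recovered distance (this is the "de-normalization"
        # of offset_norm asked about above).
        votes = pixel_uvd + offset_unit * distance.unsqueeze(2)      # (B, J, 3, N)

        # Weighted average of all votes gives the joint coordinate.
        return (votes * weight.unsqueeze(2)).sum(dim=-1)             # (B, J, 3)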

About Aggregation and the Challenge Leaderboard

Dear Author,
Thank you for sharing your code. I have several questions about your work.

   --First, about the aggregation code in 'feature_tool.py', lines 53-54:
          offset_ht = (offset_ht * mask).view(batch_size, jt_num, -1) # (B, jt_num, F*F)
          offset_ht = F.softmax(offset_ht * 30, dim = -1) # (B, jt_num, F*F)
      In the code above, a mask is used to remove the effect of invalid pixels.
      But after the softmax, the zeros in offset_ht produced by the mask still carry weight.
      Is that what you intended? (See the masking sketch at the end of this issue.)

   --Second, you report an outstanding result in your AAAI paper, especially on HANDS2017.
    Did you use any other post-processing, such as a model ensemble?

   --Third, the AWR results on the HANDS2019 and HANDS2017 leaderboards dropped a lot recently.
    Is that an accident?


    Thank you in advance!
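
For reference, a small generic demonstration of the masking point raised in the first question (not a statement about what the released code should do): multiplying the logits by a 0/1 mask leaves exp(0) = 1 in the softmax numerator, so masked positions still receive a non-zero weight, whereas filling them with -inf removes them exactly.

    import torch
    import torch.nn.functional as F

    heat = torch.tensor([[0.9, 0.7, 0.2, 0.1]])   # heatmap-like values in [0, 1]
    mask = torch.tensor([[1.0, 1.0, 0.0, 0.0]])   # last two pixels are "invalid"

    # Zeroing invalid entries before the softmax: exp(0) = 1, so they still get a
    # non-zero weight (very small here only because of the x30 temperature).
    w_zeroed = F.softmax(heat * mask * 30, dim=-1)

    # Filling invalid entries with -inf makes their softmax weight exactly zero.
    w_ninf = F.softmax((heat * 30).masked_fill(mask == 0, float("-inf")), dim=-1)

    print(w_zeroed)   # masked positions: tiny but non-zero
    print(w_ninf)     # masked positions: exactly zero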

Application Using A Datastream

I am an undergrad trying to use this algorithm for augmented reality with a video stream from a Kinect V2, so most of my work is copying pieces from nyu_loader.py and adapting the dataset code to a video stream. I have read through the paper and the code around the model, and I believe I'm mostly on track, but there are a few places where I'm unsure what to do and would appreciate guidance:

  1. I haven't figured out what the alternative coordinate system (uvd) is yet (if there are any references you could point me to, I'd love to read them). Because of that, I don't quite understand the first two magic numbers in paras = (588.03, 587.07, 320., 240.) in nyu_loader; the last two (320., 240.) clearly map to the resolution of the Kinect V1, but I'm unsure what the first two are and whether I need to change them to work with a Kinect V2 frame (see the sketch below).

  2. Is there a diagram of the hand joint names and their corresponding positions on the hand? I looked at the NYU hand dataset site but couldn't find one.

  3. When finding hands in a video stream, I won't be given the hand center. Do you have any recommendations for finding the center? The paper mentions using a network to find centers, but it doesn't say which one.

  4. Are there any major problems you can plainly foresee that I'm not seeing yet? Things related to the differences between the Kinect V1 and V2, perhaps?

My background is mostly in using OpenCV for small projects, and this is my first time using a CNN for image processing.
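
For reference on question 1: uvd denotes pixel coordinates (u, v) plus depth d, and paras is the set of pinhole intrinsics (fx, fy, cx, cy), i.e. the focal lengths in pixels followed by the principal point; 588.03 and 587.07 are the NYU/Kinect V1 focal lengths, so a Kinect V2 frame needs that camera's own intrinsics instead. Below is a generic conversion sketch (function names are illustrative, and the repo's exact axis conventions may differ):

    import numpy as np

    def xyz_to_uvd(xyz, fx, fy, cx, cy):
        """xyz: (N, 3) camera-space points in mm -> (N, 3) pixel coords plus depth."""
        x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
        u = x / z * fx + cx
        v = y / z * fy + cy
        return np.stack([u, v, z], axis=-1)

    def uvd_to_xyz(uvd, fx, fy, cx, cy):
        """uvd: (N, 3) pixel coords plus depth in mm -> (N, 3) camera-space points."""
        u, v, z = uvd[:, 0], uvd[:, 1], uvd[:, 2]
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        return np.stack([x, y, z], axis=-1)

    # NYU / Kinect V1 intrinsics as used in nyu_loader.py; substitute your own camera's values.
    fx, fy, cx, cy = 588.03, 587.07, 320.0, 240.0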

Question about the kernel_size hyperparameter

Hello, thank you very much for your work and for being willing to share it.
Why do ResNet and Hourglass use different kernel_size values (1 and 0.4, respectively)?
I would like to integrate AWR into another backbone (e.g. HRNet); how should kernel_size be adjusted for different backbones?

Training configuration for ResNet-50

Hello, when I train with the ResNet-50 backbone, the training has trouble converging. Could you provide a more detailed parameter configuration?

dataloader issue from numpy to torch.tensor

Dear Author,
In the released code, the __getitem__ method returns numpy-format data, as below:
return img[np.newaxis, :].astype(np.float32), jt_xyz.astype(np.float32), jt_uvd.astype(np.float32), center_xyz.astype(np.float32), M.astype(np.float32), cube.astype(np.float32)
But in "test.py", code directly use them as torch tensor as below(Line 59):
input = img.cuda()
jt_uvd_gt = jt_uvd_gt.detach().cpu().numpy()
jt_uvd_pred = jt_uvd_pred.detach().cpu().numpy()
jt_xyz_gt = jt_xyz_gt.detach().cpu().numpy()
center_xyz = center_xyz.detach().cpu().numpy()
M = M.detach().numpy()
cube = cube.detach().numpy()
Is that allowed? In my run, this part raised an error.
Thanks in advance.

The log is below:
File "train.py", line 116, in train
input = img.cuda()
AttributeError: 'numpy.ndarray' object has no attribute 'cuda'
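
A possible explanation, offered as an assumption rather than a confirmed diagnosis: torch.utils.data.DataLoader's default collate_fn converts numpy arrays into torch tensors when it batches samples, so .cuda() works on batches coming out of a DataLoader but fails if the dataset is indexed directly. A minimal sketch:

    import numpy as np
    import torch
    from torch.utils.data import DataLoader, Dataset

    class ToyDataset(Dataset):
        def __len__(self):
            return 4

        def __getitem__(self, idx):
            # numpy output, like the repo's __getitem__
            return np.zeros((1, 128, 128), dtype=np.float32)

    sample = ToyDataset()[0]
    print(type(sample))        # <class 'numpy.ndarray'>  -> .cuda() would fail here

    loader = DataLoader(ToyDataset(), batch_size=2)
    batch = next(iter(loader))
    print(type(batch))         # <class 'torch.Tensor'>   -> .cuda() works on loader output

    # If the arrays are used without a DataLoader, convert them explicitly:
    img = torch.from_numpy(sample)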

Where is the inference code?

Your work seems really good, and the results you report on different datasets seem well suited to my project. Which file contains the inference code? I can't tell which one it is. I'm asking so I can test on some videos and see how the model performs in a real-time setting.

I want to test your work with the pretrained model files for MSRA and HANDS 2017. Have you been able to retrieve them from your university computer yet?

Please help me out here by pointing me to the inference code and how to use it.

I want to train hg_2: a question about the loss

In train.py, line 121:
loss = (loss_coord + loss_offset)

If I want to try hg_2, should the loss be accumulated instead, i.e.
loss += (loss_coord + loss_offset)

Should I do this? Please advise.
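
For reference, a generic intermediate-supervision pattern (an assumption about how a two-stack model might be supervised, not code taken from this repo): if the model returns one prediction per stack, the per-stack losses are summed so that every stack is supervised, which is what the loss += line above suggests.

    def total_loss(stack_outputs, coord_loss_fn, offset_loss_fn):
        """stack_outputs: list of per-stack predictions, e.g. two entries for hg_2."""
        loss = 0.0
        for pred in stack_outputs:
            # accumulate the same (coord + offset) loss for every hourglass stack
            loss = loss + coord_loss_fn(pred) + offset_loss_fn(pred)
        return loss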

Question about the inverse transform in eval_tool.feed()

Hello, thank you very much for your work and for being willing to share it.
While trying to reproduce your work, I noticed that in eval_tool.feed() the network predictions are inverse-transformed with M_inv, but the ground truth is not. Why is that?
Looking forward to your answer.
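
One possible reading, stated as an assumption and not confirmed by the authors: if M is the 2D transform that maps original-image pixel coordinates into the cropped, resized input patch, then the network's predictions live in the patch frame and must be mapped back with M_inv, while ground truth that is already stored in the original frame needs no inverse. A sketch with a 3x3 homogeneous transform (names are illustrative):

    import numpy as np

    def warp_uv(uv, M):
        """uv: (N, 2) pixel coordinates; M: (3, 3) homogeneous 2D transform."""
        uv1 = np.concatenate([uv, np.ones((uv.shape[0], 1))], axis=1)  # (N, 3)
        out = uv1 @ M.T
        return out[:, :2] / out[:, 2:3]

    # predictions made in the cropped-patch frame -> original image frame
    # pred_uv_img = warp_uv(pred_uv_patch, np.linalg.inv(M))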

Pre-trained Model

Hi there,
When I tried to test the pre-trained model, it printed:

loading model from ./results/hourglass_1.pth
{'epoch': 14, 'MPE': 7.700112, 'AUC': 0.8504827899520097}

and the result was:

[epoch -1][MPE 25.175][AUC 0.530]

I tried twice with Google Colab: once I installed the requirements and once I did not. Both gave the same result. Any help is appreciated.

(By the way, the hourglass_1 results file (hourglass_1.txt) does give the expected error value of 7.7.)

Some missing configs

Hi there,
Thanks for sharing the training code.
I'm trying to recreate your work and it seems that some configs are missing from the config.py file (system, optimizer and scheduler).
What configs did you use for the Hourglass model?
Thanks

Training details

Hello, I would like to reproduce your results. Could you tell me the loss weights for JointLoss and DenseLoss, and how the learning rate is decayed?

Training file

Hi, can you please provide a train.py file as a reference for training from the ground up?

About Center Point File

Dear Author,
In your AAAI 2020 paper, you say "we first train a small separated 2D CNN to attain hand center and extract hand regions from depth images".
During our tests, there are some errors in the hand-center file provided by V2V-PoseNet.
Did you use the same hand centers as V2V-PoseNet provides, or did you create new ones?
Thanks in advance!

Cannot find the AWR module

Hello, I cannot find the AWR module anywhere in the source code. Where is it?

About pre-trained model

Dear Author,
Do you have an expected date for releasing your pre-trained model? I am looking forward to it.
I assumed the pre-trained model already existed, since you report many detailed experimental results in your paper, so why is it taking so long?
I hope the released model can reproduce your excellent performance!
Thank you in advance.

About AWR in PoseNet code

Hi,

I tried to understand the code, but it is not clear where exactly the adaptive weights are introduced.
Sorry for the naive question, but could you point it out?

Checkpoint used to report results in your paper

Hi,
Thanks for sharing your code.
Which model checkpoint do you use to report the results in your paper? Is it the best epoch, the last one, or something else?
