elody-07 / awr-adaptive-weighting-regression
Code for paper <AWR: Adaptive Weighting Regression for 3D Hand Pose Estimation>. Accepted by AAAI 2020.
License: MIT License
Hi,
Do you have any plans to provide the train.py?
Also, are you planning to publish the pre-trained model on MSRA?
Thanks
While reading the code I ran into a few questions; I hope the author can find time to answer them, thanks!
util/feature_tool.py
Line 54: offset_ht = F.softmax(offset_ht * 30, dim = -1) # (B, jt_num, F*F)
Why is the softmax input multiplied by the factor 30 here?
Line 55: dis = kernel_size - offset_ht * kernel_size # (B, jt_num, F*F)
offset_ht is already the predicted heatmap, so why apply the operation kernel_size - offset_ht * kernel_size to it?
I don't know much about 3D pose estimation; this is my first time reading the related papers and code in depth.
As I understand it, kernel_size - ||offset_vec_gt|| * kernel_size (Eq. 3 in the paper) is used to generate heatmap_gt, i.e. the closer a pixel is to joint_gt, the larger its heatmap_gt value. If that is correct, then at test time the predicted offset_ht should be directly usable, and I don't understand why kernel_size - offset_ht * kernel_size is applied again.
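For reference, the linear weighting in Eq. 3 can be sketched as follows (a minimal illustration with made-up names and values, not the repository's code): pixels closer to the joint get larger weights, and the distance-to-weight mapping is simply kernel_size - dist * kernel_size, clipped at zero.

```python
import numpy as np

def weight_map(coords, joint, kernel_size=1.0):
    """coords: (N, 3) pixel coordinates; joint: (3,) joint position.
    Returns the Eq.-3-style weight for each pixel."""
    dist = np.linalg.norm(coords - joint, axis=-1)  # ||offset|| per pixel
    w = kernel_size - dist * kernel_size            # linear falloff with distance
    return np.clip(w, 0.0, None)                    # weights never go negative

coords = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [2.0, 0.0, 0.0]])
joint = np.array([0.0, 0.0, 0.0])
print(weight_map(coords, joint))  # closest pixel gets the largest weight
```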
There is also Line 27: offset_norm = offset / dis.unsqueeze(2) # (B, jt_num, 3, F, F)
This computes the ground-truth offset, but can that offset really be normalized? Suppose coord_jt = (0.1, 0.2, 0.2) and gt_jt = (0.2, 0.3, 0.4). Then offset = gt_jt - coord_jt = (0.1, 0.1, 0.2), dis = sqrt(0.1**2 + 0.1**2 + 0.2**2) ≈ 0.245, and offset_norm = offset / dis ≈ (0.41, 0.41, 0.82). The problem is: when the network predicts offset_pred ≈ (0.41, 0.41, 0.82), how do you de-normalize it back to (0.1, 0.1, 0.2)?
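To make the arithmetic in this question concrete, here is a small sketch with the same illustrative values. Note that the unit direction alone cannot be "un-normalized"; recovering the offset requires a distance estimate as well (whether that distance comes from a separate prediction branch is an assumption about the pipeline, not a statement about the released code):

```python
import numpy as np

coord_jt = np.array([0.1, 0.2, 0.2])
gt_jt = np.array([0.2, 0.3, 0.4])

offset = gt_jt - coord_jt            # (0.1, 0.1, 0.2)
dis = np.linalg.norm(offset)         # ~0.245
offset_norm = offset / dis           # unit direction, ~(0.41, 0.41, 0.82)

# De-normalizing needs the distance back; the direction alone is not enough.
recovered = offset_norm * dis
print(np.allclose(recovered, offset))  # True
```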
I hope the author can correct me if I'm wrong, thanks!
Dear Author:
Thank you for sharing your code. I have several questions about your work.
--First, about the aggregation code in 'feature_tool.py', lines 53-54:
offset_ht = (offset_ht * mask).view(batch_size, jt_num, -1) # (B, jt_num, F*F)
offset_ht = F.softmax(offset_ht * 30, dim = -1) # (B, jt_num, F*F)
In the code above, a mask is used to remove the effect of invalid pixels.
But after the softmax, the zeros in offset_ht produced by the mask still receive non-zero weights.
Is that what you intended?
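The concern can be demonstrated in isolation: after a multiplicative mask, the zeroed entries contribute exp(0) = 1 to the softmax and so keep non-zero weight. A common alternative (an assumption about a possible fix, not necessarily the authors' intent) is additive masking with -inf before the softmax:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[3.0, 2.0, 1.0, 4.0]])
mask = torch.tensor([[1.0, 1.0, 0.0, 0.0]])  # last two pixels invalid

# Multiplicative mask: zeroed entries still get exp(0) = 1 inside softmax.
w_mult = F.softmax(logits * mask, dim=-1)

# Additive -inf mask: invalid entries get exactly zero weight.
w_add = F.softmax(logits.masked_fill(mask == 0, float("-inf")), dim=-1)

print(w_mult[0, 2].item() > 0)   # True: masked pixel keeps weight
print(w_add[0, 2].item() == 0)   # True: masked pixel fully suppressed
```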
--Second, you report an outstanding result in your AAAI paper, especially on HANDS2017.
Did you use any other post-processing, such as MODEL ENSEMBLE?
--Third, the AWR results on the HANDS2019 and HANDS2017 LEADERBOARDs dropped a lot recently.
Was that an accident?
Thanks in advance!
I am an undergrad trying to use this algorithm for augmented reality with a video stream from a Kinect V2, so most of my work is copying pieces from nyu_loader.py and adapting the dataset code to a video stream. I've read the paper and the code around the model, and I believe I'm mostly on track, but there are some places where I'm unsure what to do and would appreciate guidance:
I haven't figured out what the alternative coordinate system (uvd) is yet (if there are any references you could point me to, I'd love to read them). Because of that, I don't quite understand the first two magic numbers in paras = (588.03, 587.07, 320., 240.) in nyu_loader. The last two numbers (320., 240.) clearly map to the Kinect V1 resolution, but I'm unsure what the first two are and whether I need to change them to work with a Kinect V2 frame.
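For context, `paras` holds standard pinhole-camera intrinsics: the first two numbers are the focal lengths (fx, fy) in pixels and the last two are the principal point (cx, cy). A sketch of the usual uvd (pixel u, pixel v, depth) to camera-space xyz conversion, using the NYU/Kinect V1 values from nyu_loader.py (for a Kinect V2 you would substitute its own depth-camera intrinsics; axis-sign conventions can also differ between loaders):

```python
import numpy as np

FX, FY, CX, CY = 588.03, 587.07, 320.0, 240.0  # NYU / Kinect V1 intrinsics

def uvd2xyz(uvd):
    """(u, v, depth) pixel coordinates -> (x, y, z) camera-space coordinates."""
    u, v, d = uvd
    x = (u - CX) * d / FX
    y = (v - CY) * d / FY
    return np.array([x, y, d])

def xyz2uvd(xyz):
    """Inverse projection back to pixel coordinates."""
    x, y, z = xyz
    return np.array([x * FX / z + CX, y * FY / z + CY, z])

p = np.array([400.0, 300.0, 800.0])          # a pixel at depth 800 mm
print(np.allclose(xyz2uvd(uvd2xyz(p)), p))   # True: round-trip is exact
```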
Is there a diagram of the hand joint names and their corresponding positions on the hand? I looked at the NYU hand dataset site but couldn't find one.
When finding hands in a video stream, I won't be given the hand center. Do you have any recommendations for finding it? The paper mentions using a network for finding centers, but it doesn't say which one.
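One common heuristic for a rough hand center when no detector is available (used in several depth-based pipelines, not taken from this repo): threshold the depth map to the nearest depth band and take the center of mass of the surviving pixels. The thresholds below are illustrative assumptions.

```python
import numpy as np

def rough_center(depth, near=200.0, band=150.0):
    """depth: (H, W) in mm, zeros = invalid. Returns (u, v, d) or None."""
    valid = depth > near                      # drop invalid / too-near pixels
    if not valid.any():
        return None
    d_min = depth[valid].min()
    fg = valid & (depth < d_min + band)       # keep a band nearest the camera
    vs, us = np.nonzero(fg)
    return np.array([us.mean(), vs.mean(), depth[fg].mean()])

depth = np.zeros((4, 4))
depth[1:3, 1:3] = 500.0                       # a fake 2x2 "hand" blob
print(rough_center(depth))                    # center of mass of the blob
```

In practice this is fragile (arms and other near objects leak into the band), which is presumably why the paper uses a small 2D CNN instead.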
Are there any major problems you can plainly foresee that I'm not seeing yet? Perhaps things related to the differences between the Kinect V1 and V2?
My background is mostly in using OpenCV for small projects, and this is my first time using a CNN for image processing.
Hello, thank you very much for your work and for sharing it.
Why do resnet and hourglass use different kernel_size values (1 and 0.4, respectively)?
I would like to integrate AWR into another backbone (e.g. HRNet). How should kernel_size be tuned for different backbones?
Hello, when training with a ResNet-50 backbone I found it hard to get the training to converge. Could you provide a more detailed parameter configuration?
Dear Author:
In the released code, the __getitem__ method returns numpy-format data, as below:
return img[np.newaxis, :].astype(np.float32), jt_xyz.astype(np.float32), jt_uvd.astype(np.float32), center_xyz.astype(np.float32), M.astype(np.float32), cube.astype(np.float32)
But in "test.py", code directly use them as torch tensor as below(Line 59):
input = img.cuda()
jt_uvd_gt = jt_uvd_gt.detach().cpu().numpy()
jt_uvd_pred = jt_uvd_pred.detach().cpu().numpy()
jt_xyz_gt = jt_xyz_gt.detach().cpu().numpy()
center_xyz = center_xyz.detach().cpu().numpy()
M = M.detach().numpy()
cube = cube.detach().numpy()
Is that allowed? In my run, this part raised an error.
Thanks in advance.
Log as below:
File "train.py", line 116, in train
input = img.cuda()
AttributeError: 'numpy.ndarray' object has no attribute 'cuda'
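A likely explanation for the discrepancy (an assumption about the setup, not a confirmed diagnosis): when samples are drawn through a `DataLoader`, the default collate function converts numpy arrays into torch tensors, so `.cuda()` / `.detach()` are valid in test.py. Indexing the dataset directly in a hand-rolled loop skips collation and hands you raw numpy arrays, which reproduces the AttributeError in the log:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    """Returns numpy arrays from __getitem__, like the released loaders."""
    def __len__(self):
        return 4
    def __getitem__(self, idx):
        return np.zeros((1, 8, 8), dtype=np.float32)

ds = ToyDataset()
print(type(ds[0]).__name__)                       # ndarray: no collation
batch = next(iter(DataLoader(ds, batch_size=2)))
print(type(batch).__name__)                       # Tensor: collated by loader
```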
Your work looks really good, and the results you report on different datasets seem well suited to my project. Which file contains the inference code? I can't work out which one it is. I'd like to test on some videos to see how the model performs in a real-time setting.
I want to test your work with the pretrained model files for MSRA and HANDS 2017. Have you been able to retrieve them from your university computer yet?
Please help me out by pointing me to the inference code and how to use it.
In train.py, line 121:
loss = (loss_coord + loss_offset)
If I want to try hg_2, should the loss instead be accumulated as
loss += (loss_coord + loss_offset)
Is that the right change? Please advise.
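For what it's worth, the usual stacked-hourglass pattern with intermediate supervision is exactly that accumulation: each stage produces its own (coord, offset) losses and they are summed. A sketch (an assumption about how hg_2 is structured, not a quote from train.py; names are illustrative):

```python
import torch

def total_loss(stage_outputs, loss_fn):
    """stage_outputs: list of per-stage predictions.
    loss_fn: maps one stage's output to (loss_coord, loss_offset)."""
    loss = torch.zeros(())
    for out in stage_outputs:
        loss_coord, loss_offset = loss_fn(out)
        loss = loss + (loss_coord + loss_offset)  # accumulate over stages
    return loss

fake = [torch.tensor(1.0), torch.tensor(2.0)]     # two "stages"
print(total_loss(fake, lambda o: (o, o)).item())  # (1+1) + (2+2) = 6.0
```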
Hello, thank you very much for your work and for sharing it.
While trying to reproduce your work, I found that in eval_tool.feed() the network predictions are inverse-transformed with M_inv, but the ground truth is not. Why is that?
Looking forward to your answer.
Hi there,
When I tried to test the pre-trained model, it said:
loading model from ./results/hourglass_1.pth
{'epoch': 14, 'MPE': 7.700112, 'AUC': 0.8504827899520097}
and the result was:
[epoch -1][MPE 25.175][AUC 0.530]
I tried twice with Google Colab, once after installing the requirements and once without. Both gave the same result. Any help is appreciated.
(Btw, the hourglass_1 results file (hourglass_1.txt) gives the expected error value, 7.7.)
Hi there,
Thanks for sharing the training code.
I'm trying to recreate your work and it seems that some configs are missing from the config.py file (system, optimizer and scheduler).
What configs did you use for the Hourglass model?
Thanks
Hi, I'd like to reproduce your results. Could you tell me the hyperparameters for JointLoss and DenseLoss in the loss function, and how the learning rate is decayed?
Hi, can you please provide a train.py file as a reference for training from the ground up?
Dear Author:
In your AAAI 2020 paper, you said "we first train a small separated 2D CNN to attain hand center and extract hand regions from depth images".
During our tests, there are some errors in the HAND CENTER file provided by V2V-PoseNet.
Did you use the same HAND CENTER files as V2V-PoseNet provides, or did you make new ones?
Thanks in advance!
Hello, I couldn't find the AWR module anywhere in the source code. Where is this module?
Dear Author:
Do you have an expected date for releasing your pre-trained models? Looking forward to that a lot.
I thought the pre-trained models were already available, since your paper reports many detailed experimental results. Why is it taking so long... TT
I hope the released models can reproduce your excellent performance!
Thank you in advance.
Hi,
I tried to understand the code, but it's not clear where exactly the adaptive weights are introduced.
Sorry for the basic question, but could you point it out?
Can you provide the hand-center coordinate files for the HANDS2017, ICVL, and MSRA datasets? Thank you so much.
Hi,
Thanks for sharing your code.
Which model checkpoint do you use to report the results in your paper? Is it the best epoch, the last epoch, or something else?