Installation
DensePose relies on the Caffe2 and Detectron frameworks.
I installed the latest version of DensePose on my machine:
System: Ubuntu 18.04
Graphics card: NVIDIA GeForce GTX 1050 Ti
Graphics card memory: 4096 MB
CUDA: 10.0
cuDNN: 7.3.1
Caffe2: Built from source (this version)
Following the official instructions alone, I couldn't install DensePose successfully, but this very useful blog provided solutions to all the problems I encountered (specifically sections 2.1, 2.2, 2.7, 2.9).
After solving them, I managed to install DensePose successfully.
Usage
DensePose maps all pixels of an RGB image belonging to humans to the 3D surface of the human body.
It relies on DensePose-RCNN to obtain dense part indexes and coordinates within each of the selected parts (IUV representation).
Note: At the current stage, all the provided tools are in Python.
Using the IUV representation
I followed this tutorial to run inference on a dataset acquired with the RealSense. In the example they use a ResNet-101-FPN, which I couldn't run on my machine due to memory problems (apparently 4 GB is not enough for this network). However, I could run the ResNet-50-FPN, using infer_simple.py:
```shell
python2 tools/infer_simple.py \
    --cfg configs/DensePose_ResNet50_FPN_s1x-e2e.yaml \
    --output-dir '/home/vvasco/dev/datasets/r1/latency-dataset/densepose-infer/' \
    --image-ext ppm \
    --wts https://dl.fbaipublicfiles.com/densepose/DensePose_ResNet50_FPN_s1x-e2e.pkl \
    '/home/vvasco/dev/datasets/r1/latency-dataset/img/'
```
This tool outputs, for each image of the input dataset:
- a `*_.pdf` image containing bounding boxes around people;
- a `*_IUV.png` image containing the part indexes I (24 surface patches) and their U and V coordinates;
- a `*_INDS.png` image containing the segmented parts.
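For reference, splitting the `*_IUV.png` into its three components can be sketched as below. This is a minimal sketch, assuming the part index I is stored in channel 0 and U, V in channels 1 and 2 (as loaded e.g. with `cv2.imread`, the convention used in the official DensePose notebooks); `split_iuv` and the synthetic array are my own names, not part of DensePose:

```python
import numpy as np

def split_iuv(iuv):
    """Split an H x W x 3 IUV array into part indexes I and
    surface coordinates U, V.

    Assumption: the part index sits in channel 0 and U, V in
    channels 1 and 2, as in the official DensePose notebooks.
    """
    I = iuv[:, :, 0]          # 0 = background, 1..24 = surface patches
    U = iuv[:, :, 1] / 255.0  # U coordinate, normalized to [0, 1]
    V = iuv[:, :, 2] / 255.0  # V coordinate, normalized to [0, 1]
    return I, U, V

# Synthetic 2x2 example: one background pixel, one pixel on patch 2
iuv = np.zeros((2, 2, 3), dtype=np.uint8)
iuv[0, 1] = (2, 128, 64)
I, U, V = split_iuv(iuv)
print(np.unique(I))  # → [0 2]
```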
This is a comparison between `yarpOpenPose` and DensePose:

| yarpOpenPose | DensePose |
|---|---|
| ![]() | ![]() |
Qualitatively, DensePose seems to separate the human parts correctly, even when the person is moving. However, occluded parts completely disappear (for example the arm and the hand when behind the chair).
Note: I had to change `detectron/utils/vis.py` according to this in order to get non-empty INDS images.
Mapping IUV values to 3D SMPL models
In addition to the IUV representation, it's also possible to map the predicted points onto a 3D human model (SMPL).
This notebook shows how to map IUV values to the 3D SMPL model, but it relies on a file `demo_dp_single_ann.pkl` for which there are no references on how to construct it.
There is an open PR which does not rely on any such file and also speeds up the conversion from IUV to XYZ points on the SMPL. I'm not sure why it hasn't been merged, though. I used this fork for the mapping, and this is the result on a single image, with the 3D points in red and the model in black:
| Segmented image | 3D points mapped on SMPL |
|---|---|
The face is not fully mapped, as it is not fully visible in the image, but the points look correctly mapped onto the different patches of the model. We can also distinguish the frontal part of a person from the posterior part (with `yarpOpenPose` this is not possible, unless we associate a face with the skeleton). It looks promising!
However, several points might be critical:
- the notebook seems to work only when there is a single person: there is a pick index, but it's unclear to me how to select it;
- mapping a single person with the fast implementation takes ~8 s;
- the template model is in a fixed position (static).
from assistive-rehab.
> I think I didn't get this point. From the snapshots, it looks like this holds for both OpenPose and DensePose.
Let me expand on this point a bit. What I mean is that if there is an occlusion in DensePose, whole body parts get lost (even if they are not entirely occluded). In `yarpOpenPose`, when keypoints are missed due to occlusions in 2D, we can still reconstruct them in 3D by applying limb optimization. This might be more difficult when dealing with body parts and would require further investigation.
> Here below I've tried to summarize the essential traits you identified; please correct me if I'm wrong:
>
> - DensePose is in Python only, as of now.
> - In real-time contexts, it provides us with a richer set of body features in 2D compared with OpenPose.
> - If we want to extract 3D info instead, DensePose seems to require ~8 s per image.
Exactly! Let me also stress that the examples I found only deal with a single person, and it might not be straightforward to handle multiple people.
That's great! Thanks @vvasco for this very exhaustive report.
I think we now have a very clear picture of where DensePose stands with respect to our methodology.
I found that the problem is that this notebook only works for a single-person image. If I input an image with more than one person, the number of points on the picked person is 0.
Awesome analysis @vvasco 🥇
Here below I've tried to summarize the essential traits you identified; please correct me if I'm wrong:
- DensePose is in Python only, as of now.
- In real-time contexts, it provides us with a richer set of body features in 2D compared with OpenPose.
- If we want to extract 3D info instead, DensePose seems to require ~8 s per image.
> However, parts occluded completely disappear (for example the arm and the hand when behind the chair).

I think I didn't get this point. From the snapshots, it looks like this holds for both OpenPose and DensePose.
Hey, I followed your steps, but in my case the final points on the SMPL are always 0 (`picked_person = INDS==1`). The output:

```
num pts on picked person: (0,)
(0, 3)
```

The earlier visualizations of the IUV are all good, so I don't know where the problem is; any help will be appreciated!
Hi @wine3603, thanks for the interest in this issue!
The problem is exactly the one you spotted: when you have an image with multiple people, you can create the IUV representation, but you cannot map the points onto the 3D SMPL model.
The notebook actually includes a pick index that, intuitively, should be used to select a person from an image with multiple people. Instead, what happens in this case is that no points are found, whatever index you select. It only works with single-person images.
Hey @vvasco, thanks for your reply. I am trying to find out how the INDS.png is generated.
I am wondering: if INDS==0 marks the background masks, does INDS==1 indicate all the human masks or just the first human?
Hi @wine3603, I don't think there is a specific order in the INDS values.
INDS=0 corresponds to the background, and then INDS can take different non-zero values (not necessarily 1) according to the number of people.
For example, if you open this INDS image (if you have Matlab, you can use `imread`), you will see that INDS takes several values, all different from 1.
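A quick way to check this without Matlab, assuming the INDS image has been loaded into a numpy array (e.g. with `cv2.imread(path, 0)`), is to list its unique non-zero values; `people_labels` below is just an illustrative helper of mine, not part of DensePose:

```python
import numpy as np

def people_labels(inds):
    """Return the non-zero labels present in an INDS array.

    INDS == 0 is background; each detected person gets some
    non-zero label, not necessarily starting at 1.
    """
    labels = np.unique(inds)
    return labels[labels != 0]

# Synthetic INDS map with two people, labeled 3 and 7
inds = np.array([[0, 0, 3],
                 [3, 0, 7],
                 [0, 7, 7]])
print(people_labels(inds))  # → [3 7]
```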
Hi @vvasco, thanks a lot, now I know where my misunderstanding was.
I followed this notebook and found that in In[4],

```python
pick_idx = 1  # PICK PERSON INDEX!
C = np.where(INDS == pick_idx)
```

I thought the people masks were labeled with different ID numbers, which is why this PNG is named "index".
Now I understand it is not an int: this "1" and the background "0" are boolean indexing.
In the case of multi-person images, we have to find a way to generate INDS.png with human IDs; do you have any ideas?
The IUV representation provides the detected part indexes and their pixel coordinates.
So you might first transform this into a temporary representation where all detected parts are set to 1. You could extract the positions of the bounding boxes from the pdf image and use them to cluster the detected humans: all 1s falling inside the same bounding box form a cluster, and the cluster ID would finally become the human ID.
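A minimal sketch of this idea, assuming the person bounding boxes are already available as pixel coordinates (`label_people_by_boxes` and its inputs are hypothetical names of mine, not part of DensePose; in practice the boxes would come from the detector output rather than from the pdf image):

```python
import numpy as np

def label_people_by_boxes(part_mask, boxes):
    """Assign human IDs to a binary mask of detected body parts.

    part_mask : H x W array, 1 where any body part was detected.
    boxes     : list of (x0, y0, x1, y1) person bounding boxes.

    Returns an H x W array where detected pixels inside box k
    get label k+1 (0 stays background).
    """
    ids = np.zeros_like(part_mask, dtype=int)
    for k, (x0, y0, x1, y1) in enumerate(boxes, start=1):
        region = part_mask[y0:y1, x0:x1]
        ids[y0:y1, x0:x1][region > 0] = k
    return ids

# Toy example: two detected pixels, one per bounding box
mask = np.zeros((4, 6), dtype=int)
mask[1, 1] = 1   # person A
mask[2, 4] = 1   # person B
boxes = [(0, 0, 3, 4), (3, 0, 6, 4)]
ids = label_people_by_boxes(mask, boxes)
print(ids[1, 1], ids[2, 4])  # → 1 2
```

Overlapping boxes would need a tie-breaking rule (here, later boxes simply overwrite earlier ones).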
Hi @vvasco, I am trying to map multi-view images of one person to the SMPL model.
May I ask what visualizer you used for the "3D points mapped on SMPL" figure?
I want to try mapping 4 images from 4 corner cameras.
Hi @wine3603, I used this notebook to generate the image.
I added the following section to the notebook to make the plot interactive, using the plotly library:
```python
# Run inside the notebook: X, Y, Z (model vertices) and
# collected_x, collected_y, collected_z (picked points) are
# defined in the previous cells.
import numpy as np
import plotly.graph_objs as go
from plotly.offline import iplot

trace1 = go.Scatter3d(
    x=Z, y=X, z=Y,
    mode='markers',
    marker=dict(color='black', size=0.5),
)
trace2 = go.Scatter3d(
    x=collected_z, y=collected_x, z=collected_y,
    mode='markers',
    marker=dict(color=np.arange(IUV_pick.shape[0]), size=1),
)
data = [trace1, trace2]
layout = go.Layout(
    title='Points on the SMPL model',
    showlegend=False,
    scene=dict(
        xaxis=dict(range=[-1.0, 1.0], title='z'),
        yaxis=dict(range=[-1.0, 1.0], title='x'),
        zaxis=dict(range=[-1.4, 1.0], title='y'),
    ),
)
fig = dict(data=data, layout=layout)
iplot(fig)
```
with `X, Y, Z` identifying the model and `collected_x, collected_y, collected_z` being the points picked on the person.
@wine3603 IUV is: part INDEX, U coordinates, V coordinates.
INDS has a total of 24 values; each of the 24 values represents a different part of the body.
INDS==1 is all the coordinates for the back, so if your picture is facing forward, there won't be any coordinates corresponding to the back.
I checked all the parts of the body and found out which number represents which:
1. body (back)
2. body (front)
3. hand (right)
4. hand (left)
5. foot (left)
6. foot (right)
7. thigh (right, back)
8. thigh (left, back)
9. thigh (right, front)
10. thigh (left, front)
11. calf (right, back)
12. calf (left, back)
13. calf (right, front)
14. calf (left, front)
15. upper arm (left, front)
16. upper arm (right, front)
17. upper arm (left, back)
18. upper arm (right, back)
19. lower arm (left, front)
20. lower arm (right, front)
21. lower arm (left, back)
22. lower arm (right, back)
23. head (right)
24. head (left)
So if you want to get all the coordinates of the full body, try INDS >= 1; if you want specific body parts, use the numbers that represent each body part.
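Assuming the part indexes have been loaded into a numpy array `I` (the synthetic array below stands in for a real index map), selecting parts by index could look like this sketch:

```python
import numpy as np

# Hypothetical part-index map; 0 = background,
# 1..24 = the patches listed above.
I = np.array([[0,  2,  2],
              [4,  4,  0],
              [0,  0, 23]])

# Pixel coordinates of one specific part, e.g. 4 = hand (left)
ys, xs = np.where(I == 4)
coords = [(int(y), int(x)) for y, x in zip(ys, xs)]
print(coords)  # → [(1, 0), (1, 1)]

# Pixels belonging to any part of the body
ys_all, xs_all = np.where(I >= 1)
print(len(ys_all))  # → 5
```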
Thanks for your replies! @vvasco @frankkim1108
I don't mean to add more specific body parts to the UV map;
I want to combine 4 UV maps generated from 4 viewpoints around the target body.
Is there a good way to fuse the different UV maps into one model and handle the overlapped parts?