Comments (16)

vvasco commented on July 19, 2024

Installation

DensePose relies on caffe2 and detectron framework.
I installed the latest version of DensePose on my machine:

System: Ubuntu 18.04
Graphics card: NVIDIA GeForce GTX 1050 Ti
Graphics card memory: 4096 MB
CUDA: 10.0
cuDNN: 7.3.1
Caffe2: Built from source (this version)

Following the instructions, I couldn't install DensePose successfully, but this very useful blog provided solutions to all the problems I encountered (specifically 2.1, 2.2, 2.7, 2.9).
After solving them, I managed to install DensePose successfully.

Usage

DensePose maps all pixels of an RGB image belonging to humans to the 3D surface of the human body.
It relies on DensePose-RCNN to obtain dense part indexes and coordinates within each of the selected parts (IUV representation).

Note: At the current stage, all the provided tools are in python.

Using the IUV representation

I followed this tutorial to run inference on a dataset acquired with a RealSense camera. The example uses a ResNet-101-FPN, which I couldn't run on my machine due to memory limits (apparently 4 GB is not enough for this network). However, I could run the ResNet-50-FPN with infer_simple.py:

python2 tools/infer_simple.py \
    --cfg configs/DensePose_ResNet50_FPN_s1x-e2e.yaml \
    --output-dir '/home/vvasco/dev/datasets/r1/latency-dataset/densepose-infer/' \
    --image-ext ppm \
    --wts https://dl.fbaipublicfiles.com/densepose/DensePose_ResNet50_FPN_s1x-e2e.pkl \
    '/home/vvasco/dev/datasets/r1/latency-dataset/img/'

For each image of the input dataset, this tool outputs:

  1. a *_.pdf: the image with bounding boxes drawn around people;
  2. a *_IUV.png: an image containing the part indexes I (24 surface patches) and their U and V coordinates;
  3. a *_INDS.png: an image containing the segmented people (instance indexes);
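For quick post-processing, the IUV channels can be pulled apart in Python. A minimal sketch, assuming the part index sits in the first channel and U, V in the next two as 0-255 values, as in the DensePose notebooks:

```python
# Minimal sketch: split a *_IUV.png into its three channels.
# Assumes the part index is in channel 0 and U, V in channels 1 and 2,
# stored as 0-255 values, as in the DensePose notebooks.
import numpy as np

def split_iuv(iuv):
    """iuv: HxWx3 uint8 array loaded from a *_IUV.png."""
    I = iuv[:, :, 0]            # part index: 0 = background, 1..24 = surface patch
    U = iuv[:, :, 1] / 255.0    # U coordinate within the patch, in [0, 1]
    V = iuv[:, :, 2] / 255.0    # V coordinate within the patch, in [0, 1]
    return I, U, V

# Tiny synthetic example: one pixel on patch 2
iuv = np.zeros((2, 2, 3), dtype=np.uint8)
iuv[0, 0] = (2, 128, 64)
I, U, V = split_iuv(iuv)
print(I[0, 0])   # -> 2
```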

This is a comparison between yarpOpenPose and DensePose:

[Side-by-side snapshots: yarpOpenPose (left) vs DensePose (right)]

Qualitatively, DensePose seems to work well in terms of separating the human parts correctly, also when moving. However, occluded parts disappear completely (for example the arm and the hand when behind the chair).

Note: I had to change detectron/utils/vis.py according to this in order to get non-empty INDS images.

Mapping IUV values to 3D SMPL models

In addition to the IUV representation, it's also possible to map the predicted points onto the SMPL 3D human model.
This notebook shows how to map IUV values to the 3D SMPL model, but it relies on a file demo_dp_single_ann.pkl which comes with no reference on how to construct it.
There is an open PR which does not rely on any such file and also speeds up the conversion from IUV to XYZ points on the SMPL surface. I'm not sure why it hasn't been merged. I used this fork for the mapping; this is the result on a single image, with the mapped 3D points in red and the model in black:

[Left: segmented image. Right: 3D points mapped on the SMPL model.]

The face is not fully mapped, as it is not fully visible in the image, but the points look correctly mapped onto the different patches of the model. We can also distinguish the frontal part of a person from the posterior part (with yarpOpenPose this is not possible, unless we associate a face with the skeleton). It looks promising!
However, several points might be critical:

  • the notebook seems to work only when there is a single person: there is a pick index, but it's unclear to me how to select it;
  • mapping a single person takes ~8 s, even with the fast implementation;
  • the template model is in a fixed position (static).
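Regarding the pick index, one heuristic (a sketch, assuming the INDS image has been loaded as a NumPy array; all names are assumptions) is to select the non-zero instance value covering the most pixels:

```python
# Sketch: choose the pick index automatically by taking the non-zero
# INDS value that covers the most pixels (all names are assumptions).
import numpy as np

def pick_largest_person(INDS):
    values, counts = np.unique(INDS[INDS > 0], return_counts=True)
    if values.size == 0:
        return None                        # no person detected
    return int(values[np.argmax(counts)])

# Synthetic INDS map with two "people", labelled 3 and 7
INDS = np.array([[0, 3, 3],
                 [7, 3, 0]])
print(pick_largest_person(INDS))   # -> 3
```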

from assistive-rehab.

vvasco commented on July 19, 2024

I think I didn't get this point. From the snapshots, it looks like this holds for both OpenPose and DensePose.

Let me expand on this point a bit. What I mean is that when there is an occlusion in DensePose, whole body parts get lost (even if they are not entirely occluded). In yarpOpenPose, when keypoints are missed due to occlusions in 2D, we can still reconstruct them in 3D by applying limb optimization. This might be more difficult when dealing with body parts and would require further investigation.

Here below I've tried to summarize the essential traits you identified; please, correct me if I'm wrong:
DensePose is in Python only, as of now.
In real-time contexts, it provides us with a richer set of body features in 2D compared with OpenPose.
If we want to extract 3D info instead, DensePose seems to require ~8 s per image.

Exactly! Let me also stress that the examples I found only deal with a single person, and it might not be straightforward to handle multiple people.


pattacini commented on July 19, 2024

That's great! Thanks @vvasco for this very exhaustive report.
I think we now have a very clear picture of how DensePose stands with respect to our methodology.


wine3603 commented on July 19, 2024

I found the problem: this notebook only works for single-person images. If I input an image with more than one person, the number of points on the picked person is 0.


pattacini commented on July 19, 2024

Awesome analysis @vvasco 🥇

Here below I've tried to summarize the essential traits you identified; please, correct me if I'm wrong:

  • DensePose is in Python only, as of now.
  • In real-time contexts, it provides us with a richer set of body features in 2D compared with OpenPose.
  • If we want to extract 3D info instead, DensePose seems to require ~8 s per image.


pattacini commented on July 19, 2024

However, occluded parts disappear completely (for example the arm and the hand when behind the chair).

I think I didn't get this point. From the snapshots, it looks like this holds for both OpenPose and DensePose.


wine3603 commented on July 19, 2024

Hey, I followed your steps, but in my case the final number of points on the SMPL is always 0 (picked_person = INDS==1). The output:
'num pts on picked person: (0,)
(0, 3)'
The earlier IUV visualizations are all good, so I don't know where the problem is; any help would be appreciated!


vvasco commented on July 19, 2024

Hi @wine3603, thanks for your interest in this issue!
The problem is exactly the one you spotted: when you have an image with multiple people, you can still create the IUV representation, but you cannot map the points onto the 3D SMPL model.

The notebook actually includes a pick index which, intuitively, should be used to select a person from an image with multiple people. Instead, in this case no points are found, whatever index you select. It only works with single-person images.


wine3603 commented on July 19, 2024

Hey @vvasco, thanks for your reply. I am trying to find out how the INDS.png is generated.
I am wondering: if INDS==0 marks the background mask, does INDS==1 indicate all the human masks or just the first human?


vvasco commented on July 19, 2024

Hi @wine3603, I don't think there is a specific order in the INDS values.
INDS is 0 on the background, while elsewhere it can take different non-zero values (not necessarily 1), depending on the number of people.

For example, if you open this INDS image (with Matlab you can use imread), you will see that INDS takes several values, different from 1.
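Without Matlab, the same inspection can be done in Python. A minimal sketch (Pillow is one option for loading the PNG; the file name below is a placeholder):

```python
# Sketch: list the instance values stored in an *_INDS.png without Matlab.
# Pillow is used here for loading; the file name below is a placeholder.
import numpy as np
from PIL import Image

def inds_values(path):
    INDS = np.asarray(Image.open(path))
    return np.unique(INDS)   # 0 = background, other values = detected people

# e.g. inds_values('demo_im_INDS.png') could return array([0, 3, 9, 12])
```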


wine3603 commented on July 19, 2024

Hi @vvasco, thanks a lot, now I see where my misunderstanding was.
I followed this notebook, and I found that in cell In[4],

pick_idx = 1 # PICK PERSON INDEX!
C = np.where(INDS == pick_idx)

I thought the people masks were labeled with different ID numbers, which is why this png is named "index".
Now I understand it is not an integer person ID here: this "1" and the background "0" are used for boolean indexing.
In the case of multi-person images, we have to find a way to generate an INDS.png with human IDs; do you have any ideas?


vvasco commented on July 19, 2024

The IUV representation provides the detected part indexes and their pixel coordinates.
So you might first transform this into a temporary representation where you set all detected-part pixels to 1. You could then extract the positions of the bounding boxes from the pdf image and use them to cluster the detected humans: all 1s belonging to the same bounding box form a cluster. The cluster ID would finally be the human ID.
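A rough sketch of this clustering idea, assuming the bounding boxes have already been extracted as (x0, y0, x1, y1) tuples (all names below are illustrative):

```python
# Rough sketch of the clustering idea above: assign each detected-body pixel
# to the bounding box containing it; the box index becomes the person ID.
# Boxes are assumed to be (x0, y0, x1, y1) tuples extracted beforehand.
import numpy as np

def label_people(I, boxes):
    """I: HxW part-index map (0 = background); returns an HxW person-ID map."""
    ids = np.zeros_like(I, dtype=np.int32)
    ys, xs = np.nonzero(I)               # pixels belonging to some body part
    for person_id, (x0, y0, x1, y1) in enumerate(boxes, start=1):
        inside = (xs >= x0) & (xs < x1) & (ys >= y0) & (ys < y1)
        ids[ys[inside], xs[inside]] = person_id
    return ids

# Two people, one bounding box each
I = np.zeros((4, 6), dtype=np.uint8)
I[1, 1] = 2      # a part pixel inside the left box
I[2, 4] = 15     # a part pixel inside the right box
ids = label_people(I, [(0, 0, 3, 4), (3, 0, 6, 4)])
print(ids[1, 1], ids[2, 4])   # -> 1 2
```

Pixels in overlapping boxes are simply claimed by the last box that contains them; a real implementation would need a tie-breaking rule.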


wine3603 commented on July 19, 2024

Hi @vvasco, I am trying to map multi-view images of one person to the SMPL model.
May I ask which visualizer you used for the "3D points mapped on SMPL" image?
I want to try mapping 4 images from 4 corner cameras.


vvasco commented on July 19, 2024

Hi @wine3603, I used this notebook to generate the image.
I added this section to the notebook to make the plot interactive, using the plotly library:

import numpy as np
import plotly.graph_objs as go
from plotly.offline import iplot   # needed for iplot below

# SMPL template vertices, drawn in black
trace1 = go.Scatter3d(
    x=Z, y=X, z=Y,
    mode='markers',
    marker=dict(
        color='black',
        size=0.5
    ),
)
# Points picked on the person, colored by point index
trace2 = go.Scatter3d(
    x=collected_z, y=collected_x, z=collected_y,
    mode='markers',
    marker=dict(
        color=np.arange(IUV_pick.shape[0]),
        size=1
    ),
)
data = [trace1, trace2]
layout = go.Layout(
    title='Points on the SMPL model',
    showlegend=False,
    scene=dict(
        xaxis=dict(range=[-1.0, 1.0], title='z'),
        yaxis=dict(range=[-1.0, 1.0], title='x'),
        zaxis=dict(range=[-1.4, 1.0], title='y'),
    )
)

fig = dict(data=data, layout=layout)
iplot(fig)

where X, Y, Z identify the model vertices and collected_x, collected_y, collected_z are the points picked on the person.


frankkim1108 commented on July 19, 2024

@wine3603 IUV is Index, U coordinates, V coordinates.
INDS takes a total of 24 values; each value represents a different part of the body.
INDS==1 gives all the coordinates for the back, so if the person in your picture is facing forward, there won't be any coordinates corresponding to the back.

I checked all the parts of the body and worked out which number represents which:

  1. body (back)
  2. body (front)
  3. hand (right)
  4. hand (left)
  5. foot (left)
  6. foot (right)
  7. thigh (right, back)
  8. thigh (left, back)
  9. thigh (right, front)
  10. thigh (left, front)
  11. calf (right, back)
  12. calf (left, back)
  13. calf (right, front)
  14. calf (left, front)
  15. upper arm (left, front)
  16. upper arm (right, front)
  17. upper arm (left, back)
  18. upper arm (right, back)
  19. lower arm (left, front)
  20. lower arm (right, front)
  21. lower arm (left, back)
  22. lower arm (right, back)
  23. head (right)
  24. head (left)

So if you want to get all the coordinates of the full body, try INDS >= 1.

Or, if you want specific body parts, use the numbers that represent each of them.
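For example, a minimal sketch of selecting body parts by their index (variable names are assumptions):

```python
# Sketch: extract pixels of specific body parts from the part-index channel
# of a *_IUV.png, using the numbering above (variable names are assumptions).
import numpy as np

HEAD_PARTS = (23, 24)    # head (right), head (left)

def part_mask(I, parts):
    """Boolean mask of pixels whose part index is in `parts`."""
    return np.isin(I, parts)

I = np.array([[0, 23, 24],
              [2, 0, 23]])
print(part_mask(I, HEAD_PARTS).sum())     # -> 3 head pixels
print(part_mask(I, range(1, 25)).sum())   # -> 4 (full body, same as I >= 1)
```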


wine3603 commented on July 19, 2024

Thanks for your replies! @vvasco @frankkim1108
I don't mean to add more specific body parts to the UV map;
I want to combine 4 UV maps generated from 4 viewpoints around the target body.
Is there a good way to fuse the different UV maps into one model and handle the overlapping parts?

