fabbrimatteo / loco

This repository contains the source code related to the paper Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation

License: Other

Python 94.53% C++ 1.29% Cuda 3.73% C 0.45%

loco's Introduction

Learning on Compressed Output (LoCO)

License: CC BY-NC 4.0

Accepted to CVPR 2020

This repo contains the code related to the paper Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation, accepted to CVPR 2020, along with instructions for training and testing our models on the JTA dataset. Here you can also find the code for training the Volumetric Heatmap Autoencoder.

Some Results

Input | Prediction

Quick Demo

  • run python demo.py --ex=1 (python >= 3.6)
    • please wait a few seconds: it will display some precomputed results. You can change the ex number from 1 to 3 to see different results.

Compile Cuda Kernel

  • cd into the folder nms3d and run python setup.py install (python >= 3.6). Make sure to add your CUDA directory to your environment variables; a quick way to verify this is sketched below.
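
A minimal pre-build check (a sketch, not part of the repo; it only assumes a working PyTorch install) that the CUDA toolkit is actually visible, since torch's extension builder looks it up via CUDA_HOME:

import os
from torch.utils.cpp_extension import CUDA_HOME  # CUDA root detected by PyTorch

print("CUDA_HOME env var:      ", os.environ.get("CUDA_HOME"))
print("CUDA root seen by torch:", CUDA_HOME)

If both lines print a valid CUDA installation path, setup.py should be able to find nvcc there.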

Instructions

  • Download the JTA dataset in <your_jta_path>
  • Run python to_poses.py --out_dir_path='poses' --format='torch' (link) to generate the <your_jta_path>/poses directory
  • Run python to_imgs.py --out_dir_path='frames' --img_format='jpg' (link) to generate the <your_jta_path>/frames directory
  • Download our precomputed codes from here and unzip them into <your_jta_path>
  • Modify the conf/default.yaml configuration file, specifying the path to the JTA dataset directory (a quick sanity check of the resulting layout is sketched after this list)
    • JTA_PATH: <your_jta_path>
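
A hypothetical sanity check for the layout produced by the steps above (the path is a placeholder; the codes sub-directory name is an assumption based on where the precomputed codes are loaded from in the training code):

from pathlib import Path

jta_path = Path("<your_jta_path>")  # replace with the real JTA_PATH value
for sub in ("poses", "frames", "codes"):
    print(f"{sub:7s} exists:", (jta_path / sub).is_dir())

Training reads from all three directories, so they should presumably all exist before launching it.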

Train

  • run python main.py default (python >= 3.6)

Show Visual Results

  • run python show.py default (python >= 3.6)
    • Note that, before showing the results, you must have completed at least one training epoch; however, to obtain results comparable to those reported in the paper, it is advisable to train for at least 100 epochs.

Show Paper Results

  • Download the pretrained weights and extract them into the project folder
  • Modify the conf/pretrained.yaml configuration file specifying the path to the JTA dataset directory
    • JTA_PATH: <your_jta_path>
  • run python show.py pretrained to show qualitative results (python >= 3.6)
  • run python eval.py pretrained to obtain the results reported in the paper (python >= 3.6)

Citation

We believe in open research and we are happy if you find this data useful.
If you use it, please cite our work.

@inproceedings{fabbri2020compressed,
   title     = {Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation},
   author    = {Fabbri, Matteo and Lanzi, Fabio and Calderara, Simone and Alletto, Stefano and Cucchiara, Rita},
   booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
   year      = {2020}
 }

License

LoCO is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

loco's People

Contributors

anonymous-goat, fabbrimatteo, fabiolanzi


loco's Issues

colab demo?

Hello, thank you for your great work!
It would be much appreciated, and much easier to test this amazing repo, if you provided a Colab notebook demo; looking forward to it.
Thank you

Association Algorithm

Thanks for your awesome work!

The code in the coords_to_poses function looks like a greedy heuristic association algorithm, but I note that the supplementary material of the paper says that the skeleton grouping uses the Hungarian algorithm with a cost matrix. Is the Hungarian algorithm simply not included in the current code, or is it implemented by the coords_to_poses function?
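
(For anyone else reading this while it gets answered: a generic sketch of Hungarian matching on a cost matrix, the kind of association the supplementary material describes; this is not the authors' code.)

import numpy as np
from scipy.optimize import linear_sum_assignment

# cost[i, j] could be, e.g., the 3D distance between candidate joint i and skeleton j
cost = np.array([[0.2, 1.5, 0.9],
                 [1.1, 0.3, 1.7]])
rows, cols = linear_sum_assignment(cost)  # minimum-cost one-to-one matching
print(rows, cols)                         # [0 1] [0 1]: joint 0 -> skeleton 0, joint 1 -> skeleton 1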

nms3d import error

Hi. I followed the instructions for building the nms3d CUDA kernel.
I ran "python setup.py install" but I keep getting an error. I did add CUDA to my environment variables.

" import nms3d_cuda
ImportError: libcudart.so.9.1: cannot open shared object file: No such file or directory "

Does nms3d only support CUDA 9.1?
My environment is
Ubuntu 18.04 with CUDA 10.2, Torch 1.6, Torchvision 0.5, Python 3.6.

Which torch and CUDA versions did you use exactly, and how can I solve this error?
I really would like to run the program.

Thank you.
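
(A small diagnostic sketch, not from the repo, that may help: it checks which CUDA version the installed PyTorch wheel was built against, since the compiled nms3d extension has to be consistent with it.)

import torch

print("torch", torch.__version__, "was built with CUDA", torch.version.cuda)
# If this prints 9.1 while the system toolkit is 10.2, that mismatch would explain
# the missing libcudart.so.9.1; installing a PyTorch wheel built for your CUDA
# version and rebuilding nms3d is the usual remedy (an assumption based on the
# error message, not a tested fix).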

the cpp file

Hello, I want to ask how I can import 'nms3d_cuda' successfully?

Getting error if data is for one person

I tested the demo and it works well with the given examples. As we can see, there are multiple people in the given testing data. But when I try to test it with a single person, I get TypeError: 'int' object is not subscriptable.

Here is how I tried it with a single person.

In the show_poses call at line 100 of demo.py I passed show_poses(data[1][0]) instead of show_poses(data[1]); data[1][0] means I'm trying to get the keypoints of only the first person in the given data.

LoCO/demo.py

Line 100 in c8296d7

show_poses(data[1])

So I got the above-mentioned error at line 70 of demo.py

LoCO/demo.py

Line 70 in c8296d7

jas = [j for j in coords if j[0] == type_a] # all joints of type 'type_a'

Could you help me sort out how I can run this demo for only one person? I'm planning to use this method for a single person in the future.

Thanks
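
(Not an official answer, but the error is consistent with the sketch below; the joint-tuple layout here is a hypothetical example, not taken from the repo.)

# A list of joint tuples works, because each j is itself a tuple:
coords = [(0, 1.0, 0.5, 3.2), (1, 1.1, 0.4, 3.3)]
jas = [j for j in coords if j[0] == 0]  # ok: j is a tuple, j[0] is the joint type

# A single element such as data[1][0] is one tuple, so iterating over it yields
# plain numbers, and j[0] then raises "TypeError: 'int' object is not subscriptable":
single = coords[0]
# [j for j in single if j[0] == 0]      # -> TypeError

So show_poses presumably still needs a list of joints, even when they all belong to one person.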

Compile Cuda Kernel

How do I do this task?
I couldn't find the answer you gave me last time... Sorry, it seems like a waste of your time to tell me again.

How to train VHA?

Hi! Thank you for your great work!
It seems that only the training code for the code predictor is provided. Could you please release the training code for the VHA? Many thanks!

Time to process one frame of image

Hello, thank you first of all for giving me the key to get the dataset.
I have reproduced your work on my PC. What I want to ask is: how long does it take to process one frame on your PC?

the cpu memory when running code

When I run the code, an error happens during the 'filter_joints' step in post_processing.py:
"numpy.core._exceptions.MemoryError: Unable to allocate 101. GiB for an array with shape (116194, 116194) and data type float64"
I don't think the "distance_matrix" should have the shape (116194, 116194)..... I couldn't solve it; can I simply skip the "filter_joints" part?
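
(For what it's worth, the reported allocation matches a full pairwise float64 matrix over ~116k joint candidates, so the shape is plausible if all joints end up in a single array; the snippet below just checks the arithmetic.)

n = 116194
print(n * n * 8 / 2 ** 30)  # ~100.6 GiB for an (n, n) float64 distance matrix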

How do you create the Volumetric Heatmaps?

I see the code for training the code-predictor part using the ground-truth codes, like this:
code_path = self.cnf.jta_path / 'codes' / f'{sequence}_{frame_n}.data'
code = torch.load(code_path, map_location=torch.device('cpu'))
So how do you create the "code"? (I have obtained your dataset.)
By running the code in your VHA_master work?
Thank you very much.

Facing "ModuleNotFoundError: No module named "vtkCommonCorePython" error while running demo.py

Hello,

I am trying to run this code but I am facing the error "ModuleNotFoundError: No module named 'vtkCommonCorePython'" while running the demo.py file. The file "vtkCommonCorePython" is already present in the "vtkmodules" folder. I also added the "vtkmodules" path to the environment variables as suggested by this Stack Overflow link: https://stackoverflow.com/questions/13495285/importerror-no-module-named-vtkcommonpython. Still, the issue remains unresolved. Could you please tell me which version of "vtk" you used in the code?

Thanks,
Shubham Gupta

How did you do the eval ?

Hello,

Thanks for your amazing work !

I am trying to obtain the same results as you reported in the paper, so I used the test function in trainer.py, but my recall is always lower. How did you do the eval? Did you use the same parameters in the eval as in pretrained.yaml?
Can you add your eval code to GitHub?

I want to get the dataset, please!

Dear author, I want to get the dataset by sending you an email, but the email fails to send. May I ask what I should do? Could you give me your email address? I appreciate it.

get the dataset and code of the GT?

Hi, I want to train the model but I do not have the dataset, so could you tell me how I can get it?
Also, I don't know what this formula means:
k = (-1) * np.sqrt(
fx ** 2 * fy ** 2 + fx ** 2 * cy ** 2 - 2 * fx ** 2 * cy * y2d + fx ** 2 * y2d ** 2 +
fy ** 2 * cx ** 2 - 2 * fy ** 2 * cx * x2d + fy ** 2 * x2d ** 2
)

x3d = ((fy * cam_dist * cx) - (fy * cam_dist * x2d)) / k
y3d = ((fx * cy * cam_dist) - (fx * cam_dist * y2d)) / k
z3d = -(fx * fy * cam_dist) / k

return x3d, y3d, z3d

I know it computes the 3D coordinates in the camera coordinate system, but I can't understand it. Maybe you can give me some suggestions.
Thank you very much !!!!!

some problems while training

Thanks for your amazing work!
I met a problem when I ran training following your instructions, using the codes you provided.
It looks like this:
FileNotFoundError: [Errno 2] No such file or directory: Path('LoCO/codes/poses/val') (LoCO is my project path)
Maybe it is a data problem or something else; looking forward to hearing from you.

how does the to3d func work?

Hi, thanks for sharing the great work.
I have a question about the heatmap. I noticed that you encode the distance of the joints to the camera, cam_dist = np.sqrt(joint.x3d ** 2 + joint.y3d ** 2 + joint.z3d ** 2); why not simply encode the z-depth value of the joints, joint.z3d? Any particular reason?

Besides, could you elaborate a little bit on

LoCO/utils.py

Line 97 in 7ebaca7

def to3d(x2d, y2d, cam_dist, fx, fy, cx, cy):
Basically, what does this k value mean?

LoCO/utils.py

Line 113 in 8371ef0

k = (-1) * np.sqrt(
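
(Not an authoritative answer, but the formula quoted in this and the previous issue simplifies nicely. Writing the unnormalized viewing-ray direction of pixel (x2d, y2d) as d:)

\[
\mathbf{d} = \left(\frac{x_{2d}-c_x}{f_x},\ \frac{y_{2d}-c_y}{f_y},\ 1\right), \qquad
k = -\,f_x f_y\,\lVert\mathbf{d}\rVert, \qquad
(x_{3d},\, y_{3d},\, z_{3d}) = \mathrm{cam\_dist}\cdot\frac{\mathbf{d}}{\lVert\mathbf{d}\rVert}
\]

So k is just a negative scale factor proportional to the ray length, and to3d places the point at Euclidean distance cam_dist from the camera center along the back-projected pixel ray, which is why it takes cam_dist rather than a z-depth as input.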

download the dataset

Hello, I'm glad you're reading this email. I'm Zhang Xin, a first-year graduate student at Zhejiang University of Technology in Zhejiang Province, China. Recently I read the paper "Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation" and I want to train the model myself, but I don't have the dataset.

With this email we declare that we will use the JTA Dataset for research and educational purposes only, since we are aware that commercial use is prohibited. We also undertake to purchase a copy of Grand Theft Auto V.

(But I don't think I have the extra money to buy this game, because I'm poor...... Maybe my English is not very good, thank you ~)
My email is [email protected] or [email protected]

Failed to align images with egomotion

After constructing the point clouds and applying the egomotion to them, the point clouds are not aligned.

To reproduce:

Install open3d==0.9.0; throughout the code, open3d is imported as o3d. The snippets below also rely on numpy, PIL and math:

import numpy as np
import open3d as o3d
from math import cos, sin
from PIL import Image

INTRINSICS = np.array([[1158, 0, 960], [0, 1158, 540], [0, 0, 1]], dtype=np.float64)

def to3d(x2d, y2d, cam_dist, fx, fy, cx, cy):
    k = (-1) * np.sqrt(
        fx ** 2 * fy ** 2 + fx ** 2 * cy ** 2 - 2 * fx ** 2 * cy * y2d + fx ** 2 * y2d ** 2 +
        fy ** 2 * cx ** 2 - 2 * fy ** 2 * cx * x2d + fy ** 2 * x2d ** 2
    )
    x3d = ((fy * cam_dist * cx) - (fy * cam_dist * x2d)) / k
    y3d = ((fx * cy * cam_dist) - (fx * cam_dist * y2d)) / k
    z3d = -(fx * fy * cam_dist) / k
    return x3d, y3d, z3d


def create_cloud(image, depth, intrinsics):
    """ Creates a point cloud from the whole image and depth map
    Args:
        image (ndarray): color image
        depth (ndarray): depth map
        intrinsics (ndarray): intrinsic parameters
    Returns:
        pt_cloud (PointCloud): resulting point cloud
    """
    fx, fy = intrinsics[0][0], intrinsics[1][1]
    cx, cy = intrinsics[0][2], intrinsics[1][2]
    image = (image - image.min()) / (image.max() - image.min())
    u, v = np.where(depth != 0)
    z = depth[depth != 0]
    x3d, y3d, z3d = to3d(u, v, z, fx, fy, cx, cy)
    pts = np.vstack((y3d, x3d, z3d)).T
    pt_cloud = o3d.geometry.PointCloud()
    pt_cloud.points = o3d.utility.Vector3dVector(pts)
    pt_cloud.colors = o3d.utility.Vector3dVector(image[u, v, ...])
    return pt_cloud

Auxiliary function for visualization of the point clouds

def plot_ptcloud(point_clouds, show_frame=True):
    """Visualizes one or more point clouds
    Args:
        point_clouds (PointCloud or list): point cloud(s) to visualize
        show_frame (bool): whether to also draw the origin coordinate frame
    """
    if not isinstance(point_clouds, list):
        point_clouds = [point_clouds]
    if show_frame:
        mesh_frame = o3d.geometry.TriangleMesh.create_coordinate_frame(
            size=1, origin=[0, 0, 0]
        )
        point_clouds = point_clouds + [mesh_frame]
    o3d.visualization.draw_geometries(point_clouds)

Auxiliary functions for reading the egomotion file


def rt2transformation(rotation, translation):
    """Converts rotation and translation to transformation in homogenenous coordinates
    Args:
        rotation (ndarray): 3x3 matrix
        translation (ndarray): (3,) translation vector
    Returns:
        trans (ndarray): (4, 4) transformation matrix
    """
    trans = np.concatenate((rotation, translation[:, None]), axis=1)
    trans = np.concatenate((trans, np.array([0, 0, 0, 1])[None, :]), axis=0)
    return trans


def angles2rot(alpha, beta, gamma):
    """Converts radians to ration matrix
    Args:
        alpha, beta, gamma (float): rotations around x, y, z axis
    Returns:
        (ndarray): 3x3 rotation matrix
    """
    rx = np.array(
        [[1, 0, 0], [0, cos(alpha), -sin(alpha)], [0, sin(alpha), cos(alpha)]]
    )
    ry = np.array([[cos(beta), 0, sin(beta)], [0, 1, 0], [-sin(beta), 0, cos(beta)]])
    rz = np.array(
        [[cos(gamma), -sin(gamma), 0], [sin(gamma), cos(gamma), 0], [0, 0, 1]]
    )
    return rz.dot(ry.dot(rx))


def read_motsynth_egomotion_file(path):
    """Reads segmentation mot file with absolute positioning
    Args:
        path (str): path to the ground truth egomotion file
    Returns:
        egomotion (ndarray): array in [n_frames, 4, 4] format
    """
    ego_file = open(path, "r")
    ego_lines = ego_file.readlines()
    transformations = np.zeros((len(ego_lines), 4, 4), dtype=np.float64)
    for i, line in enumerate(ego_lines):
        line = list(map(float, line.split(" ")))
        angles = np.array(line[:3], dtype=np.float64)
        rotation = angles2rot(angles[0], -angles[2], angles[1])  # fix dataset bug
        translation = np.array([line[3], -line[5], line[4]], dtype=np.float64)
        # translation = np.array(line[3:6], dtype=np.float64)
        transformations[i, ...] = rt2transformation(rotation, translation)
    ego_file.close()
    return transformations

The file containing the egomotion for sequence 045, and the function that was used to create it:

def create_egomotion_file(ann, output_path):
    """ Creates file with camera egomotion """
    egomotion_output_file = open(str(output_path / "gt" / "egomotion.txt"), "w")
    for img_ann in ann["images"]:
        rotation = img_ann["cam_world_rot"]
        translation = img_ann["cam_world_pos"]
        fov = img_ann["cam_fov"]
        print(
            "{} {} {} {} {} {} {}".format(*rotation, *translation, fov),
            file=egomotion_output_file,
        )
    egomotion_output_file.close()

egomotion.txt

Function for reading the MOTSynth depth image (0-255 range):

def load_motsynth_depth_image(img_path, shape=None):
    """Load depth image from .png file
    Args:
        img_path (str): path to the image
    Returns:
        ndarray: depth map
    """
    depth_img = Image.open(img_path).convert("L")
    if shape is not None:
        depth_img = depth_img.resize(shape, Image.NEAREST)
    depth_img = np.array(depth_img)
    depth_img = 255 - depth_img
    depth_img = depth_img / 12  # 1 meter is approximately 12 pixels
    return np.asarray(depth_img, dtype=np.float32)
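
(For context, a hypothetical usage sketch tying the snippets above together; the file names and frame index are placeholders, and applying the per-frame matrix with PointCloud.transform is my assumption about the intended alignment step.)

import numpy as np
from PIL import Image

egomotion = read_motsynth_egomotion_file("gt/egomotion.txt")  # (n_frames, 4, 4)
depth = load_motsynth_depth_image("depth/0001.png")
rgb = np.asarray(Image.open("rgb/0001.jpg"), dtype=np.float64)

cloud = create_cloud(rgb, depth, INTRINSICS)
cloud.transform(egomotion[0])  # move the frame's cloud into world coordinates
plot_ptcloud([cloud])          # repeating this for a second frame shows the misalignment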

Training samples

How many samples of the CMU Panoptic dataset are used for training?

The paper mentions that the experiments about CMU Panoptic follow the test protocol defined in Monocular 3d pose and shape estimation of multiple people in natural scenes.

I read that paper, but it only describes the setup for testing. Does it mean that all samples of CMU Panoptic (downloaded with panoptic-toolbox), except those held out from Haggling, Mafia, Ultimatum and Pizza, are used as training samples?

something about the model

I know that always asking you tangential questions may waste your time, but I really hope to be able to reproduce your results on my own computer.
I have followed the steps to retrain the code-predictor part, but the f1-score is not particularly high during testing. I tried your pretrained model as well as my own retrained VHA model, and the results were not as good as I expected. Due to limited storage capacity, I only extracted some of the video frames and built the "bottleneck code" for code-predictor training. Will this make the trained model inaccurate? At the moment I can't think of anything else to try besides tuning parameters...
Thank you very very very very much !!!

Model inference on my own image?

Hello. I ran the demo code and figured out that it only visualizes
already estimated 2D and 3D poses.

Is there a way to feed my own image to the model
and then visualize the 2D image results and the 3D poses with mlab (as in demo.py)?
How can I obtain 3D pose data just like "1_res.data" in the demo dir?

Thanks in advance.
