paulpanwang / pope


Home Page: https://paulpanwang.github.io/POPE/

Topics: pose-estimation, segment-anything, dinov2, image-matching

pope's Introduction

POPE: 6-DoF Promptable Pose Estimation of Any Object, in Any Scene, with One Reference

Project Page | Paper

¹The University of Texas at Austin   ²ByteDance   (* denotes equal contribution)

Welcome to the project repository for POPE (Promptable Pose Estimation), a state-of-the-art technique for 6-DoF pose estimation of any object in any scene using a single reference.

Preparation

Installation

Docker setup

Please check docker/README.MD

Or you can follow the steps below:

The code is tested with Python 3.9, CUDA 11.3, and PyTorch 1.10.1. Additional dependencies include:

h5py
kornia
torch
torchvision
omegaconf
torchmetrics==0.10.3
fvcore
iopath
submitit
pathlib
transforms3d
numpy
plyfile
easydict
scikit-image
matplotlib
pyyaml
tabulate
tqdm
loguru
opencv-python
--extra-index-url https://pypi.nvidia.com

Install them with:

pip3 install -r ./requirements.txt
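
Optionally, sanity-check the environment against the tested configuration (a small helper of ours, not part of the repo):

import torch
print(torch.__version__)          # tested with 1.10.1
print(torch.version.cuda)         # tested with 11.3
print(torch.cuda.is_available())  # should be True on a CUDA machine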

Download model checkpoints

Download the Segment Anything Model checkpoint into weights/:

wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth -O weights/sam_vit_h_4b8939.pth
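
To verify the checkpoint loads, you can use the official segment-anything API (a quick check of ours; assumes the segment-anything package is installed):

from segment_anything import SamPredictor, sam_model_registry
sam = sam_model_registry["vit_h"](checkpoint="weights/sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)  # ready to compute masks for an image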

Download the DINOv2 model checkpoint into weights/:

wget https://dl.fbaipublicfiles.com/dinov2/dinov2_vits14/dinov2_vits14_pretrain.pth -O weights/dinov2_vits14.pth
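
Similarly, the DINOv2 weights can be checked by loading them into the matching torch.hub architecture (our sketch; the repo's own loader may differ):

import torch
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14", pretrained=False)
state_dict = torch.load("weights/dinov2_vits14.pth", map_location="cpu")
model.load_state_dict(state_dict)  # should load with all keys matched
model.eval()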

Prepare datasets (Updated dataset download links)

Download the datasets from the Hugging Face website: the OnePose/OnePose_LowTexture datasets from here, and the YCB-Video and LINEMOD datasets from here; extract them into ./data.

If you want to evaluate on the LINEMOD dataset, also download the real training data, test data, and 3D object models from CDPN, and the YOLOv5 detection results from here. Then extract them into ./data.

The directory should be organized in the following structure:

    |--📂data
    |       |--- 📂ycbv
    |       |--- 📂OnePose_LowTexture
    |       |--- 📂demos
    |       |--- 📂onepose
    |       |--- 📂LM_dataset
    |       |      |--- 📂bbox_2d
    |       |      |--- 📂color
    |       |      |--- 📂color_full
    |       |      |--- 📂intrin
    |       |      |--- 📂intrin_ba
    |       |      |--- 📂poses_ba
    |       |      |--- 📜box3d_corners.txt
    
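Before running the demos or evaluation, you can confirm the layout with a quick check (our helper, not part of the repo):

from pathlib import Path
expected = ["ycbv", "OnePose_LowTexture", "demos", "onepose", "LM_dataset"]
missing = [name for name in expected if not (Path("data") / name).is_dir()]
print("missing folders:", missing or "none")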

Demos

Thank you for your interest, and we apologize for the excessive use of hard-coded values in the code. We have since reorganized the code structure and README to make them more user-friendly.

The code was recently tidied up for release and may still contain minor bugs; please feel free to open an issue.

bash demo.sh

# Demo 1: visualize DINOv2 features
python3 visual_dinov2.py

# Demo 2: visualize Segment Anything Model masks
python3 visual_sam.py

# Demo 3: visualize 3D bounding boxes
python3 visual_3dbbox.py
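
For intuition about what Demo 1 computes, here is a standalone sketch (ours, independent of visual_dinov2.py; the image path is a placeholder) that projects DINOv2 patch features onto three PCA components and saves them as an RGB map:

import torch
import numpy as np
from PIL import Image
from torchvision import transforms
from sklearn.decomposition import PCA

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()
preprocess = transforms.Compose([
    transforms.Resize((518, 518)),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
img = Image.open("data/demos/example.png").convert("RGB")  # placeholder path
with torch.no_grad():
    tokens = model.forward_features(preprocess(img)[None])["x_norm_patchtokens"][0]
rgb = PCA(n_components=3).fit_transform(tokens.numpy())   # (1369, 3)
rgb = (rgb - rgb.min(0)) / (rgb.max(0) - rgb.min(0))      # scale each channel to [0, 1]
Image.fromarray((rgb.reshape(37, 37, 3) * 255).astype(np.uint8)).save("dino_pca.png")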

Evaluation

python3 eval_linemod_json.py
python3 eval_onepose_json.py
python3 eval_ycb_json.py

Zero-shot Promptable Pose Estimation

Some visual examples of promptable object pose estimation test cases: outdoor scenes, indoor scenes, and scenes with severe occlusions.

We also conduct a more challenging evaluation using an edge map as the reference, which further demonstrates the robustness of POPE (DINOv2 and Matcher).

Application to Novel View Synthesis

We demonstrate an application to novel view synthesis: leveraging the estimated object poses, our method generates photo-realistic renderings. We combine the multi-view poses estimated by POPE with a pre-trained and generalizable Neural Radiance Field (GNT and Render).

Comparisons on Videos and Images

We show visualizations on the LINEMOD, YCB-Video, OnePose, and OnePose++ datasets, compared against LoFTR and Gen6D.

Citation

If you find this repo helpful, please consider citing:

@article{fan2023pope,
  title={POPE: 6-DoF Promptable Pose Estimation of Any Object, in Any Scene, with One Reference},
  author={Fan, Zhiwen and Pan, Panwang and Wang, Peihao and Jiang, Yifan and Xu, Dejia and Jiang, Hanwen and Wang, Zhangyang},
  journal={arXiv preprint arXiv:2305.15727},
  year={2023}
}

pope's People

Contributors

athinkingneal, ir1d, paulpanwang


pope's Issues

Relative pose vs Actual pose

Hi,

I am new to pose estimation, so this question might be naive. I wanted to ask: your method POPE gives a relative pose, whereas methods like OnePose provide the absolute pose. Is that correct?
I would greatly appreciate your help!

No module named 'einops'

Hello, I followed your tutorial, but when I run python3 visual_sam.py, I get this error:

Traceback (most recent call last):
  File "/home/apicoo3569/POPE/visual_sam.py", line 1, in <module>
    from pope_model_api import *
  File "/home/apicoo3569/POPE/pope_model_api.py", line 53, in <module>
    from src.matcher import Matcher, default_cfg
  File "/home/apicoo3569/POPE/src/matcher/__init__.py", line 1, in <module>
    from .matcher import Matcher
  File "/home/apicoo3569/POPE/src/matcher/matcher.py", line 3, in <module>
    from einops.einops import rearrange
ModuleNotFoundError: No module named 'einops'

Please help me fix this. Thank you.
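
Note: einops does not appear in the requirements list above, so installing it separately (pip3 install einops, mirroring the pip3 usage above) should resolve this error.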

The generation of data pairs

Thank you for making the awesome project open-source!

I've noticed that the data pairs are pre-defined in JSON files (e.g., for LMO). How do you generate these pairs, at random or according to some principle?

Looking forward to your reply.

Estimate the 6DoF Object Pose

Hi,

Thanks for the nice work!

I noticed that only the relative 3D rotation accuracy is evaluated and reported in the paper, how about the relative 3D translation?
Is it possible to estimate the full 6DoF pose using POPE?

Bounding Box Visualization

I saw that you mentioned visualization in the paper, "It is important to note that the visualization of object boxes incorporates ground-truth translation to address scale ambiguity."

Q. Does it also need the 3D size of the object for visualization, as below? Or can the bounding box be estimated from the pose without the 3D size of the matched CAD model?

POPE/visual_3dbbox.py

Lines 31 to 41 in 92c5cdb

# Half-extents of the object along each axis (hard-coded for this demo object).
x, y, z = 3.793429999999999719e-02, 3.879959999999999659e-02, 4.588450000000000167e-02
_3d_bbox = np.array([
    [-x, -y, -z],
    [-x, -y,  z],
    [-x,  y,  z],
    [-x,  y, -z],
    [ x, -y, -z],
    [ x, -y,  z],
    [ x,  y,  z],
    [ x,  y, -z],
])
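
For context on how such corners relate to the estimated pose, here is a minimal sketch (ours, with assumed variable names) of the standard pinhole projection of the eight corners given intrinsics K and a pose (R, t):

import numpy as np

def project_corners(corners, K, R, t):
    # corners: (8, 3) box corners in the object frame; K: (3, 3); R: (3, 3); t: (3,)
    cam = corners @ R.T + t        # transform corners into the camera frame
    uv = cam @ K.T                 # apply the intrinsic matrix
    return uv[:, :2] / uv[:, 2:3]  # perspective divide -> (8, 2) pixel coordinates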

About demo

Hello, I would like to run your code. A simple demo, a sample script, or inference on a complete dataset would all be fine. Could you tell me the corresponding command or method?

Thanks in advance for your help.

Explanation on the demo

Thanks for the contribution and releasing the code for this project, the work done is really interesting.

Regarding the visual_3dbbox.py demo, could you explain what prompt.txt and target.txt are? I look forward to testing on other prompt and target images.

CrossAtten(default_cfg['coarse'], token_dim, ['cross']*2 ) TypeError: __init__() takes 2 positional arguments but 4 were given

Namespace(w_tr=10.0, w_rot=10.0, warmup=10000, batch=32, steps=120000, lr=0.003, clip=2.5, weight_decay=1e-05, num_workers=4, no_ddp=True, gpus=4, ckpt='', name='bla', exp=None, use_mini_dataset=False, dataset='objverse', no_pos_encoding=False, noess=False, cross_features=False, use_single_softmax=False, l1_pos_encoding=False)
xFormers not available
xFormers not available
Traceback (most recent call last):
  File "/data/users/liming/CV/POPE/train_dinov2_pose.py", line 243, in <module>
    train(args.gpus, args)
  File "/data/users/liming/CV/POPE/train_dinov2_pose.py", line 50, in train
    model = DINOv2Poser(default_cfg)
  File "/data/users/liming/CV/POPE/models/dinov2_regression_modelv3.py", line 105, in __init__
    self.cross_attentionAll = CrossAtten(default_cfg['coarse'], token_dim, ['cross']*2)
TypeError: __init__() takes 2 positional arguments but 4 were given

Runnable Dockerfile

Hi, I have built a Dockerfile that passes the current codebase's test command. Should I just submit a pull request?

P.S.

The codebase is runnable, but some problems remain, like the unavailable xFormers.

question about dinov2

Hello, I have a question about how to use DINOv2; could you please help me? I instantiated a vit_small ViT model and tried to load the pretrained weights using the load_pretrained_weights function from utils. Here's the code I wrote:

self.vit_model = vits.__dict__['vit_small']()
load_pretrained_weights(self.vit_model, 'https://dl.fbaipublicfiles.com/dinov2/dinov2_vits14/dinov2_vits14_pretrain.pth', None)

However, I encountered the following error:

Traceback (most recent call last):
  File "/data/PycharmProjects/train.py", line 124, in <module>
    model = model(aff_classes=args.num_classes)
  File "/data/PycharmProjects/models/locate.py", line 89, in __init__
    load_pretrained_weights(self.vit_model, pretrained_url, None)
  File "/data/PycharmProjects/models/dinov2/dinov2/utils/utils.py", line 32, in load_pretrained_weights
    msg = model.load_state_dict(state_dict, strict=False)
  File "/home/ustc/anaconda3/envs/locate/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1605, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DinoVisionTransformer:
    size mismatch for pos_embed: copying a param with shape torch.Size([1, 1370, 384]) from checkpoint, the shape in current model is torch.Size([1, 257, 384]).

Could you please help me understand what might be causing this issue? Thank you for your assistance.
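
A note for readers hitting the same error: dinov2_vits14 was pretrained at 518x518 with patch size 14, so its checkpoint stores more position tokens than a default 224x224 model expects, as this arithmetic (ours) shows; building the model at the pretraining resolution, or interpolating pos_embed, avoids the mismatch.

patch = 14
print((518 // patch) ** 2 + 1)  # 1370 tokens in the dinov2_vits14 checkpoint
print((224 // patch) ** 2 + 1)  # 257 tokens in a default 224x224 vit_small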

About using multiple images

In your work, it was mentioned that better results can be achieved by inputting multiple images. I would like to evaluate with multiple images as input; how should I proceed?

Thanks in advance for your help!

How to calculate K0

Hello, I am currently trying to use your algorithm. How did you calculate K0, the intrinsic matrix for the reference image? The reference image, in turn, is a crop of a scene image, right?
