athinkingneal / foundationpose

This project forked from nvlabs/foundationpose

[CVPR 2024] FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

Home Page: https://nvlabs.github.io/FoundationPose/

License: Other

Shell 0.22% C++ 2.33% Python 84.44% C 0.37% Cuda 12.39% CMake 0.24%

FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

[Paper] [Website]

Contributors: Bowen Wen, Wei Yang, Jan Kautz, Stan Birchfield

We present FoundationPose, a unified foundation model for 6D object pose estimation and tracking, supporting both model-based and model-free setups. Our approach can be instantly applied at test-time to a novel object without fine-tuning, as long as its CAD model is given, or a small number of reference images are captured. We bridge the gap between these two setups with a neural implicit representation that allows for effective novel view synthesis, keeping the downstream pose estimation modules invariant under the same unified framework. Strong generalizability is achieved via large-scale synthetic training, aided by a large language model (LLM), a novel transformer-based architecture, and a contrastive learning formulation. Extensive evaluation on multiple public datasets involving challenging scenarios and objects indicates that our unified approach outperforms existing methods specialized for each task by a large margin. In addition, it even achieves comparable results to instance-level methods despite the reduced assumptions.


🥇 No. 1 on the world-wide BOP leaderboard (as of 2024/03) for model-based novel object pose estimation.

Demos

Robotic Applications:

robot_mustard.mp4

AR Applications:

ar_maze_c.mp4

Results on YCB-Video dataset:

ycbv_tracking_c.mp4

Bibtex

@InProceedings{foundationposewen2024,
author        = {Bowen Wen and Wei Yang and Jan Kautz and Stan Birchfield},
title         = {{FoundationPose}: Unified 6D Pose Estimation and Tracking of Novel Objects},
booktitle     = {CVPR},
year          = {2024},
}

If you find the model-free setup useful, please also consider citing:

@InProceedings{bundlesdfwen2023,
author        = {Bowen Wen and Jonathan Tremblay and Valts Blukis and Stephen Tyree and Thomas M\"{u}ller and Alex Evans and Dieter Fox and Jan Kautz and Stan Birchfield},
title         = {{BundleSDF}: {N}eural 6-{DoF} Tracking and {3D} Reconstruction of Unknown Objects},
booktitle     = {CVPR},
year          = {2023},
}

Data prepare

  1. Download all network weights from here and put them under the folder weights/

  2. Download demo data and extract them under the folder demo_data/

  3. [Optional] Download our large-scale training data: "FoundationPose Dataset"

  4. [Optional] Download our preprocessed reference views here to run the model-free few-shot version.
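Before moving on, it can help to confirm that the two required folders are in place. The following is a minimal sketch, assuming the folder names weights/ and demo_data/ from the steps above and that the script is run from the repository root; the helper missing_data_dirs is hypothetical and not part of the repository.

```python
from pathlib import Path


def missing_data_dirs(repo_root, required=("weights", "demo_data")):
    """Return the required folders that are absent or empty under repo_root."""
    root = Path(repo_root)
    missing = []
    for name in required:
        folder = root / name
        # A folder counts as ready only if it exists and has at least one entry.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


# Assumption: the current working directory is the repository root.
for name in missing_data_dirs("."):
    print(f"missing or empty: {name}/")
```

The check only looks at whether each folder exists and is non-empty; it does not verify that the downloaded files are complete.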

Env setup option 1: docker (recommended)

cd docker/
docker pull wenbowen123/foundationpose && docker tag wenbowen123/foundationpose foundationpose  # Or to build from scratch: docker build --network host -t foundationpose .
bash docker/run_container.sh

If it's the first time you launch the container, you need to build extensions.

bash build_all.sh

Afterwards, you can re-enter the running container without rebuilding.

docker exec -it foundationpose bash

Env setup option 2: conda (experimental)

conda create -n foundationpose python=3.8
conda activate foundationpose
pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118
pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
pip install scipy joblib scikit-learn ruamel.yaml trimesh pyyaml opencv-python imageio open3d transformations warp-lang einops kornia pyrender pysdf
pip install git+https://github.com/facebookresearch/segment-anything.git
git clone https://github.com/NVlabs/nvdiffrast
cd nvdiffrast && pip install .
pip install scikit-image meshcat webdataset omegaconf pypng Panda3D simplejson bokeh roma seaborn pin opencv-contrib-python openpyxl torchnet Panda3D bokeh wandb colorama GPUtil imgaug Ninja xlsxwriter timm albumentations xatlas rtree nodejs jupyterlab objaverse g4f ultralytics==8.0.120 pycocotools py-spy pybullet videoio numba
pip install -U git+https://github.com/lilohuang/PyTurboJPEG.git
conda install -y -c anaconda h5py
cd foundationpose/ && bash build_all.sh
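Since the conda route is marked experimental, a quick check that the main packages can be found may save time before running the demos. This is a minimal sketch: the list of import names is an assumption covering only a subset of the packages installed above, and the helper missing_modules is hypothetical.

```python
import importlib.util

# Assumed import names for a subset of the packages installed above.
MODULES = ["torch", "torchvision", "pytorch3d", "nvdiffrast",
           "trimesh", "cv2", "open3d", "kornia"]


def missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


for name in missing_modules(MODULES):
    print(f"not found: {name}")
```

find_spec only locates a module without importing it, so this does not confirm that the compiled extensions or CUDA builds actually load.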

Run model-based demo

The default paths are set in argparse; to run on a different scene, pass the corresponding arguments. Running on the demo data should show the robot manipulating the mustard bottle. Pose estimation is performed on the first frame, after which the script automatically switches to tracking mode for the rest of the video. The resulting visualizations are saved to the debug_dir specified in argparse. (Note: the first run may be slower because of online compilation.)

python run_demo.py

Feel free to try other objects (no retraining needed), such as the driller, by changing the paths in argparse.
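The estimate-then-track behaviour described for the demo can be sketched as a simple loop. This is an illustration of the control flow only: run_sequence, estimate, and track are hypothetical names, not the repository's API, and the toy callables below stand in for the real networks.

```python
def run_sequence(frames, estimate, track):
    """Estimate the pose on the first frame, then track on every later frame."""
    poses = []
    pose = None
    for index, frame in enumerate(frames):
        if index == 0:
            # First frame: full pose estimation with no prior pose.
            pose = estimate(frame)
        else:
            # Later frames: update starting from the previous frame's pose.
            pose = track(frame, pose)
        poses.append(pose)
    return poses


# Toy stand-ins (assumptions): a "pose" is just a number here.
poses = run_sequence([1, 2, 3],
                     estimate=lambda frame: frame * 10,
                     track=lambda frame, previous: previous + frame)
print(poses)  # [10, 12, 15]
```

Each tracking step depends on the previous pose, which is why the first frame is the only one that needs the full estimation step in this sketch.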

Run on public datasets (LINEMOD, YCB-Video)

First, download the LINEMOD and YCB-Video datasets.

To run the model-based version on each of these datasets, set the paths to where you downloaded them. The results will be saved to the debug folder.

python run_linemod.py --linemod_dir /mnt/9a72c439-d0a7-45e8-8d20-d7a235d02763/DATASET/LINEMOD --use_reconstructed_mesh 0

python run_ycb_video.py --ycbv_dir /mnt/9a72c439-d0a7-45e8-8d20-d7a235d02763/DATASET/YCB-Video --use_reconstructed_mesh 0

To run the model-free few-shot version, first train the Neural Object Field. Set ref_view_dir to the location where you downloaded the reference views in the "Data prepare" section above, and set the dataset flag to the dataset you are interested in.

python bundlesdf/run_nerf.py --ref_view_dir /mnt/9a72c439-d0a7-45e8-8d20-d7a235d02763/DATASET/YCB_Video/bowen_addon/ref_views_16 --dataset ycbv

Then run a command similar to the model-based version, with a few small modifications. Here we use YCB-Video as an example:

python run_ycb_video.py --ycbv_dir /mnt/9a72c439-d0a7-45e8-8d20-d7a235d02763/DATASET/YCB-Video --use_reconstructed_mesh 1 --ref_view_dir /mnt/9a72c439-d0a7-45e8-8d20-d7a235d02763/DATASET/YCB_Video/bowen_addon/ref_views_16
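The model-based and model-free YCB-Video commands differ only in the --use_reconstructed_mesh value and the extra --ref_view_dir argument. A minimal sketch of a helper that builds either variant is shown here; the function ycbv_command and the /data/... paths are placeholders for illustration, while the flags are the ones used in the commands above.

```python
import shlex


def ycbv_command(ycbv_dir, ref_view_dir=None):
    """Build the run_ycb_video.py command line as a list of arguments.

    Passing ref_view_dir selects the model-free variant
    (--use_reconstructed_mesh 1); omitting it selects the model-based one.
    """
    cmd = ["python", "run_ycb_video.py",
           "--ycbv_dir", ycbv_dir,
           "--use_reconstructed_mesh", "1" if ref_view_dir else "0"]
    if ref_view_dir:
        cmd += ["--ref_view_dir", ref_view_dir]
    return cmd


# Placeholder paths (assumptions); replace with your own download locations.
print(shlex.join(ycbv_command("/data/YCB-Video")))
print(shlex.join(ycbv_command("/data/YCB-Video", "/data/ref_views_16")))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting issues when a path contains spaces.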

Training data download

Our training data include scenes using 3D assets from GSO and Objaverse, rendered with high-quality photo-realism and large domain randomization. Each data point includes RGB, depth, object pose, camera pose, instance segmentation, and a 2D bounding box. [Google Drive]
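To illustrate how two of these annotations relate, the sketch below derives a 2D bounding box from an instance segmentation mask. The storage format of the released data and its bounding-box convention are not described here, so the nested-list mask, the inclusive (x_min, y_min, x_max, y_max) convention, and the function name bbox_from_mask are all assumptions.

```python
def bbox_from_mask(mask, instance_id):
    """Return (x_min, y_min, x_max, y_max), inclusive, for one instance.

    mask is a list of rows of integer instance ids. Returns None when the
    instance does not appear in the mask.
    """
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value == instance_id:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


# Toy 3x4 mask (assumption): 0 is background, 2 is the object of interest.
mask = [
    [0, 0, 0, 0],
    [0, 2, 2, 0],
    [0, 2, 0, 0],
]
print(bbox_from_mask(mask, 2))  # (1, 1, 2, 2)
```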

Notes

Due to the legal restrictions of Stable Diffusion, which is trained on the LAION dataset, we are not able to release the diffusion-based texture-augmented data, nor the pretrained weights that use it. We therefore release the version trained without diffusion-augmented data. Slight performance degradation is expected.

Acknowledgement

We would like to thank Jeff Smith for helping with the code release; NVIDIA Isaac Sim and Omniverse team for the support on synthetic data generation; Tianshi Cao for the valuable discussions.

License

The code and data are released under the NVIDIA Source Code License. Copyright © 2024, NVIDIA Corporation. All rights reserved.

Contact

For questions, please contact Bowen Wen.
