yashbhalgat / contrastive-lift

[NeurIPS 2023 Spotlight] Code for "Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion"

Home Page: https://www.robots.ox.ac.uk/~vgg/research/contrastive-lift/

Python 100.00%
3d-reconstruction computer-vision instance-segmentation multi-view-learning multi-view-stereo nerf pytorch rendering neurips neurips-2023

contrastive-lift's Introduction

Contrastive-Lift (NeurIPS 2023 Spotlight)

Yash Bhalgat, Iro Laina, João F. Henriques, Andrew Zisserman, Andrea Vedaldi

TL;DR: Our paper presents a novel "slow-fast" contrastive fusion method to lift 2D predictions to 3D for scalable instance segmentation, achieving significant improvements without requiring an upper bound on the number of objects in the scene.

Data and Pretrained checkpoints

You can download the Messy Rooms dataset from here. For all other datasets, refer to the instructions provided in Panoptic-Lifting.

NOTE: In this codebase, the term "MOS" stands for "Many Object Scenes", which was the original name of the "Messy Rooms" dataset as referenced in the paper.

You can download the pretrained checkpoints from here.

Inference and Evaluation

Download the pretrained checkpoints and place them in the pretrained_checkpoints folder. Then, run the following commands to evaluate the pretrained models:

python3 inference/render_panopli.py --ckpt_path pretrained_checkpoints/<SCENE NAME>/checkpoints/<CKPT NAME>.ckpt --cached_centroids_path pretrained_checkpoints/<SCENE NAME>/checkpoints/all_centroids.pkl

This will render the outputs to the runs/<experiment> folder. To calculate the metrics, run the following command:

python inference/evaluate.py --root_path ./data/<SCENE DATA PATH> --exp_path runs/<experiment>
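The two steps above can be scripted so that the paths stay consistent between rendering and evaluation. The following is a minimal sketch, not part of the repository: `build_commands` and all of its parameter names are hypothetical, and the experiment name is left as an input because the folder created under runs/ has to be read off after rendering.

```python
import shlex


def build_commands(scene, ckpt_name, experiment, scene_data_path,
                   ckpt_root="pretrained_checkpoints", python="python3"):
    """Return the render and evaluate commands as argument lists.

    All arguments are placeholders supplied by the caller; nothing here
    is discovered from disk.
    """
    ckpt_dir = f"{ckpt_root}/{scene}/checkpoints"
    render = [
        python, "inference/render_panopli.py",
        "--ckpt_path", f"{ckpt_dir}/{ckpt_name}.ckpt",
        "--cached_centroids_path", f"{ckpt_dir}/all_centroids.pkl",
    ]
    evaluate = [
        python, "inference/evaluate.py",
        "--root_path", f"./data/{scene_data_path}",
        "--exp_path", f"runs/{experiment}",
    ]
    return render, evaluate


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    render_cmd, eval_cmd = build_commands(
        scene="my_scene", ckpt_name="my_ckpt",
        experiment="my_experiment", scene_data_path="my_scene_data",
    )
    # Print the commands for inspection; shlex.join needs Python 3.8+.
    print(shlex.join(render_cmd))
    print(shlex.join(eval_cmd))
```

To execute a command instead of printing it, pass the list to `subprocess.run(cmd, check=True)` from the repository root.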

Citation

If you find this work useful in your research, please cite our paper:

@inproceedings{
  bhalgat2023contrastive,
  title={Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion},
  author={Bhalgat, Yash and Laina, Iro and Henriques, Jo{\~a}o F and Zisserman, Andrew and Vedaldi, Andrea},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023},
  url={https://openreview.net/forum?id=bbbbbov4Xu}
}

Thanks

This code is based on the Panoptic-Lifting and TensoRF codebases. We thank the authors for releasing their code.

contrastive-lift's People

Contributors

yashbhalgat

Forkers

miaowu99

contrastive-lift's Issues

Pre-trained models on other datasets

Dear author,

Thanks for your previous help and for providing the pretrained models on the ScanNet dataset. If possible, I would really appreciate it if you could also provide the pretrained models (Contrastive-Lift and Panoptic-Lifting) for the other datasets, especially the MOS dataset, since I only have limited computation.
Many thanks in advance!

Best,
Runsong

Poor performance on duplicate objects

Hello! Really good work.

I was benchmarking your method on Replica (I set the batch size to 4096 and the segment loss to 0.75, as they did in Panopli; I also increased max instances to 25, which is also used in Panopli).

It seems the PQ is similar (perhaps due to semantics performing well?), but CL is often able to delineate duplicate adjacent objects. For example, here is the result from training on the replica-vmap office3 split (slow-fast, above) compared to Panopli (linear assignment, below). Do you have any intuition why this is the case? Thank you!

CL: [2 images]

PL: [2 images]

When will the code be available?

Dear author,

Thanks for your great work.
I want to test your method in my experiments, and I would like to know when the code will be available. Also, does the proposed method take a similar optimization time to Panoptic-Lifting on ScanNet? BTW, I found that Panoptic-Lifting takes 36 hours on my single RTX 3090 for one ScanNet scene (042302).
Looking forward to your reply. Many thanks in advance.

Best,
Runsong

Issue with training

Hello, thank you for this great work!

Running this line on an itw dataset:

loss_semantics = (self.loss_semantics(output_semantics, probs) * confs).mean()

gives the error:

RuntimeError: 0D or 1D target tensor expected, multi-target not supported

Also, I believe requirements.txt has duplicate packages and is incomplete.
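One way to confirm the duplicate-package report is to scan the file for names that appear more than once. The following is a minimal sketch using only the standard library. `find_duplicate_requirements` is a hypothetical helper, not part of the repository, and it assumes plain one-requirement-per-line entries (URL-style requirements containing `#` are not handled).

```python
import re
from collections import defaultdict


def find_duplicate_requirements(lines):
    """Map each package name listed more than once to its requirement lines."""
    seen = defaultdict(list)
    for raw in lines:
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, -e, ...)
            continue
        # The name ends at the first version specifier, extra, marker, or space.
        name = re.split(r"[<>=!~;\[\s@]", line, maxsplit=1)[0]
        # Compare names case-insensitively, treating runs of -, _ and . as equal.
        key = re.sub(r"[-_.]+", "-", name).lower()
        seen[key].append(line)
    return {key: entries for key, entries in seen.items() if len(entries) > 1}


if __name__ == "__main__":
    # In-memory sample; for a real file, pass an open file object instead.
    sample = ["numpy==1.24.0", "torch>=2.0", "NumPy", "# a comment", ""]
    for package, entries in find_duplicate_requirements(sample).items():
        print(package, entries)
```

This only detects repeated names. It does not detect missing packages, which would require comparing the file against the imports actually used in the code.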

Missing mesh.ply while running preprocess_replica.py

I downloaded the Replica dataset provided by Semantic_Nerf. When I ran preprocess_replica.py, I got an error about a missing mesh.ply file, which is not provided in the downloaded dataset.

Can you please share where I can download the complete processed dataset, or provide the steps to create it?
