
[ICCV 2023] SGAligner : 3D Scene Alignment with Scene Graphs

Home Page: https://sayands.github.io/sgaligner/

License: MIT License


sgaligner's Introduction

SGAligner : 3D Scene Alignment with Scene Graphs

ICCV 2023

Sayan Deb Sarkar1, Ondrej Miksik2, Marc Pollefeys1,2, Daniel Barath1, Iro Armeni1

1ETH Zurich 2Microsoft Mixed Reality & AI Labs

SGAligner aligns 3D scene graphs of environments using multi-modal learning and leverages the output for the downstream task of 3D point cloud registration.


[Project Webpage] [Paper]

News 📰

  • 14 July 2023: SGAligner accepted to ICCV 2023. 🔥
  • 1 May 2023: SGAligner preprint released on arXiv.
  • 10 April 2023: Code released.

Code Structure 🎬

├── sgaligner
│   ├── data-preprocessing            <- subscan generation + preprocessing
│   ├── configs                       <- configuration files
│   ├── src
│   │   ├── aligner                   <- SGAligner modules
│   │   ├── datasets                  <- dataloader for 3RScan subscans
│   │   ├── engine                    <- trainer classes
│   │   ├── GeoTransformer            <- GeoTransformer submodule for registration
│   │   ├── inference                 <- inference files for alignment + downstream applications
│   │   └── trainers                  <- train + validation loop (EVA + SGAligner)
│   ├── utils                         <- util functions
│   ├── README.md
│   ├── scripts                       <- bash scripts for data generation + preprocessing + training
│   └── output                        <- folder that stores models and logs

Dependencies 📝

The main dependencies of the project are the following:

python: 3.8.15
cuda: 11.6

You can set up a conda environment as follows:

git clone --recurse-submodules -j8 git@github.com:sayands/sgaligner.git
cd sgaligner
conda env create -f req.yml

Please follow the GeoTransformer submodule's instructions for its additional installation requirements and setup.

Downloads 💧

The pre-trained model and other meta files are available here.

Dataset Generation 🔨

After installing the dependencies, we preprocess the datasets and provide the benchmarks.

Subscan Pair Generation - 3RScan + 3DSSG

Download 3RScan and 3DSSG. Move all files of 3DSSG to a new files/ directory within the 3RScan directory. The structure should be:

├── 3RScan
│   ├── files       <- all 3RScan and 3DSSG meta files (NOT the scan data)
│   ├── scenes      <- scans
│   └── out         <- default output directory for generated subscans (created when running pre-processing)

To generate labels.instances.align.annotated.v2.ply for each 3RScan scan, please refer to the repository linked here.

Change the absolute paths in utils/define.py.

First, we create subscans from each 3RScan scan using the ground-truth scene graphs from the 3DSSG dataset, then calculate the pairwise overlap ratio between the subscans of a scan. Finally, we preprocess the data for our framework. The relevant code can be found in the data-preprocessing/ directory. You can use the following command to generate the subscans:

bash scripts/generate_data_scan3r_gt.sh
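For intuition, the pairwise overlap computation above can be sketched as follows. This is an illustrative approximation, not the repository's implementation: the voxel size and the exact definition of the overlap ratio are assumptions made here for clarity.

```python
# Illustrative sketch of a pairwise overlap ratio between two subscans:
# voxelise each point set and compare occupied voxel cells.
# Voxel size and the overlap definition are assumptions, not the repo's code.

def voxelize(points, voxel_size=0.1):
    """Map 3D points to the set of voxel cells they occupy."""
    return {tuple(int(c // voxel_size) for c in p) for p in points}

def overlap_ratio(points_a, points_b, voxel_size=0.1):
    """Fraction of the smaller subscan's voxels shared with the other."""
    va = voxelize(points_a, voxel_size)
    vb = voxelize(points_b, voxel_size)
    if not va or not vb:
        return 0.0
    return len(va & vb) / min(len(va), len(vb))

a = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.02, 0.01, 0.0), (2.0, 0.0, 0.0)]
print(overlap_ratio(a, b))  # 0.5: one of b's two voxels overlaps a
```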

Note: To adhere to our evaluation procedure, please do not change the seed value in the files in the configs/ directory.

Generating Overlapping and Non-Overlapping Subscan Pairs

To generate overlapping and non-overlapping pairs, use:

python preprocessing/gen_all_pairs_fileset.py

This creates a fileset with as many randomly chosen non-overlapping pairs (drawn from the generated subscans) as there are overlapping pairs from the subscan generation step.
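A minimal sketch of this balancing step, assuming subscans are identified by IDs and overlapping pairs are known; the names and data layout below are hypothetical, not the repository's actual structures:

```python
# Hypothetical sketch: draw as many random non-overlapping subscan pairs
# as there are overlapping pairs, with a fixed seed for reproducibility.
import random

def sample_nonoverlapping_pairs(subscan_ids, overlapping_pairs, seed=42):
    """Randomly pick len(overlapping_pairs) pairs that do not overlap."""
    rng = random.Random(seed)
    overlapping = {frozenset(p) for p in overlapping_pairs}
    candidates = [
        (a, b)
        for i, a in enumerate(subscan_ids)
        for b in subscan_ids[i + 1:]
        if frozenset((a, b)) not in overlapping
    ]
    return rng.sample(candidates, min(len(candidates), len(overlapping_pairs)))

ids = ["s0", "s1", "s2", "s3"]
overlap = [("s0", "s1"), ("s2", "s3")]
pairs = sample_nonoverlapping_pairs(ids, overlap)
print(len(pairs))  # 2, same count as the overlapping pairs
```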

Usage on Predicted Scene Graphs: Coming Soon!

Training 🚄

To train SGAligner on 3RScan subscans generated from here, you can use:

cd src
python trainers/trainval_sgaligner.py --config ../configs/scan3r/scan3r_ground_truth.yaml

EVA Training

We also provide training scripts for EVA, which we adapted for scene graph alignment and use as a baseline. To train EVA on the same data as SGAligner, you can use:

cd src
python trainers/trainval_eva.py --config ../configs/scan3r/scan3r_eva.yaml

We provide config files for the corresponding data in the configs/ directory. Please change the parameters in the configuration files if you want to tune the hyper-parameters.
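As an illustration of overriding a hyper-parameter programmatically before a run, here is a small sketch. The config keys and the helper function are hypothetical; the actual configs are YAML files under configs/.

```python
# Hypothetical sketch of a nested config override via a dotted key.
# Keys below are illustrative, not the actual SGAligner config schema.

base_cfg = {
    "optim": {"lr": 1e-3, "weight_decay": 1e-5},
    "train": {"batch_size": 4, "epochs": 50},
}

def override(cfg, dotted_key, value):
    """Set a nested config value from a dotted key like 'optim.lr'."""
    *path, leaf = dotted_key.split(".")
    node = cfg
    for part in path:
        node = node[part]
    node[leaf] = value
    return cfg

cfg = override(base_cfg, "optim.lr", 5e-4)
print(cfg["optim"]["lr"])  # 0.0005
```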

Evaluation 🚦

Graph Alignment + Point Cloud Registration

cd src
python inference/sgaligner/inference_align_reg.py --config ../configs/scan3r/scan3r_ground_truth.yaml --snapshot <path to SGAligner trained model> --reg_snapshot <path to GeoTransformer model trained on 3DMatch>

Finding Overlapping vs Non-Overlapping Pairs

โ— Run Generating Overlapping and Non-Overlapping Subscan Pairs before.

To run the inference, use:

cd src
python inference/sgaligner/inference_find_overlapper.py --config ../configs/scan3r/scan3r_gt_w_wo_overlap.yaml --snapshot <path to SGAligner trained model> --reg_snapshot <path to GeoTransformer model trained on 3DMatch>

3D Point Cloud Mosaicking

First, we generate the subscans per 3RScan scan using:

python data-preprocessing/gen_scan_subscan_mapping.py --split <the split you want to generate the mapping for>

Then, to run the inference, use:

cd src
python inference/sgaligner/inference_mosaicking.py --config ../configs/scan3r/scan3r_gt_mosaicking.yaml --snapshot <path to SGAligner trained model> --reg_snapshot <path to GeoTransformer model trained on 3DMatch>

Benchmark 📈

We provide detailed results and comparisons here.

3D Scene Graph Alignment (Node Matching)

| Method | Mean Reciprocal Rank | Hits@1 | Hits@2 | Hits@3 | Hits@4 | Hits@5 |
|---|---|---|---|---|---|---|
| EVA | 0.867 | 0.790 | 0.884 | 0.938 | 0.963 | 0.977 |
| $\mathcal{P}$ | 0.884 | 0.835 | 0.886 | 0.921 | 0.938 | 0.951 |
| $\mathcal{P}$ + $\mathcal{S}$ | 0.897 | 0.852 | 0.899 | 0.931 | 0.945 | 0.955 |
| $\mathcal{P}$ + $\mathcal{S}$ + $\mathcal{R}$ | 0.911 | 0.861 | 0.916 | 0.947 | 0.961 | 0.970 |
| SGAligner | 0.950 | 0.923 | 0.957 | 0.974 | 0.9823 | 0.987 |
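The node-matching metrics above follow the standard definitions: given the rank of each ground-truth match (1 = best), Mean Reciprocal Rank averages 1/rank and Hits@k is the fraction of ranks at most k. A minimal sketch with hypothetical toy ranks:

```python
# Standard retrieval metrics used for node matching; the ranks are toy data.

def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries (rank 1 = correct match first)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k):
    """Fraction of queries whose correct match appears in the top k."""
    return sum(r <= k for r in ranks) / len(ranks)

ranks = [1, 2, 1, 4, 1]  # hypothetical ranks of the correct matches
print(mean_reciprocal_rank(ranks))  # 0.75
print(hits_at_k(ranks, 1))          # 0.6
```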

3D Point Cloud Registration

| Method | CD | RRE | RTE | FMR | RR |
|---|---|---|---|---|---|
| GeoTr | 0.02247 | 1.813 | 2.79 | 98.94 | 98.49 |
| Ours, K=1 | 0.01677 | 1.425 | 2.88 | 99.85 | 98.79 |
| Ours, K=2 | 0.01111 | 1.012 | 1.67 | 99.85 | 99.40 |
| Ours, K=3 | 0.01525 | 1.736 | 2.55 | 99.85 | 98.81 |
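For reference, the Relative Rotation Error (RRE, in degrees) and Relative Translation Error (RTE) in the table can be computed from an estimated and a ground-truth rigid transform as sketched below. This is the generic formulation of these standard registration metrics, not code from the repository; pure-Python 3x3 matrices are used for illustration.

```python
# Generic RRE/RTE computation between estimated and ground-truth transforms.
import math

def rre_degrees(R_est, R_gt):
    """Angle of the residual rotation R_est^T @ R_gt, in degrees."""
    # trace(R_est^T @ R_gt) = sum over i, k of R_est[k][i] * R_gt[k][i]
    trace = sum(R_est[k][i] * R_gt[k][i] for i in range(3) for k in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp for safety
    return math.degrees(math.acos(cos_angle))

def rte(t_est, t_gt):
    """Euclidean distance between the two translation vectors."""
    return math.dist(t_est, t_gt)

I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
Rz = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]  # 90-degree rotation about z
print(rre_degrees(I, Rz))       # 90.0
print(rte([0, 0, 0], [3, 4, 0]))  # 5.0
```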

TODO 🔜

  • Add 3D Point Cloud Mosaicking
  • Add Support For EVA
  • Add usage on Predicted Scene Graphs
  • Add scene graph alignment of local 3D scenes to prior 3D maps
  • Add overlapping scene finder with a traditional retrieval method (FPFH + VLAD + KNN)

BibTeX 🙏

@inproceedings{sarkar2023sgaligner,
      title={SGAligner: 3D Scene Alignment with Scene Graphs},
      author={Sayan Deb Sarkar and Ondrej Miksik and Marc Pollefeys and Daniel Barath and Iro Armeni},
      booktitle={Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
      year={2023}
}

Acknowledgments ♻️

In this project we use (parts of) the official implementations of the following works, and we thank the respective authors for open-sourcing their methods:


sgaligner's Issues

How is the graph structure constructed?

Hi @sayands, thanks for your interesting work.
I have a question regarding the structure embedding. In the paper, it says:

We represent this information in the form of a structure graph: node features are the relative translation between object instances, and edges are the aforementioned relationships. We calculate relative translation by taking the distance between the object instance consisting of the highest number of relationships and that of any other object instance in the scene.

I found the related parts of the code in dataset/scan3r.py. The node is the relative translation here, while the edges are here.

I'm confused about how the graph structure is built:

  1. For the nodes, since each node can have multiple edges, how do you choose one relative translation as the node feature? In Fig. 2(b), each node in the visualization is marked with $O_{ij}$. The visualization makes me think each node only contains self-attributes. How do you construct the graph node embeddings? Have you considered using self-attributes such as semantic info to represent the node?
  2. For the edges, you concatenate src_edges and tar_edges. Does that mean the edges contain both intra-graph and inter-graph relationships? Then how do you construct the edges? For example, considering inter-graph edges between $G_1$ and $G_2$, do you consider all possible edges between the two graphs? (Then you would have $|G_1| \cdot |G_2|$ edges.)

Thanks for your time.
Glen

Why is the transformation matrix between the source and target point clouds an identity (eye) matrix?

Hello, I am a little confused about the ground-truth transformation between the source and reference point clouds: running the inference code with the default eye matrix as gt_transform, I can't reproduce the results given in the paper. Will there be an update adding the code for the exact generation method and margin thresholds of the gt_transform?
Thank you very much.

https://github.com/sayands/sgaligner/blob/49ae3e1398e369557878af7c45252817a7abe72f/src/inference/sgaligner/inference_align_reg.py#L153C21-L153C62

README steps are not elaborated

Hi, could you please elaborate on the reproduction steps in more detail? For example, for the 'Data Generation' step it is currently unclear:

  1. where the 'train_scans.txt' and 'validation_scans.txt' files come from; they are not present in the 3dssg.zip file
  2. how 'labels.instances.align.annotated.v2.pkl' is generated
  3. where the 'obj_attr.pkl' file comes from
======== Scan3R Subscan preprocessing with Scene Graphs using config file : configs/scan3r/scan3r_ground_truth.yaml ========
[INFO] Processing subscans from val split
100%|██████████| 843/843 [02:24<00:00,  5.82it/s]
[INFO] Updating Overlap Data..
100%|██████████| 1911/1911 [00:00<00:00, 168106.44it/s]
[INFO] Saving val scan ids...
843
[INFO] Starting BOW Feature Calculation For Node Attribute Features...
Traceback (most recent call last):
  File "/home/guttikonda/Documents/SceneGraph/OriginalWorks/sgaligner/preprocessing/scan3r/preprocess.py", line 355, in <module>
    calculate_bow_node_attr_feats(data_write_dir)
  File "/home/guttikonda/Documents/SceneGraph/OriginalWorks/sgaligner/preprocessing/scan3r/preprocess.py", line 312, in calculate_bow_node_attr_feats
    word_2_ix = common.load_pkl_data(define.OBJ_ATTR_FILENAME)
  File "/home/guttikonda/Documents/SceneGraph/OriginalWorks/sgaligner/./utils/common.py", line 14, in load_pkl_data
    with open(filename, 'rb') as handle:
FileNotFoundError: [Errno 2] No such file or directory: '/home/guttikonda/Documents/SceneGraph/datasets/3RScan/out/files/obj_attr.pkl'

Missing parameter cfg.loss.zoom in trainval_sgaligner.py

Hello,

I encountered an issue while running the trainval_sgaligner.py script with the following command:
python trainers/trainval_sgaligner.py --config ../configs/scan3r/scan3r_ground_truth.yaml
At line 28 of trainval_sgaligner.py, the code raises AttributeError: loss. It seems the self.zoom attribute is not set because the cfg.loss.zoom parameter is missing from the config.

Missing "raw points"

Hi, I am trying to run SGAligner for registration, and this line of code confuses me: after generating the data samples as described in the README, I cannot find the location of this "raw points" file. I know it should be the sampled point cloud of each scan, but I am not sure whether I should change this line to read the raw *.ply file from the original 3RScan, or whether it should be the downsampled point cloud of the raw *.ply file.
https://github.com/sayands/sgaligner/blob/49ae3e1398e369557878af7c45252817a7abe72f/src/inference/sgaligner/inference_align_reg.py#L144C21-L144C124
Thank you in advance.
