0iui0 / omniseg3d-gs

This project forked from oceanying/omniseg3d-gs

3D Gaussian Splatting adapted version of OmniSeg3D (CVPR2024)

License: MIT License

Shell 0.82% C++ 5.74% Python 69.21% C 0.33% Cuda 23.52% CMake 0.37%

omniseg3d-gs's Introduction

OmniSeg3D-GS: Gaussian-Splatting based OmniSeg3D (CVPR2024)

OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning
Haiyang Ying1, Yixuan Yin1, Jinzhi Zhang1, Fan Wang2, Tao Yu1, Ruqi Huang1, Lu Fang1
1Tsinghua University   2Alibaba Group

OmniSeg3D is a framework for multi-object, category-agnostic, and hierarchical segmentation in 3D. The original implementation is based on InstantNGP.

However, OmniSeg3D is not restricted to a specific 3D representation. In this repo, we present a Gaussian-splatting based OmniSeg3D, which supports interactive 3D segmentation in real time. The segmented objects can be saved in .ply format for further visualization and manipulation.

Installation

We follow the original environment setup of 3D Gaussian Splatting (SIGGRAPH 2023).

conda create -n gaussian_grouping python=3.8 -y
conda activate gaussian_grouping 

conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
pip install plyfile==0.8.1
pip install tqdm scipy wandb opencv-python scikit-learn lpips

pip install submodules/diff-gaussian-rasterization
pip install submodules/simple-knn
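
Before moving on, it can help to confirm that the packages above are visible to the active environment. The following is a minimal sketch, not part of the repo: the import names `diff_gaussian_rasterization` and `simple_knn` are assumed to be the module names installed by the two submodules.

```python
import importlib.util

# The first three names come from the conda/pip commands above; the last two
# are ASSUMED import names for the two submodules.
REQUIRED_MODULES = [
    "torch",
    "torchvision",
    "plyfile",
    "diff_gaussian_rasterization",
    "simple_knn",
]


def find_missing(modules):
    """Return the names in `modules` that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

The check only looks modules up without importing them, so it does not verify that the CUDA extensions were built against the right toolkit.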

Install SAM for 2D segmentation:

git clone https://github.com/facebookresearch/segment-anything.git
cd segment-anything
pip install -e .
mkdir sam_ckpt; cd sam_ckpt
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

Data Preparation:

We support data prepared in COLMAP format. For more details, please refer to the guidance in our NeRF-based implementation of OmniSeg3D.
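
As a quick sanity check before preprocessing, the sketch below verifies that a scene folder looks like a COLMAP export. The exact layout is an assumption borrowed from the original 3D Gaussian Splatting loader (an image folder plus `sparse/0/` with the camera, image, and point files); the helper name `check_colmap_scene` is hypothetical.

```python
from pathlib import Path

# ASSUMED layout, following the original 3D Gaussian Splatting loader:
#   <scene>/<images_dir>/            input images (e.g. images or images_4)
#   <scene>/sparse/0/cameras.*       COLMAP cameras
#   <scene>/sparse/0/images.*        COLMAP image poses
#   <scene>/sparse/0/points3D.*      COLMAP sparse points
COLMAP_FILES = ("cameras", "images", "points3D")


def check_colmap_scene(scene_dir, images_dir="images"):
    """Return a list of problems; an empty list means the layout looks fine."""
    scene = Path(scene_dir)
    problems = []
    img_dir = scene / images_dir
    if not img_dir.is_dir() or not any(img_dir.iterdir()):
        problems.append(f"missing or empty image folder: {img_dir}")
    sparse = scene / "sparse" / "0"
    for stem in COLMAP_FILES:
        # COLMAP can export either binary (.bin) or text (.txt) models.
        if not any((sparse / f"{stem}{ext}").is_file() for ext in (".bin", ".txt")):
            problems.append(f"missing COLMAP file: {sparse / stem}.bin/.txt")
    return problems
```

For example, `check_colmap_scene("path/to/counter", images_dir="images_4")` returns an empty list when the scene matches the assumed layout.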

Hierarchical Representation Generation

Run the SAM model to generate the hierarchical representation files.

python run_sam.py --ckpt_path {SAM_CKPT_PATH} --file_path {IMAGE_FOLDER} --gpu_id {GPU_ID}
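
When several scenes need preprocessing, the command above can be wrapped in a small driver. This is a sketch only: it uses the three flags shown above and nothing else, and the helper names `build_run_sam_cmd` and `run_sam_on_scenes` are hypothetical.

```python
import subprocess
import sys


def build_run_sam_cmd(ckpt_path, image_folder, gpu_id=0):
    """Build the argv list for run_sam.py using the three flags documented above."""
    return [
        sys.executable, "run_sam.py",
        "--ckpt_path", str(ckpt_path),
        "--file_path", str(image_folder),
        "--gpu_id", str(gpu_id),
    ]


def run_sam_on_scenes(ckpt_path, image_folders, gpu_id=0, dry_run=True):
    """Run (or, with dry_run=True, only print) run_sam.py for each image folder."""
    for folder in image_folders:
        cmd = build_run_sam_cmd(ckpt_path, folder, gpu_id)
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first scene whose preprocessing fails.
            subprocess.run(cmd, check=True)
```

Run it from the repository root so that `run_sam.py` resolves, and start with `dry_run=True` to review the commands first.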

After running, you will get three folders: sam, masks, and patches.

  • sam: stores the hierarchical representation as ".npz" files
  • masks and patches: used for visualization or mask quality evaluation; not needed during training.

Ideally, the masks folder should contain object-level masks and the patches folder should contain part-level masks. We mostly use the default SAM parameters, but you can tune them for custom datasets.
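
Since training reads the files in sam, it may be worth confirming that every input image produced one. The sketch below assumes each .npz is named after its source image (same file stem); that naming and the helper `images_without_npz` are assumptions, not something the repo documents here.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def images_without_npz(image_dir, sam_dir):
    """Return sorted image stems that have no matching <stem>.npz in sam_dir.

    ASSUMPTION: run_sam.py names each .npz after its source image.
    """
    image_stems = {
        p.stem for p in Path(image_dir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    }
    npz_stems = {p.stem for p in Path(sam_dir).glob("*.npz")}
    return sorted(image_stems - npz_stems)
```

An empty result means every image has a counterpart; any listed stems point at images that SAM preprocessing skipped or failed on.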

Training:

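We train our models on a single NVIDIA RTX 3090 Ti GPU (24 GB). Smaller scenes may require less memory, and inference typically requires less than 8 GB. Training uses a two-stage strategy: the first stage fits the Gaussians (color and density), and the second stage trains the semantic feature field starting from the first stage's checkpoint. See script/train_omni_360.sh for an example.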

dataname=counter
gpu=1
data_path=root_path/to/the/data/folder/of/counter

# --- Training Gaussian (Color and Density) --- #
CUDA_VISIBLE_DEVICES=${gpu} python train.py \
     -s ${data_path} \
     --images images_4 \
     -r 1 \
     -m output/360_${dataname}_omni_1/rgb \
     --config_file config/gaussian_dataset/train_rgb.json \
     --object_path sam \
     --ip 127.0.0.2

# --- Training Semantic Feature Field --- #
CUDA_VISIBLE_DEVICES=${gpu} python train.py \
     -s ${data_path} \
     --images images_4 \
     -r 1 \
     -m output/360_${dataname}_omni_1/sem_hi \
     --config_file config/gaussian_dataset/train_sem.json \
     --object_path sam \
     --start_checkpoint output/360_${dataname}_omni_1/rgb/chkpnt10000.pth \
     --ip 127.0.0.2

# --- Render Views for Visualization --- #
CUDA_VISIBLE_DEVICES=${gpu} python render_omni.py \
    -m output/360_${dataname}_omni_1/sem_hi \
    --num_classes 256 \
    --images images_4

After filling in your own settings, run the script from the repository root:

bash script/train_omni_360.sh
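
The dependency between the stages is easy to miss in the shell script: the semantic stage resumes from the checkpoint that the RGB stage wrote. The sketch below rebuilds the same three commands in Python to make that link explicit. The function `build_stage_commands` and its parameters (`ckpt_iter`, `ip`) are hypothetical; all flags and paths are copied from the script above.

```python
from pathlib import Path


def build_stage_commands(data_path, dataname, images="images_4",
                         ckpt_iter=10000, ip="127.0.0.2"):
    """Return the three argv lists (RGB stage, semantic stage, render) in run order."""
    out_root = Path("output") / f"360_{dataname}_omni_1"
    rgb_dir, sem_dir = out_root / "rgb", out_root / "sem_hi"
    common = ["-s", str(data_path), "--images", images, "-r", "1",
              "--object_path", "sam", "--ip", ip]
    train_rgb = ["python", "train.py", *common, "-m", rgb_dir.as_posix(),
                 "--config_file", "config/gaussian_dataset/train_rgb.json"]
    # Stage 2 resumes from the checkpoint that stage 1 wrote into its model folder.
    train_sem = ["python", "train.py", *common, "-m", sem_dir.as_posix(),
                 "--config_file", "config/gaussian_dataset/train_sem.json",
                 "--start_checkpoint",
                 (rgb_dir / f"chkpnt{ckpt_iter}.pth").as_posix()]
    render = ["python", "render_omni.py", "-m", sem_dir.as_posix(),
              "--num_classes", "256", "--images", images]
    return [train_rgb, train_sem, render]
```

Each list can be passed to `subprocess.run(cmd, env={**os.environ, "CUDA_VISIBLE_DEVICES": "1"}, check=True)` to mirror the GPU selection in the script.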

GUI Visualization and Segmentation

Modify the path of the trained point cloud. Then run render_omni_gui.py.

GUI options:

  • mode option: RGB, score map, and semantic map (you can visualize the consistent global semantic feature).
  • click mode: select object of interest
  • multi-click mode: select multiple points or objects
  • binary threshold: show binarized 2D images with the threshold
  • segment3d: segment the scene with the current threshold (the saved .ply file is written to the root directory)
  • reload: reload the whole scene
  • file selector: load another scene (point cloud)
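
To build intuition for the score map and binary threshold options, here is a toy sketch of one plausible mechanism: score every point by the cosine similarity between its feature and the feature at the clicked location, then keep points whose score reaches the threshold. This README does not spell out how the GUI computes its scores, so treat the similarity measure and both function names as assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def select_by_threshold(features, query, threshold):
    """Return (scores, selected_indices) for a clicked query feature.

    features: list of per-point feature vectors (a HYPOTHETICAL stand-in for
              the learned semantic features of the Gaussians or pixels).
    query:    feature vector at the clicked location.
    """
    scores = [cosine_similarity(f, query) for f in features]
    selected = [i for i, s in enumerate(scores) if s >= threshold]
    return scores, selected
```

Under this model, raising the threshold shrinks the selection toward points most similar to the click, which is how a single feature field could yield part-level or object-level segments.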

Operations:

  • left drag: rotate
  • mid drag: pan
  • right click: choose point/objects
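
After segment3d has written a .ply file, a quick way to see how much of the scene was selected is to read the vertex count from the PLY header. The sketch below uses only the standard library and relies on the PLY header format; it assumes the saved file follows the usual Gaussian-splatting convention of one vertex per Gaussian. The helper name `read_ply_vertex_count` is hypothetical.

```python
def read_ply_vertex_count(path):
    """Return the vertex count declared in a PLY header, or None if absent."""
    with open(path, "rb") as f:
        if f.readline().strip() != b"ply":
            raise ValueError(f"{path} does not start with a PLY magic line")
        for raw in f:
            line = raw.strip()
            if line == b"end_header":
                break
            parts = line.split()
            # Header lines of interest look like: element vertex <count>
            if len(parts) == 3 and parts[:2] == [b"element", b"vertex"]:
                return int(parts[2])
    return None
```

Comparing the count of a segmented file with that of the full scene gives a rough sense of the segment's size without loading any point data.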

Acknowledgements

Thanks to the following projects for their valuable contributions:

Citation

If you find this project helpful for your research, please consider citing the report and giving a ⭐.

@article{ying2023omniseg3d,
  title={OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning},
  author={Ying, Haiyang and Yin, Yixuan and Zhang, Jinzhi and Wang, Fan and Yu, Tao and Huang, Ruqi and Fang, Lu},
  journal={arXiv preprint arXiv:2311.11666},
  year={2023}
}
