colin97 / openshape_code

official code of “OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding”

Home Page: https://colin97.github.io/OpenShape/

License: Apache License 2.0

Python 100.00%
3d 3d-classification 3d-understanding image-generation point-cloud zero-shot-classification zero-shot-retrieval 3d-shape-retrieval point-cloud-caption

openshape_code's Introduction

OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding

[project] [paper] [Live Demo]

[News] OpenShape has been accepted to NeurIPS 2023. See you in New Orleans!

[News] We have released our checkpoints, training code, and training data!

[News] Live demo released! Thanks HuggingFace🤗 for sponsoring this demo!!

Official code of "OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding".

Left: Zero-shot 3D shape classification on the Objaverse-LVIS (1,156 categories) and ModelNet40 (40 common categories) datasets. Right: Our shape representations encode a broad range of semantic and visual concepts. We input two 3D shapes and use their shape embeddings to retrieve the top three shapes whose embeddings are simultaneously closest to both inputs.

Online Demo

Explore the online demo, which currently supports: (a) 3D shape classification (LVIS categories and user-uploaded texts), (b) 3D shape retrieval (from text, image, and 3D point cloud), (c) point cloud captioning, and (d) point cloud-based image generation.

The demo is built with Streamlit. If you encounter a "connection error", please try clearing your browser cache or using incognito mode.

The code for the demo can be found here and here. The support library (README) also serves as an inference library for models with the PointBERT backbone.

Checkpoints

| Model | Training Data | CLIP Version | Backbone | Objaverse-LVIS Zero-Shot Top1 (Top5) | ModelNet40 Zero-Shot Top1 (Top5) | Gravity Axis | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| pointbert-vitg14-rgb | Four datasets | OpenCLIP ViT-bigG-14 | PointBERT | 46.8 (77.0) | 84.4 (98.0) | z-axis | |
| pointbert-no-lvis | Four datasets (no LVIS) | OpenCLIP ViT-bigG-14 | PointBERT | 39.1 (68.9) | 85.3 (97.4) | z-axis | |
| pointbert-shapenet-only | ShapeNet only | OpenCLIP ViT-bigG-14 | PointBERT | 10.8 (25.0) | 70.3 (91.3) | z-axis | |
| spconv-all | Four datasets | OpenCLIP ViT-bigG-14 | SparseConv | 42.7 (72.8) | 83.7 (98.4) | z-axis | |
| spconv-all-no-lvis | Four datasets (no LVIS) | OpenCLIP ViT-bigG-14 | SparseConv | 38.1 (68.2) | 84.0 (97.3) | z-axis | |
| spconv-shapenet-only | ShapeNet only | OpenCLIP ViT-bigG-14 | SparseConv | 12.1 (27.1) | 74.1 (89.5) | z-axis | |
| pointbert-vitl14-rgb | Objaverse (no LVIS) | CLIP ViT-L/14 | PointBERT | N/A | N/A | y-axis | used for image generation demo |
| pointbert-vitb32-rgb | Objaverse | CLIP ViT-B/32 | PointBERT | N/A | N/A | y-axis | used for point cloud captioning demo |
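
As a rough illustration (not part of the official instructions), a checkpoint hosted on the Hugging Face Hub can be fetched and inspected as follows; the repo id and filename below are placeholders, so substitute the actual ones linked from the table above.

# Hedged checkpoint-loading sketch. Assumptions: the checkpoint is a standard
# torch-serialized file on the Hugging Face Hub; the repo id and filename are
# placeholders -- use the ones linked from the table above.
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="OpenShape/pointbert-vitg14-rgb",  # placeholder repo id
    filename="model.pt",                       # placeholder filename
)
state = torch.load(ckpt_path, map_location="cpu")
print(type(state))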

Installation

If you would like to run inference and/or training locally, you need to install the dependencies.

  1. Create a conda environment and install PyTorch, MinkowskiEngine, and DGL using the following commands or their official installation guides:
conda create -n OpenShape python=3.9
conda activate OpenShape
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
pip install -U git+https://github.com/NVIDIA/MinkowskiEngine
conda install -c dglteam/label/cu113 dgl
  2. Install the following packages:
pip install huggingface_hub wandb omegaconf torch_redstone einops tqdm open3d 
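
After installation, a quick optional sanity check (not from the original instructions) to confirm that the key dependencies import and that PyTorch can see the GPU:

# Optional sanity check: verify that PyTorch, MinkowskiEngine, and DGL import
# and that CUDA is visible.
import torch
import MinkowskiEngine as ME
import dgl

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("MinkowskiEngine:", ME.__version__)
print("DGL:", dgl.__version__)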

Inference

Try the following example code for computing OpenShape embeddings of 3D point clouds and computing 3D-text and 3D-image similarities.

python3 src/example.py

Please normalize the input point cloud and make sure its gravity axis matches the axis expected by the pre-trained model (see the checkpoint table above).
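
If you feed your own point clouds instead of the bundled example data, here is a minimal preprocessing sketch (an illustration, not the repository's own preprocessing code): it centers the cloud, scales it into the unit sphere, and optionally converts a y-up cloud to z-up; check the checkpoint table above for which gravity axis your model expects.

import numpy as np

def normalize_pc(xyz, y_up_to_z_up=False):
    # xyz: (N, 3) float array. Optionally rotate +90 degrees about x so that the
    # +y (up) axis maps to +z: (x, y, z) -> (x, -z, y). This convention fix is an
    # assumption; verify which axis your chosen checkpoint expects.
    xyz = np.asarray(xyz, dtype=np.float32).copy()
    if y_up_to_z_up:
        xyz = xyz[:, [0, 2, 1]] * np.array([1.0, -1.0, 1.0], dtype=np.float32)
    xyz -= xyz.mean(axis=0, keepdims=True)        # center at the origin
    scale = np.max(np.linalg.norm(xyz, axis=1))   # radius of the farthest point
    return xyz / max(scale, 1e-8)                 # fit into the unit sphere

# Usage (hypothetical file): pc = normalize_pc(np.load("my_points.npy"), y_up_to_z_up=True)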

Training

  1. The processed training and evaluation data can be found here. Download and uncompress the data with the following command:
python3 download_data.py

The total data size is ~205G and files will be downloaded and uncompressed in parallel. If you don't need training and evaluation on the Objaverse dataset, you can skip that part (~185G).

  2. Run the training with the following command:
wandb login {YOUR_WANDB_ID}
python3 src/main.py dataset.train_batch_size=20 --trial_name bs_20

The default config can be found in src/configs/train.yml and is intended for training on a single A100 GPU. You can also change the settings by passing arguments on the command line. Here are some examples for the main experiments used in the paper:

python3 src/main.py --trial_name spconv_all
python3 src/main.py --trial_name spconv_no_lvis dataset.train_split=meta_data/split/train_no_lvis.json 
python3 src/main.py --trial_name spconv_shapenet_only dataset.train_split=meta_data/split/ablation/train_shapenet_only.json 
python3 src/main.py --trial_name pointbert_all model.name=PointBERT model.scaling=4 model.use_dense=True training.lr=0.0005 training.lr_decay_rate=0.967 
python3 src/main.py --trial_name pointbert_no_lvis model.name=PointBERT model.scaling=4 model.use_dense=True training.lr=0.0005 training.lr_decay_rate=0.967 dataset.train_split=meta_data/split/train_no_lvis.json 
python3 src/main.py --trial_name pointbert_shapenet_only model.name=PointBERT model.scaling=4 model.use_dense=True training.lr=0.0005 training.lr_decay_rate=0.967 dataset.train_split=meta_data/split/ablation/train_shapenet_only.json 

You can track the training and evaluation (Objaverse-LVIS and ModelNet40) curves on your wandb page.

Data

All data can be found here. Use python3 download_data.py to download it.

Training Data

Training data consists of Objaverse/000-xxx.tar.gz, ShapeNet.tar.gz, 3D-FUTURE.tar.gz, and ABO.tar.gz. After uncompression, you will get a numpy file for each shape (see the loading sketch after this list), which includes:

  • dataset: str, dataset of the shape.
  • group: str, group of the shape.
  • id: str, id of the shape.
  • xyz: numpy array (10000 x 3, [-1,1]), point cloud of the shape.
  • rgb: numpy array (10000 x 3, [0, 1]), color of the point cloud.
  • image_feat: numpy array, image features of 12 rendered images.
  • thumbnail_feat: numpy array, image feature for the thumbnail image.
  • text: list of string, original texts of the shape, constructed using the metadata of the dataset.
  • text_feat: list of dict, text features of the text. "original" indicates the text features without prompt engineering. "prompt_avg" indicates the averaged text features with template-based prompt engineering.
  • blip_caption: str, BLIP caption generated for the thumbnail or rendered images.
  • blip_caption_feat: dict, text feature of the blip_caption.
  • msft_caption: str, Microsoft Azure caption generated for the thumbnail or rendered images.
  • msft_caption_feat: dict, text feature of the msft_caption.
  • retrieval_text: list of str, retrieved texts for the thumbnail or rendered images.
  • retrieval_text_feat: list of dict, text features of the retrieval_text.

All image and text features are extracted using OpenCLIP (ViT-bigG-14, laion2b_s39b_b160k).
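
A minimal loading sketch, assuming each per-shape file is a pickled dict saved with numpy (the path below is just an illustration):

import numpy as np

# Assumption: each per-shape .npy file stores a pickled dict with the keys listed
# above, hence allow_pickle=True. The filename is illustrative.
shape = np.load("ShapeNet/<some_shape_id>.npy", allow_pickle=True).item()

print(shape["dataset"], shape["group"], shape["id"])
print("xyz:", shape["xyz"].shape, "rgb:", shape["rgb"].shape)    # (10000, 3) each
print("image_feat:", np.asarray(shape["image_feat"]).shape)      # features of 12 renders
print("caption:", shape["blip_caption"])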

Meta Data

meta_data.zip includes the meta data used for training and evaluation (on Objaverse-LVIS, ModelNet40, and ScanObjectNN):

  • split/: list of training shapes. train_all.json indicates training with all four datasets (Objaverse, ShapeNet, ABO, and 3D-FUTURE). train_no_lvis.json indicates training with the four datasets but with Objaverse-LVIS shapes excluded. ablation/train_shapenet_only.json indicates training with ShapeNet shapes only.
  • gpt4_filtering.json: filtering results of Objaverse raw texts, generated with GPT4.
  • point_feat_knn.npy: KNN indices calculated using shape features, used for hard mining during training.
  • modelnet40/test_split.json: list of ModelNet40 test shapes.
  • modelnet40/test_pc.npy: point clouds of ModelNet40 test shapes, 10000 x 3.
  • modelnet40/cat_name_pt_feat.npy: text features of ModelNet40 category names, prompt engineering used.
  • lvis_cat_name_pt_feat.npy: text features of Objaverse-LVIS category names, prompt engineering used.
  • scanobjectnn/xyz_label.npy: point clouds and labels of ScanObjectNN test shapes.
  • scanobjectnn/cat_name_pt_feat.npy: text features of ScanObjectNN category names, prompt engineering used.

All text features are extracted using OpenCLIP (ViT-bigG-14, laion2b_s39b_b160k); a small usage sketch follows below.
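
For illustration, here is a hedged sketch of CLIP-style zero-shot classification with these files: L2-normalized shape embeddings are compared to the category text features by cosine similarity. The array layout of the .npy file is an assumption, and the shape embeddings below are random placeholders standing in for the output of the OpenShape encoder.

import numpy as np

# Hedged sketch of CLIP-style zero-shot classification. Assumption: the .npy file
# holds a (num_categories, feat_dim) array of OpenCLIP text features.
cat_feat = np.asarray(np.load("meta_data/modelnet40/cat_name_pt_feat.npy", allow_pickle=True),
                      dtype=np.float32)
cat_feat /= np.linalg.norm(cat_feat, axis=-1, keepdims=True)

# Placeholder shape embeddings; in practice these come from the OpenShape encoder.
shape_emb = np.random.randn(8, cat_feat.shape[-1]).astype(np.float32)
shape_emb /= np.linalg.norm(shape_emb, axis=-1, keepdims=True)

similarity = shape_emb @ cat_feat.T              # cosine similarities
top5 = np.argsort(-similarity, axis=-1)[:, :5]   # top-5 category indices per shape
print(top5)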

Citation

If you find our code helpful, please cite our paper:

@misc{liu2023openshape,
      title={OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding}, 
      author={Minghua Liu and Ruoxi Shi and Kaiming Kuang and Yinhao Zhu and Xuanlin Li and Shizhong Han and Hong Cai and Fatih Porikli and Hao Su},
      year={2023},
      eprint={2305.10764},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

openshape_code's People

Contributors

colin97, eliphatfs


openshape_code's Issues

Issue about the xyz channels in example.py

Hi, thanks for your code. I tried to run python src/example.py. However, it returns 'RuntimeError: Given groups=1, weight of size [64, 9, 1, 1], expected input[1, 10, 64, 384] to have 9 channels, but got 10 channels instead', raised from self.mlp in 'PointNetSetAbstraction' in models/pointnet_util.py. The cause may lie in the number of channels of xyz (4 in example.py). Is there any solution?

Thank you very much.

Parameter setting of the zero-shot inference

Hi,

I'm curious about the parameter settings used at inference time. I assume you use the same settings across the downstream classification datasets as in the provided demo. Could you please share more details?

Camera poses for rendered images

Hi authors,

Thanks for this amazing work. Could you provide the exact camera poses for the 12 rendered images, or the rendering code?

Thank you so much!

About inputs to ModelNet

Hi,
Thanks for your great work! I have a question about the inputs to ModelNet. In the paper, you mention that for ModelNet the input is 10k points without color. But in the code, self.use_color = config.dataset.use_color in ModelNetTest is set to True according to the config file. I wonder if this is a mistake or if I missed something. Thanks for your time!

Best,

Why use the relative position encoding

Hi,

I noticed that you use the relative positions of the centroids as the input to the position encoding:
centroid_delta = centroids.unsqueeze(-1) - centroids.unsqueeze(-2)
May I know the reason for using relative encoding rather than the commonly used absolute position encoding?

ClipCaptionModel errors

Hi @Colin97, when running the following code from caption.py (in openshape-demo-support/openshape/demo):

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
prefix_length = 10
model = ClipCaptionModel(prefix_length)
# print(model.gpt_embedding_size)
model.load_state_dict(torch.load(hf_hub_download('OpenShape/clipcap-cc', 'conceptual_weights.pt', token=True), map_location='cpu'))

it reports the error:

RuntimeError: Error(s) in loading state_dict for ClipCaptionModel:
        Unexpected key(s) in state_dict: "gpt.transformer.h.0.attn.bias", "gpt.transformer.h.0.attn.masked_bias", "gpt.transformer.h.1.attn.bias", 
"gpt.transformer.h.1.attn.masked_bias", "gpt.transformer.h.2.attn.bias", "gpt.transformer.h.2.attn.masked_bias", "gpt.transformer.h.3.attn.bias", 
"gpt.transformer.h.3.attn.masked_bias", "gpt.transformer.h.4.attn.bias", "gpt.transformer.h.4.attn.masked_bias", "gpt.transformer.h.5.attn.bias", 
"gpt.transformer.h.5.attn.masked_bias", "gpt.transformer.h.6.attn.bias", "gpt.transformer.h.6.attn.masked_bias", "gpt.transformer.h.7.attn.bias", 
"gpt.transformer.h.7.attn.masked_bias", "gpt.transformer.h.8.attn.bias", "gpt.transformer.h.8.attn.masked_bias", "gpt.transformer.h.9.attn.bias", 
"gpt.transformer.h.9.attn.masked_bias", "gpt.transformer.h.10.attn.bias", "gpt.transformer.h.10.attn.masked_bias", "gpt.transformer.h.11.attn.bias", 
"gpt.transformer.h.11.attn.masked_bias".

What is wrong?

I tried to modify src/example.py to use the PointBERT model, but got an error.

Traceback (most recent call last):
File "src/example.py", line 106, in
shape_feat = model(xyz, feat, device='cuda', quantization_size=config.model.voxel_size)
File "/home/zhouziyang/anaconda3/envs/python37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zhouziyang/pyproject/UMVI/openshape/src/models/ppat.py", line 118, in forward
xyz.transpose(-1, -2).contiguous(), features.transpose(-1, -2).contiguous()
File "/home/zhouziyang/anaconda3/envs/python37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zhouziyang/pyproject/UMVI/openshape/src/models/ppat.py", line 97, in forward
centroids, feature = self.sa(xyz, features)
File "/home/zhouziyang/anaconda3/envs/python37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zhouziyang/pyproject/UMVI/openshape/src/models/pointnet_util.py", line 205, in forward
new_points = F.relu(bn(conv(new_points)))
File "/home/zhouziyang/anaconda3/envs/python37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zhouziyang/anaconda3/envs/python37/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 457, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/zhouziyang/anaconda3/envs/python37/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 454, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [64, 9, 1, 1], expected input[1, 10, 64, 384] to have 9 channels, but got 10 channels instead

Rendered images file is incomplete

Hello, I downloaded the rendered images file, but most of the compressed files, such as 3D-FUTURE.tar.gz, cannot be successfully uncompressed because EOF cannot be found. So I think the compressed files are incomplete. Could you check the compressed files of the rendered images, or provide a site where we can download the meshes (glb files) of the four 3D datasets so that we can use your rendering script?

Merge Objaverse XL

Do you plan to use Objaverse XL in your framework? When might this become available?

How can we easily retrieve the desired shape files (meshes) of objects using your code?

Hello author, thank you for your excellent work. I would like to know how to easily retrieve the desired mesh files of objects using your code. For example, I only want to retrieve the top N shape files that match the text description 'a vintage American sports car'. Could you please guide me on how to obtain the corresponding shape files? Thank you.

Inquiry about the Objaverse dataset

Thanks for your nice work.

I downloaded the Objaverse dataset using your download script. The final files are 185 GB as you mentioned, but I can't extract them (attached image).

  • Can you help me with that?
  • If I extract them, how much disk space do I need? I tried downloading the original Objaverse files; they are about 1.2 TB.
  • In the README you mention that the npy files contain the xyz (point cloud) of each shape. Did you sample these points from the original glb files, or are they the mesh vertices?
  • Could you please add a license? Which pre-trained models can we use commercially?

Demo application broken

Hello, I would like to try your demo application, but it seems to be broken:
https://huggingface.co/spaces/OpenShape/openshape-demo

When I open the demo there is an error:
File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
exec(code, module.dict)
File "/home/user/app/app.py", line 20, in
import openshape
File "/home/user/.cache/huggingface/hub/models--OpenShape--openshape-demo-support/snapshots/70dbc29fa30520cb78b4982de671f90600c08685/openshape/init.py", line 4, in
from .ppat_rgb import Projected, PointPatchTransformer
File "/home/user/.cache/huggingface/hub/models--OpenShape--openshape-demo-support/snapshots/70dbc29fa30520cb78b4982de671f90600c08685/openshape/ppat_rgb.py", line 5, in
from .pointnet_util import PointNetSetAbstraction
File "/home/user/.cache/huggingface/hub/models--OpenShape--openshape-demo-support/snapshots/70dbc29fa30520cb78b4982de671f90600c08685/openshape/pointnet_util.py", line 6, in
import dgl.geometry
File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/dgl/init.py", line 16, in
from .backend import load_backend, backend_name
File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/dgl/backend/init.py", line 109, in
load_backend(get_preferred_backend())
File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/dgl/backend/init.py", line 43, in load_backend
from .._ffi.base import load_tensor_adapter # imports DGL C library
File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/dgl/_ffi/base.py", line 45, in
_LIB, _LIB_NAME, _DIR_NAME = _load_lib()
File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/dgl/_ffi/base.py", line 35, in _load_lib
lib = ctypes.CDLL(lib_path[0])
File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/ctypes/init.py", line 374, in init
self._handle = _dlopen(self._name, mode)

scaling up strategy

Thank you so much for this great work! I noticed from your paper that scaling up plays a very important role.

May I ask what the model parameters of SparseConv and PointBERT are after scaling up, and what the scaling strategy is in detail?

Bug in Huggingface Demo

When uploading the demo/owl.ply file into the online demo, an error occurred:

File "support/snapshots/70dbc29fa30520cb78b4982de671f90600c08685/openshape/demo/misc_utils.py", line 97, in trimesh_to_pc
assert isinstance(scene_or_mesh, trimesh.Trimesh)
AssertionError

I found that the data pipeline in the demo only supports non-colored point clouds in the trimesh_to_pc function. Maybe you can modify this function in the demo to support a wider range of input files?

Questions on performance

Hi author,

Thank you for working on the great OpenShape. I have some questions on the performance of the models.

  1. I trained PointBERT with this codebase by running python3 src/main.py --trial_name pointbert_all model.name=PointBERT model.scaling=4 model.use_dense=True training.lr=0.0005 training.lr_decay_rate=0.967. I assumed this is the configuration without hard negative mining for PointBERT and found that the performance is less promising than with hard negative mining, e.g., -4% on Objaverse-LVIS. May I know the performance of pointbert_all without hard negative mining, so I can check whether I reproduced the correct results?
  2. For Table 1 and Table 2 in the paper, how do you report the results? I noticed that there is a std. dev. in Table 1 but no such statistic for Table 2.
  3. Regarding Q1, I'd like to know the effect of hard negative mining for PointBERT on the three benchmarks, since Table 3 in the paper covers SparseConv.

Thanks!

About Custom Mesh Data Preprocessing

Great work! Thanks for kindly sharing this project with us!

Based on your work, I want to run a shape retrieval demo on my custom mesh data (assume they are .obj files). I have noticed that the whole project is based on the point cloud representation. As the paper and repo mention, I think I should sample the mesh into a point cloud containing 10,000 points and align the gravity axis of the mesh.

However, I am not very sure about the preprocessing details. Could you please give me some hints or provide some example code (maybe the code you used to process the .glb files in Objaverse)?

Thanks!
