
DiffPMAE

In this work, we propose DiffPMAE, an effective point cloud reconstruction architecture. Inspired by self-supervised learning concepts, we combine Masked Auto-Encoding and Diffusion Models to remotely reconstruct point cloud data. DiffPMAE can be extended to many related downstream tasks, including point cloud compression, upsampling, and completion, with minimal modifications.

GitHub repo: https://github.com/DiffPMAE/DiffPMAE

Datasets

We use ShapeNet-55 and ModelNet40 for training and validation of the models, and PU1K for upsampling validation. All datasets should be placed in the directory structure below; the scripts will read them automatically.
The overall directory structure should be:

│DiffPMAE/
├──dataset/
│   ├──ModelNet/
│   ├──PU1K/
│   └──ShapeNet55/
├──.......

ShapeNet-55:

│ShapeNet55/
├──ShapeNet-55/
│  ├── train.txt
│  └── test.txt
├──shapenet_pc/
│  ├── 02691156-1a04e3eab45ca15dd86060f189eb133.npy
│  ├── 02691156-1a6ad7a24bb89733f412783097373bdc.npy
│  ├── .......

Download: You can download the processed ShapeNet55 dataset from Point-BERT

ModelNet40:

│ModelNet40/
├──modelnet40_shape_names.txt
├──modelnet40_test.txt
├──modelnet40_test_8192pts_fps.dat
├──modelnet40_train.txt
└──modelnet40_train_8192pts_fps.dat

Download: You can download the processed ModelNet40 dataset from Point-BERT

PU1K:

│PU1K/
├──test/
│  ├── input_256/
│  ├── input_512/
│  ├── input_1024/
│  ├── input_2048/
│  │   ├── gt_8192/
│  │   │   ├── 11509_Panda_v4.xyz
│  │   │   ├── .......
│  │   ├── input_2048/  
│  │   │   ├── 11509_Panda_v4.xyz
│  │   │   ├── .......  
│  └── original_meshes/
│      ├── 11509_Panda_v4.off
│      ├── .......
├──train/
│  └── pu1k_poisson_256_poisson_1024_pc_2500_patch50_addpugan.h5

Download: You can download the processed PU1K dataset from PU-GCN

Requirements

python >= 3.7
pytorch >= 1.13.1
CUDA >= 11.6

pip install -r requirements.txt
# PointNet++
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"
# GPU kNN
pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl
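As a quick sanity check before installing, the minimum versions listed above can be compared programmatically. The helper below is a minimal sketch (not part of the repository) that compares dotted version strings against the listed minimums, ignoring build suffixes such as `+cu116`.

```python
def version_tuple(v):
    # "1.13.1+cu116" -> (1, 13, 1); the build suffix after "+" is dropped
    return tuple(int(p) for p in v.split("+")[0].split(".") if p.isdigit())

def meets_minimum(installed, minimum):
    """Return True if an installed version satisfies the listed minimum."""
    return version_tuple(installed) >= version_tuple(minimum)

if __name__ == "__main__":
    # Hypothetical installed versions, for illustration only
    print(meets_minimum("1.13.1+cu116", "1.13.1"))  # True
    print(meets_minimum("3.6.9", "3.7"))            # False
```

The same check applies to the CUDA toolkit version reported by `nvcc --version`.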

Pre-trained Models

Pre-trained models can be downloaded from Google Drive

The overall directory structure should be:

│DiffPMAE/
├──dataset/
├──pretrain_model/
│   ├──completion/
│   ├──compress/
│   ├──pretrain/
│   └──sr/
├──.......

Training

For training, train the Encoder first using the command below, then use the pre-trained Encoder to train the Decoder.

For encoder:

CUDA_VISIBLE_DEVICES=<GPU> python train_encoder.py

Hyperparameter settings can be adjusted in train_encoder.py:

# Experiment setting
parser.add_argument('--batch_size', type=int, default=4)
parser.add_argument('--val_batch_size', type=int, default=1)
parser.add_argument('--device', type=str, default='cuda')  # mps for mac
parser.add_argument('--save_dir', type=str, default='./results')
parser.add_argument('--log', type=bool, default=False)

# Grouping setting
parser.add_argument('--mask_type', type=str, default='rand')
parser.add_argument('--mask_ratio', type=float, default=0.75)
parser.add_argument('--group_size', type=int, default=32)
parser.add_argument('--num_group', type=int, default=64)
parser.add_argument('--num_points', type=int, default=2048)
parser.add_argument('--num_output', type=int, default=8192)

# Transformer setting
parser.add_argument('--trans_dim', type=int, default=384)
parser.add_argument('--depth', type=int, default=12)
parser.add_argument('--drop_path_rate', type=float, default=0.1)
parser.add_argument('--num_heads', type=int, default=6)

# Encoder setting
parser.add_argument('--encoder_dims', type=int, default=384)
parser.add_argument('--loss', type=str, default='cdl2')

# sche / optim
parser.add_argument('--learning_rate', type=float, default=0.001)
parser.add_argument('--weight_decay', type=float, default=0.05)
parser.add_argument('--eta_min', type=float, default=0.000001)
parser.add_argument('--t_max', type=float, default=200)
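The `eta_min` and `t_max` defaults above suggest a cosine-annealing learning-rate schedule (as in PyTorch's `CosineAnnealingLR`). A pure-Python sketch of that formula, assuming those defaults, shows what the two parameters control:

```python
import math

def cosine_annealing_lr(step, base_lr=0.001, eta_min=1e-6, t_max=200):
    """Cosine-annealed learning rate, following the CosineAnnealingLR formula:
    the rate starts at base_lr and decays to eta_min over t_max steps."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max)) / 2

# cosine_annealing_lr(0) equals base_lr; cosine_annealing_lr(200) ~= eta_min
print(cosine_annealing_lr(0), cosine_annealing_lr(200))
```

This is a sketch of the schedule's shape only; the training scripts construct the actual optimizer and scheduler internally.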

For decoder:

CUDA_VISIBLE_DEVICES=<GPU> python train_decoder.py

To load the pre-trained Encoder, you can change the following in train_decoder.py:

check_point_dir = os.path.join('./pretrain_model/pretrain/encoder.pt')

check_point = torch.load(check_point_dir)['model']
encoder = Encoder_Module(args).to(args.device)
encoder.load_state_dict(check_point)

Hyperparameter settings for the Decoder can be adjusted in train_decoder.py:

# Experiment setting
parser.add_argument('--batch_size', type=int, default=32)
parser.add_argument('--val_batch_size', type=int, default=1)
parser.add_argument('--device', type=str, default='cuda')  # mps for mac
parser.add_argument('--log', type=bool, default=True)
parser.add_argument('--save_dir', type=str, default='./results')

# Grouping setting
parser.add_argument('--mask_type', type=str, default='rand')
parser.add_argument('--mask_ratio', type=float, default=0.75)
parser.add_argument('--group_size', type=int, default=32) # points in each group
parser.add_argument('--num_group', type=int, default=64) # number of groups
parser.add_argument('--num_points', type=int, default=2048)
parser.add_argument('--num_output', type=int, default=8192)
parser.add_argument('--diffusion_output_size', default=2048)

# Transformer setting
parser.add_argument('--trans_dim', type=int, default=384)
parser.add_argument('--drop_path_rate', type=float, default=0.1)

# Encoder setting
parser.add_argument('--encoder_depth', type=int, default=12)
parser.add_argument('--encoder_num_heads', type=int, default=6)
parser.add_argument('--loss', type=str, default='cdl2')

# Decoder setting
parser.add_argument('--decoder_depth', type=int, default=4)
parser.add_argument('--decoder_num_heads', type=int, default=4)

# diffusion
parser.add_argument('--num_steps', type=int, default=200)
parser.add_argument('--beta_1', type=float, default=1e-4)
parser.add_argument('--beta_T', type=float, default=0.05)
parser.add_argument('--sched_mode', type=str, default='linear')
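With `sched_mode='linear'`, the variance schedule is presumably a linear ramp from `beta_1` to `beta_T` over `num_steps`, as in standard DDPM-style diffusion. A pure-Python sketch under that assumption (not the repository's implementation):

```python
def linear_beta_schedule(beta_1=1e-4, beta_T=0.05, num_steps=200):
    """Linearly spaced betas, plus the cumulative alpha-bar products
    used to noise a sample at an arbitrary timestep."""
    betas = [beta_1 + (beta_T - beta_1) * t / (num_steps - 1) for t in range(num_steps)]
    alpha_bars = []
    prod = 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    return betas, alpha_bars

betas, alpha_bars = linear_beta_schedule()
# alpha_bar shrinks toward 0: samples approach pure noise at t = num_steps
```

Smaller `beta_T` or fewer steps leaves more signal at the final timestep; the defaults above destroy nearly all structure by step 200.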

# sche / optim
parser.add_argument('--learning_rate', type=float, default=0.001)
parser.add_argument('--weight_decay', type=float, default=0.05)
parser.add_argument('--eta_min', type=float, default=0.000001)
parser.add_argument('--t_max', type=float, default=200)

Evaluation

For pre-train model:

python eval_diffpmae.py

For upsampling:

python eval_upsampling.py

For compression:

python eval_compression.py

The configuration for each task can be adjusted in the corresponding Python file.
For example, the model configuration for pre-train evaluation can be adjusted in eval_diffpmae.py at L14~45.

For experiment setup:

parser.add_argument('--batch_size', type=int, default=32)
# Batch size
parser.add_argument('--val_batch_size', type=int, default=1)
# Validation batch size
parser.add_argument('--device', type=str, default='cuda')
parser.add_argument('--log', type=bool, default=True)
# Neither the trained model nor the log is saved when set to False.
parser.add_argument('--save_dir', type=str, default='./results')
# The root directory for saved files.

For Grouping settings:

parser.add_argument('--mask_type', type=str, default='rand')
# Either 'rand' or 'block'
parser.add_argument('--mask_ratio', type=float, default=0.75)
parser.add_argument('--group_size', type=int, default=32)
# Points in each group
parser.add_argument('--num_group', type=int, default=64)
# Number of groups
parser.add_argument('--num_points', type=int, default=2048)
# Input size of the point cloud
parser.add_argument('--num_output', type=int, default=8192)
# Output size of the Encoder module
parser.add_argument('--diffusion_output_size', default=2048)
# Output size of the Decoder module
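For illustration, `mask_type='rand'` with `mask_ratio=0.75` presumably hides a random 75% of the `num_group` patch groups from the encoder. A minimal sketch of that selection (hypothetical helper, not the repository's implementation):

```python
import random

def random_group_mask(num_group=64, mask_ratio=0.75, seed=0):
    """Boolean mask over patch groups: True = masked (hidden from the Encoder)."""
    num_mask = int(num_group * mask_ratio)
    rng = random.Random(seed)
    masked = set(rng.sample(range(num_group), num_mask))
    return [g in masked for g in range(num_group)]

mask = random_group_mask()
print(sum(mask))  # 48 of the 64 groups are masked at a 0.75 ratio
```

A 'block' mask would instead hide spatially contiguous groups around a seed patch rather than independent random ones.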

For Transformer settings:

parser.add_argument('--trans_dim', type=int, default=384)
# Latent size
parser.add_argument('--drop_path_rate', type=float, default=0.1)

For Encoder settings:

parser.add_argument('--encoder_depth', type=int, default=12)
# Number of blocks in Encoder Transformer
parser.add_argument('--encoder_num_heads', type=int, default=6)
# Number of heads in each Transformer block
parser.add_argument('--loss', type=str, default='cdl2')
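The `cdl2` loss above presumably denotes the L2 Chamfer distance between predicted and ground-truth point sets. A naive O(N·M) pure-Python sketch for reference (real implementations use batched GPU kernels):

```python
def chamfer_l2(pts_a, pts_b):
    """Symmetric L2 Chamfer distance between two lists of 3D points:
    for each point, the squared distance to its nearest neighbour in the
    other set, averaged per set and summed over both directions."""
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(src, dst):
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(pts_a, pts_b) + one_way(pts_b, pts_a)

a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(chamfer_l2(a, a))  # 0.0 for identical point sets
```

Chamfer distance is permutation-invariant, which is why it is the standard reconstruction loss for unordered point clouds.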

For Decoder settings:

parser.add_argument('--decoder_depth', type=int, default=4)
# Number of blocks in Decoder Transformer
parser.add_argument('--decoder_num_heads', type=int, default=4)
# Number of heads in each Transformer block

For the diffusion process:

parser.add_argument('--num_steps', type=int, default=200)
parser.add_argument('--beta_1', type=float, default=1e-4)
parser.add_argument('--beta_T', type=float, default=0.05)
parser.add_argument('--sched_mode', type=str, default='linear')

For the optimizer and scheduler:

parser.add_argument('--learning_rate', type=float, default=0.001)
parser.add_argument('--weight_decay', type=float, default=0.05)
parser.add_argument('--eta_min', type=float, default=0.000001)
parser.add_argument('--t_max', type=float, default=200)

Acknowledgements

Our code is built based on PointMAE.
