princeton-computational-imaging / diffusion-sdf

Official code repository for the paper: “Diffusion-SDF: Conditional Generative Modeling of Signed Distance Functions”

License: MIT License

Python 89.91% Makefile 1.33% C++ 2.90% Cuda 5.86%

diffusion-sdf's Issues

pretrained weights release

Hi,

Congrats on having this paper accepted to ICCV. Do you have an updated plan for releasing the pretrained weights?

Best,
Shengyu

About evaluation during training

Hi, thank you for your great work! I am wondering why only a train loader is used during training. Without an evaluation after each epoch, it is hard to decide which epoch is best. Will evaluation be added in the future?

Data preprocessing for grid data

Could you please explain what data/grid_data/acronym/Couch/37cfcafe606611d81246538126da07a8/grid_gt.csv denotes? For example, the first line is "0.195652,-0.413043,-0.00724638,0.255061". Is (0.195652,-0.413043,-0.00724638) the coordinate of a sample point and 0.255061 its SDF value? Am I right?
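Under the reading proposed in the question (each row is x, y, z followed by the SDF value at that point), such a file could be parsed as in the minimal sketch below. The column layout is the asker's assumption, not something confirmed by the repository, and `parse_sdf_rows` is a hypothetical helper that is not part of the codebase.

```python
import csv
import io


def parse_sdf_rows(text):
    """Parse CSV text where each row is ASSUMED to be x, y, z, sdf."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        x, y, z, sdf = (float(v) for v in record)
        rows.append(((x, y, z), sdf))
    return rows


# First line of grid_gt.csv as quoted in the question.
sample = "0.195652,-0.413043,-0.00724638,0.255061\n"
point, sdf = parse_sdf_rows(sample)[0]
print(point, sdf)
```

For a real file, the same function could be applied to the file's contents read as text; a row with a different number of columns would raise a ValueError, which is one quick way to test the four-column assumption.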

The effect of VAE

Thank you for your work. I am confused about the role of the VAE here. Is the VAE used to speed up the diffusion model? With the VAE, the diffusion model learns in the latent space z instead of on the raw point clouds. Is that correct? If so, ignoring the large computational cost, could the VAE be dropped?

How long & Single Object

I am trying to use diffusion SDF for modeling a single object (where each instance is relatively similar to the other instances). I think the model is experiencing some sort of mode collapse and mostly producing an average shape (with some small variations of very easy-to-decipher features). Have you had any experience with this issue? Do you have any suggestions?

Related: what is the timeline for training? The README says a few days to get to a certain loss. Is a few days usually required? I'm currently running tests of ~1 day (up to 10k epochs, using only around 200-300 meshes, which has worked successfully for DeepSDF before). Am I maybe not waiting long enough?

I've also tried playing with:

LR
batch size
KLD weight
briefly/currently trying higher:
- plane resolution
- point cloud size

Up to 1 day of training, these all make small differences, but they mostly converge on the same or a similar solution.

Question about the number of classes

Hi,
I notice that Diffusion-SDF trains a VAE on multiple classes in the paper, but I only find the 'couch' class in this repo. Could I simply train a multi-class VAE using your code? I am wondering if you could provide more details about multi-class training.
Thanks

gensdf loss not falling

It seems that the gensdf loss does not decrease. Can you give me some advice on training the third stage (conditional generation)?

Question about data preprocess

May I kindly ask if it's possible for your team to consider open-sourcing the code for your dataset preprocessing as an example for beginners?

Diffusion fails when kld_weight is raised

I built a dataset and trained all 3 stages on it. It gives clean reconstructions but no new items or interpolations: generation after stage 3 is always either a data item or hash. I want to improve interpolation, but when I raise kld_weight, stages 1 and 2 train well and produce accurate reconstructions, while stage 3 fails.

When β is raised, the encoder produces more widely varying latents for the same data item. (For β 1e-5 L1 between latents is ~.05, for β 1e-4 it's ~.25, the trend continues for higher β.) When diffusion is trained on 1 set of extracted latents, it reconstructs them well and it can generate data items from noise.

But in end-to-end training, when the diffusion stage sees newly generated latents, it cannot reconstruct them at all. It performs worse than when generating from noise, producing only hash. It never learns, so the gensdf loss never stabilizes or drops.

Any idea what this problem is?
Thanks for your consideration.
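The post does not say how the L1 figures between latents were measured. One plausible way, sketched here under the assumption that the encoder returns a per-dimension mean and standard deviation and samples with the usual reparameterization z = mu + sigma * eps, is to draw two latents for the same item and average the absolute difference. The function name, latent size, and sigma values are illustrative and not taken from the repository.

```python
import random


def mean_l1_between_samples(mu, sigma, n_pairs=1000, seed=0):
    """Average per-dimension L1 distance between two latents drawn as mu + sigma * eps."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        z1 = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
        z2 = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
        total += sum(abs(a - b) for a, b in zip(z1, z2)) / len(mu)
    return total / n_pairs


mu = [0.0] * 8                                    # hypothetical 8-dim latent mean
tight = mean_l1_between_samples(mu, [0.1] * 8)    # narrow posterior (illustrative sigma)
wide = mean_l1_between_samples(mu, [0.5] * 8)     # wide posterior (illustrative sigma)
print(tight, wide)
```

In this toy setup the distance grows in proportion to sigma, which matches the qualitative trend reported above: a wider posterior gives more varied latents for the same item.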

About "grid_gt.csv"&"sdf_data.csv"

Thanks for your excellent work!
I have two questions about the data:
1. "grid_gt.csv" and "sdf_data.csv" are both SDF data derived from the mesh. What is the difference between these two ".csv" files?
2. Which point cloud formats can be used for completion, such as ".ply", ".pcd", or ".txt"? How can I load them?
Looking forward to your reply!

Technical question about point cloud condition

I have a technical question: When using PointNet for feature extraction on partial point clouds to use them as conditions, is your approach to directly extract features using a pre-trained model, or to train PointNet simultaneously with the main diffusion network?

Data Preprocessing

Hi, thank you for releasing this nice work! I want to build on it for some further experiments. Could you release the data preprocessing code?

About grid_gt.csv

How did you extract the grid_gt.csv file (e.g., grid_data/acronym/Couch/37cfcafe606611d81246538126da07a8/grid_gt.csv)? If the file contains the coordinates and SDF values of a uniformly spaced grid, shouldn't the length of the data be the cube of some integer (e.g., 128^3)? The length of the sample file is 468000, which is not the cube of an integer. Could you clarify how you extracted the grid_gt.csv file?
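The arithmetic in the question can be checked directly. The sketch below only confirms that 468000 is not a perfect cube; it says nothing about how the file was actually generated. `is_perfect_cube` is a hypothetical helper.

```python
def is_perfect_cube(n):
    """Return True if the non-negative integer n equals k**3 for some integer k."""
    k = round(n ** (1.0 / 3.0))
    # Check the neighbours of the rounded root to guard against floating-point error.
    return any((k + d) ** 3 == n for d in (-1, 0, 1))


print(is_perfect_cube(468000))    # False
print(is_perfect_cube(128 ** 3))  # True
print(77 ** 3, 78 ** 3)           # 456533 474552, the cubes on either side of 468000
```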

No reconstruction loss for VAE

Hello,

It seems that you dropped the reconstruction loss on the planar features (it is commented out in the BetaVAE object). Could you please elaborate on why we can expect the model to learn a sensible latent space without minimizing a reconstruction objective? Does the SDF objective somehow compensate for the missing reconstruction loss? Thanks for the answer.

num_modulations referenced but not used

Hi,

It seems like there could be something that was omitted or skipped - the num_modulations is defined in an if/else statement for the sdf_decoder but it's never used.

This is presumably to do with it being the same as the hidden_dim in the latent_in == True case - or the mod_net that is commented out. Not sure of the effects (yet) but raising the issue here in case it is useful/helpful.

see here:

Diffusion-SDF/train_sdf/models/archs/sdf_decoder.py

Lines 74 to 84 in 64dc177

    
    if latent_in:
        num_modulations = hidden_dim
        #num_modulations = hidden_dim * (num_layers - 1)
        first_dim_in += latent_size # num_modulations
        mod_act = nn.ReLU()
    else: # use shifting instead of concatenation
        # We modulate features at every *hidden* layer of the base network and
        # therefore have dim_hidden * (num_layers - 1) modulations, since the last layer is not modulated
        num_modulations = hidden_dim * (num_layers - 1)
        mod_act = nn.Identity()

pretrained models release

Any plan to release the pretrained models? Thanks.

About the dataset

So should I process the datasets following the other two papers?

Please add a license file to this repository <eom>

A question about conditional generation

Hello! Thanks for your great work!
I wonder whether, besides partial point clouds, other conditions such as images are supported in the code?

One Question about Data Preprocessing

Hi, thank you for your great work!
I have a question about the Acronym data:
In the paper, you keep the categories with at least 20 objects and obtain 106 classes in the end.
However, when I process the data, I end up with 119 classes. Do you use any other filtering strategies?
The following are the classes I obtained and the number of objects in each:
Horse 22
SideChair 76
Loveseat 24
Airplane 33
Vase 172
Gun 55
Cabinet 154
PottedPlant 101
Sideboard 77
Cup 62
Spoon 22
Armoire 98
SingleBed 32
FloorLamp 135
WallLamp 65
OfficeChair 119
Chair 310
Sink 41
Monitor 133
ChestOfDrawers 145
Mug 101
Picture 73
Laptop 183
Hammer 26
PersonStanding 57
Guitar 66
Chaise 39
TV 238
AccentTable 75
Mirror 46
AccentChair 54
Table 90
Bottle 44
MediaPlayer 63
Piano 39
DeskLamp 99
GameTable 25
Book 90
Telephone 35
Barstool 71
Dresser 218
ToyFigure 81
WallArt 58
Fireplace 35
Knife 58
Desktop 75
Speaker 71
Couch 400
Desk 196
StandingClock 28
QueenBed 102
LampPost 25
CellPhone 138
Keyboard 30
RoundTable 32
TableLamp 57
Calculator 37
NintendoDS 21
OfficeSideChair 44
EndTable 92
CoffeeTable 80
Bench 89
Painting 53
Pillow 28
CeilingLamp 60
WineBottle 28
Refrigerator 65
Stapler 38
DrinkingUtensil 30
Bowl 83
FileCabinet 20
LoftBed 25
FoodItem 66
Pencil 82
Plant 76
Recliner 29
Toilet 42
4Shelves 28
TrashBin 120
TissueBox 24
Rug 54
DiningTable 91
Camera 65
Basket 21
WallClock 46
PaperBox 33
PianoKeyboard 33
Curtain 57
Pizza 24
Candle 51
Poster 39
Headphones 24
Printer 44
USBStick 31
Books 24
Pan 36
WallUnit 48
Pen 27
DresserWithMirror 22
WineGlass 20
CeilingFan 20
ComputerMouse 20
Plate 36
LightSwitch 40
ToiletPaper 34
SodaCan 45
Oven 36
Sword 44
Microwave 26
TableClock 38
Stool 27
CerealBox 27
Mattress 21
PictureFrame 29
Nightstand 57
Lamp 30
TvStand 32
Fork 22
Room 20
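The threshold described in the question (keep a category only if it has at least 20 objects) amounts to a count-and-filter step. The sketch below shows only that step on made-up labels; whether the paper applies further filters is exactly what the question asks and is not answered here. `classes_with_min_objects` and the `ExampleClass` label are hypothetical.

```python
from collections import Counter


def classes_with_min_objects(labels, min_objects=20):
    """Count objects per class and keep classes with at least min_objects entries."""
    counts = Counter(labels)
    return {name: n for name, n in counts.items() if n >= min_objects}


# Made-up per-object labels: two classes pass the threshold, one falls just short.
labels = ["Couch"] * 400 + ["Room"] * 20 + ["ExampleClass"] * 19
kept = classes_with_min_objects(labels)
print(sorted(kept.items()))
```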

After stage 1 training, I can't get a meaningful mesh from a randomly sampled VAE latent. Is this expected?

Question about inference time

Hi! Really nice and elegant work!

I would like to ask what the inference time of the reported approach is, as a range or order of magnitude.

Thanks in advance
