cvignac / digress
Code for the paper "DiGress: Discrete Denoising diffusion for graph generation"
License: MIT License
Thanks for sharing your great work!
I have a problem: I want to run experiments on the GuacaMol dataset, but it's not possible to train it on a single GPU (3090).
Do you have a multi-GPU version of the code available, even if it's not well organized?
I would appreciate any guidance or resources on how to set up multi-GPU training in PyTorch.
Thank you :)
In equation 5, you assume that
It appears that the regressor is trained on the real graphs, whereas in the original guided diffusion paper the classifier is trained on noisy images. Is this intentional? Many thanks.
Hi Clement,
Nice work here. I am wondering how to load the generated samples you provide (e.g. generated_samples_sbm.txt) into a list of networkx graphs. Is there code for this, or a package you use for the loading?
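Pending an answer from the authors, here is a minimal parser sketch. It assumes each sample in the file is written as an "N=<n>" line, an "X:" header, one line of n node types, an "E:" header, and then n rows of n edge types, with samples separated by blank lines. That layout is an assumption, so check it against the actual file. The helper names are hypothetical, and networkx is only imported inside to_networkx.

```python
# ASSUMED layout of each sample (verify against the real file):
#   N=<n>
#   X:
#   <n node types separated by spaces>
#   E:
#   <n rows of n edge types separated by spaces>
# followed by a blank line.


def parse_generated_samples(text):
    """Return a list of (node_types, adjacency) pairs, both as lists of ints."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]  # drop blank separators
    graphs = []
    i = 0
    while i < len(lines):
        if not lines[i].startswith("N="):
            i += 1  # skip anything that is not the start of a sample
            continue
        n = int(lines[i][2:])
        if not (lines[i + 1].startswith("X") and lines[i + 3].startswith("E")):
            raise ValueError(f"unexpected layout near line {i}: {lines[i:i + 4]}")
        node_types = [int(float(v)) for v in lines[i + 2].split()]
        adjacency = [[int(float(v)) for v in lines[i + 4 + r].split()] for r in range(n)]
        graphs.append((node_types, adjacency))
        i += 4 + n
    return graphs


def to_networkx(node_types, adjacency):
    """Build an undirected networkx.Graph; non-zero entries are treated as edges."""
    import networkx as nx  # imported here so the parser itself has no dependency

    g = nx.Graph()
    for idx, t in enumerate(node_types):
        g.add_node(idx, node_type=t)
    for a in range(len(adjacency)):
        for b in range(a + 1, len(adjacency)):
            if adjacency[a][b] != 0:
                g.add_edge(a, b, edge_type=adjacency[a][b])
    return g
```

Usage, if the layout matches: read the file with open("generated_samples_sbm.txt").read(), then build [to_networkx(*g) for g in parse_generated_samples(text)].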
Why is QM9RegressorDiscrete designed not to use extra_features? Couldn't the spectral features of the noisy graph help the regressor predict the target properties of the clean graph?
We tried to run the code with the command in the title, but found that target is [] in qm9_regressor_discrete.py, line 170. Can you provide any suggestions? Thanks.
In equation (1), you give the true posterior conditioned on
I am not sure how to get this expression from the given definitions. Here is what I have so far:
I am not sure how to get the
Thanks in advance.
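For reference, the standard Bayes-rule derivation (as in D3PM) is given below. It uses the row-vector convention: x is a one-hot row vector, Q^t is the one-step transition matrix, and \bar Q^t = Q^1 \cdots Q^t. This notation is assumed to match the paper's, so compare it against equation (1):

```latex
\begin{aligned}
q(x^{t-1} \mid x^t, x)
  &= \frac{q(x^t \mid x^{t-1}, x)\, q(x^{t-1} \mid x)}{q(x^t \mid x)}
   = \frac{q(x^t \mid x^{t-1})\, q(x^{t-1} \mid x)}{q(x^t \mid x)}
   && \text{(Bayes, then Markov property)} \\
q(x^t \mid x^{t-1} = e_k) &= e_k\, Q^t\, (x^t)^\top = \big(x^t (Q^t)^\top\big)_k \\
q(x^{t-1} = e_k \mid x) &= \big(x\, \bar Q^{t-1}\big)_k,
  \qquad q(x^t \mid x) = x\, \bar Q^{t}\, (x^t)^\top \\
\Rightarrow\quad q(x^{t-1} \mid x^t, x)
  &= \frac{x^t (Q^t)^\top \odot x\, \bar Q^{t-1}}{x\, \bar Q^{t}\, (x^t)^\top}
  \;\propto\; x^t (Q^t)^\top \odot x\, \bar Q^{t-1}
\end{aligned}
```

The denominator is exactly the sum of the numerator's entries, since \bar Q^t = \bar Q^{t-1} Q^t.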
Hi,
For the MOSES dataset, I used the SMILES file you put in '/generated_samples/generated_smiles_moses.txt' and the metric evaluation code provided by the MOSES repo. For DiGress, the Scaf metric on MOSES is around 0.9.
For the GuacaMol dataset, I used the SMILES file you put in '/generated_samples/digress_guacamol_smiles.txt' and code imported from fcd_torch. For DiGress, the FCD metric on GuacaMol is around 1.78.
May I know why there is such a big gap with the results reported in your paper? Could you provide the code for these metrics in the repo?
I would also like to know whether the results of the other baseline methods in your paper were computed with the same code.
BTW, why is the FCD metric on GuacaMol marked as higher-is-better in Table 4 of your paper?
Thank you very much.
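Some context for the last question: as far as we know, the GuacaMol benchmark (Brown et al., 2019) reports FCD as a score S = exp(-0.2 · FCD) rather than as the raw distance. That is why higher is better there. The sketch below shows the conversion under that definition. guacamol_fcd_score is a hypothetical helper name.

```python
import math


def guacamol_fcd_score(fcd: float) -> float:
    """Map a raw Frechet ChemNet Distance (lower is better) to the score
    S = exp(-0.2 * FCD), which lies in (0, 1] (higher is better)."""
    return math.exp(-0.2 * fcd)


# A raw FCD of 1.78 (the value reported above) corresponds to a score of about 0.70.
print(round(guacamol_fcd_score(1.78), 3))
```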
As mentioned in Section 3.2 (Denoising network parametrization) and Appendix B, a FiLM layer is used to incorporate edge features and global features. Why do you use a FiLM layer, and what other layers did you try for incorporating them?
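For readers unfamiliar with FiLM (feature-wise linear modulation, Perez et al., 2018): a global vector conditions a feature map through a learned per-channel scale and shift. The numpy sketch below uses global features y to modulate edge features E. The weight names, the (1 + scale) parametrization and the shapes are illustrative assumptions, not the DiGress layer itself. Note that the projections must output de channels (E's channel size) to broadcast against E.

```python
import numpy as np


def film(features, y, w_mul, w_add):
    """FiLM-modulate features (bs, n, n, d) with a global vector y (bs, dy).

    w_mul, w_add: (dy, d) projection matrices (hypothetical parameters).
    Uses the (1 + scale) form, so zero weights give the identity map.
    """
    scale = y @ w_mul  # (bs, d)
    shift = y @ w_add  # (bs, d)
    # Add singleton node axes so each graph's scale/shift broadcasts over all (i, j) pairs.
    scale = scale[:, None, None, :]
    shift = shift[:, None, None, :]
    return (1.0 + scale) * features + shift


rng = np.random.default_rng(0)
bs, n, de, dy = 2, 5, 8, 3
E = rng.normal(size=(bs, n, n, de))
y = rng.normal(size=(bs, dy))
out = film(E, y, rng.normal(size=(dy, de)), rng.normal(size=(dy, de)))
print(out.shape)  # (2, 5, 5, 8)
```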
I encountered a strange result during validation. The output is:
Starting train epoch...
Epoch X: Val NLL nan -- Val Atom type KL nan -- Val Edge type KL: nan
Val loss: nan Best val loss: 100000000.0000
The NLL is always nan. Why?
Hi,
Since your email address is unreachable, I apologise for sending the message here.
To better follow your work, could you please provide the checkpoint for the GuacaMol dataset?
Thank you very much!
Dear authors,
I am sorry to bother you. I ran the code with DDP, but it threw errors like this: "RuntimeError: The size of tensor a (128) must match the size of tensor b (0) at non-singleton dimension 1". I also noticed that you said this branch is not implemented for multi-GPU. Could you please tell me which parts should be modified to adapt it for DDP mode?
Best regards,
Lei
orca.cpp: In function ‘int writeResults(int, const char*)’:
orca.cpp:1341:13: warning: control reaches end of non-void function [-Wreturn-type]
1341 | fstream fout;
| ^~~~
orca.cpp: In function ‘int writeEdgeResults(int, const char*)’:
orca.cpp:1374:13: warning: control reaches end of non-void function [-Wreturn-type]
1374 | fstream fout;
| ^~~~
In the KL prior and L_{t-1} terms, you use the log of the probabilities to compute the KL divergence. See for example here. Why do you use log-probabilities instead of the probabilities directly? And if you take the log of the model's probabilities, shouldn't you do the same for whatever you're comparing to (e.g. limit_dist in kl_prior, or the true posterior for the L_{t-1} terms)?
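One relevant detail: PyTorch's torch.nn.functional.kl_div(input, target) expects input as log-probabilities and target as plain probabilities (unless log_target=True). It computes target * (log(target) - input) pointwise, so passing only one side in log-space is the intended usage. The numpy sketch below mirrors those semantics with reduction="sum", including the 0 · log 0 = 0 convention. It is an illustration, not the DiGress code, and kl_div_like_torch is a hypothetical name.

```python
import numpy as np


def kl_div_like_torch(log_q, p):
    """Sum of p * (log p - log_q), i.e. KL(p || q) with q given in log-space.

    Mirrors torch.nn.functional.kl_div(input=log_q, target=p, reduction="sum"):
    the model side is passed as log-probabilities and the reference side as
    probabilities. Terms with p == 0 contribute 0.
    """
    p = np.asarray(p, dtype=float)
    log_q = np.asarray(log_q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * (np.log(p[mask]) - log_q[mask])))


p = np.array([0.5, 0.5, 0.0])    # reference distribution (e.g. a prior)
q = np.array([0.25, 0.25, 0.5])  # model distribution
print(kl_div_like_torch(np.log(q), p))  # log(2), about 0.693
```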
Hi,
Great paper, and I really appreciate how well documented the code is! I'm trying to run the discrete model on QM9 (just python main.py), but my validation NLL seems to get stuck around 68-69 during training, and my test NLL is similarly high. Is this something you observe? Is there a different config or set of parameter values I should use to obtain ~23?
Thank you!
When I executed the command "python3 guidance/train_qm9_regressor.py +experiment=regressor_model.yaml", the following error occurred: "pytorch_lightning.utilities.exceptions.MisconfigurationException: The provided lr scheduler ExponentialLR doesn't follow PyTorch's LRScheduler API. You should override the LightningModule.lr_scheduler_step hook with your own logic if you are using a custom LR scheduler."
I'm wondering whether I installed the wrong versions of PyTorch (2.0 with CUDA 11.8) and PyTorch Geometric.
Is it possible to solve this by downgrading PyTorch or some other module? Thanks.
[Update 1]
When I tried to train a regressor after installing the guidance version following the README instructions (with PyTorch 1.11 + CUDA 11.3), a runtime error occurred: RuntimeError: object has no attribute sparse_csc_tensor
[Update 2]
With PyTorch 1.10 + CUDA 11.1, another error occurred when I tried to train a regressor after installing the guidance version following the instructions: AttributeError: module 'distutils' has no attribute 'version'
I found that the common solution is to downgrade setuptools, but if I downgrade setuptools to <= 59.5.0, running "train_qm9_regressor.py" leads to "Segmentation fault (core dumped)".
Is there anything else I can try? Thanks.
From what I understood, you use this code (until line 111) to align the molecules you plot in a chain. For certain molecules, the alignment causes a segmentation fault. Here is a minimal example to reproduce:
conda create -c conda-forge -n test-digress rdkit python=3.9
conda activate test-digress
conda install pip
pip install imageio==2.26.1
import rdkit
from rdkit import Chem
from rdkit.Chem import Draw, AllChem
from rdkit.Geometry import Point3D
import numpy as np
import os
import imageio
def mol_from_graphs(node_list, adjacency_matrix):
    """
    Convert one graph to an RDKit molecule.
    node_list: atom-type indices of the molecule (n,), -1 marks padding
    adjacency_matrix: bond types of the molecule (n x n), 0 = no bond
    """
    # map integer value to the atom symbol
    atom_decoder = ['C', 'N', 'O', 'F']
    # create empty editable mol object
    mol = Chem.RWMol()
    # add atoms to mol and keep track of index
    node_to_idx = {}
    for i in range(len(node_list)):
        if node_list[i] == -1:
            continue
        a = Chem.Atom(atom_decoder[int(node_list[i])])
        molIdx = mol.AddAtom(a)
        node_to_idx[i] = molIdx
    for ix, row in enumerate(adjacency_matrix):
        for iy, bond in enumerate(row):
            # only traverse half the symmetric matrix
            if iy <= ix:
                continue
            if bond == 1:
                bond_type = Chem.rdchem.BondType.SINGLE
            elif bond == 2:
                bond_type = Chem.rdchem.BondType.DOUBLE
            elif bond == 3:
                bond_type = Chem.rdchem.BondType.TRIPLE
            elif bond == 4:
                bond_type = Chem.rdchem.BondType.AROMATIC
            else:
                continue
            mol.AddBond(node_to_idx[ix], node_to_idx[iy], bond_type)
    try:
        mol = mol.GetMol()
    except rdkit.Chem.KekulizeException:
        print("Can't kekulize molecule")
        mol = None
    return mol


def plot_chain_molecules(mols, name='mols'):
    '''
    mols: list of RDKit molecule objects.
    '''
    # code to align all molecules to the 2D layout of the final molecule
    final_molecule = mols[-1]
    AllChem.Compute2DCoords(final_molecule)
    coords = []
    for i, atom in enumerate(final_molecule.GetAtoms()):
        positions = final_molecule.GetConformer().GetAtomPosition(i)
        coords.append((positions.x, positions.y, positions.z))
    for i, mol in enumerate(mols):
        AllChem.Compute2DCoords(mol)
        conf = mol.GetConformer()
        for j, atom in enumerate(mol.GetAtoms()):
            x, y, z = coords[j]
            conf.SetAtomPosition(j, Point3D(x, y, z))
    # generate frames for gif
    frame_paths = []
    for frame in range(len(mols)):
        # if frame==0:
        print(f'\nframe {frame}\n')
        file_name = os.path.join(f'{name}_frame_{frame}.png')
        Draw.MolToFile(mols[frame], file_name, size=(300, 300), legend=f"Frame {frame}")
        frame_paths.append(file_name)
    # save gif
    imgs = [imageio.imread(fn) for fn in frame_paths]
    gif_path = os.path.join(f'{name}.gif')
    print(f'gif_path {gif_path}\n')
    imgs.extend([imgs[-1]] * 10)
    imageio.mimsave(gif_path, imgs, subrectangles=True, fps=5)
    # save chains as grid
    img = Draw.MolsToGridImage(mols, molsPerRow=10, subImgSize=(200, 200))
    path_img = os.path.join(f'{name}_grid_image.png')
    print(f'path_img {path_img}\n')
    img.save(path_img)


if __name__ == '__main__':
    node_l_ref = np.array([0., 2., 2., 0., 0., 0., 1., 0., 2.], dtype=np.float32)
    adj_m_ref = np.array([[0., 0., 0., 0., 1., 0., 1., 0., 2.],
                          [0., 0., 0., 0., 0., 0., 0., 1., 3.],
                          [0., 0., 0., 0., 0., 0., 0., 0., 0.],
                          [0., 0., 0., 0., 0., 0., 0., 3., 0.],
                          [1., 0., 0., 0., 0., 0., 4., 0., 2.],
                          [0., 0., 0., 0., 0., 0., 1., 1., 0.],
                          [1., 0., 0., 0., 4., 1., 0., 0., 1.],
                          [0., 1., 0., 3., 0., 1., 0., 0., 0.],
                          [2., 3., 0., 0., 2., 0., 1., 0., 0.]], dtype=np.float32)
    mol_ref = mol_from_graphs(node_l_ref, adj_m_ref)
    node_l_error = np.array([0., 2., 0., 0., 1., 0., 0., 0., 1.], dtype=np.float32)
    adj_m_error = np.array([[0., 4., 4., 0., 0., 3., 0., 4., 4.],
                            [4., 0., 3., 4., 4., 4., 1., 2., 4.],
                            [4., 3., 0., 4., 4., 1., 4., 2., 4.],
                            [0., 4., 4., 0., 4., 4., 4., 3., 4.],
                            [0., 4., 4., 4., 0., 1., 4., 4., 0.],
                            [3., 4., 1., 4., 1., 0., 4., 4., 4.],
                            [0., 1., 4., 4., 4., 4., 0., 4., 4.],
                            [4., 2., 2., 3., 4., 4., 4., 0., 4.],
                            [4., 4., 4., 4., 0., 4., 4., 4., 0.]], dtype=np.float32)
    mol_error = mol_from_graphs(node_l_error, adj_m_error)
    node_l_correct = np.array([0., 0., 0., 1., 0., 0., 0., 0., 0.], dtype=np.float32)
    adj_m_correct = np.array([[0., 0., 1., 0., 0., 0., 0., 0., 0.],
                              [0., 0., 0., 0., 0., 0., 0., 1., 0.],
                              [1., 0., 0., 0., 1., 1., 1., 1., 0.],
                              [0., 0., 0., 0., 0., 0., 0., 0., 0.],
                              [0., 0., 1., 0., 0., 0., 0., 0., 0.],
                              [0., 0., 1., 0., 0., 0., 0., 0., 0.],
                              [0., 0., 1., 0., 0., 0., 0., 0., 0.],
                              [0., 1., 1., 0., 0., 0., 0., 0., 0.],
                              [0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=np.float32)
    mol_correct = mol_from_graphs(node_l_correct, adj_m_correct)
    plot_chain_molecules(mols=[mol_ref, mol_correct], name='correct_aligned')
    # plot_chain_molecules(mols=[mol_ref, mol_error], name='seg_fault_aligned')
Note that all molecules I am using here were generated using a partially trained digress model on qm9 with a marginal transition and no extra features. Because the code throws a segmentation fault in these cases, the whole script fails (usually at the testing phase).
So my questions are:
1- Is this alignment necessary at all? I tried plotting without alignment and the molecules look reasonable, but perhaps harder to compare.
2- Did you try other ways of aligning the molecules? Using RDKit's AlignMol, for example.
3- Any tips on avoiding this segmentation error?
Thanks in advance!
It looks like the gradients of y_mlp_out and of all components involving y in the last transformer layer are None, so this part of the model is not being trained. The components for the other inputs (X and E) seem to be working normally.
To reproduce the behavior, replace the 'trainer' line with this code:
model = model.train()
# print(f'model {count_parameters(model)}')
print('==== Done loading the model...')
optimizer = torch.optim.AdamW(model.parameters(), lr=cfg.train.lr, amsgrad=True,
                              weight_decay=cfg.train.weight_decay)
train_loader = datamodule.train_dataloader()
losses = []
for epoch in range(cfg.train.n_epochs):
    total_loss = 0
    for i, train_samples in enumerate(train_loader):
        # train_samples = train_samples.to(device)
        loss = model.training_step(train_samples, i)  # loss for one batch
        loss = loss['loss']
        loss.backward()
        for name, p in model.named_parameters():
            if '_y' in name:
                print(f'name: {name}, requires {p.requires_grad}, p {p}, grad, {p.grad}\n')
        optimizer.step()
        optimizer.zero_grad()
        total_loss += loss.cpu().detach().numpy()
        exit()  # just to show None in one iteration
The problem appears to be that the 'y' output is not used when computing the loss. I am not sure how to use a cross-entropy loss on X and E alone while still backpropagating to the 'y' layers.
It's also not clear why the None gradients only appear in the last layers.
Do you by any chance have a reference for the 6-cycle count formula in Appendix B2? I am having a hard time matching the formula (or the implementation in KNodeCycles.k6_cycle) to that of Chang and Fu 2003. Thank you!
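When checking a closed-form k-cycle count, such as the 6-cycle formula or KNodeCycles.k6_cycle, it can help to compare against brute-force enumeration on small graphs. The pure-Python sketch below counts simple k-cycles by depth-first search. It is a slow reference for testing, not the formula used in the paper. The trace formula for triangles, tr(A^3)/6, is included as a sanity check.

```python
def count_k_cycles(adj, k):
    """Count simple cycles of length k (k >= 3) in an undirected graph.

    adj: symmetric 0/1 adjacency matrix as a list of lists.
    Each cycle is enumerated from its smallest vertex, once per direction,
    so the raw count is halved at the end.
    """
    n = len(adj)
    nbrs = [[j for j in range(n) if adj[i][j]] for i in range(n)]
    count = 0

    def dfs(start, node, length, visited):
        nonlocal count
        for nxt in nbrs[node]:
            if nxt == start and length == k:
                count += 1  # closed a cycle through exactly k vertices
            elif nxt > start and nxt not in visited and length < k:
                visited.add(nxt)
                dfs(start, nxt, length + 1, visited)
                visited.remove(nxt)

    for s in range(n):
        dfs(s, s, 1, {s})
    return count // 2


def triangles_by_trace(adj):
    """tr(A^3) / 6: each triangle is counted from 3 start vertices x 2 directions."""
    n = len(adj)
    a2 = [[sum(adj[i][m] * adj[m][j] for m in range(n)) for j in range(n)] for i in range(n)]
    return sum(a2[i][m] * adj[m][i] for i in range(n) for m in range(n)) // 6


K4 = [[int(i != j) for j in range(4)] for i in range(4)]
print(count_k_cycles(K4, 3), triangles_by_trace(K4), count_k_cycles(K4, 4))  # 4 4 3
```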
Hi, thanks for the great work!
Looking at NodeEdgeBlock, it seems that attention is applied assuming full connectivity in the graph (except for invalid nodes masked out by node_mask). Is this the case? If so, is there a reason why you didn't go with the graph attention layer implementation here, which takes the connectivity into account?
Thanks!
Yawar
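The distinction in the question can be made concrete. With full attention, every valid node attends to every other valid node. Graph attention instead restricts the softmax to neighbours. The numpy sketch below implements single-head dot-product attention with an optional adjacency mask, adding self-loops so every row has at least one allowed entry. It is illustrative only and not the NodeEdgeBlock implementation.

```python
import numpy as np


def attention_weights(x, adj=None):
    """Single-head dot-product attention weights over nodes.

    x:   (n, d) node features (used as both queries and keys here, for brevity).
    adj: optional (n, n) 0/1 adjacency. If given, attention is restricted to
         neighbours plus the node itself; if None, attention is fully connected.
    """
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    if adj is not None:
        allowed = (np.asarray(adj) > 0) | np.eye(n, dtype=bool)
        scores = np.where(allowed, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)


x = np.random.default_rng(0).normal(size=(3, 4))
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path graph 0 - 1 - 2
print(attention_weights(x, path)[0, 2], (attention_weights(x) > 0).all())  # 0.0 True
```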
Hello!
I ran the discrete code with the default SBM config (batch_size=512):
python3 main.py dataset=sbm
but it ended with torch.cuda.OutOfMemoryError: CUDA out of memory.
Server environment:
Then I tried reducing batch_size and found that batch_size=8 works, while batch_size >= 16 runs out of memory.
I noticed that you use batch_size=512 for SBM, so I am wondering whether this is normal for my setup. The SBM experiment would take 6 days to finish with batch_size=8. Is it possible to use a larger batch_size on my current GPU server?
Thanks a lot!
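Here is a rough way to see why batch size matters so much here: a dense edge tensor scales linearly with the batch size and quadratically with the (padded) number of nodes. The back-of-the-envelope sketch below uses placeholder values for node count and channel width, not the actual SBM statistics or DiGress hidden dimensions. Real usage is a multiple of this figure, because each layer keeps several such tensors for backpropagation.

```python
def dense_edge_tensor_gib(batch_size, n_max, channels, bytes_per_elem=4):
    """Memory (GiB) of one float tensor of shape (batch_size, n_max, n_max, channels)."""
    return batch_size * n_max * n_max * channels * bytes_per_elem / 2**30


# Placeholder values: 200 padded nodes, 64 channels, float32.
for bs in (8, 16, 512):
    print(bs, round(dense_edge_tensor_gib(bs, n_max=200, channels=64), 2))
```

To get a larger effective batch without the memory cost, the usual route is gradient accumulation, for example PyTorch Lightning's Trainer(accumulate_grad_batches=...).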
The loss function used in training is CrossEntropy(model(G_t), G), whereas traditionally in diffusion models it is CrossEntropy(model(G_t), G_{t-1}). Does using the latter make training worse in your experience? Thank you!
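For context, predicting the clean graph does not rule out a model of G_{t-1}. The prediction can be turned into a reverse-step distribution by marginalising the exact posterior over the predicted clean state. The numpy sketch below does this for a single categorical node using the mixture form p(x_{t-1} | x_t) = Σ_c p̂(x = c) · q(x_{t-1} | x_t, x = c). The uniform-style transition matrices are placeholders, and the function names are hypothetical.

```python
import numpy as np


def posterior(x_t, x0, Q_t, Qbar_prev):
    """q(x_{t-1} | x_t, x_0) for one categorical variable (row-vector convention).

    x_t, x0: one-hot row vectors (K,); Q_t: one-step transition (K, K);
    Qbar_prev: cumulative transition up to step t-1 (K, K).
    """
    unnorm = (x_t @ Q_t.T) * (x0 @ Qbar_prev)
    return unnorm / unnorm.sum()


def reverse_step(x_t, p_x0, Q_t, Qbar_prev):
    """Mix the exact posteriors with the network's predicted clean-state probabilities."""
    K = len(p_x0)
    eye = np.eye(K)
    return sum(p_x0[c] * posterior(x_t, eye[c], Q_t, Qbar_prev) for c in range(K))


def uniform_transition(K, beta):
    """Placeholder transition: keep the state w.p. 1 - beta, else resample uniformly."""
    return (1 - beta) * np.eye(K) + beta * np.ones((K, K)) / K


K = 3
Q_t, Qbar_prev = uniform_transition(K, 0.1), uniform_transition(K, 0.3)
x_t = np.eye(K)[0]
p = reverse_step(x_t, np.array([0.2, 0.7, 0.1]), Q_t, Qbar_prev)
print(p, p.sum())
```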
May I know how to change batch size?
I have a question about your use of cross-entropy over nodes/edges when mini-batching graphs. If I understood your implementation correctly, to compute the loss for one minibatch, you compute the cross-entropy of a single node and a single edge in your minibatch of graphs. These cross-entropies are averaged over the entire minibatch, then combined via the following formula:
To me
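Two hypothetical graphs make the distinction behind this question concrete. Averaging the per-node cross-entropies over the whole minibatch weights each graph by its size. Averaging per graph first weights every graph equally. The numpy illustration below uses made-up loss values.

```python
import numpy as np

# Hypothetical per-node cross-entropy values for two graphs in one minibatch.
node_losses = [np.array([1.0, 3.0]),            # small graph, 2 nodes
               np.array([1.0, 1.0, 1.0, 1.0])]  # larger graph, 4 nodes

flat_mean = np.concatenate(node_losses).mean()             # every node weighted equally
per_graph_mean = np.mean([g.mean() for g in node_losses])  # every graph weighted equally
print(flat_mean, per_graph_mean)  # 1.333... vs 1.5
```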
Thank you for sharing the code. When I set dataset to comm20 or sbm in configs/config.yaml and invoke python main.py, I encounter the issue described in the title. Is this expected? Thanks.
Hi, I have read your code and tried to use your model on our own dataset. I noticed that you define some statistics for nodes and edges, for example in QM9infos in qm9_dataset:
self.n_nodes = torch.tensor([0, 2.2930e-05, 3.8217e-05, 6.8791e-05, 2.3695e-04, 9.7072e-04, 0.0046472, 0.023985, 0.13666, 0.83337])
self.node_types = torch.tensor([0.7230, 0.1151, 0.1593, 0.0026])
self.edge_types = torch.tensor([0.7261, 0.2384, 0.0274, 0.0081, 0.0])
I'm wondering how to obtain these values if I want to apply DiGress to my own dataset. Is there any code to generate them?
Thanks a lot!
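Judging from the QM9 values, these appear to be empirical frequencies over the training set. n_nodes seems to be indexed by the number of nodes, and edge_types seems to start with a "no bond" class. The numpy sketch below computes such statistics for a custom dataset. The input format, counting every ordered pair i != j for edge types, and the function name are all assumptions. Compare its output on QM9 against the hard-coded values before relying on it.

```python
import numpy as np


def dataset_marginals(graphs, num_node_types, num_edge_types):
    """Empirical statistics in the spirit of QM9infos (hypothetical helper).

    graphs: iterable of (node_types, adjacency) with integer node types and an
    (n, n) integer matrix of edge types where 0 means "no edge".
    Returns (n_nodes, node_types, edge_types) as normalised numpy arrays:
      n_nodes[k]    = fraction of graphs with exactly k nodes
      node_types[c] = fraction of nodes of type c
      edge_types[c] = fraction of ordered node pairs (i != j) with edge type c
    """
    graphs = list(graphs)
    n_nodes = np.bincount([len(nodes) for nodes, _ in graphs]).astype(float)
    node_counts = np.zeros(num_node_types)
    edge_counts = np.zeros(num_edge_types)
    for nodes, adj in graphs:
        node_counts += np.bincount(np.asarray(nodes, dtype=int), minlength=num_node_types)
        adj = np.asarray(adj, dtype=int)
        off_diagonal = ~np.eye(len(nodes), dtype=bool)  # skip self-pairs
        edge_counts += np.bincount(adj[off_diagonal], minlength=num_edge_types)
    return (n_nodes / n_nodes.sum(),
            node_counts / node_counts.sum(),
            edge_counts / edge_counts.sum())
```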
Hello!
I implemented my own dataset class, and it worked quite well.
However, when I try to add graph labels, i.e., populate 'y' in the data loader with one-hot encoded vectors, I get a size mismatch in the forward function of the transformer model when computing self.mlp_out_y.
The problem is not in the data loader itself: the batches are generated correctly. But when I print the tensor shapes in the forward method, 'X' and 'E' have the correct shape while 'y' does not. From the second batch on, 'y' has just one dimension and is filled with 0s. The first batch seems to be correct.
I thought it was a problem with my implementation, but I tried changing the size of 'y' in the spectre dataset (y = torch.zeros([1, 2]).float()) in __getitem__(self, idx), for example, and the same problem occurred. The first batch is fine, and from the second batch on, the shape of 'y' is incorrect in the forward method.
So my two questions are:
Thank you! :)
Is DiscreteDenoisingDiffusion.compute_extra_data() intended to be differentiable, or does its differentiability not matter?
Hi, thanks for your code so much!
I'm wondering whether I can reproduce the results in the paper by simply running main.py without changing any hyperparameters. If I have to change hyperparameters, which ones should I choose?
Thanks so much for your kind reply!
May I know what orca.cpp is used for?
The code uses beta_bar_t = 1 - alpha_bar_t,
instead of beta_bar_t = \prod (1 - alpha_t) as in the paper (https://openreview.net/pdf?id=UaAD-Nu86WX).
This might be an error.
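Here is one way to check which expression is self-consistent. Suppose each step has the form Q^t = α^t I + (1 - α^t) 1 m' with m a probability vector (as for a marginal transition). Then the product Q^1 ⋯ Q^t has the same form with coefficient ᾱ^t = ∏ α^s on I and 1 - ᾱ^t on 1 m', which also keeps every row summing to 1. ∏(1 - α^s) generally does not satisfy this. The numpy sketch below uses arbitrary α values and a random m.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4
m = rng.random(K)
m /= m.sum()                         # marginal distribution over K classes
alphas = np.array([0.95, 0.9, 0.8])  # arbitrary per-step alpha^t values


def step_matrix(alpha):
    """Q = alpha * I + (1 - alpha) * 1 m^T (rows sum to 1)."""
    return alpha * np.eye(K) + (1 - alpha) * np.outer(np.ones(K), m)


Qbar = np.eye(K)
for a in alphas:
    Qbar = Qbar @ step_matrix(a)

alpha_bar = np.prod(alphas)
closed_form = alpha_bar * np.eye(K) + (1 - alpha_bar) * np.outer(np.ones(K), m)
print(np.allclose(Qbar, closed_form))      # True: 1 - alpha_bar matches the product
print(np.prod(1 - alphas), 1 - alpha_bar)  # the two candidate beta_bar values differ
```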
Hi Clement, In sec 5 of the paper, you mentioned "p_η cannot be evaluated for all possible values of
Are there any other method (alternative to your proposed 1st-order approximation) to do classifier-guided discrete diffusion that you are aware of? Is straight-through estimation a viable option? Thank you!
Hi, do you have a pre-built docker image that can be used to run DiGress by any chance? I would highly appreciate that.
Hi Clement, in the file src/models/transformer_model.py, line 159, you intend to compute the unnormalized attention scores, i.e. the dot product of the query and key vectors. However, the code only multiplies the query and key vectors element-wise, without summing over the feature dimension. This effectively computes a separate attention score for each feature dimension.
On line 184, you comment that the shape of attn is 'bs, n, n, n_head', although it is actually 'bs, n, n, n_head, df'. This can be seen on line 191, where attn is multiplied with a tensor of shape '(bs, 1, n, n_head, df)'.
I couldn't find any comment on this in the paper, so I'm wondering whether it is on purpose or a bug.
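The shape difference described above can be reproduced in isolation. The numpy sketch below contrasts an element-wise query-key product, which gives one score per feature with shape (bs, n, n, n_head, df), with the standard scaled dot product, which sums over df and gives shape (bs, n, n, n_head). The sizes are arbitrary, and the rest of the layer is not reproduced.

```python
import numpy as np


def per_feature_scores(Q, K):
    """Element-wise Q_i * K_j without summing over df: (bs, n, n, n_head, df)."""
    return Q[:, :, None] * K[:, None, :]


def dot_product_scores(Q, K):
    """Standard scaled dot-product scores, summed over df: (bs, n, n, n_head)."""
    df = Q.shape[-1]
    return per_feature_scores(Q, K).sum(axis=-1) / np.sqrt(df)


bs, n, n_head, df = 2, 5, 4, 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(bs, n, n_head, df))
K = rng.normal(size=(bs, n, n_head, df))
print(per_feature_scores(Q, K).shape)  # (2, 5, 5, 4, 8)
print(dot_product_scores(Q, K).shape)  # (2, 5, 5, 4)
```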
There is a comment on this line pointing out that you're using dx instead of de when computing FiLM from y to E. Shouldn't it be de actually?
I am talking about this line. I noticed Appendix E.2 of (Kingma, 2021) mentions a multiplication by T/2 as an unbiased estimator of the sum. D3PM computes the sum explicitly (if I understood their code correctly).
Is your multiplication by T also a form of unbiased estimator? Can you point me to any literature to understand how to derive this estimator?
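Here is the basic identity behind multiplying by T, assuming t is drawn uniformly from {1, ..., T}: E_t[T · L_t] = Σ_t (1/T) · T · L_t = Σ_t L_t. A single sampled term scaled by T is therefore an unbiased estimate of the full sum. The numpy check below uses made-up per-step losses and computes both the exact expectation and a Monte Carlo estimate.

```python
import numpy as np

T = 50
rng = np.random.default_rng(0)
losses = rng.random(T)  # made-up per-timestep losses L_1..L_T

full_sum = losses.sum()
exact_expectation = np.mean(T * losses)  # E_t[T * L_t] with t uniform on {1..T}
samples = T * losses[rng.integers(0, T, size=200_000)]  # Monte Carlo over random t
print(full_sum, exact_expectation, samples.mean())
```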
Hi Clement, thank you for the constant updates to the repository these days. I wanted to ask about a strange error I get while running the script python3 main.py +experiment=debug.yaml.
I have installed all the necessary packages following your environment installation instructions. However, before launching the full experiments, I wanted to run the debugging configuration and see how it performs.
Unfortunately, after outputting ret = run_job() and completing three processing stages at 100%, it starts to print "Invalid molecule obtained" while converting the QM9 dataset to SMILES for remove_h=True...
Do you know why this may occur? Thanks for the help!
Hi,
Is there any reason you only handle O+, N+, S+ in molecule_with_partial_charge? I am wondering if it's possible to apply the same logic to all atom types and positive and negative charges?
Best,
Hi Clement. I have some questions about the Guacamol data set.
When I run python3 main.py dataset=guacamol, I get the error urllib.error.HTTPError: HTTP Error 403: Forbidden. It looks like the download is blocked by some anti-crawler measure, so I downloaded the files directly via the download links provided in class GuacamolDataset, i.e., 'guacamol_v1_train.smiles', 'guacamol_v1_valid.smiles', 'guacamol_v1_test.smiles'. Then I obtained the processed data by commenting out the download_url code. Thank you!
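The 403 may be caused by the server rejecting Python's default urllib User-Agent, though that is an assumption and the actual cause may differ. If so, a request with a browser-like User-Agent header sometimes helps. The sketch below uses only the standard library. download_with_user_agent and build_request are hypothetical helpers, and the URL must be taken from GuacamolDataset.

```python
import shutil
import urllib.request

USER_AGENT = "Mozilla/5.0"  # any browser-like string; placeholder value


def build_request(url):
    """Create a GET request that sends a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def download_with_user_agent(url, dest_path, timeout=60):
    """Stream the response body at url into dest_path."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp, \
            open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
```

Usage: download_with_user_agent(<URL from GuacamolDataset>, "guacamol_v1_train.smiles"), and likewise for the valid and test files.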
Hello,
I work in drug discovery and am very interested in the application of this model in the generation of drug-molecules which contain a predefined motif, as you demonstrate in Appendix E. Would you be able to share a code example of the node and edge feature masking for motif preserving generation?
Wouldn't this require retraining the model on molecules containing the given motif, so that the noise model preserves the motif during diffusion? From this, I could see how masking (disallowing transitions for) the motif's nodes and edges during denoising, while letting everything else denoise normally, would yield structures that extend the motif... Or maybe I'm thinking about this wrong and no retraining is needed?
Best,
Talal
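On the last question: one training-free option often used for inpainting-style conditioning (e.g. RePaint for images) is to overwrite the motif positions after every reverse step. The network then only ever fills in the rest. The numpy sketch below shows that replacement for one sampled graph. It assumes one-hot node (n, dx) and edge (n, n, de) tensors and a hypothetical motif occupying the first k nodes. It is not the procedure from Appendix E, which is what the question asks about. A variant replaces the motif with a noised copy at the current step instead of the clean one.

```python
import numpy as np


def clamp_motif(X, E, motif_X, motif_E):
    """Overwrite the first k nodes and their mutual edges with the motif.

    X: (n, dx) one-hot node types of the current sample; E: (n, n, de) one-hot edges.
    motif_X: (k, dx); motif_E: (k, k, de). Edges between motif and non-motif
    nodes, and all other nodes, are left to the denoiser. Inputs are not modified.
    """
    k = motif_X.shape[0]
    X, E = X.copy(), E.copy()
    X[:k] = motif_X
    E[:k, :k] = motif_E
    return X, E


# Inside a (hypothetical) reverse-diffusion loop:
#     X, E = sample_step(X, E, t)            # model-based denoising step
#     X, E = clamp_motif(X, E, motif_X, motif_E)
```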
Hi Clement,
Thank you for the regular updates on your paper. I wanted to ask about applying DiGress to datasets other than MOSES, QM9, and the others used in the experiments. What I have are train/valid/test datasets of SMILES strings with ten reaction types from the US patent literature. I tried to use abstract_dataset.py to convert the strings to graph representations, but it caused a lot of bugs.
Do you recommend preparing a MOSES-style .csv file (SMILES, Split) and using moses_dataset.py in a similar fashion for my dataset? FYI, it is USPTO-50k: https://github.com/vsomnath/graphretro/tree/main/datasets/uspto-50k
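If you go the MOSES-style route, the conversion itself is simple. The standard-library sketch below merges several SMILES lists into one CSV with SMILES and SPLIT columns. The column names and split labels are assumptions modelled on the MOSES dataset file, so check them against what moses_dataset.py actually reads. write_moses_style_csv is a hypothetical helper.

```python
import csv
import io


def write_moses_style_csv(splits, out):
    """Write rows of (SMILES, SPLIT) to a file-like object.

    splits: mapping from split name (e.g. "train", "valid", "test") to a list
    of SMILES strings. Empty strings are skipped.
    """
    writer = csv.writer(out)
    writer.writerow(["SMILES", "SPLIT"])
    for split_name, smiles_list in splits.items():
        for smi in smiles_list:
            smi = smi.strip()
            if smi:
                writer.writerow([smi, split_name])


buf = io.StringIO()
write_moses_style_csv({"train": ["CCO", "c1ccccc1"], "test": ["CC(=O)O"]}, buf)
print(buf.getvalue())
```

When writing to disk, open the file with open(path, "w", newline="") so the csv module controls line endings.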
Hi,
You mentioned "Navigate to the ./util/orca directory and compile orca.cpp: g++ -O2 -std=c++11 -o orca orca.cpp", but I didn't find ./util/orca.
Thanks a lot.
Hello, would it be possible to obtain the code implementation for the 'E. Substructure conditioned generation' method discussed in the paper? I'm particularly interested in this section and would greatly appreciate your help. Thank you in advance!