cvignac / digress

code for the paper "DiGress: Discrete Denoising diffusion for graph generation"

License: MIT License

Python 69.01% C++ 30.99%

digress's Issues

A Little Question about your paper

Hello, I'm reading your paper and I really love your work! But I still have a small question about equation (1) in your paper. Is there a detailed proof of it? Thank you very much!

problem in multi-GPU setting

Thanks for sharing your great work!

I have a problem. I want to run experiments using the guacamol dataset, but it's not possible to train on a single GPU (3090).
Do you have a multi-GPU version of the code available, even if it's not well-organized?
I would appreciate it if you could provide some guidance or resources on how to set up multi-GPU training in PyTorch.

Thank you:)

Question about equation 5

In equation 5, you assume that $p_{\theta}(G^{t-1}|G^t)$ factorizes over nodes and edges independently. Is this assumption fair given that each $p_{\theta}(x_i|G^t)$ is the output of the graph transformer network, which uses all the other node/edge representations to estimate the output distribution for a single node?

Training the regressor guidance

It appears that the regressor is trained on the real graphs, whereas in the original guided diffusion paper, the classifier is trained on the noisy images. Is this intentional? Many thanks.

loading generated_samples txt files

Hi Clement,

Nice work here. I am wondering how to load the generated samples you provide (e.g. generated_samples_sbm.txt) into a list of networkx graphs. Is there code for this, or a package you use for the loading?

Regressor model and extra features

Why is QM9RegressorDiscrete designed to not use extra_features? Couldn't the spectral features of the noisy graph possibly help with the regressor on predicting target properties of a clean graph?

Question in “python3 guidance/train_qm9_regressor.py +experiment=regressor_model.yaml”

We tried to run the command in the title, but found that target is [] at line 170 of qm9_regressor_discrete.py. Can you provide any suggestions? Thanks.

How long does the code take to run on your GPU server?

Deriving the closed-form of the conditional true posterior

In equation (1), you give the true posterior conditioned on $x$ as : $q(z^{t-1}|z^t, x) \propto z^t (Q^t)' \odot x \bar{Q}^{t-1}$, which I am guessing comes from Austin (2021)'s Equation 3: $q(z^{t-1}|z^t, x) = \frac{z^t (Q^t)' \odot x \bar{Q}^{t-1}}{x \bar{Q}^{t} (z^t)'}$.

I am not sure how to get this expression from the given definitions. Here is what I have so far:

$$ \begin{align} q(z^{t-1}|z^t, x) &= \frac{q(z^t|z^{t-1},x) q(z^{t-1}|x)}{q(z^t|x)} \\ &= \frac{z^{t-1} Q^t \odot x \bar{Q}^{t-1}}{x \bar{Q}^t} \end{align} $$

I am not sure how to get the $(z^t)'$ at the bottom or get rid of the $z^{t-1}$ at the top. Any help is appreciated!

Thanks in advance.
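
A possible way to see the missing step, offered as a sketch rather than the authors' own derivation: by the Markov property $q(z^t|z^{t-1}, x) = q(z^t|z^{t-1})$, and for the observed one-hot $z^t$ this is the scalar $z^{t-1} Q^t (z^t)'$. Read as a function of $z^{t-1}$, i.e. stacked over every possible value of $z^{t-1}$, those scalars form the row vector $z^t (Q^t)'$. The denominator $q(z^t|x)$ is likewise the scalar $x \bar{Q}^t (z^t)'$. The NumPy check below compares Bayes' rule evaluated state by state against the closed form. The matrices are random row-stochastic placeholders and the number of categories is arbitrary; none of the names come from the repository.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4  # number of categories (illustrative)

def random_transition(k):
    """Random row-stochastic matrix: row i is q(next state | current state i)."""
    m = rng.random((k, k))
    return m / m.sum(axis=1, keepdims=True)

Q_bar_prev = random_transition(K)  # stands in for \bar{Q}^{t-1}
Q_t = random_transition(K)         # stands in for Q^t
Q_bar_t = Q_bar_prev @ Q_t         # \bar{Q}^t = \bar{Q}^{t-1} Q^t

x = np.eye(K)[1]    # clean state, one-hot row vector
z_t = np.eye(K)[2]  # noisy state at step t, one-hot row vector

# Bayes' rule, evaluated for one candidate value of z^{t-1} at a time.
bayes = np.empty(K)
for k in range(K):
    z_prev = np.eye(K)[k]
    likelihood = z_prev @ Q_t @ z_t   # q(z^t | z^{t-1}), a scalar
    prior = x @ Q_bar_prev @ z_prev   # q(z^{t-1} | x), a scalar
    evidence = x @ Q_bar_t @ z_t      # q(z^t | x), a scalar
    bayes[k] = likelihood * prior / evidence

# Closed form: one vector holding the posterior over all values of z^{t-1}.
closed_form = (z_t @ Q_t.T) * (x @ Q_bar_prev) / (x @ Q_bar_t @ z_t)
```

The two vectors agree entry by entry, which is why $z^{t-1}$ no longer appears explicitly in the numerator and why $(z^t)'$ shows up in the denominator.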

Questions about Metrics (1. Scaf on MOSES & 2. FCD on GuacaMol)

Hi,

For MOSES dataset, I used the smiles file you put in the dir '/generated_samples/generated_smiles_moses.txt', and the metrics evaluation code provided by MOSES repo. For DiGress, Scaf metric on MOSES is around 0.9.

For GuacaMol dataset, I used the smiles file you put in the dir '/generated_samples/digress_guacamol_smiles.txt', and the code imported from fcd_torch. For DiGress, FCD metric on GuacaMol is around 1.78.

May I know why there is such a big gap with the results reported in your paper? Can you provide these metric codes in the repo?

I would also like to know if the results of other baseline methods in your paper are tested using the same code?

BTW, why is the FCD metric on GuacaMol marked as "higher is better" in Table 4 of your paper?

Thank you very much.

Question about FiLM layer

As mentioned in Section 3.2 (Denoising Network Parametrization) and Appendix B, a FiLM layer is used to incorporate edge features and global features. Why do you use a FiLM layer, and what other layers did you try for this incorporation?

Validation results show nan all the time

I encountered a strange result during validation. The result is:

Starting train epoch...
Epoch X: Val NLL nan -- Val Atom type KL nan -- Val Edge type KL: nan
Val loss: nan Best val loss: 100000000.0000

The NLL is always nan. Why?

The checkpoint of Guacamol dataset

Hi,

Since your email address is unreachable, I apologise for sending the message here.

To better follow your work, could you please provide the checkpoint of Guacamol dataset?

Thank you very much!

How to run the code in multi-gpu mode

Dear authors,

I am sorry to bother you. I ran the code with DDP, but it threw errors like this: "RuntimeError: The size of tensor a (128) must match the size of tensor b (0) at non-singleton dimension 1". I also noticed that you said this branch is not implemented for multi-GPU. Could you please tell me which part should be modified to adapt it for DDP mode?

Best regards,
Lei

Error occurs when following "Navigate to the ./src/analysis/orca directory and compile orca.cpp: `g++ -O2 -std=c++11 -o orca orca.cpp`"

orca.cpp: In function ‘int writeResults(int, const char*)’:
orca.cpp:1341:13: warning: control reaches end of non-void function [-Wreturn-type]
1341 | fstream fout;
| ^~~~
orca.cpp: In function ‘int writeEdgeResults(int, const char*)’:
orca.cpp:1374:13: warning: control reaches end of non-void function [-Wreturn-type]
1374 | fstream fout;
| ^~~~

Logits vs probs when computing NLL?

In the KL prior and L_{t-1} terms, you use the log of the probabilities to compute the KL divergence. See for example here. Why do you use logits instead of the probabilities directly? And if you're taking the log of the model's probability, shouldn't you do the same for whatever you're comparing to (e.g. limit_dist in kl_prior or true posterior for L_{t-1} terms)?
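
One possible reading, under the assumption that the code in question calls `torch.nn.functional.kl_div` (the linked line is not reproduced here): that function expects its first argument in log space and, with the default `log_target=False`, its target as plain probabilities. It evaluates `target * (log(target) - input)` pointwise, so the logarithm of the target is taken internally. The NumPy sketch below mirrors that convention; the function name and the example distributions are illustrative only.

```python
import numpy as np

def kl_from_log_pred(log_pred, target):
    """KL(target || pred), with the prediction supplied as log-probabilities."""
    log_pred = np.asarray(log_pred, dtype=float)
    target = np.asarray(target, dtype=float)
    # Use the convention 0 * log(0) = 0 for zero-probability target entries.
    log_target = np.log(np.where(target > 0, target, 1.0))
    return float(np.sum(target * (log_target - log_pred)))

pred = np.array([0.7, 0.2, 0.1])    # model probabilities (illustrative)
target = np.array([0.5, 0.5, 0.0])  # reference distribution (illustrative)

kl = kl_from_log_pred(np.log(pred), target)
```

Under this convention only the prediction is passed as a log; passing the log of the target as well would require `log_target=True` in the PyTorch call.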

Reproducing NLL on QM9

Hi,

Great paper, and really appreciate how well-documented the code is! I'm trying to run the discrete model on QM9 (just python main.py), but my validation NLL seems to get stuck around 68-69 during training, and my test NLL is similarly high. Is this something you observe? Is there a different config / parameter values I should run in order to obtain ~23?

Thank you!

pytorch_lightning.utilities.exceptions.MisconfigurationException

As I was executing the command "python3 guidance/train_qm9_regressor.py +experiment=regressor_model.yaml", an error message "pytorch_lightning.utilities.exceptions.MisconfigurationException: The provided lr scheduler ExponentialLR doesn't follow PyTorch's LRScheduler API. You should override the LightningModule.lr_scheduler_step hook with your own logic if you are using a custom LR scheduler." occurred.

I'm wondering if I installed the wrong version of Pytorch (2.0 with CUDA 11.8) and Pytorch-geometric.
Is it possible to solve this by downgrading the version of Pytorch or some other module? Thanks.

[Update 1]
When I tried to train a regressor after installing the guidance version according to the README instructions (with PyTorch 1.11 + CUDA 11.3), a runtime error occurred: RuntimeError: object has no attribute sparse_csc_tensor

[Update 2]
With PyTorch 1.10 + CUDA 11.1, another error occurred when I tried to train a regressor after installing the guidance version according to the instructions: AttributeError: module 'distutils' has no attribute 'version'

I found the common solution is downgrading the version of setuptools, but if I try to downgrade the version of setuptools to <=59.5.0, running the code "train_qm9_regressor.py" will lead to "Segmentation fault (core dumped)"......
Is there anything else I can try? Thanks.

Segmentation fault in the code aligning molecules for visualization

From what I understood, you use this code (until line 111) to align the molecules you plot in a chain. For certain molecules, the alignment causes a segmentation fault. Here is a minimal example to reproduce:

Install rdkit and imageio following your instructions:

conda create -c conda-forge -n test-digress rdkit python=3.9
conda activate test-digress
conda install pip
pip install imageio==2.26.1

Run this minimal script using the correct example as a baseline, or the seg_fault/error example to see the error. If you comment out the alignment code, both examples run normally. To reproduce the fault, make sure the molecule alignment code is uncommented in both cases:

import rdkit
from rdkit import Chem
from rdkit.Chem import Draw, AllChem
from rdkit.Geometry import Point3D
import numpy as np

import os
import imageio

def mol_from_graphs(node_list, adjacency_matrix):
    """
    Convert graphs to rdkit molecules
    node_list: the nodes of a batch of nodes (bs x n)
    adjacency_matrix: the adjacency_matrix of the molecule (bs x n x n)
    """
    # list mapping each integer node label to its atom symbol
    atom_decoder = ['C', 'N', 'O', 'F']

    # create empty editable mol object
    mol = Chem.RWMol()

    # add atoms to mol and keep track of index
    node_to_idx = {}
    for i in range(len(node_list)):
        if node_list[i] == -1:
            continue
        a = Chem.Atom(atom_decoder[int(node_list[i])])
        molIdx = mol.AddAtom(a)
        node_to_idx[i] = molIdx

    for ix, row in enumerate(adjacency_matrix):
        for iy, bond in enumerate(row):
            # only traverse half the symmetric matrix
            if iy <= ix:
                continue
            if bond == 1:
                bond_type = Chem.rdchem.BondType.SINGLE
            elif bond == 2:
                bond_type = Chem.rdchem.BondType.DOUBLE
            elif bond == 3:
                bond_type = Chem.rdchem.BondType.TRIPLE
            elif bond == 4:
                bond_type = Chem.rdchem.BondType.AROMATIC
            else:
                continue
            mol.AddBond(node_to_idx[ix], node_to_idx[iy], bond_type)

    try:
        mol = mol.GetMol()
    except rdkit.Chem.KekulizeException:
        print("Can't kekulize molecule")
        mol = None
    return mol

def plot_chain_molecules(mols, name='mols'):
    '''
        mols: list of RDKit molecule objects.
    '''
    # # code to align all molecules
    final_molecule = mols[-1]
    AllChem.Compute2DCoords(final_molecule)
    coords = []
    for i, atom in enumerate(final_molecule.GetAtoms()):
        positions = final_molecule.GetConformer().GetAtomPosition(i)
        coords.append((positions.x, positions.y, positions.z))

    for i, mol in enumerate(mols):
        AllChem.Compute2DCoords(mol)
        conf = mol.GetConformer()
        for j, atom in enumerate(mol.GetAtoms()):
            x, y, z = coords[j]
            conf.SetAtomPosition(j, Point3D(x, y, z))

    # generate frames for gif 
    frame_paths = []
    for frame in range(len(mols)):
        # if frame==0: 
        print(f'\nframe {frame}\n')
        file_name = os.path.join(f'{name}_fram_{frame}.png')
        Draw.MolToFile(mols[frame], file_name, size=(300, 300), legend=f"Frame {frame}")
        frame_paths.append(file_name)

    # save gif
    imgs = [imageio.imread(fn) for fn in frame_paths]
    gif_path = os.path.join(f'{name}.gif')
    print(f'gif_path {gif_path}\n')
    imgs.extend([imgs[-1]] * 10)
    imageio.mimsave(gif_path, imgs, subrectangles=True, fps=5)

    # save chains as grid
    img = Draw.MolsToGridImage(mols, molsPerRow=10, subImgSize=(200, 200))
    path_img = os.path.join(f'{name}_grid_image.png')
    print(f'path_img {path_img}\n')
    img.save(path_img)

if __name__ == '__main__':
    node_l_ref = np.array([0., 2., 2., 0., 0., 0., 1., 0., 2.], dtype=np.float32)
    adj_m_ref = np.array([[0., 0., 0., 0., 1., 0., 1., 0., 2.],
                        [0., 0., 0., 0., 0., 0., 0., 1., 3.],
                        [0., 0., 0., 0., 0., 0., 0., 0., 0.],
                        [0., 0., 0., 0., 0., 0., 0., 3., 0.],
                        [1., 0., 0., 0., 0., 0., 4., 0., 2.],
                        [0., 0., 0., 0., 0., 0., 1., 1., 0.],
                        [1., 0., 0., 0., 4., 1., 0., 0., 1.],
                        [0., 1., 0., 3., 0., 1., 0., 0., 0.],
                        [2., 3., 0., 0., 2., 0., 1., 0., 0.]], dtype=np.float32)
    mol_ref = mol_from_graphs(node_l_ref, adj_m_ref)
    
    node_l_error = np.array([0., 2., 0., 0., 1., 0., 0., 0., 1.], dtype=np.float32)
    adj_m_error = np.array([[0., 4., 4., 0., 0., 3., 0., 4., 4.],
                        [4., 0., 3., 4., 4., 4., 1., 2., 4.],
                        [4., 3., 0., 4., 4., 1., 4., 2., 4.],
                        [0., 4., 4., 0., 4., 4., 4., 3., 4.],
                        [0., 4., 4., 4., 0., 1., 4., 4., 0.],
                        [3., 4., 1., 4., 1., 0., 4., 4., 4.],
                        [0., 1., 4., 4., 4., 4., 0., 4., 4.],
                        [4., 2., 2., 3., 4., 4., 4., 0., 4.],
                        [4., 4., 4., 4., 0., 4., 4., 4., 0.]], dtype=np.float32)
    mol_error = mol_from_graphs(node_l_error, adj_m_error)
    
    node_l_correct = np.array([0., 0., 0., 1., 0., 0., 0., 0., 0.], dtype=np.float32)
    adj_m_correct = np.array([[0., 0., 1., 0., 0., 0., 0., 0., 0.],
                        [0., 0., 0., 0., 0., 0., 0., 1., 0.],
                        [1., 0., 0., 0., 1., 1., 1., 1., 0.],
                        [0., 0., 0., 0., 0., 0., 0., 0., 0.],
                        [0., 0., 1., 0., 0., 0., 0., 0., 0.],
                        [0., 0., 1., 0., 0., 0., 0., 0., 0.],
                        [0., 0., 1., 0., 0., 0., 0., 0., 0.],
                        [0., 1., 1., 0., 0., 0., 0., 0., 0.],
                        [0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=np.float32)
    mol_correct = mol_from_graphs(node_l_correct, adj_m_correct)

    plot_chain_molecules(mols=[mol_ref, mol_correct], name='correct_aligned')
    # plot_chain_molecules(mols=[mol_ref, mol_error], name='seg_fault_aligned')

Note that all molecules I am using here were generated using a partially trained digress model on qm9 with a marginal transition and no extra features. Because the code throws a segmentation fault in these cases, the whole script fails (usually at the testing phase).

So my questions are:
1- Is this alignment necessary at all? I tried plotting without alignment and the molecules look reasonable, but perhaps harder to compare.
2- Did you try other ways of aligning the molecules? Using Rdkit's AlignMol for example.
3- Any tips on avoiding this segmentation error?

Thanks in advance!

None gradients for 'y' layers

It looks like the gradients of the y_mlp_out and all components involving y in the last transformer neural network layer are None. Therefore, this part of the model is not training. The components of the other inputs (X and E) seem to be working normally.

To reproduce the behavior, replace the 'trainer' line with this code:

    model = model.train() 
    #print(f'model {count_parameters(model)}')
    print('==== Done loading the model...')

    optimizer = torch.optim.AdamW(model.parameters(), lr=cfg.train.lr, amsgrad=True,
                                  weight_decay=cfg.train.weight_decay)
    
    train_loader = datamodule.train_dataloader()
    losses = []       
    for epoch in range(cfg.train.n_epochs):
        total_loss = 0
        for i, train_samples in enumerate(train_loader):
            # train_samples = train_samples.to(device)
            loss = model.training_step(train_samples, i) # loss for one batch
            loss = loss['loss']
            loss.backward()
            for name, p in model.named_parameters():
                if '_y' in name:
                    print(f'name: {name}, requires {p.requires_grad}, p {p}, grad, {p.grad}\n')
            optimizer.step()
            optimizer.zero_grad()
            total_loss += loss.cpu().detach().numpy()
            exit() # just to show None in one iteration

The problem appears to be that the 'y' output is not used when computing the loss. I am not sure how to use a cross-entropy loss on X and E alone while still back-propagating to the layers of 'y'.

It's also not clear why the None gradients only appear in the last layers.

KNodeCycles.k6_cycle correctness

Do you by any chance have a reference for the 6-cycle count formula in Appendix B2? I am having a hard time matching the formula (or the implementation in KNodeCycles.k6_cycle) to that of Chang and Fu 2003. Thank you!

Does the graph transformer take graph connectivity into account?

Hi, thanks for the great work!

Looking at the NodeEdgeBlock layer, it seems that the attention is applied assuming full connectivity in the graph (except for invalid nodes masked out by node_mask). Is this the case? If so, is there a reason why you didn't go with the graph attention layer implementation here, which takes the connectivity into account?

Thanks!
Yawar

Batch_size can only be set less than 8 for sbm dataset on NVIDIA GeForce RTX 3090

Hello!
I ran the discrete code with the default sbm config (batch_size=512):

python3 main.py dataset=sbm

but it ends with torch.cuda.OutOfMemoryError: CUDA out of memory.

Server environment:

OS: Ubuntu 20
GPU : NVIDIA GeForce RTX 3090 with 24GB memory

I then tried reducing batch_size and found that batch_size=8 works, while batch_size>=16 runs out of memory.

I notice that you use batch_size=512 for sbm, so I am wondering whether this is normal for my run. It will take 6 days to finish the sbm experiment with batch_size=8. Is it possible to set a larger batch_size with my current GPU server?

Thanks a lot!

Loss Function

The loss function used in the training is CrossEntropy(model(G_t), G), whereas traditionally in diffusion models it is CrossEntropy(model(G_t), G_{t-1}). Does using the latter make the training worse in your experience? Thank you!

how to change batch size

May I know how to change batch size?

Cross-entropy in minibatching

I have a question about your use of cross-entropy over nodes/edges when mini-batching graphs. If I understood your implementation correctly, to compute the loss for one minibatch, you compute the cross-entropy of a single node and a single edge in your minibatch of graphs. These cross-entropies are averaged over the entire minibatch, then combined via the following formula: $L_{ce} = L_{nodes} + \lambda L_{edges}$ (same as your equation 3).

To me $L_{ce}$ represents the loss for one graph, so I think you should first sum the losses for nodes and edges per graph, then take the mean of such sums over a minibatch. What do you think?
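
The two reductions under discussion can be compared on a toy minibatch. This is a sketch built from the description in this issue, not from the repository's loss code: `pooled_loss` averages over every node and every edge in the minibatch before combining, while `per_graph_loss` sums within each graph first and then averages over graphs. The per-node and per-edge cross-entropy values and `lam` are made up for illustration.

```python
import numpy as np

# Per-node and per-edge cross-entropies for a minibatch of two graphs
# of different sizes (illustrative values).
node_ce = [np.array([0.2, 0.4]), np.array([1.0, 1.0, 1.0, 1.0])]
edge_ce = [np.array([0.1]), np.array([0.5, 0.5, 0.5, 0.5, 0.5, 0.5])]
lam = 5.0  # weight on the edge term (illustrative)

def pooled_loss(node_ce, edge_ce, lam):
    """Mean over all nodes in the batch plus lam times mean over all edges."""
    nodes = np.concatenate(node_ce).mean()
    edges = np.concatenate(edge_ce).mean()
    return float(nodes + lam * edges)

def per_graph_loss(node_ce, edge_ce, lam):
    """Sum node and edge terms inside each graph, then average over graphs."""
    totals = [n.sum() + lam * e.sum() for n, e in zip(node_ce, edge_ce)]
    return float(np.mean(totals))

pooled = pooled_loss(node_ce, edge_ce, lam)
per_graph = per_graph_loss(node_ce, edge_ce, lam)
```

The two values differ whenever graphs in the batch have different sizes: the per-graph version gives larger graphs a larger share of the loss, while the pooled version weights every node (and every edge) equally regardless of which graph it belongs to.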

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

Thank you for sharing the code. When I set dataset to comm20 or sbm in configs/config.yaml and invoke python main.py, I encountered the issue described in the title. Is this expected? Thanks.

For the given value on nodes and edges

Hi, I have read your code and tried to use your model on our own dataset. I noticed that you defined some values on nodes and edges. For example,

self.n_nodes = torch.tensor([0, 2.2930e-05, 3.8217e-05, 6.8791e-05, 2.3695e-04, 9.7072e-04, 0.0046472, 0.023985, 0.13666, 0.83337])
self.node_types = torch.tensor([0.7230, 0.1151, 0.1593, 0.0026])
self.edge_types = torch.tensor([0.7261, 0.2384, 0.0274, 0.0081, 0.0])

in the qm9_dataset, QM9infos. I'm wondering how to get these values if I want to apply DiGress to my own dataset. And is there any code to generate these values?
Thanks a lot!
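
A minimal sketch of how such statistics could be computed for a custom dataset, assuming they are empirical frequencies: `n_nodes[k]` as the fraction of graphs with exactly `k` nodes, `node_types` as the fraction of nodes of each type, and `edge_types` as the fraction of ordered node pairs of each edge type, with type 0 meaning "no edge". That interpretation, the function name `dataset_marginals`, and the input layout are assumptions made for illustration and are not taken from the repository.

```python
import numpy as np

def dataset_marginals(graphs, num_node_types, num_edge_types, max_nodes):
    """Empirical distributions over graph sizes, node types and edge types.

    graphs: iterable of (node_types, adjacency) pairs, where node_types is an
    int array of shape (n,) and adjacency is a symmetric int array of shape
    (n, n) whose entries are edge types (0 = no edge).
    """
    n_nodes = np.zeros(max_nodes + 1)
    node_counts = np.zeros(num_node_types)
    edge_counts = np.zeros(num_edge_types)
    for node_types, adj in graphs:
        n = len(node_types)
        n_nodes[n] += 1
        node_counts += np.bincount(node_types, minlength=num_node_types)
        # Count every ordered pair (i, j) with i != j, including non-edges.
        off_diagonal = adj[~np.eye(n, dtype=bool)]
        edge_counts += np.bincount(off_diagonal, minlength=num_edge_types)
    return (n_nodes / n_nodes.sum(),
            node_counts / node_counts.sum(),
            edge_counts / edge_counts.sum())

# Two toy graphs: a 3-node path with bond types 1 and 2, and a single bond.
g1 = (np.array([0, 1, 0]), np.array([[0, 1, 0], [1, 0, 2], [0, 2, 0]]))
g2 = (np.array([0, 0]), np.array([[0, 1], [1, 0]]))
n_nodes, node_types, edge_types = dataset_marginals(
    [g1, g2], num_node_types=2, num_edge_types=3, max_nodes=3)
```

For a real dataset the same loop would run over the training split only, and the resulting arrays would replace the hard-coded tensors.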

New dataset - graph labels

Hello!

I implemented my own dataset class, and it worked quite well.
However, when I try to add graph labels, i.e., populate 'y' in the data loader with one-hot encoded vectors, I get a mismatch in size in the forward function of the transformer model, when computing self.mlp_out_y.

The problem is not in the data loader itself: the batches are generated correctly. But when I print the tensor shapes in the forward method, 'X' and 'E' have the correct shape but not 'y'. From the second batch, 'y' is just one dimension and is filled with 0s. The first batch seems to be correct.

I thought it was a problem with my implementation, but I tried changing the size of 'y' in the spectre dataset (y = torch.zeros([1, 2]).float()) in __getitem__(self, idx), for example, and the same problem occurred. The first batch is ok, and for the second batch, the shape of 'y' is incorrect in the forward method.

So my two questions are:

Is it possible to train DiGress with categorical graph labels?
If yes, do you know how to overcome this problem?

Thank you! :)

Computing node and edge features

Is DiscreteDenoisingDiffusion.compute_extra_data() intended to be differentiable, or does its differentiability not matter?

questions about the hyperparameters

Hi, thanks so much for your code!
I'm wondering whether I can reproduce the results in the paper by simply running main.py without changing any hyperparameters. If I have to change the hyperparameters, which ones should I choose?
Thanks so much for your kind reply!

what is orca.cpp used for?

May I know what orca.cpp is used for?

beta_bar_t equation, might be error in paper

In the code, beta_bar_t = 1 - alpha_bar_t,

whereas the paper (https://openreview.net/pdf?id=UaAD-Nu86WX) gives beta_bar_t = \prod (1 - alpha_t).

This might be an error in the paper.
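
A quick numeric check shows that the two expressions are not interchangeable. It assumes the usual cumulative definition alpha_bar_t = \prod alpha_s; the schedule values below are arbitrary and not taken from the code.

```python
import math

alphas = [0.9, 0.8, 0.7]  # illustrative per-step alpha values

alpha_bar = math.prod(alphas)                          # cumulative alpha
beta_bar_code = 1.0 - alpha_bar                        # form used in the code
beta_bar_product = math.prod(1.0 - a for a in alphas)  # form quoted from the paper

print(beta_bar_code, beta_bar_product)
```

If the cumulative transition matrix has the form alpha_bar_t I + beta_bar_t 1 m' with m a probability vector (an assumption about the noise model here), each row sums to alpha_bar_t + beta_bar_t, so only beta_bar_t = 1 - alpha_bar_t keeps the rows normalized.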

Approximation of regressor guidance

Hi Clement, In sec 5 of the paper, you mentioned "p_η cannot be evaluated for all possible values of $G_{t-1}$". Is this referring to the fact that $G_{t-1}$ is a probabilistic graph, but p_η was trained on one-hot graphs?

Are there any other methods (alternatives to your proposed first-order approximation) for classifier-guided discrete diffusion that you are aware of? Is straight-through estimation a viable option? Thank you!

Docker image

Hi, do you have a pre-built docker image that can be used to run DiGress by any chance? I would highly appreciate that.

Bug in self-attention?

Hi Clement, in file src/models/transformer_model.py line 159, you intend to compute the unnormalized attention scores, i.e. the dot product of the query and key vectors. However, in the code, just the query and key vectors are multiplied, without summing over the feature dimension. This effectively computes a separate attention score for each feature dimension.

On line 184 you comment that the shape of attn is 'bs, n, n, n_head', although it actually is 'bs, n, n, n_head, df', which can be seen on line 191, where attn is multiplied with a vector of shape '(bs, 1, n, n_head, df)'.

I couldn't find any comments on this in the paper, so I'm wondering whether this is on purpose or a bug.
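
The shape difference described above can be reproduced in a few lines. This is a NumPy sketch using the shapes quoted in this issue; the tensor names and sizes are illustrative and are not copied from transformer_model.py.

```python
import numpy as np

rng = np.random.default_rng(0)
bs, n, n_head, df = 2, 5, 4, 8  # illustrative sizes

Q = rng.standard_normal((bs, n, n_head, df))  # queries
K = rng.standard_normal((bs, n, n_head, df))  # keys

# Broadcasting queries against keys and multiplying elementwise keeps the
# feature axis: one score per (query node, key node, head, feature).
per_feature = Q[:, :, None, :, :] * K[:, None, :, :, :]  # (bs, n, n, n_head, df)

# Summing over the feature axis recovers the standard dot-product score.
dot_product = per_feature.sum(axis=-1)                   # (bs, n, n, n_head)

# Reference computation of the per-head dot product.
reference = np.einsum('bihd,bjhd->bijh', Q, K)
```

With the elementwise version, a softmax over the key axis normalizes each feature dimension separately, which is a different operation from standard scaled dot-product attention.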

Implementation of FiLM y to E?

There is a comment on this line pointing out that you're using dx instead of de when computing FiLM from y to E. Shouldn't it be de actually?

Use of self.T when computing the L_t terms in the loss

I am talking about this line. I noticed Appendix E.2 of (Kingma, 2021) mentions a multiplication by T/2 as an unbiased estimator of the sum. D3PM computes the sum explicitly (if I understood their code correctly).

Is your multiplication by T also a form of unbiased estimator? Can you point me to any literature to understand how to derive this estimator?
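
For reference, the identity behind this kind of estimator is $\sum_{t=1}^{T} L_t = T \cdot \mathbb{E}_{t \sim U\{1,\dots,T\}}[L_t]$, so $T \cdot L_t$ with a uniformly sampled $t$ is an unbiased estimate of the sum. The sketch below checks this numerically. The per-timestep losses are made up, and whether the code samples $t$ uniformly in exactly this way is an assumption.

```python
import random

T = 50
# Hypothetical per-timestep loss terms L_1, ..., L_T.
L = [1.0 / t for t in range(1, T + 1)]
exact_sum = sum(L)

# Expectation of T * L_t when t is uniform on {1, ..., T}.
expected_estimate = sum(T * l_t * (1.0 / T) for l_t in L)

# Monte Carlo version: one sampled timestep per training example.
random.seed(0)
samples = [T * L[random.randrange(T)] for _ in range(200_000)]
mc_estimate = sum(samples) / len(samples)
```

The single-sample estimate is noisy but correct on average, which trades an explicit sum over all T timesteps for one denoising pass per training example.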

Added instructions for filtering guacamol

Invalid Molecule Obtained

Hi Clement, thank you for the constant updates these days in the repository. I wanted to ask you about one strange error I get while running the script python3 main.py +experiment=debug.yaml

I have installed all the necessary packages, so far following your instructions on environment installation. However, before launching the full experiments, I wanted to run the debugging code and see how it performs.

Unfortunately, after printing ret = run_job() and completing 3 processing stages at 100%, it starts to print "Invalid molecule obtained" while converting the QM9 dataset to SMILES for remove_h=True...

Do you know the reasons why this may occur? Thanks for the help!

Extending molecule with partial charges

Hi,

Is there any reason you only handle O+, N+, S+ in molecule_with_partial_charge? I am wondering if it's possible to apply the same logic to all atom types and positive and negative charges?

Best,

Filtered guacamol

Hi Clement. I have some questions about the Guacamol data set.

When I run python3 main.py dataset=guacamol, I get the error urllib.error.HTTPError: HTTP Error 403: Forbidden. It looks like the download is blocked by some anti-crawler measure. So I downloaded the files directly via the download links provided in class GuacamolDataset, i.e., 'guacamol_v1_train.smiles', 'guacamol_v1_valid.smiles', 'guacamol_v1_test.smiles'. Then I obtained the processed data by commenting out the download_url calls in the code.
However, these data are unfiltered. How can I download the filtered Guacamol data, namely 'new_train.smiles', 'new_val.smiles', 'new_test.smiles'?

Thank you!

Conditional Generation based on Subgraph

Hello,

I work in drug discovery and am very interested in the application of this model in the generation of drug-molecules which contain a predefined motif, as you demonstrate in Appendix E. Would you be able to share a code example of the node and edge feature masking for motif preserving generation?

Wouldn't this require retraining the model on molecules with the given motif such that the noise model preserves the motif during diffusion? From this I could see how by masking/disallowing transitions for edges and nodes in the motif during denoising and letting everything else denoise regularly would result in structures which extend the motif.... or maybe I'm thinking about this wrong and no retraining is needed?

Best,
Talal

TypeError: EMA.on_save_checkpoint: return type `<class 'dict'>` is not a `<class 'NoneType'>`

Hi,

I hit this error when running main.py, which imports the utils.py file: TypeError: EMA.on_save_checkpoint: return type <class 'dict'> is not a <class 'NoneType'>.

Could you please help me solve this issue?

Thank you very much.

DiGress for other datasets

Hi Clement,

Thank you for the regular updates on your paper. I wanted to ask you about other possible applications of DiGress beyond MOSES, QM9, and the other datasets mentioned in the experiments. Basically, what I have is train/valid/test datasets of SMILES strings along with ten reaction types from the US patent literature. I tried to use abstract_dataset.py to convert the strings to graph representations, but it caused a lot of bugs.

Do you recommend preparing a MOSES-style .csv file and using moses_dataset.py in a similar fashion (SMILES, Split) for my dataset? FYI, it is USPTO-50k: https://github.com/vsomnath/graphretro/tree/main/datasets/uspto-50k

What is wandb used for

May I know what wandb is used for? Why do I need a key?

where is ./util/orca

hi,

You mentioned "Navigate to the ./util/orca directory and compile orca.cpp: g++ -O2 -std=c++11 -o orca orca.cpp", but I didn't find ./util/orca.

thanks a lot

Substructure conditioned generation

Hello, would it be possible to obtain the code implementation for the 'E. Substructure conditioned generation' method discussed in the paper? I'm particularly interested in this section and would greatly appreciate your help. Thank you in advance!

training loss

May I know why you use the cross-entropy between the clean graph and the prediction made from the noisy graph as the training loss?
Papers like DDPM use the distance between the true noise and the predicted noise. Why don't you use the same idea as DDPM, i.e., the true graph noise and the predicted graph noise?

what is compute_extra_data?

May I know the purpose of the function "compute_extra_data"?
I could not find it in the paper.