gospodima / extended-simgnn

A PyTorch Geometric implementation of SimGNN with some extensions.

License: MIT License

Python 100.00%

pytorch pytorch-geometric graph-neural-networks graph-edit-distance simgnn graph-isomorphism-network deep-learning machine-learning graph-convolutional-networks network-embedding


extended-simgnn's Issues

About "Fully Connected Layers" Settings

Hi, I noticed that the SimGNN paper specifies the fully connected layers as follows: "use 4 fully connected layers to reduce the dimension of the concatenated results from the NTN module, from 32 to 16, 16 to 8, 8 to 4, and 4 to 1." In this code, however, there are only two fully connected layers: 32 --> 16 and 16 --> 1.
When I modify the code to match the settings in the paper, the performance becomes about ten times worse. I don't know why. Shouldn't the settings from the paper generally work better?
Here is the code in this Repository

Here is my code

About how to test a new dataset?

Hello, your code is very useful for testing the AIDS datasets. Now I have a new dataset. If I want to test it, do I need to create a new class like GEDDataset using PyG? If so, could you give me some suggestions and things to watch out for when creating a new dataset? I would appreciate a reply!

Asking for the best result of the code

I'm happy to get an MSE similar to the result shown in your picture, and your code is pretty good. However, it is larger than the result in the paper (1.189). I would like to discuss the best result the code can achieve and whether the results in the paper can be reproduced. Thanks!

The potential bug in the "calculate_prec_at_k"

Hi,

Your code is very helpful for this topic. However, I may have found a bug in the function "calculate_prec_at_k", which is used for the recall accuracy calculation. In "calculate_prec_at_k", there are two potential bugs:

1. In "calculate_prec_at_k" (lines 47-48 of "utils.py"), the correct way should be
best_k_pred = prediction.argsort()[:k] -> best_k_pred = prediction.argsort()[::-1][:k]
best_k_target = target.argsort()[:k] -> best_k_target = target.argsort()[::-1][:k]
Descending order is correct because the normalized GED score lies in [0, 1], and 1 means the two graphs are exactly the same.
2. Also in "calculate_prec_at_k", several ground-truth graphs may tie at the cutoff, so that more than k graphs qualify. In other words, with k set to 10, the 10th, 11th and 12th most similar graphs might share the same ground-truth GED score. In this case, the number of ground-truth graphs selected should be larger than k (here k = 10), using a threshold to include the tied graphs in "best_k_target". Here is my solution:

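import numpy as np

def _calculate_prec_at_k(k, target):
    # Sort the ground-truth scores in ascending order.
    target_increase = np.sort(target)
    # Count the entries that are <= the k-th smallest score (ties included).
    target_value_sel = (target_increase <= target_increase[k - 1]).sum()
    if target_value_sel > k:
        # Ties at the cutoff: widen the selection to cover all of them.
        best_k_target = target.argsort()[:target_value_sel]
    else:
        best_k_target = target.argsort()[:k]
    return best_k_target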

Error downloading GEDDataset

*** [[ copied my bug report from pytorch_geometric ]] ***

🐛 Bug

While running main.py from Extended-SimGNN, I get an error when downloading GEDDataset from the Google Drive URL.
I suspect this happens because the URL contains characters that are invalid in a filename, such as "?".

Note: I am using Windows 10 Pro with venv, without Docker or virtualization.

To Reproduce

Steps to reproduce the behavior:

run main.py from Extended-SimGNN
alternatively - download any GEDDataset directly from pyg

Expected behavior

The dataset file should be downloaded from the URL and saved to self.raw_dir for further processing.

Environment

OS: Windows 10 Pro
Python version: 3.7.7
PyTorch version: 1.6.0+cpu
CUDA/cuDNN version: N/A
GCC version: N/A
Any other relevant information:

Additional context

Downloading https://drive.google.com/uc?export=download&id=10czBPJDEzEDI2tq7Z7mkBjLhj55F-a2z
Traceback (most recent call last):
  File "<ROOT_DIR>/Extended-SimGNN/src/main.py", line 36, in <module>
    main()
  File "<ROOT_DIR>/Extended-SimGNN/src/main.py", line 15, in main
    trainer = SimGNNTrainer(args)
  File "<ROOT_DIR>\Extended-SimGNN\src\simgnn.py", line 192, in __init__
    self.process_dataset()
  File "<ROOT_DIR>\Extended-SimGNN\src\simgnn.py", line 207, in process_dataset
    self.training_graphs = GEDDataset(r'datasets{}'.format(self.args.dataset), self.args.dataset, train=True)
  File "<ROOT_DIR>\pyg_sandbox\venv\lib\site-packages\torch_geometric\datasets\ged_dataset.py", line 89, in __init__
    pre_filter)
  File "<ROOT_DIR>\pyg_sandbox\venv\lib\site-packages\torch_geometric\data\in_memory_dataset.py", line 54, in __init__
    pre_filter)
  File "<ROOT_DIR>\pyg_sandbox\venv\lib\site-packages\torch_geometric\data\dataset.py", line 89, in __init__
    self._download()
  File "<ROOT_DIR>\pyg_sandbox\venv\lib\site-packages\torch_geometric\data\dataset.py", line 141, in _download
    self.download()
  File "<ROOT_DIR>\pyg_sandbox\venv\lib\site-packages\torch_geometric\datasets\ged_dataset.py", line 107, in download
    path = download_url(self.url.format(name), self.raw_dir)
  File "<ROOT_DIR>\pyg_sandbox\venv\lib\site-packages\torch_geometric\data\download.py", line 33, in download_url
    with open(path, 'wb') as f:
OSError: [Errno 22] Invalid argument: 'datasets\AIDS700nef\raw\uc?export=download&id=10czBPJDEzEDI2tq7Z7mkBjLhj55F-a2z'
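
The failing path ends in "uc?export=download&id=...", and "?" is not allowed in Windows filenames. As a rough sketch of one possible workaround (not the fix used by pytorch_geometric, and `safe_filename_from_url` is a hypothetical helper name), the local filename could be built from the URL path and the `id` query parameter instead of from the raw tail of the URL:

```python
import posixpath
import re
from urllib.parse import parse_qs, urlsplit


def safe_filename_from_url(url, default="download"):
    """Hypothetical helper: build a filename without '?' or other
    characters that Windows rejects in filenames."""
    parts = urlsplit(url)
    # Last component of the URL path, e.g. "uc" for the Google Drive link.
    name = posixpath.basename(parts.path) or default
    # Assumption: if the query carries an "id", append it to keep names unique.
    query = parse_qs(parts.query)
    if "id" in query:
        name = "{}_{}".format(name, query["id"][0])
    # Replace any remaining characters that are invalid on Windows.
    return re.sub(r'[<>:"/\\|?*]', "_", name)


url = "https://drive.google.com/uc?export=download&id=10czBPJDEzEDI2tq7Z7mkBjLhj55F-a2z"
print(safe_filename_from_url(url))
```

The result could then be joined with the raw directory before calling `open(path, 'wb')`. Whether the downloaded file must keep a particular name for later processing is not something this sketch addresses.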

Dataset Issues

I checked out the datasets already uploaded to PyTorch Geometric (the GEDDataset class).
I don't know about you, but the problem I faced was that for all the GEDDatasets (Alkane, Linux and AIDS700nef), the GED for a pair of graphs G1, G2 where both G1 and G2 belong to the test set is set to 'inf'.
How did you deal with this?
Also, the feature matrix for the nodes (i.e. data.x) is only given for the AIDS700nef dataset and not for the other two, so how did you deal with that?

The Configs to Achieve The Best Performance

Hi,

Did you try different configs on these datasets? I am curious what the best configs are for the different datasets.

Thanks.

About the performance reported in the original paper[SimGNN]

Hi, thanks for your contribution of this PyTorch implementation of SimGNN. I would like to know whether the code in this repo can reach the performance reported in the original SimGNN paper, i.e. an MSE of 1.189 (x 10^-3) on the AIDS dataset.

This should not use the updated "k"

Extended-SimGNN/src/utils.py

Line 56 in 7f0d631

best_k_pred = prediction.argsort()[::-1][:k]

To evaluate the top "k" predicted results, which do not have duplicate values, I think this should use the original "k" instead of the updated one. When selecting the top "k" ground-truth values, duplicates should be taken into account, and "k" should be updated only there, when ties extend the selection beyond "k". So I think line 56 may be wrong, while line 57 is correct.
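
A minimal sketch of the behavior described here, written in plain Python rather than NumPy so it runs without dependencies. It is not the repository's code: `top_k_indices` and `prec_at_k` are illustrative names, and higher scores are assumed to mean more similar graphs. Predictions are cut at the original k, while the ground-truth selection is widened to include every score tied with the k-th best:

```python
def top_k_indices(scores, k):
    """Indices of the k largest scores (stable for equal scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def prec_at_k(k, prediction, target):
    """Precision at k with tie handling on the ground-truth side only."""
    # Predictions: always use the original k.
    best_k_pred = top_k_indices(prediction, k)
    # Ground truth: widen the cutoff to include scores tied with the k-th best.
    cutoff = sorted(target, reverse=True)[k - 1]
    k_target = sum(1 for t in target if t >= cutoff)
    best_k_target = top_k_indices(target, k_target)
    return len(set(best_k_pred) & set(best_k_target)) / k


target = [0.9, 0.8, 0.8, 0.8, 0.1]       # three graphs tie for second place
prediction = [0.95, 0.2, 0.7, 0.6, 0.1]
print(prec_at_k(2, prediction, target))
```

In this example the predicted top 2 are graphs 0 and 2. Graph 2 ties with graphs 1 and 3 in the ground truth, so counting it as a hit needs the widened target set. Applying the widened k to the predictions as well would select four predicted graphs instead of two.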

A BUG

There is something wrong with mean = scatter_('mean', x, batch, size) in the forward function in layer.py, because of the upgrade of torch-scatter.
I replaced mean = scatter_('mean', x, batch, size) with mean = scatter_('mean', x, batch, dim_size=size), which fixed it.
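
For readers unfamiliar with the operation, here is a pure-Python illustration of what a scatter mean with a dim_size argument computes. It is a hypothetical stand-in, not the torch-scatter or PyG implementation, and filling empty groups with zeros is a choice made for this sketch. Passing dim_size by keyword, as in the fix above, also guards against positional parameters being reordered between library versions:

```python
def scatter_mean(src, index, dim_size=None):
    """Hypothetical stand-in: average the rows of `src` that share the same
    entry in `index`; `dim_size` fixes the number of output rows."""
    n_groups = dim_size if dim_size is not None else max(index) + 1
    width = len(src[0])
    sums = [[0.0] * width for _ in range(n_groups)]
    counts = [0] * n_groups
    for row, group in zip(src, index):
        counts[group] += 1
        for j, value in enumerate(row):
            sums[group][j] += value
    # Groups with no rows stay at zero in this sketch.
    return [[s / c if c else 0.0 for s in row] for row, c in zip(sums, counts)]


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # node features
batch = [0, 0, 1]                          # graph id of each node
print(scatter_mean(x, batch, dim_size=3))
```

With dim_size=3 the output has three rows even though only graphs 0 and 1 appear in batch, which matters when the batch size must match another tensor.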