gospodima / extended-simgnn

A PyTorch Geometric implementation of SimGNN with some extensions.

License: MIT License

Python 100.00%
pytorch pytorch-geometric graph-neural-networks graph-edit-distance simgnn graph-isomorphism-network deep-learning machine-learning graph-convolutional-networks network-embedding

extended-simgnn's Issues

About "Fully Connected Layers" Settings

Hi, I noticed that the SimGNN paper gives the following setting for the fully connected layers (FCL): "use 4 fully connected layers to reduce the dimension of the concatenated results from the NTN module, from 32 to 16, 16 to 8, 8 to 4, and 4 to 1." However, this code has only two fully connected layers: 32 --> 16 and 16 --> 1.
When I modified the code to match the settings in the paper, the performance became about ten times worse, and I don't know why. Shouldn't the settings from the paper generally be better?
Here is the code in this repository: [images: lu1, lu2]

Here is my code: [images: my1, my2]
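
To make the comparison concrete, the two layouts mentioned in this issue can be written as lists of layer widths. The sketch below only counts weights and biases for each layout; the helper names are hypothetical and it says nothing about which layout trains better.

```python
def dense_stack_shapes(sizes):
    """Return (in_features, out_features) for each layer in a dense stack."""
    return list(zip(sizes[:-1], sizes[1:]))


def parameter_count(sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(i * o + o for i, o in dense_stack_shapes(sizes))


# Layer widths as quoted in the issue (assumes a 32-dimensional input).
repo_head = [32, 16, 1]          # two layers: 32 -> 16 -> 1
paper_head = [32, 16, 8, 4, 1]   # four layers: 32 -> 16 -> 8 -> 4 -> 1

print(dense_stack_shapes(repo_head), parameter_count(repo_head))
print(dense_stack_shapes(paper_head), parameter_count(paper_head))
```

The deeper head adds only a few extra parameters, so a large change in error is more likely related to optimization details (activations between the narrow layers, learning rate, number of epochs) than to capacity, although that is a guess rather than something the issue confirms.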

About how to test a new dataset?

Hello, your code is very useful for testing on the AIDS dataset. Now I have a new dataset. If I want to test on it, do I need to create a new class like GEDDataset using PyG? If so, could you give me some suggestions and things to watch out for when creating a new dataset? I would appreciate a reply!

Asking for the best result of the code

I'm happy to have obtained an MSE similar to the result shown in your picture, and your code is pretty good. However, it is larger than the result in the paper (1.189). I would like to discuss the best result this code can achieve and whether the results in the paper can be reproduced. Thanks!

The potential bug in the "calculate_prec_at_k"

Hi,

Your code is very helpful for this topic. However, I may have found a bug in the function "calculate_prec_at_k", which is used for the recall/accuracy calculation. There are two potential bugs:

  1. In "calculate_prec_at_k" (lines 47-48 of "utils.py"), the correct way should be
    best_k_pred = prediction.argsort()[:k] -> best_k_pred = prediction.argsort()[::-1][:k]
    best_k_target = target.argsort()[:k] -> best_k_target = target.argsort()[::-1][:k]
    Descending order is correct because the normalized GED score lies in [0, 1], where 1 means the two graphs are exactly the same.

  2. Also in "calculate_prec_at_k", the number of ground-truth graphs that belong in the top k can be larger than k. In other words, with k set to 10, the 10th, 11th and 12th most similar graphs might share the same ground-truth GED score with respect to the query graph. In this case the ground-truth set should contain more than k graphs: a threshold should be used to add the tied graphs to "best_k_target". Here is my solution:

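import numpy as np

def _calculate_prec_at_k(k, target):
    # Ascending sort, matching the original target.argsort()[:k] convention.
    target_increase = np.sort(target)
    # Count every entry that is tied with (or below) the k-th smallest value.
    target_value_sel = (target_increase <= target_increase[k - 1]).sum()
    if target_value_sel > k:
        # Ties at the cut-off: keep all of them in the ground-truth set.
        best_k_target = target.argsort()[:target_value_sel]
    else:
        best_k_target = target.argsort()[:k]
    return best_k_target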
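
For the first point above, the effect of the sort direction can be seen on a small example. This is a minimal sketch with made-up similarity scores; top_k_indices is a hypothetical helper, not a function from the repository.

```python
import numpy as np


def top_k_indices(scores, k):
    """Indices of the k largest scores, best first."""
    return np.argsort(scores)[::-1][:k]


# Made-up similarity scores in [0, 1]; higher means more similar.
scores = np.array([0.10, 0.90, 0.40, 0.75])

lowest_two = np.argsort(scores)[:2]     # ascending order: the least similar graphs
highest_two = top_k_indices(scores, 2)  # descending order: the most similar graphs

print(lowest_two, highest_two)
```

If the scores are similarities, the ascending slice selects the worst matches, which is the behaviour the issue describes as a bug.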

Error downloading GEDDataset

*** [[ copied my bug report from pytorch_geometric ]] ***

🐛 Bug

While running main.py from Extended-SimGNN, I get an error when downloading GEDDataset from the Google Drive URL.
I suspect this happens because the URL contains characters that are invalid in a filename, such as "?".

Note: I am using Windows 10 Pro with venv, without Docker or virtualization.

To Reproduce

Steps to reproduce the behavior:

  1. Run main.py from Extended-SimGNN.
    Alternatively, download any GEDDataset directly from PyG.

Expected behavior

The dataset file should be downloaded from the URL and saved to self.raw_dir for further processing.

Environment

  • OS: Windows 10 Pro
  • Python version: 3.7.7
  • PyTorch version: 1.6.0+cpu
  • CUDA/cuDNN version: N/A
  • GCC version: N/A
  • Any other relevant information:

Additional context

Downloading https://drive.google.com/uc?export=download&id=10czBPJDEzEDI2tq7Z7mkBjLhj55F-a2z
Traceback (most recent call last):
  File "<ROOT_DIR>/Extended-SimGNN/src/main.py", line 36, in <module>
    main()
  File "<ROOT_DIR>/Extended-SimGNN/src/main.py", line 15, in main
    trainer = SimGNNTrainer(args)
  File "<ROOT_DIR>\Extended-SimGNN\src\simgnn.py", line 192, in __init__
    self.process_dataset()
  File "<ROOT_DIR>\Extended-SimGNN\src\simgnn.py", line 207, in process_dataset
    self.training_graphs = GEDDataset(r'datasets{}'.format(self.args.dataset), self.args.dataset, train=True)
  File "<ROOT_DIR>\pyg_sandbox\venv\lib\site-packages\torch_geometric\datasets\ged_dataset.py", line 89, in __init__
    pre_filter)
  File "<ROOT_DIR>\pyg_sandbox\venv\lib\site-packages\torch_geometric\data\in_memory_dataset.py", line 54, in __init__
    pre_filter)
  File "<ROOT_DIR>\pyg_sandbox\venv\lib\site-packages\torch_geometric\data\dataset.py", line 89, in __init__
    self._download()
  File "<ROOT_DIR>\pyg_sandbox\venv\lib\site-packages\torch_geometric\data\dataset.py", line 141, in _download
    self.download()
  File "<ROOT_DIR>\pyg_sandbox\venv\lib\site-packages\torch_geometric\datasets\ged_dataset.py", line 107, in download
    path = download_url(self.url.format(name), self.raw_dir)
  File "<ROOT_DIR>\pyg_sandbox\venv\lib\site-packages\torch_geometric\data\download.py", line 33, in download_url
    with open(path, 'wb') as f:
OSError: [Errno 22] Invalid argument: 'datasets\AIDS700nef\raw\uc?export=download&id=10czBPJDEzEDI2tq7Z7mkBjLhj55F-a2z'
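
The path in the OSError ends with the raw tail of the URL, including "?", which Windows does not allow in filenames. One possible workaround is to derive a safe local filename from the URL before opening the file. The sketch below uses only the Python standard library; safe_filename_from_url is a hypothetical helper and is not part of PyTorch Geometric.

```python
import re
from urllib.parse import urlparse, parse_qs

# Characters that Windows rejects in file names.
_INVALID = re.compile(r'[<>:"/\\|?*]')


def safe_filename_from_url(url, default="download"):
    """Build a filesystem-safe name from a URL's path and its 'id' query value."""
    parsed = urlparse(url)
    name = parsed.path.rpartition("/")[2] or default
    file_id = parse_qs(parsed.query).get("id", [""])[0]
    if file_id:
        name = f"{name}_{file_id}"
    # Replace anything still invalid with an underscore.
    return _INVALID.sub("_", name)


url = "https://drive.google.com/uc?export=download&id=10czBPJDEzEDI2tq7Z7mkBjLhj55F-a2z"
naive_name = url.rpartition("/")[2]      # keeps "?", "&" and "=" from the query
safe_name = safe_filename_from_url(url)  # query string is parsed, not copied

print(naive_name)
print(safe_name)
```

The naive name is the one that appears in the traceback; the sanitized one contains no reserved characters, so opening it for writing should not fail for that reason.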

Dataset Issues

I checked the datasets already available in PyTorch Geometric (the GEDDataset class).
I don't know about you, but the problem I faced is that for all the GEDDatasets (Alkane, Linux and AIDS700nef), the GED for a pair of graphs G1, G2 where both G1 and G2 belong to the test set is set to 'inf'.
How did you deal with this?
Also, the node feature matrix (i.e. data.x) is only provided for the AIDS700nef dataset and not for the other two, so how did you deal with that?
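
Two common workarounds for the points raised here are to skip pairs whose GED entry is infinite and to use one-hot node degrees as stand-in features when a dataset has none. The sketch below is plain Python under those assumptions; both helpers are hypothetical and it is not confirmed that the repository handles either case this way.

```python
import math


def finite_pairs(ged):
    """Yield (i, j, value) for every pair whose GED entry is finite."""
    for i, row in enumerate(ged):
        for j, value in enumerate(row):
            if math.isfinite(value):
                yield i, j, value


def one_hot_degree(num_nodes, edges, max_degree):
    """Build one-hot degree vectors to use as stand-in node features."""
    degree = [0] * num_nodes
    for u, v in edges:  # undirected edge list, each edge listed once
        degree[u] += 1
        degree[v] += 1
    # Degrees above max_degree are clipped into the last slot.
    return [
        [1.0 if min(d, max_degree) == c else 0.0 for c in range(max_degree + 1)]
        for d in degree
    ]


inf = float("inf")
ged = [[0.0, 2.0, inf], [2.0, 0.0, inf], [inf, inf, inf]]  # graph 2 has no known GEDs
usable = list(finite_pairs(ged))
features = one_hot_degree(3, [(0, 1), (1, 2)], max_degree=2)  # path graph 0-1-2
```

With this filtering, only pairs with a known GED reach the loss or the evaluation metrics.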

About the performance reported in the original paper[SimGNN]

Hi, thanks for contributing this PyTorch implementation of SimGNN. I would like to know whether the code in this repo can reach the performance reported in the original SimGNN paper, i.e. an MSE of 1.189 (×10^-3) on the AIDS dataset.
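
When comparing numbers, it helps to confirm that the metric is computed the same way. The sketch below assumes the conventions described in the SimGNN paper: GED is normalized by the mean number of nodes of the two graphs, mapped to a similarity with exp(-x), and the MSE on that similarity is reported in units of 10^-3. The function names are hypothetical.

```python
import math


def similarity_from_ged(ged, n1, n2):
    """Map a raw GED to a similarity in (0, 1] via exp(-normalized GED)."""
    norm_ged = ged / (0.5 * (n1 + n2))
    return math.exp(-norm_ged)


def mse_times_1e3(predictions, targets):
    """Mean squared error on similarity scores, scaled to units of 10^-3."""
    errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return 1e3 * sum(errors) / len(errors)


identical = similarity_from_ged(0, 5, 5)        # GED 0 gives similarity 1.0
example_mse = mse_times_1e3([0.5, 0.5], [0.5, 0.6])
print(identical, example_mse)
```

A mismatch in any of these steps (for example, computing the MSE on raw GED or omitting the 10^3 scaling) would make results look very different from the reported 1.189.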

This should not use the updated "k"

best_k_pred = prediction.argsort()[::-1][:k]

To evaluate the top "k" predicted results, which do not have duplicate values, I think the original "k" should be used instead of the updated one. To select the top "k" ground-truth values, duplicates should be taken into account, and "k" should be updated when the number of tied results exceeds "k". So I think line 56 may be wrong and line 57 is correct.
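
One reading of this proposal is sketched below: predictions always use the original k, while the ground-truth set is widened to include everything tied with the k-th target value. The function name and the sample arrays are hypothetical, higher scores are assumed to mean more similar, and ties are detected by exact equality.

```python
import numpy as np


def prec_at_k_with_ties(prediction, target, k):
    """Precision at k with a tie-aware ground-truth set."""
    prediction = np.asarray(prediction, dtype=float)
    target = np.asarray(target, dtype=float)
    # Predictions: exactly the original k highest-scoring items.
    best_k_pred = np.argsort(prediction)[::-1][:k]
    # Targets: widen the cut-off to include everything tied with the k-th value.
    kth_value = np.sort(target)[::-1][k - 1]
    n_target = int((target >= kth_value).sum())
    best_k_target = np.argsort(target)[::-1][:n_target]
    hits = len(set(best_k_pred.tolist()) & set(best_k_target.tolist()))
    return hits / k


# Items 1, 2 and 3 are tied for second place in the ground truth.
target = [0.9, 0.8, 0.8, 0.8, 0.1]
prediction = [0.95, 0.2, 0.85, 0.3, 0.1]
tied_result = prec_at_k_with_ties(prediction, target, k=2)
print(tied_result)
```

Here the model's top two are items 0 and 2. Both are valid top-2 answers because item 2 is tied at the cut-off, so the tie-aware precision is 1.0, whereas a fixed-size target set could arbitrarily exclude item 2.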

A BUG

There is something wrong with mean = scatter_('mean', x, batch, size) in the forward function in layer.py, caused by the upgrade of torch-scatter.
I replaced mean = scatter_('mean', x, batch, size) with mean = scatter_('mean', x, batch, dim_size=size), which fixed it.
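
For readers unfamiliar with the operation, a scatter mean averages the rows of x that share the same index in batch, and the size argument fixes the number of output rows. The sketch below is a NumPy stand-in that illustrates those semantics, assuming x holds node features and batch holds the graph index of each node; it is not the torch-scatter API, and only the keyword name dim_size is taken from the fix above.

```python
import numpy as np


def scatter_mean(x, batch, dim_size):
    """Average the rows of x per group index; output has dim_size rows."""
    x = np.asarray(x, dtype=float)
    batch = np.asarray(batch)
    sums = np.zeros((dim_size, x.shape[1]))
    counts = np.zeros(dim_size)
    np.add.at(sums, batch, x)    # accumulate rows into their group
    np.add.at(counts, batch, 1)  # count rows per group
    counts = np.maximum(counts, 1)  # empty groups stay at zero instead of dividing by 0
    return sums / counts[:, None]


# Three nodes: the first two belong to graph 0, the third to graph 1.
out = scatter_mean([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [0, 0, 1], dim_size=3)
print(out)
```

Passing the size by keyword, as in the fix, avoids it being interpreted as a different positional parameter when a library changes its argument order.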
