gospodima / extended-simgnn
A PyTorch Geometric implementation of SimGNN with some extensions.
License: MIT License
Hi, I noticed that the SimGNN paper describes the following setting for the fully connected layers (FCL): "use 4 fully connected layers to reduce the dimension of the concatenated results from the NTN module, from 32 to 16, 16 to 8, 8 to 4, and 4 to 1." However, this code has only two fully connected layers: 32 --> 16 and 16 --> 1.
When I modify the code to match the settings in the paper, the performance becomes about ten times worse, and I don't know why. Shouldn't the settings from the paper generally perform better?
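For anyone comparing the two settings, the two heads can be sketched as follows. This is a minimal illustration rather than the repository's code: the helper name `make_head` is hypothetical, and the ReLU activations between layers and the sigmoid on the output are assumptions.

```python
import torch
from torch import nn


def make_head(sizes):
    """Build an MLP head from a list of layer widths, e.g. [32, 16, 1]."""
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        # Assumed activations: ReLU between layers, sigmoid on the final score.
        is_last = i == len(sizes) - 2
        layers.append(nn.Sigmoid() if is_last else nn.ReLU())
    return nn.Sequential(*layers)


repo_head = make_head([32, 16, 1])         # two layers: 32 -> 16 -> 1
paper_head = make_head([32, 16, 8, 4, 1])  # four layers: 32 -> 16 -> 8 -> 4 -> 1

x = torch.randn(5, 32)  # a batch of 5 hypothetical 32-dimensional inputs
repo_out = repo_head(x)
paper_out = paper_head(x)
```

Changing only the list of widths keeps everything else identical, which makes it easier to check whether the gap comes from the head itself or from other training settings.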
Here is the code in this Repository
Hello, your code is very useful for testing on the AIDS dataset. I now have a new dataset. If I want to test on it, do I need to create a new class like GEDDataset using PyG? If so, could you give me some suggestions and things to watch out for when creating a new dataset? I would appreciate a reply!
I'm happy to have obtained an MSE similar to the result shown in your picture, and your code is pretty good. However, it is larger than the result in the paper (1.189). I would like to discuss the best result achieved with this code and the possibility of reproducing the results in the paper. Thanks!
Hi,
Your code is very helpful for this topic. However, I may have found a bug in the function "calculate_prec_at_k" used for the recall accuracy calculation. There are two potential issues:
First, in "calculate_prec_at_k" (lines 47-48 of "utils.py"), the correct way should be
best_k_pred = prediction.argsort()[:k] -> best_k_pred = prediction.argsort()[::-1][:k]
best_k_target = target.argsort()[:k] -> best_k_target = target.argsort()[::-1][:k]
Descending order is correct because the normalized GED score lies in [0, 1], and 1 means the two graphs are exactly the same.
Second, there might be more than k ground-truth graphs that qualify. In other words, with k set to 10, the 10th, 11th and 12th most similar graphs might share the same ground-truth GED score with respect to the query graph. In this case, the number of ground-truth graphs should be allowed to exceed k: use the k-th score as a threshold and add the tied graphs to "best_k_target". Here is my solution:
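import numpy as np

def _calculate_prec_at_k(k, target):
    # np.sort is ascending, so the k-th smallest value acts as the threshold.
    target_increase = np.sort(target)
    # Count the entries at or below the threshold (this includes ties with the k-th).
    target_value_sel = (target_increase <= target_increase[k-1]).sum()
    if target_value_sel > k:
        # Ties extend past position k: keep all of them.
        best_k_target = target.argsort()[:target_value_sel]
    else:
        best_k_target = target.argsort()[:k]
    return best_k_target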
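To make the effect of ties concrete, here is a small dependency-free sketch of precision at k in which the ground-truth set is extended with ties while the predicted set stays at exactly k items. It assumes that a higher score means a more similar graph; the function names `top_k_with_ties` and `precision_at_k` are hypothetical and not part of the repository.

```python
def top_k_with_ties(scores, k, largest=True):
    """Return the indices of the top-k scores, plus every index tied with the k-th."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=largest)
    if k >= len(order):
        return set(order)
    cutoff = scores[order[k - 1]]
    if largest:
        return {i for i in order if scores[i] >= cutoff}
    return {i for i in order if scores[i] <= cutoff}


def precision_at_k(prediction, target, k):
    """Precision at k where ties in the ground truth enlarge the target set."""
    # Predictions: plain top-k by descending score, no tie extension.
    pred_order = sorted(range(len(prediction)),
                        key=lambda i: prediction[i], reverse=True)
    best_k_pred = set(pred_order[:k])
    # Ground truth: top-k plus everything tied with the k-th value.
    best_k_target = top_k_with_ties(target, k, largest=True)
    return len(best_k_pred & best_k_target) / k


target = [1.0, 0.8, 0.8, 0.8, 0.2]      # items 1, 2 and 3 are tied
prediction = [0.9, 0.1, 0.3, 0.7, 0.2]  # top-2 predictions are items 0 and 3

p_at_2 = precision_at_k(prediction, target, 2)
```

In this toy case a plain top-2 cut of the ground truth would keep only items 0 and 1, so the prediction of item 3 would count as a miss even though it is tied with item 1.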
*** [[ copied my bug report from pytorch_geometric ]] ***
While running main.py from Extended-SimGNN, I get an error when GEDDataset is downloaded from the Google Drive URL.
I suspect this happens because the URL contains characters that are invalid in a filename, such as "?".
Note: I am using Windows 10 Pro with venv, without Docker or virtualization.
Steps to reproduce the behavior:
The dataset file is supposed to be downloaded from the URL and saved to self.raw_dir for further processing.
Downloading https://drive.google.com/uc?export=download&id=10czBPJDEzEDI2tq7Z7mkBjLhj55F-a2z
Traceback (most recent call last):
  File "<ROOT_DIR>/Extended-SimGNN/src/main.py", line 36, in <module>
    main()
  File "<ROOT_DIR>/Extended-SimGNN/src/main.py", line 15, in main
    trainer = SimGNNTrainer(args)
  File "<ROOT_DIR>\Extended-SimGNN\src\simgnn.py", line 192, in __init__
    self.process_dataset()
  File "<ROOT_DIR>\Extended-SimGNN\src\simgnn.py", line 207, in process_dataset
    self.training_graphs = GEDDataset(r'datasets{}'.format(self.args.dataset), self.args.dataset, train=True)
  File "<ROOT_DIR>\pyg_sandbox\venv\lib\site-packages\torch_geometric\datasets\ged_dataset.py", line 89, in __init__
    pre_filter)
  File "<ROOT_DIR>\pyg_sandbox\venv\lib\site-packages\torch_geometric\data\in_memory_dataset.py", line 54, in __init__
    pre_filter)
  File "<ROOT_DIR>\pyg_sandbox\venv\lib\site-packages\torch_geometric\data\dataset.py", line 89, in __init__
    self._download()
  File "<ROOT_DIR>\pyg_sandbox\venv\lib\site-packages\torch_geometric\data\dataset.py", line 141, in _download
    self.download()
  File "<ROOT_DIR>\pyg_sandbox\venv\lib\site-packages\torch_geometric\datasets\ged_dataset.py", line 107, in download
    path = download_url(self.url.format(name), self.raw_dir)
  File "<ROOT_DIR>\pyg_sandbox\venv\lib\site-packages\torch_geometric\data\download.py", line 33, in download_url
    with open(path, 'wb') as f:
OSError: [Errno 22] Invalid argument: 'datasets\AIDS700nef\raw\uc?export=download&id=10czBPJDEzEDI2tq7Z7mkBjLhj55F-a2z'
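The path in the error ends with `uc?export=download&id=...`, which suggests the filename is taken from everything after the last `/` in the URL. As an illustration of one possible workaround, the sketch below derives a Windows-safe filename from such a URL. The helper `safe_filename_from_url` is hypothetical and is not part of PyTorch Geometric.

```python
import re
from urllib.parse import urlparse, parse_qs


def safe_filename_from_url(url, default="download"):
    """Derive a filename that is valid on Windows from a URL (hypothetical helper)."""
    parsed = urlparse(url)
    # Last path component, e.g. "uc" for the Google Drive URL above.
    name = parsed.path.rstrip("/").split("/")[-1] or default
    # Append the "id" query parameter, if any, so different files do not collide.
    file_id = parse_qs(parsed.query).get("id", [""])[0]
    if file_id:
        name = f"{name}_{file_id}"
    # Replace characters that Windows forbids in filenames: < > : " / \ | ? *
    return re.sub(r'[<>:"/\\|?*]', "_", name)


url = "https://drive.google.com/uc?export=download&id=10czBPJDEzEDI2tq7Z7mkBjLhj55F-a2z"
naive = url.rpartition("/")[2]   # text after the last "/", still contains "?"
safe = safe_filename_from_url(url)
```

Downloading the archive manually and placing it in the raw directory under the expected name would be another way around the problem, assuming the dataset class skips the download when the raw files already exist.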
I checked the datasets already available in PyTorch Geometric (the GEDDataset class).
I don't know about you, but the problem I faced is that for all the GEDDatasets (Alkane, Linux and AIDS700nef), the GED for a pair of graphs G1, G2 is set to 'inf' whenever both G1 and G2 belong to the test set.
How did you deal with this?
Also, the node feature matrix (i.e. data.x) is only provided for the AIDS700nef dataset and not for the other two, so how did you deal with that?
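One possible way to work around the 'inf' entries is to form pairs only where a ground-truth GED exists: training pairs drawn from the training set, and evaluation pairs that match each test graph against training graphs. The sketch below assumes a plain nested-list GED matrix and hypothetical index lists. It does not use the PyG API and is not necessarily what this repository does.

```python
import math


def valid_pairs(ged, query_ids, reference_ids):
    """Yield (i, j, ged[i][j]) for distinct pairs whose GED is finite."""
    for i in query_ids:
        for j in reference_ids:
            # Self-pairs are skipped in this toy example.
            if i != j and math.isfinite(ged[i][j]):
                yield i, j, ged[i][j]


inf = float("inf")
# Toy 4x4 GED matrix: graphs 0-1 are "train", graphs 2-3 are "test".
ged = [
    [0.0, 2.0, 3.0, 1.0],
    [2.0, 0.0, 4.0, 5.0],
    [3.0, 4.0, 0.0, inf],
    [1.0, 5.0, inf, 0.0],
]
train_ids, test_ids = [0, 1], [2, 3]

train_pairs = list(valid_pairs(ged, train_ids, train_ids))
test_pairs = list(valid_pairs(ged, test_ids, train_ids))
test_test_pairs = list(valid_pairs(ged, test_ids, test_ids))
```

With this filtering, every test graph is still evaluated, but only against graphs for which a ground-truth distance is available.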
Hi,
Have you tried different configs on these datasets? I am curious which configs work best on each dataset.
Thanks.
Hi, thanks for your contribution of this PyTorch implementation of SimGNN. I would like to know whether the code in this repo can reach the performance reported in the original SimGNN paper, i.e. an MSE of 1.189 (×10^-3) on the AIDS dataset.
Line 56 in 7f0d631
To evaluate the top "k" predicted results, which do not contain duplicate values, I think the original "k" should be used instead of the updated one. When selecting the top "k" ground-truth values, duplicates should be taken into account, and "k" should be enlarged when ties push the count beyond "k". So I think line 56 may be wrong and line 57 is correct.
There is something wrong with mean = scatter_('mean', x, batch, size)
in the forward function in layer.py,
caused by an upgrade of torch-scatter.
I replaced mean = scatter_('mean', x, batch, size)
with mean = scatter_('mean', x, batch, dim_size=size)
and that fixed it.
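To illustrate what the `dim_size` argument controls, here is a pure-Python stand-in for a scatter-mean over a batch vector. It is not the torch-scatter or PyG implementation; the function below is hypothetical and only mirrors the idea of passing the output size explicitly by keyword.

```python
def scatter_mean(values, index, dim_size=None):
    """Average `values` per group given by `index` (pure-Python illustration)."""
    if dim_size is None:
        # Without an explicit size, infer it from the largest group id.
        dim_size = max(index) + 1 if index else 0
    sums = [0.0] * dim_size
    counts = [0] * dim_size
    for v, g in zip(values, index):
        sums[g] += v
        counts[g] += 1
    # In this sketch, groups with no members get a mean of 0.0.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


x = [1.0, 3.0, 10.0]  # one scalar feature per node
batch = [0, 0, 2]     # node -> graph assignment; graph 1 has no nodes here
size = 4              # number of graphs expected in the output

means = scatter_mean(x, batch, dim_size=size)
inferred = scatter_mean(x, batch)
```

Passing the size by keyword avoids depending on the positional order of the arguments, which is what changed the behaviour in the report above.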