
sdne's Introduction

SDNE

This repository provides a reference implementation of SDNE as described in the paper:

Structural Deep Network Embedding.
Daixin Wang, Peng Cui, Wenwu Zhu.
Knowledge Discovery and Data Mining (KDD), 2016.

The SDNE algorithm learns representations for the nodes of a graph. Please refer to the paper for more details.

Basic Usage

$ python main.py -c config/xx.ini

Note: you can check out and modify the config file or main.py to get the behavior you want.

Input

Your input graph data should be a txt file or a mat file placed under the GraphData folder.

file format

The txt file should be an edge list, and its first line should contain N, the number of vertices, and E, the number of edges.

The mat file should contain the adjacency matrix.

You can save your adjacency matrix using the code below:

import scipy.io as sio
# save the adjacency matrix under the key "graph_sparse"
sio.savemat("xxx.mat", {"graph_sparse": your_adjacent_matrix})

It is recommended to use the mat format and to store the adjacency matrix in sparse form.
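For example, here is a minimal sketch of storing the adjacency matrix as a SciPy sparse matrix before saving (the file name and matrix values are only illustrative):

import numpy as np
import scipy.sparse as sp
import scipy.io as sio

# a small dense 0/1 adjacency matrix, just for illustration
dense_adj = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]])

# convert to CSR sparse format so the mat file stays small for large graphs
sparse_adj = sp.csr_matrix(dense_adj)

# save it under the key "graph_sparse", as shown above
sio.savemat("GraphData/xxx.mat", {"graph_sparse": sparse_adj})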

txt file sample

5242 14496
0 1
0 2
4 9
...
4525 4526

Note: node IDs start from 0.
Note: the graph should be undirected, so if (I J) appears in the input file, (J I) should not.
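As a sketch, an edge list in this format can be written from a plain Python list of undirected edges like so (the file name, edge list, and node count are only illustrative):

# illustrative undirected edges, node IDs starting from 0
edges = [(0, 1), (0, 2), (4, 9)]
num_nodes = 10          # N: number of vertices
num_edges = len(edges)  # E: number of edges

with open("GraphData/my_graph.txt", "w") as f:
    # first line: N and E
    f.write("%d %d\n" % (num_nodes, num_edges))
    # one edge per line, each undirected pair written only once
    for i, j in edges:
        f.write("%d %d\n" % (i, j))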

Citing

If you find SDNE useful in your research, we ask that you cite the following paper:

@inproceedings{Wang:2016:SDN:2939672.2939753,
 author = {Wang, Daixin and Cui, Peng and Zhu, Wenwu},
 title = {Structural Deep Network Embedding},
 booktitle = {Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
 series = {KDD '16},
 year = {2016},
 isbn = {978-1-4503-4232-2},
 location = {San Francisco, California, USA},
 pages = {1225--1234},
 numpages = {10},
 url = {http://doi.acm.org/10.1145/2939672.2939753},
 doi = {10.1145/2939672.2939753},
 acmid = {2939753},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {deep learning, network analysis, network embedding},
} 

sdne's People

Contributors

suanrong

sdne's Issues

About the RBM

Hello, I read your paper and found it very interesting. I haven't looked at the code in detail yet, but I noticed that your model contains an RBM. What is this RBM? Is it an abbreviation for something?

Does it reproduce the results?

The code is understandable. Great work.
I only tried it for a few epochs, as I don't have enough time to run it to completion.
Just a quick question: does this code reproduce the results of the paper?

error in function check_link_reconstruction

Hi! I ran SDNE with the train data provided and I received this error:

if (data.adj_matrix[x][y] == 1):

IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

The problem is clearly in the definition of x and y,

x = ind / data.N
y = ind % data.N

Given that ind is a float, x will also be a float, and a float certainly cannot be used as an index into the adjacency matrix.
Can you tell me where I might be going wrong, or what causes this? Also, what is the logic behind this assignment?
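For what it's worth, a minimal sketch of the fix I would expect, assuming the intent is integer row/column indices (in Python 3, / always returns a float):

# floor division keeps x an integer, so it can be used to index the adjacency matrix
x = ind // data.N
y = ind % data.N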

Thanks

Can I employ this SDNE to weighted edges?

Hi suanrong,

I see that the sample inputs in the GraphData folder are all unweighted edges.
Is it possible to run SDNE on a dataset with weighted edges?

Thank you!

Is blogCatalog_embedding.mat training on the optimized parameters?

I ran check_multi_label_classification() on blogCatalog_embedding.mat, but I can't get the results shown in the paper. Am I doing something wrong?

My code logic:

  1. load embeddingResult/blogCatalog_embedding.mat using scipy.io
  2. load node labels from file GraphData/blogCatalog3-groups.txt
  3. feed node embedding and labels into function check_multi_label_classification()

The results are about 10 points lower than those reported in the paper.

The parameter you use in SDNE experiment

I tried to reproduce the SDNE visualization result on the 20ng dataset. Following the paper, I set alpha to 0.2 and beta to 1, but I cannot get a visualization like the one in the SDNE paper: in my result the boundaries between groups are not very clear, and the groups themselves are not very compact. Would you mind telling me the parameter values you used in the SDNE visualization experiment, or the t-SNE parameters you used for the visualization?

Index error in utils/utils.py

Hi there, thanks for great work!

I am trying to run this:

python main.py -c config/ca-grqc.ini

Traceback (most recent call last):
File "/home/netra/mywork/Assessment/SDNE/main.py", line 95, in
print(fout, epochs, "reconstruction:", check_reconstruction(embedding, train_graph_data, config.check_reconstruction))
File "/home/netra/mywork/Assessment/SDNE/utils/utils.py", line 39, in check_reconstruction
precisionK = get_precisionK(embedding, graph_data, np.max(check_index))
File "/home/netra/mywork/Assessment/SDNE/utils/utils.py", line 32, in get_precisionK
if(data.adj_matrix[x].toarray()[0][y] == 1 or x == y):
File "/home/netra/anaconda3/lib/python3.9/site-packages/scipy/sparse/_index.py", line 33, in getitem
row, col = self._validate_indices(key)
File "/home/netra/anaconda3/lib/python3.9/site-packages/scipy/sparse/_index.py", line 138, in _validate_indices
row = self._asindices(row, M)
File "/home/netra/anaconda3/lib/python3.9/site-packages/scipy/sparse/_index.py", line 162, in _asindices
raise IndexError('Index dimension must be <= 2')
IndexError: Index dimension must be <= 2

Differences between the loss function in the implementation and the one in the paper

Hello, I looked at the code and the implemented loss function seems to differ from the loss function in the paper. Could you explain why?
Why is the regularizer no longer needed once negative sampling is used? And what does self.loss_xxx represent?
#Loss function
self.loss_2nd = get_2nd_loss(self.X, self.X_reconstruct, config.beta)
self.loss_1st = get_1st_loss(self.H, self.adjacent_matriX)
self.loss_xxx = tf.reduce_sum(tf.pow(self.X_reconstruct,2))
# we don't need the regularizer term, since we have negative sampling.
#self.loss_reg = get_reg_loss(self.W, self.b)
#return config.gamma * self.loss_1st + config.alpha * self.loss_2nd + config.reg * self.loss_reg
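For comparison, my reading of the losses defined in the paper is the following (a transcription from the paper, so treat it as a sketch):

\mathcal{L}_{2nd} = \lVert (\hat{X} - X) \odot B \rVert_F^2, \qquad
\mathcal{L}_{1st} = \sum_{i,j=1}^{n} s_{ij} \lVert y_i - y_j \rVert_2^2, \qquad
\mathcal{L}_{mix} = \mathcal{L}_{2nd} + \alpha \mathcal{L}_{1st} + \nu \mathcal{L}_{reg}

where B is the penalty matrix with b_{ij} = \beta when s_{ij} > 0 and b_{ij} = 1 otherwise, which is why I expected a regularization term in the code as well.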

Basic Usage Error

I am trying to run the basic usage command, but I get the following error:
raise NoSectionError(section)
configparser.NoSectionError: No section: 'Graph_Data'

Please let me know how to fix it.

conforms to SDNE results ?

Hi,
It is great that you have implemented SDNE; it surely saved many hours of work.
I just wanted to ask: does this implementation produce results similar to the reported ones?

Embedding nodes after training?

We are performing a link prediction on a paper citation network.
We want to train on the citations from 1990-1997, and then predict links for papers after 1997. For example, we may want to predict links for papers in 1998.

To effectively do this, do we simply supply all the nodes from 1990-1998, and train on links from 1990-1997, and then perform link prediction? Or do we only supply nodes from 1990-1997, train on links from 1990-1997, and then somehow add the 1998 nodes into the embedding space after training? I believe the example in the SDNE paper does something like the former.

SDNE on flickr

I successfully generated the embedding of BlogCatalog, but when I modified Config.py for flickr, I got MemoryError:

name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.2155
pciBusID 0000:01:00.0
Total memory: 11.91GiB
Free memory: 11.54GiB
2017-10-15 10:33:38.133688: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2017-10-15 10:33:38.133695: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y 
2017-10-15 10:33:38.133706: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:01:00.0)
Traceback (most recent call last):
  File "main.py", line 34, in <module>
    model.do_variables_init(graph_data.sample(graph_data.N).X, config.DBN_init)
  File "/home/yba/Documents/SDNE/graph.py", line 76, in sample
    mini_batch.adjacent_matriX = self.adj_matrix[index][:,index]
MemoryError

This is the only code I changed:

        ## graph data
        self.file_path = "GraphData/flickr.txt"
        #self.file_path = "GraphData/ca-Grqc.txt"
        self.label_file_path = "GraphData/flickr-groups.txt"
        ## embedding data
        self.embedding_filename = "embeddingResult/flickr"

Please let me know what else I need to change to make it work.

Thanks!

Low performance

Hello, I got a problem (low performance: micro=0.14, macro=0.04) when running the SDNE code on the BlogCatalog dataset. I set the layers to N-1000-100, alpha=0.2, beta=10, reg=1. Could you share your performance on this dataset? Thank you very much.

Missing a .mat file

When I clone this repo and try to run it on the example BlogCatalog dataset, it shows an error about a missing .mat file referenced in the config file:

train_graph_file = GraphData/blogCatalog3-small.mat

how is MAP for reconstruction computed in paper?

In the paper, you also report mean average precision (MAP) on the reconstruction task. Do you compute MAP on a set of held-out links in the arXiv and BlogCatalog datasets? How is the MAP reported in Table 4 of the paper computed?
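For context, this is my current understanding of MAP for reconstruction, as a sketch (a generic definition with illustrative names and dense numpy inputs, not necessarily exactly what the paper uses):

import numpy as np

def mean_average_precision(scores, adjacency):
    # scores[i, j]: predicted link score between nodes i and j (dense numpy array)
    # adjacency[i, j] > 0 marks a true edge (dense numpy array)
    n = adjacency.shape[0]
    average_precisions = []
    for i in range(n):
        order = np.argsort(-scores[i])            # rank candidates, highest score first
        order = order[order != i]                 # ignore the node itself
        hits = adjacency[i, order] > 0            # true neighbors in ranked order
        if hits.sum() == 0:
            continue
        ranks = np.arange(1, len(order) + 1)
        precision_at_k = np.cumsum(hits) / ranks  # precision at each cut-off
        average_precisions.append(precision_at_k[hits].mean())
    return float(np.mean(average_precisions))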
