
sdne's Introduction

SDNE

This repository provides a reference implementation of SDNE as described in the paper:

Structural Deep Network Embedding.
Daixin Wang, Peng Cui, Wenwu Zhu.
Knowledge Discovery and Data Mining (KDD), 2016.

The SDNE algorithm learns representations for the nodes of a graph. Please refer to the paper for more details.

Basic Usage

$ python main.py -c config/xx.ini

Note: you can check out and modify the config file or main.py to get the behavior you want.

Input

Your input graph data should be a txt file or a mat file placed under the GraphData folder.

file format

The txt file should be an edge list, and its first line should contain N, the number of vertices, and E, the number of edges.

The mat file should contain the adjacency matrix.

You can save your adjacency matrix using the code below:

import scipy.io as sio
# save the adjacency matrix under the key "graph_sparse"
sio.savemat("xxx.mat", {"graph_sparse": your_adjacent_matrix})

It is recommended to use the mat format and to store the adjacency matrix in sparse form.
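For example, here is a minimal sketch of storing the adjacency matrix as a SciPy sparse matrix before saving (the file name and matrix values are only illustrative):

import numpy as np
import scipy.sparse as sp
import scipy.io as sio

# a small dense 0/1 adjacency matrix, just for illustration
dense_adj = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]])

# convert to CSR sparse format so the mat file stays small for large graphs
sparse_adj = sp.csr_matrix(dense_adj)

# save it under the key "graph_sparse", as shown above
sio.savemat("GraphData/xxx.mat", {"graph_sparse": sparse_adj})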

txt file sample

5242 14496
0 1
0 2
4 9
...
4525 4526

Note: node IDs start from 0.
Note: the graph should be undirected, so if (I J) appears in the input file, (J I) should not.
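As a sketch, an edge list in this format can be written from a plain Python list of undirected edges like so (the file name, edge list, and node count are only illustrative):

# illustrative undirected edges, node IDs starting from 0
edges = [(0, 1), (0, 2), (4, 9)]
num_nodes = 10          # N: number of vertices
num_edges = len(edges)  # E: number of edges

with open("GraphData/my_graph.txt", "w") as f:
    # first line: N and E
    f.write("%d %d\n" % (num_nodes, num_edges))
    # one edge per line, each undirected pair written only once
    for i, j in edges:
        f.write("%d %d\n" % (i, j))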

Citing

If you find SDNE useful in your research, we ask that you cite the following paper:

@inproceedings{Wang:2016:SDN:2939672.2939753,
 author = {Wang, Daixin and Cui, Peng and Zhu, Wenwu},
 title = {Structural Deep Network Embedding},
 booktitle = {Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
 series = {KDD '16},
 year = {2016},
 isbn = {978-1-4503-4232-2},
 location = {San Francisco, California, USA},
 pages = {1225--1234},
 numpages = {10},
 url = {http://doi.acm.org/10.1145/2939672.2939753},
 doi = {10.1145/2939672.2939753},
 acmid = {2939753},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {deep learning, network analysis, network embedding},
} 

sdne's People

Contributors

suanrong

sdne's Issues

About the RBM

Hello, I read your paper and found it very interesting. I haven't looked at the code in detail yet, but I noticed that your model contains an RBM. What is this RBM? Is it an abbreviation for something?

Does it reproduce the results?

The code is understandable. Great work.
I only tried it for a few epochs, as I don't have enough time to run it to completion.
Just a quick question: does this code reproduce the results of the paper?

error in function check_link_reconstruction

Hi! I ran SDNE with the train data provided and I received this error:

if (data.adj_matrix[x][y] == 1):

IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

The problem is clearly in the definition of x and y,

x = ind / data.N
y = ind % data.N

Given that ind is a float, x will also be a float, and a float certainly cannot be used as an index into the adjacency matrix.
Can you tell me where I might be going wrong, or what causes this? Also, what is the logic behind this assignment?
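For what it's worth, a minimal sketch of the fix I would expect, assuming the intent is integer row/column indices (in Python 3, / always returns a float):

# floor division keeps x an integer, so it can be used to index the adjacency matrix
x = ind // data.N
y = ind % data.N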

Thanks

Can I employ this SDNE to weighted edges?

Hi suanrong,

I see that the sample inputs in the GraphData folder are all unweighted edges.
Is it possible to run SDNE on a dataset with weighted edges?

Thank you!

Is blogCatalog_embedding.mat training on the optimized parameters?

I ran check_multi_label_classification() on blogCatalog_embedding.mat, but I can't get the results shown in the paper. Am I doing something wrong?

My code logic:

  1. load embeddingResult/blogCatalog_embedding.mat using scipy.io
  2. load node labels from file GraphData/blogCatalog3-groups.txt
  3. feed node embedding and labels into function check_multi_label_classification()

The results are about 10 points lower than those reported in the paper.

The parameter you use in SDNE experiment

I tried to reproduce the SDNE visualization result on the 20ng dataset. Following the paper, I set alpha to 0.2 and beta to 1, but I cannot get a visualization like the one in the SDNE paper: in my result the boundaries between groups are not very clear, and the groups themselves are not very compact. Would you mind telling me the parameter values you used in the SDNE visualization experiment, or the t-SNE parameters you used for the visualization?

Index error in utils/utils.py

Hi there, thanks for great work!

I am trying to run this:

python main.py -c config/ca-grqc.ini

Traceback (most recent call last):
File "/home/netra/mywork/Assessment/SDNE/main.py", line 95, in
print(fout, epochs, "reconstruction:", check_reconstruction(embedding, train_graph_data, config.check_reconstruction))
File "/home/netra/mywork/Assessment/SDNE/utils/utils.py", line 39, in check_reconstruction
precisionK = get_precisionK(embedding, graph_data, np.max(check_index))
File "/home/netra/mywork/Assessment/SDNE/utils/utils.py", line 32, in get_precisionK
if(data.adj_matrix[x].toarray()[0][y] == 1 or x == y):
File "/home/netra/anaconda3/lib/python3.9/site-packages/scipy/sparse/_index.py", line 33, in getitem
row, col = self._validate_indices(key)
File "/home/netra/anaconda3/lib/python3.9/site-packages/scipy/sparse/_index.py", line 138, in _validate_indices
row = self._asindices(row, M)
File "/home/netra/anaconda3/lib/python3.9/site-packages/scipy/sparse/_index.py", line 162, in _asindices
raise IndexError('Index dimension must be <= 2')
IndexError: Index dimension must be <= 2

Differences between the loss function in the implementation and the one in the paper

Hello, I looked at the code and the implemented loss function seems to differ from the loss function in the paper. Could you explain why?
Why is the regularizer no longer needed once negative sampling is used? And what does self.loss_xxx represent?
#Loss function
self.loss_2nd = get_2nd_loss(self.X, self.X_reconstruct, config.beta)
self.loss_1st = get_1st_loss(self.H, self.adjacent_matriX)
self.loss_xxx = tf.reduce_sum(tf.pow(self.X_reconstruct,2))
# we don't need the regularizer term, since we have negative sampling.
#self.loss_reg = get_reg_loss(self.W, self.b)
#return config.gamma * self.loss_1st + config.alpha * self.loss_2nd + config.reg * self.loss_reg
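For comparison, my reading of the losses defined in the paper is the following (a transcription from the paper, so treat it as a sketch):

\mathcal{L}_{2nd} = \lVert (\hat{X} - X) \odot B \rVert_F^2, \qquad
\mathcal{L}_{1st} = \sum_{i,j=1}^{n} s_{ij} \lVert y_i - y_j \rVert_2^2, \qquad
\mathcal{L}_{mix} = \mathcal{L}_{2nd} + \alpha \mathcal{L}_{1st} + \nu \mathcal{L}_{reg}

where B is the penalty matrix with b_{ij} = \beta when s_{ij} > 0 and b_{ij} = 1 otherwise, which is why I expected a regularization term in the code as well.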

Basic Usage Error

I am trying to run the basic usage command, but I get the following error:
raise NoSectionError(section)
configparser.NoSectionError: No section: 'Graph_Data'

Please let me know how to fix it.

conforms to SDNE results ?

Hi,
It is great that you have implemented SDNE; it surely saved many hours of work.
I just wanted to ask: does this implementation produce results similar to the reported ones?

Embedding nodes after training?

We are performing a link prediction on a paper citation network.
We want to train on the citations from 1990-1997, and then predict links for papers after 1997. For example, we may want to predict links for papers in 1998.

To effectively do this, do we simply supply all the nodes from 1990-1998, and train on links from 1990-1997, and then perform link prediction? Or do we only supply nodes from 1990-1997, train on links from 1990-1997, and then somehow add the 1998 nodes into the embedding space after training? I believe the example in the SDNE paper does something like the former.

SDNE on flickr

I successfully generated the embedding of BlogCatalog, but when I modified Config.py for flickr, I got MemoryError:

name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.2155
pciBusID 0000:01:00.0
Total memory: 11.91GiB
Free memory: 11.54GiB
2017-10-15 10:33:38.133688: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2017-10-15 10:33:38.133695: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y 
2017-10-15 10:33:38.133706: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:01:00.0)
Traceback (most recent call last):
  File "main.py", line 34, in <module>
    model.do_variables_init(graph_data.sample(graph_data.N).X, config.DBN_init)
  File "/home/yba/Documents/SDNE/graph.py", line 76, in sample
    mini_batch.adjacent_matriX = self.adj_matrix[index][:,index]
MemoryError

This is the only code I changed:

        ## graph data
        self.file_path = "GraphData/flickr.txt"
        #self.file_path = "GraphData/ca-Grqc.txt"
        self.label_file_path = "GraphData/flickr-groups.txt"
        ## embedding data
        self.embedding_filename = "embeddingResult/flickr"

Please let me know what else I need to change to make it work.

Thanks!

Low performance

Hello, I got a problem (low performance: micro=0.14, macro=0.04) when running the SDNE code on the BlogCatalog dataset. I set the layers to N-1000-100, alpha=0.2, beta=10, reg=1. Could you share your performance on this dataset? Thank you very much.

Missing a .mat file

When I clone this repo and try to run it on the example BlogCatalog dataset, it shows an error about a missing .mat file referenced in the config file:

train_graph_file = GraphData/blogCatalog3-small.mat

how is MAP for reconstruction computed in paper?

In the paper, you also report mean average precision (MAP) on the reconstruction task. Do you compute MAP on a set of held-out links in the arXiv and BlogCatalog datasets? How is the MAP reported in Table 4 of the paper computed?
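For context, this is my current understanding of MAP for reconstruction, as a sketch (a generic definition with illustrative names and dense numpy inputs, not necessarily exactly what the paper uses):

import numpy as np

def mean_average_precision(scores, adjacency):
    # scores[i, j]: predicted link score between nodes i and j (dense numpy array)
    # adjacency[i, j] > 0 marks a true edge (dense numpy array)
    n = adjacency.shape[0]
    average_precisions = []
    for i in range(n):
        order = np.argsort(-scores[i])            # rank candidates, highest score first
        order = order[order != i]                 # ignore the node itself
        hits = adjacency[i, order] > 0            # true neighbors in ranked order
        if hits.sum() == 0:
            continue
        ranks = np.arange(1, len(order) + 1)
        precision_at_k = np.cumsum(hits) / ranks  # precision at each cut-off
        average_precisions.append(precision_at_k[hits].mean())
    return float(np.mean(average_precisions))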
