Comments (8)
Hi,
DeepWalk generates a fixed number of walks (10 by default) starting from each node in the graph, so every node should appear in at least some of the random walks. Can you show me your input graph?
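The corpus construction described above can be sketched as follows. This is a minimal illustration, not the deepwalk package's own code. It assumes the graph is a Python dict that maps each node to its list of neighbors, and `build_walks` is a hypothetical helper. Its `num_walks=10` and `walk_length=40` defaults mirror the values mentioned in this thread.

```python
import random
from collections import Counter


def build_walks(adj, num_walks=10, walk_length=40, seed=0):
    """Illustrative DeepWalk-style corpus: num_walks passes over the nodes,
    with one truncated uniform random walk starting at every node per pass."""
    rng = random.Random(seed)
    nodes = list(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)  # visit the start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:  # dead end (e.g. an isolated node): stop this walk
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


if __name__ == "__main__":
    # Hypothetical toy graph: the path 1-2-3 plus an isolated node 4.
    toy = {1: [2], 2: [1, 3], 3: [2], 4: []}
    walks = build_walks(toy, num_walks=10, walk_length=5)
    starts = Counter(walk[0] for walk in walks)
    print(len(walks), dict(sorted(starts.items())))  # 40 {1: 10, 2: 10, 3: 10, 4: 10}
```

Each node starts exactly `num_walks` walks whether or not the graph is connected. In this sketch, an isolated node simply yields single-node walks.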
from deepwalk.
Hi,
I saw in the code that the default length is 40? But that doesn't matter.
My graph is the original Citeseer dataset, downloaded from its website. The total number of nodes should be 3327, but the number of distinct nodes I collected from the random walks generated by your code is around 3250, which means some nodes are missing.
Is this because of the dataset itself? Maybe the Citeseer graph isn't fully connected? But the output embedding does have 3327 representations, while the generated walks don't cover all nodes, which confuses me.
Even if the input graph is not connected, there should be multiple walks starting from each node. Say we create a graph with 5 nodes and no edges, as follows, and store it in test.adjlist:
1
2
3
4
5
Run DeepWalk on this graph:
deepwalk --input test.adjlist --output test.embeddings --max-memory-data-size 0
We set --max-memory-data-size to 0 to force the walks to be dumped to disk.
If you read the walks file, each node still appears exactly 10 times, because a walk that starts at a node with no neighbors ends immediately at that node. Thus, I wonder if there is a problem with your code for collecting nodes from the random walks.
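A quick way to run the check described above is to count node occurrences across the dumped walk files. This sketch assumes each walk file holds one walk per line with whitespace-separated node IDs. The file names are passed on the command line because the exact names deepwalk writes are not shown here, and `count_walk_nodes` and `missing_nodes` are hypothetical helpers.

```python
import sys
from collections import Counter
from typing import Iterable, List


def count_walk_nodes(lines: Iterable[str]) -> Counter:
    """Count node occurrences, assuming one walk per line with whitespace-separated node IDs."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())  # blank lines contribute nothing
    return counts


def missing_nodes(expected_ids: Iterable, counts: Counter) -> List[str]:
    """Return the expected node IDs that never appear in any walk.

    IDs are compared as strings because the walk files store tokens as text."""
    return [str(n) for n in expected_ids if str(n) not in counts]


if __name__ == "__main__":
    # Hypothetical usage: python count_walks.py <walk file> [<walk file> ...]
    total: Counter = Counter()
    for path in sys.argv[1:]:
        with open(path) as handle:
            total.update(count_walk_nodes(handle))
    print(len(total), "distinct nodes in the walks")
```

Comparing IDs as strings avoids a common mismatch between integer node IDs in the graph and text tokens read back from the walks. For the five-node example above, the result described in this thread corresponds to a count of 10 for each of `"1"` through `"5"` and an empty `missing_nodes(range(1, 6), total)`.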
Thanks for your help. It seems that I didn't notice the --format parameter and loaded an edge-list file with the adjlist setting -.-
Dear all,
I am having a similar issue. I ran the algorithm on an 84x84 adjacency matrix, representing a directed graph of 84 nodes, using the command:
deepwalk --input graph.csv --output out.csv --undirected false --format adjlist
However, the output matrix is 23x64.
From the paper, I understand that the output should be |V| x d, i.e., an 84x64 matrix. Am I missing something?
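One way to confirm the |V| x d shape is to parse the output file and compare its row count with the number of nodes in the input graph. This sketch assumes the output uses the word2vec text format: a header line with the node count and the dimension d, followed by one line per node containing the node ID and d values. `read_embeddings` and `embedding_shape` are hypothetical helpers.

```python
from typing import Dict, Iterable, List, Tuple


def read_embeddings(lines: Iterable[str]) -> Tuple[int, int, Dict[str, List[float]]]:
    """Parse word2vec-style text output: a 'count dim' header, then 'node_id v1 ... vd' per line."""
    it = iter(lines)
    count, dim = (int(x) for x in next(it).split())
    vectors: Dict[str, List[float]] = {}
    for line in it:
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines
        node, values = parts[0], [float(v) for v in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"node {node}: expected {dim} values, got {len(values)}")
        vectors[node] = values
    return count, dim, vectors


def embedding_shape(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (rows, d); for a correctly read graph, rows should equal |V|."""
    count, dim, vectors = read_embeddings(lines)
    if count != len(vectors):
        raise ValueError(f"header says {count} rows but {len(vectors)} were found")
    return len(vectors), dim
```

Calling `embedding_shape` on the open output file would give (84, 64) for a correctly read 84-node graph. The output reported here corresponds to (23, 64), which points to a problem with how the input was parsed rather than with the embedding step.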
Thank you for your help.
Can you paste the content of graph.csv here? Thanks.
I copied the content into this pastebin
If you are reading in the data as an adjacency list (as specified by --format adjlist), then the first value in each row should be the source node, and the remaining values should be the nodes connected to that source node. It seems that your input file does not follow the adjacency-list format.
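Because the input in this thread is an 84x84 matrix, one way to get it into the adjacency-list format described above is to turn each matrix row into a line. Each line holds the row's node ID followed by the IDs of its nonzero columns. This is a minimal sketch with three assumptions: the values are comma-separated, node IDs are the 0-based row and column positions, and any nonzero entry means an edge from the row node to the column node. `matrix_to_adjlist` and `convert_file` are hypothetical helpers.

```python
import csv
from typing import Iterable, List, Sequence


def matrix_to_adjlist(rows: Iterable[Sequence[str]]) -> List[str]:
    """Turn a dense adjacency matrix (one row per node) into adjacency-list lines.

    Assumptions: row/column positions are the 0-based node IDs, and any nonzero
    entry at (i, j) is an edge from node i to node j."""
    # Drop completely blank rows, e.g. a trailing empty line in the CSV.
    rows = [row for row in rows if any(cell.strip() for cell in row)]
    lines = []
    for i, row in enumerate(rows):
        neighbors = [str(j) for j, cell in enumerate(row)
                     if cell.strip() and float(cell) != 0.0]
        # Every node gets a line, even with no neighbors, so isolated nodes are kept.
        lines.append(" ".join([str(i)] + neighbors))
    return lines


def convert_file(src_path: str, dst_path: str) -> None:
    """Read a comma-separated matrix and write a space-separated adjacency list."""
    with open(src_path, newline="") as src:
        lines = matrix_to_adjlist(csv.reader(src))
    with open(dst_path, "w") as dst:
        dst.write("\n".join(lines) + "\n")
```

For example, `convert_file("graph.csv", "graph.adjlist")` (hypothetical paths) would write one line per matrix row, so every node, including nodes without outgoing edges, has its own source entry when the file is read with `--format adjlist`.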