thudm / cogdl

CogDL: A Comprehensive Library for Graph Deep Learning (WWW 2023)

Home Page: https://cogdl.ai

License: MIT License

Languages: Python 93.24%, C++ 2.46%, Cuda 4.12%, Shell 0.05%, Makefile 0.02%, C 0.11%
Topics: graph-neural-networks, pytorch, graph-embedding, node-classification, graph-classification, link-prediction, leaderboard, gnn-model

cogdl's People

Contributors

1049451037, aviczhl2, cenyk1230, fishmingyu, huangtinglin, hwangyeong, icycookies, jasmine-yu, jkx19, july11, khtee, kinseys, kwyoke, li-ziang, lykeven, malin2223, qibinc, qingfei1, sahandfer, sengxian, shiguang-guo, sofyc, somefive, spkgyk, think2try, tiagomantunes, wzfhaha, xll2001, yaofeng1998, yaoxingcheng

cogdl's Issues

Problem with testing graph classification models

Hello,
I have been trying out different graph classification models on the available datasets. For some model and dataset combinations, training fails with 'RuntimeError: There were no tensor arguments to this function', either at the start or partway through training. In addition, I have not been able to reproduce the accuracy reported in the README for the given models. Is this a bug or a dependency problem? I am asking because I am also trying to get results for a graph classification model I have newly added. Thank you.

ModuleNotFoundError

Traceback (most recent call last):
File "train.py", line 14, in <module>
from cogdl import options
ModuleNotFoundError: No module named 'cogdl'

What is causing this error?
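
A minimal sketch for checking whether the package is visible to the interpreter that runs train.py. The helper name is_importable is hypothetical; importlib.util.find_spec is part of the Python standard library. If the check fails, one likely cause (an assumption, not confirmed in this thread) is that cogdl was never installed into the active environment.

```python
import importlib.util


def is_importable(package_name):
    """Return True if the active interpreter can locate the top-level package (hypothetical helper)."""
    return importlib.util.find_spec(package_name) is not None


# "cogdl" is the package named in the traceback above.
if not is_importable("cogdl"):
    print("cogdl is not visible to this interpreter; check the active environment and sys.path")
```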

Add GraphSAGE (sample) on Reddit

The current GraphSAGE implementation samples neighbors for every node in one batch. This only works for toy datasets (e.g., Cora), not for larger datasets (e.g., Reddit).

wrong negative sampling in LINE? and some other suggestions

In LINE's code, there might be a minor error in the negative sampling part (if I understand it correctly).
https://github.com/THUDM/cogdl/blob/a69a969020b8aa41cfcd8ac54511984bc5b32d62/cogdl/models/emb/line.py#L133-L137

If the index j for negative samples starts at 1, then the number of negative samples is self.negative-1. For example, with self.negative=5, index 0 is not a negative sample (it is skipped in the for loop), so only 1, 2, 3, 4 are negative samples drawn by the alias algorithm. I also checked the original implementation by Jian Tang, where the loop range for negative sampling is set to negative+1 (see below).
https://github.com/tangjianpku/LINE/blob/d5f840941e0f4026090d1b1feeaf15da38e2b24b/linux/line.cpp#L332-L348
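
The off-by-one described above can be illustrated by counting loop iterations. This is a sketch under the assumptions stated in the report (index 0 is reserved for the positive pair, and the loop in question starts at 1 and stops before self.negative); count_negatives is a hypothetical helper, not code from either repository.

```python
def count_negatives(start, stop):
    """Count negative samples drawn when j runs over range(start, stop).

    Index 0 is assumed to be the positive pair, so it is never counted.
    """
    return sum(1 for j in range(start, stop) if j != 0)


negative = 5

# Loop as described in the report: j starts at 1 and stops before `negative`.
reported = count_negatives(1, negative)

# Loop bound of negative+1, with index 0 used for the positive pair.
reference = count_negatives(0, negative + 1)

print(reported, reference)  # 4 5
```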

Some other suggestions:

  1. It seems that cora, citeseer, and pubmed are not supported in the current version. When I tried to run on these datasets, I got an error saying that the datasets are not supported. It would help to document how to run tasks on these datasets, or how to add new datasets myself (e.g., data formats, paths, naming conventions).
  2. It would help to have more detail on dataset statistics (e.g., labeled or not) and on which datasets are supported for which tasks.
  3. It would help to add docstrings or comments in the source code explaining the meaning of key variables (e.g., inputs, outputs).

cogdl on linux tesla K40c

Hi, my environment is Linux, Tesla K40c, PyTorch 1.4, CUDA 10.1, Python 3.7.
When I run python gcn.py, I get the following error:
Traceback (most recent call last):
File "gcn.py", line 29, in
task = build_task(args, dataset=dataset, model=model)
File "/home/hsc/Desktop/wyt/cogdl-master/cogdl/tasks/__init__.py", line 54, in build_task
return TASK_REGISTRY[args.task](args, dataset=dataset, model=model)
File "/home/hsc/Desktop/wyt/cogdl-master/cogdl/tasks/node_classification.py", line 35, in __init__
args.num_features = dataset.num_features
File "/home/hsc/Desktop/wyt/anaconda3/envs/pytorch_1.4/lib/python3.7/site-packages/torch_geometric/data/dataset.py", line 117, in num_features
return self.num_node_features
File "/home/hsc/Desktop/wyt/anaconda3/envs/pytorch_1.4/lib/python3.7/site-packages/torch_geometric/data/dataset.py", line 112, in num_node_features
return self[0].num_node_features
File "/home/hsc/Desktop/wyt/anaconda3/envs/pytorch_1.4/lib/python3.7/site-packages/torch_geometric/data/dataset.py", line 189, in __getitem__
data = data if self.transform is None else self.transform(data)
File "/home/hsc/Desktop/wyt/anaconda3/envs/pytorch_1.4/lib/python3.7/site-packages/torch_geometric/transforms/target_indegree.py", line 27, in __call__
deg = degree(col, data.num_nodes)
File "/home/hsc/Desktop/wyt/anaconda3/envs/pytorch_1.4/lib/python3.7/site-packages/torch_geometric/utils/degree.py", line 20, in degree
out = torch.zeros((num_nodes), dtype=dtype, device=index.device)
RuntimeError: CUDA error: no kernel image is available for execution on the device

The K40c has a compute capability of 3.5. Is it not supported by PyTorch 1.4?
Thanks!

Possible label leakage problem with the link prediction task

This happens for undirected networks like PPI. The link prediction task class

https://github.com/THUDM/cogdl/blob/4ed7838018400377dae9da30017399f56585208f/cogdl/tasks/link_prediction.py#L116

reads the adjacency matrix directly without removing duplicates. This means the edge list contains both (x, y) and (y, x) for every edge.

(x, y) and (y, x) refer to the same edge and produce the same cosine similarity for x and y in the later evaluation. However, the train-test split treats them as independent edges. As a result, a large portion of the generated test edges are also in the training set, just with the two endpoints reversed.

After fixing this label leakage, I only get about 0.8 ROC-AUC on PPI instead of the 0.9+ reported on your leaderboard.
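
One way to avoid this kind of leakage is to collapse each undirected edge to a single canonical pair before splitting. The sketch below is an illustration only, not the fix applied in CogDL; split_undirected_edges, test_ratio, and seed are hypothetical names, and self-loops are dropped as an extra assumption.

```python
import random


def split_undirected_edges(edges, test_ratio=0.2, seed=0):
    """Split undirected edges into train/test lists without reversed duplicates."""
    # Canonicalize each edge as (min, max) so (x, y) and (y, x) collapse into one entry.
    # Self-loops are dropped (assumption made for this sketch).
    unique = sorted({(min(u, v), max(u, v)) for u, v in edges if u != v})
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = int(len(unique) * test_ratio)
    return unique[n_test:], unique[:n_test]


# Toy edge list that stores every undirected edge in both directions.
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2), (0, 3), (3, 0), (0, 2), (2, 0)]
train_edges, test_edges = split_undirected_edges(edges, test_ratio=0.4)

# No test edge appears in the training set, in either orientation.
leaked = [(u, v) for u, v in test_edges if (u, v) in train_edges or (v, u) in train_edges]
```

If the model needs both directions at training time, the reversed pairs can be added back to the training list after the split.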

Possible bug in cogdl/tasks/node_classification.py

Line 93 of cogdl/tasks/node_classification.py may be wrong: args.missing_rate is checked, but a hard-coded missing_rate=0 is passed to preprocess_data_sgcpn, so it seems the argument is never used.

if args.missing_rate >= 0:
    if args.model == "sgcpn":
        assert args.dataset in ["cora", "citeseer", "pubmed"]
        dataset.data = preprocess_data_sgcpn(dataset.data, normalize_feature=True, missing_rate=0)
        adj_slice = torch.tensor(dataset.data.adj.size())
        adj_slice[0] = 0
        dataset.slices["adj"] = adj_slice

Optimize Exception Message

Loading CogDL reports "Ninja is required to load C++ extensions", which can be traced back to https://github.com/THUDM/cogdl/blob/94b8bc9bf8215f63eb1f0471457ca831817a9bf3/cogdl/operators/sample.py#L14

The following code has the same problem.
https://github.com/THUDM/cogdl/blob/94b8bc9bf8215f63eb1f0471457ca831817a9bf3/cogdl/operators/spmm.py#L26

It would be better to add an extra message to the print(e) call to identify where the "Ninja" message is raised. Since the first line is reached as soon as cogdl is imported, the bare message can confuse users who do not use this module.
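
A sketch of the kind of message this suggestion asks for: the handler names the extension that failed and keeps the original error text. load_optional_extension and the loader callable are hypothetical and do not reflect how cogdl/operators is actually written.

```python
import warnings


def load_optional_extension(name, loader):
    """Call `loader()` and return its result, or None with a descriptive warning on failure."""
    try:
        return loader()
    except Exception as e:
        # Say which extension failed and that it is optional, then include the original error.
        warnings.warn(
            f"Could not load optional C++ extension '{name}': {e}. "
            "Features that depend on it will be unavailable."
        )
        return None


def _broken_loader():
    # Stand-in for a JIT compilation step that fails because Ninja is missing.
    raise RuntimeError("Ninja is required to load C++ extensions")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    sampler = load_optional_extension("sample", _broken_loader)

message = str(caught[0].message)
```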

oagbert model Killed

🐛 Bug

I'm using oagbert for sentence embedding.

To Reproduce

I want to obtain the embeddings of a list of sentences (len(corpus)=124), so, following the example, I use this code:

tokenizer, bert_model = oagbert()
tokens = tokenizer(corpus, return_tensors="pt", padding=True)
batch_embeddings = bert_model(**tokens)
embeddings = batch_embeddings[1]

Expected behavior

The embeddings should be a tensor of torch.Size([124, 768]), but instead the bert_model(**tokens) call prints:

Killed
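
A bare Killed message often means the operating system stopped the process, commonly because it ran out of memory, though that is an assumption here and is not confirmed in the report. If memory is the cause, one thing to try is encoding the corpus in smaller batches. The sketch below shows only the batching step; batched is a hypothetical helper and the batch size of 16 is arbitrary.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements each."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


corpus = [f"sentence {i}" for i in range(124)]  # placeholder for the 124 sentences
batches = list(batched(corpus, batch_size=16))  # 16 is an arbitrary choice

# Each batch would then go through the calls from the report, e.g.
#   tokens = tokenizer(batch, return_tensors="pt", padding=True)
#   pooled = bert_model(**tokens)[1]
# and the per-batch results would be concatenated afterwards.
```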

"missing 1 required positional argument: num_nodes" when running graphsage model

Sorry to bother you, could you help me? When I run the command
python scripts/train.py -dt cora --model graphsage -t node_classification
I get this error:

Traceback (most recent call last):
  File "/home/xxx/anaconda3/envs/gpu/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/xxx/anaconda3/envs/gpu/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/xxx/cogdl/scripts/parallel_train.py", line 34, in main
    result = task.train()
  File "/home/xxx/cogdl/cogdl/tasks/node_classification.py", line 52, in train
    self._train_step()
  File "/home/xxx/cogdl/cogdl/tasks/node_classification.py", line 80, in _train_step
    self.model(self.data.x, self.data.edge_index)[self.data.train_mask],
  File "/home/xxx/anaconda3/envs/gpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xxx/cogdl/cogdl/models/nn/graphsage.py", line 86, in forward
    x = self.convs[i](x, edge_index_sp)
  File "/home/xxx/anaconda3/envs/gpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'num_nodes'

metapath2vec --schema

Hi,
When I choose the schema "0-1-0,0-1-2-1-0", not all nodes are included in the walks.
This raises the error:
KeyError: "word '6931' not in vocabulary"
How can I solve this problem? Thanks!

Issue about unsupervised_graph_classification

❓ Questions & Help

Hi, I just installed cogdl and tried to run a demo, but it seems that the models and datasets for unsupervised_graph_classification are missing, for example gin and infograph.
Is something wrong with my code?
My code is:
experiment(task="unsupervised_graph_classification", dataset="proteins", model="infograph")

Refactor Data, Dataset

Hi,

Our current implementations of data preprocessing and data loading are borrowed from pyg. This part needs refactoring before release.

About the performance

Hi, thanks for the great work!
I simplified this project, removed the torch dependency, and plan to reimplement it in tf.
I have finished the algorithms in the emb folder as pure Python implementations, but my test results are poor. I found that the default parameters in the code differ from those in the README. Could you please clarify which parameters produced the results displayed on your website?
Thanks a lot.

random.seed

When I run display_data.py, I get this error:

Traceback (most recent call last):
  File "display_data.py", line 75, in <module>
    random.seed(args.seed)
  File "/Users/wangzhikai/.conda/envs/cogdl/lib/python3.7/random.py", line 126, in seed
    super().seed(a)
TypeError: unhashable type: 'list'
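
The traceback suggests that args.seed holds a list rather than a single value, which would happen if the argument parser collects one or more seeds (an assumption, since the parser is not shown here). A minimal sketch of the failure and of seeding once per value:

```python
import random

seeds = [1]  # assumed shape of args.seed when the parser collects values into a list

# Passing the whole list is rejected with a TypeError.
try:
    random.seed(seeds)
    list_seed_accepted = True
except TypeError:
    list_seed_accepted = False

# Seeding with one integer at a time works.
draws = {}
for seed in seeds:
    random.seed(seed)
    draws[seed] = random.random()
```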

Some Advice

1. Add docs for preparing data

Like what Euler does: Preparing-Data

2. Support exporting node embeddings

3. Describe the supported graph structures

  • Are bipartite graphs supported?
  • Supervised or unsupervised?

unsupervised_node_classification evaluation

Hi. When evaluating node classification performance, why do LINE, NetMF, and ProNE give the same result every time? For example, with NetMF on the Wikipedia dataset, the result is always:
| ('wikipedia', 'netmf') | 0.4373±0.0000 | 0.4747±0.0000 | 0.4883±0.0000 | 0.4953±0.0000 | 0.5022±0.0000 |

Looking forward to your reply, Thanks!

python scripts/train.py --task node_classification --dataset cora --model gcn

python scripts/train.py --task node_classification --dataset cora --model gcn

Using backend: pytorch
Namespace(cpu=False, dataset=['cora'], device_id=[0], dropout=0.5, enhance=None, hidden_size=64, lr=0.01, max_epoch=500, model=['gcn'], num_classes=None, num_features=None, patience=100, save_dir='.', seed=[1], task='node_classification', weight_decay=0.0005)
Traceback (most recent call last):
File "scripts/train.py", line 101, in <module>
results = [main(args) for args in variant_args_generator(args, variants)]
File "scripts/train.py", line 101, in <listcomp>
results = [main(args) for args in variant_args_generator(args, variants)]
File "scripts/train.py", line 29, in main
task = build_task(args)
File "/Users/happywei/Desktop/code/cogdl-master/cogdl/tasks/__init__.py", line 49, in build_task
return TASK_REGISTRY[args.task](args)
File "/Users/happywei/Desktop/code/cogdl-master/cogdl/tasks/node_classification.py", line 66, in __init__
self.data.apply(lambda x: x.to(self.device))
File "/Users/happywei/opt/anaconda3/lib/python3.7/site-packages/torch_geometric/data/data.py", line 308, in apply
self[key] = self.__apply__(item, func)
File "/Users/happywei/opt/anaconda3/lib/python3.7/site-packages/torch_geometric/data/data.py", line 287, in __apply__
return func(item)
File "/Users/happywei/Desktop/code/cogdl-master/cogdl/tasks/node_classification.py", line 66, in <lambda>
self.data.apply(lambda x: x.to(self.device))
File "/Users/happywei/opt/anaconda3/lib/python3.7/site-packages/torch/cuda/__init__.py", line 186, in _lazy_init
_check_driver()
File "/Users/happywei/opt/anaconda3/lib/python3.7/site-packages/torch/cuda/__init__.py", line 61, in _check_driver
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

My environment is based on the CPU build of PyTorch 1.6. Can it only run with CUDA configured?
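
The Namespace printed above shows cpu=False and device_id=[0], so the task appears to move data to a CUDA device unless told otherwise. A fallback along the following lines would avoid the assertion on CPU-only builds. pick_device is a hypothetical helper, not CogDL code; in practice the cuda_available value would come from torch.cuda.is_available().

```python
def pick_device(cpu_flag, device_ids, cuda_available):
    """Return a device string, falling back to CPU when CUDA cannot be used."""
    if cpu_flag or not cuda_available:
        return "cpu"
    return f"cuda:{device_ids[0]}"


# Values taken from the Namespace above, on a machine without CUDA.
device_without_cuda = pick_device(cpu_flag=False, device_ids=[0], cuda_available=False)

# The same arguments on a machine where CUDA is available.
device_with_cuda = pick_device(cpu_flag=False, device_ids=[0], cuda_available=True)
```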

python scripts/train.py --task unsupervised_node_classification --dataset wikipedia --model gcn gat deepwalk node2vec hope netmf netsmf prone

python scripts/train.py --task unsupervised_node_classification --dataset wikipedia --model gcn gat deepwalk node2vec hope netmf netsmf prone
I ran the command above and got the following error. The environment is fully set up: macOS + torch 1.4.0, no CUDA, running locally on a laptop.
Traceback (most recent call last):
File "scripts/train.py", line 75, in <module>
results = [main(args) for args in variant_args_generator(args, variants)]
File "scripts/train.py", line 75, in <listcomp>
results = [main(args) for args in variant_args_generator(args, variants)]
File "scripts/train.py", line 26, in main
task = build_task(args)
File "/Users/XXX/cogdl-master/cogdl/tasks/__init__.py", line 48, in build_task
return TASK_REGISTRY[args.task](args)
File "/Users/XXX/cogdl-master/cogdl/tasks/unsupervised_node_classification.py", line 56, in __init__
self.model = build_model(args)
File "/Users/XXX/cogdl-master/cogdl/models/__init__.py", line 108, in build_model
return MODEL_REGISTRY[args.model].build_model_from_args(args)
File "/Users/XXX/cogdl-master/cogdl/models/nn/gcn.py", line 71, in build_model_from_args
return cls(args.num_features, args.hidden_size, args.num_classes, args.dropout)
File "/Users/XXX/cogdl-master/cogdl/models/nn/gcn.py", line 76, in __init__
self.gc1 = GraphConvolution(nfeat, nhid)
File "/Users/zengyujian/cogdl-master/cogdl/models/nn/gcn.py", line 20, in __init__
self.weight = Parameter(torch.FloatTensor(in_features, out_features))
TypeError: new() received an invalid combination of arguments - got (NoneType, int), but expected one of:

  • (torch.device device)
  • (torch.Storage storage)
  • (Tensor other)
  • (tuple of ints size, torch.device device)
    didn't match because some of the arguments have invalid types: (NoneType, int)
  • (object data, torch.device device)
    didn't match because some of the arguments have invalid types: (NoneType, int)

How can I get the model outputs instead of the accuracy?

❓ Questions & Help

Hi,
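CogDL is a great package. How can I get the node classification model's prediction for each node? For example, when running semi-supervised node classification with GAT on Cora, I would like to obtain the prediction matrix of size (2708, 7) rather than just the test accuracy.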

Split package requirements

Not all packages in cogdl's setup.py are actually needed by end users, and setuptools supports splitting requirements into install_requires, setup_requires, and tests_require.

It would be great to move packages like pytest and sphinx out of install_requires: the former belongs in tests_require, and the latter should be removed because the doc folder has its own requirements.txt.
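
A sketch of what such a split could look like. The package lists and the package name are hypothetical placeholders, not cogdl's actual dependencies. The sketch uses extras_require for the test and docs groups, which is a swapped-in alternative to the tests_require keyword named above.

```python
# Hypothetical package lists for illustration only; not cogdl's actual dependencies.
install_requires = ["numpy", "networkx"]  # needed at runtime by end users
extras_require = {
    "test": ["pytest"],  # only needed to run the test suite
    "docs": ["sphinx"],  # only needed to build the documentation
}

setup_kwargs = dict(
    name="example-package",  # placeholder name
    install_requires=install_requires,
    extras_require=extras_require,
)
# In a real setup.py these would be passed on as: setuptools.setup(**setup_kwargs)
```

With this layout, a plain install pulls in only the runtime packages, while developers can request the extra group, for example with pip install "example-package[test]".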

how to run unsupervised_node_classification task on graphsage model

Dear author, when I tried to run the unsupervised_node_classification task with the graphsage model, I got the following error:

Traceback (most recent call last):
File "train.py", line 76, in <module>
results = pool.map(main, variant_args_generator())
File "/home/mli/.localpython/lib/python3.6/multiprocessing/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/mli/.localpython/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
TypeError: new() received an invalid combination of arguments - got (NoneType, int), but expected one of:

  • (torch.device device)
  • (torch.Storage storage)
  • (Tensor other)
  • (tuple of ints size, torch.device device)
    didn't match because some of the arguments have invalid types: (NoneType, int)
  • (object data, torch.device device)
    didn't match because some of the arguments have invalid types: (NoneType, int)

Computer configuration

What are the hardware requirements for this library, and which PyTorch versions does it support?

Advice: provide a script to download all datasets for offline usage

Currently, CogDL downloads missing datasets on the fly. However, some servers run in environments isolated from the Internet, and CogDL is inconvenient to use there.

A script that downloads all the needed datasets into a local directory would be very helpful. Users could then upload that directory to the remote server once.

Thanks for considering the advice.

documents

@neozhangthe1
Hi, I'm very curious about your project. I would like to ask how it differs from PyTorch Geometric.
I also found that the documentation is incomplete and lacks an introduction on how to use the library. I hope the documentation will be updated.
Thank you!

Running on new dataset

Hi,
I am wondering if this is limited to the datasets you have made available. Is there any documentation for how to format a new graph dataset to test?

I apologize as this is likely not a code base issue, but the slack invite link is broken.

Thanks,
Kayla

Issues loading TU dataset

❓ Questions & Help

@THINK2TRY

You were the last person to make significant edits to the TU dataloader. I am getting an error when loading what I believe to be correctly formatted data. I've been trying to debug for a while now. Any idea what is happening? I've attached the error along with my dataset. No worries if you don't have time to investigate, figured I'd ask in case I am missing something straightforward :)

tu
tu-format-gh.zip

How to set up cogdl on a cluster!!

Hi,
I am trying to set up cogdl in a virtual environment on a cluster. Can you provide setup instructions for this?
Thanks,
Ajay Madhavan
