rucaibox / recbole-gnn

Efficient and extensible GNNs enhanced recommender library based on RecBole.

License: MIT License

Python 99.96% Shell 0.04%
collaborative-filtering graph-neural-networks recommender-system

recbole-gnn's Issues

New dataset not being detected. Where to save the atomic files?

Describe the bug

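I followed the clear instructions here to add a new dataset to RecBole. I created the atomic files and then, as instructed, created the dataset and dataloader.

But when I run on the new dataset, I get the error: ValueError: Neither [dataset/data34452] exists in the device nor [data34452] a known dataset name

However, the dataset `data34452` definitely exists. Where should I save the atomic files so that RecBole can run on my data? I have already tried moving the dataset from my personal directory to RecBole's Python package directory, without success.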

To Reproduce
YAML file:


data_path: /data/nicholas/abc/RecBole-GNN-main/recbole_gnn/data/
dataset: data34452

USER_ID_FIELD: user_id
ITEM_ID_FIELD: item_id
RATING_FIELD: rating
TIME_FIELD: timestamp

load_col:
    inter: [user_id, item_id, rating, timestamp]
    user: [user_id]
    item: [item_id, category_id, category_level]


eval_args:
    split: {'RS': [8,1,1]}
    group_by: user
    order: RO
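  1. your code
    Code for creating the new dataset:
# Imports were omitted in the original post; these are the RecBole-GNN
# entry points (recbole.config / recbole.data are the plain RecBole equivalents).
from recbole_gnn.config import Config
from recbole_gnn.utils import create_dataset, data_preparation

modelname = 'LightGCN'
dataset_name = 'data34452'
yaml_path = '/data/nicholas/abc/RecBole-GNN-main/recbole_gnn/data/data34452/data34452.yaml'

if __name__ == '__main__':
    # Build the config from the model name, dataset name and YAML file.
    config = Config(model=modelname, dataset=dataset_name, config_file_list=[yaml_path])
    # Load the atomic files, then split them into train/valid/test loaders.
    dataset = create_dataset(config)
    train_data, valid_data, test_data = data_preparation(config, dataset)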

  1. script for running

I simply use the `run_recbole_gnn.py` file to run RecBole on the new dataset, with the command python run_recbole_gnn.py -m LightGCN -d data34452, but I get the error: ValueError: Neither [dataset/data34452] exists in the device nor [data34452] a known dataset name.

I am using a Linux machine, PyTorch 2.0, Python 3.11 and RecBole 1.1.1.

Thank you for helping me solve this problem.
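
One thing worth checking, offered as a sketch rather than a confirmed diagnosis: the error message mentions dataset/data34452 rather than the data_path set in the YAML file, and the command shown does not pass that YAML file (for example through --config_files), so the custom data_path may never have been read. The helper below assumes that atomic files are expected at <data_path>/<dataset>/<dataset>.<suffix>. The function name find_missing_atomic_files and the suffix list are illustrative and not part of RecBole.

```python
import os
import tempfile


def find_missing_atomic_files(data_path, dataset, suffixes=("inter",)):
    """Return the expected atomic-file paths that do not exist.

    Assumes the layout <data_path>/<dataset>/<dataset>.<suffix>.
    """
    dataset_dir = os.path.join(data_path, dataset)
    expected = [os.path.join(dataset_dir, f"{dataset}.{s}") for s in suffixes]
    return [p for p in expected if not os.path.isfile(p)]


if __name__ == "__main__":
    # Demonstrate on a throwaway directory instead of a real dataset.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "data34452"))
        open(os.path.join(root, "data34452", "data34452.inter"), "w").close()
        missing = find_missing_atomic_files(
            root, "data34452", ("inter", "user", "item")
        )
        for path in missing:
            print("missing:", os.path.relpath(path, root))
```

Under that assumed layout, a data_path of /data/nicholas/abc/RecBole-GNN-main/recbole_gnn/data/ and a dataset of data34452 would require the files to sit in a data34452 subdirectory of that path.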

[🐛BUG] SGL emb_loss should not be averaged.

In RecBole-GNN SGL, the emb_loss has been averaged:

https://github.com/RUCAIBox/RecBole/blob/169412a91f5e504e620d6f708a62f5a7da42580a/recbole/model/loss.py#LL80C23-L80C23

emb_loss /= embeddings[-1].shape[0]

In the paper authors' code, the emb_loss is not averaged:
https://github.com/wujcan/SGL-Torch/blob/c56328817daa6ebd7669cb9c0429a1f0f3dad5b9/util/pytorch/loss.py#L103

for w in weights:
    loss += torch.sum(torch.pow(w, 2))

This does not affect the model performance, though.
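
To make the difference concrete, here is a small sketch in plain Python (no PyTorch) of the two regularizers. The function names and toy numbers are illustrative; only the sum of squares and the division by the first dimension of the last embedding come from the snippets above. Averaging rescales the term by a constant 1/batch_size, so it behaves like a different effective regularization weight.

```python
def summed_l2(weights):
    """Sum of squared entries over all weight matrices (no averaging)."""
    return sum(v * v for w in weights for row in w for v in row)


def averaged_l2(weights):
    """Same sum, divided by the number of rows of the last matrix."""
    return summed_l2(weights) / len(weights[-1])


if __name__ == "__main__":
    # Two toy "embedding" matrices with a batch of 2 rows each.
    user_e = [[1.0, 2.0], [0.0, 1.0]]
    item_e = [[2.0, 0.0], [1.0, 1.0]]
    batch = [user_e, item_e]
    print(summed_l2(batch))    # 12.0
    print(averaged_l2(batch))  # 6.0
```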

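[🐛BUG] Can DiffNet support loading user and item review embeddings?

Describe the bug
As the title says: the original code can load the authors' pretrained user_vector.npy and item_vector.npy, but in RecBole I cannot seem to load these two files. I tried setting the pretrained_review parameter to true, but that raises this error: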
Traceback (most recent call last):
File "run_recbole_gnn.py", line 15, in
run_recbole_gnn(model=args.model, dataset=args.dataset, config_file_list=config_file_list)
File "C:\Users\Administrator\Desktop\传输\social\RecBole-GNN-main\recbole_gnn\quick_start.py", line 37, in run_recbole_gnn
model = get_model(config['model'])(config, train_data.dataset).to(config['device'])
File "C:\Users\Administrator\Desktop\传输\social\RecBole-GNN-main\recbole_gnn\model\social_recommender\diffnet.py", line 68, in __init__
self.user_review_embedding.weight.data.copy_(self.convertDistribution(dataset.user_feat['user_review_emb']))
TypeError: 'NoneType' object is not subscriptable

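The code contains no part that reads the npy files. I tried modifying the RecBole code directly, but it reports a dimension mismatch. Original:
num_users = 17237
num_items = 38342
The inter file provided in the social dataset:
num_users = 17236
num_items = 37379
Could you help solve this problem? Thank you very much!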

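The dataset parameters entity_kg_num_interval and relation_kg_num_interval have no effect, however I adjust them

I am using RecBole's KGAT model with the ml-1m dataset. While filtering the data by editing the .yaml file, I found that setting the two parameters entity_kg_num_interval and relation_kg_num_interval to [0,inf) and [50,inf) had no effect on either the number of entities or the number of relations, so I would like to ask:
1. What exactly do these two parameters control?
2. If these two parameters cannot change the number of entities and relations, is there any way to filter the data other than k-core filtering?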

[🐛BUG] RecBole-GNN error

Dear authors,
We downloaded RecBole-GNN to use its GNN models, but we ran into some strange problems. At first we thought the problem was on our side, but after nearly two months of searching we still cannot pin it down, so we hope to get your help.
The results of SGL and SimGCL are consistent with random prediction. After investigating, we found that the gradients vanish, whereas BPR and LightGCN do not show this behavior. We also tried to implement BUIR and SSL4Rec on top of RecBole, and they hit the same vanishing gradients. We have been unable to find the cause of the error.
In addition, do the authors plan to implement any of the following models in the near future: SSL4Rec, BUIR, IMP-GCN, LightGCL, UltraGCN, LR-GCCF, DGCF, XSimGCL, MixGCF, GMCF, etc.?
All the best!

[Screenshots: sgl-ml-100k, simgcl-ml-100k, LightGCN-ml-100k]

Version information:
Desktop (please complete the following information):

  • OS:Linux
  • RecBole: 1.1.1
  • Python: 3.7.3
  • PyTorch: 1.7.1
  • RTX: 1080 12GB
  • cuda: 10.2

Question: batch of training

Hello, I would like to ask a question about how training samples are generated.

If we have users = [0, 1] and pos_items = {0: [1, 2], 1: [3, 4]}, what data will RecBole generate if num_neg is 1 and the sampling method is uniform?

To my understanding, we get four training samples of the form (user, pos_item, neg_item), such as (0, 1, 3), (0, 2, 4), (1, 3, 1), (1, 4, 2), so all rating data is used. Do I understand correctly?

But in the LightGCN, NGCF and SGL code, I find that they all randomly select users first and then generate training samples for them. For example, if the number of ratings is 100000, LightGCN randomly draws 100000 users and then samples a positive and a negative item for each. RecBole, in contrast, uses the 100000 ratings directly as the (user, positive item) pairs.

I am a little confused by the difference. Which one is more commonly used, and which is more reasonable?

Thanks in advance!
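
The two strategies described in this question can be sketched side by side in plain Python. This is an illustration under the question's own description, not code from RecBole or from the original LightGCN repository. The function names are hypothetical and items are assumed to be numbered 0 to num_items - 1.

```python
import random


def sample_negative(rng, positives, num_items):
    """Draw one item uniformly at random that is not in `positives`."""
    while True:
        item = rng.randrange(num_items)
        if item not in positives:
            return item


def interaction_wise(pos_items, num_items, rng):
    """One (user, pos, neg) triple per observed interaction."""
    return [
        (u, p, sample_negative(rng, set(items), num_items))
        for u, items in pos_items.items()
        for p in items
    ]


def user_wise(pos_items, num_items, num_samples, rng):
    """Draw users uniformly, then one positive and one negative for each."""
    users = list(pos_items)
    triples = []
    for _ in range(num_samples):
        u = rng.choice(users)
        p = rng.choice(pos_items[u])
        n = sample_negative(rng, set(pos_items[u]), num_items)
        triples.append((u, p, n))
    return triples


if __name__ == "__main__":
    rng = random.Random(0)
    pos_items = {0: [1, 2], 1: [3, 4]}
    print(interaction_wise(pos_items, num_items=5, rng=rng))
    print(user_wise(pos_items, num_items=5, num_samples=4, rng=rng))
```

In the first variant every observed interaction appears exactly once per pass, so active users contribute more samples. In the second, users are drawn with replacement, so every user is equally likely regardless of activity and some interactions may not be seen in a given pass.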

[🐛BUG] Hello, the TAGNN sequential recommendation model raises an error on macOS

Traceback (most recent call last):
File "/Users/jason_wang/Desktop/RecBole-GNN-main/run_recbole_gnn.py", line 15, in
run_recbole_gnn(model=args.model, dataset=args.dataset, config_file_list=config_file_list)
File "/Users/jason_wang/Desktop/RecBole-GNN-main/recbole_gnn/quick_start.py", line 33, in run_recbole_gnn
train_data, valid_data, test_data = data_preparation(config, dataset)
File "/Users/jason_wang/Desktop/RecBole-GNN-main/recbole_gnn/utils.py", line 115, in data_preparation
dataloaders = load_split_dataloaders(config)
File "/Users/jason_wang/Desktop/RecBole-master/recbole/data/utils.py", line 78, in load_split_dataloaders
with open(saved_dataloaders_file, 'rb') as f:
TypeError: expected str, bytes or os.PathLike object, not Config
(RecBole-master1) jason_wang@Jason-WangdeMBP RecBole-GNN-main %

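Could you also tell me whether other datasets are available for sequential recommendation? For research purposes, I would like to know whether the other two datasets, besides diginetica-not-merged, are usable. Thank you!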

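Error when running SRGNN

Dear maintainers, I get an error when running SRGNN. My guess is that the trainer and interaction used in the main function come from recbole rather than from the recbole_gnn framework, but I do not know how to fix this. I would be grateful for your help and look forward to your reply. Thank you very much!

main function: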
from recbole_gnn.config import Config
from recbole_gnn.utils import create_dataset, data_preparation
from recbole.utils import init_logger, init_seed
from recbole_gnn.utils import set_color, get_trainer
from logging import getLogger

from test import SRGNN

if __name__ == '__main__':
    # configurations initialization
    config = Config(
        model=SRGNN,
        dataset='diginetica',
        config_file_list=['config.yaml', 'config_model.yaml'],
    )
    init_seed(config['seed'], config['reproducibility'])

    # logger initialization
    init_logger(config)
    logger = getLogger()

    logger.info(config)

    # dataset filtering
    dataset = create_dataset(config)
    logger.info(dataset)

    # dataset splitting
    train_data, valid_data, test_data = data_preparation(config, dataset)

    model = SRGNN(config, train_data.dataset).to(config['device'])

    logger.info(model)

    # trainer loading and initialization
    trainer = get_trainer(config['MODEL_TYPE'], config['model'])(config, model)

    # model training
    best_valid_score, best_valid_result = trainer.fit(
        train_data, valid_data, saved=True, show_progress=config['show_progress']
    )

    # model evaluation
    test_result = trainer.evaluate(test_data, load_best_model=True, show_progress=config['show_progress'])

    logger.info(set_color('best valid result:', 'yellow') + f': {best_valid_result}')
    logger.info(set_color('test result:', 'yellow') + f': {test_result}')

config.yaml and config_model.yaml both use the parameters provided with the framework.

Output:
General Hyper Parameters:
gpu_id = 0
use_gpu = True
seed = 2020
state = INFO
reproducibility = True
data_path = dataset/diginetica
checkpoint_dir = saved
show_progress = True
save_dataset = False
dataset_save_path = None
save_dataloaders = False
dataloaders_save_path = None
log_wandb = False

Training Hyper Parameters:
epochs = 500
train_batch_size = 4096
learner = adam
learning_rate = 0.001
neg_sampling = None
eval_step = 1
stopping_step = 10
clip_grad_norm = None
weight_decay = 0.0
loss_decimal_place = 4

Evaluation Hyper Parameters:
eval_args = {'split': {'LS': 'valid_and_test'}, 'mode': 'full', 'order': 'TO', 'group_by': 'user'}
repeatable = True
metrics = ['MRR', 'Precision']
topk = [10, 20]
valid_metric = MRR@10
valid_metric_bigger = True
eval_batch_size = 2000
metric_decimal_place = 5

Dataset Hyper Parameters:
field_separator =
seq_separator =
USER_ID_FIELD = session_id
ITEM_ID_FIELD = item_id
RATING_FIELD = rating
TIME_FIELD = timestamp
seq_len = None
LABEL_FIELD = label
threshold = None
NEG_PREFIX = neg_
load_col = {'inter': ['session_id', 'item_id', 'timestamp']}
unload_col = None
unused_col = None
additional_feat_suffix = None
rm_dup_inter = None
val_interval = None
filter_inter_by_user_or_item = True
user_inter_num_interval = [5,inf)
item_inter_num_interval = [5,inf)
alias_of_user_id = None
alias_of_item_id = None
alias_of_entity_id = None
alias_of_relation_id = None
preload_weight = None
normalize_field = None
normalize_all = None
ITEM_LIST_LENGTH_FIELD = item_length
LIST_SUFFIX = _list
MAX_ITEM_LIST_LENGTH = 20
POSITION_FIELD = position_id
HEAD_ENTITY_ID_FIELD = head_id
TAIL_ENTITY_ID_FIELD = tail_id
RELATION_ID_FIELD = relation_id
ENTITY_ID_FIELD = entity_id
benchmark_filename = None

Other Hyper Parameters:
wandb_project = recbole
require_pow = False
embedding_size = 64
step = 1
loss_type = CE
MODEL_TYPE = ModelType.SEQUENTIAL
gnn_transform = sess_graph
train_neg_sample_args = {'strategy': 'none'}
MODEL_INPUT_TYPE = InputType.POINTWISE
eval_type = EvaluatorType.RANKING
device = cpu
eval_neg_sample_args = {'strategy': 'full', 'distribution': 'uniform'}

06 Mar 13:17 INFO diginetica
The number of users: 72014
Average actions of users: 8.060905669809618
The number of items: 29454
Average actions of items: 19.70902794282416
The number of inters: 580490
The sparsity of the dataset: 99.97263260088765%
Remain Fields: ['session_id', 'item_id', 'timestamp']
06 Mar 13:17 INFO Constructing session graphs.
100%|██████████| 364451/364451 [00:33<00:00, 11034.37it/s]
06 Mar 13:18 INFO Constructing session graphs.
100%|██████████| 72013/72013 [00:07<00:00, 9464.61it/s]
06 Mar 13:18 INFO Constructing session graphs.
100%|██████████| 72013/72013 [00:07<00:00, 9047.17it/s]
06 Mar 13:18 INFO SessionGraph Transform in DataLoader.
06 Mar 13:18 INFO SessionGraph Transform in DataLoader.
06 Mar 13:18 INFO SessionGraph Transform in DataLoader.
06 Mar 13:18 INFO [Training]: train_batch_size = [4096] negative sampling: [{'strategy': 'none'}]
06 Mar 13:18 INFO [Evaluation]: eval_batch_size = [2000] eval_args: [{'split': {'LS': 'valid_and_test'}, 'mode': 'full', 'order': 'TO', 'group_by': 'user'}]
06 Mar 13:18 INFO SRGNN(
(item_embedding): Embedding(29454, 64, padding_idx=0)
(gnncell): SRGNNCell(
(incomming_conv): SRGNNConv()
(outcomming_conv): SRGNNConv()
(lin_ih): Linear(in_features=128, out_features=192, bias=True)
(lin_hh): Linear(in_features=64, out_features=192, bias=True)
)
(linear_one): Linear(in_features=64, out_features=64, bias=True)
(linear_two): Linear(in_features=64, out_features=64, bias=True)
(linear_three): Linear(in_features=64, out_features=1, bias=False)
(linear_transform): Linear(in_features=128, out_features=64, bias=True)
(loss_fct): CrossEntropyLoss()
)
Trainable parameters: 1947264
Train 0: 0%| | 0/89 [00:00<?, ?it/s]
Traceback (most recent call last):
File "E:/ADACONDA/envs/pytorch/pythonproject_test/Next Work/RecBole-GNN-main/main.py", line 41, in
best_valid_score, best_valid_result = trainer.fit(
File "E:\ADACONDA\envs\pytorch\lib\site-packages\recbole\trainer\trainer.py", line 335, in fit
train_loss = self._train_epoch(train_data, epoch_idx, show_progress=show_progress)
File "E:\ADACONDA\envs\pytorch\lib\site-packages\recbole\trainer\trainer.py", line 181, in _train_epoch
losses = loss_func(interaction)
File "E:\ADACONDA\envs\pytorch\pythonproject_test\Next Work\RecBole-GNN-main\test.py", line 105, in calculate_loss
x = interaction['x']
File "E:\ADACONDA\envs\pytorch\lib\site-packages\recbole\data\interaction.py", line 131, in __getitem__
return self.interaction[index]
KeyError: 'x'

KeyError when running hyperparameter search

ERROR:hyperopt.fmin:job exception: 'model'

0%| | 0/12 [1:01:04<?, ?trial/s, best loss=?]
Traceback (most recent call last):
File "run_hyper.py", line 26, in
main()
File "run_hyper.py", line 18, in main
hp.run()
File "/opt/conda/envs/siton_env/lib/python3.8/site-packages/recbole/trainer/hyper_tuning.py", line 411, in run
fmin(
File "/opt/conda/envs/siton_env/lib/python3.8/site-packages/hyperopt/fmin.py", line 553, in fmin
rval.exhaust()
File "/opt/conda/envs/siton_env/lib/python3.8/site-packages/hyperopt/fmin.py", line 356, in exhaust
self.run(self.max_evals - n_done, block_until_done=self.asynchronous)
File "/opt/conda/envs/siton_env/lib/python3.8/site-packages/hyperopt/fmin.py", line 292, in run
self.serial_evaluate()
File "/opt/conda/envs/siton_env/lib/python3.8/site-packages/hyperopt/fmin.py", line 170, in serial_evaluate
result = self.domain.evaluate(spec, ctrl)
File "/opt/conda/envs/siton_env/lib/python3.8/site-packages/hyperopt/base.py", line 907, in evaluate
rval = self.fn(pyll_rval)
File "/opt/conda/envs/siton_env/lib/python3.8/site-packages/recbole/trainer/hyper_tuning.py", line 349, in trial
result_dict["model"],
KeyError: 'model'

As the title says; the error is shown above. The model I ran is HMLET, and the command was:

python run_hyper.py --model='HMLET' --dataset='ml-1m' --config_files='ml-1m.yaml' --params_file=Hmlet.hyper

`model_name` [xxx] is not the name of an existing model.

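I want to use a trained model for prediction, but the error says the model does not exist.

The code is below, copied from recbole/run_example/case_study_example.py; the model is one I trained myself.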

import torch
from recbole.utils.case_study import full_sort_topk, full_sort_scores
from recbole.quick_start import load_data_and_model

config, model, dataset, train_data, valid_data, test_data = load_data_and_model(
    model_file="saved/XSimGCL-Jan-21-2024_02-59-55.pth",
)  # Here you can replace it by your model path.

# uid_series = np.array([1, 2])  # internal user id series
# or you can use dataset.token2id to transfer external user token to internal user id
uid_series = dataset.token2id(dataset.uid_field, ["1"])

topk_score, topk_iid_list = full_sort_topk(
    uid_series, model, test_data, k=10, device=config["device"]
)
print(topk_score)  # scores of top 10 items
print('top@10_item_ids',topk_iid_list)  # internal id of top 10 items
external_item_list = dataset.id2token(dataset.iid_field, topk_iid_list.cpu())
print(external_item_list)  # external tokens of top 10 items
print()

score = full_sort_scores(uid_series, model, test_data, device=config["device"])
print(score)  # score of all items
print(
    score[0, dataset.token2id(dataset.iid_field, ["242", "302"])]
)  # score of item ['242', '302'] for user '1'.

The error is:

ValueError: `model_name` [XSimGCL] is not the name of an existing model.

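Environment:

  • OS: Linux
  • RecBole: 1.2.0
  • Python: 3.9
  • PyTorch: 2.1
  • cuda: 12.1

There are two issues about this in the recbole repository, issue1 and issue2, and the reply to both was that the version was not the latest.
The version I installed with the command conda install -c aibox recbole is:
recbole 1.2.0 py39_0 aibox
Is the cause of this error also "the code is not the latest version"?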

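Question about custom models in the RecBole-GNN framework

Hello, when I run a custom model under the RecBole-GNN framework, I get the error "interaction has no key x".
The dataset and config files are those provided with the CORE paper. The myGCEGNN model in the model module is the original GCEGNN model; I only copied the code out into a file named model.
After some debugging, I found that the main file never calls recbole_gnn\data\dataset.py, which is the file that processes the interaction.

=== main file code ===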
from logging import getLogger
from recbole.utils import init_logger, init_seed, set_color
from recbole_gnn.config import Config
from recbole_gnn.utils import create_dataset, data_preparation, get_model, get_trainer
from model import myGCEGNN

if __name__ == '__main__':
    # configurations initialization
    config = Config(
        model=myGCEGNN,
        dataset='diginetica',
        config_file_list=['config.yaml', 'config_model.yaml'],
    )
    init_seed(config['seed'], config['reproducibility'])
    # logger initialization
    init_logger(config)
    logger = getLogger()
    logger.info(config)
    # dataset filtering
    dataset = create_dataset(config)
    logger.info(dataset)
    # dataset splitting
    train_data, valid_data, test_data = data_preparation(config, dataset)
    # model loading and initialization
    model = myGCEGNN(config, train_data.dataset).to(config['device'])
    logger.info(model)
    # trainer loading and initialization
    trainer = get_trainer(config['MODEL_TYPE'], config['model'])(config, model)
    # model training
    best_valid_score, best_valid_result = trainer.fit(
        train_data, valid_data, saved=True, show_progress=config['show_progress']
    )
    # model evaluation
    test_result = trainer.evaluate(test_data, load_best_model=True, show_progress=config['show_progress'])
    logger.info(set_color('best valid ', 'yellow') + f': {best_valid_result}')
    logger.info(set_color('test result', 'yellow') + f': {test_result}')

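Problem reproducing results with the best hyperparameters

Hello!
Below are the hyperparameters I used to run SGL on ml-1m, set according to the best hyperparameters you provided.
However, I cannot reach the results you reported.
I would like to ask what the reason might be.
(SimGCL and NCL show a similar problem.)
type: "ED"
n_layers: 3
ssl_tau: 0.5
reg_weight: 1e-4
ssl_weight: 0.005
drop_ratio: 0.1
embedding_size: 64
learning_rate: 0.002

metrics: ["Recall", "NDCG", "MRR"]
topk: [10, 20, 50]

Your reported results:
('recall@10', 0.1889) ('ndcg@10', 0.2505)
Our results:
('recall@10', 0.1762) ('ndcg@10', 0.2652)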

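About hyperparameter search on different datasets

Hello! On the yelp20 dataset, how many epochs are generally needed for models such as NGCF, LightGCN and SGL to converge during hyperparameter search?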

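About the datasets

Hello, the Yelp2018, Amazon-Book and Gowalla datasets that I downloaded from Baidu Netdisk are all much larger than the statistics reported in the paper. How can I obtain the same versions of the datasets as used in the paper? Thank you very much.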

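Advice on performance

Hello, I would like to ask for advice about model performance.
In the file you provide, the performance of BPR, LightGCN and SimGCL on ML-10M is as follows:

Method    Recall@10  MRR@10  NDCG@10  Hit@10  Precision@10
BPR       0.1776     0.4187  0.2401   0.7199  0.1779
LightGCN  0.1861     0.4388  0.2538   0.7330  0.1863
SimGCL    0.2029     0.4550  0.2667   0.7640  0.1933

But when I run them myself, I get the following results: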
BPR: test result: OrderedDict([('recall@10', 0.2466), ('mrr@10', 0.4895), ('ndcg@10', 0.2928), ('hit@10', 0.7815), ('precision@10', 0.1962)])
LightGCN: test result: OrderedDict([('recall@10', 0.1853), ('mrr@10', 0.376), ('ndcg@10', 0.2116), ('hit@10', 0.6819), ('precision@10', 0.1406)])
SimGCL: test result: OrderedDict([('recall@10', 0.1273), ('mrr@10', 0.1989), ('ndcg@10', 0.1134), ('hit@10', 0.4899), ('precision@10', 0.0692)])
All of these results differ clearly from the reported ones.

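[💡SUG] DiffNet underperforms LightGCN

Hello, in the Evaluation Results on the lastfm dataset in your official documentation, DiffNet underperforms LightGCN. My own results with RecBole on the FilmTrust and CiaoDVD datasets also show DiffNet doing worse than LightGCN. What is the reason for this? Might DiffNet outperform LightGCN on a dataset such as Epinions? Thank you for your answer!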

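[💡SUG] Will there be a DiffNet++ implementation?

Hello. Thank you to the researchers for developing and maintaining this system.

To get to the point: I found that there is currently no open code for DiffNet++ online apart from the original authors' repository. However, I was disappointed to find that the original authors' code is not written in torch and contains some redundancy. Could you develop a streamlined torch version of DiffNet++?

Thank you.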

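[🐛BUG] Error when using the uni100 evaluation mode for sequential recommendation on ML-100K

Describe the bug
Support for the uni100 mode seems to have a problem, which is very strange. It affects not only ML-100K but also ML-1M and others.

To Reproduce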

  1. git clone https://github.com/RUCAIBox/RecBole.git
  2. cd RecBole
  3. pip install -e .
  4. git clone https://github.com/RUCAIBox/RecBole-GNN.git
  5. cd RecBole-GNN
  6. Use the official diginetica yaml file, change session_id to user_id, and change the evaluation mode to uni100
  7. python run_recbole_gnn.py --config_files=test.yaml --dataset=ml-100k --model=SRGNN
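  8. You will then see all kinds of odd errors. I have tried many modifications, but the problem does not seem easy to fix. The error is raised in cat_interaction of the EvalDataloader. In addition, I tried running the test scripts directly and found that none of the sequential recommendation models seem to pass the tests.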

Environment (please complete the following information):

  • OS: Ubuntu 22.04
  • RecBole version: same as the master branch
  • Python version: 3.9
  • PyTorch version: 1.12
  • cudatoolkit version: 11.6

Question: About LightGCN eval mode.

Hello, thanks for your wonderful work! I have a question about the LightGCN eval mode.

To my understanding, RecBole has eval modes such as full and uniN, as described at https://recbole.io/docs/user_guide/config/evaluation_settings.html.

With full evaluation, full_sort_predict(self, interaction) is called to score every item for each user, and metrics such as recall and MRR are computed from those scores. With uniN evaluation, predict(self, interaction) is called to score only specific items, rather than all items, for each user. Do I understand correctly?
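
As a rough illustration of the two modes described here, the sketch below builds the candidate list that would be ranked for one positive item. It assumes that the sampled mode means "the positive item plus N uniformly drawn negatives", which is how the linked evaluation settings page describes it. The function names are hypothetical and this is not RecBole's dataloader code.

```python
import random


def full_candidates(num_items):
    """Full ranking: every item is a candidate."""
    return list(range(num_items))


def uni_candidates(positive, user_positives, num_items, n, rng):
    """Sampled ranking: the positive item plus n uniformly drawn negatives."""
    pool = [i for i in range(num_items) if i not in user_positives]
    return [positive] + rng.sample(pool, n)


if __name__ == "__main__":
    rng = random.Random(0)
    print(len(full_candidates(1000)))                         # 1000
    print(len(uni_candidates(3, {3, 4}, 1000, 100, rng)))     # 101
```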

But looking at https://github.com/RUCAIBox/RecBole-GNN/blob/main/recbole_gnn/model/general_recommender/lightgcn.py#L123: if I use full evaluation to evaluate my model, will it always reuse the previously stored embeddings and therefore produce the same metrics in every epoch? Or am I misunderstanding something?

    def full_sort_predict(self, interaction):
        user = interaction[self.USER_ID]
        if self.restore_user_e is None or self.restore_item_e is None:
            self.restore_user_e, self.restore_item_e = self.forward()
        # get user embedding from storage variable
        u_embeddings = self.restore_user_e[user]

        # dot with all item embedding to accelerate
        scores = torch.matmul(u_embeddings, self.restore_item_e.transpose(0, 1))

        return scores.view(-1)
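
The caching pattern in question can be sketched without PyTorch. In this toy class (all names hypothetical), the cache is filled on the first evaluation call and cleared whenever a training step runs, so the first evaluation after any training recomputes it. Whether LightGCN clears restore_user_e and restore_item_e during training in the same way should be confirmed in calculate_loss of lightgcn.py; the sketch simply assumes that it does.

```python
class CachedScorer:
    """Toy model: caches an expensive forward pass between evaluation calls."""

    def __init__(self):
        self.weight = 1.0
        self.restore_e = None      # cached forward result
        self.forward_calls = 0

    def forward(self):
        self.forward_calls += 1
        return self.weight * 2.0

    def train_step(self):
        # Invalidate the cache: the parameters are about to change.
        self.restore_e = None
        self.weight += 0.5

    def full_sort_predict(self):
        # Recompute only if the cache was cleared since the last evaluation.
        if self.restore_e is None:
            self.restore_e = self.forward()
        return self.restore_e


if __name__ == "__main__":
    model = CachedScorer()
    model.train_step()
    print(model.full_sort_predict(), model.full_sort_predict())  # 3.0 3.0
    model.train_step()
    print(model.full_sort_predict())  # 4.0
    print(model.forward_calls)        # 2
```

If the cache were never cleared during training, the second evaluation would return the stale 3.0, which is the situation the question is worried about.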

[💡SUG] Multi behavior dataset usage

Hi, thanks for your hard work! The repo is quite useful and inspiring.
Since there are many GNN-based multi-behavior recommendation models, I wonder whether there is any plan to support loading multi-behavior datasets in this repo, and how it would be used.
Thanks in advance for your reply.

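[🐛BUG] MHCN does not reproduce the original paper's results on the Douban dataset

Hello, I could not reproduce the original paper's results with MHCN on the Douban dataset. The paper's preprocessing of the Douban dataset removes items with ratings below 4, and the paper reports: recall@10: 0.06556, ndcg@10: 0.20694, precision@10: 0.18283
But my results are recall@10: 0.0485, ndcg@10: 0.1397, precision@10: 0.1247
This is worse than LightGCN, which gets recall@10: 0.0654, ndcg@10: 0.1968, precision@10: 0.171
Could you explain the reason? Thank you for your guidance.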

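[🐛BUG] A question about the Laplacian normalization in the LightGCN implementation.

Problem description
In the original LightGCN source code, nodes with degree 0 are set to inf, so they become 0 after normalization. In the code of this implementation, however, these zero-degree nodes become 1 after normalization. I would like to ask about this difference.

Screenshots
[Two screenshots: the upper one shows this library's implementation, the lower one the original LightGCN implementation]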
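
For reference, a minimal plain-Python sketch of symmetric normalization, where each edge (u, v) gets the weight 1 / sqrt(d_u * d_v). It assumes the scaling factor is applied only to existing edges; under that assumption the value chosen for a zero-degree node never multiplies anything, so 0 and 1 give the same edge weights. Whether both codebases apply the factor this way should be checked against the screenshots. The function names are hypothetical.

```python
import math


def inv_sqrt_degree(degree, isolated_value=0.0):
    """d^{-1/2} per node; nodes with degree 0 get `isolated_value`."""
    return [1.0 / math.sqrt(d) if d > 0 else isolated_value for d in degree]


def normalized_edge_weights(edges, num_nodes, isolated_value=0.0):
    """Weights 1 / sqrt(d_u * d_v) for each undirected edge (u, v)."""
    degree = [0] * num_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    scale = inv_sqrt_degree(degree, isolated_value)
    return [scale[u] * scale[v] for u, v in edges]


if __name__ == "__main__":
    edges = [(0, 1), (0, 2)]          # node 3 is isolated
    as_zero = normalized_edge_weights(edges, 4, isolated_value=0.0)
    as_one = normalized_edge_weights(edges, 4, isolated_value=1.0)
    print(as_zero == as_one)  # True: no edge touches the isolated node
```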

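How to specify my own dataset

Hi,

Following the GitHub home page, I ran the following command:
python run_recbole_gnn.py -m [model] -d [dataset]

I want to change the default dataset, "ml-100k", but I do not know how to set the dataset option.

So I would like to ask how to use my own dataset.

Thank you!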

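[🐛BUG] MHCN gives different results on every run

Hello, when running MHCN from the social recommendation models, I set reproducibility = True and seed = 2020, but the results differ on every run. Could you tell me the reason? Thank you!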

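Request: support NegSampleEvalDataLoader

RecBole-GNN is much faster than RecBole, but it does not support negative-sampling evaluation. I hope NegSampleEvalDataLoader can be supported, which would make it easier to compare against baselines from papers.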

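[🐛BUG] No .net file after downloading the social recommendation dataset

Hello, I found a problem: the dataset downloaded by this project's script has no .net file when used for social recommendation. What is the reason?
Running python run_recbole_gnn.py -m MHCN -d lastfm gives: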

ValueError: [lastfm.net] not found in [dataset/lastfm].

[🐛BUG] Could you help look at this null-pointer error?

I keep getting this error:
Traceback (most recent call last):
File "/home/izuna/zyx/RecBole-GNN-main/run_recbole_gnn.py", line 19, in
run_recbole_gnn(model=args.model, dataset=args.dataset, config_file_list=config_file_list)
File "/home/izuna/zyx/RecBole-GNN-main/recbole_gnn/quick_start.py", line 48, in run_recbole_gnn
best_valid_score, best_valid_result = trainer.fit(
File "/home/izuna/zyx/rec/recbole/trainer/trainer.py", line 466, in fit
valid_score, valid_result = self._valid_epoch(
File "/home/izuna/zyx/rec/recbole/trainer/trainer.py", line 285, in _valid_epoch
valid_result = self.evaluate(
File "/opt/miniconda/envs/bole/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/izuna/zyx/rec/recbole/trainer/trainer.py", line 618, in evaluate
interaction, scores, positive_u, positive_i = eval_func(batched_data)
File "/home/izuna/zyx/rec/recbole/trainer/trainer.py", line 556, in _neg_sample_batch_eval
batch_user_num = positive_u[-1] + 1
TypeError: 'NoneType' object is not subscriptable

My config file is as follows:
embedding_size: 64
leakyrelu_alpha: 0.2
dropout_local: 0.
dropout_global: 0.5
dropout_gcn: 0.
loss_type: CE
gnn_transform: sess_graph

# global

build_global_graph: True
sample_num: 12
hop: 1
field: ~

topk: [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
use_gpu: True
gpu_id: 0 # set the GPU device index as appropriate
seed: 2

# dataset config

field_separator: " " # separator between dataset fields
seq_separator: " " # separator inside token_seq or float_seq fields
USER_ID_FIELD: user_id # user id field
ITEM_ID_FIELD: item_id # item id field
RATING_FIELD: rating # rating field
# which columns to load from which file; here user_id, item_id, rating and timestamp are read from ml-1m.inter
load_col:
    inter: [user_id, item_id, timestamp, rating]
TIME_FIELD: timestamp

# training settings

#epochs: 2000 # maximum number of training epochs
epochs: 2000 # maximum number of training epochs
train_batch_size: 2048
learner: adam # built-in PyTorch optimizer to use
learning_rate: 0.01 # learning rate
training_neg_sample_num: 0 # number of negative samples
LABEL_FIELD: label # label field
eval_type: 2

SRGNN evaluation metrics are 0 on the douban and yoochoose datasets

Hello authors, the output for the problem above is as follows:
General Hyper Parameters:
gpu_id = 0
use_gpu = True
seed = 2020
state = INFO
reproducibility = True
data_path = dataset/douban
checkpoint_dir = saved
show_progress = True
save_dataset = False
dataset_save_path = None
save_dataloaders = False
dataloaders_save_path = None
log_wandb = False

Training Hyper Parameters:
epochs = 500
train_batch_size = 4096
learner = adam
learning_rate = 0.001
train_neg_sample_args = {'distribution': 'none', 'sample_num': 'none', 'alpha': 'none', 'dynamic': False, 'candidate_num': 0}
eval_step = 1
stopping_step = 10
clip_grad_norm = None
weight_decay = 0.0
loss_decimal_place = 4

Evaluation Hyper Parameters:
eval_args = {'split': {'LS': 'valid_and_test'}, 'order': 'TO', 'group_by': 'user', 'mode': {'valid': 'full', 'test': 'full'}}
repeatable = True
metrics = ['Recall', 'MRR', 'NDCG', 'Hit', 'Precision']
topk = [10]
valid_metric = MRR@10
valid_metric_bigger = True
eval_batch_size = 4096
metric_decimal_place = 4

Dataset Hyper Parameters:
field_separator =
seq_separator =
USER_ID_FIELD = user_id
ITEM_ID_FIELD = item_id
RATING_FIELD = rating
TIME_FIELD = timestamp
seq_len = None
LABEL_FIELD = label
threshold = None
NEG_PREFIX = neg_
load_col = {'inter': ['user_id', 'item_id', 'rating', 'timestamp', 'likes_num']}
unload_col = None
unused_col = None
additional_feat_suffix = None
rm_dup_inter = None
val_interval = None
filter_inter_by_user_or_item = True
user_inter_num_interval = [0,inf)
item_inter_num_interval = [0,inf)
alias_of_user_id = None
alias_of_item_id = None
alias_of_entity_id = None
alias_of_relation_id = None
preload_weight = None
normalize_field = None
normalize_all = None
ITEM_LIST_LENGTH_FIELD = item_length
LIST_SUFFIX = _list
MAX_ITEM_LIST_LENGTH = 50
POSITION_FIELD = position_id
HEAD_ENTITY_ID_FIELD = head_id
TAIL_ENTITY_ID_FIELD = tail_id
RELATION_ID_FIELD = relation_id
ENTITY_ID_FIELD = entity_id
benchmark_filename = None

Other Hyper Parameters:
worker = 0
wandb_project = recbole
shuffle = True
require_pow = False
enable_amp = False
enable_scaler = False
transform = None
embedding_size = 64
step = 1
loss_type = CE
numerical_features = []
discretization = None
kg_reverse_r = False
entity_kg_num_interval = [0,inf)
relation_kg_num_interval = [0,inf)
MODEL_TYPE = ModelType.SEQUENTIAL
gnn_transform = sess_graph
training_neg_sample_num = 0
eval_setting = TO_LS,full
MODEL_INPUT_TYPE = InputType.POINTWISE
eval_type = EvaluatorType.RANKING
single_spec = True
local_rank = 0
device = cuda
valid_neg_sample_args = {'distribution': 'uniform', 'sample_num': 'none'}
test_neg_sample_args = {'distribution': 'uniform', 'sample_num': 'none'}

C:\Users\HP\anaconda3\envs\pytorch-gpu\lib\site-packages\recbole\data\dataset\dataset.py:648: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.

feat[field].fillna(value=0, inplace=True)
C:\Users\HP\anaconda3\envs\pytorch-gpu\lib\site-packages\recbole\data\dataset\dataset.py:650: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.

feat[field].fillna(value=feat[field].mean(), inplace=True)
20 Mar 16:34 INFO douban
The number of users: 738701
Average actions of users: 2.8767510491403816
The number of items: 29
Average actions of items: 75894.85714285714
The number of inters: 2125056
The sparsity of the dataset: 90.08018222481785%
Remain Fields: ['user_id', 'item_id', 'rating', 'timestamp', 'likes_num']
20 Mar 16:36 INFO Constructing session graphs.
100%|██████████| 1038965/1038965 [01:56<00:00, 8952.76it/s]
20 Mar 16:38 INFO Constructing session graphs.
100%|██████████| 145071/145071 [00:15<00:00, 9590.91it/s]
20 Mar 16:38 INFO Constructing session graphs.
100%|██████████| 202320/202320 [00:23<00:00, 8537.58it/s]
20 Mar 16:38 INFO SessionGraph Transform in DataLoader.
20 Mar 16:38 INFO SessionGraph Transform in DataLoader.
20 Mar 16:38 INFO SessionGraph Transform in DataLoader.
20 Mar 16:38 INFO [Training]: train_batch_size = [4096] negative sampling: [{'distribution': 'none', 'sample_num': 'none', 'alpha': 'none', 'dynamic': False, 'candidate_num': 0}]
20 Mar 16:38 INFO [Evaluation]: eval_batch_size = [4096] eval_args: [{'split': {'LS': 'valid_and_test'}, 'order': 'TO', 'group_by': 'user', 'mode': {'valid': 'full', 'test': 'full'}}]
20 Mar 16:38 INFO SRGNN(
(item_embedding): Embedding(29, 64, padding_idx=0)
(gnncell): SRGNNCell(
(incomming_conv): SRGNNConv(
(lin): Linear(in_features=64, out_features=64, bias=True)
)
(outcomming_conv): SRGNNConv(
(lin): Linear(in_features=64, out_features=64, bias=True)
)
(lin_ih): Linear(in_features=128, out_features=192, bias=True)
(lin_hh): Linear(in_features=64, out_features=192, bias=True)
)
(linear_one): Linear(in_features=64, out_features=64, bias=True)
(linear_two): Linear(in_features=64, out_features=64, bias=True)
(linear_three): Linear(in_features=64, out_features=1, bias=False)
(linear_transform): Linear(in_features=128, out_features=64, bias=True)
(loss_fct): CrossEntropyLoss()
)
Trainable parameters: 64064
Train 0: 100%|████████████████████████| 254/254 [03:50<00:00, 1.10it/s, GPU RAM: 0.44 G/2.00 G]
20 Mar 16:42 INFO epoch 0 training [time: 230.78s, train loss: 651.7647]
Evaluate : 100%|██████████████████████████| 36/36 [00:14<00:00, 2.54it/s, GPU RAM: 0.44 G/2.00 G]
C:\Users\HP\anaconda3\envs\pytorch-gpu\lib\site-packages\recbole\evaluator\base_metric.py:78: RuntimeWarning: Mean of empty slice.
avg_result = value.mean(axis=0)
C:\Users\HP\anaconda3\envs\pytorch-gpu\lib\site-packages\numpy\core_methods.py:184: RuntimeWarning: invalid value encountered in divide
ret = um.true_divide(
20 Mar 16:42 INFO epoch 0 evaluating [time: 14.26s, valid_score: nan]
20 Mar 16:42 INFO valid result:
recall@10 : nan mrr@10 : nan ndcg@10 : nan hit@10 : nan precision@10 : nan
Train 1: 100%|████████████████████████| 254/254 [02:55<00:00, 1.45it/s, GPU RAM: 0.44 G/2.00 G]
20 Mar 16:45 INFO epoch 1 training [time: 175.20s, train loss: 571.7500]
Evaluate : 100%|██████████████████████████| 36/36 [00:09<00:00, 3.79it/s, GPU RAM: 0.44 G/2.00 G]
20 Mar 16:46 INFO epoch 1 evaluating [time: 9.54s, valid_score: nan]
20 Mar 16:46 INFO valid result:
recall@10 : nan mrr@10 : nan ndcg@10 : nan hit@10 : nan precision@10 : nan
Train 2: 100%|████████████████████████| 254/254 [02:54<00:00, 1.46it/s, GPU RAM: 0.44 G/2.00 G]
20 Mar 16:48 INFO epoch 2 training [time: 174.05s, train loss: 562.1718]
Evaluate : 100%|██████████████████████████| 36/36 [00:08<00:00, 4.31it/s, GPU RAM: 0.44 G/2.00 G]
20 Mar 16:49 INFO epoch 2 evaluating [time: 8.39s, valid_score: nan]
20 Mar 16:49 INFO valid result:
recall@10 : nan mrr@10 : nan ndcg@10 : nan hit@10 : nan precision@10 : nan

To Reproduce
The yaml file is as follows:

# model config

embedding_size: 64
step: 1
loss_type: 'CE'
gnn_transform: sess_graph

# dataset config

field_separator: "\t" # separator between dataset fields
seq_separator: " " # separator inside token_seq or float_seq fields
USER_ID_FIELD: user_id # user id field
ITEM_ID_FIELD: item_id # item id field
RATING_FIELD: rating # rating field
TIME_FIELD: timestamp # time field
NEG_PREFIX: neg_ # prefix for negative-sampling fields
LABEL_FIELD: label # label field
ITEM_LIST_LENGTH_FIELD: item_length # sequence length field
LIST_SUFFIX: _list # suffix for sequence fields
MAX_ITEM_LIST_LENGTH: 50 # maximum sequence length
POSITION_FIELD: position_id # generated sequence position id
# which columns to load from which file; here user_id, item_id, rating and timestamp are read from ml-1m.inter, and likewise for the rest
load_col:
    inter: [user_id, item_id, rating, timestamp, likes_num]

# training settings

epochs: 500 # maximum number of training epochs
train_batch_size: 4096 # training batch size
learner: adam # built-in PyTorch optimizer to use
learning_rate: 0.001 # learning rate
training_neg_sample_num: 0 # number of negative samples
eval_step: 1 # how often evaluation runs after training
stopping_step: 10 # early-stopping patience: stop early if the chosen metric does not change within this many steps

# evaluation settings

eval_setting: TO_LS,full # order by time, split with leave-one-out, and use full ranking
metrics: ["Recall", "MRR", "NDCG", "Hit", "Precision"] # evaluation metrics
valid_metric: MRR@10 # metric used for early stopping
eval_batch_size: 4096 # evaluation batch size

Environment (please complete the following information):

  • OS: Windows
  • RecBole version: 0.2.0
  • Python version: 3.9
  • PyTorch version: 2.1.1

I do not know where to start with this. I would be very grateful if the authors could help. Thank you!

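[🐛BUG] Question about the very large GPU memory usage of the social recommendation model SEPT

Running the SEPT model on the yelp1 dataset needs more than 24 GB of GPU memory and runs out of memory, while the MHCN model needs less than 1 GB. This seems unreasonable. Is there an error in the reimplementation, or does the model really need that much GPU memory? At the moment I can only run it on the small lastfm dataset. Do the authors have any suggestions for this problem? Is multi-GPU parallel training needed to run this model?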
