reczoo / recbox


91.0 3.0 19.0 4.97 MB

A box of core libraries for recommendation model development

License: Apache License 2.0

Python 100.00%

collaborative-filtering sequential-recommendation gnn recommendation candidate-matching two-tower-models

recbox's Introduction

RecZoo

RecZoo: A curated model zoo for recommendation tasks

Matching

No	Model	Publication
1	UltraGCN	Kelong Mao, Jieming Zhu, Xi Xiao, Biao Lu, Zhaowei Wang, Xiuqiang He. UltraGCN: Ultra Simplification of Graph Convolutional Networks for Recommendation, in CIKM 2021.
2	SimpleX	Kelong Mao, Jieming Zhu, Jinpeng Wang, Quanyu Dai, Zhenhua Dong, Xi Xiao, Xiuqiang He. SimpleX: A Simple and Strong Baseline for Collaborative Filtering, in CIKM 2021.

Ranking

No	Model	Publication
1	FinalMLP	Kelong Mao, Jieming Zhu, Liangcai Su, Guohao Cai, Yuru Li, Zhenhua Dong. FinalMLP: An Enhanced Two-Stream MLP Model for CTR Prediction, in AAAI 2023.
2	FinalNet	Jieming Zhu, Qinglin Jia, Guohao Cai, Quanyu Dai, Jingjie Li, Zhenhua Dong, Ruiming Tang, Rui Zhang. FINAL: Factorized Interaction Layer for CTR Prediction, in SIGIR 2023.
3	RAT	Yushen Li, Jinpeng Wang, Tao Dai, Jieming Zhu, Jun Yuan, Rui Zhang, Shu-Tao Xia. RAT: Retrieval-Augmented Transformer for Click-Through Rate Prediction, in WWW 2024.
4	STEM	Liangcai Su, Junwei Pan, Ximei Wang, Xi Xiao, Shijie Quan, Xihua Chen, Jie Jiang. STEM: Unleashing the Power of Embeddings for Multi-task Recommendation, in AAAI 2024.
5	Helen	Zirui Zhu, Yong Liu, Zangwei Zheng, Huifeng Guo, Yang You. Helen: Optimizing CTR Prediction Models with Frequency-wise Hessian Eigenvalue Regularization, in WWW 2024.
6	Combined-Pair	Zhutian Lin, Junwei Pan, Shangyu Zhang, Ximei Wang, Xi Xiao, Shudong Huang, Lei Xiao, Jie Jiang. Understanding the Ranking Loss for Recommendation with Sparse User Feedback, in KDD 2024.
7	AdaGIN	Lei Sang, Honghao Li, Yiwen Zhang, Yi Zhang, Yun Yang. AdaGIN: Adaptive Graph Interaction Network for Click-Through Rate Prediction, in TOIS 2024.
8	SimCEN	Honghao Li, Lei Sang, Yi Zhang, Yiwen Zhang. SimCEN: Simple Contrast-enhanced Network for CTR Prediction, in MM 2024.
9	RecSys	Qi Zhang, Jieming Zhu, Jiansheng Sun, Guohao Cai, Ruining Yu, Bangzheng He, Liangbi Li. Enhancing News Recommendation with Real-Time Feedback and Generative Sequence Modeling, in RecSys Challenge Workshop 2024.
10	DCNv3	Honghao Li, Yiwen Zhang, Yi Zhang, Hanwei Li, Lei Sang, Jieming Zhu. DCNv3: Towards Next Generation Deep Cross Network for CTR Prediction, in arXiv 2024.

Reranking

Pretraining

No	Model	Publication
1	UNBERT	Qi Zhang, Jingjie Li, Qinglin Jia, Chuyuan Wang, Jieming Zhu, Zhaowei Wang, Xiuqiang He. UNBERT: User-News Matching BERT for News Recommendation, in IJCAI 2021.

Personalization

No	Model	Publication
1	PMG	Xiaoteng Shen, Rui Zhang, Xiaoyan Zhao, Jieming Zhu, Xi Xiao. PMG: Personalized Multimodal Generation with Large Language Models, in WWW 2024.

recbox's People

Contributors

Stargazers

Watchers

Forkers

acnowa zhyj3038 qianrenjian jepsonwong tonywang-sh byzhang eadon999 techthiyanes xxaxtt gotid qianduanzhishen sun-jye shahjaidev vaibhavpawar05 mtandhj tokkiu tmukande-debug jianhua-chen muratuysal

recbox's Issues

Datasets for SimpleX: Format

Hello,

When running, for instance:
python run_param_tuner.py --config Yelp18/SimpleX_yelp18_x0/SimpleX_yelp18_x0_tuner_config.yaml --gpu 0

the errors thrown demand a CSV file with specific columns, whereas the provided Yelp dataset only has train.txt and test.txt.

I tried to first convert train and test into an edge-list representation (train.csv, test.csv), but now there are errors such as KeyError: 'corpus_index', and it is not clear how to resolve them.

Could you please provide the training and test files formatted as the run_param_tuner.py script expects?

At the end of the SimpleX paper, you tested the model on more datasets, such as Amazon-Beauty, Amazon-Movies, MovieLens-20M, and MillionSongData. Could you please also provide these datasets so that we can reproduce your model for fair comparisons? Thank you.
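The edge-list conversion described in this issue can be sketched as follows. This is a minimal sketch, assuming train.txt and test.txt use the LightGCN-style layout in which each line holds a user ID followed by the IDs of the items that user interacted with. The column names user_id and item_id are hypothetical, and the output will not contain columns such as corpus_index, so it does not by itself satisfy whatever schema run_param_tuner.py expects.

```python
import csv
import io
from typing import Iterable, Iterator, TextIO, Tuple


def adjacency_to_edges(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Expand 'user item1 item2 ...' lines into (user, item) pairs."""
    for line in lines:
        tokens = line.split()
        if len(tokens) < 2:
            continue  # skip blank lines and users without interactions
        user = tokens[0]
        for item in tokens[1:]:
            yield user, item


def write_edge_csv(lines: Iterable[str], out: TextIO) -> int:
    """Write one row per interaction under a hypothetical user_id,item_id header.

    Returns the number of interaction rows written (header excluded).
    """
    writer = csv.writer(out)
    writer.writerow(["user_id", "item_id"])  # hypothetical column names
    count = 0
    for user, item in adjacency_to_edges(lines):
        writer.writerow([user, item])
        count += 1
    return count


if __name__ == "__main__":
    buf = io.StringIO()
    rows = write_edge_csv(["0 10 11 12", "1 10", ""], buf)
    print(rows)  # 4
    print(buf.getvalue())
```

For real files, open the source normally and the destination with newline="" (as the csv module recommends), for example: with open("train.txt") as src, open("train.csv", "w", newline="") as dst: write_edge_csv(src, dst).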

Configuration is missing

Only the Yelp18 configuration exists.
What configuration is required to run the AmazonBooks dataset?

Error: "generator raised StopIteration"

When I run the following command: "cd benchmarks; python run_param_tuner.py --config Yelp18/MF_CCL_yelp18_x0/MF_CCL_yelp18_x0_tuner_config.yaml --gpu 0", I get an error at the second epoch.
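"generator raised StopIteration" is the RuntimeError message Python 3.7+ produces when a StopIteration escapes a generator body (PEP 479). A common trigger is calling next() on an exhausted iterator inside a generator, for example at the end of an epoch. The report does not show which generator in the repository raises it, so the functions below are hypothetical and only illustrate the failure mode and one idiomatic fix (itertools.islice, which stops quietly when the input runs out).

```python
from itertools import islice
from typing import Iterator, List, Sequence


def batches_buggy(data: Sequence[int], size: int) -> Iterator[List[int]]:
    it = iter(data)
    while True:
        # Once `it` is exhausted, next() raises StopIteration. Inside a
        # generator, Python 3.7+ converts it to
        # RuntimeError("generator raised StopIteration") (PEP 479).
        yield [next(it) for _ in range(size)]


def batches_fixed(data: Sequence[int], size: int) -> Iterator[List[int]]:
    it = iter(data)
    while True:
        batch = list(islice(it, size))  # islice stops quietly at the end
        if not batch:
            return  # finish the generator normally
        yield batch


if __name__ == "__main__":
    print(list(batches_fixed([1, 2, 3], 2)))  # [[1, 2], [3]]
    try:
        list(batches_buggy([1, 2, 3], 2))
    except RuntimeError as err:
        print(err)  # generator raised StopIteration
```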

Also, when I enable parallel in evaluate_metrics, the code gets stuck at this point, so I had to set parallel to False during evaluation.

Another problem is that the code is quite resource-hungry: I run it on a PC with 32 GB of memory, but memory usage reaches 100% during evaluation (parallel is set to False; otherwise it gets stuck, as mentioned above).

Any solutions?
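One general way to bound evaluation memory is to score users in chunks rather than building the full user-by-item score matrix at once. The sketch below assumes dot-product scoring between user and item embeddings and a per-user set of training items to exclude; it is not the repository's evaluate_metrics, and the function name topk_in_chunks and its parameters are hypothetical.

```python
from typing import Sequence, Set

import numpy as np


def topk_in_chunks(user_emb: np.ndarray, item_emb: np.ndarray,
                   seen: Sequence[Set[int]], k: int,
                   chunk_size: int = 1024) -> np.ndarray:
    """Top-k item indices per user, scoring only `chunk_size` users at a time.

    Peak score memory is chunk_size x n_items instead of n_users x n_items.
    `seen[u]` holds training items of user u, which are excluded from ranking.
    """
    n_users = user_emb.shape[0]
    result = np.empty((n_users, k), dtype=np.int64)
    for start in range(0, n_users, chunk_size):
        stop = min(start + chunk_size, n_users)
        scores = (user_emb[start:stop] @ item_emb.T).astype(np.float64)
        for row, user in enumerate(range(start, stop)):
            if seen[user]:
                scores[row, list(seen[user])] = -np.inf  # mask training items
        # argpartition selects the k best per row without a full sort.
        part = np.argpartition(-scores, k - 1, axis=1)[:, :k]
        part_scores = np.take_along_axis(scores, part, axis=1)
        order = np.argsort(-part_scores, axis=1)  # sort only those k
        result[start:stop] = np.take_along_axis(part, order, axis=1)
    return result
```

A smaller chunk_size lowers peak memory at the cost of more Python-level iterations; the ranked output does not depend on the chunk size.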

Role of query_index, user_id, corpus_index, and item_id

@xpai
Hello. Thank you for your contribution to standardizing RecSys benchmarks. I have a question regarding data preprocessing.

I wonder why query_index/user_id and corpus_index/item_id are kept separate. At first, I assumed that query_index and user_id (and likewise corpus_index and item_id) were simply an index-to-ID mapping, but that doesn't seem to be the case. Could you kindly explain the meaning and role of query_index, corpus_index, user_id, and item_id?

Thank you in advance for your help.

"deem.tensorflow" is missing.

Correspondence between user_id in the model and user_id in the dataset

It seems that the "user_id" in the model (SimpleX) is not the same as the "user_id" in the dataset (Gowalla).
For example, I printed user-item pairs during model training, and one of the positive (user_id, item_id) pairs is (1220, 10807).
However, this pair is not found in train.txt of the Gowalla dataset (I got the Gowalla dataset from LightGCN: https://github.com/gusye1234/LightGCN-PyTorch/tree/master/data/gowalla).

How can I get the correspondence between the user_id in the model and the user_id in the dataset (and likewise for item_id)? Thanks a lot!

1. Printing user-item pairs

2. Got a positive (user_id, item_id) pair (1220, 10807)

3. The positive (user_id, item_id) pair (1220, 10807) is not found in Gowalla train.txt
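A frequent cause of this kind of mismatch is that preprocessing re-encodes raw IDs into contiguous indices (sometimes with an offset, e.g. index 0 reserved for padding), so the model only ever sees encoded indices. Whether recbox does this is an assumption here, not something the issue confirms. The hypothetical IdEncoder below shows the general pattern: keep the inverse mapping built during preprocessing, and translate model indices back to raw dataset IDs with it.

```python
from typing import Dict, Hashable, Iterable, List


class IdEncoder:
    """Hypothetical helper: map raw IDs to contiguous indices and back."""

    def __init__(self) -> None:
        self.to_index: Dict[Hashable, int] = {}
        self.to_raw: List[Hashable] = []

    def fit(self, raw_ids: Iterable[Hashable]) -> "IdEncoder":
        # Assign indices in order of first appearance.
        for raw in raw_ids:
            if raw not in self.to_index:
                self.to_index[raw] = len(self.to_raw)
                self.to_raw.append(raw)
        return self

    def encode(self, raw: Hashable) -> int:
        return self.to_index[raw]

    def decode(self, index: int) -> Hashable:
        return self.to_raw[index]


if __name__ == "__main__":
    users = IdEncoder().fit(["u7", "u3", "u7", "u9"])
    print(users.encode("u3"))  # 1
    print(users.decode(2))     # u9
```

If the real pipeline sorts IDs or shifts them by an offset before training, the decode step must apply the same convention; otherwise printed model pairs will not line up with train.txt.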

What is the config or YAML file for the AmazonBooks dataset?

I was able to reproduce the MF-CCL results on the Yelp18 and Gowalla datasets in Table 1 using the provided command and YAML file.

When I run a similar command on the AmazonBooks files, I find that sampling takes much longer than in your logs.

https://github.com/openbenchmark/BARS/tree/master/candidate_matching/benchmarks/MF_CCL/MF_CCL_amazonbooks_x1

And the results are not reasonable.

Could you please help me reproduce the results of MF-CCL on the AmazonBooks dataset?

Thanks,
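On the sampling-time point: when many negatives are drawn per positive, a per-sample Python loop tends to dominate runtime on larger datasets. The sketch below assumes uniform negative sampling and draws negatives for a whole batch in one vectorised call, redrawing only the slots that collide with that row's positive item. It is not the repository's sampler, the name sample_negatives is hypothetical, and it only checks against one positive per row rather than a user's full training history.

```python
import numpy as np


def sample_negatives(pos_items: np.ndarray, num_negs: int, num_items: int,
                     rng: np.random.Generator) -> np.ndarray:
    """Draw `num_negs` uniform negatives per row, avoiding that row's positive.

    One vectorised draw covers the whole batch; only clashing slots are redrawn.
    """
    if num_items < 2:
        raise ValueError("need at least two items to sample a negative")
    negs = rng.integers(0, num_items, size=(len(pos_items), num_negs))
    clash = negs == pos_items[:, None]
    while clash.any():
        negs[clash] = rng.integers(0, num_items, size=int(clash.sum()))
        clash = negs == pos_items[:, None]
    return negs
```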

About the required packages

Thanks for the great work.
Please let me know the Python version and the required packages (and their versions).

[Typo] An unnecessary space character after "cuda:"

device = torch.device("cuda: " + str(gpu))

Reference: https://github.com/xue-pai/TwinModels/blob/master/deem/pytorch/torch_utils.py#L33
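The corrected device string simply drops the space after the colon. Building it in one small helper keeps the format in a single place; the helper name cuda_device_string and the "negative index means CPU" convention are assumptions for illustration, not taken from torch_utils.py.

```python
def cuda_device_string(gpu: int) -> str:
    """Return a device string such as "cuda:0" (no space after the colon).

    Assumption (hypothetical convention): a negative index selects the CPU.
    """
    return f"cuda:{gpu}" if gpu >= 0 else "cpu"


if __name__ == "__main__":
    print(cuda_device_string(0))   # cuda:0
    print(cuda_device_string(-1))  # cpu
```

In the original code this would be used as device = torch.device(cuda_device_string(gpu)).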