chain123 / cat-art

One for All, All for One: Learning and Transferring User Embeddings for Cross-Domain Recommendation (WSDM-2023)

License: MIT License

Python 46.09% Jupyter Notebook 52.86% Shell 1.05%

cat-art's Introduction


One for All, All for One

About The Project

Multi-target cross-domain recommendation with implicit feedback data in each domain. This repository accompanies the WSDM 2023 paper "One for All, All for One: Learning and Transferring User Embeddings for Cross-Domain Recommendation".

Code Structure

One-For-All
├── README.md                                 Readme file
├── bert                                      BERT modules and Transformer layers
├── data                                      Demo data
├── dataset                                   Datasets and processing methods
│  ├── check_sparsity.ipynb                   Computes the dataset distribution for all domains
│  └── multi-domain_amazon.ipynb              Demo of processing the multiple Amazon datasets for MTCDR
├── utils                                     Utilities, e.g., early stopping
│  ├── __init__.py                            Module init
│  ├── data.py                                All data loaders
│  ├── eval_metrics.py                        Evaluation metrics
│  ├── loss_func.py                           Loss functions
│  ├── pytorchtools.py                        Early-stopping implementation
│  ├── result_plot.py                         Line-plot function
│  ├── Save_embedding.py                      Functions for saving item/user embeddings, preference scores, etc.
│  ├── scheduler.py                           Learning-rate scheduler
│  └── tools.py                               Others (not in use)
├── Data_loader.py                            Entry point for data splitting and data loader creation
├── loss_functions.py                         Loss functions
├── main_cross.py                             CAT-ART model, without pre-training of the CAT module
├── main_cross_pre.py                         CAT-ART model, with pre-training of the CAT module
├── main_cross_scores.py                      Gets the averaged attention scores of the ART module on the test set
├── main_single_mf.py                         Single-domain MF model for recommendation
├── models.py                                 Models
├── run_functions.py                          Run functions, e.g., train step, test step
├── show_result.py                            Prints results from the pickle file
├── run.sh                                    General run entry point on Venus
├── run_cross.sh                              Run demos with parameters
├── run_cross_auto.sh                         Run demos with parameters
├── run_cross_pre.sh
└── .gitignore                                gitignore file

Packages

pip install torch
pip install scikit-learn

Usage

  1. Dataset and processing

    1.1 Original Full Data Processing Pipeline

     # Train data at '/data/ceph/seqrec/UMMD/data/hdfs/q36_age_train_08' (17G)
     cd bert/dataset
     # For each file in the train data folder, run the following (a looping sketch is given after this block):
     python data_txt2pickle.py --filename "demo_data.gz" --domain 0
     # Results were saved at: /data/ceph/seqrec/UMMD/data/pickle/q36_age_train_org

     cd ../../          # Return to the main folder
     jupyter notebook   # Open and run train_test_split.ipynb for the train/test split
                        # (only for samples with 0 missing data)
     # Results were saved to: /data/ceph/seqrec/UMMD/data/pickle/q36_age_train_rec2
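
    As noted in the block above, data_txt2pickle.py has to be run once per raw file. A minimal looping sketch (run from bert/dataset) is given below; the raw-data directory, the *.gz file pattern, and the per-file --domain numbering are assumptions about the data layout, so adjust them to your own files.

     # Hedged sketch: run data_txt2pickle.py over every raw .gz file in the train folder.
     # Directory, file pattern, and domain numbering are assumptions, not repository defaults.
     import subprocess
     from pathlib import Path

     raw_dir = Path("/data/ceph/seqrec/UMMD/data/hdfs/q36_age_train_08")  # raw train data
     for domain, raw_file in enumerate(sorted(raw_dir.glob("*.gz"))):
         subprocess.run(
             ["python", "data_txt2pickle.py", "--filename", raw_file.name, "--domain", str(domain)],
             check=True,
         )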

    Here we provide a single processed file, containing both the train and test data, as demo data in the ./data folder.

    We are still discussing whether and how to open-source the original dataset.

    1.2 Data Structure of the Processed Pickle Files

    Each pickle file stores a dict with the keys 'uid', 'age', 'gender', and 'feature'. The values of 'uid', 'age', and 'gender' are lists. The value of the 'feature' key is a list of length 5, where each element represents the 'features' (interacted items) of the users in one of the 5 domains. For example (a short loading sketch follows this example):

    'uid': [1, 2, 3],
    'age': [11, 21, 31],
    'gender': [1, 2, 1],  # 1 = male, 2 = female
    'feature': [
                [[a, b], [1, 2, 1], [i, j, k, l]],   # user 'feature' in domain 0 (for the three users)
                [[b, c], [1, 2, 4], [i, k, l]],      # user 'feature' in domain 1 (for the three users)
                [[a, d], [1, 2, 10], [k, l]],        # user 'feature' in domain 2 (for the three users)
                [[d, b], [1, 2, 6], [1, l]],         # user 'feature' in domain 3 (for the three users)
                [[g, b], [1, 2, 5, 6], [i, h, p]],   # user 'feature' in domain 4 (for the three users)
               ]
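
    A minimal sketch for loading and inspecting a pickle file with this structure is shown below. The file name under ./data is a hypothetical placeholder; replace it with the demo file actually shipped in the repository.

    import pickle

    # Load one processed pickle file (hypothetical file name; adjust to the demo data in ./data).
    with open("./data/demo_data.pkl", "rb") as f:
        data = pickle.load(f)

    print(data.keys())             # dict_keys(['uid', 'age', 'gender', 'feature'])
    print(len(data["uid"]))        # number of users
    print(len(data["feature"]))    # 5, one entry per domain
    print(data["feature"][0][:3])  # interacted items of the first three users in domain 0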
    
  2. Single-domain recommendation

    Configure your dataset path in the train_test_split() function in Data_loader.py.

    # Single domain recommendation with BPR based MF
    python main_single_mf.py --batch_size 2048 --num_run 0 --domain 0 --epoch 20 --bar_dis True --result_dir 'your path' --train True
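
    To clarify the objective behind the BPR-based MF model in main_single_mf.py, a minimal generic BPR loss sketch is shown below. It is the standard Bayesian Personalized Ranking formulation and is not necessarily identical to the implementation in loss_functions.py.

    import torch
    import torch.nn.functional as F

    def bpr_loss(user_emb, pos_item_emb, neg_item_emb):
        """BPR: push the score of the observed (positive) item above that of a sampled negative item."""
        pos_scores = (user_emb * pos_item_emb).sum(dim=-1)   # user-positive item dot product
        neg_scores = (user_emb * neg_item_emb).sum(dim=-1)   # user-negative item dot product
        return -F.logsigmoid(pos_scores - neg_scores).mean()

    # Toy example with random embeddings (batch of 4 users, 32-dimensional MF factors)
    u, i_pos, i_neg = torch.randn(4, 32), torch.randn(4, 32), torch.randn(4, 32)
    print(bpr_loss(u, i_pos, i_neg))
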
  3. Multi-target cross-domain recommendation

    # CAT-ART
    python main_cross.py --result_dir 'your path'
    python main_cross_pre.py --result_dir 'your path'

    For the HeroGRAPH baseline, please refer to its open-source implementation at: https://github.com/cuiqiang1990/HeroGRAPH

Contact

Chenglin Li: ch11 AT ualberta dot ca

cat-art's People

Contributors

chain123


cat-art's Issues

About the class Quantize in .vq

Hello Li, this paper is an excellent piece of work.

I found a problem when trying to reproduce it: the implementation of the class Quantize in .vq is not included in the open-source code. Could you add the implementation of that class? Thank you very much!

Hello Li, about the Amazon dataset

Hello Li, I read your article recently and wanted to run the code, but failed. If the dataset in your paper can't be released, could you please share the processed Amazon dataset?

Also, the implementation of the class Quantize in .vq is not included in the open-source code. Is it unnecessary?

Best wishes!

Hello Li, about the code release and dataset

Hi, I read your article on arXiv. Your work is very advanced in CDR, and I would like to know more about it. When can I see the code? Is it possible to provide the dataset? 😄

Thanks !
