chain123 / cat-art

One for All, All for One: Learning and Transferring User Embeddings for Cross-Domain Recommendation (WSDM-2023)

License: MIT License

Python 46.09% Jupyter Notebook 52.86% Shell 1.05%

cat-art's Introduction


One for All, All for One

About The Project

Multi-target cross-domain recommendation with implicit feedback data in each domain. This repository accompanies the WSDM 2023 paper "One for All, All for One: Learning and Transferring User Embeddings for Cross-Domain Recommendation".

Code Structure

One-For-All
├── README.md                                 Readme file
├── bert                                      BERT modules and Transformer layers
├── data                                      Demo data
├── dataset                                   Datasets and processing methods
│  ├── check_sparsity.ipynb                   Computes the dataset distribution for all domains
│  └── multi-domain_amazon.ipynb              Demo of processing the multiple Amazon datasets for MTCDR
├── utils                                     Utilities, e.g., early stopping
│  ├── __init__.py                            Module init
│  ├── data.py                                All data loaders
│  ├── eval_metrics.py                        Evaluation metrics
│  ├── loss_func.py                           Loss functions
│  ├── pytorchtools.py                        Early-stopping implementation
│  ├── result_plot.py                         Line-plot function
│  ├── Save_embedding.py                      Functions for saving item/user embeddings, preference scores, etc.
│  ├── scheduler.py                           Learning-rate scheduler
│  └── tools.py                               Others (not in use)
├── Data_loader.py                            Entry point for data splitting and data loader creation
├── loss_functions.py                         Loss functions
├── main_cross.py                             CAT-ART model, without pre-training of the CAT module
├── main_cross_pre.py                         CAT-ART model, with pre-training of the CAT module
├── main_cross_scores.py                      Gets the averaged attention scores of the ART module on the test set
├── main_single_mf.py                         Single-domain MF model for recommendation
├── models.py                                 Models
├── run_functions.py                          Run functions, e.g., train step, test step
├── show_result.py                            Prints results from the pickle file
├── run.sh                                    General run entry point on Venus
├── run_cross.sh                              Run demos with parameters
├── run_cross_auto.sh                         Run demos with parameters
├── run_cross_pre.sh
└── .gitignore                                gitignore file

Packages

pip install torch
pip install scikit-learn

Usage

  1. Dataset and processing

    1.1 Original Full Data Processing Pipeline

     # Train data at '/data/ceph/seqrec/UMMD/data/hdfs/q36_age_train_08' (17G)
     cd bert/dataset
     # For each file in the train data folder, run the following (a looping sketch is given after this block):
     python data_txt2pickle.py --filename "demo_data.gz" --domain 0
     # Results were saved at: /data/ceph/seqrec/UMMD/data/pickle/q36_age_train_org

     cd ../../          # Return to the main folder
     jupyter notebook   # Open and run train_test_split.ipynb for the train/test split
                        # (only for samples with 0 missing data)
     # Results were saved to: /data/ceph/seqrec/UMMD/data/pickle/q36_age_train_rec2
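
    As noted in the block above, data_txt2pickle.py has to be run once per raw file. A minimal looping sketch (run from bert/dataset) is given below; the raw-data directory, the *.gz file pattern, and the per-file --domain numbering are assumptions about the data layout, so adjust them to your own files.

     # Hedged sketch: run data_txt2pickle.py over every raw .gz file in the train folder.
     # Directory, file pattern, and domain numbering are assumptions, not repository defaults.
     import subprocess
     from pathlib import Path

     raw_dir = Path("/data/ceph/seqrec/UMMD/data/hdfs/q36_age_train_08")  # raw train data
     for domain, raw_file in enumerate(sorted(raw_dir.glob("*.gz"))):
         subprocess.run(
             ["python", "data_txt2pickle.py", "--filename", raw_file.name, "--domain", str(domain)],
             check=True,
         )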

    Here we provide a single processed file, containing both the train and test data, as demo data in the ./data folder.

    We are still discussing whether and how to open-source the original dataset.

    1.2 Data Structure of the Processed Pickle Files

    Each pickle file stores a dict with the keys 'uid', 'age', 'gender', and 'feature'. The values of 'uid', 'age', and 'gender' are lists. The value of the 'feature' key is a list of length 5, where each element represents the 'features' (interacted items) of the users in one of the 5 domains. For example (a short loading sketch follows this example):

    'uid': [1, 2, 3],
    'age': [11, 21, 31],
    'gender': [1, 2, 1],  # 1 = male, 2 = female
    'feature': [
                [[a, b], [1, 2, 1], [i, j, k, l]],   # user 'feature' in domain 0 (for the three users)
                [[b, c], [1, 2, 4], [i, k, l]],      # user 'feature' in domain 1 (for the three users)
                [[a, d], [1, 2, 10], [k, l]],        # user 'feature' in domain 2 (for the three users)
                [[d, b], [1, 2, 6], [1, l]],         # user 'feature' in domain 3 (for the three users)
                [[g, b], [1, 2, 5, 6], [i, h, p]],   # user 'feature' in domain 4 (for the three users)
               ]
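
    A minimal sketch for loading and inspecting a pickle file with this structure is shown below. The file name under ./data is a hypothetical placeholder; replace it with the demo file actually shipped in the repository.

    import pickle

    # Load one processed pickle file (hypothetical file name; adjust to the demo data in ./data).
    with open("./data/demo_data.pkl", "rb") as f:
        data = pickle.load(f)

    print(data.keys())             # dict_keys(['uid', 'age', 'gender', 'feature'])
    print(len(data["uid"]))        # number of users
    print(len(data["feature"]))    # 5, one entry per domain
    print(data["feature"][0][:3])  # interacted items of the first three users in domain 0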
    
  2. Single-domain recommendation

    Configure your dataset path in the train_test_split() function in Data_loader.py.

    # Single domain recommendation with BPR based MF
    python main_single_mf.py --batch_size 2048 --num_run 0 --domain 0 --epoch 20 --bar_dis True --result_dir 'your path' --train True
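
    To clarify the objective behind the BPR-based MF model in main_single_mf.py, a minimal generic BPR loss sketch is shown below. It is the standard Bayesian Personalized Ranking formulation and is not necessarily identical to the implementation in loss_functions.py.

    import torch
    import torch.nn.functional as F

    def bpr_loss(user_emb, pos_item_emb, neg_item_emb):
        """BPR: push the score of the observed (positive) item above that of a sampled negative item."""
        pos_scores = (user_emb * pos_item_emb).sum(dim=-1)   # user-positive item dot product
        neg_scores = (user_emb * neg_item_emb).sum(dim=-1)   # user-negative item dot product
        return -F.logsigmoid(pos_scores - neg_scores).mean()

    # Toy example with random embeddings (batch of 4 users, 32-dimensional MF factors)
    u, i_pos, i_neg = torch.randn(4, 32), torch.randn(4, 32), torch.randn(4, 32)
    print(bpr_loss(u, i_pos, i_neg))
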
  3. Multi-target cross-domain recommendation

    # CAT-ART
    python main_cross.py --result_dir 'your path'
    python main_cross_pre.py --result_dir 'your path'

    For the HeroGRAPH baseline, please refer to its open-source implementation at: https://github.com/cuiqiang1990/HeroGRAPH

Contact

Chenglin Li: ch11 AT ualberta dot ca

cat-art's People

Contributors

chain123


cat-art's Issues

About the class Quantize in .vq

Hello Li, this paper is an excellent piece of work.

I found a problem when trying to reproduce it: the implementation of the class Quantize in .vq is not included in the open-source code. Could you add the implementation of that class? Thank you very much!

Hello Li, about the Amazon dataset

Hello Li, I read your article recently and wanted to run the code, but failed. If the dataset in your paper can't be released, could you please share the processed Amazon dataset?

Also, the implementation of the class Quantize in .vq is not included in the open-source code. Is it unnecessary?

Best wishes!

Hello Li, about the code release and dataset

Hi, I read your article on arXiv. Your work is very advanced in CDR, and I would like to know more about it. When can I see the code? Is it possible to provide the dataset? 😄

Thanks !
