deep-deconf's Introduction

Deep Causal Reasoning for Recommender Systems

This code accompanies the following paper:

Deep Causal Reasoning for Recommendations,
Yaochen Zhu, Jing Yi, Jiayi Xie and Zhenzhong Chen,
arXiv preprint, 2022. [pdf]

Note! We have released a survey on causal inference in recommender systems. Check it out! Causal Inference in Recommender Systems: A Survey of Strategies for Bias Mitigation, Explanation, and Generalization.

Note! To better understand the Rubin and Pearl causal frameworks discussed in this paper, check out our new repo, which summarizes the relevant books from, and the disputes between, the two most prominent schools of causal inference. Moreover, Prof. Ruocheng Guo's repo includes a thorough archive of causal inference algorithms, with a sub-section devoted to recommender systems.

Environment

The code is written in Python 3.6.5 and depends on the following packages:

  • numpy == 1.16.3
  • pandas == 0.21.0
  • tensorflow-gpu == 1.15.0
  • tensorflow-probability == 0.8.0

Dataset Acquisition and Simulation

  • Acquire the movielens-1m and amazon-vg datasets:
    The original datasets can be found [here] and [here].
    Preprocess the data with data_sim/raw/prepare_data.py.

  • Preprocess the original dataset: cd to the data_sim/raw folder and run
    python prepare_data.py --dataset Name --simulate {exposure, ratings}.

  • Fit the exposure and rating distributions via VAEs: cd to the data_sim folder and run
    python train.py --dataset Name --simulate {exposure, ratings}.

  • Simulate the causal dataset under various confounding levels:
    python simulate.py --dataset Name --simulate {exposure, ratings}.

  • The simulated datasets are saved in the casl_rec/data folder.
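
The simulation steps above can be chained from a small driver script. The sketch below only builds and prints the commands in the order the README lists them. The dataset name `ml-1m` is a placeholder (the README writes `Name`), the helper `simulation_commands` is hypothetical, and running simulate.py from the data_sim folder is an assumption, since the README does not state the working directory for that step.

```python
from typing import List, Tuple

# (working directory, script) pairs, in the order the README lists the steps.
STEPS = [
    ("data_sim/raw", "prepare_data.py"),
    ("data_sim", "train.py"),
    ("data_sim", "simulate.py"),  # assumption: same folder as train.py
]


def simulation_commands(dataset: str, target: str) -> List[Tuple[str, List[str]]]:
    """Return (cwd, argv) pairs for one dataset and one --simulate target."""
    if target not in ("exposure", "ratings"):
        raise ValueError("target must be 'exposure' or 'ratings'")
    return [
        (cwd, ["python", script, "--dataset", dataset, "--simulate", target])
        for cwd, script in STEPS
    ]


if __name__ == "__main__":
    # "ml-1m" is a placeholder dataset name, not confirmed by the README.
    for cwd, argv in simulation_commands("ml-1m", "exposure"):
        print(f"(cd {cwd} && {' '.join(argv)})")
```

To execute a step instead of printing it, each pair can be passed to `subprocess.run(argv, cwd=cwd, check=True)`.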

Fitting the Exposure and Rating Models

  • Split the simulated causal datasets into train/val/test:
    cd to the casl_rec/data folder and run
    python preprocess.py --dataset Name --split 5.

  • Train the exposure model and conduct the predictive check:
    python train_exposure.py --dataset Name --split [0-4]

  • Infer the substitute confounders:
    python infer_subs_conf.py --dataset Name --split [0-4]

  • Train the potential rating prediction model:
    python train_ratings.py --dataset Name --split [0-4]

  • Predict the scores for hold-out users:
    python evaluate_model.py --dataset Name --split [0-4]
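
Because the four scripts above are run once per split, it can help to generate the full command list up front. The following is a minimal sketch: the helper `split_commands` and the dataset name `ml-1m` are illustrative assumptions, and the working directory for these scripts is not specified in the README, so it is left out.

```python
from typing import List

# Scripts in the order the README lists them.
SCRIPTS = [
    "train_exposure.py",
    "infer_subs_conf.py",
    "train_ratings.py",
    "evaluate_model.py",
]


def split_commands(dataset: str, n_splits: int = 5) -> List[List[str]]:
    """Build one argv per (split, script), iterating splits 0..n_splits-1."""
    commands = []
    for split in range(n_splits):
        for script in SCRIPTS:
            commands.append(
                ["python", script, "--dataset", dataset, "--split", str(split)]
            )
    return commands


if __name__ == "__main__":
    # "ml-1m" is a placeholder dataset name, not confirmed by the README.
    for argv in split_commands("ml-1m"):
        print(" ".join(argv))
```

Running all scripts for one split before moving to the next keeps each split's exposure model, substitute confounders, and rating model together.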

For advanced argument usage, run any script with the --help argument.

If you find the code useful, please cite our paper. Thanks.

@article{zhu2022deep,
  title={Deep Causal Reasoning for Recommendations},
  author={Zhu, Yaochen and Yi, Jing and Xie, Jiayi and Chen, Zhenzhong},
  journal={arXiv preprint arXiv:2201.02088},
  year={2022}
}

deep-deconf's Issues

ValueError: operands could not be broadcast together

Traceback (most recent call last):
  File "simulate.py", line 270, in <module>
    simulate()
  File "simulate.py", line 201, in simulate
    rat_dist = get_rat_dist(rat_table)
  File "simulate.py", line 62, in get_rat_dist
    return adj_rat_dist(rat_dist_raw)
  File "simulate.py", line 55, in adj_rat_dist
    rat_dist_raw = rat_dist_raw*weights
ValueError: operands could not be broadcast together with shapes (138,) (5,)

I am getting this error in simulate.py. Could you please suggest a fix?
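
The shapes in the traceback can be reproduced in isolation, which may help narrow down the cause. The sketch below uses placeholder arrays of ones, not the actual data from simulate.py. One plausible but unconfirmed reading is that the script expects five rating levels while the input rating table yields 138 distinct values.

```python
import numpy as np

# Placeholder arrays with the shapes reported in the traceback.
rat_dist_raw = np.ones(138)  # stand-in for the distribution built from the rating table
weights = np.ones(5)         # stand-in for one weight per rating level

try:
    rat_dist_raw * weights
except ValueError as err:
    message = str(err)
    print(message)

# Element-wise multiplication of 1-D arrays needs equal lengths (or length 1).
same_length = np.ones(5) * weights
print(same_length.shape)
```

Checking `np.unique` on the rating column of the input data is a quick way to see how many distinct rating values it actually contains.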

NDCG is NaN in train.py for ratings

I am getting NaN for NDCG while simulating for ratings, but no issue while simulating for exposure. Could you please suggest a solution for this?
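
One common way for NDCG to become NaN is a user with no relevant held-out items, which makes the ideal DCG zero and the ratio 0/0. The sketch below illustrates that mechanism with a generic binary-relevance NDCG. It is not the repository's metric code, and whether this is the cause here is an assumption.

```python
import numpy as np


def ndcg_at_k(scores: np.ndarray, heldout: np.ndarray, k: int) -> float:
    """Binary-relevance NDCG@k for one user; returns NaN if nothing is held out."""
    top = np.argsort(-scores)[:k]
    discounts = 1.0 / np.log2(np.arange(2, len(top) + 2))
    dcg = (heldout[top] * discounts).sum()
    n_rel = int(min(len(top), heldout.sum()))
    idcg = discounts[:n_rel].sum()
    with np.errstate(invalid="ignore", divide="ignore"):
        return float(dcg / idcg)  # 0/0 yields NaN when n_rel == 0


scores = np.array([0.9, 0.1, 0.5, 0.3])
with_item = ndcg_at_k(scores, np.array([0.0, 0.0, 1.0, 0.0]), k=2)
without_item = ndcg_at_k(scores, np.zeros(4), k=2)

per_user = np.array([with_item, without_item])
print(np.mean(per_user))     # NaN: a single empty user poisons the average
print(np.nanmean(per_user))  # average over users that have held-out items
```

Counting the users whose held-out row sums to zero before averaging shows whether this mechanism applies to a given dataset.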

Adding user features

I have item purchase data. How do I add user features such as age and gender? Currently I see that the user features are a PCA of data randomly sampled from a Normal distribution. Does it make sense to assign user features randomly?

Regarding Evaluation

I have a very sparse dataset, where the average number of ratings per user is 3. Under strong generalization we further split the val data, which reduces the ratings per user in the val data even more. Does it still make sense to calculate recall@k and ndcg@k for k > 1? Should I change the evaluation method to weak generalization?
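
To make the concern concrete, the sketch below computes recall@k for a user with a single held-out item. It assumes the common normalization by min(k, number of held-out items), which may differ from the repository's metric. Under that assumption, the per-user value can only be 0 or 1, so recall@k behaves like a hit rate and only its average over users is informative.

```python
import numpy as np


def recall_at_k(scores: np.ndarray, heldout: np.ndarray, k: int) -> float:
    """Recall@k for one user, normalized by min(k, number of held-out items)."""
    n_heldout = heldout.sum()
    if n_heldout == 0:
        raise ValueError("user has no held-out items")
    top = np.argsort(-scores)[:k]
    return float(heldout[top].sum() / min(k, n_heldout))


scores = np.array([0.9, 0.1, 0.5, 0.3])
heldout = np.array([0.0, 0.0, 1.0, 0.0])  # a single held-out item (index 2)

r1 = recall_at_k(scores, heldout, k=1)  # top-1 is item 0, a miss
r2 = recall_at_k(scores, heldout, k=2)  # top-2 is items 0 and 2, a hit
print(r1, r2)
```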
