
causalmarl's Introduction

A MULTI-AGENT REINFORCEMENT LEARNING FRAMEWORK FOR OFF-POLICY EVALUATION IN TWO-SIDED MARKETS

This repository contains the Python implementation for the paper "A MULTI-AGENT REINFORCEMENT LEARNING FRAMEWORK FOR OFF-POLICY EVALUATION IN TWO-SIDED MARKETS" (AOAS, 2022+).

Summary of the paper

Two-sided markets such as ride-sharing companies often involve a group of subjects who make sequential decisions across time and/or location. With the rapid development of smartphones and the Internet of Things, these companies have substantially transformed the transportation landscape. In this paper, we consider large-scale fleet management in ride-sharing companies that involves multiple units in different areas receiving sequences of products (or treatments) over time. Major technical challenges, such as policy evaluation, arise in those studies because (i) spatial and temporal proximities induce interference between locations and times; and (ii) the large number of locations results in the curse of dimensionality. To address both challenges simultaneously, we introduce a multi-agent reinforcement learning (MARL) framework for carrying out policy evaluation in these studies. We propose novel estimators for the mean outcomes under different products that are consistent despite the high dimensionality of the state-action space. The proposed estimator performs favorably in simulation experiments. We further illustrate our method using a real dataset obtained from a two-sided marketplace company to evaluate the effects of applying different subsidizing policies.


Figure: Causal diagram for a multi-agent system with two agents. $(S_{j,t}, A_{j,t}, R_{j,t})$ represents the state-treatment-outcome triplet of the $j$th agent at time $t$.

Requirements

  • Python version: Python 3.6.8 :: Anaconda custom (64-bit)
  • Main packages for the proposed estimator
    • numpy
    • scipy
    • sklearn
    • tensorflow == 1.13.2
    • itertools
  • Additional packages for experiments
    • pickle
    • multiprocessing
    • os
    • time
    • sys
    • logging
    • warnings
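Before running anything, it can help to confirm that the third-party packages listed above are importable in the active environment. The snippet below is a minimal sketch, not part of the repository: the helper name check_packages is hypothetical, and it only checks that each module can be found, not that the installed version matches (for example tensorflow == 1.13.2).

```python
import importlib.util

# Third-party import names taken from the list above; the remaining
# entries (itertools, pickle, os, ...) ship with the standard library.
THIRD_PARTY = ["numpy", "scipy", "sklearn", "tensorflow"]


def check_packages(names=THIRD_PARTY):
    """Return a dict mapping each top-level module name to True if it can be found.

    Hypothetical helper: it locates modules without importing them,
    so it does not verify installed versions.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Any package reported as MISSING needs to be installed before running main.py or simu.py.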

File Overview

Our estimator

  • main.py: main function for the proposed estimator and its components
  • weight.py: neural network for the weight estimation part
  • utils.py: helper functions for main.py and simulation parts

Synthetic data

  • simu.py: experiment script for reproducing the results in the paper
  • simu_funs.py: main functions for the simulation experiment
  • simu_DGP.py: data generating functions for the simulation experiment

Real data-based simulation

  • Real_Simu/policy_defination.py: definitions of target policies
  • Real_Simu/est_with_pseudo_data.ipynb: notebook for obtaining the estimated policy values with the pseudo-dataset
  • Real_Simu/real_data_input.py: functions for preparing input data
  • Real_Simu/pseudo_data: a pickle file containing one simulated dataset

Testing the Markov property

Files _core_test_fun.py, _QRF.py, and _uti_basic.py are all from the package TestMDP.

How to reproduce simulation results

To reproduce the results presented in the Simulation section:

  1. Change the working directory to the main folder.
  2. Run python simu.py, which generates two files, final_sd.txt and final_T.txt.
  3. Load them with pickle.load to obtain the results on which Figure 4 is based.
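The loading step can be sketched as follows. This is a minimal illustration, assuming each output file holds a single pickled object. The helper names load_results and load_simulation_outputs are hypothetical, and the structure of the loaded objects is not documented here, so inspect them before plotting.

```python
import pickle
from pathlib import Path


def load_results(path):
    """Load one pickled object from `path` (hypothetical helper)."""
    # Pickle files must be opened in binary mode, despite the .txt extension.
    with open(path, "rb") as f:
        return pickle.load(f)


def load_simulation_outputs(folder="."):
    """Load both files written by simu.py from `folder`.

    Assumes each file contains exactly one pickled object.
    """
    folder = Path(folder)
    names = ("final_sd.txt", "final_T.txt")
    return {name: load_results(folder / name) for name in names}


if __name__ == "__main__":
    outputs = load_simulation_outputs(".")
    for name, obj in outputs.items():
        print(name, type(obj))
```

As with any pickle file, only load outputs that you generated yourself or obtained from a trusted source.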

How to test the Markov property

We use the testing procedure developed in the package TestMDP. Essentially, we collect the simulated trajectories, organize them into the expected input formats, and call the test function. See test.ipynb for details.
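The "organize" step can be sketched as below. This is an illustration only: the target layout (one [states, actions] pair per trajectory) is an assumption about what the test function expects, and the helper name to_trajectory_list is hypothetical. The authoritative format is the one used in test.ipynb.

```python
import numpy as np


def to_trajectory_list(states, actions):
    """Split stacked arrays into a list of per-trajectory [states, actions] pairs.

    states:  array of shape (N, T, d_s)
    actions: array of shape (N, T) or (N, T, d_a)
    The output layout is an assumed input format, not a confirmed one.
    """
    states = np.asarray(states, dtype=float)
    actions = np.asarray(actions)
    if actions.ndim == 2:
        # Give scalar actions an explicit trailing dimension: (N, T, 1).
        actions = actions[..., None]
    if states.ndim != 3 or actions.ndim != 3:
        raise ValueError("expected states (N, T, d_s) and actions (N, T[, d_a])")
    if states.shape[:2] != actions.shape[:2]:
        raise ValueError("states and actions disagree on N or T")
    return [[states[i], actions[i]] for i in range(states.shape[0])]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = rng.normal(size=(5, 20, 3))        # 5 trajectories, 20 steps, 3 state features
    A = rng.integers(0, 2, size=(5, 20))   # binary treatments
    data = to_trajectory_list(S, A)
    print(len(data), data[0][0].shape, data[0][1].shape)
```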

How to reproduce the real data-based simulation

The real data used in the paper is from a ride-sharing company. Due to data confidentiality, we are not allowed to release some sensitive data, such as the Gross Merchandise Volume and the implemented subsidizing policies. Therefore, the results cannot be exactly reproduced.

That said, we have tried our best to provide a similar simulation environment. Specifically, we provide a dataset generated by the simulator, which can be used with the estimators, together with the true policy values. This dataset corresponds to the passenger-side subsidizing policies.

To reproduce the estimation results corresponding to the generated pseudo dataset, run Real_based_Simulation/est_with_pseudo_data.ipynb.
