
Taxi simulation

This repository contains an agent-based simulation of a taxi-passenger city system.

Current algorithm:

  • Taxis are moving on a square grid.
  • There is a fixed number of taxis in the system. Their velocity is 1 grid unit per timestep.
  • A fixed number of requests is generated at each timestep. The request origin and destination are both generated according to predefined distributions.
  • Currently, four matching algorithms are implemented (nearest, poorest, random_limited, random_unlimited).
  • If a match is made, a taxi begins to move towards the request origin, where it picks up the passenger. Then the taxi moves towards the destination, where it drops off its passenger. After that, the taxi heads towards the base, but while it is moving back, it will be available again. The request will be marked as fulfilled.
  • After a certain waiting threshold, requests are dropped.
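The lifecycle above can be condensed into a rough per-timestep sketch. The names and data layout here are hypothetical; the real logic lives in the Simulation class of city_model.py.

```python
import random

def manhattan(a, b):
    # Manhattan distance between two grid cells
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def step_time(taxis, pending, t, request_rate=2, max_wait=30, grid=40):
    # 1. A fixed number of new requests appears each timestep.
    for _ in range(request_rate):
        origin = (random.randrange(grid), random.randrange(grid))
        dest = (random.randrange(grid), random.randrange(grid))
        pending.append({"origin": origin, "dest": dest, "created": t})
    # 2. Requests that waited past the threshold are dropped.
    pending[:] = [r for r in pending if t - r["created"] <= max_wait]
    # 3. Match each pending request to the nearest available taxi.
    for r in list(pending):
        free = [x for x in taxis if x["available"]]
        if not free:
            break
        taxi = min(free, key=lambda x: manhattan(x["pos"], r["origin"]))
        taxi["available"] = False   # busy until drop-off
        taxi["target"] = r["origin"]
        pending.remove(r)
```

The real implementation also moves the taxis along their paths and handles pickups and drop-offs; this sketch only covers request generation, expiry, and nearest matching.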

Configuration

The basic parameters for each simulation are stored in the *.conf files of the ./configs folder. These files have a JSON structure. Here is a sample config file with comments explaining the parameters (the # comments are for illustration only; standard JSON has no comment syntax, so the actual files omit them):

{
  "n": 40, # grid width
  "m": 40, # grid height
  "price_fixed": 450, # price per trip
  "base_conf": [ # where is the taxi base
    20,
    20
  ],
  "price_per_dist": 140, # price per distance unit
  "hard_limit": 10, # max distance to be paired
  "cost_per_unit": 13, # fuel price
  "log": false, # verbose logging for debugging purposes
  "show_map_labels": false,  # taxi and request id labels on map
  "show_pending": false, # pending requests on map
  "show_plot": false, # show interactive map for debugging
  "max_request_waiting_time": 30, # max request waiting time for assignment
  "max_time": 300, # total time to run the simulation
  "batch_size": 100, # batch time after which there is a result dump
  "num_taxis": 100, # number of taxis in the system
  "request_origin_distributions": [ # origin distribution, might be encoded as arbitrary density function to be rotated around the center, see docstring in code
    {
      "location": [ # 2D Gaussian mean
        15,
        15
      ],
      "strength": 5, # relative strength
      "sigma": 5 # 2D Gaussian sigma
    },
    {
      "location": [
        25,
        25
      ],
      "strength": 5,
      "sigma": 5
    }
  ],
  "avg_request_lengths": 12.5, # this is added after having generated requests according to the origin and destination distributions
  "request_rate": 10, # number of requests per time unit
  "matching": "levelling2_random_user_nearest_poorest_taxi_w_waiting_limit", # matching algorithm
  "R": 1, # request-to-taxi ratio
  "d": 200, # taxi density
  "request_destination_distributions": [ # similar to origin distributions
    {
      "location": [
        20,
        20
      ],
      "strength": 5,
      "sigma": 10
    },
    {
      "location": [
        35,
        35
      ],
      "strength": 2,
      "sigma": 2
    }
  ]
}
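A config like the one above loads directly with the standard json module. A minimal sketch using a stripped-down fragment of the parameters above:

```python
import json

# A stripped-down config fragment (valid JSON, no comments).
raw = '{"n": 40, "m": 40, "num_taxis": 100, "request_rate": 10, "max_time": 300}'
config = json.loads(raw)

# Derived quantities, e.g. the number of grid cells and taxis per cell.
grid_cells = config["n"] * config["m"]            # 40 * 40 = 1600
taxis_per_cell = config["num_taxis"] / grid_cells
```

In the repository the files are read with json.load from the ./configs folder instead of from a string.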

For a batch run of the same system with different parameter sets, there is always a base config file (e.g. configs/0606_base.conf), from which a series of config files is generated using the generate_configs.py script. Usage:

python generate_configs.py 0711_base.conf generate_mode

Batch run

The file run.py initiates one run from a config file and writes the results of the simulation to gzipped csv and json files. Usage:

python run.py 0525_1_priced

where 0525_1_priced.conf is a file name from the configs directory. The results are saved into the results directory under a filename with the same prefix: a csv.gz for the aggregated metrics and two json.gz files (one for the per-taxi and one for the per-request metrics).

For batch running of several configurations see the manual of ./batch_run.sh. On servers with a SLURM system, I used ./batch_slurm.sh and ./batch_slurm_big.sh to submit jobs to the processing queue.

Note: the scripts have to be marked as executable (e.g. with chmod +x), or you can use the bash batch_run.sh ... syntax instead.

Debugging with interactive visualization

For debugging (and fun), the map of the simulation can be displayed, with the taxis and requests color-coded and moving. This must be run from a Jupyter Notebook.

Import the classes, initialize interactive plotting, and import the timestep button:

%matplotlib notebook
from city_model import *
from ipywidgets import Button
import json

Configure the simulation from a JSON file, then display the map:

config=json.load(open('configs/simple.conf'))
s = Simulation(**config) # create a Simulation instance
b = Button(description = "Step_time") # create clickable time tick button
b.on_click(s.step_time) # assign time ticker function to button callback
b

We can print the state of taxis or requests with the usual print command using their IDs. For example:

print(s.requests[3])
print(s.taxis[1])

Contributors

ancsaaa3, bokae, sapiezynski
Issues

Error list

  • A[x,y] elements have to be updated when empty taxis are moving back to the base OK
  • Some taxis are assigned a request, but their path is empty. OK

Some higher-level questions

  • How does distance from the reference location affect the item rankings?
  • It might be interesting to use the distance from the request origin as a penalty factor in driver selection.

TODO list

  • literature reading, group related work into categories

  • running simulations with bigger city (15 km x 15 km), multiple times for smoother curves

  • investigating the role of initial conditions (driver home locations) -> new figure

  • calculating new unfairness indices (20-20 ratio, Gini, Atkinson together)

  • recreating old figures with the new simulations

  • proofreading the draft

  • analyzing NYC data for realistic inputs?

  • requesting Uber data for London

Notes

  • implement the passage of time a bit differently
  • base station in the center - idle taxis head back there
  • distribution of requests around the center is a 2D Gaussian - vary its width ("attraction")
  • the number of taxis and the number of new requests should be constant in time
  • matching algorithms - specification + a tolerance threshold, e.g. the taxi has to be within some radius
    • baseline: a random taxi goes for our passenger
    • optimize only for the passenger - they should wait as little as possible
    • match based on distance, wait if the taxi is too far away
    • equalizing - based on the current metrics
  • metrics
    • taxi waiting time
    • taxi empty cruising
    • taxi useful driving
    • passenger waiting time
    • when there is no taxi at all
      • mean and distribution for each (individual, group)

About the taxi project

It's a great project, but I ran into some confusion when using the code; it doesn't work properly for me.
Is there any relevant paper I can read?

The role of initial conditions

For example: do we think the inequalities are caused by the initial state, or simply by the system not distributing the requests equally? Since the drivers have no say in where they go (and they behave the same way in our simulation), their starting point and the algorithm that decides afterwards completely determine how much they earn.
We should also separate which effect causes the differences, because the remedies differ. If it is the initial state, we should randomize it; if it is the assignment procedure, then that is what we should change. So we should figure out why the rich are rich in the real system.

It would also be interesting to work out the mixing time of the initial state, or whatever that is called. Its effect may well disappear quickly, but it may also persist roughly until the driver goes offline, and even then they will of course pop up at the same place again the next day and probably receive similar matches.

Obviously lots of things in a real city repeat more or less, e.g. the passenger flows during the day and the drivers' starting locations, but maybe even a little randomization is enough for a stronger equalizing effect, no?

The whole thing is starting to resemble Markov chains...

Poor drivers

  • average length until pickup
  • why do some drivers earn less money than others (fewer requests, longer pickup distances, length of the assignments)
  • is it always the same drivers who become poor: how does this depend on the initial conditions?
  • why is the simplification we are making a useful one?

Assign monetary value to idling and cruising

There is a monetary loss both in sitting idle (waiting time) and in driving somewhere without a customer (empty time).
Make these costs configurable and incorporate them into the total money made.
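A sketch of what the extended bookkeeping could look like. Here cost_per_unit is the existing fuel-price parameter from the sample config, while cost_per_idle_tick is a hypothetical new config entry:

```python
def net_income(fare_total, dist_empty, dist_with_passenger, time_idle,
               cost_per_unit=13, cost_per_idle_tick=1):
    # Total fares, minus fuel for all distance driven (empty cruising
    # included), minus the hypothetical per-timestep cost of sitting idle.
    return (fare_total
            - cost_per_unit * (dist_empty + dist_with_passenger)
            - cost_per_idle_tick * time_idle)
```

For example, a driver with 1000 in fares, 10 units of empty cruising, 20 units of passenger trips, and 5 idle ticks nets 1000 - 13*30 - 1*5 = 605.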

Matching algorithms

There are four classes in the Python module:

  • City() - stores the geometry of the simulation,
  • Taxi() - a taxi object,
  • Request() - a request object,
  • Simulation() - ties the simulation components together.

The most important class variables for the matching algorithm are the following:

  • Simulation.taxis: dict with int keys taxi_id, contains all used Taxi() instances.
  • Simulation.requests: dict with int keys request_id, contains all used Request() instances.

You have to update these dictionaries when you make a match, as in the example Simulation.assign_request.

The most important methods for the matching algorithm are the following:

  • Simulation.find_nearest_available_taxis(): see the description in its docstring. Returns a list of taxi_ids according to the chosen method (all taxis within a given radius, or the nearest taxis only). If we want to search for taxis in a given ring, we should subtract the result for the smaller radius from the result for the bigger radius.
  • Simulation.assign_request(self, request_id): the function that does the actual matching and returns the assigned taxi_id. A first example chooses one taxi randomly from the closest ones, and rejects the request if it is further away than a predefined threshold.
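The ring search boils down to a set difference. A sketch, where find_within is a stand-in for calling Simulation.find_nearest_available_taxis() with a given radius:

```python
def taxis_in_ring(find_within, r_small, r_big):
    # Taxis at distance d with r_small < d <= r_big: everything within
    # the big radius minus everything within the small one.
    return set(find_within(r_big)) - set(find_within(r_small))
```

For example, with a fake find_within that returns taxi_ids 0..r-1, taxis_in_ring(f, 2, 5) yields {2, 3, 4}.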

Bugs, mistakes, to develop

  • Folders in the configs and results folders to organize input and output files.
  • num_trips_completed always shows 0.

Speeding up the simulation

It is still quite slow for bigger (e.g. more realistic) system sizes. Ideas for speeding it up:

Done:

  • I've already changed the state-storing lists either to deques (where we only ever need the two ends of the list) or to sets (where we only need containment testing, adding, and removing elements).
  • https://github.com/robtandy/randomdict There is an error in this repo, so install the fixed fork instead: pip install git+https://github.com/NicoDeGiacomo/randomdict. On the benchmark config, it only sped up the simulation by about 1 s/batch...
  • Converting for loops to list comprehensions or maps where possible, and making as many variables local inside for loops as we can, since global variable access is slower. What is the case when calling methods of the same class?
  • I profiled the code with python -m cProfile run.py 0608_benchmark, and it turned out that generating requests eats up almost half of the time! There should be a better way to do it. The poorest algorithm is not much slower than the random one.
  • I now generate the random numbers up front into a deque, map and filter them into grid integers, then draw request origin and destination points from this deque, extending it when it runs out of points. This has been really efficient!
  • Calculating neighbors on the grid in advance, then storing them in a dictionary or an array for really quick access.
  • Is a huge dict the fastest way of storing all the objects? Or maybe a big array, where the indices are the request_ids and taxi_ids? If the number of objects changes, how do we allocate the space dynamically? Or do we allocate it all in advance? Maybe that would eat up too much memory... Or maybe some kind of tree structure for O(log n) access? Answer: a dict gives O(1) access because it hashes the keys, so it is definitely fast.
    • Is there a computer somewhere where we have more than 5 threads? The right sbatch commands on jimgray do the trick:

 sbatch -c 6 --mem=1000 run.sh
# -c is the number of threads; it has to divide 24 (e.g. 2, 4, 6 or 12, depending on memory consumption)
# --mem is the memory allocation in MB; it is compulsory, otherwise all memory is allocated to one job and there will be no multiple threads
  • I could not find a good data structure (e.g. KDTree, RTree) in Python for storing City.A in a way that is very fast to access but also very fast for insertion and/or deletion. I've given up on this track for now; maybe we'll revisit it if we replace the grid with a real map in a later round. GeoPandas has to be reindexed every time it gets modified, so it is not a good fit here.
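The neighbor precomputation from the list above can be sketched like this, assuming a von Neumann neighborhood on the n x m grid (the actual code may differ):

```python
def precompute_neighbors(n, m):
    # Map every grid cell to its in-bounds von Neumann neighbors once,
    # so the hot loop does a single dict lookup instead of recomputing.
    return {
        (x, y): [
            (x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < n and 0 <= y + dy < m
        ]
        for x in range(n)
        for y in range(m)
    }
```

The table costs O(n*m) memory, which is negligible next to the per-timestep savings for grids of the sizes used here.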

Possibilities:

  • Maybe Python itself is the obstacle - is there any way to compile the code? (How much work would it take to rewrite it in C++? Would it be worth the time?) A .pyc file is compiled from the city_model.py module; the question is whether there is a compiler that can optimize more aggressively.

  • Could we run it on GPU?

  • Estimate the runtime based on the parameters, find an optimum schedule, and submit jobs in that order. This only requires having a look at the slurm files.

  • numba

  • Putting taxis into a pandas DataFrame, then moving them by .apply().

Minimize std of total money earned

Instead of assigning drivers to riders based solely on distance, give preference to drivers with lower total earnings, to minimize the standard deviation of the money earned.
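A minimal sketch of this preference rule (hypothetical helper; the repo's levelling matchers implement the full version):

```python
def pick_poorest(candidate_ids, earnings):
    # Among the candidate taxis (e.g. all within the matching radius),
    # pick the one with the lowest total earnings so far; over many
    # matches this pushes the spread of incomes down.
    return min(candidate_ids, key=lambda tid: earnings[tid])
```

For example, with earnings {1: 100, 2: 50, 3: 75}, pick_poorest([1, 2, 3], earnings) returns 2.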
