
Taxi simulation

This repository contains an agent-based simulation of a taxi-passenger city system.

Current algorithm:

  • Taxis are moving on a square grid.
  • There is a fixed number of taxis in the system. Their velocity is 1 grid unit per timestep.
  • A fixed number of requests is generated at each timestep. The request origin and destination are both generated according to predefined distributions.
  • Currently, four matching algorithms are implemented (nearest, poorest, random_limited, random_unlimited).
  • If a match is made, a taxi begins to move towards the request origin, where it picks up the passenger. Then the taxi moves towards the destination, where it drops off its passenger. After that, the taxi heads towards the base, but while it is moving back, it will be available again. The request will be marked as fulfilled.
  • After a certain waiting threshold, requests are dropped.
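The lifecycle above can be condensed into a rough per-timestep sketch. The names and data layout here are hypothetical; the real logic lives in the Simulation class of city_model.py.

```python
import random

def manhattan(a, b):
    # Manhattan distance between two grid cells
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def step_time(taxis, pending, t, request_rate=2, max_wait=30, grid=40):
    # 1. A fixed number of new requests appears each timestep.
    for _ in range(request_rate):
        origin = (random.randrange(grid), random.randrange(grid))
        dest = (random.randrange(grid), random.randrange(grid))
        pending.append({"origin": origin, "dest": dest, "created": t})
    # 2. Requests that waited past the threshold are dropped.
    pending[:] = [r for r in pending if t - r["created"] <= max_wait]
    # 3. Match each pending request to the nearest available taxi.
    for r in list(pending):
        free = [x for x in taxis if x["available"]]
        if not free:
            break
        taxi = min(free, key=lambda x: manhattan(x["pos"], r["origin"]))
        taxi["available"] = False   # busy until drop-off
        taxi["target"] = r["origin"]
        pending.remove(r)
```

The real implementation also moves the taxis along their paths and handles pickups and drop-offs; this sketch only covers request generation, expiry, and nearest matching.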

Configuration

The basic parameters for each simulation are stored in the *.conf files of the ./configs folder. These files have a JSON structure. Here is a sample config file with comments explaining the parameters (the # comments are for illustration only; standard JSON has no comment syntax, so the actual files omit them):

{
  "n": 40, # grid width
  "m": 40, # grid height
  "price_fixed": 450, # price per trip
  "base_conf": [ # where is the taxi base
    20,
    20
  ],
  "price_per_dist": 140, # price per distance unit
  "hard_limit": 10, # max distance to be paired
  "cost_per_unit": 13, # fuel price
  "log": false, # verbose logging for debugging purposes
  "show_map_labels": false,  # taxi and request id labels on map
  "show_pending": false, # pending requests on map
  "show_plot": false, # show interactive map for debugging
  "max_request_waiting_time": 30, # max request waiting time for assignment
  "max_time": 300, # total time to run the simulation
  "batch_size": 100, # batch time after which there is a result dump
  "num_taxis": 100, # number of taxis in the system
  "request_origin_distributions": [ # origin distribution, might be encoded as arbitrary density function to be rotated around the center, see docstring in code
    {
      "location": [ # 2D Gaussian mean
        15,
        15
      ],
      "strength": 5, # relative strength
      "sigma": 5 # 2D Gaussian sigma
    },
    {
      "location": [
        25,
        25
      ],
      "strength": 5,
      "sigma": 5
    }
  ],
  "avg_request_lengths": 12.5, # this is added after having generated requests according to the origin and destination distributions
  "request_rate": 10, # number of requests per time unit
  "matching": "levelling2_random_user_nearest_poorest_taxi_w_waiting_limit", # matching algorithm
  "R": 1, # request-to-taxi ratio
  "d": 200, # taxi density
  "request_destination_distributions": [ # similar to origin distributions
    {
      "location": [
        20,
        20
      ],
      "strength": 5,
      "sigma": 10
    },
    {
      "location": [
        35,
        35
      ],
      "strength": 2,
      "sigma": 2
    }
  ]
}
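A config like the one above loads directly with the standard json module. A minimal sketch using a stripped-down fragment of the parameters above:

```python
import json

# A stripped-down config fragment (valid JSON, no comments).
raw = '{"n": 40, "m": 40, "num_taxis": 100, "request_rate": 10, "max_time": 300}'
config = json.loads(raw)

# Derived quantities, e.g. the number of grid cells and taxis per cell.
grid_cells = config["n"] * config["m"]            # 40 * 40 = 1600
taxis_per_cell = config["num_taxis"] / grid_cells
```

In the repository the files are read with json.load from the ./configs folder instead of from a string.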

For a batch run of the same system with different parameter sets, there is always a base config file (e.g. configs/0606_base.conf), from which a series of config files is generated using the generate_configs.py script. Usage:

python generate_configs.py 0711_base.conf generate_mode

Batch run

The file run.py initiates one run from a config file and writes the results of the simulation to gzipped csv and json files. Usage:

python run.py 0525_1_priced

where 0525_1_priced.conf is a file name from the configs directory. The results are saved into the results directory under a filename with the same prefix: a csv.gz for the aggregated metrics and two json.gz files (one for the per-taxi and one for the per-request metrics).

For batch running of several configurations see the manual of ./batch_run.sh. On servers with a SLURM system, I used ./batch_slurm.sh and ./batch_slurm_big.sh to submit jobs to the processing queue.

Note: the scripts have to be marked as executable (e.g. with chmod +x), or you can use the bash batch_run.sh ... syntax instead.

Debugging with interactive visualization

For debugging (and fun), the map of the simulation can be displayed, with the taxis and requests color-coded and moving. This must be run from a Jupyter Notebook.

Import the classes, initialize interactive plotting, and import the timestep button:

%matplotlib notebook
from city_model import *
from ipywidgets import Button
import json

Configure the simulation from a JSON file, then display the map:

config=json.load(open('configs/simple.conf'))
s = Simulation(**config) # create a Simulation instance
b = Button(description = "Step_time") # create clickable time tick button
b.on_click(s.step_time) # assign time ticker function to button callback
b

We can print the state of taxis or requests with the usual print command using their IDs. For example:

print(s.requests[3])
print(s.taxis[1])

Contributors

ancsaaa3, bokae, sapiezynski
Issues

Error list

  • A[x,y] elements have to be updated when empty taxis are moving back to the base OK
  • Some taxis are assigned a request, but their path is empty. OK

Some higher-level questions

  • How does distance from the reference location affect the item rankings?
  • It might be interesting to use the distance from the request origin as a penalty factor in driver selection.

TODO list

  • literature reading, group related work into categories

  • running simulations with bigger city (15 km x 15 km), multiple times for smoother curves

  • investigating the role of initial conditions (driver home locations) -> new figure

  • calculating new unfairness indices (20-20 ratio, Gini, Atkinson together)

  • recreating old figures with the new simulations

  • proofreading the draft

  • analyzing NYC data for realistic inputs?

  • requesting Uber data for London

Notes

  • implement the passage of time a bit differently
  • base station in the center - idle taxis head back there
  • distribution of requests around the center is a 2D Gaussian - vary its width ("attraction")
  • the number of taxis and the number of new requests should be constant in time
  • matching algorithms - specification + a tolerance threshold, e.g. the taxi has to be within some radius
    • baseline: a random taxi goes for our passenger
    • optimize only for the passenger - they should wait as little as possible
    • match based on distance, wait if the taxi is too far away
    • equalizing - based on the current metrics
  • metrics
    • taxi waiting time
    • taxi empty cruising
    • taxi useful driving
    • passenger waiting time
    • when there is no taxi at all
      • mean and distribution for each (individual, group)

About the taxi project

It's a great project, but I ran into some confusion when using the code; it doesn't work properly for me.
Is there any relevant paper I can read?

The role of initial conditions

For example: do we think the inequalities are caused by the initial state, or simply by the system not distributing the requests equally? Since the drivers have no say in where they go (and they behave the same way in our simulation), their starting point and the algorithm that decides afterwards completely determine how much they earn.
We should also separate which effect causes the differences, because the remedies differ. If it is the initial state, we should randomize it; if it is the assignment procedure, then that is what we should change. So we should figure out why the rich are rich in the real system.

It would also be interesting to work out the mixing time of the initial state, or whatever that is called. Its effect may well disappear quickly, but it may also persist roughly until the driver goes offline, and even then they will of course pop up at the same place again the next day and probably receive similar matches.

Obviously lots of things in a real city repeat more or less, e.g. the passenger flows during the day and the drivers' starting locations, but maybe even a little randomization is enough for a stronger equalizing effect, no?

The whole thing is starting to resemble Markov chains...

Poor drivers

  • average length until pickup
  • why do some drivers earn less money than others (fewer requests, longer pickup distances, length of the assignments)
  • is it always the same drivers who become poor: how does this depend on the initial conditions?
  • why is the simplification we are making a useful one?

Assign monetary value to idling and cruising

There is a monetary loss both in sitting idle (waiting time) and in driving somewhere without a customer (empty time).
Make these costs configurable and incorporate them into the total money made.
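A sketch of what the extended bookkeeping could look like. Here cost_per_unit is the existing fuel-price parameter from the sample config, while cost_per_idle_tick is a hypothetical new config entry:

```python
def net_income(fare_total, dist_empty, dist_with_passenger, time_idle,
               cost_per_unit=13, cost_per_idle_tick=1):
    # Total fares, minus fuel for all distance driven (empty cruising
    # included), minus the hypothetical per-timestep cost of sitting idle.
    return (fare_total
            - cost_per_unit * (dist_empty + dist_with_passenger)
            - cost_per_idle_tick * time_idle)
```

For example, a driver with 1000 in fares, 10 units of empty cruising, 20 units of passenger trips, and 5 idle ticks nets 1000 - 13*30 - 1*5 = 605.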

Matching algorithms

There are four classes in the Python module:

  • City() - stores the geometry of the simulation,
  • Taxi() - a taxi object,
  • Request() - a request object,
  • Simulation() - ties the simulation components together.

The most important class variables for the matching algorithm are the following:

  • Simulation.taxis: dict with int keys taxi_id, contains all used Taxi() instances.
  • Simulation.requests: dict with int keys request_id, contains all used Request() instances.

You have to update these dictionaries when you make a match, as in the example Simulation.assign_request.

The most important methods for the matching algorithm are the following:

  • Simulation.find_nearest_available_taxis(): see the description in its docstring. Returns a list of taxi_ids according to the chosen method (all taxis within a given radius, or the nearest taxis only). If we want to search for taxis in a given ring, we should subtract the result for the smaller radius from the result for the bigger radius.
  • Simulation.assign_request(self, request_id): the function that does the actual matching and returns the assigned taxi_id. A first example chooses one taxi randomly from the closest ones, and rejects the request if it is further away than a predefined threshold.
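The ring search boils down to a set difference. A sketch, where find_within is a stand-in for calling Simulation.find_nearest_available_taxis() with a given radius:

```python
def taxis_in_ring(find_within, r_small, r_big):
    # Taxis at distance d with r_small < d <= r_big: everything within
    # the big radius minus everything within the small one.
    return set(find_within(r_big)) - set(find_within(r_small))
```

For example, with a fake find_within that returns taxi_ids 0..r-1, taxis_in_ring(f, 2, 5) yields {2, 3, 4}.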

Bugs, mistakes, to develop

  • Folders in the configs and results folders to organize input and output files.
  • num_trips_completed always shows 0.

Speeding up the simulation

It is still quite slow for bigger (e.g. more realistic) system sizes. Ideas for speeding it up:

Done:

  • I've already changed the state-storing lists either to deques (where we only ever need the two ends of the list) or to sets (where we only need containment testing, adding, and removing elements).
  • https://github.com/robtandy/randomdict There is an error in this repo, so install the fixed fork instead: pip install git+https://github.com/NicoDeGiacomo/randomdict. On the benchmark config, it only sped up the simulation by about 1 s/batch...
  • Converting for loops to list comprehensions or maps where possible, and making as many variables local inside for loops as we can, since global variable access is slower. What is the case when calling methods of the same class?
  • I profiled the code with python -m cProfile run.py 0608_benchmark, and it turned out that generating requests eats up almost half of the time! There should be a better way to do it. The poorest algorithm is not much slower than the random one.
  • I now generate the random numbers up front into a deque, map and filter them into grid integers, then draw request origin and destination points from this deque, extending it when it runs out of points. This has been really efficient!
  • Calculating neighbors on the grid in advance, then storing them in a dictionary or an array for really quick access.
  • Is a huge dict the fastest way of storing all the objects? Or maybe a big array, where the indices are the request_ids and taxi_ids? If the number of objects changes, how do we allocate the space dynamically? Or do we allocate it all in advance? Maybe that would eat up too much memory... Or maybe some kind of tree structure for O(log n) access? Answer: a dict gives O(1) access because it hashes the keys, so it is definitely fast.
    • Is there a computer somewhere where we have more than 5 threads? The right sbatch commands on jimgray do the trick:

 sbatch -c 6 --mem=1000 run.sh
# -c is the number of threads; it has to divide 24 (e.g. 2, 4, 6 or 12, depending on memory consumption)
# --mem is the memory allocation in MB; it is compulsory, otherwise all memory is allocated to one job and there will be no multiple threads
  • I could not find a good data structure (e.g. KDTree, RTree) in Python for storing City.A in a way that is very fast to access but also very fast for insertion and/or deletion. I've given up on this track for now; maybe we'll revisit it if we replace the grid with a real map in a later round. GeoPandas has to be reindexed every time it gets modified, so it is not a good fit here.
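The neighbor precomputation from the list above can be sketched like this, assuming a von Neumann neighborhood on the n x m grid (the actual code may differ):

```python
def precompute_neighbors(n, m):
    # Map every grid cell to its in-bounds von Neumann neighbors once,
    # so the hot loop does a single dict lookup instead of recomputing.
    return {
        (x, y): [
            (x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < n and 0 <= y + dy < m
        ]
        for x in range(n)
        for y in range(m)
    }
```

The table costs O(n*m) memory, which is negligible next to the per-timestep savings for grids of the sizes used here.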

Possibilities:

  • Maybe Python itself is the obstacle - is there any way to compile the code? (How much work would it take to rewrite it in C++? Would it be worth the time?) A .pyc file is compiled from the city_model.py module; the question is whether there is a compiler that can optimize more aggressively.

  • Could we run it on GPU?

  • Estimate the runtime based on the parameters, find an optimum schedule, and submit jobs in that order. This only requires having a look at the slurm files.

  • numba

  • Putting taxis into a pandas DataFrame, then moving them by .apply().

Minimize std of total money earned

Instead of assigning drivers to riders based solely on distance, give preference to drivers with lower total earnings, to minimize the standard deviation of the money earned.
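A minimal sketch of this preference rule (hypothetical helper; the repo's levelling matchers implement the full version):

```python
def pick_poorest(candidate_ids, earnings):
    # Among the candidate taxis (e.g. all within the matching radius),
    # pick the one with the lowest total earnings so far; over many
    # matches this pushes the spread of incomes down.
    return min(candidate_ids, key=lambda tid: earnings[tid])
```

For example, with earnings {1: 100, 2: 50, 3: 75}, pick_poorest([1, 2, 3], earnings) returns 2.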
