bokae / taxi

Simulating taxi-request matchings on a grid.

Python 1.67% Shell 0.02% Jupyter Notebook 98.31%

simulation agent-based taxi

taxi's Issues

TODO list

literature reading, group related work into categories
running simulations with a bigger city (15 km x 15 km), multiple times for smoother curves
investigating the role of initial conditions (driver home locations) -> new figure
calculating new unfairness indices (20-20 ratio, Gini, and Atkinson, all together)
recreating old figures with the new simulations
proofreading the draft
analyzing NYC data for realistic inputs?
requesting Uber data for London
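The three unfairness indices named in the TODO list can be computed from per-driver earnings roughly as follows. This is a minimal sketch based on the standard textbook definitions. It assumes the 20-20 ratio means the total income of the top 20% divided by that of the bottom 20%. The Atkinson inequality-aversion parameter `epsilon` is a free choice and is not taken from the project.

```python
import math


def gini(incomes):
    """Gini coefficient via the sorted-rank formula; 0 means perfect equality."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum_i(i * x_(i)) / (n * sum(x)) - (n + 1) / n, ranks i = 1..n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def ratio_20_20(incomes):
    """Total income of the top 20% divided by that of the bottom 20%."""
    xs = sorted(incomes)
    k = max(1, len(xs) // 5)
    bottom = sum(xs[:k])
    top = sum(xs[-k:])
    return math.inf if bottom == 0 else top / bottom


def atkinson(incomes, epsilon=0.5):
    """Atkinson index with inequality aversion epsilon > 0 (incomes must be positive)."""
    xs = list(incomes)
    n = len(xs)
    mean = sum(xs) / n
    if epsilon == 1:
        # epsilon = 1: the equally-distributed-equivalent income is the geometric mean
        ede = math.exp(sum(math.log(x) for x in xs) / n)
    else:
        ede = (sum(x ** (1 - epsilon) for x in xs) / n) ** (1 / (1 - epsilon))
    return 1 - ede / mean
```

All three functions take a plain list of per-driver earnings, for example the total income of each taxi at the end of a run.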

Poor drivers

average length until pickup
why do some drivers earn less money than others (fewer requests, longer pickup distances, length of the assigned trips)?
is it always the same drivers who become poor, i.e. how does this depend on the initial conditions?
why is what we are doing a useful simplification?

The role of initial conditions

For example: what do we think, is it the initial state that causes the inequalities, or simply the fact that the system does not distribute the requests equally? Since taxi drivers have no say in where they go (and in our simulation they all behave the same way), their starting point and the algorithm that decides afterwards completely determine how much they earn.
We should also separate which effect causes the differences, because the remedy is different. If the initial state causes them, then it should be randomized; if it is the assignment procedure, then that is what should be changed. So we should figure out why the rich drivers are rich in the real system.

It would also be interesting to find out the mixing time of the initial state, so to speak. Its effect may well disappear quickly, but it may also persist roughly until the driver goes offline, and of course even then the driver will show up at the same place the next day and probably get similar matchings.

Obviously, many things in a real city repeat more or less regularly, e.g. the passenger flows during the day and the taxis' starting locations. Still, perhaps a little randomization would already be enough to produce a stronger equalizing effect?

The whole thing is starting to remind me of Markov chains...
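The mixing-time question can be made concrete on a toy Markov chain. This sketch is only an illustration and is not a model of the simulation. Each state stands for a hypothetical zone a driver can be in. `lazy_cycle` builds a made-up transition matrix in which a driver stays put with probability `stay` and otherwise moves to a neighbouring zone on a ring. `mixing_time` counts how many steps it takes until the distribution started from every initial zone is within total variation distance `eps` of the stationary distribution. A "sticky" chain, where `stay` is close to 1, keeps the memory of its initial state much longer.

```python
def step(dist, P):
    """One step of the chain: new_j = sum_i dist_i * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def tv_distance(p, q):
    """Total variation distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def stationary(P, iters=2000):
    """Power iteration from the uniform distribution (assumes an ergodic chain)."""
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(iters):
        dist = step(dist, P)
    return dist


def mixing_time(P, eps=0.25, max_steps=10_000):
    """Smallest t such that every initial state is within eps (TV) of stationarity."""
    n = len(P)
    pi = stationary(P)
    # one distribution per starting state, each concentrated on that state
    dists = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    for t in range(max_steps + 1):
        if max(tv_distance(d, pi) for d in dists) <= eps:
            return t
        dists = [step(d, P) for d in dists]
    return None


def lazy_cycle(n, stay):
    """Hypothetical toy chain: stay with prob `stay`, else move to a ring neighbour."""
    move = (1 - stay) / 2
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][i] += stay
        P[i][(i - 1) % n] += move
        P[i][(i + 1) % n] += move
    return P
```

For example, `mixing_time(lazy_cycle(10, 0.5))` is much smaller than `mixing_time(lazy_cycle(10, 0.95))`.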

Speeding up the simulation

It is still quite slow for bigger (i.e. more realistic) sizes. Some ideas for speeding it up:

Done:

I've already changed the lists that store the state to either deques (where we only ever need the two ends of the list) or sets (where we only need membership testing, adding and removing elements).
https://github.com/robtandy/randomdict There is an error in this repo, so install it with pip install git+https://github.com/NicoDeGiacomo/randomdict, which contains the fix. On the benchmark config, it sped up the simulation by only 1 s/batch...
Converting for loops to list comprehensions or maps where possible, and using as many local variables in for loops as we can, since global variable access is slower. What is the case when using methods of the same class?
I ran the code with python -m cProfile run.py 0608_benchmark, and it turned out that generating requests takes up almost half of the time! There should be a better way to do it. The poorest algorithm is not much slower than the random one.
I now generate the random numbers in advance into a deque, and map and filter them into grid integers. Then I take the request origin and destination points from this deque, and extend the deque when it runs out of points. This has been really efficient!
Calculating the neighbors on the grid in advance, then storing them in a dictionary or an array for really quick access.
Is a huge dict the fastest way of storing all objects? Or maybe a big array, where the indices are the request_ids and taxi_ids? If the number of objects changes, how do we allocate the space dynamically? Or do we allocate everything in advance? Maybe that would eat up too much memory... Or maybe some kind of tree structure with O(log n) access? A dict gives O(1) access, because it hashes the keys, so it is definitely fast.
** Is there a computer somewhere with more than 5 threads? The correct jimgray commands do the trick!

sbatch -c 6 --mem=1000 run.sh
# -c is the number of threads (CPUs per task); it has to divide 24 (e.g. 2, 4, 6 or 12, depending on memory consumption)
# --mem is the memory allocation in MB; it is compulsory, otherwise all memory is allocated to one job and no other jobs can run in parallel
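Several tricks from the list above appear together in one small, self-contained sketch:

- a deque used as a FIFO queue;
- a set for the idle taxis;
- a precomputed neighbour table;
- a pre-generated buffer of random grid cells that is refilled in bulk;
- functions and bound methods cached in local variables inside hot loops.

The grid size `N`, the `RequestSource` class and all function names are hypothetical and are not taken from the repository. Only the patterns are the point. On the question about methods of the same class: as far as we know, `self.method` is looked up anew on every call, so the same local-alias trick applies to it.

```python
import random
from collections import deque

N = 20  # hypothetical grid side length; cells are (x, y) with 0 <= x, y < N

# Precompute the 4-neighbourhood of every cell once, so the hot loop only
# does a dict lookup instead of redoing the bounds checks every time.
NEIGHBORS = {
    (x, y): [(x + dx, y + dy)
             for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
             if 0 <= x + dx < N and 0 <= y + dy < N]
    for x in range(N) for y in range(N)
}


class RequestSource:
    """Hands out random grid cells from a pre-generated buffer, refilled in bulk."""

    def __init__(self, n, batch=10_000, seed=None):
        self.n = n
        self.batch = batch
        self.rng = random.Random(seed)
        self.buffer = deque()  # only popleft() and extend() are used

    def _refill(self):
        # One bulk call instead of one RNG call per request endpoint.
        self.buffer.extend(self.rng.choices(range(self.n * self.n), k=self.batch))

    def next_point(self):
        if not self.buffer:
            self._refill()
        return divmod(self.buffer.popleft(), self.n)  # flat index -> (x, y)


def generate_requests(source, k):
    """Return a FIFO deque of k (request_id, origin, destination) tuples."""
    pending = deque()
    next_point = source.next_point  # bound method cached in a local variable
    append = pending.append
    for request_id in range(k):
        append((request_id, next_point(), next_point()))
    return pending


def step_idle_taxis(positions, idle, rng):
    """Toy movement rule: every idle taxi steps to a random neighbouring cell."""
    neighbors = NEIGHBORS  # local alias: local lookups are faster than global ones
    choice = rng.choice
    for taxi_id in idle:  # `idle` is a set: O(1) add, remove and membership test
        positions[taxi_id] = choice(neighbors[positions[taxi_id]])
```

Whether each trick actually pays off on the real model should be checked with the profiler, as was done with cProfile above.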

I could not find a good data structure in Python (e.g. KDTree, RTree) for storing City.A in a way that allows very fast access as well as very fast insertion and/or deletion. I've given up on this track for now; maybe it becomes relevant if we change the grid to some kind of map in a later round... Geopandas has to be reindexed every time it is modified, so it is not suitable here.
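Since the city is a grid anyway, one alternative to a KD-tree or R-tree is a plain bucket index: a dict that maps each cell to the set of taxi ids currently in it. This technique is swapped in here as a suggestion; it is not something tried in the repo. Inserting, removing and moving a taxi are O(1) set operations, and a radius query only scans the cells inside the radius. The class and method names are hypothetical, Manhattan distance on the grid is an assumption, and `City.A` in the repo may be organized differently.

```python
from collections import defaultdict


class CellIndex:
    """Maps grid cells (x, y) to sets of taxi ids; O(1) insert/remove/move."""

    def __init__(self):
        self.cells = defaultdict(set)
        self.where = {}  # taxi_id -> current cell

    def insert(self, taxi_id, cell):
        self.cells[cell].add(taxi_id)
        self.where[taxi_id] = cell

    def remove(self, taxi_id):
        cell = self.where.pop(taxi_id)
        bucket = self.cells[cell]
        bucket.discard(taxi_id)
        if not bucket:
            del self.cells[cell]  # drop empty buckets to keep the dict small

    def move(self, taxi_id, new_cell):
        self.remove(taxi_id)
        self.insert(taxi_id, new_cell)

    def within(self, cell, r):
        """Taxi ids within Manhattan distance r of `cell` (scans O(r^2) cells)."""
        x0, y0 = cell
        found = set()
        for dx in range(-r, r + 1):
            rest = r - abs(dx)
            for dy in range(-rest, rest + 1):
                # .get() does not create empty entries in the defaultdict
                bucket = self.cells.get((x0 + dx, y0 + dy))
                if bucket:
                    found |= bucket
        return found
```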

Possibilities:

Maybe Python itself is an obstacle: is there any way to compile the code? (How long would it take to rewrite it in C++? Is it worth the time?) A .pyc file is compiled from the city_model.py module; the question is whether there is a compiler that can optimize more.
Could we run it on GPU?
Estimate the runtime based on the parameters, find an optimal schedule, and submit the jobs in that order. ** That just means having a look at the slurm files.
numba
Putting taxis into a pandas DataFrame, then moving them with .apply().
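The "estimate the runtime, then schedule" idea can be sketched with the classic longest-processing-time-first heuristic. Jobs are sorted by estimated runtime, largest first, and each one goes to the currently least loaded worker. The runtime model below (fleet size times number of batches) and the config fields `name`, `num_taxis` and `num_batches` are made-up placeholders. A real estimate would have to come from past slurm logs.

```python
import heapq


def estimate_runtime(cfg):
    # Hypothetical cost model: grows with fleet size and simulated batches.
    return cfg["num_taxis"] * cfg["num_batches"]


def lpt_schedule(configs, workers):
    """Longest-processing-time-first assignment of configs to workers.

    Returns (submission order, per-worker job names, estimated makespan)."""
    order = sorted(configs, key=estimate_runtime, reverse=True)
    heap = [(0, w) for w in range(workers)]  # (current load, worker id)
    assignment = [[] for _ in range(workers)]
    for cfg in order:
        load, w = heapq.heappop(heap)  # least loaded worker so far
        assignment[w].append(cfg["name"])
        heapq.heappush(heap, (load + estimate_runtime(cfg), w))
    makespan = max(load for load, _ in heap)
    return [c["name"] for c in order], assignment, makespan
```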

Data

Chicago taxi trips
https://www.kaggle.com/chicago/chicago-taxi-trips-bq

Assign a monetary value to idling and cruising

There is a monetary loss in sitting idle (waiting time) and in driving somewhere without a customer (empty time).
Make these costs configurable and incorporate them into the total money made.
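This is a minimal sketch of how configurable idle and empty-driving costs could be subtracted from a driver's income. Time is measured in simulation steps. The rates and field names (`fare_per_step`, `idle_cost_per_step`, `empty_cost_per_step`) are hypothetical config parameters, not existing options in the repo.

```python
from dataclasses import dataclass


@dataclass
class CostConfig:
    fare_per_step: float = 1.0        # income per time step with a passenger
    idle_cost_per_step: float = 0.1   # loss per step spent waiting (waiting time)
    empty_cost_per_step: float = 0.3  # loss per step driving without a customer (empty time)


def net_income(serving_time, waiting_time, empty_time, cfg=None):
    """Gross fare income minus the configurable idling and cruising costs."""
    if cfg is None:
        cfg = CostConfig()
    gross = cfg.fare_per_step * serving_time
    losses = (cfg.idle_cost_per_step * waiting_time
              + cfg.empty_cost_per_step * empty_time)
    return gross - losses
```

Setting both cost rates to zero recovers the current behaviour, where only the served time counts.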

Error list

A[x,y] elements have to be updated when empty taxis move back to the base. OK
Some taxis are assigned a request, but their path is empty. OK

Journal

https://journals.sagepub.com/home/ssc
http://rsif.royalsocietypublishing.org/

Some higher-level questions

How does the distance from a reference location affect the items' rankings?
I wonder if it might be interesting to use the distance from the request origin as a penalty factor in driver selection.
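One way to read this suggestion: each candidate driver gets a score that adds a penalty proportional to their distance from the request origin to some base criterion, and the driver with the lowest score is chosen. The base criterion here is the driver's earnings so far, which gives an equalizing preference. That criterion, the weight `alpha` and the Manhattan distance are all assumptions made only for illustration.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def pick_driver(origin, candidates, earnings, alpha=1.0):
    """Return the taxi_id minimising earnings + alpha * distance to the origin.

    candidates: dict taxi_id -> (x, y) position
    earnings:   dict taxi_id -> money earned so far
    Returns None if there are no candidates."""
    if not candidates:
        return None
    return min(candidates,
               key=lambda t: earnings[t] + alpha * manhattan(candidates[t], origin))
```

A large `alpha` approaches the nearest-driver rule. A small `alpha` approaches a purely equalizing rule.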

About the taxi project

It's a great project. I'm a bit confused when using the code; it doesn't work properly.
Is there any relevant paper I can read?

Bugs, mistakes, things to develop

Folders in the configs and results folders to organize input and output files.
num_trips_completed always shows 0.

implementing the passage of time a bit differently
a base station in the center: idle taxis head back here
requests distributed around the center as a 2D Gaussian: varying its width ("attraction")
the number of taxis and the number of new requests should be constant over time
matching algorithms: specification + a tolerance threshold, e.g. that the taxi has to be within some circle
- baseline: a random taxi goes for our customer
- optimizing for the customer only: they should wait as little as possible
- matching based on distance, waiting if the taxi is too far away
- based on the current metrics: equalizing
metrics
- taxi waiting time
- taxi empty driving
- taxi useful driving (with a passenger)
- customer waiting time
- when there is no taxi at all
  - for each: average and distribution (individual, group)
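The four matching strategies listed above could be sketched as interchangeable functions that all receive the same inputs: the request origin, the positions of the available taxis, and their earnings so far. The function names, the Manhattan distance and the tolerance parameter `max_dist` are illustrative assumptions. A return value of `None` means "no match, the request waits".

```python
import random


def dist(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def match_random(origin, taxis, earnings, rng=random):
    """Baseline: a random available taxi goes for the customer."""
    return rng.choice(sorted(taxis)) if taxis else None


def match_nearest(origin, taxis, earnings, rng=random):
    """Optimise for the customer only: minimise their waiting time."""
    return min(taxis, key=lambda t: dist(taxis[t], origin), default=None)


def match_within(origin, taxis, earnings, rng=random, max_dist=5):
    """Distance-based match; if every taxi is too far, the request waits (None)."""
    close = [t for t in taxis if dist(taxis[t], origin) <= max_dist]
    return rng.choice(close) if close else None


def match_equalizing(origin, taxis, earnings, rng=random, max_dist=5):
    """Equalizing: among taxis within tolerance, prefer the one that earned least."""
    close = [t for t in taxis if dist(taxis[t], origin) <= max_dist]
    return min(close, key=lambda t: earnings[t], default=None)
```

Here `taxis` maps taxi_id to an (x, y) position and `earnings` maps taxi_id to money earned. Because the signatures match, a config option could select the strategy by name.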

Matching algorithms

There are four classes in the py-file.

City() - stores the geometry of the simulation,
Taxi() - a taxi object,
Request() - a request object,
Simulation() - the simulation components.

The most important class variables for the matching algorithm are the following:

Simulation.taxis: dict with int keys taxi_id, contains all used Taxi() instances.
Simulation.requests: dict with int keys request_id, contains all used Request() instances.

You have to update these dictionaries when you make a match, as in the example Simulation.assign_request.

The most important methods for the matching algorithm are the following:

Simulation.find_nearest_available_taxis(): see the description in its docstring. Returns a list of taxi_ids according to the chosen method (all taxis within a given radius, or the nearest taxis only). To search for taxis in a given ring, subtract the results for the smaller radius from those for the bigger radius.
Simulation.assign_request(self, request_id): this function does the actual matching and returns the assigned taxi_id. I wrote a first example that randomly chooses one taxi from the closest ones and refuses to serve the request if the closest taxis are farther away than a predefined threshold.
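Here is a toy version of the two methods described above. It is not the repository's implementation. `ToySimulation`, its dict layout, the Manhattan distance and the `threshold` parameter are hypothetical. The sketch shows three things:

- a ring query computed as the set difference of two radius queries;
- an `assign_request` that picks randomly among the closest available taxis and refuses the request if even those are beyond the threshold;
- the updates to both the `taxis` and `requests` dictionaries when a match is made.

```python
import random


class ToySimulation:
    def __init__(self, taxi_positions, threshold=5, seed=None):
        # taxis / requests play the role of Simulation.taxis / Simulation.requests
        self.taxis = {tid: {"pos": pos, "available": True, "request": None}
                      for tid, pos in taxi_positions.items()}
        self.requests = {}
        self.threshold = threshold
        self.rng = random.Random(seed)

    @staticmethod
    def _dist(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    def taxis_within(self, origin, radius):
        """Available taxis at distance <= radius from origin."""
        return {tid for tid, t in self.taxis.items()
                if t["available"] and self._dist(t["pos"], origin) <= radius}

    def taxis_in_ring(self, origin, r_inner, r_outer):
        # ring r_inner < d <= r_outer: bigger radius minus the smaller one
        return self.taxis_within(origin, r_outer) - self.taxis_within(origin, r_inner)

    def assign_request(self, request_id):
        req = self.requests[request_id]
        origin = req["origin"]
        available = [tid for tid, t in self.taxis.items() if t["available"]]
        if not available:
            return None
        best = min(self._dist(self.taxis[t]["pos"], origin) for t in available)
        if best > self.threshold:
            return None  # even the nearest taxi is too far: refuse to serve
        nearest = [t for t in available
                   if self._dist(self.taxis[t]["pos"], origin) == best]
        taxi_id = self.rng.choice(nearest)
        # record the match in both dictionaries
        self.taxis[taxi_id]["available"] = False
        self.taxis[taxi_id]["request"] = request_id
        req["taxi_id"] = taxi_id
        return taxi_id
```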
