nrel / hive

HIVE™ is a mobility services research platform

Home Page: https://nrelhive.readthedocs.io/en/latest/

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
python ridesharing simulation transportation transportation-simulation

hive's People

Contributors

clintonsteiner, erik-hasse, jhoshiko, jkirsh2, machallboyd, nreinicke, robfitzgerald, roguh, zenon18


hive's Issues

test case of cosim with sim and dispatcher state updates

in order to confirm we can influence hive externally, we should build a test case which

  1. submits a mock dispatcher with state updates we can observe / that send instructions we can verify in test
  2. updates stations in some observable way via external state updates
  3. cranks a few time steps to witness these changes

step_simulation_ops.step_vehicle should return an error when it occurs

this method fails kinda-silently; it only logs errors at this level and then continues. instead, it should follow the same convention of returning an Optional[Exception]. of course, this implies that all calling functions would do the same, so this will require refactors up to the runner level. the top-level loop should use the idiom of carrying an accumulator of type Tuple[Optional[Exception], Optional[RunnerPayload]] with an initial value of (None, initial_rp).
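a minimal sketch of the accumulator idiom described above, assuming stand-in types (`RunnerPayload` is modeled here as a plain dict and `step_simulation` is a placeholder for the real step function):

```python
from typing import Optional, Tuple

# stand-in for hive's real RunnerPayload type (assumption for this sketch)
RunnerPayload = dict

def step_simulation(rp: RunnerPayload) -> Tuple[Optional[Exception], Optional[RunnerPayload]]:
    """a step that can fail; returns (error, result) following the convention"""
    if rp["time"] >= rp["end_time"]:
        return Exception("stepped past end of simulation"), None
    return None, {**rp, "time": rp["time"] + 1}

def run(initial_rp: RunnerPayload) -> Tuple[Optional[Exception], Optional[RunnerPayload]]:
    # carry an accumulator of (Optional[Exception], Optional[RunnerPayload])
    # starting from (None, initial_rp); stop stepping once an error appears
    acc: Tuple[Optional[Exception], Optional[RunnerPayload]] = (None, initial_rp)
    for _ in range(initial_rp["end_time"]):
        err, rp = acc
        if err is not None or rp is None:
            break
        acc = step_simulation(rp)
    return acc

err, final = run({"time": 0, "end_time": 3})
```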

ChargingBase.exit doesn't release stall; ChargingBase doesn't manage stall

ChargingBase.exit can leave a stall occupied.

it is only possible to enter a ChargingBase state from a ReserveBase state. ReserveBase modifies the available stalls at the base. when ChargingBase.exit is called, nothing is done to manage the checked out stall. ChargingBase.exit should have two outcomes:

  1. the next state is ReserveBase; no change to stalls
  2. the next state isn't ReserveBase; we need to release the stall as if we were exiting ReserveBase as well

first case above is the special case. it could be handled by making sure we are always checking out and returning stalls in both states:

ReserveBase.enter()  -> base.checkout_stall()
ReserveBase.exit()   -> base.return_stall()
ChargingBase.enter() -> base.checkout_stall()
ChargingBase.exit()  -> base.return_stall()

this will lead to extra operations but should be axiomatically the same as "knowing" we have a checked out stall (which could be known if we knew what state we were transitioning to, such as #27).
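a minimal sketch of the symmetric checkout/return approach, using a simplified stand-in Base type (field names are assumptions, not hive's real API):

```python
from typing import NamedTuple, Optional, Tuple

class Base(NamedTuple):
    total_stalls: int
    available_stalls: int

    def checkout_stall(self) -> Tuple[Optional[Exception], "Base"]:
        # called by both ReserveBase.enter and ChargingBase.enter
        if self.available_stalls == 0:
            return Exception("no stalls available"), self
        return None, self._replace(available_stalls=self.available_stalls - 1)

    def return_stall(self) -> Tuple[Optional[Exception], "Base"]:
        # called by both ReserveBase.exit and ChargingBase.exit
        if self.available_stalls == self.total_stalls:
            return Exception("already has max stalls"), self
        return None, self._replace(available_stalls=self.available_stalls + 1)
```

with this symmetry, a ReserveBase → ChargingBase transition performs one return followed by one checkout, which is a no-op on the stall count.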

returns library

throughout hive, especially in the VehicleState FSM, there are methods with the following signature:

Tuple[Optional[Exception], Optional[T]]

where T may be SimulationState, or, some kind of vehicle state, or others. this describes a method which can fail, do nothing, or possibly return a T, and we allow these instances:

error, None  # experienced an error
None, t      # result of method, no error
None, None   # method fails without an error

where the combination "error and a result t" is disallowed.

this is based off the idea of the error-first callback in Node.js, but it also simulates a low-tech version of the Either monad from Scala. the Returns library also has its own implementation of this, with the Result and ResultE types.

Result would reduce our signatures from Tuple[Optional[Exception], Optional[T]] to ResultE[T], but it also brings a lot more with it, some of which may be awkward to integrate into our code base. do we consider using Result?

alternatively, a low-tech solution (which doesn't protect us as well from dev errors): should we simply add a type alias for Tuple[Optional[Exception], Optional[T]]:

T = TypeVar('T')
Result = Tuple[Optional[Exception], Optional[T]]

Implement batch runner

Certain users of the model have expressed a need for an easy way to run multiple simulations in parallel. One suggested solution was to use a bash script that runs the hive command multiple times:

hive my-sim1.yaml &
hive my-sim2.yaml

but this method has been failing when used in conjunction with slurm.

It would be helpful to have a new command (maybe hive-batch) that could load multiple simulations and execute them in parallel from a single overhead process.
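a minimal sketch of what a hive-batch command could do under the hood, using a multiprocessing pool in a single overhead process (`run_sim` here is a placeholder, not hive's real entry point):

```python
from multiprocessing import Pool

def run_sim(scenario_file: str) -> str:
    # placeholder for hive's actual scenario runner; a real hive-batch
    # command would load and execute the scenario here
    return f"finished {scenario_file}"

def run_batch(scenario_files, processes=2):
    # execute every scenario in parallel from one parent process,
    # avoiding the shell-backgrounding approach that fails under slurm
    with Pool(processes=processes) as pool:
        return pool.map(run_sim, scenario_files)

if __name__ == "__main__":
    print(run_batch(["my-sim1.yaml", "my-sim2.yaml"]))
```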

Fix mypy errors

Running mypy over the hive project currently results in 586 errors. It would be nice to eventually go through and pick these off.

HIVE core written in lower-level language

the core of HIVE is Python. we could improve on memory management and runtime if HIVE were written in C++ or another language which can interface with Python.

the rewrite should be limited to "SimulationState", "Environment", and below. the instructions should still come in Python, though, the logic for their "apply" methods should join the refactor. immutability should still be enforced, and, Vehicles should still be serializable. id types could be made into integers (if the hive executable creates enumeration tables to maintain a reference to the user-provided ids).

vehicle_state_ops.move, MoveResult should support optional partition boundaries

in hive, when a vehicle is updated on move, it runs exactly to the end of the time step. however, in a distributed setting, it may reach the partition boundary before completing the time step. in base hive, this is not an issue.

in order to support both behaviors, the move method could be parameterized by the Optional[GeoId] of the boundary, defaulting to None. if a move reaches this boundary, we need to return the partially-consumed time step in the MoveResult. this includes the current time step, the entity location where we stopped, and the remaining, un-consumed time.

move needs to be refactored to inspect the vehicle state to see if any remaining_time is stored (see below).

modify the chain of methods inside move so that

  1. partially-consumed traversals are properly modeled in the resulting VehicleState (route, remaining_time)
  2. move uses the SimTime duration for execution unless the VehicleState provides remaining_time, which is then used instead

important for hive-dist: after such a move, the vehicle's entity position should be at the origin of the next link as opposed to the destination of the last traversed link: this allows us to discover that the vehicle is out-of-partition in hive-dist.

Q: all of this needs to travel down the pipeline from adding a vehicle and updating it manually in distributed simulation. does that require adding the Seconds remaining to the vehicle state for all move-able states? answering this question may be out-of-scope: since the default case of this is "None" we can take this first step without breaking anything, and save the final step for a separate issue.

a move state may have only partially completed its time step, and if so, the remaining unused time needs to be recorded as state of the vehicle. the only natural place to put this is in the VehicleState. this applies to all move actions.
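the boundary-aware move could look something like the following sketch, where a route is simplified to (GeoId, seconds) pairs and MoveResult carries the un-consumed time (all names and shapes are assumptions for illustration):

```python
from typing import NamedTuple, Optional

GeoId = str

class MoveResult(NamedTuple):
    location: GeoId        # entity location where we stopped
    remaining_time: float  # un-consumed seconds; 0.0 when the step was fully used

def move(route, time_step: float, boundary: Optional[GeoId] = None) -> MoveResult:
    """traverse (geoid, seconds) links until the time step is consumed
    or the optional partition boundary is reached (defaulting to None,
    which preserves base hive behavior)."""
    remaining = time_step
    location = route[0][0] if route else ""
    for geoid, seconds in route:
        if remaining <= 0:
            break
        location = geoid
        if boundary is not None and geoid == boundary:
            # reached the partition boundary: return the partially-consumed
            # time step so it can be stored on the VehicleState
            return MoveResult(location, remaining)
        remaining -= seconds
    return MoveResult(location, max(remaining, 0.0))
```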

autonomous vehicles issued ChargingBaseInstruction when no stalls available

this is pretty benign as the effect is just that vehicles stay in a ReserveBase state. that said, it pukes up a lot of warnings to the user:

(screenshot of the warning output omitted)

this happens because we are not checking charger availability in driver_instruction_ops.py::av_charge_base_instruction():

    chargers = tuple(filter(
        lambda c: my_mechatronics.valid_charger(c),
        [env.chargers[cid] for cid in my_station.total_chargers.keys()]
    ))
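a hedged sketch of the fix, using simplified stand-in types (the `available_chargers` field and `is_valid` callable are assumptions standing in for hive's Station and mechatronics.valid_charger):

```python
from typing import Dict, NamedTuple

# stand-ins for hive's real types (hypothetical, for illustration only)
class Charger(NamedTuple):
    id: str
    rate: float

class Station(NamedTuple):
    total_chargers: Dict[str, int]
    available_chargers: Dict[str, int]

def valid_chargers(station: Station, chargers_by_id: Dict[str, Charger], is_valid) -> tuple:
    # the proposed fix: filter on mechatronics validity AND availability,
    # so vehicles are not issued a ChargingBaseInstruction when every
    # charger of a valid type is already occupied
    return tuple(
        chargers_by_id[cid]
        for cid in station.total_chargers
        if is_valid(chargers_by_id[cid]) and station.available_chargers.get(cid, 0) > 0
    )
```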

is on_shift_access necessary?

i'm not totally clear on the meaning of a station.csv on_shift_access entry. is this

  • to protect a home base's charger from being used by other on-shift vehicles
  • to prevent the use of slow chargers when on-shift

if the former, can't this be inferred by the fleet id relating a home to a vehicle?

if the latter, don't we instead want to just rank fast chargers higher? and, maybe this means we add a weighting factor to the base ranking function (nearest_shortest_queue)?

tracking a charge session

create new ReportTypes for CHARGE_PLUG_IN_EVENT, CHARGE_PLUG_OUT_EVENT, and CHARGE_SESSION_EVENT.

create reports from

  • CHARGE_PLUG_IN_EVENT: DispatchStation._enter_default_terminal_state (line 100)
  • CHARGE_PLUG_OUT_EVENT: ChargingBase/ChargingStation exit and _enter_default_terminal_state methods
  • CHARGE_SESSION_EVENT: created in the EventfulHandler by watching for plug-in events, storing them, and then watching for the corresponding plug-out events, creating a new session event from each pair.
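the pairing logic in the EventfulHandler could be sketched like this (report shapes are assumptions; real hive reports carry more fields):

```python
from typing import Dict, List

def pair_charge_sessions(reports: List[dict]) -> List[dict]:
    """watch for plug-in events, store them by vehicle id, and emit a
    CHARGE_SESSION_EVENT when the matching plug-out event arrives."""
    open_sessions: Dict[str, dict] = {}
    sessions = []
    for r in reports:
        vid = r["vehicle_id"]
        if r["report_type"] == "CHARGE_PLUG_IN_EVENT":
            open_sessions[vid] = r
        elif r["report_type"] == "CHARGE_PLUG_OUT_EVENT" and vid in open_sessions:
            plug_in = open_sessions.pop(vid)
            sessions.append({
                "report_type": "CHARGE_SESSION_EVENT",
                "vehicle_id": vid,
                "start_time": plug_in["sim_time"],
                "end_time": r["sim_time"],
            })
    return sessions
```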

dataframe of aggregate values by time step

currently there are two levels of reports in hive:

  1. event-level: records some entity doing something, producing state and event logs
  2. stats-level: overall stats shown in the console

we could use a middle level for analysis which aggregates to the time bin level. for each time step, we would collect a stat and store it at a cell on a DataFrame.

for example, consider state-of-charge (SoC). for each time bin, we would record the average SoC for all vehicles by way of the fuel_source_soc method on their mechatronics instance.

we would create a DataFrame with time as its index:

time  soc    ...
0     39.5%  ...
1     40.2%  ...
...   ...    ...

let's do this for the following columns:

  • soc: state of charge
  • vkt: distance traveled within a time step
    • this may be tricky; we don't record the delta of this anywhere except in a "vehicle_move_event", which may be what we need to use to capture this effect
  • active_requests: count of active requests awaiting vehicle dispatch (unassigned)
  • canceled_requests: count of requests that were canceled at this time
  • assigned_requests: count of requests with a vehicle assigned to them
  • servicing_requests: count of requests currently being serviced by a vehicle
  • a column for each vehicle state, holding the count of vehicles in that state
  • a column for each charger type, holding the count of chargers in use

to create this dataframe, we need a new reporter handler. on "close", this handler should write the dataframe to the output directory.
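a minimal sketch of the aggregation for the soc column (the per-step observation shape is an assumption; in hive this data would come from fuel_source_soc on each vehicle's mechatronics instance):

```python
import pandas as pd

def collect_step_stats(history):
    """history: iterable of (time, [soc, ...]) observations per time step;
    builds the time-indexed DataFrame of aggregate values."""
    rows = {t: {"soc": sum(socs) / len(socs)} for t, socs in history}
    df = pd.DataFrame.from_dict(rows, orient="index")
    df.index.name = "time"
    return df

df = collect_step_stats([(0, [0.39, 0.40]), (1, [0.40, 0.41])])
# on the handler's close, write df.to_csv(...) to the output directory
```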

assign travel time to networkx links

for each link, add a new attribute "travel_time" which is computed from the link distance and speed. in route methods, reference this column as the weight argument to nx.shortest_path.
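a minimal sketch with networkx on a toy graph (the attribute names distance_km and speed_kmph are assumptions about the link data):

```python
import networkx as nx

G = nx.DiGraph()
G.add_edge("a", "b", distance_km=1.0, speed_kmph=20.0)
G.add_edge("b", "c", distance_km=1.0, speed_kmph=60.0)
G.add_edge("a", "c", distance_km=2.5, speed_kmph=60.0)

# assign travel_time (hours) computed from the link distance and speed
for u, v, data in G.edges(data=True):
    data["travel_time"] = data["distance_km"] / data["speed_kmph"]

# route methods then reference the attribute as the weight argument
path = nx.shortest_path(G, "a", "c", weight="travel_time")
```

note that the time-shortest path ("a" → "c" directly) differs from the hop-shortest or distance-shortest path when link speeds vary, which is the point of weighting by travel_time.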

Modify reporting for vehicle charge events

Right now we record vehicle charge events in a marginal time-step fashion; for each time step a vehicle is charging, we emit an event for that time step. Unfortunately, this makes it difficult to aggregate this information to the entire charging session. I can think of two ways of addressing this:

  1. Add a session id variable to each time step based charge event report such that we can do a group by on that id to get session level stats.
  2. Only yield a charge event when we exit a charging state, reporting the effects of the entire charging session (including vehicle state of charge in and out).

I would vote we proceed with option 2 but would like to hear the thoughts of others.

Add more context to simulation state errors.

Some instances of SimulationStateErrors are a little vague and could use a bit more context for those instances in which they are raised.

For example, with a vehicle in the dispatching trip state, when the state is updated and the vehicle is not found in the simulation state, we raise an error: SimulationStateError(f"vehicle {self.vehicle_id} not found"). It would be helpful if we added some additional context such as SimulationStateError(f"vehicle {self.vehicle_id} not found when attempting to perform update on DispatchTrip vehicle state")

assign applied instructions to the SimulationState during Update so the final set of instructions can be inspected

there is an "instructions" field on SimulationState which could store the applied instructions but is not actually being used.

in hive-distributed, we need to inspect how many RepositionInstructions were finally applied. we can either do some weird unintended stuff with logging, or, we can use the SimulationState.instructions and then inspect it. the latter seems to be the better move.

Pooling - interruptable ServicingPoolingTrip requires modification to FSM

as it stands, it is possible to initiate a pooling trip at time t for n requests, but, it is not possible to add an additional request to the same trip at time t + 1.

this is due to the fact that the ServicingPoolingTrip.exit method would be called in order to modify (transition) the state. since the exit method does not know the type of state being transitioned to, it cannot trust having its exit method called, as that could possibly lead to stranded requests.

in order to address this, it must be possible to know the full state transition being attempted. two ways to do this:

  1. make Vehicle.vehicle_state a Stack, which allows new states to be added without removing the old
  • all vehicles start with Stack(Idle())
  • all exit, enter, update ops need to become Stack-aware; also true of VehicleState, EntityState ops
  2. add "next_state" to the signature of VehicleState.exit
  • then ServicingPoolingTrip.exit could inspect the next state to confirm it is yet another ServicingPoolingTrip state

1 looks more "correct" but 2 is easier and has no downside to it 🤷‍♂️
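option 2 could be sketched as follows, with a heavily simplified stand-in state type (real hive states carry routes, passengers, and return (error, result) tuples):

```python
from typing import NamedTuple, Optional, Tuple

class ServicingPoolingTrip(NamedTuple):
    vehicle_id: str
    boarded_requests: tuple

    def exit(self, next_state) -> Tuple[Optional[Exception], bool]:
        # with next_state in the signature, this state can permit a
        # transition into another ServicingPoolingTrip (a re-planned
        # pooling trip) without stranding its boarded requests
        if isinstance(next_state, ServicingPoolingTrip):
            return None, True
        if self.boarded_requests:
            return Exception("cannot exit: would strand boarded requests"), False
        return None, True
```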

time_step_stats_handler throws exception due to missing fleet id in hive-distributed

in the test scenario, the fleet "fhv" is associated with all vehicle ids, where a vehicle id takes the name fhv_{n} for the n-1'th vehicle. the file looks like this:

$ head /projects/mbap/hive/scenarios/nyc_v60k_r1500k/static/nyserda_100perc_fhv_15hc_warm_final_1__fleets.yaml 
fhv:
  bases: []
  stations: []
  vehicles:
  - fhv_0
  - fhv_1
  - fhv_2
  - fhv_3
  - fhv_4
  - fhv_5

the exception is thrown at line 198 in time_step_stats_handler, where there is an unsafe lookup called on an immutables.Map which is left in mutate() mode. Map()s are meant to be immutable so this breaks that invariant. the logic should be re-written with a Dict or left immutable after initialization. regardless, the logic may not be correct for all states that are possible in a hive distributed run.

error:

Traceback (most recent call last):
  File "/home/rfitzger/envs/hive-distributed-mamba/bin/hive-distributed", line 33, in <module>
    sys.exit(load_entry_point('nrel-hive-distributed', 'console_scripts', 'hive-distributed')())
  File "/home/rfitzger/hive-distributed/hive_distributed/app/run_scenario.py", line 130, in run
    responses = ray.get(next_step_ids)
  File "/home/rfitzger/envs/hive-distributed-mamba/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 89, in wrapper
    return func(*args, **kwargs)
  File "/home/rfitzger/envs/hive-distributed-mamba/lib/python3.7/site-packages/ray/worker.py", line 1621, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(KeyError): ray::WorkerActor.advance() (pid=28947, ip=10.148.8.7, repr=<hive_distributed.actor.worker.worker_actor.WorkerActor object at 0x2b9ec3709050>)
  File "/home/rfitzger/envs/hive-distributed-mamba/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/home/rfitzger/envs/hive-distributed-mamba/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/rfitzger/hive-distributed/hive_distributed/actor/worker/worker_actor.py", line 308, in advance
    self.partitioning)
  File "/home/rfitzger/hive-distributed/hive_distributed/actor/worker/ops/sim_advance.py", line 118, in advance_simulation
    err, result = ft.reduce(_advance, sim_times, initial)
  File "/home/rfitzger/hive-distributed/hive_distributed/actor/worker/ops/sim_advance.py", line 85, in _advance
    next_runner_payload.e.reporter.flush(next_runner_payload)
  File "/home/rfitzger/envs/hive-distributed-mamba/lib/python3.7/site-packages/hive/reporting/reporter.py", line 49, in flush
    handler.handle(self.reports, runner_payload)
  File "/home/rfitzger/envs/hive-distributed-mamba/lib/python3.7/site-packages/hive/reporting/handler/time_step_stats_handler.py", line 198, in handle
    filter_function=lambda v: fleet_id in self.vehicle_membership_map[v.id]
  File "/home/rfitzger/envs/hive-distributed-mamba/lib/python3.7/site-packages/hive/state/simulation_state/simulation_state.py", line 140, in get_vehicles
    return tuple(filter(filter_function, vehicles))
  File "/home/rfitzger/envs/hive-distributed-mamba/lib/python3.7/site-packages/hive/reporting/handler/time_step_stats_handler.py", line 198, in <lambda>
    filter_function=lambda v: fleet_id in self.vehicle_membership_map[v.id]
KeyError: 'fhv_29392'
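one shape the re-written logic could take, per the Dict suggestion above: build the membership map with a plain dict, freeze it after initialization, and guard the lookup (a sketch; the real map is keyed by hive's vehicle and fleet id types):

```python
from types import MappingProxyType

def build_vehicle_membership_map(fleets):
    """fleets: dict of fleet_id -> list of vehicle ids (as in the
    fleets.yaml above); returns a read-only mapping, restoring the
    immutable-after-initialization invariant."""
    memberships = {}
    for fleet_id, vehicle_ids in fleets.items():
        for vid in vehicle_ids:
            memberships.setdefault(vid, set()).add(fleet_id)
    return MappingProxyType(memberships)

vmm = build_vehicle_membership_map({"fhv": ["fhv_0", "fhv_1"]})
# guard the lookup so vehicle ids missing from the map (as in the
# distributed run above) do not raise KeyError
in_fleet = "fhv" in vmm.get("fhv_29392", set())
```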

refactor "move" logic in VehicleStates

all vehicles "move" the same way in HIVE. however, the code to execute this move is duplicated across the various _perform_update methods for moving VehicleStates, which makes it brittle in the case of future changes. the design of mechatronics.move, vehicle_state_ops.move, and underlying _apply_route_traversal is unclear.

it would be ideal if there were a single "move" method which could be called by the _perform_update methods. perhaps this would combine the current logic in _perform_update with vehicle_state_ops.move and _apply_route_traversal into one function.

the only reason this didn't happen in the past is that it looked weird. the solution involved passing the VehicleState as an argument, and that seems like an anti-pattern. is there a better way?

Update function signatures to use result type

Right now our functions that might fail mid-simulation have a signature that looks like:

def function_that_might_fail() -> Tuple[Optional[Exception], Optional[Type]]:

We could add in this Result type such that the signature would be reduced to:

def function_that_might_fail() -> Result[Type, str]:

This would be especially useful if we implement #83, as it would give us a mypy error if we don't handle the result properly.
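for illustration, here is a tiny hand-rolled stand-in for what the Result[Type, str] signature buys us (the real returns library provides Success/Failure plus a mypy plugin; this sketch only mimics the shape):

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

V = TypeVar("V")
E = TypeVar("E")

@dataclass(frozen=True)
class Success(Generic[V]):
    value: V

@dataclass(frozen=True)
class Failure(Generic[E]):
    error: E

# the union of the two outcomes replaces Tuple[Optional[Exception], Optional[T]]
Result = Union[Success[V], Failure[E]]

def function_that_might_fail(x: int) -> "Result[int, str]":
    if x < 0:
        return Failure("negative input")
    return Success(x * 2)
```

a caller must branch on the outcome type to reach the value, which is exactly the "can't ignore the error" property we want the type checker to enforce.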

histograms

a few histograms would improve understanding HIVE results, and this data could be collected in-the-loop of a HIVE simulation.

handler

create a handler for these histograms. it should store the data for chargers (below) throughout the sim, and then on close, it should create all histograms. a "histogram" here is a CSV file, not an image, with the values by bin. for distances, we should pull a configuration parameter from .hive.yaml for "distance_histogram_bins" which should by default be 20. the number of bins for the charger histograms should equal the max observed number of charge events.

distances

at the end of a simulation, let's collect the distance traveled by each vehicle as a histogram. using pandas is fine. we won't need to accumulate data at each time step, we can simply compute this at the end of the sim in the Handler.close method.
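the distances histogram write could look like this sketch (the CSV column names are assumptions; the bins default mirrors the proposed distance_histogram_bins config value):

```python
import numpy as np
import pandas as pd

def write_distance_histogram(distances, path, bins=20):
    """bin per-vehicle total distances and write counts-by-bin as a CSV
    (a 'histogram' here is a CSV file, not an image)."""
    counts, edges = np.histogram(distances, bins=bins)
    df = pd.DataFrame({
        "bin_left_km": edges[:-1],
        "bin_right_km": edges[1:],
        "count": counts,
    })
    df.to_csv(path, index=False)
    return df
```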

chargers

we are interested in the number of times vehicles used chargers. we want 1 histogram per charger type. in order to count this, we need to catch all events when a vehicle begins charging. unfortunately, we don't capture this as an event. let's add a new ReportType.BEGIN_CHARGING_EVENT, which gets called in DispatchStation._enter_default_terminal_state, and then we can catch these in the histogram handler.

refactor Passenger into more generic "Resource" (design)

a Passenger is a specific kind of the more general thing that Resource would define. in the case of Passenger, it is a resource with a pickup and dropoff location which takes up a passenger seat.

in order to allow HIVE to model food delivery / postal delivery / freight delivery problems, the Request concept needs to be made more generic. is the resource a passenger? does it allow_pooling? can it share a vehicle with other types of resources? does it have an expected delivery time? if the destination rejects it, can it be returned to its origin?

  1. consider the effects that could capture a broader "resource" type
  2. consider what would need to change in the vehicle FSM
  3. come up with a design for a Resource
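one possible starting point for step 3, answering the questions above with fields (every field name here is a design sketch, not a settled API):

```python
from typing import NamedTuple, Optional

class Resource(NamedTuple):
    resource_id: str
    origin: str                            # GeoId of pickup
    destination: str                       # GeoId of dropoff
    seats_required: int                    # 0 for non-passenger cargo
    is_passenger: bool
    allows_pooling: bool                   # can share a vehicle
    expected_delivery_time: Optional[int]  # SimTime; None for passengers
    returnable: bool                       # return to origin if rejected

passenger = Resource("r1", "o1", "d1", 1, True, True, None, False)
parcel = Resource("p1", "o2", "d2", 0, False, True, 7200, True)
```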

update attribute names

A collection of updates to attribute names based on recent work with the hive technical report:

  • rename Request.value to Request.price

Station Charger rates can be modified

in order to support electric grid state modifications in HIVE, we want Stations to maintain a local collection of Charger instances which can have their rate values modified.

a station charging update function would take a Station and the Environment and return an optional list of chargers to replace the existing list.

that update fails if the returned list does not include each charger type. it should also fail if rates are negative. maybe it should fail if the rates exceed the amount set in the environment for that charger type?
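the validation described above could be sketched as follows, with chargers simplified to an id → rate mapping (a stand-in for hive's Charger instances):

```python
from typing import Dict, List, Optional, Tuple

def update_station_chargers(
    current: Dict[str, float],                    # charger_id -> rate (kW)
    proposed: Optional[List[Tuple[str, float]]],  # replacement list, or None
    env_max_rates: Dict[str, float],              # environment rate ceilings
) -> Tuple[Optional[Exception], Dict[str, float]]:
    if proposed is None:
        return None, current  # no update requested
    # fail if the returned list does not include each charger type
    if {cid for cid, _ in proposed} != set(current):
        return Exception("update must include each charger type"), current
    for cid, rate in proposed:
        if rate < 0:
            return Exception(f"negative rate for charger {cid}"), current
        if rate > env_max_rates.get(cid, float("inf")):
            return Exception(f"rate exceeds environment max for {cid}"), current
    return None, dict(proposed)
```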

count vehicles in timestep reporting

in order to perform a weighted average of SoC values in hive-distributed, we need, on a timestep-by-timestep basis, the total count of vehicles. while this can be summed over the vehicle state columns, computing that is clumsy.

pooling physics

(replica of https://github.nrel.gov/MBAP/hive/issues/484)

HIVE is a ride hail simulator. a popular feature of ride hail services (and popular research problem) is the simulation of ride hail pooling trips. the physics of pooling and the implementation of a pooling-aware dispatcher are separate problems in the HIVE implementation. before adding pooling dispatch, the underlying physics needs to be implemented, in the form of Vehicle states, instructions, etc.

pooling flow

  • a given request is "ok with pooling their ride"
  • a driver is in, or, enters a vehicle state where they can be dispatched to pick up a pooling trip
    • constraint: this driver has enough seats to pick up the request's passengers
  • while in transit, the vehicle can receive other pooling pickup instructions which modify the route plan

simulation parameters

  • a service pricing model for pooling
  • pooling request boolean on Request .csv files
  • vehicle "can pool" boolean on Vehicle .csv files
  • vehicle "available seats" on Vehicle .csv files

come up with the design for pooling physics and implement them.

TaskIds for VehicleState equality

in hive-distributed, we compare VehicleStates for near-sameness. in a proposed PR, there is a note pointing out that this is possibly flawed.

as an alternative, a VehicleState could have a simple unique TaskId which is preserved through VehicleState.update calls. then task sameness could be a simple test of

TaskId = str
def same_task(v0, v1):
  return v0.vehicle_state.task_id == v1.vehicle_state.task_id

some thoughts:

  • this id could be generated using python's built-in uuid generator (using uuid4)
  • uuid.uuid4() creates a "random uuid"; we should confirm that importing the uuid module in different files does not produce duplicate results due to seed values
  • VehicleState should have a @property @abstractmethod def task_id
  • all VehicleState.default_update methods would need to make sure they use ._replace when updating local state, to retain this TaskId. if not possible, they should explicitly copy the TaskId into a new VehicleState object (unless the new object is a "different task", such as moving from Repositioning to Idle)
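the _replace-preserving behavior could look like this sketch, with a simplified stand-in for a moving state:

```python
import uuid
from typing import NamedTuple

TaskId = str

class Repositioning(NamedTuple):
    vehicle_id: str
    route: tuple
    task_id: TaskId

def begin_reposition(vehicle_id: str, route: tuple) -> Repositioning:
    # a new task gets a fresh random uuid4
    return Repositioning(vehicle_id, route, task_id=str(uuid.uuid4()))

def advance(state: Repositioning) -> Repositioning:
    # _replace keeps task_id, so the updated state is the "same task"
    return state._replace(route=state.route[1:])
```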

new year new formatting!

Okay, back at it again with another formatting discussion. I closed #74 since it was causing several merge conflicts and we haven't formally agreed on what we want to use as a formatting standard.

Reading around, I've formed a tentative opinion that the black formatter is becoming a 'standard' and I've made some attempts to test it out. If you're using Visual Studio Code, you can apply the black formatter to a file using [alt + shift + F] (probably similar on windows?). And, I was able to set up a git pre-commit hook that applies the black formatter to your files upon each commit, letting you review the results.

What are your thoughts?

exception chaining in HIVE without raising

overview

based on this PEP, it looks like

raise EXCEPTION from CAUSE

is equivalent to:

exc = EXCEPTION
exc.__cause__ = CAUSE
raise exc

so, for HIVE to get into the business of exception chaining without raising exceptions, we simply need to add this cause to an exception.

example

in the vehicle state classes, there's a lot of passing along errors. things exist like:

error, updated_base = base.return_stall()
if error:
    return error, None
else:
    ...

since we just pass along the error, the user only sees that f'base {self.id} already has max ({self.total_stalls}) stalls' but they don't know that this took place during a ReserveBase.exit operation. to fix this, we can wrap the error before returning it, via exception chaining:

error, updated_base = base.return_stall()
if error:
    exit_error = SimulationStateError(f"unable to call ReserveBase.exit for vehicle {self.vehicle_id}")
    exit_error.__cause__ = error
    return exit_error, None
else:
    ...

report fleet_id when generating request-related event reports

although it is possible to associate a fleet_id with a request in HIVE, the reporting logic for the various request reports omits data about the fleet membership, if present.

a Request should have at most one membership. grabbing the string representation of the request's membership object - str(req.membership) - will cover the possible values it can take (no membership, single membership).

check the following reporting payloads and add a "fleet_id": str(self.membership) pair to each data payload, for the following report types:

ADD_REQUEST_EVENT
PICKUP_REQUEST_EVENT
DROPOFF_REQUEST_EVENT
CANCEL_REQUEST_EVENT

Link class represents too many different concepts

porting https://github.nrel.gov/MBAP/hive/issues/543 from the enterprise repo

right now we use the Link abstraction to represent:

  • the underlying (fixed, un-modifiable) link topology
  • the location of a stationary entity (original Link with modified start + end GeoId)
  • the partially-traversed state of a Link by a Vehicle (original Link with either a modified start or end)

that's a bit overused, and fairly unclear in the latter two settings. we could use a different data structure to represent these kinds of Links.

stationary entity locations

we want to enforce that these stationary positions are GeoIds along a Link. however, the start/end GeoIds are confusing in this setting and are an opportunity for human error. we could implement a different data structure to track these positions, such as:

class EntityPosition(NamedTuple):
  link_id: LinkId
  geoid: GeoId

link traversal

in a Route, we currently store Links, where we really want to store an abstraction which simply represents the amount of a link that we will traverse. when we traverse this link, we should really be using the RoadNetwork to do so. that way, if the speeds have been updated, we are not relying on out-dated Link attributes from when we assigned the Route.

this could be represented by something like:

class LinkTraversal(NamedTuple):
  link_id: LinkId
  start: GeoId
  end: GeoId

Derive link speed from road network

NOTE: this is dependent on the incorporation of #19

When we move a vehicle we consume a previously generated link traversal object which has fixed speed and travel time information. In order to future-proof this for the inclusion of variable link speeds, we should probably get speed and travel time information from the road network at the time it is consumed.

quantization errors in discrete time stepping of dispatch states

i think that's the right word for it. it's at least what we would call it in audio engineering.

this came up while writing tests for incoming conflict resolution in hive-distributed. it shows a vehicle in DispatchBase state, where the vehicle position and base position are exactly the same at the beginning. i would expect it would take 1 update to get this vehicle to ReserveBase, but, it takes 2 here:

1970-01-01T00:01:00: Vehicle(v0,DispatchBase(vehicle_id='v0', base_id='b0', route=(LinkTraversal(link_id='8f268cdac268432-8f268cdac268432', start='8f268cdac268432', end='8f268cdac268432', distance_km=0.0, speed_kmph=40),)))
1970-01-01T00:02:00: Vehicle(v0,DispatchBase(vehicle_id='v0', base_id='b0', route=()))
1970-01-01T00:03:00: Vehicle(v0,ReserveBase(vehicle_id='v0', base_id='b0'))
1970-01-01T00:04:00: Vehicle(v0,ReserveBase(vehicle_id='v0', base_id='b0'))

the first step "fully consumes" the route (which was generated by RoadNetwork.route via the HaversineRoadNetwork). the second step realizes it has an empty route and applies the transition.

that hiccup there is a 1-timestep penalty against agents performing tasks. the effect may be less trivial in the case of DispatchTrip or DispatchStation states.

wondering, is this just an artifact of how route is implemented in HaversineRoadNetwork? more specifically, does route(geoid, geoid) return an empty tuple in OSMRoadNetwork instead? if so, HaversineRoadNetwork should repeat this behavior. but, if this isn't the problem, then we may need a review of the VehicleState methods.

pretty print dict in error when encountering invalid request row

we encounter invalid rows in the manhattan scenario:

[ERROR] - hive.state.simulation_state.update.update_requests_from_file - unable to parse request 11061 from row due to invalid value(s): OrderedDict([('request_id', '11061'), ('o_lat', '40.780457'), ('o_lon', '-73.980317'), ('d_lat', '0'), ('d_lon', '0'), ('departure_time', '71820'), ('passengers', '1')])

the issue here is that the destination lat/lon parses as (0, 0), but that is not apparent. it would help if the dictionary representing the row was pretty-printed as JSON instead of printed as an OrderedDict(...)
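the fix is a one-liner with the standard library, for example:

```python
import json
from collections import OrderedDict

# the row from the error message above
row = OrderedDict([('request_id', '11061'), ('o_lat', '40.780457'),
                   ('o_lon', '-73.980317'), ('d_lat', '0'), ('d_lon', '0'),
                   ('departure_time', '71820'), ('passengers', '1')])

# pretty-print the row as JSON so bad values like d_lat/d_lon = 0 stand out
pretty = json.dumps(row, indent=2)
```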

API for grid co-simulation

create an API to load and crank HIVE from a grid simulation. when cranked, the updated sim and all charge events should be returned from the crank method.

remove membership map, add membership to any report

an easier way to aggregate reports by fleet in a distributed context is to just ensure that the membership ids are attached directly to reports. with this done, the membership map can be removed from the Environment.

stats handler compile_stats() depends on first calling "close" but close is for writing to file

this is issue #558 from the original repo

in hive-distributed, we want to be able to collect the stats observed in a partitioned hive run. to do this, we call:

rp.e.reporter.get_summary_stats()

for each rp: RunnerPayload returned after running. however, mean_final_soc, station_revenue, and fleet_revenue are calculated in the StatsHandler.close() method, which itself calls get_summary_stats(). this method also has the side effect of attempting to write a file, which causes an exception in distributed hive where the output paths are not created/valid (a separate issue).

re-factor these methods so that we can get the summary stats without performing the close() operation.

split time step aggregate results by FleetId

the work in #31 set up aggregate values by time step. let's break this down by fleet. for example, we have "fleet a" and "fleet b". let's track 3 DataFrames, one for each fleet and one for "all vehicles" (ignore fleet info). this will require keeping a Map[FleetId, DataFrame] in the handler, and writing the results to 3 different file locations.

some vehicles will end up having their data added to more than one output file.

Use frozen dataclass instead of NamedTuple for immutable classes

When testing hive using Python 3.9 and 3.10, we get the following error from each class that uses NamedTuple as a base class alongside another base class.

TypeError: Multiple inheritance with NamedTuple is not supported

One idea for a solution is to move our immutable classes to be frozen dataclasses. We could then use a typical abstract base class and get the same behavior as the named tuple. For example:

from __future__ import annotations  # lets the Foo return annotation resolve inside the class body

from dataclasses import dataclass, replace
from abc import ABC, abstractmethod

class Base(ABC):
    @abstractmethod
    def sum(self):
        pass

@dataclass(frozen=True)
class Foo(Base):
    a: int
    b: int

    def sum(self) -> int:
        return self.a + self.b

    def update_a(self, new_value: int) -> Foo:
        return replace(self, a=new_value)

cosim input for a custom dispatcher and custom dispatcher update function

in order to affect HIVE's dispatch modules from some external source, we need to expose custom dispatch updates at the cosim level.

in hive.app.cosim we have a load_simulation function which sets the instruction generators to our default set. but the API for instantiating our update functions allows passing in any set of instruction generators along with an update function to apply. this allows externally-sourced HIVE state updates to reach our control module.

identifying the module to update

there is a bad design here where the update function must do a type-check against the set of instruction generators to select which generator to modify. this could be replaced by a system which uses a unique identifier such as a name or an index when selecting the generator to update.

other requirements

  • our solution should have good ergonomics for the user who is leveraging the update API
  • failures should bubble up somewhere logical
    • maybe as a field in CrankResult?
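a sketch of identifier-based selection, returning failures as values in keeping with hive's Optional[Exception] idiom; all names here are assumptions:

```python
from typing import Callable, Dict, Optional, Tuple

InstructionGenerator = object  # stand-in for hive's instruction generator type

def update_instruction_generator(
    generators: Dict[str, InstructionGenerator],
    name: str,
    update_fn: Callable[[InstructionGenerator], InstructionGenerator],
) -> Tuple[Optional[Exception], Dict[str, InstructionGenerator]]:
    # select by unique name instead of type-checking each generator
    if name not in generators:
        # failure bubbles up as a value (e.g. into a CrankResult field) rather than raising
        return KeyError(f"no instruction generator named '{name}'"), generators
    updated = dict(generators)
    updated[name] = update_fn(generators[name])
    return None, updated
```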

top-level API for making state changes

we should expose a top-level API for injecting a change to the entities in the simulation state which is fluent and simple. as a test case we should make a state change to a station's charger rates and confirm that the effect is observable.
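a minimal sketch of such an API over an immutable state; the entity shapes and method names here are assumptions, not hive's actual types:

```python
from dataclasses import dataclass, replace
from typing import Dict

@dataclass(frozen=True)
class Station:
    id: str
    charger_rates_kw: Dict[str, float]

@dataclass(frozen=True)
class SimState:
    stations: Dict[str, Station]

    def modify_station(self, station_id: str, **changes) -> "SimState":
        # return a new state; the original remains untouched
        updated = replace(self.stations[station_id], **changes)
        return replace(self, stations={**self.stations, station_id: updated})

sim = SimState(stations={"s1": Station("s1", {"DCFC": 50.0})})
sim2 = sim.modify_station("s1", charger_rates_kw={"DCFC": 150.0})
```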

expose a way to modify reports to allow distributed hive to tag reports with partition ids

in hive-distributed, we run p separate instances of HIVE which write to their own output files. in this setting, it may be helpful to tag each report row with its corresponding partition_id so that when the files are joined, that information is not lost.

to support this, Reporter could take a tagFn: Optional[Callable[[Report], Report]] = None in its constructor, which could be called within the Reporter.file_report method, if it is not None, to modify the incoming Report.

in the hive-distributed setting, the tag function could look like this (note that dict.update returns None, so a bare lambda over it would return None rather than the Report):

def tag_fn(r: Report) -> Report:
    r.report.update({"partition_id": p_id})
    return r

Reporter(tagFn=tag_fn)

this is also open-ended for any other reasons we may need to tag Reports.
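a sketch of how file_report could apply the optional tag function; Reporter's real internals will differ, and reports are modeled here as plain dicts:

```python
from typing import Callable, Optional

class Reporter:
    """sketch of the proposed constructor argument and its use in file_report"""

    def __init__(self, tagFn: Optional[Callable[[dict], dict]] = None):
        self.tagFn = tagFn
        self.filed = []

    def file_report(self, report: dict) -> None:
        # apply the tag function, if provided, before filing the report
        if self.tagFn is not None:
            report = self.tagFn(report)
        self.filed.append(report)
```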

Error when loading cached yaml file

When attempting to run a scenario from a cached yaml file (i.e. the one that gets written to the output directory) I'm met with the following error:

yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object/apply:collections.OrderedDict'

It looks like the yaml dump is writing our config objects as ordered dicts but we should probably just dump these in the same format as the original yaml.
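one possible fix, assuming PyYAML: register a representer so OrderedDicts are dumped as ordinary yaml mappings (the config keys below are illustrative):

```python
import yaml
from collections import OrderedDict

# emit OrderedDict as a plain yaml mapping instead of a python/object/apply tag
yaml.add_representer(
    OrderedDict,
    lambda dumper, data: dumper.represent_mapping("tag:yaml.org,2002:map", data.items()),
)

config = OrderedDict([("sim_name", "denver_demo"), ("timestep_duration_seconds", 60)])
text = yaml.dump(config)
# the cached file now round-trips with safe_load, unlike the current output
```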
