Hi there,
I'm not sure if this is the right place to ask this question; if not, please forgive me.
First of all, congratulations on a great job!
My question, however, concerns the calculations in the step method of the SupplyChainEnvironment class. There I found the following update:
    next_state.factory_stocks = np.minimum(
        np.subtract(
            np.add(state.factory_stocks, action.production_level),
            np.sum(action.shipped_stocks, axis=0)
        ),
        self.storage_capacities[0]
    )
If I understand correctly, this ensures that next_state.factory_stocks never exceeds the storage capacity. However, the clipping is applied only after the action has already taken place. If the current stock plus production minus shipments comes out greater than what we are able to store, the excess products simply evaporate in this code, so the next state will not always match the actual situation. How should I understand this procedure in the context of the learning process?
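To make the concern concrete, here is a minimal sketch with made-up numbers. The function name `next_factory_stocks`, the scalar `capacity`, and the assumed shape of `shipped_stocks` (warehouses by products) are my own assumptions for illustration, not taken from the repository; only the arithmetic mirrors the line quoted above.

```python
import numpy as np

def next_factory_stocks(factory_stocks, production_level, shipped_stocks, capacity):
    """Stock + production - shipments, clipped at capacity (as in the quoted line).

    Also returns how many units the clip silently discards.
    """
    unclipped = factory_stocks + production_level - np.sum(shipped_stocks, axis=0)
    clipped = np.minimum(unclipped, capacity)
    lost = unclipped - clipped  # units that "evaporate" because of the clip
    return clipped, lost

# Hypothetical example: 2 products, 3 warehouses
factory_stocks = np.array([8, 3])
production_level = np.array([5, 2])
shipped_stocks = np.array([[1, 0],
                           [1, 1],
                           [0, 0]])  # assumed shape: (warehouses, products)
capacity = 10  # stands in for self.storage_capacities[0]

clipped, lost = next_factory_stocks(
    factory_stocks, production_level, shipped_stocks, capacity
)
print(clipped)  # [10  4]
print(lost)     # [1 0]
```

In this example one unit of the first product is produced but never stored, and as far as I can tell from this line alone nothing records that loss.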
Thank you in advance for your response!