Comments (3)
Hi @philtabor
I see that in line
a_t += discount*(reward_arr[k] + self.gamma*values[k+1] *(1-int(dones_arr[k])) - values[k])
k+1 and k are supposed to be the next_state_value and current_state_value indices, respectively. But while making the batches in
def generate_batches(self):
    n_states = len(self.states)
    batch_start = np.arange(0, n_states, self.batch_size)
    indices = np.arange(n_states, dtype=np.int64)
    np.random.shuffle(indices)
    batches = [indices[i:i+self.batch_size] for i in batch_start]
we use shuffle, so the batch we receive need not be in the proper order. That is the source of my confusion.
from youtube-code-repository.
As far as I can see, everything is correct. Yes, it's true that we use shuffle while generating batches, but it only shuffles the indices. So state_arr, action_arr, old_prob_arr, vals_arr, reward_arr, and dones_arr are returned in the correct order.
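A minimal sketch of this point, assuming a standalone version of generate_batches (the function signature, the toy states list, and the seeded rng are hypothetical and only there to make the example reproducible; the original method uses np.random.shuffle on self.states):

```python
import numpy as np

def generate_batches(states, batch_size, rng):
    """Return the stored data in insertion order plus shuffled index batches."""
    n_states = len(states)
    batch_start = np.arange(0, n_states, batch_size)
    indices = np.arange(n_states, dtype=np.int64)
    rng.shuffle(indices)  # only the index array is shuffled, not the data
    batches = [indices[i:i + batch_size] for i in batch_start]
    return np.array(states), batches

# Toy stand-ins for stored states (assumption: real states are observations).
states = [10, 11, 12, 13, 14, 15, 16]
rng = np.random.default_rng(0)
state_arr, batches = generate_batches(states, batch_size=3, rng=rng)

print(state_arr)  # same order in which the transitions were stored
print(batches)    # shuffled positions that point into state_arr
```

The shuffled batches only matter later, when they are used to pick rows out of the ordered arrays.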
Then, when we compute a_t += discount*(reward_arr[k] + self.gamma*values[k+1]*(1-int(dones_arr[k])) - values[k]), we iterate through the indices of the original arrays (we don't use the batches here), via for t in range(len(reward_arr)-1): and for k in range(t, len(reward_arr)-1):, and we get the data using t or k. So, in my view, k+1 and k are the next_state_value and current_state_value indices, respectively.
Please, correct me if I missed something. Thanks in advance!
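To make the argument concrete, here is a rough sketch of the advantage loop run over the ordered arrays, with a shuffled batch applied only afterwards. The function name compute_advantages, the default gamma and gae_lambda values, and the discount update by gamma * gae_lambda are assumptions for illustration; the thread only quotes the two loops and the a_t line.

```python
import numpy as np

def compute_advantages(reward_arr, values, dones_arr, gamma=0.99, gae_lambda=0.95):
    """Advantage estimate computed over the memory in stored (time) order."""
    n = len(reward_arr)
    advantage = np.zeros(n, dtype=np.float32)
    for t in range(n - 1):
        discount = 1.0
        a_t = 0.0
        for k in range(t, n - 1):
            # values[k] is the current state's value, values[k + 1] the next state's,
            # because the arrays are still in the order the steps were stored.
            a_t += discount * (reward_arr[k]
                               + gamma * values[k + 1] * (1 - int(dones_arr[k]))
                               - values[k])
            discount *= gamma * gae_lambda  # assumed update, not quoted in the thread
        advantage[t] = a_t
    return advantage

# Toy data (hypothetical): constant rewards and value estimates, no terminal steps.
rewards = np.array([1.0, 1.0, 1.0, 1.0])
values = np.array([0.5, 0.5, 0.5, 0.5])
dones = np.array([False, False, False, False])

adv = compute_advantages(rewards, values, dones, gamma=1.0, gae_lambda=1.0)

# A shuffled batch only selects which precomputed entries are used together.
batch = np.array([2, 0, 3, 1])
print(adv)
print(adv[batch])
```

Because the advantages are filled in before any batch is applied, shuffling changes which time steps are grouped into a minibatch but not the value computed for each step.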
Hi @NonameUntitled, I went through it again and it seems you are correct. The function returns state_arr, action_arr, old_prob_arr, vals_arr, reward_arr, and dones_arr in the correct order, and only batches holds the shuffled indices, so the calculation must be right. Thank you for the response.