flyyufelix / c51-ddqn-keras

125.0 11.0 31.0 45.39 MB

C51-DDQN in Keras

Home Page: https://flyyufelix.github.io/2017/10/24/distributional-bellman.html

License: MIT License

Python 100.00%

keras reinforcement-learning vizdoom c51 distributional-bellman

c51-ddqn-keras's Issues

For distributional dueling DDQN, what should the action advantage tower and state value tower look like?

Hi Felix,

I am studying your C51 code and trying to replicate Rainbow DQN, but I am unsure whether the action advantage tower should be:
"action_advantage = Lambda(lambda a: a[:, :, :] - K.mean(a[:, :, :], keepdims=True), output_shape=(action_size, z_atoms,))(action_advantage)"
or:
"action_advantage = Lambda(lambda a: a[:, :, :] - K.expand_dims(K.mean(a[:, :, :], axis=1), axis=1), output_shape=(self.action_size, self.z_atoms,))(action_advantage)"

Could you please give me a hand?
Thanks for your help.

def build_network(self, input_shape, action_size, algorithm=Algorithm.RAINBOW, network_type=NetworkType.RESIDUAL, z_atoms=51):

    inputs_x = x = Input(shape=(input_shape))

    x = Conv2D(filters=self.cnn_filter_num, kernel_size=self.cnn_filter_size, padding="same", data_format=self.data_format, kernel_regularizer=l2(self.l2_reg))(x)
    x = BatchNormalization(axis=1)(x)
    x = Activation("relu")(x)

    for _ in range(self.n_residual_block):
        in_x = x

        x = Conv2D(filters=self.cnn_filter_num, kernel_size=self.cnn_filter_size, padding="same", data_format=self.data_format, kernel_regularizer=l2(self.l2_reg), name="res"+str(_)+"_Conv1")(x)
        x = BatchNormalization(axis=1, name="res"+str(_)+"_batchnorm1")(x)
        x = Activation("relu")(x)
        x = Conv2D(filters=self.cnn_filter_num, kernel_size=self.cnn_filter_size, padding="same", data_format=self.data_format, kernel_regularizer=l2(self.l2_reg), name="res"+str(_)+"_Conv2")(x)
        x = BatchNormalization(axis=1, name="res"+str(_)+"_batchnorm2")(x)
        x = Add()([in_x, x])
        x = Activation("relu")(x)

    x = Flatten()(x)

    state_value = NoisyDense(self.noisydense_units, self.noisydense_init_sigma,self.noisydense_activation)(x)
    state_value = NoisyDense(1*z_atoms, self.noisydense_init_sigma, self.noisydense_activation_last)(state_value)
    state_value = Lambda(lambda s: K.expand_dims(s[:, :], axis=1), output_shape=(action_size, z_atoms,))(state_value)

    action_advantage = NoisyDense(self.noisydense_units, self.noisydense_init_sigma, self.noisydense_activation)(x)
    action_advantage = NoisyDense(action_size*z_atoms, self.noisydense_init_sigma, self.noisydense_activation_last)(action_advantage)
    action_advantage = Lambda(lambda a: K.reshape(a[:, :],[-1, action_size, z_atoms]), output_shape=(action_size, z_atoms,))(action_advantage)
    action_advantage = Lambda(lambda a: a[:, :, :] - K.mean(a[:, :, :], keepdims=True), output_shape=(action_size, z_atoms,))(action_advantage)

    state_action_value = merge([state_value, action_advantage], mode='sum')

    output_distribution_list = []
    for i_ in range(action_size):
        output_distribution_list.append(Lambda(lambda sa: sa[:,i_,:], output_shape=(z_atoms,))(state_action_value))

    model = Model(inputs=inputs_x, outputs=output_distribution_list)
    model.compile(loss='categorical_crossentropy', optimizer=rmsprop(lr=self.learning_rate))
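
The two variants in the question differ only in which axes the mean is taken over. The sketch below uses NumPy as a stand-in for the Keras backend (an assumption for illustration; like `K.mean`, `np.mean` reduces over every axis when none is given) on a toy tensor of shape (batch, action_size, z_atoms). If the intent is the usual dueling aggregation, where the advantage is centered over actions, only the `axis=1` form does that per sample and per atom.

```python
import numpy as np

# Toy advantage tensor with shape (batch, action_size, z_atoms); sizes are arbitrary.
batch, action_size, z_atoms = 2, 3, 4
rng = np.random.default_rng(0)
advantage = rng.normal(size=(batch, action_size, z_atoms))

# Variant 1: no axis given, so a single scalar mean over batch, actions and atoms.
global_mean = advantage.mean(keepdims=True)           # shape (1, 1, 1)
centered_global = advantage - global_mean

# Variant 2: mean over the action axis only, kept as a size-1 axis for broadcasting.
action_mean = advantage.mean(axis=1, keepdims=True)   # shape (batch, 1, z_atoms)
centered_per_action = advantage - action_mean

# Check which variant has zero mean over actions for every sample and atom.
per_action_is_centered = np.allclose(centered_per_action.mean(axis=1), 0.0)
global_is_centered = np.allclose(centered_global.mean(axis=1), 0.0)
print(per_action_is_centered, global_is_centered)
```

Note that the first variant also mixes values across the batch, so one sample's output would depend on the other samples in the same batch.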

Issue with defend_the_center.cfg

The latest version of ViZDoom has this line in scenarios/defend_the_center.cfg:
available_game_variables = { AMMO2 HEALTH }

But your code expects three variables, so this line needs to be changed to:
available_game_variables = { KILLCOUNT AMMO2 HEALTH }

It's not a big deal, but I think it should be reflected in the README.

Additionally, the training process crashes if the statistics directory has not been created beforehand.

Could you please explain the reason behind a line of code

q = np.sum(np.multiply(z_concat, np.array(self.z)), axis=1)
What is the purpose of this line when deciding which action is optimal?
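
For context, a distributional agent outputs a probability for each atom of a fixed support, and the quoted line collapses each action's distribution to its expected value, Q(s, a) = sum_i z_i * p_i(s, a), so that actions can be ranked. A minimal sketch follows, assuming 51 evenly spaced atoms; the bounds `v_min` and `v_max` and the probability rows are made up for illustration and are not taken from the repository.

```python
import numpy as np

# Assumed setup: 51 atoms evenly spaced on [v_min, v_max]; the bounds are illustrative.
num_atoms, v_min, v_max = 51, -10.0, 20.0
z = np.linspace(v_min, v_max, num_atoms)   # support values z_i

# Hypothetical network output for one state: one probability row per action.
# Action 0 puts all mass on the lowest atom, action 1 on the highest atom,
# and action 2 is uniform over the support.
z_concat = np.zeros((3, num_atoms))
z_concat[0, 0] = 1.0
z_concat[1, -1] = 1.0
z_concat[2, :] = 1.0 / num_atoms

# Expected return per action: multiply each probability by its atom value and sum.
q = np.sum(np.multiply(z_concat, z), axis=1)
action_idx = int(np.argmax(q))
print(q, action_idx)
```

Here the three expected returns are -10, 20 and 5, so the greedy action is index 1.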

FileNotFoundError: [Errno 2] No such file or directory: 'statistics/c51_ddqn_stats.txt'

Update Rolling Statistics
Traceback (most recent call last):
File "c51_ddqn.py", line 341, in
with open("statistics/c51_ddqn_stats.txt", "w") as stats_file:
FileNotFoundError: [Errno 2] No such file or directory: 'statistics/c51_ddqn_stats.txt'
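
One possible workaround is to create the directory before opening the file. The sketch below assumes nothing about the script beyond the path shown in the traceback; `open_stats_file` is a hypothetical helper name, and the demonstration runs inside a temporary directory.

```python
import os
import tempfile


def open_stats_file(path):
    """Open path for writing, creating its parent directory first if needed."""
    parent = os.path.dirname(path)
    if parent:
        # exist_ok=True makes this a no-op when the directory is already there.
        os.makedirs(parent, exist_ok=True)
    return open(path, "w")


# Demonstrated in a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    stats_path = os.path.join(tmp, "statistics", "c51_ddqn_stats.txt")
    with open_stats_file(stats_path) as stats_file:
        stats_file.write("example line\n")
    created = os.path.isfile(stats_path)

print(created)
```

Running `mkdir statistics` once in the working directory before training has the same effect.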

DDQN compute target

Hey, nice implementation and blog post. I have a question about the code, for these 2 lines:

z = self.model.predict(next_states) # Return a list [32x51, 32x51, 32x51]
z_ = self.model.predict(next_states) # Return a list [32x51, 32x51, 32x51]

where z_ is used to compute the target and z is used to select the max-value action. But shouldn't the second line be:

z_ = self.target_model.predict(next_states) # Return a list [32x51, 32x51, 32x51]
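
To illustrate the point being raised: in Double DQN the online network chooses the greedy next action and the target network supplies the values for that action. A minimal NumPy sketch under assumed shapes follows; `ddqn_next_distribution` is a hypothetical helper, and the probability arrays stand in for the stacked outputs of `model.predict` and `target_model.predict`.

```python
import numpy as np


def ddqn_next_distribution(online_probs, target_probs, support):
    """Select next actions with the online net, read distributions from the target net.

    online_probs, target_probs: arrays of shape (action_size, batch, num_atoms).
    support: array of shape (num_atoms,).
    """
    q_online = np.sum(online_probs * support, axis=2)    # (action_size, batch)
    best_actions = np.argmax(q_online, axis=0)           # (batch,)
    batch_idx = np.arange(online_probs.shape[1])
    # Index the target output with the actions chosen by the online network.
    return best_actions, target_probs[best_actions, batch_idx]


# Toy data: 2 actions, batch of 1, 3 atoms on the support [0, 1, 2].
support = np.array([0.0, 1.0, 2.0])
online = np.array([[[0.0, 0.0, 1.0]],    # action 0, online Q = 2
                   [[1.0, 0.0, 0.0]]])   # action 1, online Q = 0
target = np.array([[[0.5, 0.5, 0.0]],    # action 0, target Q = 0.5
                   [[0.0, 0.0, 1.0]]])   # action 1, target Q = 2
best_actions, next_dist = ddqn_next_distribution(online, target, support)
print(best_actions, next_dist)
```

In this toy case the online network picks action 0, so the target distribution for action 0 is returned even though the target network alone would have preferred action 1.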

Issue when m_u equals m_l

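When updating the target distribution, if bj lands exactly on an atom then m_u == m_l == bj, so both weights (m_u - bj) and (bj - m_l) are 0 and the probability mass is dropped instead of being assigned to that atom.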

m_prob[action[i]][i][int(m_l)] += z_[optimal_action_idxs[i]][i][j] * (m_u - bj)
m_prob[action[i]][i][int(m_u)] += z_[optimal_action_idxs[i]][i][j] * (bj - m_l)

Easy fix is:

if m_u == m_l:
    m_prob[action[i]][i][int(m_l)] += z_[optimal_action_idxs[i]][i][j]
else:
    m_prob[action[i]][i][int(m_l)] += z_[optimal_action_idxs[i]][i][j] * (m_u - bj)
    m_prob[action[i]][i][int(m_u)] += z_[optimal_action_idxs[i]][i][j] * (bj - m_l)

The same applies when the target is a delta function (terminal state):

.... else:
    m_prob[action[i]][i][int(m_u)] += 1
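
The effect of the fix can be checked on a single distribution. The sketch below is a simplified, single-sample version of the categorical projection with the m_l == m_u branch added; `project_distribution` and its argument names are hypothetical and do not mirror the repository's batched code.

```python
import math
import numpy as np


def project_distribution(next_probs, reward, done, gamma, v_min, v_max):
    """Project reward + gamma * z onto a fixed support, keeping exact atom hits."""
    num_atoms = len(next_probs)
    delta_z = (v_max - v_min) / (num_atoms - 1)
    support = [v_min + j * delta_z for j in range(num_atoms)]
    m_prob = np.zeros(num_atoms)
    if done:
        # Terminal state: the target is a delta function at the reward.
        atoms = [(reward, 1.0)]
    else:
        atoms = [(reward + gamma * support[j], next_probs[j]) for j in range(num_atoms)]
    for value, mass in atoms:
        tz = min(v_max, max(v_min, value))   # clip to the support range
        bj = (tz - v_min) / delta_z          # fractional atom index
        m_l, m_u = math.floor(bj), math.ceil(bj)
        if m_l == m_u:
            # bj is exactly on an atom: assign the full mass to it.
            m_prob[m_l] += mass
        else:
            m_prob[m_l] += mass * (m_u - bj)
            m_prob[m_u] += mass * (bj - m_l)
    return m_prob


# Five atoms on [0, 4] with gamma = 1 and reward = 1: every shifted atom hits the grid.
shifted = project_distribution([0.2] * 5, 1.0, False, 1.0, 0.0, 4.0)
terminal_on_atom = project_distribution([0.2] * 5, 2.0, True, 1.0, 0.0, 4.0)
terminal_between = project_distribution([0.2] * 5, 2.5, True, 1.0, 0.0, 4.0)
print(shifted, terminal_on_atom, terminal_between)
```

Without the equality branch, the first two results would be all zeros, since every weight would be multiplied by 0.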

IndexError: index 2 is out of bounds for axis 0 with size 2

Traceback (most recent call last):
File "c51_ddqn.py", line 290, in
r_t = agent.shape_reward(r_t, misc, prev_misc, t)
File "c51_ddqn.py", line 123, in shape_reward
if (misc[2] < prev_misc[2]): # Loss HEALTH
IndexError: index 2 is out of bounds for axis 0 with size 2

Could someone kindly help me?
Thanks.
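
This traceback is consistent with the defend_the_center.cfg issue above: with only two game variables configured, `misc` has two entries and `misc[2]` does not exist. Besides editing the config, a defensive check could fail early with a clearer message. The sketch below uses a hypothetical helper, `check_game_variables`, and assumes the variable order KILLCOUNT, AMMO2, HEALTH from that issue.

```python
def check_game_variables(misc, expected=("KILLCOUNT", "AMMO2", "HEALTH")):
    """Return the game variables by name, or raise if too few were provided."""
    if len(misc) < len(expected):
        raise ValueError(
            "expected %d game variables %s but got %d; check "
            "available_game_variables in the scenario .cfg file"
            % (len(expected), list(expected), len(misc))
        )
    return dict(zip(expected, misc))


# Three variables, as the training script expects.
named = check_game_variables([1.0, 26.0, 100.0])
print(named["HEALTH"])

# Two variables, as in the newer default config: fails with a descriptive error.
try:
    check_game_variables([26.0, 100.0])
    raised = False
except ValueError as err:
    raised = True
    print(err)
```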
