flyyufelix / c51-ddqn-keras

C51-DDQN in Keras

Home Page: https://flyyufelix.github.io/2017/10/24/distributional-bellman.html

License: MIT License

Python 100.00%
keras reinforcement-learning vizdoom c51 distributional-bellman

c51-ddqn-keras's Issues

Issue with defend_the_center.cfg

The latest version of ViZDoom has this line in scenarios/defend_the_center.cfg:
available_game_variables = { AMMO2 HEALTH }

But your code expects three variables, so this line needs to be changed to:
available_game_variables = { KILLCOUNT AMMO2 HEALTH }

It's not a big deal, but I think it should be reflected in the README.

Additionally, the training process crashes unless the statistics directory has been created beforehand.
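A possible workaround for the missing directory, as a minimal sketch: create it before training starts. The helper name `ensure_dir` is hypothetical, and the path `statistics` is assumed to be relative to the directory the script is run from.

```python
import os

def ensure_dir(path):
    """Create `path` (and any missing parents) if it does not exist yet."""
    # exist_ok=True makes this safe to call on every run.
    os.makedirs(path, exist_ok=True)
    return path

# Assumption: the training script writes its statistics to ./statistics
ensure_dir("statistics")
```

Calling this once at the top of the training script should avoid the crash without requiring a manual step.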

For Distributional Dueling DDQN, what should the action advantage tower and state value tower look like?

Hi Felix,

I am studying your C51 code and trying to replicate Rainbow DQN, but I am unsure whether the action advantage tower should be:
"action_advantage = Lambda(lambda a: a[:, :, :] - K.mean(a[:, :, :], keepdims=True), output_shape=(action_size, z_atoms,))(action_advantage)"
or:
"action_advantage = Lambda(lambda a: a[:, :, :] - K.expand_dims(K.mean(a[:, :, :], axis=1), axis=1), output_shape=(self.action_size, self.z_atoms,))(action_advantage)"

Could you please give me a hand?
Thank you very much for your help.

def build_network(self, input_shape, action_size, algorithm=Algorithm.RAINBOW, network_type=NetworkType.RESIDUAL, z_atoms=51):
    inputs_x = x = Input(shape=(input_shape))

    x = Conv2D(filters=self.cnn_filter_num, kernel_size=self.cnn_filter_size, padding="same", data_format=self.data_format, kernel_regularizer=l2(self.l2_reg))(x)
    x = BatchNormalization(axis=1)(x)
    x = Activation("relu")(x)

    for _ in range(self.n_residual_block):
        in_x = x

        x = Conv2D(filters=self.cnn_filter_num, kernel_size=self.cnn_filter_size, padding="same", data_format=self.data_format, kernel_regularizer=l2(self.l2_reg), name="res"+str(_)+"_Conv1")(x)
        x = BatchNormalization(axis=1, name="res"+str(_)+"_batchnorm1")(x)
        x = Activation("relu")(x)
        x = Conv2D(filters=self.cnn_filter_num, kernel_size=self.cnn_filter_size, padding="same", data_format=self.data_format, kernel_regularizer=l2(self.l2_reg), name="res"+str(_)+"_Conv2")(x)
        x = BatchNormalization(axis=1, name="res"+str(_)+"_batchnorm2")(x)
        x = Add()([in_x, x])
        x = Activation("relu")(x)

    x = Flatten()(x)

    state_value = NoisyDense(self.noisydense_units, self.noisydense_init_sigma, self.noisydense_activation)(x)
    state_value = NoisyDense(1*z_atoms, self.noisydense_init_sigma, self.noisydense_activation_last)(state_value)
    state_value = Lambda(lambda s: K.expand_dims(s[:, :], axis=1), output_shape=(action_size, z_atoms,))(state_value)

    action_advantage = NoisyDense(self.noisydense_units, self.noisydense_init_sigma, self.noisydense_activation)(x)
    action_advantage = NoisyDense(action_size*z_atoms, self.noisydense_init_sigma, self.noisydense_activation_last)(action_advantage)
    action_advantage = Lambda(lambda a: K.reshape(a[:, :],[-1, action_size, z_atoms]), output_shape=(action_size, z_atoms,))(action_advantage)
    action_advantage = Lambda(lambda a: a[:, :, :] - K.mean(a[:, :, :], keepdims=True), output_shape=(action_size, z_atoms,))(action_advantage)

    state_action_value = merge([state_value, action_advantage], mode='sum')

    output_distribution_list = []
    for i_ in range(action_size):
        output_distribution_list.append(Lambda(lambda sa: sa[:,i_,:], output_shape=(z_atoms,))(state_action_value))

    model = Model(inputs=inputs_x, outputs=output_distribution_list)
    model.compile(loss='categorical_crossentropy', optimizer=rmsprop(lr=self.learning_rate))
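To make the difference between the two candidate Lambda expressions concrete, here is a small NumPy sketch. NumPy stands in for the Keras backend, and the function names and toy shape (batch 4, 3 actions, 51 atoms) are assumptions for illustration only. As far as I understand the dueling aggregation, the advantage is centred by subtracting its mean over the action axis, which is what the `axis=1` variant does for each atom. The variant without `axis` subtracts a single scalar computed over batch, actions, and atoms together.

```python
import numpy as np

def center_global(a):
    # Mirrors K.mean(a, keepdims=True) with no axis argument:
    # one scalar mean over batch, actions and atoms, shape (1, 1, 1).
    return a - a.mean(keepdims=True)

def center_over_actions(a):
    # Mirrors K.expand_dims(K.mean(a, axis=1), axis=1):
    # a separate mean over the action axis for every sample and atom.
    return a - np.expand_dims(a.mean(axis=1), axis=1)

rng = np.random.default_rng(0)
adv = rng.normal(size=(4, 3, 51))  # (batch, action_size, z_atoms), toy values

g = center_global(adv)
p = center_over_actions(adv)

# Only the per-action version leaves a zero mean across actions.
print(np.abs(p.mean(axis=1)).max())
print(np.abs(g.mean(axis=1)).max())
```

With the global mean, the centring also mixes samples within the batch, so one sample's output depends on the others. That is probably not what a dueling head is meant to do.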

DDQN compute target

Hey, nice implementation and blog post. I have a question about these two lines in the code:

z = self.model.predict(next_states) # Return a list [32x51, 32x51, 32x51]
z_ = self.model.predict(next_states) # Return a list [32x51, 32x51, 32x51]

Here z_ is used to compute the target, and z is used to pick the max-value action. But shouldn't the second line be:

z_ = self.target_model.predict(next_states) # Return a list [32x51, 32x51, 32x51]

??
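For reference, a minimal NumPy sketch of the Double DQN split the question describes: the online network's distributions choose the action, and the target network's distributions supply the values for that action. The function name, argument names, and toy numbers are hypothetical. This illustrates the idea and is not the repository's code.

```python
import numpy as np

def double_dqn_next_distributions(z_online, z_target, support):
    """Pick actions with the online net, read distributions from the target net.

    z_online, z_target: per-action lists of (batch, atoms) arrays,
    the shape described in the comments of the quoted lines.
    support: (atoms,) array of atom values.
    """
    z_online = np.asarray(z_online)  # (num_actions, batch, atoms)
    z_target = np.asarray(z_target)
    # Expected value of each action under the online network.
    q_online = (z_online * support).sum(axis=2)  # (num_actions, batch)
    best = q_online.argmax(axis=0)  # (batch,)
    batch_idx = np.arange(z_target.shape[1])
    # Distribution of the chosen action, taken from the target network.
    return best, z_target[best, batch_idx]  # (batch, atoms)

support = np.array([-1.0, 0.0, 1.0])
z_online = [np.array([[1.0, 0.0, 0.0]]), np.array([[0.0, 0.0, 1.0]])]
z_target = [np.array([[0.6, 0.3, 0.1]]), np.array([[0.2, 0.3, 0.5]])]
best, next_dist = double_dqn_next_distributions(z_online, z_target, support)
```

If both `z` and `z_` come from the same model, the selection and evaluation steps are no longer decoupled, which is the concern raised above.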

Issue when m_u equals m_l

When updating the target distribution, if m_u == m_l == bj, then both weights (m_u - bj) and (bj - m_l) are 0, so the probability mass for that atom is dropped.

m_prob[action[i]][i][int(m_l)] += z_[optimal_action_idxs[i]][i][j] * (m_u - bj)
m_prob[action[i]][i][int(m_u)] += z_[optimal_action_idxs[i]][i][j] * (bj - m_l)

An easy fix is:

if m_u == m_l:
    m_prob[action[i]][i][int(m_l)] += z_[optimal_action_idxs[i]][i][j]
else:
    m_prob[action[i]][i][int(m_l)] += z_[optimal_action_idxs[i]][i][j] * (m_u - bj)
    m_prob[action[i]][i][int(m_u)] += z_[optimal_action_idxs[i]][i][j] * (bj - m_l)

The same goes for the case where the target is a delta function (terminal state):

.... else:
m_prob[action[i]][i][int(m_u)] += 1
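The effect of the fix can be checked on a single distribution. The following is a standalone sketch of the categorical projection with the `m_l == m_u` branch. The function name, the argument names, and the toy support (5 atoms on [0, 4]) are assumptions chosen so that every shifted atom lands exactly on a grid point.

```python
import math
import numpy as np

def project_distribution(probs, reward, gamma, v_min, v_max):
    """Project the shifted distribution back onto the fixed support."""
    num_atoms = len(probs)
    delta_z = (v_max - v_min) / (num_atoms - 1)
    support = v_min + delta_z * np.arange(num_atoms)
    m = np.zeros(num_atoms)
    for j in range(num_atoms):
        # Shift and clip the j-th atom, then locate it on the grid.
        tz = min(v_max, max(v_min, reward + gamma * support[j]))
        bj = (tz - v_min) / delta_z
        m_l, m_u = math.floor(bj), math.ceil(bj)
        if m_l == m_u:
            # bj sits exactly on an atom: both interpolation weights
            # would be 0, so assign the full mass directly.
            m[m_l] += probs[j]
        else:
            m[m_l] += probs[j] * (m_u - bj)
            m[m_u] += probs[j] * (bj - m_l)
    return m

probs = np.array([0.1, 0.2, 0.3, 0.25, 0.15])
on_grid = project_distribution(probs, reward=1.0, gamma=1.0, v_min=0.0, v_max=4.0)
off_grid = project_distribution(probs, reward=0.5, gamma=1.0, v_min=0.0, v_max=4.0)
print(on_grid.sum(), off_grid.sum())
```

Without the equality branch, the `on_grid` case would sum to 0 instead of 1, because every atom hits the situation described in this issue.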

IndexError: index 2 is out of bounds for axis 0 with size 2

Traceback (most recent call last):
  File "c51_ddqn.py", line 290, in <module>
    r_t = agent.shape_reward(r_t, misc, prev_misc, t)
  File "c51_ddqn.py", line 123, in shape_reward
    if (misc[2] < prev_misc[2]): # Loss HEALTH
IndexError: index 2 is out of bounds for axis 0 with size 2

Could someone kindly help me?
Thanks.
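This looks like the same mismatch described in the defend_the_center.cfg issue above: the traceback reads index 2 as HEALTH, but only two game variables are available. Assuming that is the cause, a small guard like the sketch below would fail with a clearer message. The helper name and the expected variable order are assumptions, with the order taken from the cfg line quoted in that issue.

```python
import numpy as np

# Assumed order, from: available_game_variables = { KILLCOUNT AMMO2 HEALTH }
EXPECTED_VARIABLES = ("KILLCOUNT", "AMMO2", "HEALTH")

def check_game_variables(misc, expected=EXPECTED_VARIABLES):
    """Return the game variables by name, or raise a descriptive error."""
    misc = np.asarray(misc)
    if misc.shape[0] != len(expected):
        raise ValueError(
            "expected %d game variables %s but got %d; check "
            "available_game_variables in the scenario .cfg"
            % (len(expected), expected, misc.shape[0])
        )
    return dict(zip(expected, misc.tolist()))

variables = check_game_variables([3, 20, 100])
print(variables["HEALTH"])
```

Calling such a check once after the first game state is read would point directly at the config file instead of surfacing later as an IndexError inside shape_reward.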
