flyyufelix / c51-ddqn-keras

C51-DDQN in Keras

Home Page: https://flyyufelix.github.io/2017/10/24/distributional-bellman.html

License: MIT License

Python 100.00%
keras reinforcement-learning vizdoom c51 distributional-bellman

c51-ddqn-keras's Issues

For distributional dueling DDQN, what should the action advantage tower and state value tower look like?

Hi Felix,

I am studying your C51 code and trying to replicate Rainbow DQN, but I am unsure whether the action advantage tower should be:
"action_advantage = Lambda(lambda a: a[:, :, :] - K.mean(a[:, :, :], keepdims=True), output_shape=(action_size, z_atoms,))(action_advantage)"
or:
"action_advantage = Lambda(lambda a: a[:, :, :] - K.expand_dims(K.mean(a[:, :, :], axis=1), axis=1), output_shape=(self.action_size, self.z_atoms,))(action_advantage)"

Could you please give me a hand?
Thank you for your help.

`

def build_network(self, input_shape, action_size, algorithm=Algorithm.RAINBOW, network_type=NetworkType.RESIDUAL, z_atoms=51):

    inputs_x = x = Input(shape=(input_shape))

    x = Conv2D(filters=self.cnn_filter_num, kernel_size=self.cnn_filter_size, padding="same", data_format=self.data_format, kernel_regularizer=l2(self.l2_reg))(x)
    x = BatchNormalization(axis=1)(x)
    x = Activation("relu")(x)

    for _ in range(self.n_residual_block):
        in_x = x

        x = Conv2D(filters=self.cnn_filter_num, kernel_size=self.cnn_filter_size, padding="same", data_format=self.data_format, kernel_regularizer=l2(self.l2_reg), name="res"+str(_)+"_Conv1")(x)
        x = BatchNormalization(axis=1, name="res"+str(_)+"_batchnorm1")(x)
        x = Activation("relu")(x)
        x = Conv2D(filters=self.cnn_filter_num, kernel_size=self.cnn_filter_size, padding="same", data_format=self.data_format, kernel_regularizer=l2(self.l2_reg), name="res"+str(_)+"_Conv2")(x)
        x = BatchNormalization(axis=1, name="res"+str(_)+"_batchnorm2")(x)
        x = Add()([in_x, x])
        x = Activation("relu")(x)

    x = Flatten()(x)

    state_value = NoisyDense(self.noisydense_units, self.noisydense_init_sigma, self.noisydense_activation)(x)
    state_value = NoisyDense(1*z_atoms, self.noisydense_init_sigma, self.noisydense_activation_last)(state_value)
    state_value = Lambda(lambda s: K.expand_dims(s[:, :], axis=1), output_shape=(action_size, z_atoms,))(state_value)

    action_advantage = NoisyDense(self.noisydense_units, self.noisydense_init_sigma, self.noisydense_activation)(x)
    action_advantage = NoisyDense(action_size*z_atoms, self.noisydense_init_sigma, self.noisydense_activation_last)(action_advantage)
    action_advantage = Lambda(lambda a: K.reshape(a[:, :],[-1, action_size, z_atoms]), output_shape=(action_size, z_atoms,))(action_advantage)
    action_advantage = Lambda(lambda a: a[:, :, :] - K.mean(a[:, :, :], keepdims=True), output_shape=(action_size, z_atoms,))(action_advantage)

    state_action_value = merge([state_value, action_advantage], mode='sum')

    output_distribution_list = []
    for i_ in range(action_size):
        output_distribution_list.append(Lambda(lambda sa: sa[:,i_,:], output_shape=(z_atoms,))(state_action_value))

    model = Model(inputs=inputs_x, outputs=output_distribution_list)
    model.compile(loss='categorical_crossentropy', optimizer=rmsprop(lr=self.learning_rate))`
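
For comparison, the two candidate expressions differ in what they average over. Below is a minimal NumPy sketch, not the repository's code: the tensor shape (batch, action_size, z_atoms) and the helper names `center_global` and `center_per_action` are assumptions for illustration. Without an axis argument, the mean is a single scalar taken over the whole batch, all actions and all atoms. With axis=1, the mean is taken over actions only, separately for each sample and each atom, which appears to be what the usual dueling formulation calls for.

```python
import numpy as np

def center_global(a):
    # Mirrors K.mean(a, keepdims=True): one scalar mean over batch, actions and atoms.
    return a - a.mean(keepdims=True)

def center_per_action(a):
    # Mirrors K.expand_dims(K.mean(a, axis=1), axis=1): mean over the action axis only.
    return a - a.mean(axis=1, keepdims=True)

rng = np.random.default_rng(0)
advantage = rng.normal(size=(4, 3, 51))  # hypothetical (batch, action_size, z_atoms)

global_centered = center_global(advantage)
action_centered = center_per_action(advantage)

# Per sample and per atom, the action-centered advantages average to zero over actions.
print(np.allclose(action_centered.mean(axis=1), 0.0))
# The globally centered version generally does not.
print(np.allclose(global_centered.mean(axis=1), 0.0))
```

With the global mean, the value subtracted from one sample depends on every other sample in the batch, so the same state can produce different outputs in different batches.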

Issue with defend_the_center.cfg

The latest version of ViZDoom has this line in scenarios/defend_the_center.cfg:
available_game_variables = { AMMO2 HEALTH }

But your code expects three variables, so this line needs to be changed to:
available_game_variables = { KILLCOUNT AMMO2 HEALTH }

It's not a big deal, but I think it should be reflected in the README.

Additionally, the training process crashes if the statistics directory has not been created beforehand.
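
One way to avoid the second crash is to create the directory at startup. This is a minimal sketch using only the standard library. The helper name `ensure_dir` is hypothetical, and the relative path "statistics" is an assumption based on the directory name mentioned above.

```python
import os

def ensure_dir(path):
    # Create the directory (and any parents) if missing; do nothing if it already exists.
    os.makedirs(path, exist_ok=True)
    return path

# Example usage before training starts (path is an assumption):
# stats_dir = ensure_dir("statistics")
```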

DDQN compute target

Hey, nice implementation and blog post. I have a question about the code, for these 2 lines:

z = self.model.predict(next_states) # Return a list [32x51, 32x51, 32x51]
z_ = self.model.predict(next_states) # Return a list [32x51, 32x51, 32x51]

Here z_ is used to compute the target, and z is used to pick the max-value action. But shouldn't the second line be:

z_ = self.target_model.predict(next_states) # Return a list [32x51, 32x51, 32x51]

??
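
For reference, Double DQN as usually described selects the next action with the online network and evaluates it with the target network. Below is a minimal NumPy sketch of that split for categorical distributions. It is not the repository's code: the function name `select_and_evaluate`, the stacked array layout (num_actions, batch, num_atoms) and the toy inputs are assumptions for illustration.

```python
import numpy as np

def select_and_evaluate(z_online, z_target, support):
    """Pick actions with the online distributions, read them from the target ones.

    z_online, z_target: arrays of shape (num_actions, batch, num_atoms).
    support: array of shape (num_atoms,) holding the atom values.
    """
    # Expected value of each action under the online network: (num_actions, batch).
    q_online = np.sum(z_online * support, axis=2)
    # Greedy action per sample, chosen by the online network: (batch,).
    best = np.argmax(q_online, axis=0)
    # Distribution of that action according to the target network: (batch, num_atoms).
    batch_idx = np.arange(z_target.shape[1])
    return best, z_target[best, batch_idx]

support = np.array([-1.0, 0.0, 1.0])
z_online = np.array([[[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]],   # action 0
                     [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]])  # action 1
z_target = np.array([[[0.1, 0.2, 0.7], [0.3, 0.3, 0.4]],
                     [[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]]])

best, z_next = select_and_evaluate(z_online, z_target, support)
```

If both arrays come from the same network, this reduces to ordinary greedy selection and evaluation by one network, which is the behavior the question describes.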

Issue when m_u equals m_l

When updating the target distribution, if m_u == m_l == bj, then both weights (m_u - bj) and (bj - m_l) are 0, so the probability mass for that atom is dropped.

m_prob[action[i]][i][int(m_l)] += z_[optimal_action_idxs[i]][i][j] * (m_u - bj)
m_prob[action[i]][i][int(m_u)] += z_[optimal_action_idxs[i]][i][j] * (bj - m_l)

An easy fix is:

if m_u == m_l:
    m_prob[action[i]][i][int(m_l)] += z_[optimal_action_idxs[i]][i][j]
else:
    m_prob[action[i]][i][int(m_l)] += z_[optimal_action_idxs[i]][i][j] * (m_u - bj)
    m_prob[action[i]][i][int(m_u)] += z_[optimal_action_idxs[i]][i][j] * (bj - m_l)

The same goes for the case where the target is a delta function (terminal state):

.... else:
    m_prob[action[i]][i][int(m_u)] += 1
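
The guard can be checked in isolation with a small single-sample projection. This is a minimal sketch under stated assumptions rather than the repository's code: the function name `project_distribution`, its arguments and the three-atom support are hypothetical, chosen so that some projected atoms land exactly on a support point.

```python
import math
import numpy as np

def project_distribution(p_next, reward, done, support, gamma=0.99):
    """Project the shifted next-state distribution back onto a fixed support."""
    num_atoms = len(support)
    v_min, v_max = support[0], support[-1]
    delta_z = (v_max - v_min) / (num_atoms - 1)
    m = np.zeros(num_atoms)
    if done:
        # Terminal state: the target is a delta function at the reward.
        atoms = [(reward, 1.0)]
    else:
        atoms = [(reward + gamma * z, p) for z, p in zip(support, p_next)]
    for tz, p in atoms:
        tz = min(v_max, max(v_min, tz))
        bj = (tz - v_min) / delta_z
        m_l, m_u = math.floor(bj), math.ceil(bj)
        if m_l == m_u:
            # bj falls exactly on an atom: assign all of the mass to it.
            m[m_l] += p
        else:
            m[m_l] += p * (m_u - bj)
            m[m_u] += p * (bj - m_l)
    return m

support = np.linspace(-1.0, 1.0, 3)  # atoms at -1, 0, 1
terminal = project_distribution(None, reward=1.0, done=True, support=support)
shifted = project_distribution([0.2, 0.3, 0.5], reward=0.0, done=False,
                               support=support, gamma=0.5)
```

In both calls the projected probabilities sum to 1. Without the `m_l == m_u` branch, the terminal case here would return all zeros.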

IndexError: index 2 is out of bounds for axis 0 with size 2

Traceback (most recent call last):
File "c51_ddqn.py", line 290, in
r_t = agent.shape_reward(r_t, misc, prev_misc, t)
File "c51_ddqn.py", line 123, in shape_reward
if (misc[2] < prev_misc[2]): # Loss HEALTH
IndexError: index 2 is out of bounds for axis 0 with size 2

Could someone kindly help me?
Thanks.
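
This traceback looks consistent with the configuration mismatch described in the defend_the_center.cfg issue: the game state carries only two variables, while the code indexes a third. A guard like the following can make the cause clearer. It is a minimal sketch: the helper name `check_game_variables` is hypothetical, and the variable order is assumed from the config line quoted in that issue.

```python
import numpy as np

# Assumed order, taken from: available_game_variables = { KILLCOUNT AMMO2 HEALTH }
EXPECTED_GAME_VARIABLES = ("KILLCOUNT", "AMMO2", "HEALTH")

def check_game_variables(misc):
    # Fail early with a readable message instead of an IndexError deep in reward shaping.
    misc = np.asarray(misc)
    if misc.shape[0] < len(EXPECTED_GAME_VARIABLES):
        raise ValueError(
            "expected game variables %s but got %d value(s); "
            "check available_game_variables in the scenario .cfg"
            % (" ".join(EXPECTED_GAME_VARIABLES), misc.shape[0])
        )
    return misc
```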
