kuza55 / keras-extras

Extra batteries for Keras

License: Apache License 2.0

Python 100.00%

keras-extras's Issues

Share weights

I am not sure whether the script shares the weights across towers. If it does not, won't simply splitting the data across towers train each tower separately?

For reference, the tutorial at https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10_multi_gpu_train.py
calls "tf.get_variable_scope().reuse_variables()" (line 175) to make sure the weights are reused.

Thanks!

InvalidArgumentError: Incompatible shapes: [4096] vs. [8192]

I wrote a U-Net based on a sample and I am running into an incompatible-shapes error. The U-Net works fine for inputs with 3 channels (the sample I started from) but not for inputs with 1 channel (my inputs). The network structure is given below. The input shape is 32x32x1, with a batch size of 4 and 10 epochs. If I run it with input shape 64x64x1 and a batch size of 8 (still 10 epochs), I get Incompatible shapes: [32768] vs. [65536]. The error happens at history_unet, when I fit the model.
Could you please help me? I am new to U-Net and don't know what is causing this. I have read similar issues found through Google, but none of the fixes work for me. I am using the TensorFlow backend.

The error is "tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [4096] vs. [8192]"

def UNet(n_input_channels, n_output_channels):
     from keras.layers import Input, Dropout, UpSampling2D, MaxPooling2D, BatchNormalization, Conv2D, Concatenate
     from keras.models import Model

     inputs = Input((None, None, n_input_channels))
     conv1 = Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(inputs)
     conv1 = BatchNormalization()(conv1)
     conv1 = Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv1)
     conv1 = BatchNormalization()(conv1)
     pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
     conv2 = Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(pool1)
     conv2 = BatchNormalization()(conv2)
     conv2 = Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv2)
     conv2 = BatchNormalization()(conv2)
     pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
     conv3 = Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(pool2)
     conv3 = BatchNormalization()(conv3)
     conv3 = Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv3)
     conv3 = BatchNormalization()(conv3)
     pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)
     conv4 = Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(pool3)
     conv4 = BatchNormalization()(conv4)
     conv4 = Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv4)
     conv4 = BatchNormalization()(conv4)
     drop4 = Dropout(0.5)(conv4)
     pool4 = MaxPooling2D(pool_size=(2, 2))(drop4)

     conv5 = Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(pool4)
     conv5 = BatchNormalization()(conv5)
     conv5 = Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv5)
     conv5 = BatchNormalization()(conv5)
     drop5 = Dropout(0.5)(conv5)

     up6 = Conv2D(512, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(UpSampling2D(size = (2,2))(drop5))
     merge6 = Concatenate(axis=-1)([conv4,up6])
     conv6 = Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(merge6)
     conv6 = BatchNormalization()(conv6)
     conv6 = Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv6)
     conv6 = BatchNormalization()(conv6)

     up7 = Conv2D(256, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(UpSampling2D(size = (2,2))(conv6))
     merge7 = Concatenate(axis=-1)([conv3,up7])
     conv7 = Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(merge7)
     conv7 = BatchNormalization()(conv7)
     conv7 = Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv7)
     conv7 = BatchNormalization()(conv7)

     up8 = Conv2D(128, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(UpSampling2D(size = (2,2))(conv7))
     merge8 = Concatenate(axis=-1)([conv2,up8])
     conv8 = Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(merge8)
     conv8 = BatchNormalization()(conv8)
     conv8 = Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv8)
     conv8 = BatchNormalization()(conv8)

     up9 = Conv2D(64, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(UpSampling2D(size = (2,2))(conv8))
     merge9 = Concatenate(axis=-1)([conv1,up9])
     conv9 = Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(merge9)
     conv9 = BatchNormalization()(conv9)
     conv9 = Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv9)
     conv9 = BatchNormalization()(conv9)
     conv9 = Conv2D(2, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv9)
     conv9 = BatchNormalization()(conv9)
     conv10 = Conv2D(n_output_channels, 1, activation = 'softmax')(conv9)

     return Model(inputs = inputs, outputs = conv10)

model_unet = UNet(n_input_channels=1, n_output_channels=2)
model_unet.compile(optimizer='adam', loss=dice_coef_loss, metrics=[dice_coef]) 
history_unet = model_unet.fit(x_train, y_train, batch_size=batchsize,epochs=epochs,verbose=1, shuffle=True,validation_data=(x_validation, y_validation))
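
The two sizes in each error differ by exactly a factor of two, which matches the ratio between a 2-channel softmax output and a 1-channel label mask. A minimal sketch of that arithmetic, assuming (neither is shown in the issue) that dice_coef_loss flattens y_true and y_pred before combining them and that y_train has a single channel:

```python
import numpy as np

batch_size, height, width = 4, 32, 32

# Assumed label layout: one channel per pixel (not confirmed by the issue).
y_true = np.zeros((batch_size, height, width, 1), dtype=np.float32)
# Model output: n_output_channels=2 softmax channels per pixel.
y_pred = np.zeros((batch_size, height, width, 2), dtype=np.float32)

# Flattening, as many dice-loss implementations do, exposes the mismatch.
flat_true = y_true.reshape(-1)
flat_pred = y_pred.reshape(-1)
print(flat_true.shape, flat_pred.shape)  # (4096,) (8192,)
```

If that assumption holds, one-hot encoding the masks to two channels, or using a single sigmoid output channel, would make the flattened sizes agree. The same arithmetic gives 32768 vs. 65536 for a batch of 8 images of 64x64.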

Invalid argument error

The code below produces an error:
x_test_pred = np.argmax(classifier.predict(x_test[:47]), axis=1)
nb_correct_pred = np.sum(x_test_pred == np.argmax(y_test[:47], axis=1))

print("Original test data (first 100 images):")
print("Correctly classified: {}".format(nb_correct_pred))
print("Incorrectly classified: {}".format(47-nb_correct_pred))

The error is:
InvalidArgumentError Traceback (most recent call last)
in
1 #print(x_test)
----> 2 x_test_pred = np.argmax(classifier.predict(x_test[:47]), axis=1)
3 nb_correct_pred = np.sum(x_test_pred == np.argmax(y_test[:47], axis=1))
4
5 print("Original test data (first 100 images):")

~/Documents/adversarial-robustness-toolbox-master/art/classifiers/keras.py in predict(self, x, logits, batch_size)
220 for b in range(int(np.ceil(x_.shape[0] / float(batch_size)))):
221 begin, end = b * batch_size, min((b + 1) * batch_size, x_.shape[0])
--> 222 preds[begin:end] = self.preds([x[begin:end]])[10]
223
224 if not logits and not self._custom_activation:

~/.local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in call(self, inputs)
2713 return self._legacy_call(inputs)
2714
-> 2715 return self._call(inputs)
2716 else:
2717 if py_any(is_tensor(x) for x in inputs):

~/.local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in _call(self, inputs)
2673 fetched = self._callable_fn(*array_vals, run_metadata=self.run_metadata)
2674 else:
-> 2675 fetched = self._callable_fn(*array_vals)
2676 return fetched[:len(self.outputs)]
2677

~/Downloads/ENTER/envs/gans/lib/python3.6/site-packages/tensorflow/python/client/session.py in call(self, *args)
1452 else:
1453 return tf_session.TF_DeprecatedSessionRunCallable(
-> 1454 self._session._session, self._handle, args, status, None)
1455
1456 def del(self):

~/Downloads/ENTER/envs/gans/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in exit(self, type_arg, value_arg, traceback_arg)
517 None, None,
518 compat.as_text(c_api.TF_Message(self.status.status)),
--> 519 c_api.TF_GetCode(self.status.status))
520 # Delete the underlying status object from memory otherwise it stays alive
521 # as there is a reference to status from this from the traceback due to

InvalidArgumentError: Matrix size-incompatible: In[0]: [47,2048], In[1]: [100352,1024]
[[Node: dense_1_1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](flatten_1_1/Reshape, dense_1_1/kernel/read)]]
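
The numbers in the message can be decomposed to see where the mismatch may come from. The dense kernel has 100352 rows, which equals 7 x 7 x 2048, while each test image reaches the dense layer with only 2048 features, i.e. 1 x 1 x 2048. The sketch below only checks that arithmetic; the 7x7 and 1x1 spatial sizes are one factorization consistent with the error, not something the issue confirms.

```python
def flattened_features(height, width, channels):
    """Number of features a Flatten layer emits for one (H, W, C) feature map."""
    return height * width * channels

kernel_rows = 100352     # rows of In[1] in the error message
actual_features = 2048   # columns of In[0] in the error message

print(flattened_features(7, 7, 2048))  # 100352
print(flattened_features(1, 1, 2048))  # 2048
```

If this reading is right, the classifier was built for larger input images than the ones in x_test, so the test images would need to match the resolution the model was built with.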

accuracy drop

Hi @kuza55,
When I train on a single GPU, the model reaches a training accuracy of 99.99%. But when I use make_parallel, the training accuracy gets stuck at 96%.

Minimum Loss:
Single GPU: 0.0063
Multi GPU: 0.1213
The loss is also not dropping much.

I am training a multi-label classifier: ResNet-50 with sigmoid layers at the end and binary cross-entropy loss.

Is there a method to transform a multi-GPU model back to a single-GPU model?

I started using the multi-GPU method and found that it speeds up training. However, how can I convert the model back to a single-GPU version, since we may need to deploy it on single-GPU machines?

AttributeError: 'Node' object has no attribute 'output_masks'

I'm getting the above-mentioned error on this line of the code:
slice_n = Lambda(get_slice, output_shape=input_shape, arguments={'idx':i,'parts':gpu_count})(x)

Here is my model:

from keras.layers import (Input, Conv1D, Bidirectional, LSTM, Dropout, Dense,
                          TimeDistributed, Flatten, Activation, Dot)
from keras.models import Model, Sequential
from keras.optimizers import Adam
from keras.regularizers import l2

def build_model(shape, params):

    input_dim, hidden_units, num_classes = shape
    audio_sequence = Input(shape=(None,input_dim),dtype='float32',name='audio')

    encode = _Encoding(input_dim, hidden_units, hidden_units, params['kernel_size'], params['dropout'], params['L2'])
    hide = _Hidden(2*hidden_units, params['dropout'], params['L2'])
    attend = _Attention(2*hidden_units)
    classify = _Classifier(2*hidden_units, num_classes, params['dropout'], params['L2'])

    encoding = encode(audio_sequence)
    hidden = hide(encoding)
    attention_weight = attend(hidden)
    align = _align(encoding, attention_weight)
    scores = classify(align)

    model = Model(inputs=[audio_sequence], outputs=[scores])
    model.compile(optimizer=Adam(lr=params['lr']),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

class _Encoding(object):

    def __init__(self, input_dim, num_filters, hidden_units, kernel_size, dropout=0.0, L2=10**-5):

        self.model = Sequential()
        self.model.add(Conv1D(num_filters,
                              kernel_size,
                              padding='same',
                              activation='relu',
                              kernel_regularizer=l2(L2),
                              input_shape=(None, input_dim)
                              ))
        self.model.add(Bidirectional(LSTM(hidden_units,
                                          dropout=dropout,
                                          recurrent_dropout=dropout,
                                          return_sequences=True,
                                          unroll=False,
                                          )))

    def __call__(self, sequence):
        return self.model(sequence)

class _Hidden(object):

    def __init__(self, hidden_units, dropout=0.0, L2=10**-5):

        self.model = Sequential()
        self.model.add(Dropout(dropout, input_shape=(hidden_units,)))
        self.model.add(Dense(hidden_units,
                             activation='tanh',
                             name='attend',
                             kernel_initializer='he_normal',
                             kernel_regularizer=l2(L2),
                             ))
        self.model = TimeDistributed(self.model)

    def __call__(self, encoding):
        return self.model(encoding)

class _Attention(object):

    def __init__(self, input_dim):

        self.model = Sequential()
        self.model.add(Dense(1,
                             activation=None,
                             use_bias=False,
                             kernel_initializer='he_normal',
                             input_shape=(input_dim,),
                             ))
        self.model = TimeDistributed(self.model)

        self.output = Sequential()
        self.output.add(Flatten())
        self.output.add(Activation('softmax'))

    def __call__(self, encoding):
        return self.output(self.model(encoding))

def _align(encoding, attention_weight):

    return Dot((1,1), normalize=False)([encoding, attention_weight])

class _Classifier(object):

    def __init__(self, hidden_units, output_dim, dropout=0.0, L2=10**-5):

        self.model = Sequential()
        self.model.add(Dropout(dropout, input_shape=(hidden_units,)))
        self.model.add(Dense(hidden_units,
                             activation='relu',
                             name='classifier_hidden',
                             kernel_initializer='he_normal',
                             kernel_regularizer=l2(L2),
                             ))
        self.model.add(Dense(output_dim,
                             activation='softmax',
                             name='scores',
                             kernel_initializer='he_normal',
                             kernel_regularizer=l2(L2),
                             ))

    def __call__(self, align):
        return self.model(align)

def test_build():

    shape = (26, 100, 5)
    params = {}
    params['dropout'] = 0.0
    params['L2'] = 10**-5
    params['kernel_size'] = 5
    params['lr'] = 0.001
    return build_model(shape, params)

model = test_build()
Can someone please help me?

multi_gpu.make_parallel() produce incompatible shapes

Understanding gradient flow

Hi!

Can you please explain whether the weights of the model are the same on all GPUs after the first batch?
As I understand it, we make a copy of the model on each GPU (at what line does that happen exactly?) and then compute gradients on all GPUs separately, each with a different slice of the batch. That would mean each GPU updates its model weights differently, unless the copies of the model across GPUs are synced somehow. Does that happen somewhere inside Keras?

In the TensorFlow multi-GPU example, gradients are calculated on each GPU separately (one slice each), then averaged, and then used to update shared weights. Looking at the variable scopes and their usages, we can see that the model weights are shared across GPUs. But in the case of Keras it's not obvious.

link to TF example: https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10_multi_gpu_train.py
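
The scheme described for the TensorFlow example (per-slice gradients, averaged, applied to one shared set of weights) can be checked numerically. The sketch below uses NumPy and a linear model with a mean-squared-error loss; all names are hypothetical, and it illustrates the averaging argument only, not what make_parallel does internally.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))      # one batch of 8 samples
y = rng.normal(size=(8, 1))
w = rng.normal(size=(3, 1))      # a single set of weights shared by all towers

def mse_grad(x_part, y_part, weights):
    """Gradient of mean((x @ w - y) ** 2) with respect to w."""
    residual = x_part @ weights - y_part
    return 2.0 * x_part.T @ residual / len(x_part)

# Gradient computed on the whole batch at once.
full_grad = mse_grad(x, y, w)

# Two "towers", each seeing an equal slice of the batch with the same weights.
tower_grads = [mse_grad(x[:4], y[:4], w), mse_grad(x[4:], y[4:], w)]
averaged = np.mean(tower_grads, axis=0)

print(np.allclose(full_grad, averaged))  # True
```

The equality relies on the slices being the same size and on every tower reading the same weights; if each GPU kept and updated its own copy, the copies would drift apart after the first step.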

Incompatible shapes

I am running make_parallel with 2 GPUs, the error occurred with gradients/sub_grad/BroadcastGradientArgs:
"InvalidArgumentError (see above for traceback): Incompatible shapes: [483,1] vs. [482,1]
[[Node: gradients/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@sub"], _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/sub_grad/Shape, gradients/sub_grad/Shape_1/_79)]]"
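
The two shapes differ by one row, which is what floor-division slicing produces when the batch size is not divisible by the number of GPUs: a batch of 483 gives two slices of 241 rows, so the merged prediction has 482 rows while the targets still have 483. The sketch below reproduces this with NumPy; get_slice_np is a hypothetical stand-in, and that this is the cause of the reported error is an assumption.

```python
import numpy as np

def get_slice_np(data, idx, parts):
    """NumPy stand-in for slicing a batch by floor division (remainder dropped)."""
    size = data.shape[0] // parts
    start = size * idx
    return data[start:start + size]

batch = np.arange(483).reshape(483, 1)   # e.g. a final, odd-sized batch
parts = 2
merged = np.concatenate(
    [get_slice_np(batch, i, parts) for i in range(parts)], axis=0
)
print(batch.shape, merged.shape)  # (483, 1) (482, 1)
```

Under that assumption, choosing a number of samples and a batch size that are both divisible by the GPU count keeps every batch, including the last one, free of a remainder.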

The speed of using multi gpus

I use TensorFlow as the backend, and I use multi_gpu.py for multi-GPU training. However, I find that training with two GPUs is almost the same speed as with one GPU. Besides, when I use one GPU, its utilization is almost 100%, but with two GPUs the utilization of each GPU is about 40%-60%. How can I solve this problem?

My environment:
CPU: 40x Intel E5-2630 v4
Mem: 384GB
GPU: 4x NVIDIA GTX 1080 Ti

Cannot call `load_model` on network trained using `multi_gpu`.

I came across your post on Medium and was instantly hooked. Nice job!

I've been developing a series of deep learning experiments that use only a single GPU and decided to switch them over to a multi-GPU setting. After training, the models are serialized to disk via model.save.

However, when I try to call load_model to load the pre-trained network from disk, I get an error:

[INFO] loading model...
Traceback (most recent call last):
  File "rank_accuracy.py", line 28, in 
    model = load_model(config.MODEL_PATH)
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/models.py", line 140, in load_model
    model = model_from_config(model_config, custom_objects=custom_objects)
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/models.py", line 189, in model_from_config
    return layer_from_config(config, custom_objects=custom_objects)
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/utils/layer_utils.py", line 34, in layer_from_config
    return layer_class.from_config(config['config'])
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/engine/topology.py", line 2395, in from_config
    process_layer(layer_data)
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/engine/topology.py", line 2390, in process_layer
    layer(input_tensors[0])
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/engine/topology.py", line 517, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/engine/topology.py", line 571, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/engine/topology.py", line 155, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
  File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/layers/core.py", line 587, in call
    return self.function(x, **arguments)
  File "/home/ubuntu/deep-learning-book/dataset_to_hdf5/multi_gpu.py", line 9, in get_slice
    shape = tf.shape(data)
NameError: global name 'tf' is not defined

Looking at multi_gpu.py it's clear that TensorFlow is imported so I'm not sure why the error is being generated.
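
One plausible explanation, stated here as an assumption rather than a confirmed diagnosis: Keras stores a Lambda layer's function as serialized bytecode and rebuilds it on load with a different set of globals, so the module-level tf import in multi_gpu.py is no longer visible to get_slice. The standard-library sketch below reproduces that mechanism with marshal, using math in place of tf; it does not call Keras, and all function names are hypothetical.

```python
import builtins
import marshal
import math
import types

def area(radius):
    # Relies on the module-level name `math`, like get_slice relies on `tf`.
    return math.pi * radius * radius

def area_local_import(radius):
    # Importing inside the function removes the dependency on module globals.
    import math
    return math.pi * radius * radius

def rebuild(func):
    """Round-trip a function through its bytecode, discarding its original globals."""
    code = marshal.loads(marshal.dumps(func.__code__))
    return types.FunctionType(code, {"__builtins__": builtins}, func.__name__)

try:
    rebuild(area)(1.0)
    error_message = None
except NameError as exc:
    error_message = str(exc)

print(error_message)                    # reports that 'math' is not defined
print(rebuild(area_local_import)(1.0))  # 3.141592653589793
```

If the assumption holds, the analogous change would be to place import tensorflow as tf inside get_slice, or to rebuild the model in code and load only the weights instead of calling load_model.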

data shape problem

Hi kuza,

I tried the example posted by you in https://medium.com/@kuza55/transparent-multi-gpu-training-on-tensorflow-with-keras-8b0016fd9012#.oj738zfd4

but I get the following error. Could you please tell me what is causing it? Thanks.

Traceback (most recent call last):
File "train-kai.py", line 135, in
t()
File "train-kai.py", line 72, in t
model = make_parallel(model, 4)
File "train-kai.py", line 44, in make_parallel
slice_n = Lambda(get_slice, output_shape=input_shape, arguments={'idx':i,'parts':gpu_count})(x)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 572, in call
self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 635, in add_inbound_node
Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 166, in create_node
output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
File "/usr/local/lib/python2.7/dist-packages/keras/layers/core.py", line 641, in call
return self.function(x, **arguments)
File "train-kai.py", line 23, in get_slice
size = tf.concat(0, [ shape[:1] // parts, shape[1:] ])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 1030, in concat
).assert_is_compatible_with(tensor_shape.scalar())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_shape.py", line 735, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (2, 1) and () are incompatible
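
The failing call passes the axis first: tf.concat(0, [...]). Starting with TensorFlow 1.0, tf.concat takes the list of tensors first and the axis second, so with a newer TensorFlow the 0 is treated as the values and the list as the axis, which is consistent with the shape complaint above. The NumPy sketch below uses the values-first order to show what size is meant to contain; the batch shape is hypothetical, and the stride and start lines are an assumed continuation of get_slice that the traceback does not show.

```python
import numpy as np

shape = np.array([8, 32, 32, 3])  # hypothetical batch shape
parts, idx = 4, 1

# Values first, axis second: the argument order tf.concat uses since TF 1.0.
size = np.concatenate([shape[:1] // parts, shape[1:]], axis=0)

# Assumed continuation: step only along the batch axis to find the slice start.
stride = np.concatenate([shape[:1] // parts, shape[1:] * 0], axis=0)
start = stride * idx

print(size.tolist())   # [2, 32, 32, 3]
print(start.tolist())  # [2, 0, 0, 0]
```

In TensorFlow 1.0 or later the corresponding call would be written with the list first and axis=0 second.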

Values provided to tf.concat do not have same shape

tf.concat in get_slice is provided with tensors of different shape and also dtype.
I was trying to get it to work for the cifar10_cnn.py from keras examples.

autoencoder fail

Can you provide a working example of an autoencoder? I modified the Keras example to use your wrapper, but it failed.

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [1,256,48,64] vs. [1,256,47,63]

Hello there!
I'm running the demo.ipynb from rt-mrcnn-master/samples.
https://github.com/noxouille/rt-mrcnn
I converted it to demo.py and commented out the statement get_ipython().run_line_magic('matplotlib', 'inline').
When I run it ($ python demo.py), it gives the following error.

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [1,256,48,64] vs. [1,256,47,63]

I don't understand what to do now. Any help, please. For more details, I have pasted the full output below.
Thanks

(test) *****@*-Server:~/Projects/rt-mrcnn-master/samples$ python demo.py
Using TensorFlow backend.

Configurations:
BACKBONE resnet101
BACKBONE_STRIDES [4, 8, 16, 32, 64]
BATCH_SIZE 1
BBOX_STD_DEV [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE None
DETECTION_MAX_INSTANCES 100
DETECTION_MIN_CONFIDENCE 0.7
DETECTION_NMS_THRESHOLD 0.3
FPN_CLASSIF_FC_LAYERS_SIZE 1024
GPU_COUNT 1
GRADIENT_CLIP_NORM 5.0
IMAGES_PER_GPU 1
IMAGE_CHANNEL_COUNT 3
IMAGE_MAX_DIM 1024
IMAGE_META_SIZE 93
IMAGE_MIN_DIM 800
IMAGE_MIN_SCALE 0
IMAGE_RESIZE_MODE none
IMAGE_SHAPE [1024 1024 3]
LEARNING_MOMENTUM 0.9
LEARNING_RATE 0.001
LOSS_WEIGHTS {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE 14
MASK_SHAPE [28, 28]
MAX_GT_INSTANCES 100
MEAN_PIXEL [123.7 116.8 103.9]
MINI_MASK_SHAPE (56, 56)
NAME coco
NUM_CLASSES 81
POOL_SIZE 7
POST_NMS_ROIS_INFERENCE 1000
POST_NMS_ROIS_TRAINING 2000
PRE_NMS_LIMIT 6000
ROI_POSITIVE_RATIO 0.33
RPN_ANCHOR_RATIOS [0.5, 1, 2]
RPN_ANCHOR_SCALES (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE 1
RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD 0.7
RPN_TRAIN_ANCHORS_PER_IMAGE 256
STEPS_PER_EPOCH 1000
TOP_DOWN_PYRAMID_SIZE 256
TRAIN_BN False
TRAIN_ROIS_PER_IMAGE 200
USE_MINI_MASK True
USE_RPN_ROIS True
VALIDATION_STEPS 50
WEIGHT_DECAY 0.0001

WARNING:tensorflow:From /home/faheem/.conda/envs/test/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/faheem/.conda/envs/test/lib/python3.6/site-packages/mask_rcnn-2.1-py3.6.egg/mrcnn/model.py:772: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-05-23 14:21:14.601297: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2019-05-23 14:21:14.623461: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3500000000 Hz
2019-05-23 14:21:14.624053: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x557c7f67bbb0 executing computations on platform Host. Devices:
2019-05-23 14:21:14.624102: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
2019-05-23 14:21:15.355519: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x557c7f7613d0 executing computations on platform CUDA. Devices:
2019-05-23 14:21:15.355558: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-05-23 14:21:15.355568: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-05-23 14:21:15.355576: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (2): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-05-23 14:21:15.355584: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (3): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-05-23 14:21:15.356344: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
pciBusID: 0000:19:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2019-05-23 14:21:15.356526: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
pciBusID: 0000:1a:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2019-05-23 14:21:15.356698: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 2 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
pciBusID: 0000:67:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2019-05-23 14:21:15.356865: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 3 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
pciBusID: 0000:68:00.0
totalMemory: 10.92GiB freeMemory: 10.70GiB
2019-05-23 14:21:15.364666: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1, 2, 3
2019-05-23 14:21:15.368640: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-23 14:21:15.368659: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 1 2 3
2019-05-23 14:21:15.368664: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N Y Y Y
2019-05-23 14:21:15.368668: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: Y N Y Y
2019-05-23 14:21:15.368672: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2: Y Y N Y
2019-05-23 14:21:15.368675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3: Y Y Y N
2019-05-23 14:21:15.369197: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10468 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:19:00.0, compute capability: 6.1)
2019-05-23 14:21:15.369500: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10468 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:1a:00.0, compute capability: 6.1)
2019-05-23 14:21:15.369730: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10468 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:67:00.0, compute capability: 6.1)
2019-05-23 14:21:15.370062: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 10407 MB memory) -> physical GPU (device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:68:00.0, compute capability: 6.1)
Processing 1 images
image shape: (375, 500, 3) min: 0.00000 max: 255.00000 uint8
molded_images shape: (1, 375, 500, 3) min: -123.70000 max: 151.10000 float64
image_metas shape: (1, 93) min: 0.00000 max: 500.00000 int64
anchors shape: (1, 47157, 4) min: -0.96802 max: 1.82096 float32
2019-05-23 14:21:19.397662: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
Traceback (most recent call last):
File "demo.py", line 131, in
results = model.detect([image], verbose=1)
File "/home/faheem/.conda/envs/test/lib/python3.6/site-packages/mask_rcnn-2.1-py3.6.egg/mrcnn/model.py", line 2524, in detect
File "/home/faheem/.conda/envs/test/lib/python3.6/site-packages/keras/engine/training.py", line 1169, in predict
steps=steps)
File "/home/faheem/.conda/envs/test/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 294, in predict_loop
batch_outs = f(ins_batch)
File "/home/faheem/.conda/envs/test/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in call
return self._call(inputs)
File "/home/faheem/.conda/envs/test/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/home/faheem/.conda/envs/test/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1439, in call
run_metadata_ptr)
File "/home/faheem/.conda/envs/test/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in exit
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [1,256,48,64] vs. [1,256,47,63]
[[{{node fpn_p3add/add}}]]
[[{{node mrcnn_detection/map/TensorArrayUnstack/range}}]]
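
The log shows IMAGE_RESIZE_MODE none and an input image of 375 x 500 pixels, neither of which is divisible by 64. The sketch below reproduces the two shapes in the error under the assumption that every stride-2 stage of the backbone rounds the spatial size up; the function name is hypothetical and the code does not use the Mask R-CNN library.

```python
import math

def feature_size(pixels, stride):
    """Spatial size after downsampling by `stride`, assuming each stage rounds up."""
    return math.ceil(pixels / stride)

height, width = 375, 500  # image shape printed in the log

c3 = (feature_size(height, 8), feature_size(width, 8))     # stride-8 feature map
c4 = (feature_size(height, 16), feature_size(width, 16))   # stride-16 feature map
upsampled_c4 = (c4[0] * 2, c4[1] * 2)                      # 2x upsampling before the add

print(upsampled_c4, c3)  # (48, 64) (47, 63)
```

With both sides a multiple of 64 (for example an image padded to 384 x 512), the same arithmetic gives 48 x 64 on both branches, so resizing or padding the input is a likely way around the error.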

The model expects 2 input arrays, but only received one array.

I have updated this code to Keras 2, but ran into a problem.

with tf.device('/cpu:0'):
	merged = outputs_all[0]
	for outputs in outputs_all[1:]:
		print(outputs)
		merged.append(K.concatenate(outputs, axis=0))
	print(merged)

[<tf.Tensor 'tower_0/sequential_1/dropout_2/cond/Merge:0' shape=(?, 101) dtype=float32>, <tf.Tensor 'tower_1/sequential_1/dropout_2/cond/Merge:0' shape=(?, 101) dtype=float32>]

and got an error

ValueError: The model expects 2 input arrays, but only received one array. Found: array with shape (48, 101)

Can I merge two tf.Tensors into one tf.Tensor?
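
In the snippet above, merged starts out as outputs_all[0], which is itself a list, and further entries are appended to it, so the result can end up holding more outputs than the model has. A minimal sketch of the intended merge, with NumPy arrays standing in for the two tower tensors and an assumed layout for outputs_all (one inner list per model output, one entry per tower):

```python
import numpy as np

# Stand-ins for the two tower tensors printed above: 24 rows each, 101 classes.
tower_0 = np.zeros((24, 101), dtype=np.float32)
tower_1 = np.ones((24, 101), dtype=np.float32)

# Assumed layout: one inner list per model output, one array per tower.
outputs_all = [[tower_0, tower_1]]

merged = []
for outputs in outputs_all:
    # Join the towers along the batch axis so each model output is one array.
    merged.append(np.concatenate(outputs, axis=0))

print(len(merged), merged[0].shape)  # 1 (48, 101)
```

In Keras 2 the equivalent step inside the model would more likely use a Concatenate(axis=0) layer than K.concatenate, since model outputs have to come from Keras layers; treat that as a suggestion to try, not a verified fix for this code.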

Potential incompatibility with keras model checkpointing

I recently adopted the multi_gpu module to parallelize learning across multiple GPUs. On 8 Tesla K80s I get a speed-up of roughly 4x, and learning appears to take place, as the loss goes down with each iteration. However, when I actually test the model and visualize the results, it appears to perform exactly as it does without training. Previously, at the same loss I reached while training with multi_gpu, I'd get drastically different performance. I've been working with this model for months and have proven the learnability of the problem and the success of the architecture, so the results make no sense. I'm using Keras's built-in ModelCheckpoint callback to automatically save my model after every epoch in which the validation loss has decreased. My guess is that there is a silent conflict between how the model is saved and this module. Any help debugging this would be greatly appreciated.

Error with make_parallel function

I got the following error while trying to use the make_parallel function:

Traceback (most recent call last):
  File "model_language2motion.py", line 1335, in <module>
    main(parser.parse_args())
  File "model_language2motion.py", line 1202, in main
    args.func(args)
  File "model_language2motion.py", line 723, in train
    train_data, valid_data, model, optimizer = prepare_for_training(output_path, args)
  File "model_language2motion.py", line 677, in prepare_for_training
    model = make_parallel(model, 8)
  File "/workspace/deepAnim/make_parallel.py", line 31, in make_parallel
    outputs = model(inputs)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 572, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 635, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 172, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors, mask=input_masks))
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2247, in call
    output_tensors, output_masks, output_shapes = self.run_internal_graph(inputs, masks)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2390, in run_internal_graph
    computed_mask))
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/recurrent.py", line 235, in call
    constants = self.get_constants(x)
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/recurrent.py", line 884, in get_constants
    ones = K.tile(ones, (1, int(input_dim)))
TypeError: int() argument must be a string or a number, not 'NoneType'

PS: The code works if the call to make_parallel is removed.

Regularization causes error using multi GPU

Hi, when using kernel_regularizer=regularizers.l2(0.00004) in a Conv2D layer, I get "AttributeError: 'Model' object has no attribute '_losses'", caused by outputs = model(inputs), which merges the outputs of the different splits into one model.
The problem is that the regularizer expects the loss, but the loss is split across the different models. Is it possible, or even a good idea, to regularize batch-wise?

An issue about the code

I just used the example code from https://medium.com/@kuza55/transparent-multi-gpu-training-on-tensorflow-with-keras-8b0016fd9012

But when I changed the number of GPUs (TITAN X) from 2 to 8, I found that the speed-up over a single GPU stayed almost the same. Can you give me some hints about this?

multi_gpu error

Hi,

I've tried to run the multi_gpu program to parallelize a convolutional neural network that is based on U-Net. I am trying to parallelize on a g2.8xlarge to take advantage of its 4 GPUs.

When I tried to run the code, I got an error. Below are the full error and the function used to define the model and call the multi_gpu function (make_parallel). The call to make_parallel is near the very end of the function, at the end of this post.

This may be super simple but I have no experience with tf and am just starting with keras. Any suggestions would be greatly appreciated.

Thanks,

Anthony.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-b2175f08a7a9> in <module>()
----> 1 modelTest = get_unet()
      2 print(modelTest.summary())

<ipython-input-18-7318b7dd3672> in get_unet()
     40 
     41     model = Model(input=inputs, output=conv10)
---> 42     model = make_parallel(model,3)
     43     model.compile(optimizer=Adam(lr=1e-5), loss=dice_coef_loss, metrics=[dice_coef])
     44 

/vol/programs/keras-extras/utils/multi_gpu.pyc in make_parallel(model, gpu_count)
     29                     inputs.append(slice_n)
     30 
---> 31                 outputs = model(inputs)
     32 
     33                 if not isinstance(outputs, list):

/home/ubuntu/miniconda/lib/python2.7/site-packages/Keras-1.0.4-py2.7.egg/keras/engine/topology.pyc in __call__(self, x, mask)
    483         if inbound_layers:
    484             # this will call layer.build() if necessary
--> 485             self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
    486             input_added = True
    487 

/home/ubuntu/miniconda/lib/python2.7/site-packages/Keras-1.0.4-py2.7.egg/keras/engine/topology.pyc in add_inbound_node(self, inbound_layers, node_indices, tensor_indices)
    541         # creating the node automatically updates self.inbound_nodes
    542         # as well as outbound_nodes on inbound layers.
--> 543         Node.create_node(self, inbound_layers, node_indices, tensor_indices)
    544 
    545     def get_output_shape_for(self, input_shape):

/home/ubuntu/miniconda/lib/python2.7/site-packages/Keras-1.0.4-py2.7.egg/keras/engine/topology.pyc in create_node(cls, outbound_layer, inbound_layers, node_indices, tensor_indices)
    146 
    147         if len(input_tensors) == 1:
--> 148             output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
    149             output_masks = to_list(outbound_layer.compute_mask(input_tensors[0], input_masks[0]))
    150             # TODO: try to auto-infer shape if exception is raised by get_output_shape_for

/home/ubuntu/miniconda/lib/python2.7/site-packages/Keras-1.0.4-py2.7.egg/keras/engine/topology.pyc in call(self, input, mask)
   1920             return self._output_tensor_cache[cache_key]
   1921         else:
-> 1922             output_tensors, output_masks, output_shapes = self.run_internal_graph(inputs, masks)
   1923             return output_tensors
   1924 

/home/ubuntu/miniconda/lib/python2.7/site-packages/Keras-1.0.4-py2.7.egg/keras/engine/topology.pyc in run_internal_graph(self, inputs, masks)
   2062                     if len(computed_data) == 1:
   2063                         computed_tensor, computed_mask = computed_data[0]
-> 2064                         output_tensors = to_list(layer.call(computed_tensor, computed_mask))
   2065                         output_masks = to_list(layer.compute_mask(computed_tensor, computed_mask))
   2066                         computed_tensors = [computed_tensor]

/home/ubuntu/miniconda/lib/python2.7/site-packages/Keras-1.0.4-py2.7.egg/keras/layers/convolutional.pyc in call(self, x, mask)
   1062     def call(self, x, mask=None):
   1063         return K.resize_images(x, self.size[0], self.size[1],
-> 1064                                self.dim_ordering)
   1065 
   1066     def get_config(self):

/home/ubuntu/miniconda/lib/python2.7/site-packages/Keras-1.0.4-py2.7.egg/keras/backend/tensorflow_backend.pyc in resize_images(X, height_factor, width_factor, dim_ordering)
    506         X = tf.image.resize_nearest_neighbor(X, new_shape)
    507         X = permute_dimensions(X, [0, 3, 1, 2])
--> 508         X.set_shape((None, None, original_shape[2] * height_factor, original_shape[3] * width_factor))
    509         return X
    510     elif dim_ordering == 'tf':

TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'

def get_unet():
    inputs = Input((1, img_rows, img_cols))
    conv1 = Convolution2D(32, 3, 3, activation='relu', border_mode='same')(inputs)
    conv1 = Convolution2D(32, 3, 3, activation='relu', border_mode='same')(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)

    conv2 = Convolution2D(64, 3, 3, activation='relu', border_mode='same')(pool1)
    conv2 = Convolution2D(64, 3, 3, activation='relu', border_mode='same')(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)

    conv3 = Convolution2D(128, 3, 3, activation='relu', border_mode='same')(pool2)
    conv3 = Convolution2D(128, 3, 3, activation='relu', border_mode='same')(conv3)
    pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)

    conv4 = Convolution2D(256, 3, 3, activation='relu', border_mode='same')(pool3)
    conv4 = Convolution2D(256, 3, 3, activation='relu', border_mode='same')(conv4)
    pool4 = MaxPooling2D(pool_size=(2, 2))(conv4)

    conv5 = Convolution2D(512, 3, 3, activation='relu', border_mode='same')(pool4)
    conv5 = Convolution2D(512, 3, 3, activation='relu', border_mode='same')(conv5)

    up6 = merge([UpSampling2D(size=(2, 2))(conv5), conv4], mode='concat', concat_axis=1)
    
    conv6 = Convolution2D(256, 3, 3, activation='relu', border_mode='same')(up6)
    conv6 = Convolution2D(256, 3, 3, activation='relu', border_mode='same')(conv6)

    up7 = merge([UpSampling2D(size=(2, 2))(conv6), conv3], mode='concat', concat_axis=1)
    conv7 = Convolution2D(128, 3, 3, activation='relu', border_mode='same')(up7)
    conv7 = Convolution2D(128, 3, 3, activation='relu', border_mode='same')(conv7)

    up8 = merge([UpSampling2D(size=(2, 2))(conv7), conv2], mode='concat', concat_axis=1)
    conv8 = Convolution2D(64, 3, 3, activation='relu', border_mode='same')(up8)
    conv8 = Convolution2D(64, 3, 3, activation='relu', border_mode='same')(conv8)

    up9 = merge([UpSampling2D(size=(2, 2))(conv8), conv1], mode='concat', concat_axis=1)
    conv9 = Convolution2D(32, 3, 3, activation='relu', border_mode='same')(up9)
    conv9 = Convolution2D(32, 3, 3, activation='relu', border_mode='same')(conv9)

    conv10 = Convolution2D(1, 1, 1, activation='sigmoid')(conv9)

    model = Model(input=inputs, output=conv10)
    model = make_parallel(model, 3)

    model.compile(optimizer=Adam(lr=1e-5), loss=dice_coef_loss, metrics=[dice_coef])

    return model

Debugging Multiple GPU Model

I am trying to reproduce a multiple-GPU implementation of my Keras model using some of the code from your blog post. I have slightly modified it to take a list of GPUs (in case I want to specify which GPUs to use). I am using the TensorFlow backend of Keras, and both are up to date. I have four NVIDIA Titan X GPUs. Below is a small example using MNIST.

from keras.layers import concatenate
from keras.layers.core import Lambda
from keras.models import Model

import tensorflow as tf

def make_parallel(model, gpu_list):
    def get_slice(data, idx, parts):
        shape = tf.shape(data)
        size = tf.concat([ shape[:1] // parts, shape[1:] ], axis=0)
        stride = tf.concat([ shape[:1] // parts, shape[1:]*0 ], axis=0)
        start = stride * idx
        return tf.slice(data, start, size)

    outputs_all = []
    for i in range(len(model.outputs)):
        outputs_all.append([])

    #Place a copy of the model on each GPU, each getting a slice of the batch
    gpu_count = len(gpu_list)
    for i in range(gpu_count):
        with tf.device('/gpu:%d' % gpu_list[i]):
            with tf.name_scope('tower_%d' % gpu_list[i]) as scope:

                inputs = []
                #Slice each input into a piece for processing on this GPU
                for x in model.inputs:
                    input_shape = tuple(x.get_shape().as_list())[1:]
                    slice_n = Lambda(get_slice, output_shape=input_shape, arguments={'idx':i,'parts':gpu_count})(x)
                    inputs.append(slice_n)                

                outputs = model(inputs)
                
                if not isinstance(outputs, list):
                    outputs = [outputs]
                
                #Save all the outputs for merging back together later
                for l in range(len(outputs)):
                    outputs_all[l].append(outputs[l])

    # merge outputs on CPU
    with tf.device('/cpu:0'):
        merged = []
        for outputs in outputs_all:
            merged.append(concatenate(outputs, axis=0))
            
        return Model(inputs=model.inputs, outputs=merged)

if __name__ == "__main__":
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.datasets import mnist
    from keras.utils import to_categorical
    
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = x_train.reshape(60000, -1)
    x_test = x_test.reshape(10000, -1)
    model = Sequential()
    model.add(Dense(64, input_shape=(784,), activation='relu'))
    model.add(Dense(10, activation='softmax'))
    
    parallel_model = make_parallel(model , [0,1,2,3])
    
    y_train = to_categorical(y_train, 10)
    y_test = to_categorical(y_test, 10)
    
    parallel_model.compile(optimizer='nadam', loss='categorical_crossentropy',
                           metrics=['accuracy'])
    
    parallel_model.fit(x_train, y_train, batch_size=128,
                       validation_data=(x_test, y_test))

This code works when I select two or four GPUs; but when I select three GPUs, I get the following error:

Using TensorFlow backend.
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
Traceback (most recent call last):

  File "<ipython-input-1-524a8053f5a2>", line 1, in <module>
    runfile('/home/rmk6217/Documents/kemker/machine_learning/multi_gpu.py', wdir='/home/rmk6217/Documents/kemker/machine_learning')

  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/spyder/utils/site/sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)

  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "/home/rmk6217/Documents/kemker/machine_learning/multi_gpu.py", line 71, in <module>
    validation_data=(x_test, y_test))

  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/Keras-2.0.2-py3.5.egg/keras/engine/training.py", line 1485, in fit
    initial_epoch=initial_epoch)

  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/Keras-2.0.2-py3.5.egg/keras/engine/training.py", line 1140, in _fit_loop
    outs = f(ins_batch)

  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/Keras-2.0.2-py3.5.egg/keras/backend/tensorflow_backend.py", line 2102, in __call__
    feed_dict=feed_dict)

  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)

  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)

  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)

  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)

InvalidArgumentError: Incompatible shapes: [128] vs. [126]
	 [[Node: Equal = Equal[T=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](ArgMax, ArgMax_1)]]

Caused by op 'Equal', defined at:
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/spyder/utils/ipython/start_kernel.py", line 227, in <module>
    main()
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/spyder/utils/ipython/start_kernel.py", line 223, in main
    kernel.start()
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/ipykernel/kernelapp.py", line 474, in start
    ioloop.IOLoop.instance().start()
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/zmq/eventloop/ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/tornado/ioloop.py", line 831, in start
    self._run_callback(callback)
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/tornado/ioloop.py", line 604, in _run_callback
    ret = callback()
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 258, in enter_eventloop
    self.eventloop(self)
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/ipykernel/eventloops.py", line 93, in loop_qt5
    return loop_qt4(kernel)
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/ipykernel/eventloops.py", line 87, in loop_qt4
    start_event_loop_qt4(kernel.app)
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/IPython/lib/guisupport.py", line 144, in start_event_loop_qt4
    app.exec_()
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/ipykernel/eventloops.py", line 39, in process_stream_events
    kernel.do_one_iteration()
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 291, in do_one_iteration
    stream.flush(zmq.POLLIN, 1)
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 352, in flush
    self._handle_recv()
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 276, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 228, in dispatch_shell
    handler(stream, idents, msg)
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 390, in execute_request
    user_expressions, allow_stdin)
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/ipykernel/ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/ipykernel/zmqshell.py", line 501, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2717, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2827, in run_ast_nodes
    if self.run_code(code, result):
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-1-524a8053f5a2>", line 1, in <module>
    runfile('/home/rmk6217/Documents/kemker/machine_learning/multi_gpu.py', wdir='/home/rmk6217/Documents/kemker/machine_learning')
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/spyder/utils/site/sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "/home/rmk6217/Documents/kemker/machine_learning/multi_gpu.py", line 68, in <module>
    metrics=['accuracy'])
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/Keras-2.0.2-py3.5.egg/keras/engine/training.py", line 952, in compile
    append_metric(i, 'acc', masked_fn(y_true, y_pred, mask=masks[i]))
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/Keras-2.0.2-py3.5.egg/keras/engine/training.py", line 479, in masked
    score_array = fn(y_true, y_pred)
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/Keras-2.0.2-py3.5.egg/keras/metrics.py", line 25, in categorical_accuracy
    K.argmax(y_pred, axis=-1)),
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/Keras-2.0.2-py3.5.egg/keras/backend/tensorflow_backend.py", line 1347, in equal
    return tf.equal(x, y)
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 721, in equal
    result = _op_def_lib.apply_op("Equal", x=x, y=y, name=name)
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Incompatible shapes: [128] vs. [126]
	 [[Node: Equal = Equal[T=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](ArgMax, ArgMax_1)]]

I have dug through the debugger for a while now, but I can't seem to track down the issue. I can't help feeling that I am doing something stupid, so I was hoping another set of eyes might see things I didn't. Any assistance would be appreciated. Thanks!
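
One possible reading of the `[128] vs. [126]` mismatch, based only on the `get_slice` arithmetic posted above: each tower takes `batch // parts` rows, so with 128 samples and 3 GPUs the towers together cover only 126 rows while the targets still have 128. The sketch below reproduces that arithmetic on plain Python lists (no TensorFlow involved). `get_slice_floor` mirrors the posted logic; `get_slice_remainder` and `merged_length` are hypothetical helpers written for this illustration and are not part of the module.

```python
def get_slice_floor(data, idx, parts):
    """Mirror of the posted get_slice: every tower gets len(data) // parts rows."""
    size = len(data) // parts
    start = size * idx
    return data[start:start + size]


def get_slice_remainder(data, idx, parts):
    """Hypothetical variant: the last tower also takes the leftover rows."""
    size = len(data) // parts
    start = size * idx
    if idx == parts - 1:
        return data[start:]
    return data[start:start + size]


def merged_length(slicer, batch_size, parts):
    """Rows remaining after slicing a batch and concatenating the pieces."""
    batch = list(range(batch_size))
    merged = []
    for idx in range(parts):
        merged.extend(slicer(batch, idx, parts))
    return len(merged)


if __name__ == "__main__":
    for parts in (2, 3, 4):
        print(parts,
              merged_length(get_slice_floor, 128, parts),
              merged_length(get_slice_remainder, 128, parts))
```

With 2 or 4 towers nothing is dropped, which is consistent with the report that only the three-GPU run fails. Choosing a batch size divisible by the GPU count might be a simpler workaround, although the last batch of an epoch could still be uneven.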

Multi GPU support with weight regularization

The TensorFlow multi-GPU implementation treats weight regularization differently.

It computes partial losses and gradients on each GPU and then combines them on the master CPU. This part is similar to Keras multi-GPU.

In TensorFlow, weight regularization is not applied by each GPU. The master CPU computes the weight regularization and adds it to the final loss and gradients.

How is this implemented in Keras multi-GPU?
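
To make the distinction concrete, here is a sketch of the arithmetic only. It does not show how TensorFlow or this module is actually implemented; the function names, the tower losses, and the `l2` coefficient are all made up for illustration. It contrasts adding an L2 penalty once on the combined loss with adding it inside every tower before summing.

```python
def l2_penalty(weights, l2):
    """L2 penalty on the shared weights: l2 * sum(w^2)."""
    return l2 * sum(w * w for w in weights)


def loss_with_penalty_once(tower_losses, weights, l2):
    """Average the per-tower data losses, then add the penalty a single time."""
    data_loss = sum(tower_losses) / len(tower_losses)
    return data_loss + l2_penalty(weights, l2)


def loss_with_penalty_in_each_tower(tower_losses, weights, l2):
    """Each tower adds the penalty to its share of the loss; shares are summed."""
    n = len(tower_losses)
    per_tower = [loss / n + l2_penalty(weights, l2) for loss in tower_losses]
    return sum(per_tower)


if __name__ == "__main__":
    tower_losses = [0.5, 0.25, 0.75, 0.5]  # made-up data losses for 4 towers
    weights = [1.0, 2.0]                   # made-up shared weights
    l2 = 0.5                               # made-up regularization strength
    print(loss_with_penalty_once(tower_losses, weights, l2))
    print(loss_with_penalty_in_each_tower(tower_losses, weights, l2))
```

In this toy setup the second variant counts the penalty once per tower, so the effective regularization strength grows with the number of GPUs. Whether the Keras wrapper behaves like either variant is exactly the open question above.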

cons_vae make parallel doubles output

As discussed, this is the example from Keras with the multi_gpu util.
The autoencoder's output is doubled after the merge (in make_parallel).

https://gist.github.com/varoudis/d6a71f08f3d309cc3b7583f00616d9c0

BatchNormalization gives Warning

When I use a model that uses BatchNorm, I get a lot (~100) of warnings in the following form:

WARNING:tensorflow:Tried to colocate gradients/tower_0/sequential_1/batch_normalization_5/moments/sufficient_statistics/count_grad/Rank with an op tower_0/sequential_1/batch_normalization_5/moments/sufficient_statistics/count that had a different device: /device:CPU:0 vs /device:GPU:0. Ignoring colocation property.

After Googling, I found out it may have to do with some functions not being implemented on the GPU (yet).

How can I avoid these warnings, or fix the bug?
Can I just ignore the warning? After all, it seems to train okay-ish.