kuza55 / keras-extras
Extra batteries for Keras
License: Apache License 2.0
I am not sure whether the script shares the weights between towers. If it does not, won't simply splitting the data across towers train each tower separately?
For reference, the tutorial at https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10_multi_gpu_train.py
calls "tf.get_variable_scope().reuse_variables()" (line 175) to make sure the weights are reused.
Thanks!
I wrote a U-Net based on a sample and I am running into an incompatible shape issue. The U-Net works fine for inputs with 3 channels (the sample that I used) but not for inputs with 1 channel (my inputs). The network structure is below. The input shape is 32x32x1, with a batch size of 4 and 10 epochs. If I instead run it with input shape 64x64x1 and a batch size of 8, I get Incompatible shapes: [32768] vs. [65536]. The error happens at history_unet, when I fit the model.
Could you please help me? I am new to U-Net and don't know what the problem is. I have read about similar issues found through Google, but none of the fixes work for mine. I am using the TensorFlow backend.
The error is "tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [4096] vs. [8192]"
def UNet(n_input_channels, n_output_channels):
    from keras.layers import Input, Dropout, UpSampling2D, MaxPooling2D, BatchNormalization, Conv2D, Concatenate
    from keras.models import Model
    inputs = Input((None, None, n_input_channels))
    conv1 = Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(inputs)
    conv1 = BatchNormalization()(conv1)
    conv1 = Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv1)
    conv1 = BatchNormalization()(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    conv2 = Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(pool1)
    conv2 = BatchNormalization()(conv2)
    conv2 = Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv2)
    conv2 = BatchNormalization()(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
    conv3 = Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(pool2)
    conv3 = BatchNormalization()(conv3)
    conv3 = Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv3)
    conv3 = BatchNormalization()(conv3)
    pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)
    conv4 = Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(pool3)
    conv4 = BatchNormalization()(conv4)
    conv4 = Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv4)
    conv4 = BatchNormalization()(conv4)
    drop4 = Dropout(0.5)(conv4)
    pool4 = MaxPooling2D(pool_size=(2, 2))(drop4)
    conv5 = Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(pool4)
    conv5 = BatchNormalization()(conv5)
    conv5 = Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv5)
    conv5 = BatchNormalization()(conv5)
    drop5 = Dropout(0.5)(conv5)
    up6 = Conv2D(512, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(UpSampling2D(size = (2,2))(drop5))
    merge6 = Concatenate(axis=-1)([conv4,up6])
    conv6 = Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(merge6)
    conv6 = BatchNormalization()(conv6)
    conv6 = Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv6)
    conv6 = BatchNormalization()(conv6)
    up7 = Conv2D(256, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(UpSampling2D(size = (2,2))(conv6))
    merge7 = Concatenate(axis=-1)([conv3,up7])
    conv7 = Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(merge7)
    conv7 = BatchNormalization()(conv7)
    conv7 = Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv7)
    conv7 = BatchNormalization()(conv7)
    up8 = Conv2D(128, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(UpSampling2D(size = (2,2))(conv7))
    merge8 = Concatenate(axis=-1)([conv2,up8])
    conv8 = Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(merge8)
    conv8 = BatchNormalization()(conv8)
    conv8 = Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv8)
    conv8 = BatchNormalization()(conv8)
    up9 = Conv2D(64, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(UpSampling2D(size = (2,2))(conv8))
    merge9 = Concatenate(axis=-1)([conv1,up9])
    conv9 = Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(merge9)
    conv9 = BatchNormalization()(conv9)
    conv9 = Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv9)
    conv9 = BatchNormalization()(conv9)
    conv9 = Conv2D(2, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv9)
    conv9 = BatchNormalization()(conv9)
    conv10 = Conv2D(n_output_channels, 1, activation = 'softmax')(conv9)
    return Model(inputs = inputs, outputs = conv10)
model_unet = UNet(n_input_channels=1, n_output_channels=2)
model_unet.compile(optimizer='adam', loss=dice_coef_loss, metrics=[dice_coef])
history_unet = model_unet.fit(x_train, y_train, batch_size=batchsize,epochs=epochs,verbose=1, shuffle=True,validation_data=(x_validation, y_validation))
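The element counts in the two error messages are consistent with a channel mismatch between the labels and the prediction, rather than with the single-channel input itself. The check below is only a sketch of that arithmetic. It assumes that dice_coef_loss (not shown above) flattens y_true and y_pred before combining them, and that y_train has one channel while the network emits n_output_channels=2. flat_size is a hypothetical helper introduced for illustration.

```python
def flat_size(batch, height, width, channels):
    """Number of elements after flattening a (batch, height, width, channels) tensor."""
    return batch * height * width * channels

# 32x32 inputs, batch size 4
labels_32 = flat_size(4, 32, 32, 1)   # assumed single-channel y_train
preds_32 = flat_size(4, 32, 32, 2)    # softmax output with n_output_channels=2

# 64x64 inputs, batch size 8
labels_64 = flat_size(8, 64, 64, 1)
preds_64 = flat_size(8, 64, 64, 2)

print(labels_32, preds_32)  # 4096 8192
print(labels_64, preds_64)  # 32768 65536
```

If that assumption holds, giving the labels and the output the same number of channels (for example one-hot masks with two channels, or a single-channel output) would make the flattened sizes agree.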
The code below raises an error:
x_test_pred = np.argmax(classifier.predict(x_test[:47]), axis=1)
nb_correct_pred = np.sum(x_test_pred == np.argmax(y_test[:47], axis=1))
print("Original test data (first 100 images):")
print("Correctly classified: {}".format(nb_correct_pred))
print("Incorrectly classified: {}".format(47-nb_correct_pred))
The error is:
InvalidArgumentError Traceback (most recent call last)
in
1 #print(x_test)
----> 2 x_test_pred = np.argmax(classifier.predict(x_test[:47]), axis=1)
3 nb_correct_pred = np.sum(x_test_pred == np.argmax(y_test[:47], axis=1))
4
5 print("Original test data (first 100 images):")
~/Documents/adversarial-robustness-toolbox-master/art/classifiers/keras.py in predict(self, x, logits, batch_size)
220 for b in range(int(np.ceil(x_.shape[0] / float(batch_size)))):
221 begin, end = b * batch_size, min((b + 1) * batch_size, x_.shape[0])
--> 222 preds[begin:end] = self.preds([x[begin:end]])[10]
223
224 if not logits and not self._custom_activation:
~/.local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in call(self, inputs)
2713 return self._legacy_call(inputs)
2714
-> 2715 return self._call(inputs)
2716 else:
2717 if py_any(is_tensor(x) for x in inputs):
~/.local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in _call(self, inputs)
2673 fetched = self._callable_fn(*array_vals, run_metadata=self.run_metadata)
2674 else:
-> 2675 fetched = self._callable_fn(*array_vals)
2676 return fetched[:len(self.outputs)]
2677
~/Downloads/ENTER/envs/gans/lib/python3.6/site-packages/tensorflow/python/client/session.py in call(self, *args)
1452 else:
1453 return tf_session.TF_DeprecatedSessionRunCallable(
-> 1454 self._session._session, self._handle, args, status, None)
1455
1456 def del(self):
~/Downloads/ENTER/envs/gans/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in exit(self, type_arg, value_arg, traceback_arg)
517 None, None,
518 compat.as_text(c_api.TF_Message(self.status.status)),
--> 519 c_api.TF_GetCode(self.status.status))
520 # Delete the underlying status object from memory otherwise it stays alive
521 # as there is a reference to status from this from the traceback due to
InvalidArgumentError: Matrix size-incompatible: In[0]: [47,2048], In[1]: [100352,1024]
[[Node: dense_1_1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](flatten_1_1/Reshape, dense_1_1/kernel/read)]]
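The two sizes in the MatMul error can be checked by hand: 100352 = 7 x 7 x 2048, while the flattened input carries only 2048 features per image. The sketch below reproduces that arithmetic. The model definition is not shown in the report, so the ResNet-50-style backbone (total stride 32, 2048 output channels) is an assumption, and flattened_features is a hypothetical helper.

```python
import math

def flattened_features(height, width, stride=32, channels=2048):
    """Flattened feature count after a backbone that downsamples by `stride`."""
    return math.ceil(height / stride) * math.ceil(width / stride) * channels

print(flattened_features(224, 224))  # 100352, the size the Dense kernel expects
print(flattened_features(32, 32))    # 2048, the size actually received
```

Under that assumption the Dense layer was built for 224x224 images while x_test holds images of at most 32x32, so resizing the inputs or rebuilding the classifier for the real input shape would be the things to try.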
Hi @kuza55,
When I train on a single GPU, the model reaches a training accuracy of 99.99%. But when I use make_parallel, the training accuracy gets stuck at 96%.
Minimum Loss:
Single GPU: 0.0063
Multi GPU: 0.1213
The loss is also not dropping much.
I am training a multi-label classifier: ResNet-50 with sigmoid output layers and binary cross-entropy loss.
I started using the multi-GPU method and found that it speeds up training. However, how can I convert the model back to a single-GPU version, since we may need to deploy it on single-GPU machines?
I'm getting the above-mentioned error on this line of the code:
slice_n = Lambda(get_slice, output_shape=input_shape, arguments={'idx':i,'parts':gpu_count})(x)
Here is my model
def build_model(shape, params):
    input_dim, hidden_units, num_classes = shape
    audio_sequence = Input(shape=(None,input_dim),dtype='float32',name='audio')
    encode = _Encoding(input_dim, hidden_units, hidden_units, params['kernel_size'], params['dropout'], params['L2'])
    hide = _Hidden(2*hidden_units, params['dropout'], params['L2'])
    attend = _Attention(2*hidden_units)
    classify = _Classifier(2*hidden_units, num_classes, params['dropout'], params['L2'])
    encoding = encode(audio_sequence)
    hidden = hide(encoding)
    attention_weight = attend(hidden)
    align = _align(encoding, attention_weight)
    scores = classify(align)
    model = Model(inputs=[audio_sequence], outputs=[scores])
    model.compile(optimizer=Adam(lr=params['lr']),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

class _Encoding(object):
    def __init__(self, input_dim, num_filters, hidden_units, kernel_size, dropout=0.0, L2=10**-5):
        self.model = Sequential()
        self.model.add(Conv1D(num_filters,
                              kernel_size,
                              padding='same',
                              activation='relu',
                              kernel_regularizer=l2(L2),
                              input_shape=(None, input_dim)
                              ))
        self.model.add(Bidirectional(LSTM(hidden_units,
                                          dropout=dropout,
                                          recurrent_dropout=dropout,
                                          return_sequences=True,
                                          unroll=False,
                                          )))
    def __call__(self, sequence):
        return self.model(sequence)

class _Hidden(object):
    def __init__(self, hidden_units, dropout=0.0, L2=10**-5):
        self.model = Sequential()
        self.model.add(Dropout(dropout, input_shape=(hidden_units,)))
        self.model.add(Dense(hidden_units,
                             activation='tanh',
                             name='attend',
                             kernel_initializer='he_normal',
                             kernel_regularizer=l2(L2),
                             ))
        self.model = TimeDistributed(self.model)
    def __call__(self, encoding):
        return self.model(encoding)

class _Attention(object):
    def __init__(self, input_dim):
        self.model = Sequential()
        self.model.add(Dense(1,
                             activation=None,
                             use_bias=False,
                             kernel_initializer='he_normal',
                             input_shape=(input_dim,),
                             ))
        self.model = TimeDistributed(self.model)
        self.output = Sequential()
        self.output.add(Flatten())
        self.output.add(Activation('softmax'))
    def __call__(self, encoding):
        return self.output(self.model(encoding))

def _align(encoding, attention_weight):
    return Dot((1,1), normalize=False)([encoding, attention_weight])

class _Classifier(object):
    def __init__(self, hidden_units, output_dim, dropout=0.0, L2=10**-5):
        self.model = Sequential()
        self.model.add(Dropout(dropout, input_shape=(hidden_units,)))
        self.model.add(Dense(hidden_units,
                             activation='relu',
                             name='classifier_hidden',
                             kernel_initializer='he_normal',
                             kernel_regularizer=l2(L2),
                             ))
        self.model.add(Dense(output_dim,
                             activation='softmax',
                             name='scores',
                             kernel_initializer='he_normal',
                             kernel_regularizer=l2(L2),
                             ))
    def __call__(self, align):
        return self.model(align)

def test_build():
    shape = (26, 100, 5)
    params = {}
    params['dropout'] = 0.0
    params['L2'] = 10**-5
    params['kernel_size'] = 5
    params['lr'] = 0.001
    return build_model(shape, params)

model = test_build()
Can someone please help me?
Hi!
Can you please explain whether the weights of the model are the same on all GPUs after the first batch?
I mean, we make a copy of the model on each GPU (on which line does that happen, exactly?) and then compute gradients on each GPU separately, with different slices of the batch. That means each GPU will update its model weights differently, unless all copies of the model across GPUs are synced somehow. Does that happen somewhere inside Keras?
In the TensorFlow multi-GPU example, gradients are calculated on each GPU separately (one slice each), then averaged, and then used to update the shared weights. The variable scopes and their usage show that the model weights are shared across GPUs. In the Keras case it is not obvious.
Link to the TF example: https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10_multi_gpu_train.py
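The scheme described for the TensorFlow example (per-slice gradients, averaged, applied to one shared set of weights) can be illustrated without any framework. The sketch below is a toy with a single scalar weight and a mean-squared-error loss; mse_gradient and the data are made up for illustration, and it does not show what make_parallel or Keras does internally. It only shows that, with shared weights and equal-sized slices, the averaged tower gradients equal the full-batch gradient.

```python
def mse_gradient(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.5                      # one weight, read by both towers
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Each "tower" sees half of the batch but uses the same w.
tower_grads = [mse_gradient(w, xs[:2], ys[:2]), mse_gradient(w, xs[2:], ys[2:])]
averaged = sum(tower_grads) / len(tower_grads)

full_batch = mse_gradient(w, xs, ys)
print(averaged, full_batch)
```

If each tower instead kept and updated its own copy of w, the copies would drift apart after the first step, which is the situation the question is worried about.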
I am running make_parallel with 2 GPUs, and an error occurred at gradients/sub_grad/BroadcastGradientArgs:
"InvalidArgumentError (see above for traceback): Incompatible shapes: [483,1] vs. [482,1]
[[Node: gradients/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@sub"], _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/sub_grad/Shape, gradients/sub_grad/Shape_1/_79)]]"
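An off-by-one mismatch like [483,1] vs. [482,1] is what floor-division slicing produces when a batch does not divide evenly by the GPU count. The sketch below assumes that get_slice gives every tower shape[:1] // parts rows (the expression quoted in a traceback elsewhere in this listing) and that the failing batch held 483 samples, for example the last batch of an epoch. tower_slice_sizes is a hypothetical helper.

```python
def tower_slice_sizes(batch_size, parts):
    """Rows each tower receives when every slice has batch_size // parts rows."""
    return [batch_size // parts] * parts

sizes = tower_slice_sizes(483, 2)
merged_rows = sum(sizes)

print(sizes)        # [241, 241]
print(merged_rows)  # 482, one row short of the 483 targets
```

If this is the cause, choosing a sample count and batch size that are both multiples of the GPU count, so that no batch is left with a remainder, should avoid the error.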
I use TensorFlow as the backend and multi_gpu.py for multi-GPU training. However, training with two GPUs is almost the same speed as with one GPU. Also, with one GPU its utilization is almost 100%, but with two GPUs the utilization of each is about 40%-60%. How can I solve this problem?
My environment:
CPU: 40x Intel E5-2630 v4
Mem: 384GB
GPU: 4x NVIDIA GTX 1080 Ti
I came across your post on Medium and was instantly hooked. Nice job!
I've been developing a series of deep learning experiments that use only a single GPU and decided to switch them over to a multi-GPU setting. After training, the models are serialized to disk via model.save.
However, when I try to call load_model to load the pre-trained network from disk I get an error:
[INFO] loading model...
Traceback (most recent call last):
File "rank_accuracy.py", line 28, in
model = load_model(config.MODEL_PATH)
File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/models.py", line 140, in load_model
model = model_from_config(model_config, custom_objects=custom_objects)
File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/models.py", line 189, in model_from_config
return layer_from_config(config, custom_objects=custom_objects)
File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/utils/layer_utils.py", line 34, in layer_from_config
return layer_class.from_config(config['config'])
File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/engine/topology.py", line 2395, in from_config
process_layer(layer_data)
File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/engine/topology.py", line 2390, in process_layer
layer(input_tensors[0])
File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/engine/topology.py", line 517, in __call__
self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/engine/topology.py", line 571, in add_inbound_node
Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/engine/topology.py", line 155, in create_node
output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
File "/home/ubuntu/.virtualenvs/dlbook/local/lib/python2.7/site-packages/keras/layers/core.py", line 587, in call
return self.function(x, **arguments)
File "/home/ubuntu/deep-learning-book/dataset_to_hdf5/multi_gpu.py", line 9, in get_slice
shape = tf.shape(data)
NameError: global name 'tf' is not defined
Looking at multi_gpu.py, it's clear that TensorFlow is imported, so I'm not sure why the error is being generated.
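One mechanism that would produce this NameError even though multi_gpu.py imports TensorFlow: if the Lambda function is stored as bytecode and rebuilt at load time without the globals of the module that defined it, module-level names such as tf are no longer visible to it. That Keras works this way here is an assumption. The sketch below reproduces the effect in plain Python, with math standing in for tensorflow; reload_without_globals and raises_name_error are hypothetical helpers.

```python
import builtins
import marshal
import math  # stand-in for the module-level `import tensorflow as tf`
import types

def uses_module_global(x):
    return math.sqrt(x)          # relies on the global name `math`

def imports_locally(x):
    import math                  # resolved at call time, inside the function
    return math.sqrt(x)

def reload_without_globals(func):
    """Round-trip a function through its bytecode, dropping its module globals."""
    code = marshal.loads(marshal.dumps(func.__code__))
    return types.FunctionType(code, {"__builtins__": builtins}, func.__name__)

def raises_name_error(func, arg):
    """Return True if calling func(arg) raises NameError."""
    try:
        func(arg)
    except NameError:
        return True
    return False

print(raises_name_error(reload_without_globals(uses_module_global), 4.0))  # True
print(reload_without_globals(imports_locally)(4.0))                        # 2.0
```

Under that assumption, moving import tensorflow as tf inside get_slice is the analogous workaround. Depending on the Keras version, supplying tf through the custom_objects argument of load_model may also work.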
Hi kuza,
I tried the example you posted at https://medium.com/@kuza55/transparent-multi-gpu-training-on-tensorflow-with-keras-8b0016fd9012#.oj738zfd4
but I get the following error. Could you please tell me what causes it? Thanks.
Traceback (most recent call last):
File "train-kai.py", line 135, in
t()
File "train-kai.py", line 72, in t
model = make_parallel(model, 4)
File "train-kai.py", line 44, in make_parallel
slice_n = Lambda(get_slice, output_shape=input_shape, arguments={'idx':i,'parts':gpu_count})(x)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 572, in call
self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 635, in add_inbound_node
Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 166, in create_node
output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
File "/usr/local/lib/python2.7/dist-packages/keras/layers/core.py", line 641, in call
return self.function(x, **arguments)
File "train-kai.py", line 23, in get_slice
size = tf.concat(0, [ shape[:1] // parts, shape[1:] ])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 1030, in concat
).assert_is_compatible_with(tensor_shape.scalar())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_shape.py", line 735, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (2, 1) and () are incompatible
tf.concat in get_slice is given tensors of different shape and dtype.
I was trying to get it to work for cifar10_cnn.py from the Keras examples.
Can you provide a working example of an autoencoder? I modified the Keras example to use your wrapper but it failed.
Hello there!
I'm running the demo.ipynb from rt-mrcnn-master/samples.
https://github.com/noxouille/rt-mrcnn
I converted it to demo.py and commented out the statement get_ipython().run_line_magic('matplotlib', 'inline').
When I run it ($ python demo.py), it gives the following error.
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [1,256,48,64] vs. [1,256,47,63]
I don't understand what to do now. Any help would be appreciated. For more details, I have pasted the full output below.
Thanks
(test) *****@*-Server:~/Projects/rt-mrcnn-master/samples$ python demo.py
Using TensorFlow backend.
Configurations:
BACKBONE resnet101
BACKBONE_STRIDES [4, 8, 16, 32, 64]
BATCH_SIZE 1
BBOX_STD_DEV [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE None
DETECTION_MAX_INSTANCES 100
DETECTION_MIN_CONFIDENCE 0.7
DETECTION_NMS_THRESHOLD 0.3
FPN_CLASSIF_FC_LAYERS_SIZE 1024
GPU_COUNT 1
GRADIENT_CLIP_NORM 5.0
IMAGES_PER_GPU 1
IMAGE_CHANNEL_COUNT 3
IMAGE_MAX_DIM 1024
IMAGE_META_SIZE 93
IMAGE_MIN_DIM 800
IMAGE_MIN_SCALE 0
IMAGE_RESIZE_MODE none
IMAGE_SHAPE [1024 1024 3]
LEARNING_MOMENTUM 0.9
LEARNING_RATE 0.001
LOSS_WEIGHTS {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE 14
MASK_SHAPE [28, 28]
MAX_GT_INSTANCES 100
MEAN_PIXEL [123.7 116.8 103.9]
MINI_MASK_SHAPE (56, 56)
NAME coco
NUM_CLASSES 81
POOL_SIZE 7
POST_NMS_ROIS_INFERENCE 1000
POST_NMS_ROIS_TRAINING 2000
PRE_NMS_LIMIT 6000
ROI_POSITIVE_RATIO 0.33
RPN_ANCHOR_RATIOS [0.5, 1, 2]
RPN_ANCHOR_SCALES (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE 1
RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD 0.7
RPN_TRAIN_ANCHORS_PER_IMAGE 256
STEPS_PER_EPOCH 1000
TOP_DOWN_PYRAMID_SIZE 256
TRAIN_BN False
TRAIN_ROIS_PER_IMAGE 200
USE_MINI_MASK True
USE_RPN_ROIS True
VALIDATION_STEPS 50
WEIGHT_DECAY 0.0001
WARNING:tensorflow:From /home/faheem/.conda/envs/test/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/faheem/.conda/envs/test/lib/python3.6/site-packages/mask_rcnn-2.1-py3.6.egg/mrcnn/model.py:772: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-05-23 14:21:14.601297: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2019-05-23 14:21:14.623461: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3500000000 Hz
2019-05-23 14:21:14.624053: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x557c7f67bbb0 executing computations on platform Host. Devices:
2019-05-23 14:21:14.624102: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
2019-05-23 14:21:15.355519: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x557c7f7613d0 executing computations on platform CUDA. Devices:
2019-05-23 14:21:15.355558: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-05-23 14:21:15.355568: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-05-23 14:21:15.355576: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (2): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-05-23 14:21:15.355584: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (3): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-05-23 14:21:15.356344: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
pciBusID: 0000:19:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2019-05-23 14:21:15.356526: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
pciBusID: 0000:1a:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2019-05-23 14:21:15.356698: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 2 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
pciBusID: 0000:67:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2019-05-23 14:21:15.356865: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 3 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
pciBusID: 0000:68:00.0
totalMemory: 10.92GiB freeMemory: 10.70GiB
2019-05-23 14:21:15.364666: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1, 2, 3
2019-05-23 14:21:15.368640: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-23 14:21:15.368659: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 1 2 3
2019-05-23 14:21:15.368664: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N Y Y Y
2019-05-23 14:21:15.368668: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: Y N Y Y
2019-05-23 14:21:15.368672: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2: Y Y N Y
2019-05-23 14:21:15.368675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3: Y Y Y N
2019-05-23 14:21:15.369197: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10468 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:19:00.0, compute capability: 6.1)
2019-05-23 14:21:15.369500: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10468 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:1a:00.0, compute capability: 6.1)
2019-05-23 14:21:15.369730: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10468 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:67:00.0, compute capability: 6.1)
2019-05-23 14:21:15.370062: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 10407 MB memory) -> physical GPU (device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:68:00.0, compute capability: 6.1)
Processing 1 images
image shape: (375, 500, 3) min: 0.00000 max: 255.00000 uint8
molded_images shape: (1, 375, 500, 3) min: -123.70000 max: 151.10000 float64
image_metas shape: (1, 93) min: 0.00000 max: 500.00000 int64
anchors shape: (1, 47157, 4) min: -0.96802 max: 1.82096 float32
2019-05-23 14:21:19.397662: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
Traceback (most recent call last):
File "demo.py", line 131, in
results = model.detect([image], verbose=1)
File "/home/faheem/.conda/envs/test/lib/python3.6/site-packages/mask_rcnn-2.1-py3.6.egg/mrcnn/model.py", line 2524, in detect
File "/home/faheem/.conda/envs/test/lib/python3.6/site-packages/keras/engine/training.py", line 1169, in predict
steps=steps)
File "/home/faheem/.conda/envs/test/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 294, in predict_loop
batch_outs = f(ins_batch)
File "/home/faheem/.conda/envs/test/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in call
return self._call(inputs)
File "/home/faheem/.conda/envs/test/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/home/faheem/.conda/envs/test/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1439, in call
run_metadata_ptr)
File "/home/faheem/.conda/envs/test/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in exit
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [1,256,48,64] vs. [1,256,47,63]
[[{{node fpn_p3add/add}}]]
[[{{node mrcnn_detection/map/TensorArrayUnstack/range}}]]
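The two shapes in the fpn_p3add error match what a feature pyramid produces for a 375x500 image that has not been padded. The sketch below reproduces the numbers. It assumes that the stride-8 and stride-16 levels from BACKBONE_STRIDES are the ones added at fpn_p3add, that feature-map sizes round up, and that the coarser level is upsampled by 2. backbone_shape is a hypothetical helper.

```python
import math

def backbone_shape(height, width, stride):
    """Feature-map size for a given stride, rounding up."""
    return (math.ceil(height / stride), math.ceil(width / stride))

image_hw = (375, 500)                      # from "image shape: (375, 500, 3)"
c3 = backbone_shape(*image_hw, stride=8)   # lateral input to fpn_p3add
c4 = backbone_shape(*image_hw, stride=16)
p4_upsampled = (c4[0] * 2, c4[1] * 2)      # 2x upsampling of the coarser level

print(c3)            # (47, 63)
print(p4_upsampled)  # (48, 64)

# Padding both sides up to a multiple of 64 makes every level line up.
padded_hw = tuple(math.ceil(d / 64) * 64 for d in image_hw)
print(padded_hw)     # (384, 512)
```

The configuration dump shows IMAGE_RESIZE_MODE none, so the image reaches the network at its original size. If the assumptions above hold, a resize mode that pads the image, or padding it by hand to multiples of 64, would be the thing to try.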
I have updated this code to Keras 2, but ran into a problem.
with tf.device('/cpu:0'):
    merged = outputs_all[0]
    for outputs in outputs_all[1:]:
        print(outputs)
        merged.append(K.concatenate(outputs, axis=0))
    print(merged)
[<tf.Tensor 'tower_0/sequential_1/dropout_2/cond/Merge:0' shape=(?, 101) dtype=float32>, <tf.Tensor 'tower_1/sequential_1/dropout_2/cond/Merge:0' shape=(?, 101) dtype=float32>]
and got an error
ValueError: The model expects 2 input arrays, but only received one array. Found: array with shape (48, 101)
Could I merge two tf.Tensors into one tf.Tensor?
I recently adopted the multi_gpu module to parallelize learning across multiple GPUs. On 8 Tesla K80s I get a speed-up of roughly 4x, and learning appears to take place, as the loss goes down per iteration. However, when I actually test the model and visualize the results, it appears to perform exactly as it does without training. Previously, at the same loss I achieved while training with multi_gpu, I'd get drastically different performance. I've been working with this model for months and so have proven the learnability of the problem and the success of the architecture, so the results make no sense. I'm using Keras's built-in ModelCheckpoint callback to automatically save my model after every epoch in which the validation loss has decreased. My guess is that there is a silent conflict between how the model is saved and this module. Any help debugging this would be greatly appreciated.
I got the following error while trying to use the make_parallel function:
Traceback (most recent call last):
File "model_language2motion.py", line 1335, in <module>
main(parser.parse_args())
File "model_language2motion.py", line 1202, in main
args.func(args)
File "model_language2motion.py", line 723, in train
train_data, valid_data, model, optimizer = prepare_for_training(output_path, args)
File "model_language2motion.py", line 677, in prepare_for_training
model = make_parallel(model, 8)
File "/workspace/deepAnim/make_parallel.py", line 31, in make_parallel
outputs = model(inputs)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 572, in __call__
self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 635, in add_inbound_node
Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 172, in create_node
output_tensors = to_list(outbound_layer.call(input_tensors, mask=input_masks))
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2247, in call
output_tensors, output_masks, output_shapes = self.run_internal_graph(inputs, masks)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 2390, in run_internal_graph
computed_mask))
File "/usr/local/lib/python2.7/dist-packages/keras/layers/recurrent.py", line 235, in call
constants = self.get_constants(x)
File "/usr/local/lib/python2.7/dist-packages/keras/layers/recurrent.py", line 884, in get_constants
ones = K.tile(ones, (1, int(input_dim)))
TypeError: int() argument must be a string or a number, not 'NoneType'
PS: The code works if the call to make_parallel is removed.
Hi, when using kernel_regularizer=regularizers.l2(0.00004) in a Conv2D layer, I get "AttributeError: 'Model' object has no attribute '_losses'", caused by outputs = model(inputs), which merges the outputs of the different splits into one model.
The problem is that the regularizer expects the loss, but the loss is split over the different models. Is it possible, or even a good idea, to regularize batch-wise?
I just used the example code from https://medium.com/@kuza55/transparent-multi-gpu-training-on-tensorflow-with-keras-8b0016fd9012
But when I changed the number of GPUs (TITAN X) from 2 to 8, I found that the speed-up relative to a single GPU was almost the same. Can you give me some hints about this?
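One thing worth checking in a case like this, since make_parallel splits each batch across the GPUs: if the batch size stays fixed while the GPU count grows, each GPU gets less work per step, and fixed costs such as merging the outputs on the CPU can dominate. The numbers below are illustrative only. The batch size of 128 is an assumption, not a value from the report, and both helpers are hypothetical.

```python
def per_gpu_batch(global_batch, gpu_count):
    """Samples each GPU processes per step when one batch is split evenly."""
    return global_batch // gpu_count

def scaled_global_batch(single_gpu_batch, gpu_count):
    """Global batch that keeps the per-GPU workload equal to the single-GPU run."""
    return single_gpu_batch * gpu_count

for gpus in (1, 2, 8):
    print(gpus, per_gpu_batch(128, gpus), scaled_global_batch(128, gpus))
```

With a fixed batch of 128, eight GPUs would each see only 16 samples per step. Scaling the global batch with the GPU count keeps each device as busy as in the single-GPU run, though a larger batch may also call for retuning the learning rate.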
Hi,
I've tried to run the multi_gpu program to parallelize a convolutional neural network that is based on U-Net. I am trying to run it on a g2.8xlarge to take advantage of its 4 GPUs.
Anyways, when trying to run the code I got an error. Below are both the full error, as well as the function used to define the model/call the multi_gpu function (make_parallel). The part that calls make_parallel is pretty much at the very end of the script/this post.
This may be super simple but I have no experience with tf and am just starting with keras. Any suggestions would be greatly appreciated.
Thanks,
Anthony.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-20-b2175f08a7a9> in <module>()
----> 1 modelTest = get_unet()
2 print(modelTest.summary())
<ipython-input-18-7318b7dd3672> in get_unet()
40
41 model = Model(input=inputs, output=conv10)
---> 42 model = make_parallel(model,3)
43 model.compile(optimizer=Adam(lr=1e-5), loss=dice_coef_loss, metrics=[dice_coef])
44
/vol/programs/keras-extras/utils/multi_gpu.pyc in make_parallel(model, gpu_count)
29 inputs.append(slice_n)
30
---> 31 outputs = model(inputs)
32
33 if not isinstance(outputs, list):
/home/ubuntu/miniconda/lib/python2.7/site-packages/Keras-1.0.4-py2.7.egg/keras/engine/topology.pyc in __call__(self, x, mask)
483 if inbound_layers:
484 # this will call layer.build() if necessary
--> 485 self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
486 input_added = True
487
/home/ubuntu/miniconda/lib/python2.7/site-packages/Keras-1.0.4-py2.7.egg/keras/engine/topology.pyc in add_inbound_node(self, inbound_layers, node_indices, tensor_indices)
541 # creating the node automatically updates self.inbound_nodes
542 # as well as outbound_nodes on inbound layers.
--> 543 Node.create_node(self, inbound_layers, node_indices, tensor_indices)
544
545 def get_output_shape_for(self, input_shape):
/home/ubuntu/miniconda/lib/python2.7/site-packages/Keras-1.0.4-py2.7.egg/keras/engine/topology.pyc in create_node(cls, outbound_layer, inbound_layers, node_indices, tensor_indices)
146
147 if len(input_tensors) == 1:
--> 148 output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
149 output_masks = to_list(outbound_layer.compute_mask(input_tensors[0], input_masks[0]))
150 # TODO: try to auto-infer shape if exception is raised by get_output_shape_for
/home/ubuntu/miniconda/lib/python2.7/site-packages/Keras-1.0.4-py2.7.egg/keras/engine/topology.pyc in call(self, input, mask)
1920 return self._output_tensor_cache[cache_key]
1921 else:
-> 1922 output_tensors, output_masks, output_shapes = self.run_internal_graph(inputs, masks)
1923 return output_tensors
1924
/home/ubuntu/miniconda/lib/python2.7/site-packages/Keras-1.0.4-py2.7.egg/keras/engine/topology.pyc in run_internal_graph(self, inputs, masks)
2062 if len(computed_data) == 1:
2063 computed_tensor, computed_mask = computed_data[0]
-> 2064 output_tensors = to_list(layer.call(computed_tensor, computed_mask))
2065 output_masks = to_list(layer.compute_mask(computed_tensor, computed_mask))
2066 computed_tensors = [computed_tensor]
/home/ubuntu/miniconda/lib/python2.7/site-packages/Keras-1.0.4-py2.7.egg/keras/layers/convolutional.pyc in call(self, x, mask)
1062 def call(self, x, mask=None):
1063 return K.resize_images(x, self.size[0], self.size[1],
-> 1064 self.dim_ordering)
1065
1066 def get_config(self):
/home/ubuntu/miniconda/lib/python2.7/site-packages/Keras-1.0.4-py2.7.egg/keras/backend/tensorflow_backend.pyc in resize_images(X, height_factor, width_factor, dim_ordering)
506 X = tf.image.resize_nearest_neighbor(X, new_shape)
507 X = permute_dimensions(X, [0, 3, 1, 2])
--> 508 X.set_shape((None, None, original_shape[2] * height_factor, original_shape[3] * width_factor))
509 return X
510 elif dim_ordering == 'tf':
TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'
def get_unet():
    inputs = Input((1, img_rows, img_cols))
    conv1 = Convolution2D(32, 3, 3, activation='relu', border_mode='same')(inputs)
    conv1 = Convolution2D(32, 3, 3, activation='relu', border_mode='same')(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    conv2 = Convolution2D(64, 3, 3, activation='relu', border_mode='same')(pool1)
    conv2 = Convolution2D(64, 3, 3, activation='relu', border_mode='same')(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
    conv3 = Convolution2D(128, 3, 3, activation='relu', border_mode='same')(pool2)
    conv3 = Convolution2D(128, 3, 3, activation='relu', border_mode='same')(conv3)
    pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)
    conv4 = Convolution2D(256, 3, 3, activation='relu', border_mode='same')(pool3)
    conv4 = Convolution2D(256, 3, 3, activation='relu', border_mode='same')(conv4)
    pool4 = MaxPooling2D(pool_size=(2, 2))(conv4)
    conv5 = Convolution2D(512, 3, 3, activation='relu', border_mode='same')(pool4)
    conv5 = Convolution2D(512, 3, 3, activation='relu', border_mode='same')(conv5)
    up6 = merge([UpSampling2D(size=(2, 2))(conv5), conv4], mode='concat', concat_axis=1)
    conv6 = Convolution2D(256, 3, 3, activation='relu', border_mode='same')(up6)
    conv6 = Convolution2D(256, 3, 3, activation='relu', border_mode='same')(conv6)
    up7 = merge([UpSampling2D(size=(2, 2))(conv6), conv3], mode='concat', concat_axis=1)
    conv7 = Convolution2D(128, 3, 3, activation='relu', border_mode='same')(up7)
    conv7 = Convolution2D(128, 3, 3, activation='relu', border_mode='same')(conv7)
    up8 = merge([UpSampling2D(size=(2, 2))(conv7), conv2], mode='concat', concat_axis=1)
    conv8 = Convolution2D(64, 3, 3, activation='relu', border_mode='same')(up8)
    conv8 = Convolution2D(64, 3, 3, activation='relu', border_mode='same')(conv8)
    up9 = merge([UpSampling2D(size=(2, 2))(conv8), conv1], mode='concat', concat_axis=1)
    conv9 = Convolution2D(32, 3, 3, activation='relu', border_mode='same')(up9)
    conv9 = Convolution2D(32, 3, 3, activation='relu', border_mode='same')(conv9)
    conv10 = Convolution2D(1, 1, 1, activation='sigmoid')(conv9)
    model = Model(input=inputs, output=conv10)
    model = make_parallel(model,3)
    model.compile(optimizer=Adam(lr=1e-5), loss=dice_coef_loss, metrics=[dice_coef])
    return model
I am trying to reproduce a multiple GPU implementation of my keras model using some of the code from your blog post. I have slightly modified it to take a list of GPUs (in case I want to specify which GPUs I am using). I am using the Tensorflow backend of Keras, and they are both up to date. I have four NVIDIA Titan X GPUs. Below is a small example using MNIST.
from keras.layers import concatenate
from keras.layers.core import Lambda
from keras.models import Model
import tensorflow as tf
def make_parallel(model, gpu_list):
    def get_slice(data, idx, parts):
        shape = tf.shape(data)
        size = tf.concat([ shape[:1] // parts, shape[1:] ], axis=0)
        stride = tf.concat([ shape[:1] // parts, shape[1:]*0 ], axis=0)
        start = stride * idx
        return tf.slice(data, start, size)
    outputs_all = []
    for i in range(len(model.outputs)):
        outputs_all.append([])
    #Place a copy of the model on each GPU, each getting a slice of the batch
    gpu_count = len(gpu_list)
    for i in range(gpu_count):
        with tf.device('/gpu:%d' % gpu_list[i]):
            with tf.name_scope('tower_%d' % gpu_list[i]) as scope:
                inputs = []
                #Slice each input into a piece for processing on this GPU
                for x in model.inputs:
                    input_shape = tuple(x.get_shape().as_list())[1:]
                    slice_n = Lambda(get_slice, output_shape=input_shape, arguments={'idx':i,'parts':gpu_count})(x)
                    inputs.append(slice_n)
                outputs = model(inputs)
                if not isinstance(outputs, list):
                    outputs = [outputs]
                #Save all the outputs for merging back together later
                for l in range(len(outputs)):
                    outputs_all[l].append(outputs[l])
    # merge outputs on CPU
    with tf.device('/cpu:0'):
        merged = []
        for outputs in outputs_all:
            merged.append(concatenate(outputs, axis=0))
        return Model(inputs=model.inputs, outputs=merged)
if __name__ == "__main__":
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.datasets import mnist
    from keras.utils import to_categorical
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = x_train.reshape(60000, -1)
    x_test = x_test.reshape(10000, -1)
    model = Sequential()
    model.add(Dense(64, input_shape=(784,), activation='relu'))
    model.add(Dense(10, activation='softmax'))
    parallel_model = make_parallel(model , [0,1,2,3])
    y_train = to_categorical(y_train, 10)
    y_test = to_categorical(y_test, 10)
    parallel_model.compile(optimizer='nadam', loss='categorical_crossentropy',
                           metrics=['accuracy'])
    parallel_model.fit(x_train, y_train, batch_size=128,
                       validation_data=(x_test, y_test))
This code works when I select two or four GPUs; but when I select three GPUs, I get the following error:
Using TensorFlow backend.
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
Traceback (most recent call last):
File "<ipython-input-1-524a8053f5a2>", line 1, in <module>
runfile('/home/rmk6217/Documents/kemker/machine_learning/multi_gpu.py', wdir='/home/rmk6217/Documents/kemker/machine_learning')
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/spyder/utils/site/sitecustomize.py", line 866, in runfile
execfile(filename, namespace)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "/home/rmk6217/Documents/kemker/machine_learning/multi_gpu.py", line 71, in <module>
validation_data=(x_test, y_test))
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/Keras-2.0.2-py3.5.egg/keras/engine/training.py", line 1485, in fit
initial_epoch=initial_epoch)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/Keras-2.0.2-py3.5.egg/keras/engine/training.py", line 1140, in _fit_loop
outs = f(ins_batch)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/Keras-2.0.2-py3.5.egg/keras/backend/tensorflow_backend.py", line 2102, in __call__
feed_dict=feed_dict)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
InvalidArgumentError: Incompatible shapes: [128] vs. [126]
[[Node: Equal = Equal[T=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](ArgMax, ArgMax_1)]]
Caused by op 'Equal', defined at:
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/spyder/utils/ipython/start_kernel.py", line 227, in <module>
main()
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/spyder/utils/ipython/start_kernel.py", line 223, in main
kernel.start()
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/ipykernel/kernelapp.py", line 474, in start
ioloop.IOLoop.instance().start()
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/zmq/eventloop/ioloop.py", line 177, in start
super(ZMQIOLoop, self).start()
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/tornado/ioloop.py", line 831, in start
self._run_callback(callback)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/tornado/ioloop.py", line 604, in _run_callback
ret = callback()
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 258, in enter_eventloop
self.eventloop(self)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/ipykernel/eventloops.py", line 93, in loop_qt5
return loop_qt4(kernel)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/ipykernel/eventloops.py", line 87, in loop_qt4
start_event_loop_qt4(kernel.app)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/IPython/lib/guisupport.py", line 144, in start_event_loop_qt4
app.exec_()
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/ipykernel/eventloops.py", line 39, in process_stream_events
kernel.do_one_iteration()
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 291, in do_one_iteration
stream.flush(zmq.POLLIN, 1)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 352, in flush
self._handle_recv()
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 276, in dispatcher
return self.dispatch_shell(stream, msg)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 228, in dispatch_shell
handler(stream, idents, msg)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 390, in execute_request
user_expressions, allow_stdin)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/ipykernel/ipkernel.py", line 196, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/ipykernel/zmqshell.py", line 501, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2717, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2827, in run_ast_nodes
if self.run_code(code, result):
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-1-524a8053f5a2>", line 1, in <module>
runfile('/home/rmk6217/Documents/kemker/machine_learning/multi_gpu.py', wdir='/home/rmk6217/Documents/kemker/machine_learning')
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/spyder/utils/site/sitecustomize.py", line 866, in runfile
execfile(filename, namespace)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "/home/rmk6217/Documents/kemker/machine_learning/multi_gpu.py", line 68, in <module>
metrics=['accuracy'])
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/Keras-2.0.2-py3.5.egg/keras/engine/training.py", line 952, in compile
append_metric(i, 'acc', masked_fn(y_true, y_pred, mask=masks[i]))
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/Keras-2.0.2-py3.5.egg/keras/engine/training.py", line 479, in masked
score_array = fn(y_true, y_pred)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/Keras-2.0.2-py3.5.egg/keras/metrics.py", line 25, in categorical_accuracy
K.argmax(y_pred, axis=-1)),
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/Keras-2.0.2-py3.5.egg/keras/backend/tensorflow_backend.py", line 1347, in equal
return tf.equal(x, y)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 721, in equal
result = _op_def_lib.apply_op("Equal", x=x, y=y, name=name)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/rmk6217/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Incompatible shapes: [128] vs. [126]
[[Node: Equal = Equal[T=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](ArgMax, ArgMax_1)]]
I have dug through the debugger for a while now, but I can't seem to track down the issue. I can't help feeling that I am doing something stupid, so I was hoping another set of eyes might see things I didn't. Any assistance would be appreciated. Thanks!
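One plausible reading of the `[128] vs. [126]` mismatch, offered as a sketch rather than a confirmed diagnosis: `get_slice` gives each tower `shape[:1] // parts` rows, so with a batch of 128 and three GPUs each tower would receive 42 rows and the merged output would hold only 126 predictions against 128 labels. The pure-Python snippet below reproduces just that arithmetic; `tower_slices` and `merged_rows` are hypothetical helper names, not part of keras-extras.

```python
def tower_slices(batch_size, parts):
    """Mimic the row ranges get_slice would pick: integer-divided, equal-sized slices."""
    size = batch_size // parts  # rows per tower; any remainder is dropped
    return [(idx * size, idx * size + size) for idx in range(parts)]


def merged_rows(batch_size, parts):
    """Number of rows left after concatenating every tower's slice."""
    return sum(stop - start for start, stop in tower_slices(batch_size, parts))


for parts in (2, 3, 4):
    print(parts, "GPUs ->", merged_rows(128, parts), "rows after merge")
```

If this reading is right, every batch fed to the model, including the final partial batch and the validation batches, would need a row count divisible by the number of GPUs.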
The TensorFlow implementation of multi-GPU training treats weight regularization differently.
It computes partial losses and gradients on each GPU and then combines them on the master CPU. This is similar to Keras multi-GPU.
In TensorFlow, weight regularization is not applied by each GPU. The master CPU computes the weight regularization and adds it to the final loss and gradients.
How is this implemented in Keras multi-GPU?
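To make the scheme described above concrete, here is a toy sketch in plain Python. It is not the TensorFlow or Keras code, and it assumes a one-weight linear model with a mean-squared-error data loss and an L2 penalty `lam * w**2`. All function names are hypothetical. Each "tower" computes the data-loss gradient on its own shard, the towers are averaged, and the regularization gradient is added once.

```python
def data_grad(w, xs, ys):
    """Gradient of mean((w*x - y)**2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def reg_grad(w, lam):
    """Gradient of the L2 penalty lam * w**2 with respect to w."""
    return 2.0 * lam * w


def multi_tower_grad(w, xs, ys, towers, lam):
    """Average per-tower data gradients, then add the penalty gradient once."""
    shard = len(xs) // towers  # assumes len(xs) is divisible by towers
    grads = [
        data_grad(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(towers)
    ]
    return sum(grads) / towers + reg_grad(w, lam)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w, lam = 0.5, 0.1

full = data_grad(w, xs, ys) + reg_grad(w, lam)          # single-device gradient
split = multi_tower_grad(w, xs, ys, towers=2, lam=lam)  # two-tower gradient
print(abs(full - split) < 1e-9)
```

With equal-sized shards the two gradients agree, which is the property the "regularize once on the master" arrangement preserves. If tower gradients were summed rather than averaged, applying the penalty inside every tower would count it once per tower.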
As discussed, this is the autoencoder example from Keras used with the multi_gpu util.
The autoencoder's output is doubled after the merge (in make_parallel).
https://gist.github.com/varoudis/d6a71f08f3d309cc3b7583f00616d9c0
When I use a model that uses BatchNorm, I get a lot (~100) of warnings in the following form:
WARNING:tensorflow:Tried to colocate gradients/tower_0/sequential_1/batch_normalization_5/moments/sufficient_statistics/count_grad/Rank with an op tower_0/sequential_1/batch_normalization_5/moments/sufficient_statistics/count that had a different device: /device:CPU:0 vs /device:GPU:0. Ignoring colocation property.
After Googling, I found out it may have to do with some functions not being implemented on the GPU (yet).
How would it be possible to avoid those warnings? Or to fix the bug?
Can I just ignore the warning? After all it seems to train okayish.