
keras-multi-gpu's People

Contributors

bzamecnik, pasky


keras-multi-gpu's Issues

Argument constraints will be removed from Optimizer.get_updates() in Keras 2.0.7

We call Optimizer.get_updates() with self.constraints as an argument from the overridden Model._make_train_function(). However, in master (to be released in 2.0.7) this argument has been removed. Although there is a legacy interface adapter, it fails with:

  File "keras/keras/engine/training.py", line 1412, in fit
    self._make_train_function()
  File "rossum-multi-gpu/data_parallel_model.py", line 161, in _make_train_function
    self.constraints,
AttributeError: 'DataParallelModel' object has no attribute 'constraints'
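For reference, in Keras <= 2.0.6 the signature was get_updates(params, constraints, loss), while from 2.0.7 on it is get_updates(loss, params). A minimal sketch of how the overridden _make_train_function() could handle both versions (an illustration, not the repo's actual fix; it assumes the usual self._collected_trainable_weights and self.total_loss attributes of Keras' Model):

from distutils.version import StrictVersion
import keras

# Sketch only: pick the get_updates() call based on the installed Keras version.
if StrictVersion(keras.__version__) >= StrictVersion('2.0.7'):
    # Keras >= 2.0.7: the constraints argument was removed, get_updates(loss, params)
    training_updates = self.optimizer.get_updates(
        loss=self.total_loss, params=self._collected_trainable_weights)
else:
    # Keras <= 2.0.6: get_updates(params, constraints, loss)
    training_updates = self.optimizer.get_updates(
        self._collected_trainable_weights, self.constraints, self.total_loss)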

Possible to use NCCL for optimized inter-GPU communication?

NCCL claims to provide optimized collective operations for multi-GPU communication. It's available via TensorFlow as well. In our case we could use:

  • all-gather for gradient averaging (sum of gradients normalized by number of replicas)
  • broadcast for propagating weights
  • all-scatter for providing input slices to replicas

We could use the TF NCCL operation tf.contrib.nccl.all_sum. It is an all-reduce with sum reduction, i.e. a reduce followed by a broadcast of the result, and we can use it for gradient averaging (see the sketch below). Since the summed gradients end up on all devices, the weights can be located and updated on all devices and do not need to be broadcast.

An all-scatter operation is not provided in tf.contrib.nccl; instead, we could use the TF queue mechanism.
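A minimal sketch of gradient averaging via tf.contrib.nccl.all_sum (TF 1.x). The function and the name tower_grads are illustrative assumptions, not code from this repo; tower_grads[i] is assumed to hold the gradients computed on GPU i, in the same variable order:

import tensorflow as tf
from tensorflow.contrib import nccl

def nccl_average_gradients(tower_grads, num_gpus):
    # tower_grads[i][j]: gradient of variable j computed on GPU i.
    averaged = [[] for _ in range(num_gpus)]
    # All-reduce the j-th gradient across all GPUs, then divide by the
    # number of replicas to get the average on every device.
    for grads_for_var in zip(*tower_grads):
        summed = nccl.all_sum(list(grads_for_var))  # one output tensor per GPU
        for gpu_index, grad_sum in enumerate(summed):
            with tf.device(grad_sum.device):
                averaged[gpu_index].append(grad_sum / float(num_gpus))
    return averaged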

Can't convert Operation 'StagingArea_put' to Tensor

When running with tensorflow-1.12.0 and Keras-2.2.4:

CUDA_VISIBLE_DEVICES=3 python keras_staging_area_cifar10.py

I get the following error:

training pipelined model:
Traceback (most recent call last):
  File "keras_staging_area_cifar10.py", line 73, in <module>
    callbacks=[staging_area_callback, gauge])
  File "/home/bzamecnik/.virtualenvs/rossum/local/lib/python2.7/site-packages/keras/engine/training.py", line 1010, in fit
    self._make_train_function()
  File "/home/bzamecnik/.virtualenvs/rossum/local/lib/python2.7/site-packages/keras/engine/training.py", line 519, in _make_train_function
    **self._function_kwargs)
  File "/home/bzamecnik/.virtualenvs/rossum/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2744, in function
    return Function(inputs, outputs, updates=updates, **kwargs)
  File "/home/bzamecnik/.virtualenvs/rossum/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2567, in __init__
    self.fetches = [tf.identity(x) for x in self.fetches]
  File "/home/bzamecnik/.virtualenvs/rossum/local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 81, in identity
    return gen_array_ops.identity(input, name=name)
  File "/home/bzamecnik/.virtualenvs/rossum/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3454, in identity
    "Identity", input=input, name=name)
  File "/home/bzamecnik/.virtualenvs/rossum/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 513, in _apply_op_helper
    raise err
TypeError: Can't convert Operation 'StagingArea_put' to Tensor (target dtype=None, name=u'input', as_ref=False)

Similar: https://stackoverflow.com/questions/47750300/tensorflow-cant-convert-operation-to-tensor

The cause seems to be that the StagingArea put operation (a tf.Operation, not a Tensor) gets wrapped in tf.identity():

 # (since the outputs of fetches are never returned).
   2566         # This requires us to wrap fetches in `identity` ops.
-> 2567         self.fetches = [tf.identity(x) for x in self.fetches]
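One possible workaround (an untested assumption, not a verified fix) would be to wrap only Tensor fetches in tf.identity() and pass Operation fetches, such as the StagingArea put, through unchanged, e.g. along these lines:

import tensorflow as tf

def wrap_fetches(fetches):
    # Only Tensors can be wrapped in tf.identity(); Operations (like
    # StagingArea_put) have no output value and must be passed through.
    return [tf.identity(f) if isinstance(f, tf.Tensor) else f for f in fetches]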

a thought

hey guys,

first I wanna say that it's so nice to see people sharing their thoughts and work like this.

I just wanted to ask, w.r.t. the Keras distributed tests: are you scaling the batch size with the number of GPUs? Keras just splits the given batch size across the cards, so for a batch size of 256 on 4 cards the real batch size is 64 per card. (I honestly think this should be changed, but c'est la vie.)

So this may be why you see less efficiency on the cards; a quick sketch of the scaling is below.
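For example (numbers are illustrative, not from the repo's benchmarks):

# Keep the per-GPU batch size constant by scaling the global batch size
# with the number of GPUs; multi_gpu_model splits the global batch.
n_gpus = 4
per_gpu_batch_size = 64
global_batch_size = per_gpu_batch_size * n_gpus  # 256 in total, 64 per card
# parallel_model.fit(x_train, y_train, batch_size=global_batch_size, ...)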

here's a plot from my tests, that shows quasilinear speedups on EC2 instances.

[pasted image, 2017-11-13 05:31 PM: plot showing quasilinear speedups on EC2 instances]

hope this helps!!

How to use multi-gpu in Keras with shared weights applications model

System information

  • Linux Ubuntu 16.04
  • TensorFlow backend
  • TensorFlow version: 1.10.0

I want to use Keras with multiple GPUs together with the applications models (such as VGG16), but I get an error.

With a single GPU it works correctly, but with multiple GPUs it fails.
The code looks like this:

import numpy as np
import tensorflow as tf
import keras

with tf.device('/cpu:0'):
    input1 = keras.layers.Input(config.input_shape)
    input2 = keras.layers.Input(config.input_shape)
    sub_model = keras.applications.VGG16(include_top=False, weights=config.VGG_MODEL_PATH,
                                         input_shape=config.input_shape)
    # the same VGG16 sub-model is applied to both inputs, so the weights are shared
    output1 = sub_model(input1)
    output2 = sub_model(input2)
    model = keras.Model(inputs=[input1, input2], outputs=[output1, output2])

parallel_model = keras.utils.multi_gpu_model(model, gpus=3)
parallel_model.compile('sgd', loss=['mse', 'mse'])
parallel_model.fit([np.random.random([10, 128, 128, 3]), np.random.random([10, 128, 128, 3])],
                   [np.random.random([10, 4, 4, 512]), np.random.random([10, 4, 4, 512])])

The error message is

Traceback (most recent call last):
  File "/data00/home/liangdong.tony/PycharmProject/RetrievalCCWebVideo/AE/demo.py", line 145, in <module>
    parallel_model = keras.utils.multi_gpu_model(model, gpus=3)
  File "/data00/home/liangdong.tony/.local/lib/python2.7/site-packages/keras/utils/training_utils.py", line 177, in multi_gpu_model
    return Model(model.inputs, merged)
  File "/data00/home/liangdong.tony/.local/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)
  File "/data00/home/liangdong.tony/.local/lib/python2.7/site-packages/keras/engine/topology.py", line 1811, in __init__
    'Layer names: ', all_names)
RuntimeError: ('The name "vgg16" is used 2 times in the model. All layer names should be unique. Layer names: ', ['input_1', 'input_2', 'lambda_1', 'lambda_2', 'lambda_3', 'lambda_4', 'lambda_5', 'lambda_6', 'model_1', 'vgg16', 'vgg16'])

In short, I want to use VGG16 as a backbone. I have two inputs which are both fed into the same VGG16 sub-model, so the weights are shared between them.
Do you have any suggestions?
Thank you, looking forward to your reply!

tensorflow : 'NoneType' object has no attribute 'update'

I tried running all your examples on 8 NVIDIA V100 GPUs, however I get this error across all of them:

  File "/opt/conda/lib/python3.5/copy.py", line 182, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/opt/conda/lib/python3.5/copy.py", line 297, in _reconstruct
    state = deepcopy(state, memo)
  File "/opt/conda/lib/python3.5/copy.py", line 155, in deepcopy
    y = copier(x, memo)
  File "/opt/conda/lib/python3.5/copy.py", line 243, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/opt/conda/lib/python3.5/copy.py", line 182, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/opt/conda/lib/python3.5/copy.py", line 306, in _reconstruct
    y.__dict__.update(state)
AttributeError: 'NoneType' object has no attribute 'update'

This is in a Docker environment with the following versions:

  • Tensorflow 1.4.0
  • Keras 2.1.2
  • Python 3.5.4
  • CUDA 9.0
Could this be a version compatibility or GPU device compatibility issue? Thanks for any pointers.

Shape [-1] has negative dimensions

Running on 2 GPUs (GTX 1070):

CUDA_VISIBLE_DEVICES=0,1 python data_parallel_mnist_cnn.py
Train on 60000 samples, validate on 10000 samples
Epoch 1/10
2017-08-10 14:55:47.483599: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-10 14:55:47.483631: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-10 14:55:48.831409: W tensorflow/core/framework/op_kernel.cc:1148] Invalid argument: Shape [-1,-1] has negative dimensions
2017-08-10 14:55:48.831460: E tensorflow/core/common_runtime/executor.cc:644] Executor failed to create kernel. Invalid argument: Shape [-1,-1] has negative dimensions
	 [[Node: replica_1_1/model_1_target = Placeholder[dtype=DT_FLOAT, shape=[?,?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
2017-08-10 14:55:48.849021: W tensorflow/core/framework/op_kernel.cc:1148] Invalid argument: Shape [-1,-1] has negative dimensions
2017-08-10 14:55:48.849064: E tensorflow/core/common_runtime/executor.cc:644] Executor failed to create kernel. Invalid argument: Shape [-1,-1] has negative dimensions
	 [[Node: replica_0_1/model_1_target = Placeholder[dtype=DT_FLOAT, shape=[?,?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
2017-08-10 14:55:48.865190: W tensorflow/core/framework/op_kernel.cc:1148] Invalid argument: Shape [-1] has negative dimensions
2017-08-10 14:55:48.865233: E tensorflow/core/common_runtime/executor.cc:644] Executor failed to create kernel. Invalid argument: Shape [-1] has negative dimensions
	 [[Node: replica_0_1/model_1_sample_weights = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Traceback (most recent call last):
  File "/Users/bzamecnik/anaconda/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call
    return fn(*args)
  File "/Users/bzamecnik/anaconda/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
    status, run_metadata)
  File "/Users/bzamecnik/anaconda/lib/python3.4/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/Users/bzamecnik/anaconda/lib/python3.4/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape [-1] has negative dimensions
	 [[Node: replica_0_1/model_1_sample_weights = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

loss stuck when using multi_gpu

I'm trying to use make_parallel() with Keras Xception and a generator which yields two classes, batch_size=2.

When using one GPU without make_parallel, the model gets to loss=0, acc=1 in 2 epochs.
However, when using multi-GPU with gpus=2, the model gets stuck at acc=0.5 with loss=8.0591.

I'm guessing this is related somehow to the loss aggregation being collected only from one GPU instead of both, but I am not sure why.

When trying to train 4 classes with batch_size=4, the training gets to acc=0.97 after 11 epochs, while a single GPU gets to acc=1 within 2 epochs.

Any idea?
