My environment: RedHat Linux 7.2, gcc 4.8.5, nvidia-driver 410.48, cuda

Cannot use GPU to train in 0.5.0 (federated issue, closed)

tensorflow commented on May 9, 2024

Cannot use GPU to train in 0.5.0

from federated.

Comments (7)

ProfXGiter commented on May 9, 2024

I ran model.fit() without TFF, and GPU training works fine.

I also ran the MNIST example with tf-gpu:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.python import layers
from tensorflow.python import nn

mnist = input_data.read_data_sets("./MNIST_data/", one_hot=True)

sess = tf.InteractiveSession()

x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
x_image = tf.reshape(x, [-1, 28, 28, 1])

conv1 = layers.conv2d(x_image, 32, 5, padding='same', name='conv1')
relu1 = nn.relu(conv1, name='relu1')
maxppool1 = layers.max_pooling2d(relu1, 2, 2, name='maxpool1')

conv2 = layers.conv2d(maxppool1, 64, 5, padding='same', name='conv2')
relu2 = nn.relu(conv2, name='relu2')
maxppool2 = layers.max_pooling2d(relu2, 2, 2, name='maxpool2')

flattern = layers.flatten(maxppool2, name='flattern')
fc1 = layers.dense(flattern, 1024, activation=tf.nn.relu, name='fc1')
fc1_dropout = layers.dropout(fc1, 0.8, name='fc1_dropout')
fc2 = layers.dense(fc1_dropout, 10, activation=tf.nn.softmax, name='fc2')

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(fc2), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediciton = tf.equal(tf.argmax(fc2, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediciton, tf.float32))

tf.global_variables_initializer().run()
for i in range(1, 10001):
    batch = mnist.train.next_batch(64)
    if i % 10 == 0:
        train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1]})
        print("step %d, examples %d, training accuracy %g" % (i, i * 64, train_accuracy))
        print("test accuracy %g" % accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
    train_step.run(feed_dict={x: batch[0], y_: batch[1]})

print("test accuracy %g" % accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

It also works.
@ZacharyGarrett

anupamme commented on May 9, 2024

Hello!

I am experiencing the same issue with version 0.7.0 and tf-nightly-gpu (GPU).

It works just fine with tf-nightly (CPU).

@ZacharyGarrett if we use model.fit instead of tff.learning, is that a workaround for now?

Or do we have to wait for some sort of fix here?

ZacharyGarrett commented on May 9, 2024

Could you try training the Keras model without any TFF and see if the same error is raised? That is, call model.fit() without using any of the tff.learning modules.
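
One way to run that check is to first confirm that TensorFlow sees a GPU at all, then fit a tiny Keras model with no TFF involved. The sketch below is only an illustration under stated assumptions: describe_gpu_visibility and fit_tiny_model_without_tff are hypothetical helper names, the data is random rather than real MNIST, and tf.config.list_physical_devices is assumed to be available (TensorFlow 2.1 or later; older builds contemporary with TFF 0.5.0 would fall back to tf.test.is_gpu_available()).

```python
import importlib.util


def describe_gpu_visibility():
    """Report whether TensorFlow is importable and which GPUs it sees.

    Hypothetical helper for illustration; not part of TF or TFF.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return {"tensorflow_installed": False, "gpus": []}
    import tensorflow as tf

    # Newer releases expose tf.config.list_physical_devices; older ones
    # only offer tf.test.is_gpu_available() (assumption about your version).
    list_devices = getattr(getattr(tf, "config", None),
                           "list_physical_devices", None)
    if list_devices is not None:
        gpus = [device.name for device in list_devices("GPU")]
    else:
        gpus = ["<unnamed GPU>"] if tf.test.is_gpu_available() else []
    return {"tensorflow_installed": True, "gpus": gpus}


def fit_tiny_model_without_tff(epochs=1):
    """Fit a small Keras classifier on random data, with no TFF involved."""
    import numpy as np
    import tensorflow as tf

    # Random stand-in data shaped like flattened MNIST (not real MNIST).
    x = np.random.rand(256, 784).astype("float32")
    y = np.random.randint(0, 10, size=(256,))

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    history = model.fit(x, y, batch_size=64, epochs=epochs, verbose=0)
    return history.history["loss"][-1]


if __name__ == "__main__":
    report = describe_gpu_visibility()
    print(report)
    if report["tensorflow_installed"]:
        print("final loss:", fit_tiny_model_without_tff())
```

If the GPU list is empty here, the problem likely lies in the TensorFlow or driver setup rather than in TFF. If model.fit() runs on the GPU but the tff.learning path does not, that points back at TFF.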

jiachangliu commented on May 9, 2024

I installed version 0.8.0 and am able to run code on the GPU.

anupamme commented on May 9, 2024

I tried it with 0.8.0 and I am still getting this error.

@jiachangliu is your code open source? Can you point me to it?

My code is open source [1]. Run federated.py.

[1] https://github.com/anupamme/CheXpert-Keras

jiachangliu commented on May 9, 2024

@anupamme I'm actually just running the image classification tutorial from federated learning. https://www.tensorflow.org/federated/tutorials/federated_learning_for_image_classification

jkr26 commented on May 9, 2024

Training on GPU should work with the local executor stack. That is, calling tff.framework.set_default_executor(tff.framework.create_local_executor()) before execution should allow you to utilize the GPU.

In particular, if you run the image classification colab with a GPU-backed cloud-hosted runtime, you will see a roughly 2X speedup compared with the CPU-backed runtime.

We've seen local training working with our recent versions as well, utilizing GPU resources; closing this issue.
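
The executor call mentioned above can be wrapped so that it fails softly on installs where it is unavailable. This is a minimal sketch, not an official recipe: install_local_executor is a hypothetical helper name, and the two tff.framework functions are taken from this thread, so they may be missing or renamed in other TFF releases. The helper therefore looks them up with getattr instead of assuming they exist.

```python
import importlib.util


def install_local_executor():
    """Try to make the local executor the default; return a status string.

    Hypothetical helper. The two tff.framework calls are the ones quoted in
    this thread; whether they exist depends on the installed TFF release.
    """
    if importlib.util.find_spec("tensorflow_federated") is None:
        return "tff-not-installed"
    import tensorflow_federated as tff

    framework = getattr(tff, "framework", None)
    set_default = getattr(framework, "set_default_executor", None)
    create_local = getattr(framework, "create_local_executor", None)
    if set_default is None or create_local is None:
        # The API named in the thread is not present in this release.
        return "api-not-found"

    # Same call as in the comment above, run before any TFF computation.
    set_default(create_local())
    return "local-executor-set"


if __name__ == "__main__":
    print(install_local_executor())
```

Call it once at the top of the training script, before building or running any federated computation. A result other than "local-executor-set" means the release notes for the installed version are the place to look for the current equivalent.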
