Comments (13)

lukeiwanski commented on September 21, 2024

Hi @jarrellmark

Thanks for reporting this!

That has been addressed in 5cc8cdd. Could you please try it out?

The SYCL default allocator did not take alignment into account. That has now been addressed in Eigen, where we pass the required alignment to the custom allocator. C++ is great!

Thanks,

from tensorflow-opencl.
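
For readers unfamiliar with the alignment problem described in the comment above, here is a minimal Python sketch of the general idea. It is not the Eigen or ComputeCpp code. It over-allocates a buffer and returns a view whose start address is a multiple of the requested alignment. The helper names `is_aligned` and `aligned_buffer` are hypothetical, and the 64-byte value is only an illustrative choice.

```python
import ctypes


def is_aligned(address, alignment):
    """Return True if address is a multiple of alignment."""
    return address % alignment == 0


def aligned_buffer(size, alignment):
    """Return (raw, view): view is a size-byte window into raw that
    starts at an address aligned to `alignment` bytes."""
    # Over-allocate so that an aligned start is guaranteed to fit.
    raw = (ctypes.c_char * (size + alignment - 1))()
    base = ctypes.addressof(raw)
    # Number of bytes to skip to reach the next aligned address.
    offset = (-base) % alignment
    view = (ctypes.c_char * size).from_buffer(raw, offset)
    return raw, view


# Illustrative values only (assumption, not taken from the thread).
raw, view = aligned_buffer(size=1024, alignment=64)
print(is_aligned(ctypes.addressof(view), 64))
```

An allocator that ignores the requested alignment would hand back the raw base address directly, and a check of this kind could then fail depending on where the buffer happens to land.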

jarrellmark commented on September 21, 2024

Hey @lukeiwanski,

The IsAligned() message went away, but I'm getting this message now:

2017-03-20 21:30:14.765017: W ./tensorflow/core/common_runtime/sycl/sycl_util.h:44] No OpenCL GPU found that is supported by ComputeCpp, trying OpenCL CPU

Is there a way to force the GPU?

lukeiwanski commented on September 21, 2024

Currently we have an issue with memory alignment on Intel GPUs and have set the Intel GPU as "blacklisted" in Eigen. This means Eigen will not try to target Intel GPUs at the moment. We are working on a resolution for this and will update you when we have a fix available.
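
To make the fallback behaviour concrete, here is a hedged Python sketch of a device selector with a blacklist. It only illustrates the effect described above (a blacklisted GPU is skipped and a CPU device is picked instead). It is not Eigen's or TensorFlow's actual selection code, and `select_device`, the dictionary keys, and the example device entries are all hypothetical.

```python
def select_device(devices, blacklisted_gpu_vendors=()):
    """Prefer a GPU whose vendor is not blacklisted; otherwise fall back to a CPU."""
    for dev in devices:
        if dev["type"] == "GPU" and dev["vendor"] not in blacklisted_gpu_vendors:
            return dev
    # No usable GPU was found, so try a CPU device instead.
    for dev in devices:
        if dev["type"] == "CPU":
            return dev
    return None


# Hypothetical device list, standing in for what an OpenCL query might report.
devices = [
    {"name": "example GPU", "type": "GPU", "vendor": "Intel"},
    {"name": "example CPU", "type": "CPU", "vendor": "Intel"},
]

print(select_device(devices, blacklisted_gpu_vendors=("Intel",))["name"])
print(select_device(devices)["name"])
```

With the vendor blacklisted the sketch returns the CPU entry, which matches the "trying OpenCL CPU" warning reported earlier in this thread. With an empty blacklist it returns the GPU entry.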

jarrellmark commented on September 21, 2024

Thanks, Luke.

I appreciate it and am excited about the progress that tensorflow-opencl is making.

nicholaslarusstone commented on September 21, 2024

Hi @lukeiwanski,

I'm having the same issue and was wondering if you have added Eigen support for Intel GPUs yet. If not, is there some way I can un-blacklist the Intel GPU?

Thanks for your hard work on this project!

lukeiwanski commented on September 21, 2024

Can you give it a spin on this branch: https://github.com/lukeiwanski/tensorflow/tree/dev/eigen_mehdi ?

nicholaslarusstone commented on September 21, 2024

That fixed the error, thanks a lot!

An unrelated question: TensorFlow keeps telling me I'm running on a SYCL device, but then it reports that device as a CPU. When I run sess = tf.Session(config=tf.ConfigProto(log_device_placement=True)), I get the following output:

/job:localhost/replica:0/task:0/device:SYCL:0 -> id: 0, type: CPU, name: Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz, vendor: Intel(R) Corporation, profile: FULL_PROFILE

Running tensorflow.python.client.device_lib.list_local_devices() gives me the following:

[name: "/cpu:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 177593382533810523, name: "/device:SYCL:0"
device_type: "SYCL"
memory_limit: 268435456
locality {
}
incarnation: 1258559034356206920
physical_device_desc: "id: 0, type: CPU, name: Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz, vendor: Intel(R) Corporation, profile: FULL_PROFILE"]

However, this device is NOT my GPU, as can be seen from when I run clinfo:

Platform Name Intel(R) OpenCL
Number of devices 2
Device Name Intel(R) HD Graphics
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 2.0
Driver Version r5.0.63503
Device OpenCL C Version OpenCL C 2.0
Device Type GPU
Device Profile FULL_PROFILE

.
.
.

Device Name Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 2.0 (Build 475)
Driver Version 1.2.0.475
Device OpenCL C Version OpenCL C 2.0
Device Type CPU
Device Profile FULL_PROFILE

Thanks for all your help already!
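
The mismatch described in this comment can also be checked programmatically. The sketch below parses a `physical_device_desc` string of the form shown in the output above into a dictionary, so the `type` field can be compared against what clinfo reports. `parse_device_desc` is a hypothetical helper, and the parsing assumes the "key: value, key: value" layout seen in this thread.

```python
import re


def parse_device_desc(desc):
    """Split a 'key: value, key: value' description string into a dict."""
    # Each value runs lazily up to the next ', key: ' pair or the end of the string.
    pairs = re.findall(r"(\w+): (.*?)(?=, \w+: |$)", desc)
    return dict(pairs)


# Description string copied from the list_local_devices() output in this thread.
desc = ("id: 0, type: CPU, name: Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz, "
        "vendor: Intel(R) Corporation, profile: FULL_PROFILE")

info = parse_device_desc(desc)
print(info["type"], "|", info["name"])
```

Here the SYCL device's `type` field is CPU, which confirms that it is backed by the OpenCL CPU device rather than the Intel(R) HD Graphics GPU listed by clinfo.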

nicholaslarusstone commented on September 21, 2024

However, I am later getting this error when I try to run a simple keras model (just 2 dense layers): InternalError: Unknown error detected on device /job:localhost/replica:0/task:0/device:SYCL:0

lukeiwanski commented on September 21, 2024

That's interesting. Could you provide code to reproduce the issue?

nicholaslarusstone commented on September 21, 2024

I'm having trouble reproducing this issue because the code seems to just hang. I'm getting a lot of these messages:
./tensorflow/core/common_runtime/executor.cc:1556] Process node: 48 step 2 mul_3 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:SYCL:0"](beta_1/read, Variable/read) is dead: 0

But here's my code:
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers
model = Sequential()
model.add(Dense(32, input_shape=(timesteps, D_in)))
model.add(Dense(D_out))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=N, epochs=5, validation_data=(X_test, y_test))

nicholaslarusstone commented on September 21, 2024

Ah, ok I've reproduced the earlier error by using LSTM layers. It may be unreasonable for me to expect LSTM layers to work, but I am also having trouble with just dense layers (see above). Here's my code:

from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.layers.wrappers import TimeDistributed
from keras import optimizers
model = Sequential()
model.add(LSTM(32, return_sequences=True, input_dim=D_in, input_length=timesteps))
model.add(LSTM(32, return_sequences=True))
model.add(TimeDistributed(Dense(D_out, activation='softmax')))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=N, epochs=1, validation_data=(X_test, y_test))

And here's a trace of the error message:
InternalError Traceback (most recent call last)
in ()
----> 1 model.fit(X_train, y_train, batch_size=N, epochs=1, validation_data=(X_test, y_test))

/home/nicholas/.virtualenvs/tensorflow-luke/local/lib/python2.7/site-packages/keras/models.pyc in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, **kwargs)
861 class_weight=class_weight,
862 sample_weight=sample_weight,
--> 863 initial_epoch=initial_epoch)
864
865 def evaluate(self, x, y, batch_size=32, verbose=1,

/home/nicholas/.virtualenvs/tensorflow-luke/local/lib/python2.7/site-packages/keras/engine/training.pyc in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, **kwargs)
1428 val_f=val_f, val_ins=val_ins, shuffle=shuffle,
1429 callback_metrics=callback_metrics,
-> 1430 initial_epoch=initial_epoch)
1431
1432 def evaluate(self, x, y, batch_size=32, verbose=1, sample_weight=None):

/home/nicholas/.virtualenvs/tensorflow-luke/local/lib/python2.7/site-packages/keras/engine/training.pyc in _fit_loop(self, f, ins, out_labels, batch_size, epochs, verbose, callbacks, val_f, val_ins, shuffle, callback_metrics, initial_epoch)
1077 batch_logs['size'] = len(batch_ids)
1078 callbacks.on_batch_begin(batch_index, batch_logs)
-> 1079 outs = f(ins_batch)
1080 if not isinstance(outs, list):
1081 outs = [outs]

/home/nicholas/.virtualenvs/tensorflow-luke/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.pyc in __call__(self, inputs)
2266 updated = session.run(self.outputs + [self.updates_op],
2267 feed_dict=feed_dict,
-> 2268 **self.session_kwargs)
2269 return updated[:len(self.outputs)]
2270

/home/nicholas/.virtualenvs/tensorflow-luke/local/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
887 try:
888 result = self._run(None, fetches, feed_dict, options_ptr,
--> 889 run_metadata_ptr)
890 if run_metadata:
891 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/home/nicholas/.virtualenvs/tensorflow-luke/local/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
1116 if final_fetches or final_targets or (handle and feed_dict_tensor):
1117 results = self._do_run(handle, final_targets, final_fetches,
-> 1118 feed_dict_tensor, options, run_metadata)
1119 else:
1120 results = []

/home/nicholas/.virtualenvs/tensorflow-luke/local/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1313 if handle is None:
1314 return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1315 options, run_metadata)
1316 else:
1317 return self._do_call(_prun_fn, self._session, handle, feeds, fetches)

/home/nicholas/.virtualenvs/tensorflow-luke/local/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_call(self, fn, *args)
1332 except KeyError:
1333 pass
-> 1334 raise type(e)(node_def, op, message)
1335
1336 def _extend_graph(self):

InternalError: Unknown error detected on device /job:localhost/replica:0/task:0/device:SYCL:0

mihailescu2m commented on September 21, 2024

Hi @lukeiwanski

I am having the same issue (Check failed: IsAligned()) with tf-coriander (https://github.com/hughperkins/tf-coriander) using a Mali T-728 GPU.

Do you have a patch I could try to fix this? Or any advice on how to go about fixing it?
Thanks!

DeadZen commented on September 21, 2024

Ping on this issue
