handsonopencl / lecture-slides
Lecture Slide Issue Tracking
License: Other
Hi there,
I've spent a couple of hours setting up OpenCL on my machine, which is an Ubuntu guest running inside VirtualBox on a Mac with an older Core 2 Duo.
The information in the slides was helpful, but I thought I'd post my findings here anyway.
On recent Ubuntu versions (I'm running 13.04), OpenCL headers can be installed with a simple
sudo apt-get install ocl-icd-opencl-dev
You still need an OpenCL driver, though, and this is where things get complicated. As I'm running in a virtual machine, I was looking for a simple CPU driver.
According to the lecture notes and Intel's website http://software.intel.com/en-us/articles/intel-sdk-for-opencl-applications-2013-release-notes#_Installation_Notes, the drivers that ship with Intel's current SDK only support relatively new CPUs with SSE 4.2, including recent Core and Xeon processors.
AMD's APP SDK, however, supports not only AMD CPUs and GPUs but also older Intel CPUs. This is what I currently use to run OpenCL on my Intel CPU.
A very good tutorial on how to set up OpenCL can be found here: http://mhr3.blogspot.co.uk/2013/06/opencl-on-ubuntu-1304.html, which also includes instructions on how to set up Intel's and AMD's SDKs.
A few sample applications to make sure you've set up OpenCL properly can be found on this website: http://wiki.tiker.net/OpenCLHowTo#Testing, under Testing.
Currently slide 66 is the start of a placeholder for the Python slides; must remember to put these in!
Saved as TODO - must not forget this!
Do we need to say anything about the ICD in the getting started section?
We've had feedback from Intel that it would be nice to include a slide on how to use OpenCL on their GPUs. This will be great to include, but the information is not public yet, at least for Linux, so we'll do this once we've got the info from Intel.
We actually could do this for Mac OS X 10.9 "Mavericks" when that's released, as Apple have already announced that this will include OpenCL 1.2 support for integrated Intel graphics (and Nvidia too).
On slide 55, "C++ interface: The vadd host program", we explain the following three lines:
d_a = Buffer(begin(h_a), end(h_a), true);
d_b = Buffer(begin(h_b), end(h_b), true);
d_c = Buffer(begin(h_c), end(h_c));
as follows:
Note: These “trues” stipulate that we want to copy an array on the host (i.e. from a host pointer) into the OpenCL buffer. Without this true parameter the buffer is not copied from the host, and is created uninitialized on the device.
True means READ ONLY from the device’s point of view.
This needs cleaning up as we update the C++ API.
These are currently missing from the slides and we at least need to point people to the right documentation if we can't provide detailed instructions as for the other accelerators.
It's not clear from the slides what version they are. Should probably add the version number to the title page.
Not just Xeon. Beignet 1.1 and above is now quite stable.
We need to replace the existing optimised matrix multiply example with the new one from SC13, which was derived from a clean sheet of paper.
The slide "C++ Interface: setting up the host program" recommends using
using namespace cl;
using namespace std;
However, these namespaces have conflicts, particularly size_t and copy. This can lead to some very strange errors (for example, changing a buffer from cl::Buffer to cl::BufferGL causes the std::copy template to become the best match, and it tries to treat cl::BufferGL as an output iterator; or declaring a size_t variable leads to errors because the compiler was expecting a template parameter).
Even if the code works, I think that it is useful to make it clear which things are coming from the cl namespace.
Slide 35: did we really get the version of cl::Buffer that lets us do:
cl::Buffer d_c(context, CL_MEM_WRITE_ONLY,
sizeof(float)*count);
If we did, do the C++ host examples in later slides consistently use this for write-only device buffers?
Say I want to add two vectors as in example two. In the example, memory space is allocated and host data is created inside the code:
float* h_a = (float*) calloc(LENGTH, sizeof(float)); // a vector
float* h_b = (float*) calloc(LENGTH, sizeof(float)); // a vector
int i = 0;
int count = LENGTH;
for(i = 0; i < count; i++){
h_a[i] = rand() / (float)RAND_MAX;
h_b[i] = rand() / (float)RAND_MAX;
}
I'm trying to pass vectors a and b from R. They are passed as arguments to the main function as vadd(double *a, double *b, int *n) (in R I have to change main to some other name, like vadd). I allocate memory and I dereference the arguments as:
int i = 0;
int count = LENGTH;
for(i = 0; i < count; i++){
h_a[i] = a[i];
h_b[i] = b[i];
}
This dereferencing procedure is taking too much time. Is there a way to have these pointers, passed as arguments to the function, set as host data directly without the loop? Something like:
h_a = *a; h_b = *b;
The command for installing dependencies (sudo apt-get install build-essential linux-header-generic opencl-headers) is missing an s in linux-headers-generic.
Ideas for now:
Anything else?
Feedback from a user in a large commercial OpenCL-supporting company:
"My 2 cents: i) add a chapter on buffer managements (map and read/write) as part of the opencl intro. This is to include a discussion about the usage and semantic differences between map/unmap and read/write and migrate and ii) discuss the ability to overlap communication with computations via out of order queues or multiple queues in the optimization chapter. This is one of the more important optimizations for OpenCL on discrete accelerators."
Saved as TODO
The slide labeled "An N-dimensional domain of work-items" indicates a 128x128 local size - but this is far bigger than most devices will actually support.
The slide shows LU Decomposition. However, the value for a32 (matrix A, 3rd row, 2nd column) should be 1 instead of 2, i.e. the whole row should be 1,1,4 but is 1,2,4. (1 is the correct result, and that's how the matrix actually looks on the previous slide.)
Need to make sure the license for use of this material is clear. We're going for creative commons, I think this one (most open):
Feedback from Neil Trevett:
"On my machine Courier Font is very block – Courier New is much better."
We should check what this looks like on a couple of different machines and see what an appropriate fix is so that the Courier font looks good on most machines.
Can we add some instructions on how to use Intel's GPUs with OpenCL? If info is available, could add around slide 5 and 10.
Probably shouldn't mention HSA in an OpenCL tutorial, might be confusing.
Separate discussions of HLM and OpenCL 2.0. Under OpenCL 2.0, make sure we mention “nested parallelism”, SVM and a detailed memory model. Sub-groups are cool too.
Saved as TODO.
Currently left to insert vadd host code on slide 62 and then explain anything on another slide.
From a call to clGetDeviceInfo, I found that the max work-group size for my Intel Core i7 processor is 1024. This should definitely be able to complete the matrix multiply with the given parameters in the solution (64 work elements per work group). However, I get the following error:
Error code was "CL_INVALID_WORK_GROUP_SIZE" (-54)
Some quick googling led me to this link which states that the OpenCL implementation on Mac OS is "a little funky" and that "it is very hard to support CPUs on OSX".
I just changed the device index in the code to select something other than the CPU (on my machine, I have the integrated Intel Iris GPU as well as a dedicated GPU) and the example worked fine. Just a heads up for people who were banging their head against the keyboard like me!
Does anyone have the solutions for exercises 10, 11, and 12?
This repo is for issue tracking the slides, but where are the slides themselves? I cannot find them anywhere, and the HandsOnOpenCL website doesn't explain where to get them.
When the slides create a kernel function object, they say
auto vadd = make_kernel<Buffer, Buffer, Buffer, int>(program, "vadd");
which is an unnecessarily roundabout way to say
make_kernel<Buffer, Buffer, Buffer, int> vadd(program, "vadd");
and also makes it appear that the bindings depend on C++11.