handsonopencl / lecture-slides
Lecture Slide Issue Tracking
License: Other
Hi there,
I've spent a couple of hours setting up OpenCL on my machine, which is an Ubuntu guest running inside VirtualBox on a Mac with an older Core 2 Duo.
The information in the slides was helpful, but I thought I'd post my findings here anyway.
On recent Ubuntu versions (I'm running 13.04), OpenCL headers can be installed with a simple
sudo apt-get install ocl-icd-opencl-dev
You still need an OpenCL driver, though, and this is where things get complicated. As I'm running in a virtual machine, I was looking for a simple CPU driver.
According to the lecture notes and Intel's website http://software.intel.com/en-us/articles/intel-sdk-for-opencl-applications-2013-release-notes#_Installation_Notes, the drivers that ship with Intel's current SDK only support relatively new CPUs with SSE 4.2, including recent Core and Xeon processors.
AMD's APP SDK, however, supports not only AMD CPUs and GPUs but also older Intel CPUs. This is what I currently use to run OpenCL on my Intel CPU.
A very good tutorial on how to set up OpenCL can be found here: http://mhr3.blogspot.co.uk/2013/06/opencl-on-ubuntu-1304.html, which also includes instructions on how to set up Intel's and AMD's SDKs.
A few sample applications to make sure you've set up OpenCL properly can be found on this website: http://wiki.tiker.net/OpenCLHowTo#Testing, under Testing.
Currently slide 66 is the start of a placeholder for the Python slides; must remember to put these in!
Saved as TODO - must not forget this!
Do we need to say anything about the ICD in the getting started section?
We've had feedback from Intel that it would be nice to include a slide on how to use OpenCL on their GPUs. This will be great to include, but the information is not public yet, at least for Linux, so we'll do this once we've got the info from Intel.
We actually could do this for Mac OS X 10.9 "Mavericks" when that's released, as Apple have already announced that this will include OpenCL 1.2 support for integrated Intel graphics (and Nvidia too).
On slide 55, "C++ interface: The vadd host program", we explain the following three lines:
d_a = Buffer(begin(h_a), end(h_a), true);
d_b = Buffer(begin(h_b), end(h_b), true);
d_c = Buffer(begin(h_c), end(h_c));
as follows:
Note: These “trues” stipulate that we want to copy an array on the host (i.e. from a host pointer) into the OpenCL buffer. Without this true parameter the buffer is not copied from the host, and is created uninitialized on the device.
True means READ ONLY from the device’s point of view.
This needs cleaning up as we update the C++ API.
These are currently missing from the slides and we at least need to point people to the right documentation if we can't provide detailed instructions as for the other accelerators.
It's not clear from the slides what version they are. Should probably add the version number to the title page.
Not just Xeon. Beignet 1.1 and above is now quite stable.
We need to replace the existing optimised matrix multiply example with the new one from SC13, which was derived from a clean sheet of paper.
The slide "C++ Interface: setting up the host program" recommends using
using namespace cl;
using namespace std;
However, these namespaces have conflicts, particularly size_t and copy. This can lead to some very strange errors (for example, changing a buffer from cl::Buffer to cl::BufferGL causes the std::copy template to become the best match, and it tries to treat cl::BufferGL as an output iterator; or declaring a size_t variable leads to errors because the compiler was expecting a template parameter).
Even if the code works, I think that it is useful to make it clear which things are coming from the cl namespace.
Slide 35: did we really get the version of cl::Buffer that lets us do:
cl::Buffer d_c(context, CL_MEM_WRITE_ONLY,
sizeof(float)*count);
If we did, do the C++ host examples in later slides consistently use this for write-only device buffers?
Say I want to add two vectors as in example two. In the example, memory space is allocated and host data is created inside the code:
float* h_a = (float*) calloc(LENGTH, sizeof(float)); // a vector
float* h_b = (float*) calloc(LENGTH, sizeof(float)); // a vector
int i = 0;
int count = LENGTH;
for(i = 0; i < count; i++){
h_a[i] = rand() / (float)RAND_MAX;
h_b[i] = rand() / (float)RAND_MAX;
}
I'm trying to pass vectors a and b from R. They are passed as arguments to the main function as vadd(double *a, double *b, int *n) (in R I have to change main to some other name, like vadd). I allocate memory and I dereference the arguments as:
int i = 0;
int count = LENGTH;
for(i = 0; i < count; i++){
h_a[i] = a[i];
h_b[i] = b[i];
}
This dereferencing procedure is taking too much time. Is there a way to have these pointers, passed as arguments to the function, set as host data directly without the loop? Something like:
h_a = *a; h_b = *b;
The command for installing dependencies (sudo apt-get install build-essential linux-header-generic opencl-headers) is missing an s in linux-headers-generic.
Ideas for now:
Anything else?
Feedback from a user in a large commercial OpenCL-supporting company:
"My 2 cents: i) add a chapter on buffer managements (map and read/write) as part of the opencl intro. This is to include a discussion about the usage and semantic differences between map/unmap and read/write and migrate and ii) discuss the ability to overlap communication with computations via out of order queues or multiple queues in the optimization chapter. This is one of the more important optimizations for OpenCL on discrete accelerators."
Saved as TODO
The slide labeled "An N-dimensional domain of work-items" indicates a 128x128 local size - but this is far bigger than most devices will actually support.
The slide shows LU Decomposition. However, the value for a32 (matrix A, 3rd row, 2nd column) should be 1 instead of 2, i.e. the whole row should be 1,1,4 but is 1,2,4. (1 is the correct result, and that's how the matrix actually looks on the previous slide.)
Need to make sure the license for use of this material is clear. We're going for creative commons, I think this one (most open):
Feedback from Neil Trevett:
"On my machine Courier Font is very block – Courier New is much better."
We should check what this looks like on a couple of different machines and see what an appropriate fix is so that the Courier font looks good on most machines.
Can we add some instructions on how to use Intel's GPUs with OpenCL? If info is available, could add around slide 5 and 10.
Probably shouldn't mention HSA in an OpenCL tutorial, might be confusing.
Separate discussions of HLM and OpenCL 2.0. Under OpenCL 2.0, make sure we mention “nested parallelism”, SVM and a detailed memory model. Sub-groups are cool too.
Saved as TODO.
Currently left to insert vadd host code on slide 62 and then explain anything on another slide.
From a call to clGetDeviceInfo, I found that the max work-group size for my Intel Core i7 processor is 1024. This should definitely be able to complete the matrix multiply with the given parameters in the solution (64 work elements per work group). However, I get the following error:
Error code was "CL_INVALID_WORK_GROUP_SIZE" (-54)
Some quick googling led me to this link which states that the OpenCL implementation on Mac OS is "a little funky" and that "it is very hard to support CPUs on OSX".
I just changed the device index in the code to select something other than the CPU (on my machine, I have the integrated Intel Iris GPU as well as a dedicated GPU) and the example worked fine. Just a heads up for people who were banging their head against the keyboard like me!
Does anyone have the solutions for exercises 10, 11, and 12?
This repo is for issue tracking the slides, but where are the slides themselves? I cannot find them anywhere, and the HandsOnOpenCL website doesn't explain where to get them.
When the slides create a kernel function object, they say
auto vadd = make_kernel<Buffer, Buffer, Buffer, int>(program, "vadd");
which is an unnecessarily roundabout way to say
make_kernel<Buffer, Buffer, Buffer, int> vadd(program, "vadd");
and also makes it appear that the bindings depend on C++11.