handsonopencl / exercises-solutions
C, C++ and Python Code for Exercises and Solutions
License: Other
We need to make sure the license for use of this material is clear. We're going for Creative Commons; I think this one (most open):
In the C++ code, some examples (matmul) use wtime() and some examples (pi_ocl) use the util::Timer. They should probably all be consistent with each other.
Exercise04 is completely empty. I think we need to put some template code here, possibly even the solution to Exercise 2/3.
A potential bug report from a user:
"I had to make a change to matrix_lib.cpp.
if(isnan(errsq) || ...
I had to add "std::"
if(std::isnan(errsq) || …"
They were using Mac OS X 10.7 "Lion" with gcc 4.8:
"g++-mp-4.8 -std=c++11"
If I run make ; ./mult in Solutions/Exercise06/C or Solutions/Exercise06/Cpp I get the following output:
===== Sequential, matrix mult (dot prod), order 1024 on host CPU ======
7.67 seconds at 279.9 MFLOPS
===== OpenCL, matrix mult, C(i,j) per work item, order 1024 ======
5.01 seconds at 428.9 MFLOPS
Errors in multiplication: 168394460495872.000000
This is the output from the C executable, although the Cpp one gives similar results.
Am I correct in thinking that the error should be somewhat smaller?
Is this a known bug?
I'm running OS X 10.9.5, Core i7, Intel HD Graphics 4000, NVIDIA GeForce GT 650M 1024 MB.
I believe the OpenCL kernel runs on the GeForce in these examples.
I just noticed that not all the dependencies on header files are correctly captured in the Makefiles. This can lead to some erroneous behaviour when recompiling. The matrix multiply example and solution is one specific set of examples that suffers from this bug.
Just trying the solutions on my Apple Macbook Air. After changing the Makefiles to use -framework OpenCL and -DAPPLE, they compile OK. But the C code assumes it will find a GPU in the following code:
// Set up OpenCL context. queue, kernel, etc.
cl_uint numPlatforms;
// Find number of platforms
err = clGetPlatformIDs(0, NULL, &numPlatforms);
if (err != CL_SUCCESS || numPlatforms <= 0)
{
printf("Error: Failed to find a platform!\n",err_code(err));
return EXIT_FAILURE;
}
// Get all platforms
cl_platform_id Platform[numPlatforms];
err = clGetPlatformIDs(numPlatforms, Platform, NULL);
if (err != CL_SUCCESS || numPlatforms <= 0)
{
printf("Error: Failed to get the platform!\n",err_code(err));
return EXIT_FAILURE;
}
// Secure a device
for (int i = 0; i < numPlatforms; i++)
{
err = clGetDeviceIDs(Platform[i], DEVICE, 1, &device_id, NULL);
if (err == CL_SUCCESS)
break;
}
if (device_id == NULL)
{
printf("Error: Failed to create a device group!\n",err_code(err));
return EXIT_FAILURE;
}
DEVICE is defined in matmul.h to be CL_DEVICE_TYPE_GPU.
This means that on a machine with no GPU exposed, the program exits with "Error: Failed to create a device group!", whereas the Python solution accepts any valid OpenCL device.
So, what do we want this to do? Make it CL_DEVICE_TYPE_DEFAULT in the C code?
Saved as TODO
In the latest version, the err_code() function contained a bug where the variable err_in was misnamed as err_int at the end of the function.
In what is now slide 82, we list the solution for Exercise06, where the student should have written their own kernel for the first time by converting the sequential C code into a simple matrix multiply kernel.
The solution in the slides has a body that looks like this:
{
int k;
int i = get_global_id(0);
int j = get_global_id(1);
float tmp = 0.0f;
for (k = 0; k < Pdim; k++) {
tmp += A[i*Ndim+k] * B[k*Pdim+j];
}
C[i*Ndim+j] += tmp;
}
Whereas in the sequential C code solution provided in source form inside matrix_lib.c, its body looks like this:
for (i=0; i<Ndim; i++){
for (j=0; j<Mdim; j++){
tmp = 0.0;
for(k=0;k<Pdim;k++){
/* C(i,j) = sum(over k) A(i,k) * B(k,j) */
tmp += *(A+(i*Ndim+k)) * *(B+(k*Pdim+j));
}
*(C+(i*Ndim+j)) = tmp;
}
}
This is a very different style of array addressing and could confuse the students. We should change the sequential C code inside matrix_lib.c in both the Exercise and the Solution so that the body looks like this:
for (i=0; i<Ndim; i++) {
for (j=0; j<Mdim; j++) {
tmp = 0.0f;
for (k=0; k<Pdim; k++) {
/* C(i,j) = sum(over k) A(i,k) * B(k,j) */
tmp += A[i*Ndim+k] * B[k*Pdim+j];
}
C[i*Ndim+j] += tmp;
}
}
Notice I've also added a few spaces inside the "for" statements, and also changed the definition of tmp to 0.0f from 0.0 (just good practise!).
Note that the actual OpenCL kernel solution for Exercise06 is already as we would want it, i.e. consistent with the above, except that its 0.0 also needs to be changed to 0.0f.
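For reference, here is a minimal, self-contained sketch of the array-indexing style proposed above. It is not the code from matrix_lib.c: the function name seq_mat_mul_square is hypothetical, and it assumes square, row-major matrices of a single order N so that the indexing is unambiguous. It also assigns the result with = (as the original sequential code does) rather than +=, so C does not have to be zeroed first.

```c
#include <stddef.h>

/* Sequential matrix multiply for square, row-major matrices of order N.
 * Uses plain array indexing (A[i*N + k]) instead of pointer arithmetic.
 * seq_mat_mul_square is a hypothetical name used only for this sketch. */
void seq_mat_mul_square(size_t N, const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++) {
            float tmp = 0.0f;  /* 0.0f keeps the literal single precision */
            for (size_t k = 0; k < N; k++) {
                /* C(i,j) = sum(over k) A(i,k) * B(k,j) */
                tmp += A[i * N + k] * B[k * N + j];
            }
            C[i * N + j] = tmp;  /* assignment, so C need not be zeroed */
        }
    }
}
```

Calling seq_mat_mul_square(2, A, B, C) with A = {1, 2, 3, 4} and B = {5, 6, 7, 8} should leave {19, 22, 43, 50} in C.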
Saved as TODO
Remove this - TODO!
It makes things nasty for the Intel and clang compilers, and is unnecessary for g++ anyway.
Saved as TODO
When running the Python solution code on Blue Crystal, the initial CPU code is so slow, it feels like it's hung. For example, on my Nehalem test machine (a GPU node in Blue Crystal phase 1), I get:
===== Sequential, matrix mult (dot prod), order 1024 on host CPU ======
1256.22704506 seconds at 1.70947095626 MFLOPS
20 minutes is a long time to wait, especially when the C version only takes about 10 seconds on the same machine:
===== Sequential, matrix mult (dot prod), order 1024 on host CPU ======
10.31 seconds at 208.2 MFLOPS
I think this is too long to wait; users will think something is wrong.
Either we need to make the Python faster on the CPU, or leave the CPU version commented out by default!
Saved as TODO.
The C code prints the error name, but the C++ code prints only the error number. This isn't very helpful!
Saved as TODO
If you use a recent Xcode on Mac OS X, it won't build gameoflife from Exercise13:
cc gameoflife.c -O3 -std=c99 -o gameoflife
gameoflife.c:102:5: error: second parameter of 'main' (argument array) must be of type 'char **'
int main(int argc, void **argv)
^
This is with the following version of the tools:
$ cc --version
Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin12.5.0
Thread model: posix
The fix is obvious.
Saved as TODO
When float variables are initialized with a constant, I get an OpenCL warning:
"/tmp/OCL9395T11.cl", line 19: warning: double-precision constant is
represented as single-precision constant because double is not
enabled
tmp = 0.0;
^
warn(text, CompilerWarning)
Changing every 0.0 to 0.0f will remove this kind of warning.
I know this is a very minor issue; would you like me to make the change?
The C helper function we provide, int err_code (cl_int err_in) in err_code.c, has an incomplete list of error codes it will recognise. In particular, it doesn't know about CL_DEVICE_NOT_FOUND, which is quite an important one.
This has already bitten me when one solution code expected a GPU, but my MBA doesn't expose one.
It would be worth updating the list in err_code() against the latest OpenCL v1.1 header file and making it a complete set.
In fact, a simple script that would take the appropriate chunk from cl.h and turn it into err_code() would be useful as we migrate this to support v1.2 and 2.0 etc.
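As a rough illustration of what a generated table could look like, here is a table-driven sketch. It is not the existing err_code(): the names cl_error_entry, find_cl_error and err_name are hypothetical, and only a small subset of codes is listed. The numeric values are the ones from the Khronos cl.h header, repeated here behind a guard so the sketch compiles without the OpenCL headers.

```c
#include <stddef.h>

/* Values as defined in the Khronos cl.h header. The guard lets this sketch
 * compile on its own; if <CL/cl.h> is included first, this part is skipped. */
#ifndef CL_SUCCESS
#define CL_SUCCESS                         0
#define CL_DEVICE_NOT_FOUND               -1
#define CL_DEVICE_NOT_AVAILABLE           -2
#define CL_COMPILER_NOT_AVAILABLE         -3
#define CL_MEM_OBJECT_ALLOCATION_FAILURE  -4
#define CL_OUT_OF_RESOURCES               -5
#define CL_OUT_OF_HOST_MEMORY             -6
#define CL_INVALID_VALUE                  -30
#define CL_INVALID_DEVICE_TYPE            -31
#define CL_INVALID_WORK_GROUP_SIZE        -54
#endif

/* One row per error code: numeric value plus its symbolic name. */
struct cl_error_entry { int code; const char *name; };

/* #sym stringifies the unexpanded macro name; sym expands to its value. */
#define CL_ERROR_ENTRY(sym) { sym, #sym }

/* Deliberately incomplete subset; a script could emit one row per #define. */
static const struct cl_error_entry cl_error_table[] = {
    CL_ERROR_ENTRY(CL_SUCCESS),
    CL_ERROR_ENTRY(CL_DEVICE_NOT_FOUND),
    CL_ERROR_ENTRY(CL_DEVICE_NOT_AVAILABLE),
    CL_ERROR_ENTRY(CL_COMPILER_NOT_AVAILABLE),
    CL_ERROR_ENTRY(CL_MEM_OBJECT_ALLOCATION_FAILURE),
    CL_ERROR_ENTRY(CL_OUT_OF_RESOURCES),
    CL_ERROR_ENTRY(CL_OUT_OF_HOST_MEMORY),
    CL_ERROR_ENTRY(CL_INVALID_VALUE),
    CL_ERROR_ENTRY(CL_INVALID_DEVICE_TYPE),
    CL_ERROR_ENTRY(CL_INVALID_WORK_GROUP_SIZE),
};

/* Linear search of the table; returns NULL for codes it does not know. */
const struct cl_error_entry *find_cl_error(int err)
{
    size_t n = sizeof cl_error_table / sizeof cl_error_table[0];
    for (size_t i = 0; i < n; i++) {
        if (cl_error_table[i].code == err)
            return &cl_error_table[i];
    }
    return NULL;
}

/* Convenience wrapper that always returns a printable string. */
const char *err_name(int err)
{
    const struct cl_error_entry *e = find_cl_error(err);
    return e ? e->name : "UNKNOWN_CL_ERROR";
}
```

Keeping the codes in a single table means supporting a newer header only requires adding rows, and an unknown code still produces a visible message instead of being silently ignored.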
The Exercise 6 code with the kernel deleted doesn't produce an error when the buffer is returned. As no kernel is running, the buffer shouldn't contain the correct result!
There's no Python directory or solution for Exercise 2, should there be?
Needs commenting out, as we did with the solution.
Saved as TODO
On an Nvidia M2050 and a fast Nehalem host, the C code takes about 0.9s while the Python version takes about 63s. These should ideally take an almost identical amount of time.
Saved as TODO.
The top level README describes how to build and run the C and Python exercises and solutions, but doesn't mention the C++ ones. Need to add this.
Saved as TODO
I had no problem launching most of the examples, but with the solution to Exercise 9 I get this error message:
===== OpenCL, A and B in block form in local memory, order 1024 ======
Traceback (most recent call last):
File "matmul.py", line 187, in
d_a, d_b, d_c, localmem1, localmem2)
File "/usr/local/lib/python2.7/dist-packages/pyopencl/__init__.py", line 466, in kernel_call
global_offset, wait_for, g_times_l=g_times_l)
pyopencl.LogicError: clEnqueueNDRangeKernel failed: invalid work group size
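The report doesn't say which sizes were used, so the cause here is unknown. Two common reasons for an invalid work group size are a global size that is not a multiple of the local size (required in OpenCL 1.x) and a local size that exceeds what the device or kernel allows. The sketch below is a host-side check for those two conditions only; the function names are hypothetical, and max_wg is assumed to have been queried separately (for example CL_DEVICE_MAX_WORK_GROUP_SIZE via clGetDeviceInfo). Per-dimension limits are not checked.

```c
#include <stddef.h>

/* Round global_size up to the next multiple of local_size.
 * local_size must be non-zero. */
size_t round_up_global(size_t global_size, size_t local_size)
{
    size_t rem = global_size % local_size;
    return rem == 0 ? global_size : global_size + (local_size - rem);
}

/* Returns 1 if a 2D launch passes two basic checks, 0 otherwise:
 *  - the work-group (local[0] * local[1]) fits within max_wg
 *  - each global dimension is a multiple of the local dimension
 * max_wg is an assumed, separately queried device or kernel limit. */
int work_sizes_ok(const size_t global[2], const size_t local[2], size_t max_wg)
{
    if (local[0] == 0 || local[1] == 0)
        return 0;
    if (local[0] * local[1] > max_wg)
        return 0;
    return global[0] % local[0] == 0 && global[1] % local[1] == 0;
}
```

If the global size is padded with round_up_global, the kernel needs a bounds check so the extra work-items do not read or write outside the matrices.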
Saved as TODO.
For the Exercises and Solutions, it doesn't take much to get them all compiled and running on a Mac. All we have to do is modify two lines in the Makefiles from something that looks like this:
CCFLAGS=-O3 -lm -std=c99 -ffast-math
LIBS = -fopenmp -lOpenCL
To:
CCFLAGS=-O3 -lm -std=c99 -ffast-math -DAPPLE
LIBS = -fopenmp -framework OpenCL
There are two main ways we could do this:
1) Use a condition inside the Makefile itself that looks for APPLE
2) Use a make.def which we modify for each platform.
For previous versions of the course we used 2) with great effect, and I still have make.def files for Nvidia, AMD, Intel and Mac OSX.
Saved as TODO
The timer in Cpp_common/util.cpp might not work on some Apple systems. Trying to use the C++ timer program on some Mac OS X laptops can give absurd times (186302452924.23423 seconds for a simple vadd, for example).
Provide a matrix multiply host code to just run a single kernel, rather than multiple kernels along with the serial version.
Saved as a TODO.
We should include a sanity checking test suite for the game of life - provide some simple inputs and outputs and check the final states. This will be helpful when completing the exercise too.
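One possible shape for such a check, as a sketch only: a tiny reference step function plus a test on a known pattern. Nothing here comes from gameoflife.c. LIFE_N, life_step and life_blinker_ok are hypothetical names, and cells outside the board are assumed dead, which may differ from the exercise's boundary handling. A blinker in the middle of a 5x5 board is used because it behaves the same with dead or wrap-around edges.

```c
#include <string.h>

enum { LIFE_N = 5 };  /* hypothetical small board size for the sanity test */

/* Count live neighbours of (r, c); cells outside the board count as dead. */
static int live_neighbours(const unsigned char *g, int r, int c)
{
    int count = 0;
    for (int dr = -1; dr <= 1; dr++) {
        for (int dc = -1; dc <= 1; dc++) {
            int rr = r + dr, cc = c + dc;
            if ((dr != 0 || dc != 0) &&
                rr >= 0 && rr < LIFE_N && cc >= 0 && cc < LIFE_N)
                count += g[rr * LIFE_N + cc];
        }
    }
    return count;
}

/* One generation of Conway's rules, reading `in` and writing `out`.
 * Both are row-major LIFE_N x LIFE_N boards of 0/1 cells. */
void life_step(const unsigned char *in, unsigned char *out)
{
    for (int r = 0; r < LIFE_N; r++) {
        for (int c = 0; c < LIFE_N; c++) {
            int n = live_neighbours(in, r, c);
            int alive = in[r * LIFE_N + c];
            out[r * LIFE_N + c] =
                (unsigned char)(alive ? (n == 2 || n == 3) : (n == 3));
        }
    }
}

/* Returns 1 if a horizontal blinker becomes vertical after one step
 * and returns to horizontal after two steps, 0 otherwise. */
int life_blinker_ok(void)
{
    unsigned char horiz[LIFE_N * LIFE_N] = {0};
    unsigned char vert[LIFE_N * LIFE_N] = {0};
    unsigned char a[LIFE_N * LIFE_N], b[LIFE_N * LIFE_N];

    horiz[2 * LIFE_N + 1] = horiz[2 * LIFE_N + 2] = horiz[2 * LIFE_N + 3] = 1;
    vert[1 * LIFE_N + 2] = vert[2 * LIFE_N + 2] = vert[3 * LIFE_N + 2] = 1;

    life_step(horiz, a);
    life_step(a, b);
    return memcmp(a, vert, sizeof a) == 0 && memcmp(b, horiz, sizeof b) == 0;
}
```

The same pattern extends to other known inputs, such as a 2x2 block that should be unchanged after a step, and the expected boards can double as reference outputs for comparing against an OpenCL version.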
Suggestions from Andreas Kloeckner, creator of PyOpenCL:
From a brief look at the slides, the only feedback would be that
prg.kernel(...)
reexecutes clCreateKernel() on every launch, so storing a reference to
the kernel may be more efficient. In addition, the issue of having to
cast arguments to numpy types can be alleviated by
http://documen.tician.de/pyopencl/runtime.html#pyopencl.Kernel.set_scalar_arg_dtypes
I'm not suggesting that you include this information (it might well be
that you left it out on purpose), I'm just trying to make sure you're
aware of it. :)
The pi_vocl program doesn't work. The kernel enqueue throws an exception, especially when trying to run on a CPU.
Update the README
Apple has moved to using the LLVM framework by default for Xcode 5, from gcc in previous versions of Xcode. The new Xcode doesn't have support for OpenMP by default, and so the -fopenmp flag in many of the Makefiles causes an error on Mac OS X.
This flag isn't strictly needed, so the fix is simply to remove it, especially on Mac OS X platforms.
On Mac OS X we want to make sure we always use CLANG so that we get the C++11 support we need for the C++ wrapper, cl.hpp. But the top-level Makefiles set CC to gcc even on Mac OS X. Need to fix this.