handsonopencl / exercises-solutions

C, C++ and Python Code for Exercises and Solutions

License: Other

C 21.95% C++ 68.17% Makefile 2.20% Python 5.48% Cuda 1.30% Shell 0.02% Objective-C 0.88%

exercises-solutions's Issues

Need to add something on which license we're using

Need to make sure the license for use of this material is clear. We're going for creative commons, I think this one (most open):

http://creativecommons.org/licenses/by/3.0/

Inconsistent use of timers

In the C++ code, some examples (matmul) use wtime() and some examples (pi_ocl) use the util::Timer.

They should probably all use the same timer.

Global make clean doesn't work as expected

Template code for Exercise04 missing

Exercise04 is completely empty. I think we need to put some template code here, possibly even the solution to Exercise 2/3.

Might need to add std:: prefix to isnan() call in matrix_lib.cpp

A potential bug report from a user:

"I had to make a change to matrix_lib.cpp.

 if(isnan(errsq) || ...

I had to add "std::"

 if(std::isnan(errsq) || …"

They were using Mac OS X 10.7 "Lion" with gcc 4.8:

"g++-mp-4.8 -std=c++11"

Exercise 6 C and Cpp solutions are incorrect

If I run make ; ./mult in Solutions/Exercise06/C or Solutions/Exercise06/Cpp I get the following output:

===== Sequential, matrix mult (dot prod), order 1024 on host CPU ======
 7.67 seconds at 279.9 MFLOPS 

===== OpenCL, matrix mult, C(i,j) per work item, order 1024 ======
 5.01 seconds at 428.9 MFLOPS 

 Errors in multiplication: 168394460495872.000000

This is the output from the C executable, although the Cpp one gives similar results.

Am I correct in thinking that the error should be somewhat smaller?
Is this a known bug?

I'm running OS X 10.9.5, Core i7, Intel HD Graphics 4000, NVIDIA GeForce GT 650M 1024 MB.
I believe the OpenCL kernel runs on the GeForce in these examples.

Error output for C doesn't use the err_code.c file

Header file dependencies are missing from some Makefiles

I just noticed that not all the dependencies on header files are correctly captured in the Makefiles. This can lead to some erroneous behaviour when recompiling. The matrix multiply example and solution is one specific set of examples that suffers from this bug.

Solution for Exercise06 assumes a GPU in C, but anything in Python

Just trying the solutions on my Apple Macbook Air. After changing the Makefiles to use -framework OpenCL and -DAPPLE, they compile OK. But the C code assumes it will find a GPU in the following code:

// Set up OpenCL context. queue, kernel, etc.
cl_uint numPlatforms;
// Find number of platforms
err = clGetPlatformIDs(0, NULL, &numPlatforms);
if (err != CL_SUCCESS || numPlatforms <= 0)
{
    printf("Error: Failed to find a platform!\n",err_code(err));
    return EXIT_FAILURE;
}
// Get all platforms
cl_platform_id Platform[numPlatforms];
err = clGetPlatformIDs(numPlatforms, Platform, NULL);
if (err != CL_SUCCESS || numPlatforms <= 0)
{
    printf("Error: Failed to get the platform!\n",err_code(err));
    return EXIT_FAILURE;
}
// Secure a device
for (int i = 0; i < numPlatforms; i++)
{
    err = clGetDeviceIDs(Platform[i], DEVICE, 1, &device_id, NULL);
    if (err == CL_SUCCESS)
        break;
}
if (device_id == NULL)
{
    printf("Error: Failed to create a device group!\n",err_code(err));
    return EXIT_FAILURE;
}

DEVICE is defined in matmul.h to be CL_DEVICE_TYPE_GPU.

This means the program exits with "Error: Failed to create a device group!".

Whereas the Python solution assumes any valid OpenCL device.

So, what do we want this to do? Make it CL_DEVICE_TYPE_DEFAULT in the C code?

C++ Versions of Exercise 06 and 07 (inc. Solutions) to add

Saved as TODO

Include final_state.dat for Game of Life examples

C++ global makefile

Saved as TODO

The err_code() function wouldn't compile

In the latest version, the err_code() function contained a bug where the variable err_in was misnamed as err_int at the end of the function.

Differences between Exercise06 solution in slides and code

In what is now slide 82, we list the solution for Exercise06, where the student should have written their own kernel for the first time by converting the sequential C code into a simple matrix multiply kernel.

The solution in the slides has a body that looks like this:

{
    int k;
    int i = get_global_id(0);
    int j = get_global_id(1);
    float tmp = 0.0f;
    for (k = 0; k < Pdim; k++) {
        tmp += A[i*Ndim+k] * B[k*Pdim+j];
    }
    C[i*Ndim+j] += tmp;
}

Whereas in the sequential C code solution provided in source form inside matrix_lib.c, its body looks like this:

for (i=0; i<Ndim; i++){
    for (j=0; j<Mdim; j++){
        tmp = 0.0;
        for(k=0;k<Pdim;k++){
             /* C(i,j) = sum(over k) A(i,k) * B(k,j) */
             tmp += *(A+(i*Ndim+k)) *  *(B+(k*Pdim+j));
         }
         *(C+(i*Ndim+j)) = tmp;
      }
}

This is a very different style of array addressing and could confuse the students. We should change the sequential C code inside matrix_lib.c in both the Exercise and the Solution so that the body looks like this:

for (i=0; i<Ndim; i++) {
    for (j=0; j<Mdim; j++) {
        tmp = 0.0f;
        for (k=0; k<Pdim; k++) {
             /* C(i,j) = sum(over k) A(i,k) * B(k,j) */
             tmp += A[i*Ndim+k] * B[k*Pdim+j];
         }
         C[i*Ndim+j] += tmp;
      }
}

Notice I've also added a few spaces inside the "for" statements, and changed the initial value of tmp from 0.0 to 0.0f (just good practice!).

Note that the actual OpenCL kernel solution for Exercise06 is already as we would want it, i.e. consistent with the above, except that its 0.0 also needs to be changed to 0.0f.
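To illustrate that the two addressing styles are interchangeable, here is a minimal sketch in plain C that runs both forms on the same small input and compares the results. It assumes square matrices (as in the order-1024 exercise), stores the result with "=" as the current matrix_lib.c does, and uses hypothetical function names (matmul_ptr, matmul_idx, max_form_difference) that are not taken from the course code.

```c
#include <assert.h>

#define ORDER 4  /* small square order, chosen only for illustration */

/* Pointer-arithmetic form, in the style of the current matrix_lib.c. */
void matmul_ptr(int n, float *A, float *B, float *C)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            float tmp = 0.0f;
            for (int k = 0; k < n; k++) {
                tmp += *(A + (i * n + k)) * *(B + (k * n + j));
            }
            *(C + (i * n + j)) = tmp;
        }
    }
}

/* Array-index form, in the style of the kernel shown on the slide. */
void matmul_idx(int n, float *A, float *B, float *C)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            float tmp = 0.0f;
            for (int k = 0; k < n; k++) {
                tmp += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = tmp;
        }
    }
}

/* Runs both forms on the same input and returns the largest
   absolute difference between the two result matrices. */
float max_form_difference(void)
{
    float A[ORDER * ORDER], B[ORDER * ORDER];
    float C1[ORDER * ORDER], C2[ORDER * ORDER];

    for (int i = 0; i < ORDER * ORDER; i++) {
        A[i] = (float)(i + 1);
        B[i] = (float)(ORDER * ORDER - i);
    }

    matmul_ptr(ORDER, A, B, C1);
    matmul_idx(ORDER, A, B, C2);

    float max_diff = 0.0f;
    for (int i = 0; i < ORDER * ORDER; i++) {
        float d = C1[i] - C2[i];
        if (d < 0.0f) d = -d;
        if (d > max_diff) max_diff = d;
    }
    return max_diff;
}

/* Both forms perform the same operations in the same order,
   so the results are expected to agree exactly. */
void check_forms_agree(void)
{
    assert(max_form_difference() == 0.0f);
}
```

Since A[x] is defined as *(A + x) in C, switching matrix_lib.c to the indexed form should change only readability, not results.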

Matrix Multiply C code for exercise 6

Saved as TODO

No need for -std=gnu++11 flag

Remove this - TODO!

It makes things awkward for the Intel and Clang compilers and is unnecessary for g++ anyway.

Matrix Multiply C code for solution 7

Saved as TODO

Exercise 06 Python solution code takes too long on the host

When running the Python solution code on Blue Crystal, the initial CPU code is so slow, it feels like it's hung. For example, on my Nehalem test machine (a GPU node in Blue Crystal phase 1), I get:

===== Sequential, matrix mult (dot prod), order 1024 on host CPU ======

1256.22704506 seconds at 1.70947095626 MFLOPS

20 minutes is a long time to wait, especially when the C version only takes about 10 seconds on the same machine:

===== Sequential, matrix mult (dot prod), order 1024 on host CPU ======
10.31 seconds at 208.2 MFLOPS

I think this is too long to wait; users will think something is wrong.

Either we need to make the Python faster on the CPU, or leave the CPU version commented out by default!

Check for OS X defined __APPLE__ preprocessor define

Nicer error output for C++

Saved as TODO.

C has the error printed out, but C++ has the number. This isn't very helpful!

No Python serial code for Exercise 8

Saved as TODO

Gameoflife example won't build on Mac OS X

If you use a recent Xcode on Mac OS X, it won't build gameoflife from Exercise13:

cc gameoflife.c -O3 -std=c99 -o gameoflife
gameoflife.c:102:5: error: second parameter of 'main' (argument array) must be of type 'char **'
int main(int argc, void **argv)
^

This is with the following version of the tools:

$ cc --version
Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin12.5.0
Thread model: posix

The fix is obvious: declare the second parameter of main as char **argv.

Platform Information in C

Saved as TODO

C VADD kernel for exercise 2

Saved as TODO

Warning with float initialization

When float variables are initialized with a constant such as 0.0, I get an OpenCL compiler warning:
"/tmp/OCL9395T11.cl", line 19: warning: double-precision constant is
represented as single-precision constant because double is not
enabled
tmp = 0.0;
^

warn(text, CompilerWarning)

Changing every 0.0 to 0.0f will remove this kind of warning.
I know this is a very minor issue; would you like me to make the change?
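For background, OpenCL C is based on C99, where a floating constant without a suffix has type double and the "f" suffix makes it a float. The following host-side sketch in plain C shows the difference through sizeof; the function names are hypothetical and only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* An unsuffixed floating constant such as 0.0 has type double. */
size_t unsuffixed_size(void)
{
    return sizeof(0.0);
}

/* With the f suffix the constant has type float. */
size_t suffixed_size(void)
{
    return sizeof(0.0f);
}

/* Confirms the literal types by comparing against the named types. */
void check_literal_types(void)
{
    assert(unsuffixed_size() == sizeof(double));
    assert(suffixed_size() == sizeof(float));
}
```

On a device without double support, the kernel compiler has to reinterpret the double constant as single precision, which is what the warning above reports. Writing tmp = 0.0f; avoids the conversion altogether.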

C helper function has incomplete list of error numbers

The C helper function we provide, int err_code (cl_int err_in) in err_code.c, has an incomplete list of error codes it will recognise. In particular, it doesn't know about CL_DEVICE_NOT_FOUND, which is quite an important one.

This has already bitten me when one solution code expected a GPU, but my MBA doesn't expose one.

It would be worth updating the list in err_code() against the latest OpenCL v1.1 header file and making it a complete set.

In fact, a simple script that would take the appropriate chunk from cl.h and turn it into err_code() would be useful as we migrate this to support v1.2 and 2.0 etc.
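One way to keep such a list easy to regenerate is an X-macro table, where each error code appears exactly once and a script only has to emit the list. The sketch below is an assumption about how this could look, not the existing err_code.c: the helper name err_name is hypothetical, only four codes are listed, and their values are defined locally (copied from cl.h) so the example builds without the OpenCL headers.

```c
#include <assert.h>
#include <string.h>

/* Values as found in cl.h; guarded so that this sketch also
   compiles when cl.h has already been included. */
#ifndef CL_SUCCESS
#define CL_SUCCESS                  0
#define CL_DEVICE_NOT_FOUND        -1
#define CL_DEVICE_NOT_AVAILABLE    -2
#define CL_COMPILER_NOT_AVAILABLE  -3
#endif

/* One entry per error code. A script could generate this list
   from the error-code section of cl.h. */
#define ERR_CODE_LIST(X)          \
    X(CL_SUCCESS)                 \
    X(CL_DEVICE_NOT_FOUND)        \
    X(CL_DEVICE_NOT_AVAILABLE)    \
    X(CL_COMPILER_NOT_AVAILABLE)

/* Hypothetical helper: returns the symbolic name of an error code,
   or a fallback string for codes that are not in the list. */
const char *err_name(int err_in)
{
    switch (err_in) {
#define X(code) case code: return #code;
        ERR_CODE_LIST(X)
#undef X
        default: return "UNKNOWN ERROR";
    }
}

/* Small self-check for the code this issue singles out. */
void check_err_names(void)
{
    assert(strcmp(err_name(CL_DEVICE_NOT_FOUND), "CL_DEVICE_NOT_FOUND") == 0);
}
```

Supporting a newer OpenCL version would then mean adding lines to ERR_CODE_LIST rather than editing the switch by hand.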

Exercise 06 matrix mul doesn't report error

The Exercise 6 code with the kernel deleted doesn't produce an error when the buffer is returned. As no kernel is running, the buffer shouldn't contain the correct result!

Python for Exercise 2 missing

There's no Python directory or solution for Exercise 2, should there be?

Python version of Exercise06 still running 20 minute host version

Needs commenting out, as we did with the solution.

Pi serial C code for Exercise 8

Saved as TODO

Python version of Exercise08 much slower than C version

On an Nvidia M2050 and a fast Nehalem host, the C code takes about 0.9s while the Python version takes about 63s. These should ideally take an almost identical amount of time.

Global make file for the C code

Saved as TODO.

C++ OpenCL version of exercise 08

Saved as TODO

Pi C code for Solution 8

Saved as TODO

Need to update top-level README with instructions for building C++

The top level README describes how to build and run the C and Python exercises and solutions, but doesn't mention the C++ ones. Need to add this.

C code for D=A+B+C for solution 5

Saved as TODO

Python solution of Exercise08: problem with the C_block_form.cl part

I had no problem launching most of the examples, but with the solution of Exercise 9 I get this error message:
===== OpenCL, A and B in block form in local memory, order 1024 ======

Traceback (most recent call last):
File "matmul.py", line 187, in
d_a, d_b, d_c, localmem1, localmem2)
File "/usr/local/lib/python2.7/dist-packages/pyopencl/__init__.py", line 466, in kernel_call
global_offset, wait_for, g_times_l=g_times_l)
pyopencl.LogicError: clEnqueueNDRangeKernel failed: invalid work group size

Change compiler for OS X

Saved as TODO.

You need to add '-stdlib=libc++' to the compiler flags when building for C++11 with clang++, so I guess this should be added to the Makefiles for OS X.
I was building with 'make CPPC=clang++', but I guess the Makefiles could also select clang++ automatically on OS X.
Clang doesn't support OpenMP, so this is a problem for Exercise06.

C chained vadd for solution 4

Saved as TODO

Can we make it easier to use Mac OSX?

For the Exercises and Solutions, it doesn't take much to get them all compiled and running on a Mac. All we have to do is modify two lines in the Makefiles from something that looks like this:

CCFLAGS=-O3 -lm -std=c99 -ffast-math

LIBS = -fopenmp -lOpenCL

To:

CCFLAGS=-O3 -lm -std=c99 -ffast-math -DAPPLE

LIBS = -fopenmp -framework OpenCL

There are two main ways we could do this:

1) Use a condition inside the Makefile itself that looks for APPLE
2) Use a make.def which we modify for each platform.

For previous versions of the course we used 2) with great effect, and I still have make.def files for Nvidia, AMD, Intel and Mac OSX.

Matrix Multiply C code for solution 6

Saved as TODO

No Python host code for Exercise 6

Saved as a TODO

C++ timer not working?

The timer in Cpp_common/util.cpp might not work on some Apple systems. Trying to use the C++ timer program on some Mac OS X laptops can give absurd times (186302452924.23423 seconds for a simple vadd, for example).

Matrix multiply host code is too complicated

Provide a matrix multiply host code to just run a single kernel, rather than multiple kernels along with the serial version.

Tests for Game of Life

Saved as a TODO.

We should include a sanity checking test suite for the game of life - provide some simple inputs and outputs and check the final states. This will be helpful when completing the exercise too.
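As a starting point, a sanity test could use a pattern whose evolution is known in advance. The sketch below is self-contained C and does not call the course's gameoflife code: it has its own one-generation step function, assumes that cells outside the grid are dead, and checks that a "blinker" flips orientation after one step and returns to its initial state after two. All names (life_step, blinker_ok, and so on) are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define NX 5
#define NY 5

/* Counts the live neighbours of cell (x, y). Cells outside the grid
   count as dead (an assumption; the course code may differ). */
static int live_neighbours(char g[NY][NX], int x, int y)
{
    int n = 0;
    for (int dy = -1; dy <= 1; dy++) {
        for (int dx = -1; dx <= 1; dx++) {
            int xx = x + dx, yy = y + dy;
            if ((dx || dy) && xx >= 0 && xx < NX && yy >= 0 && yy < NY)
                n += g[yy][xx];
        }
    }
    return n;
}

/* Advances the grid by one generation using the standard rules:
   a cell is alive next step if it has exactly 3 live neighbours,
   or if it is alive now and has exactly 2. */
void life_step(char in[NY][NX], char out[NY][NX])
{
    for (int y = 0; y < NY; y++) {
        for (int x = 0; x < NX; x++) {
            int n = live_neighbours(in, x, y);
            out[y][x] = (n == 3) || (in[y][x] && n == 2);
        }
    }
}

/* Returns 1 if a horizontal blinker becomes vertical after one step
   and is restored after two steps, 0 otherwise. */
int blinker_ok(void)
{
    char start[NY][NX] = {
        {0, 0, 0, 0, 0},
        {0, 0, 0, 0, 0},
        {0, 1, 1, 1, 0},
        {0, 0, 0, 0, 0},
        {0, 0, 0, 0, 0},
    };
    char expected_mid[NY][NX] = {
        {0, 0, 0, 0, 0},
        {0, 0, 1, 0, 0},
        {0, 0, 1, 0, 0},
        {0, 0, 1, 0, 0},
        {0, 0, 0, 0, 0},
    };
    char mid[NY][NX], end[NY][NX];

    life_step(start, mid);
    life_step(mid, end);

    return memcmp(mid, expected_mid, sizeof mid) == 0
        && memcmp(end, start, sizeof end) == 0;
}

/* Entry point a test suite could call. */
void run_life_sanity_tests(void)
{
    assert(blinker_ok());
}
```

A real suite would feed the same kind of input to the exercise's own program and compare its final_state.dat against the expected grid; still lifes such as the 2x2 block make equally simple cases.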

Python improvements

Suggestions from Andreas Kloeckner, creator of PyOpenCL:

From a brief look at the slides, the only feedback would be that

prg.kernel(...)

reexecutes clCreateKernel() on every launch, so storing a reference to
the kernel may be more efficient. In addition, the issue of having to
cast arguments to numpy types can be alleviated by

http://documen.tician.de/pyopencl/runtime.html#pyopencl.Kernel.set_scalar_arg_dtypes

I'm not suggesting that you include this information (it might well be
that you left it out on purpose), I'm just trying to make sure you're
aware of it. :)
