lecture-slides's People

Contributors

simonmcs, tomdeakin


lecture-slides's Issues

Pointers from function argument as host data (memory allocation)

Say I want to add two vectors as in example two. In the example, memory space is allocated and host data is created inside the code:

    float* h_a = (float*) calloc(LENGTH, sizeof(float)); // a vector
    float* h_b = (float*) calloc(LENGTH, sizeof(float)); // b vector
    int i = 0;
    int count = LENGTH;
    for (i = 0; i < count; i++) {
        h_a[i] = rand() / (float)RAND_MAX;
        h_b[i] = rand() / (float)RAND_MAX;
    }

I'm trying to pass vectors a and b from R. They are passed as arguments to the main function as vadd(double *a, double *b, int *n) (in R I have to rename main to something else, like vadd). I allocate memory and copy the arguments as:

    int i = 0;
    int count = LENGTH;
    for (i = 0; i < count; i++) {
        h_a[i] = a[i];
        h_b[i] = b[i];
    }

This copying loop is taking too much time. Is there a way to set the pointers passed as arguments to the function directly as host data, without the loop? Something like:

h_a = *a; h_b = *b;

Instructions for Xeon Phi

These are currently missing from the slides and we at least need to point people to the right documentation if we can't provide detailed instructions as for the other accelerators.

Some notes on setting up OpenCL

Hi there,

I've spent a couple of hours setting up OpenCL on my machine, an Ubuntu guest running inside VirtualBox on a Mac with an older Core 2 Duo.
The information in the slides was helpful, but I thought I'd post my findings here anyway.

On recent Ubuntu versions (I'm running 13.04), OpenCL headers can be installed with a simple

sudo apt-get install ocl-icd-opencl-dev

You still need an OpenCL driver, though, and this is where things get complicated. Since I'm running in a virtual machine, I was looking for a simple CPU driver.

According to the lecture notes and Intel's website http://software.intel.com/en-us/articles/intel-sdk-for-opencl-applications-2013-release-notes#_Installation_Notes, the drivers that ship with Intel's current SDK only support relatively new CPUs, namely Xeon and Core processors supporting SSE 4.2.

AMD's APP SDK, however, not only supports AMD CPUs and GPUs; it can also be used with older Intel CPUs. This is what I currently use to run OpenCL on my Intel CPU.

A very good tutorial on how to set up OpenCL can be found at http://mhr3.blogspot.co.uk/2013/06/opencl-on-ubuntu-1304.html; it also includes instructions on setting up Intel's and AMD's SDKs.

A few sample applications to make sure you've set up OpenCL properly can be found under Testing on this page: http://wiki.tiker.net/OpenCLHowTo#Testing.
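A quick way to sanity-check the driver installation from the command line (a sketch assuming the standard ocl-icd layout on Ubuntu, where each installed driver registers an .icd file):

```shell
# Each installed OpenCL driver (ICD) registers a file in this directory
# (the path used by the ocl-icd loader); an empty or missing listing
# means no driver is installed yet, even if the headers are present.
ls /etc/OpenCL/vendors/ 2>/dev/null || echo "no ICDs registered"
```

If this shows no entries, installing ocl-icd-opencl-dev alone was not enough and you still need a vendor driver such as AMD's APP SDK.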

Add material on buffer management

Feedback from a user in a large commercial OpenCL-supporting company:

"My 2 cents: i) add a chapter on buffer managements (map and read/write) as part of the opencl intro. This is to include a discussion about the usage and semantic differences between map/unmap and read/write and migrate and ii) discuss the ability to overlap communication with computations via out of order queues or multiple queues in the optimization chapter. This is one of the more important optimizations for OpenCL on discrete accelerators."

Inconsistent Buffer constructor use for result

Slide 35: did we really get the version of cl::Buffer that lets us do:

cl::Buffer d_c(context, CL_MEM_WRITE_ONLY, sizeof(float)*count);

If we did, do the C++ host examples in later slides consistently use this for write-only device buffers?

Importing both std and cl namespace is a bad idea

The slide "C++ Interface: setting up the host program" recommends using

using namespace cl;
using namespace std;

However, these namespaces have conflicts, particularly size_t and copy. This can lead to some very strange errors (for example, changing a buffer from cl::Buffer to cl::BufferGL causes the std::copy template to become the best match and it tries to treat cl::BufferGL as an output iterator; or declaring a size_t variable leads to errors because it was expecting a template parameter).

Even if the code works, I think that it is useful to make it clear which things are coming from the cl namespace.

Add a slide on using OpenCL on Intel GPUs

We've had feedback from Intel that it would be nice to include a slide on how to use OpenCL on their GPUs. This will be great to include, but the information is not public yet, at least for Linux, so we'll do this once we've got the info from Intel.

We actually could do this for Mac OS X 10.9 "Mavericks" when that's released, as Apple have already announced that this will include OpenCL 1.2 support for integrated Intel graphics (and Nvidia too).

128x128 work-group size is unreasonable

The slide labeled "An N-dimensional domain of work-items" indicates a 128x128 local size - but this is far bigger than most devices will actually support.

More useful feedback

  1. Can we add some instructions on how to use Intel's GPUs with OpenCL? If the info is available, it could be added around slides 5 and 10.

  2. Probably shouldn't mention HSA in an OpenCL tutorial, might be confusing.

  3. Separate discussions of HLM and OpenCL 2.0. Under OpenCL 2.0, make sure we mention “nested parallelism”, SVM and a detailed memory model. Sub-groups are cool too.

Might need to use a different courier font

Feedback from Neil Trevett:

"On my machine Courier Font is very blocky – Courier New is much better."

We should check what this looks like on a couple of different machines and see what an appropriate fix is so that the Courier font looks good on most machines.

Slide 55: "C++ interface: The vadd host program", Buffer default explanation correct?

On slide 55, "C++ interface: The vadd host program", we explain the following three lines:

d_a = Buffer(begin(h_a), end(h_a), true);
d_b = Buffer(begin(h_b), end(h_b), true);
d_c = Buffer(begin(h_c), end(h_c));

as follows:

Note: These "true" arguments stipulate that we want to copy an array on the host (i.e. from a host pointer) into the OpenCL buffer. Without the true parameter the buffer is not copied from the host and is created uninitialized on the device.
True means READ ONLY from the device's point of view.

This needs cleaning up as we update the C++ API.

Typo on Slide 87 (Solving Ax = b)

The slide shows an LU decomposition. However, the value of a32 (matrix A, 3rd row, 2nd column) should be 1 instead of 2, i.e. the whole row should be 1, 1, 4 but reads 1, 2, 4. (1 is the correct value, and that is how the matrix actually appears on the previous slide.)

Gratuitous use of 'auto' keyword

When the slides create a kernel function object, they say

auto vadd = make_kernel<Buffer, Buffer, Buffer, int>(program, "vadd");

which is an unnecessarily roundabout way to say

make_kernel<Buffer, Buffer, Buffer, int> vadd(program, "vadd");

and also makes it appear that the bindings depend on C++11.

Where are the actual slides?

This repo is for issue tracking of the slides, but where are the slides themselves? I cannot find them anywhere, and the HandsOnOpenCL website doesn't explain where to get them.

RE: Exercise07, Mac OS X and specifying work-group size

From a call to clGetDeviceInfo, I found that the max work-group size for my Intel Core i7 processor is 1024. This should definitely be enough to run the matrix multiply with the parameters given in the solution (64 work-items per work-group). However, I get the following error:

Error code was "CL_INVALID_WORK_GROUP_SIZE" (-54)

Some quick googling led me to this link which states that the OpenCL implementation on Mac OS is "a little funky" and that "it is very hard to support CPUs on OSX".

I just changed the device index in the code to select something other than the CPU (on my machine, I have the integrated Intel Iris GPU as well as a dedicated GPU) and the example worked fine. Just a heads up for people who were banging their head against the keyboard like me!

Add version number to slides

It's not clear from the slides what version they are. Should probably add the version number to the title page.
