Comments (16)
Can I just confirm that your request was for Kernel.execute(16), so that the
globalSize we pass through JNI matches your request?
It looks like the calculation we do for localSize is failing if localSize is
small (possibly less than 64).
As a workaround, specify 64 as your globalSize (kernel.execute(64)) and guard
your kernel using:
new Kernel(){
    public void run(){
        if (getGlobalId() < 16){
            // your code here
        }
    }
};
Apologies for this. Clearly we need some test cases for low range values.
Note that unless your kernel is doing a lot of work (computation + loops) it
is unlikely that a kernel with such a small 'range' will be very performant.
Original comment by [email protected]
on 15 Nov 2011 at 4:20
- Changed state: Accepted
Nope, my call was Kernel.execute(32, 1).
It looks like the value passed through JNI is correct, but it is changed in
line 1073.
I have tried with sizes 32,64,128,256,512 and all have the same problem, they
show "global=x/2 local=x" in the error message.
Original comment by [email protected]
on 15 Nov 2011 at 4:23
Oh bugger. How did that ever work ;) Let me take a closer look.
Original comment by [email protected]
on 15 Nov 2011 at 4:29
So this is a remnant of my attempt to push compute across multiple devices. I
thought I had backed this code out before open sourcing.
My intent was to dispatch half the compute to one device and half to another
(your 6990 is seen as two separate GPU devices - you probably knew that
already), but this required the Kernel to be very, very careful and allow the
buffers to be divided equally.
I can fix this (i.e. make it work), but I suspect that you will be disappointed,
because the fix will mean that only one half of your GPU will be used (and of
any other dual device - I have a 5990 here which I can test with, and which will
exhibit the same error).
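To illustrate the failure mode, here is a minimal sketch with assumed names and
values - not the actual aparapi.cpp code. Splitting the requested range across
two devices halves the global size, and once it drops below the group size the
dispatch gets rejected:

#include <cstddef>
#include <cstdio>

// Minimal sketch of the split described above -- illustrative names and
// values only, not the actual aparapi.cpp source.
int main() {
    size_t requestedGlobal = 32;  // e.g. Kernel.execute(32, 1)
    size_t localSize       = 32;  // group size derived from the requested range
    int    deviceIdc       = 2;   // a 6990 is reported as two GPU devices

    size_t perDeviceGlobal = requestedGlobal / deviceIdc; // 16

    // OpenCL requires the local size to divide (and so not exceed) the
    // global size, so dispatching global=16 with local=32 per device
    // fails (typically CL_INVALID_WORK_GROUP_SIZE) -- matching the
    // "global=x/2 local=x" error reported above.
    printf("global=%zu local=%zu\n", perDeviceGlobal, localSize);
    return 0;
}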
Clearly I have not tested enough with this card.
Original comment by [email protected]
on 15 Nov 2011 at 4:36
Here is a suggested hack to get you up and running. Around line #446 in
aparapi.cpp we have:
// Get the # of devices
status = clGetDeviceIDs(platforms[i], deviceType, 0, NULL, &deviceIdc);
// now check if this platform supports the requested device type (GPU or CPU)
if (status == CL_SUCCESS && deviceIdc > 0){
    platform = platforms[i];
    ...
Add
    deviceIdc = 1;
as the first statement in the conditional, giving us:
// Get the # of devices
status = clGetDeviceIDs(platforms[i], deviceType, 0, NULL, &deviceIdc);
// now check if this platform supports the requested device type (GPU or CPU)
if (status == CL_SUCCESS && deviceIdc > 0){
    deviceIdc = 1; // Hack here for issue #18
    platform = platforms[i];
    ...
Hopefully this will get you back up and running. I need to decide whether to
re-enable (and fix) multiple-device support or whether to remove it. This will
need some more work.
Again apologies for this, and also apologies that you are discovering all these
bugs. I do appreciate your help uncovering these.
Gary
Original comment by [email protected]
on 15 Nov 2011 at 4:48
It is a new project; I do not expect it to be free of bugs.
It is also a strange field, working with high-level languages and low-level
execution, so it will take some time for a project like this to mature and
attract users.
Anyway, I am a PhD student, so I actually get paid for trying stuff like this
and finding/fixing errors :)
If you want to fix it, there could be an issue with uneven workloads, say 4
devices and global/local = 5; perhaps just revert to "single unit" or something
in that case. It is also problematic that the data needs to be copied multiple
times, and merging the results back could be a real problem.
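Something like the following minimal sketch is what I have in mind (illustrative
names only, not the actual aparapi.cpp logic):

#include <cstddef>

// Only split across devices when the work divides cleanly; otherwise
// revert to a single device, as suggested above. Illustrative sketch only.
int chooseDeviceCount(size_t globalSize, size_t localSize, int deviceIdc) {
    size_t perDevice = globalSize / deviceIdc;
    if (globalSize % deviceIdc != 0      // uneven split, e.g. 5 over 4 devices
            || perDevice < localSize     // per-device share smaller than a group
            || perDevice % localSize != 0) { // split would break the group size
        return 1;                        // "single unit" fallback
    }
    return deviceIdc;
}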
I will apply the deviceIdc = 1 fix, re-compile the library, and test tomorrow.
Thanks for making the project open-source and actually responding to these
reports :)
Original comment by [email protected]
on 15 Nov 2011 at 6:23
Just to add some confusion: I tested with my 5970 (I mistyped earlier when I
referenced a 5990), and it gets detected as two devices. It worked (but was much
slower) when sharing execution across devices. Mandel, for example, was 20fps
instead of the 55fps I saw once I applied the suggested hack above. NBody also
slowed considerably.
This needs a lot of thought; I agree that unbalanced workloads will be even
scarier.
Maybe we need to expose the devices, so the user can request multiple devices
if they feel it will be of benefit. I really wanted to avoid this.
I note that JOCL has a method which discovers the device with max flops.
Another idea might be to run in both modes initially (assuming I/we fix the
bug ;)) and then 'learn' which is most performant. Hmmm.
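As a purely illustrative sketch of that "learn" step (executeOnce here is a
stand-in for a single kernel dispatch, not an existing Aparapi call), it could
be as simple as timing one dispatch per mode:

#include <chrono>

// Time one dispatch in each mode and remember the winner for all
// subsequent executions. Purely illustrative, not an Aparapi API.
template <typename ExecuteFn>
int pickDeviceCount(ExecuteFn executeOnce, int detectedDevices) {
    using clock = std::chrono::steady_clock;

    auto t0 = clock::now();
    executeOnce(1);                   // single-device dispatch
    auto singleTime = clock::now() - t0;

    t0 = clock::now();
    executeOnce(detectedDevices);     // dispatch split across all devices
    auto multiTime = clock::now() - t0;

    // Use whichever mode was faster from here on.
    return (multiTime < singleTime) ? detectedDevices : 1;
}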
Let me know if the hack above at least works for you.
Gary
Original comment by [email protected]
on 15 Nov 2011 at 6:35
Revision #110 contains the above hack if you want to try it out.
I guarded the warning behind the -Dcom.amd.aparapi.enableVerboseJNI=true flag.
I will keep this open because, as indicated above, this is not a fix, just a
workaround.
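(For reference, that flag is a standard JVM system property, so something like
the following should surface the warning - the main class name here is just a
placeholder:

java -Dcom.amd.aparapi.enableVerboseJNI=true YourMainClass
)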
Original comment by [email protected]
on 15 Nov 2011 at 6:46
The workaround enables Aparapi to run the sample applications, and it is pretty
fast on the AMD-based machine, but the NVidia machine is now running slower
than the Java version. The strange thing is that the JOCL version runs fast on
both machines.
Original comment by [email protected]
on 16 Nov 2011 at 2:04
Does the NVidia machine report its card as multiple devices? Is that why it is
being negatively impacted by this workaround?
If so, I guess we could make this 'hack' conditional, i.e. apply it only for
AMD devices, if that helps.
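A rough, untested sketch of that conditional (reusing the variables from the
aparapi.cpp snippet above) might key off CL_PLATFORM_VENDOR:

// Untested sketch of a vendor-conditional version of the hack, slotting
// into the same conditional quoted earlier; needs <string.h> for strstr.
char vendor[256];
status = clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR,
                           sizeof(vendor), vendor, NULL);
if (status == CL_SUCCESS && strstr(vendor, "Advanced Micro Devices") != NULL){
    deviceIdc = 1; // apply the single-device workaround on AMD only
}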
Can we also confirm that the NVidia driver is OpenCL 1.1?
Original comment by [email protected]
on 16 Nov 2011 at 5:37
Yep, the NVidia machine reports the same "two devices"; it did not work before
the workaround and gave the exact same error as the AMD machine.
Making the hack optional does not solve the issue, because then we go back to
the original problem.
Yes, it reports OpenCL 1.1.
I will have a go tomorrow to try and figure out why this happens. I can compare
the stuff done by JOCL to what Aparapi does and hopefully guess where it goes
wrong.
Original comment by [email protected]
on 16 Nov 2011 at 6:40
After running some more tests, I can see that the NVidia machine does in fact
offer a speedup.
On the AMD machine, the speedup obtained through Aparapi and JOCL is pretty
much the same, with JOCL only being slightly faster (~2%).
On the NVidia machine the difference is much larger (~40%). After scaling the
problem to a suitable size, though, there is a clear performance gain using
either method, so the hack does work correctly on the NVidia machine as well.
Looking at the generated OpenCL code, there is really no difference from the
hand-written OpenCL, except that the Aparapi version uses a few local
variables. But this is not really related to the original issue, and is likely
just some special case where the NVidia kernel is slower.
Original comment by [email protected]
on 25 Nov 2011 at 12:26
Kenneth, I think the recent Range-related changes should have fixed this.
Can you confirm for me?
Gary
Original comment by [email protected]
on 23 Feb 2012 at 8:13
Based on the final comment, and the fact that the last activity was over a year
ago, this issue can likely be closed.
Original comment by [email protected]
on 29 Mar 2013 at 11:35
Yes, I think you can close it.
I do not have access to the machines that exhibited the problem anymore, so I
cannot verify.
Original comment by [email protected]
on 1 Apr 2013 at 11:19
Original comment by [email protected]
on 20 Apr 2013 at 12:31
- Changed state: WontFix