Comments (26)
Here is profiled result: https://gist.github.com/Kubuxu/da34b3d00e3f7f9a4a18b7117631d583
from silentarmy.
Looks like the problem is a very slow call to clEnqueueReadBuffer at main.c:785, although I have no idea why.

I have tried waiting on events before reads, in reads, and non-blocking reads. I don't know what else could help; so far it looks like a bug in the Nvidia OpenCL driver for Linux. They might be doing some busy polling.

Did you install CUDA 8.0? It might help.

Yes, I am using CUDA 8.0.44 from the Arch repository.

I tried using clEnqueueMapBuffer in read mode, hoping to mitigate this, but unfortunately the core still spins at 100%.

100% CPU usage on Nvidia is due to busy waiting in their OpenCL implementation. I am going to ship a workaround based on this solution: https://bitcointalk.org/index.php?topic=181328.0

For those who really can't wait for this Nvidia CPU usage fix, see these steps to implement the workaround: https://bitcointalk.org/index.php?topic=1666489.msg16819293#msg16819293
I don't know if you've seen my #60, but it works quite well and is a lot less hacky than overwriting an arbitrary library function (it is the function that the debugger most commonly breaks at).

But it might be a better solution; I don't really have time to evaluate them both.

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <dlfcn.h>
#include <assert.h>
#include <time.h>
/*
Temporary fix for silentarmy - nvidia
The MIT License (MIT) Copyright (c) 2016 krnlx, kernelx at me.com
*/
static int inited = 0;
static void *libc = NULL;
static int (*libc_clock_gettime)(clockid_t clk_id, struct timespec *tp) = NULL;

static void __attribute__((constructor)) lib_init(void) {
    if (inited)
        return;
    libc = dlopen("libc.so.6", RTLD_LAZY);
    assert(libc);
    libc_clock_gettime = dlsym(libc, "clock_gettime");
    assert(libc_clock_gettime);
    inited++;
}

static useconds_t sleep_time = 100;

/* Interpose clock_gettime(): sleep briefly before forwarding to libc, so
   Nvidia's busy-wait loop spends most of its time sleeping instead of
   spinning. (Alternatives tried: nanosleep(), sched_yield().) */
int clock_gettime(clockid_t clk_id, struct timespec *tp) {
    lib_init();
    usleep(sleep_time);
    return (*libc_clock_gettime)(clk_id, tp);
}
gcc -O2 -fPIC -shared -Wl,-soname,libtime.so -o libtime.so libtime.c
os.environ["LD_PRELOAD"]="./libtime.so"
in Python, before launch.

@krnlx: Don't you think that Kubuxu's solution in #60 might be a cleaner/simpler approach? Have you tested yours and his? Any difference in performance?
Testing now. It works. #60 seems to achieve better performance; I need time to test.

Kubuxu's solution gives +1-2% performance, but loads the CPU more:
2954 krnl 20 0 29.249g 108996 90400 S 7.6 2.7 0:45.65 sa-solver
2958 krnl 20 0 29.249g 109644 90116 R 7.6 2.7 0:45.49 sa-solver
2953 krnl 20 0 29.249g 106936 90172 S 7.3 2.7 0:45.15 sa-solver
2955 krnl 20 0 29.249g 108996 90400 S 7.3 2.7 0:45.53 sa-solver
2957 krnl 20 0 29.249g 108808 90216 S 7.3 2.7 0:45.66 sa-solver
2956 krnl 20 0 29.249g 108764 90172 S 7.0 2.7 0:45.66 sa-solver
My solution:
3514 krnl 20 0 29.251g 108152 90180 S 4.0 2.7 0:01.42 sa-solver
3512 krnl 20 0 29.251g 108132 90152 S 3.7 2.7 0:01.39 sa-solver
3513 krnl 20 0 29.251g 108272 90296 R 3.7 2.7 0:01.44 sa-solver
3515 krnl 20 0 29.251g 106384 90448 S 3.7 2.7 0:01.42 sa-solver
3516 krnl 20 0 29.251g 106152 90220 R 3.7 2.7 0:01.42 sa-solver
3517 krnl 20 0 29.251g 108148 90176 S 3.7 2.7 0:01.42 sa-solver
Ok, so a 2x CPU increase with Kubuxu's solution... But still reasonable at 7.5% per core, per process. What model is your CPU?
On my Intel® Core™ i5-4200U CPU @ 1.60GHz × 4, the CPU load is roughly the same with both solutions. But the video card lags less when I am scrolling the web with Kubuxu's solution. Honestly, I don't know what command krnlx is using to measure.

I am also getting 7.5% to 8% on the i7-6800k. Check out my suggestion in #60 (comment).
@montvid krnlx used top
@Kubuxu Ok. I guess the % of CPU time used is tweakable by adjusting how long we sleep. I like that you measure the average running time. I'll probably merge your fix, unless krnlx has more feedback or ideas.
Yes, it can be tweaked, but at the cost of possibly reducing performance.

This should be fixed by a6c3517
I've read that this bug could possibly be worked around by using clWaitForEvents(), but I'm not a programmer, so if I'm talking nonsense I apologize in advance.

@birdie-github clWaitForEvents() also busy waits.
You're right: openmm/openmm#1541
It's appalling that NVIDIA does nothing to resolve this bug. It looks like they really only care about CUDA.

Tell me, how do they care about CUDA? Tromp has a CUDA solver, and it has the same problem. Do you know of any code that would fix this in CUDA?