Comments (26)

Kubuxu commented on July 2, 2024

Here is the profiling result: https://gist.github.com/Kubuxu/da34b3d00e3f7f9a4a18b7117631d583

from silentarmy.

Kubuxu commented on July 2, 2024

It looks like the problem is a very slow call to clEnqueueReadBuffer at main.c:785, although I have no idea why.

Kubuxu commented on July 2, 2024

I have tried waiting on events before reads, waiting in reads, and non-blocking reads. I don't know what else could help. So far it looks like a bug in the Nvidia OpenCL driver for Linux. They might be doing some busy polling.

montvid commented on July 2, 2024

Did you install cuda 8.0? Might help.

Kubuxu commented on July 2, 2024

Yes, I am using CUDA 8.0.44 from the Arch repository.

Kubuxu commented on July 2, 2024

I tried using clEnqueueMapBuffer in read mode, hoping to mitigate this. Unfortunately, the core still spins at 100%.

mbevand commented on July 2, 2024

100% CPU usage on Nvidia is due to busy waiting in their OpenCL implementation. I am going to ship a workaround, based on this solution: https://bitcointalk.org/index.php?topic=181328.0
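
As a rough illustration of the general idea behind such workarounds (sleeping between completion checks instead of spinning), here is a minimal sketch. It is not the code from the linked post or from silentarmy. The names `done_fn`, `wait_with_sleep` and `fake_job_done` are hypothetical, and the fake job stands in for a real completion check. In OpenCL, one candidate for that check would be querying `CL_EVENT_COMMAND_EXECUTION_STATUS` via `clGetEventInfo`; whether that avoids the driver's spinning is not confirmed in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical callback type: returns nonzero once the work has finished. */
typedef int (*done_fn)(void *ctx);

/* Poll is_done(), sleeping interval_ns between checks instead of spinning.
 * Returns the number of sleeps performed before completion was seen, or -1
 * if the work was still unfinished after max_polls sleeps. */
static long wait_with_sleep(done_fn is_done, void *ctx,
                            long interval_ns, long max_polls)
{
    struct timespec ts;
    ts.tv_sec = interval_ns / 1000000000L;
    ts.tv_nsec = interval_ns % 1000000000L;

    for (long polls = 0; polls < max_polls; polls++) {
        if (is_done(ctx))
            return polls;
        nanosleep(&ts, NULL);   /* give the core back to the scheduler */
    }
    return is_done(ctx) ? max_polls : -1;
}

/* Stand-in for a GPU job: reports completion after a fixed number of checks. */
static int fake_job_done(void *ctx)
{
    int *remaining = ctx;
    if (*remaining == 0)
        return 1;
    (*remaining)--;
    return 0;
}
```

A shorter interval reacts faster to completion but wakes the CPU more often, which is the same trade-off discussed later in this thread.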

mbevand commented on July 2, 2024

For those who really can't wait for this Nvidia CPU usage fix, see these steps to implement the workaround: https://bitcointalk.org/index.php?topic=1666489.msg16819293#msg16819293

Kubuxu commented on July 2, 2024

I don't know if you've seen my #60, but it works quite well and is a lot less hacky than overriding an arbitrary library function (it is the function that the debugger most commonly breaks at).

Kubuxu commented on July 2, 2024

But it might be the better solution; I don't really have time to evaluate them both.

krnlx commented on July 2, 2024

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <dlfcn.h>
#include <assert.h>
#include <time.h>

/*
Temporary fix for silentarmy - nvidia
The MIT License (MIT) Copyright (c) 2016 krnlx, kernelx at me.com
*/

int inited=0;

void *libc = NULL;

/* Pointer to the real clock_gettime from libc, resolved in lib_init(). */
int (*libc_clock_gettime)(clockid_t clk_id, struct timespec *tp) = NULL;

static void __attribute__((constructor)) lib_init(void) {
    if(inited)
        return;

    libc = dlopen("libc.so.6", RTLD_LAZY);
    assert(libc);

    libc_clock_gettime = dlsym(libc, "clock_gettime");
    assert(libc_clock_gettime);

    inited++;
}

/* Microseconds to sleep on every clock_gettime call. */
useconds_t sleep_time = 100;
//const long INTERVAL_MS = 500 * 100;

//struct timespec sleepValue = {0};

//sleepValue.tv_nsec = INTERVAL_MS;
//nanosleep(&sleepValue, NULL);

/* Overrides clock_gettime: sleep briefly, then forward to the real one. */
int clock_gettime(clockid_t clk_id, struct timespec *tp){
    lib_init();
    //printf(".");
    usleep(sleep_time);
    // sleepValue.tv_nsec = INTERVAL_MS;
    // nanosleep(&sleepValue, NULL);
    // sched_yield();
    int r = (*libc_clock_gettime)(clk_id, tp);
    return r;
}

krnlx commented on July 2, 2024

gcc -O2 -fPIC -shared -Wl,-soname,libtime.so -o libtime.so libtime.c

krnlx commented on July 2, 2024

    os.environ["LD_PRELOAD"]="./libtime.so"

Set this in Python before launch.
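
The same step can be sketched in C, for anyone launching the solver from a C wrapper rather than from Python. This is only an illustration. `set_preload` and `launch_with_preload` are hypothetical helper names, and the program in `argv` is whatever you intend to run. The point it shows is that `LD_PRELOAD` has to be in the environment before the target process starts, since the dynamic loader reads it at program startup.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Put LD_PRELOAD into this process's environment so that programs started
 * from it inherit the setting. Returns 0 on success. */
static int set_preload(const char *lib_path)
{
    return setenv("LD_PRELOAD", lib_path, 1);
}

/* Replace the current process with argv[0], with lib_path preloaded.
 * argv must be NULL-terminated. Only returns (with -1) on failure. */
static int launch_with_preload(const char *lib_path, char *const argv[])
{
    if (set_preload(lib_path) != 0)
        return -1;
    execvp(argv[0], argv);
    return -1;
}
```

A relative path such as `./libtime.so` is resolved against the working directory of the launched process, so an absolute path is less fragile.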

mbevand commented on July 2, 2024

@krnlx: Don't you think that Kubuxu's solution in #60 might be a cleaner/simpler approach? Have you tested yours and his? Any difference in performance?

krnlx commented on July 2, 2024

Testing now. It works. It seems #60 achieves better performance, but I need time to test.

krnlx commented on July 2, 2024

Kubuxu's solution: +1-2% performance, but it loads the CPU more:

 PID USER PR NI    VIRT    RES   SHR S %CPU %MEM   TIME+ COMMAND
2954 krnl 20  0 29.249g 108996 90400 S  7.6  2.7 0:45.65 sa-solver
2958 krnl 20  0 29.249g 109644 90116 R  7.6  2.7 0:45.49 sa-solver
2953 krnl 20  0 29.249g 106936 90172 S  7.3  2.7 0:45.15 sa-solver
2955 krnl 20  0 29.249g 108996 90400 S  7.3  2.7 0:45.53 sa-solver
2957 krnl 20  0 29.249g 108808 90216 S  7.3  2.7 0:45.66 sa-solver
2956 krnl 20  0 29.249g 108764 90172 S  7.0  2.7 0:45.66 sa-solver

My solution:

 PID USER PR NI    VIRT    RES   SHR S %CPU %MEM   TIME+ COMMAND
3514 krnl 20  0 29.251g 108152 90180 S  4.0  2.7 0:01.42 sa-solver
3512 krnl 20  0 29.251g 108132 90152 S  3.7  2.7 0:01.39 sa-solver
3513 krnl 20  0 29.251g 108272 90296 R  3.7  2.7 0:01.44 sa-solver
3515 krnl 20  0 29.251g 106384 90448 S  3.7  2.7 0:01.42 sa-solver
3516 krnl 20  0 29.251g 106152 90220 R  3.7  2.7 0:01.42 sa-solver
3517 krnl 20  0 29.251g 108148 90176 S  3.7  2.7 0:01.42 sa-solver

mbevand commented on July 2, 2024

Ok, so a 2x CPU increase with Kubuxu's solution... But still reasonable at 7.5% per core, per process. What model is your CPU?

montvid commented on July 2, 2024

On my Intel® Core™ i5-4200U CPU @ 1.60GHz × 4, the CPU load is roughly the same with both solutions. But the video card lags less when I am scrolling the web with Kubuxu's solution. Honestly, I don't know what command krnlx is using to measure.

Kubuxu commented on July 2, 2024

I am also getting 7.5% to 8% on the i7-6800k. Check out my suggestion in #60 (comment).

mbevand commented on July 2, 2024

@montvid krnlx used top

@Kubuxu Ok. I guess the % of CPU time used is tweakable by adjusting how long we sleep. I like that you measure the average running time. I'll probably merge your fix, unless krnlx has more feedback or ideas.

Kubuxu commented on July 2, 2024

Yes, it can be tweaked, but at the cost of possibly reducing performance.
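
To make the trade-off concrete, here is a minimal sketch of the "measure the average running time, then sleep for part of it" idea. It is an assumption about the general shape of such a fix, not the code from #60 or the later commit. The names `run_avg`, `run_avg_update` and `presleep_ns`, the 1/8 smoothing weight, and the percentage knob are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tracker for the average duration of one solver run. */
struct run_avg {
    uint64_t avg_ns;   /* current estimate; 0 until the first sample */
};

/* Fold a new measurement into the estimate with weight 1/8:
 * avg += (sample - avg) / 8. The first sample is taken as-is. */
static void run_avg_update(struct run_avg *a, uint64_t sample_ns)
{
    if (a->avg_ns == 0) {
        a->avg_ns = sample_ns;
        return;
    }
    int64_t diff = (int64_t)sample_ns - (int64_t)a->avg_ns;
    a->avg_ns = (uint64_t)((int64_t)a->avg_ns + diff / 8);
}

/* How long to sleep before issuing the blocking read: a percentage of the
 * average run time, clamped to at most 100%. */
static uint64_t presleep_ns(const struct run_avg *a, unsigned percent)
{
    if (percent > 100)
        percent = 100;
    return a->avg_ns / 100 * percent;
}
```

A higher percentage leaves less time for the driver to spin, so CPU usage drops, but if a run finishes earlier than the estimate the GPU sits idle until the sleep ends, which is where the possible performance loss comes from.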

mbevand commented on July 2, 2024

This should be fixed by a6c3517

birdie-github commented on July 2, 2024

@mbevand

I've read that this bug could possibly be worked around by using clWaitForEvents() but I'm not a programmer, so if I'm talking nonsense I promptly apologize.

Kubuxu commented on July 2, 2024

@birdie-github clWaitForEvents() also busy waits.

birdie-github commented on July 2, 2024

@Kubuxu

You're right: openmm/openmm#1541

It's appalling that NVIDIA does nothing to resolve this bug. It looks like they really only care about CUDA.

montvid commented on July 2, 2024

Tell me how they care about CUDA? Tromp has a CUDA solver and it has the same problem. Do you know any code that would fix this in CUDA?
