abduld / libwb

License: University of Illinois/NCSA Open Source License

CMake 1.10% Makefile 0.81% C 26.01% C++ 72.07%

libwb's Introduction

libWB

libwb's Issues

Accessing wbTimers with local installation

I'm running a local installation of libwb. While calls to wbLog produce output on stdout, I don't know how to access the timings collected by the wbTimers (via the macros wbTime_start and wbTime_stop). Guidance would be much appreciated!

Add opencl template and build instructions

Add an OpenCL template for MP13 and Visual Studio build instructions.

-arch not included in nvcc argument list, causes kernels not to launch on older GPUs

I've run into a problem getting code that WebGPU accepted as correct to execute locally. It produces seemingly random, incorrect results even for simple vector addition. It looks like the kernel was not writing anything to the memory allocated for the output vector, so what I got back was simply the pre-existing memory content. So I started with a minimal test case and built up from there.

TL;DR:
"-arch=sm_xx" is not included in the nvcc arguments, and this prevented my code from launching kernels.

Apparently, older GPUs fail to run kernels when the -arch argument is not passed at compile time, presumably because nvcc then targets a default architecture that the device does not support.

OS: OS X 10.8.0
CUDA version: 6.5
GPU: NVIDIA GeForce 8600 GT (compute capability 1.1)

Here is my test case:

#include <stdio.h>
#include <stdlib.h>  // malloc, free
#include <cuda.h>

__global__ void simpleCalc(float *input)
{
  int i = threadIdx.x + blockDim.x * blockIdx.x;
  input[i]++;
}

int wbCheck(cudaError_t err)
{
  do {
    if (err != cudaSuccess) {
      printf("Got CUDA error ... %s", cudaGetErrorString(err));
      return -1;
    }
    return 0;
  } while(0);
}

int main(){
  float testVector[5]={ 2.0, 2.0, 2.0, 2.0, 2.0 };
  float *deviceTestVector;
  float *hostTestVector = ( float * )malloc( 5 * sizeof(float));

  wbCheck(cudaMalloc((void**) &deviceTestVector, 5 * sizeof(float)));
  wbCheck(cudaMemcpy(deviceTestVector, testVector, 5 * sizeof(float), cudaMemcpyHostToDevice));

  simpleCalc<<<1,5>>>(deviceTestVector);
  cudaDeviceSynchronize();

  wbCheck(cudaMemcpy(hostTestVector, deviceTestVector, 5 * sizeof(float), cudaMemcpyDeviceToHost));
  wbCheck(cudaFree(deviceTestVector));
  for( int i=0; i<5; i++) {
    printf("%f", hostTestVector[i]);
  }
  free(hostTestVector);
  return 0;
}

When I compile this as follows:

nvcc -arch=sm_11 cudatest.cu -o cudatest

Its output is 3.0 (repeated five times), which is the expected output.

However, when I compile it without the arch argument:

nvcc cudatest.cu -o cudatest

Its output is 2.0 (repeated five times). The kernel did not execute.

Now when I change the code a bit to work with libwb:

// MP 1
#include <wb.h>
#define wbCheck(stmt) do {                                                    \
        cudaError_t err = stmt;                                               \
        if (err != cudaSuccess) {                                             \
            wbLog(ERROR, "Failed to run stmt ", #stmt);                       \
            wbLog(ERROR, "Got CUDA error ...  ", cudaGetErrorString(err));    \
            return -1;                                                        \
        }                                                                     \
    } while(0)

__global__ void simpleCalc(float *input)
{
  int i = threadIdx.x + blockDim.x * blockIdx.x;
  input[i]++;
}

int main(){
  float testVector[5]={ 2.0, 2.0, 2.0, 2.0, 2.0 };
  float *deviceTestVector;
  float *hostTestVector = ( float * )malloc( 5 * sizeof(float));

  wbCheck(cudaMalloc((void**) &deviceTestVector, 5 * sizeof(float)));
  wbCheck(cudaMemcpy(deviceTestVector, testVector, 5 * sizeof(float), cudaMemcpyHostToDevice));

  simpleCalc<<<1,5>>>(deviceTestVector);
  cudaDeviceSynchronize();

  wbCheck(cudaMemcpy(hostTestVector, deviceTestVector, 5 * sizeof(float), cudaMemcpyDeviceToHost));
  wbCheck(cudaFree(deviceTestVector));
  for( int i=0; i<5; i++) {
    printf("%f", hostTestVector[i]);
  }
  free(hostTestVector);
  return 0;
}

I compiled this with the provided makefiles, and got 2.0 (repeated five times) as execution result.

I wouldn't mind fixing this myself, and submitting a pull request, only I'm not exactly experienced with CMake, and have no idea where to start.

If someone can point me in the right direction, that would be much appreciated. Concretely, where should I implement this check (or better said, where can I find the offending code), and is it a good idea to have cmake compile a short piece of cuda code to output the compute capability for this, or is there an easier way of doing it?

WebGPU isn't working

I'm trying to compile and run, but it doesn't work.
Is it because the course is over?

libwb JSON viewer (wbLogger)

Is there a JSON parser (JavaScript) available, similar to the one on the WebGPU site, that can be used to view the output of the wbLogger? I am able to compile and run the WebGPU labs without difficulty, but reading the logger's JSON in the terminal is not pleasant.

Many thanks!

libwb json issue

From: Joe Bungo
Sent: Tuesday, October 18, 2016 1:41 PM
To: 'G. Jan Wilms'
Subject: RE: GPU Teaching Kit QwikLAB Access

Hi Jan,

Can you try upgrading to the latest version of CUDA? This has solved a lot of compatibility issues so far.

Joe Bungo

From: G. Jan Wilms
Sent: Tuesday, October 11, 2016 8:17 PM
To: Joe Bungo
Subject: Re: GPU Teaching Kit QwikLAB Access

I have found the support libraries (libwb) to be very finicky, particularly the third-party hpp code in the vendor folder. On machines with an identical setup (Visual Studio 2013 Update 5, CUDA Toolkit 7.5), the wbTime_stop() function works fine on some machines and crashes on others. The following code is the culprit:
json11::Json json = json11::Json::object{
    {"type", "timer"},
    {"id", wbTimerNode_getId(node)},
    {"session_id", wbTimerNode_getSessionId(node)},
    {"data", wbTimerNode_toJSONObject(node)}};
std::cout << json.dump() << std::endl;

The assignment of “timer” to type in the json struct is invalid (on some machines) and causes the crash.

I have been able to work around it by commenting out the wbTime_stop() call (or the assignment to type), but it is still puzzling.

Blessings,
-gjw

Chinese text in the code is garbled after running on WebGPU

The .cu template (skeleton) text files are all encoded as ANSI, not UTF-8.
When I wrote some comments in Chinese in the .cu files and ran them on WebGPU, the comments came back as garbled text.

Please fix this, thanks.

OSX build link

The link in the README explaining how to modify CMakeLists.txt won't work for the current course session: https://class.coursera.org/hetero-002/forum/thread?thread_id=83

Runtime errors not recorded in attempts tab in web UI?

On the web UI accessed via the Coursera class, I was getting occasional errors when running the data sets for the MP1 assignment. They seemed tied to server load, since rerunning the set fixed the errors. But of the 4 or 5 errors saying either that the data set output was not correct or that an exception occurred, none were recorded in my attempts tab. So I suspect that runtime errors or exceptions are not being recorded as attempts, because I only see compile errors or successful runs logged. Either that, or the server is just acting oddly under the current workload.

librt required but not mentioned in CMakeLists.txt

Building libwb fails on my system with the message:

/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../x86_64-pc-linux-gnu/bin/ld: CMakeFiles/MP0.dir/wbTimer.cpp.o: undefined reference to symbol 'clock_gettime@@GLIBC_2.2.5'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../x86_64-pc-linux-gnu/bin/ld: note: 'clock_gettime@@GLIBC_2.2.5' is defined in DSO /lib64/librt.so.1 so try adding it to the linker command line
/lib64/librt.so.1: could not read symbols: Invalid operation

After adding -lrt to the linker flags the compilation succeeds. I guess this error is due to my glibc version being 2.16.

I suggest adding something along the lines of

include(CheckFunctionExists)
set(CMAKE_EXTRA_INCLUDE_FILES time.h)
CHECK_FUNCTION_EXISTS(clock_gettime HAVE_CLOCK_GETTIME)
if(NOT HAVE_CLOCK_GETTIME)
  find_library(LIBRT_LIBRARIES rt)
  if(NOT LIBRT_LIBRARIES)
    message(FATAL_ERROR "librt not found")
  else(NOT LIBRT_LIBRARIES)
    set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${LIBRT_LIBRARIES}")
  endif(NOT LIBRT_LIBRARIES)
endif(NOT HAVE_CLOCK_GETTIME)

to the CMakeLists.txt

Change genRandom() min and max params from int

The genRandom() family of functions takes integer min and max values, which makes it hard to create datasets with fine-grained boundaries.
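As a rough illustration of what floating-point bounds could look like, here is a minimal sketch. The name genRandomReal and its signature are hypothetical and not part of libwb; it simply wraps the standard library's uniform real distribution and assumes minVal < maxVal.

```cpp
#include <random>

// Hypothetical helper (not a libwb function): draws one float from the
// nominal range [minVal, maxVal). Assumes minVal < maxVal.
float genRandomReal(std::mt19937 &rng, float minVal, float maxVal) {
  std::uniform_real_distribution<float> dist(minVal, maxVal);
  return dist(rng);
}
```

A dataset generator could seed a single std::mt19937 once and call genRandomReal(rng, 0.25f, 0.75f) per element, which gives bounds that integer parameters cannot express.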

wbImport doesn't support cudaHostAlloc

wbImport allocates data using malloc and doesn't provide any way to use cudaHostAlloc.
Either provide a switch to enable it, or provide an API that reads the inputLength without allocating or loading the data. Once inputLength is known, I can use cudaHostAlloc for the allocation. I would then need another API that only loads the data and doesn't calculate inputLength.

Print formatting support for wbLog

Please add print formatting support for wbLog. As an alternative, show STDOUT and STDERR on the website. It would be nice for some light debugging.
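Until something like this exists in the library, one possible workaround is to format the message before logging it. The sketch below is an assumption, not libwb code: formatMessage is a hypothetical helper that builds a std::string from a printf-style format using vsnprintf.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not a libwb function): printf-style formatting into
// a std::string. Returns an empty string if formatting fails.
std::string formatMessage(const char *fmt, ...) {
  va_list args;
  va_start(args, fmt);
  va_list argsCopy;
  va_copy(argsCopy, args);

  // First pass: measure the formatted length (excluding the terminator).
  const int needed = std::vsnprintf(nullptr, 0, fmt, args);
  va_end(args);
  if (needed < 0) {
    va_end(argsCopy);
    return std::string();
  }

  // Second pass: format into a buffer of exactly the right size.
  std::vector<char> buf(static_cast<std::size_t>(needed) + 1);
  std::vsnprintf(buf.data(), buf.size(), fmt, argsCopy);
  va_end(argsCopy);
  return std::string(buf.data(), static_cast<std::size_t>(needed));
}
```

Assuming wbLog accepts a std::string among its arguments, a call might look like wbLog(TRACE, formatMessage("x[%d] = %.2f", i, x[i])).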

The system fails when reading CSV files created on Windows

The problem is in the wbFile_read() function in wbFile.cpp. CSV files created on Windows have \r\n line endings, and the files are opened in the default text mode, so each \r\n is translated to \n in the buffer. As a result, the value returned by fread() (res) is smaller than count (the file length), and the function fails. The following change is a quick fix, but it removes the check, so a better fix would probably be to read in binary mode and handle \r\n line endings explicitly.

/* if (res != count) {
     wbLog(ERROR, "Failed to read data from ", wbFile_getFileName(file));
     wbDelete(buffer);
     return NULL;
   }
   buffer[bufferlen - 1] = '\0'; // make valid C string
*/

buffer[size * res] = '\0'; // make valid C string
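The binary-read alternative mentioned above might look roughly like this. It is a sketch only: both function names are hypothetical and it does not use libwb's wbFile API. The idea is to read the raw bytes, so the byte count matches the file length, and then normalize line endings explicitly.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: converts every "\r\n" pair to "\n".
// A lone '\r' is left untouched.
std::string normalizeLineEndings(const std::string &raw) {
  std::string out;
  out.reserve(raw.size());
  for (std::size_t i = 0; i < raw.size(); ++i) {
    if (raw[i] == '\r' && i + 1 < raw.size() && raw[i + 1] == '\n') {
      continue; // drop the '\r'; the following '\n' is copied next
    }
    out.push_back(raw[i]);
  }
  return out;
}

// Hypothetical helper: reads the whole file in binary mode, then
// normalizes line endings. Returns an empty string if the file
// cannot be opened.
std::string readTextFileBinary(const std::string &path) {
  std::ifstream in(path, std::ios::binary);
  std::string raw((std::istreambuf_iterator<char>(in)),
                  std::istreambuf_iterator<char>());
  return normalizeLineEndings(raw);
}
```

With this approach the size check can stay in place, because no translation happens during the read itself.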

Win64: Division by zero in _MSC_VER macro.

Division by zero in the _MSC_VER code path, at line 23:

return ((uint64_t) counter.LowPart * NANOSEC / _hrtime_frequency) +
       (((uint64_t) counter.HighPart * NANOSEC / _hrtime_frequency) << 32);

Full Call Stack:
MP0.exe!_hrtime() Line 24 + 0xd bytes C++
MP0.exe!wb_init() Line 40 C++
MP0.exe!wbArg_new() Line 10 C++
MP0.exe!wbArg_read(int argc, char * * argv) Line 82 + 0xd bytes C++
MP0.exe!main(int argc, char * * argv) Line 10 + 0x1c bytes C++
MP0.exe!__tmainCRTStartup() Line 555 + 0x19 bytes C
MP0.exe!mainCRTStartup() Line 371 C
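One plausible reading of the call stack is that _hrtime_frequency was still zero when _hrtime() was first called from wb_init(). The sketch below is an assumption rather than the library's code: ticksToNanoseconds is a hypothetical helper that guards against a zero frequency and converts a full 64-bit tick count without splitting it into high and low parts.

```cpp
#include <cstdint>

const std::uint64_t kNanosPerSecond = 1000000000ULL;

// Hypothetical helper: converts a tick count to nanoseconds.
// Returns 0 when the frequency has not been initialized (is zero)
// instead of dividing by zero.
std::uint64_t ticksToNanoseconds(std::uint64_t ticks, std::uint64_t frequency) {
  if (frequency == 0) {
    return 0;
  }
  // Split into whole seconds and a remainder so the multiplication by
  // 1e9 only applies to a value smaller than `frequency`. This stays
  // within 64 bits for frequencies up to roughly 1.8e10 ticks per second.
  const std::uint64_t wholeSeconds = ticks / frequency;
  const std::uint64_t remainder = ticks % frequency;
  return wholeSeconds * kNanosPerSecond +
         (remainder * kNanosPerSecond) / frequency;
}
```

On Windows, the frequency would typically come from QueryPerformanceFrequency and the tick count from the QuadPart of QueryPerformanceCounter, so the frequency would need to be queried before the first conversion.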

OpenCL header causes build fail on OS X Mavericks.

Since de0d699, which added the inclusion of the OpenCL header, builds on Mavericks fail for me, with clang complaining about SSE intrinsics.

I haven't had a chance to work out a fix, so I've resorted to commenting it out until I'm done with the current assignment. I will update this when I have a fix, if someone else doesn't get to it first.

I also have a minor patch that allows the CMake script to build on Mavericks as well. Unfortunately, I do not have a machine running 10.8 or earlier to verify correctness on pre-Mavericks systems.