abduld / libwb
License: University of Illinois/NCSA Open Source License
I'm running a localized installation of libwb. While invocations of wbLog generate stdout output, I don't know how to access the timings collected by the wbTimers (via the macros wbTime_start and wbTime_stop). Guidance would be much appreciated!
Add opencl template for mp13 and visual studio build instructions
I've run into a problem getting code that has been accepted as correct through webGPU to execute locally. It generates completely random and incorrect results for even simple vector addition. It looks like the kernel was not writing anything in the allocated memory for the output vector, and what I got was simply the pre-existing memory content. So I started with a minimal test case, and built from there.
TL;DR:
"-arch=sm_xx" is not included in the nvcc arguments, which prevented my code from launching kernels.
Apparently, GPUs with older architectures will not work unless the arch argument is provided at compile time.
OS: OS X 10.8.0
CUDA version: 6.5
GPU: NVIDIA GeForce 8600 GT (compute capability 1.1)
Here is my test case:
#include <stdio.h>
#include <stdlib.h>  /* malloc, free */
#include <cuda.h>

/* Increment every element of the input vector by one. */
__global__ void simpleCalc(float *input)
{
    int i = threadIdx.x + blockDim.x * blockIdx.x;
    input[i]++;
}

/* Print the CUDA error string, if any, and return -1 on failure. */
int wbCheck(cudaError_t err)
{
    if (err != cudaSuccess) {
        printf("Got CUDA error ... %s", cudaGetErrorString(err));
        return -1;
    }
    return 0;
}

int main() {
    float testVector[5] = { 2.0, 2.0, 2.0, 2.0, 2.0 };
    float *deviceTestVector;
    float *hostTestVector = (float *)malloc(5 * sizeof(float));

    wbCheck(cudaMalloc((void **)&deviceTestVector, 5 * sizeof(float)));
    wbCheck(cudaMemcpy(deviceTestVector, testVector, 5 * sizeof(float), cudaMemcpyHostToDevice));
    simpleCalc<<<1, 5>>>(deviceTestVector);
    cudaDeviceSynchronize();
    wbCheck(cudaMemcpy(hostTestVector, deviceTestVector, 5 * sizeof(float), cudaMemcpyDeviceToHost));
    wbCheck(cudaFree(deviceTestVector));

    for (int i = 0; i < 5; i++) {
        printf("%f", hostTestVector[i]);
    }
    free(hostTestVector);
    return 0;
}
When I compile this as follows:
nvcc -arch=sm_11 cudatest.cu -o cudatest
Its execution output is 3.0 (repeated five times), which is the expected output.
However, when I compile it without the arch argument:
nvcc cudatest.cu -o cudatest
Its execution output is 2.0 (repeated five times). The kernel did not execute.
Now when I change the code a bit, to work with libwb:
// MP 1
#include <wb.h>

#define wbCheck(stmt) do { \
        cudaError_t err = stmt; \
        if (err != cudaSuccess) { \
            wbLog(ERROR, "Failed to run stmt ", #stmt); \
            wbLog(ERROR, "Got CUDA error ... ", cudaGetErrorString(err)); \
            return -1; \
        } \
    } while(0)

__global__ void simpleCalc(float *input)
{
    int i = threadIdx.x + blockDim.x * blockIdx.x;
    input[i]++;
}

int main() {
    float testVector[5] = { 2.0, 2.0, 2.0, 2.0, 2.0 };
    float *deviceTestVector;
    float *hostTestVector = (float *)malloc(5 * sizeof(float));

    wbCheck(cudaMalloc((void **)&deviceTestVector, 5 * sizeof(float)));
    wbCheck(cudaMemcpy(deviceTestVector, testVector, 5 * sizeof(float), cudaMemcpyHostToDevice));
    simpleCalc<<<1, 5>>>(deviceTestVector);
    cudaDeviceSynchronize();
    wbCheck(cudaMemcpy(hostTestVector, deviceTestVector, 5 * sizeof(float), cudaMemcpyDeviceToHost));
    wbCheck(cudaFree(deviceTestVector));

    for (int i = 0; i < 5; i++) {
        printf("%f", hostTestVector[i]);
    }
    free(hostTestVector);
    return 0;
}
I compiled this with the provided makefiles, and got 2.0 (repeated five times) as execution result.
I wouldn't mind fixing this myself, and submitting a pull request, only I'm not exactly experienced with CMake, and have no idea where to start.
If someone can point me in the right direction, that would be much appreciated. Concretely, where should I implement this check (or better said, where can I find the offending code), and is it a good idea to have cmake compile a short piece of cuda code to output the compute capability for this, or is there an easier way of doing it?
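One possible route, offered only as a sketch: a small probe program could query the device (for example through the CUDA runtime's cudaGetDeviceProperties, whose cudaDeviceProp result carries `major` and `minor` fields), print the matching flag, and let the build script capture that output. The plain C++ fragment below covers only the formatting step; `archFlag` is a hypothetical helper name and is not part of libwb or CMake.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of libwb): turn a compute capability such as
// 1.1 into the nvcc flag "-arch=sm_11". The major/minor values are assumed to
// have been read from the device beforehand.
std::string archFlag(int major, int minor) {
    assert(major >= 1 && minor >= 0 && minor <= 9);
    return "-arch=sm_" + std::to_string(major) + std::to_string(minor);
}
```

For the GeForce 8600 GT mentioned above (compute capability 1.1), this would yield the same flag that was passed by hand in the working build.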
I'm trying to compile and run it, but it doesn't work.
Is it because the course is over?
Is there an available JSON parser (javascript) similar to the WebGPU site that can be used to view the output of the wbLogger? I am able to compile and run the WebGPU labs without difficulty, but reading JSON in the terminal from the logger is not pleasant.
Many thanks!
From: Joe Bungo
Sent: Tuesday, October 18, 2016 1:41 PM
To: 'G. Jan Wilms'
Subject: RE: GPU Teaching Kit QwikLAB Access
Hi Jan,
Can you try upgrading to the latest version of CUDA? This has solved a lot of compatibility issues so far.
Joe Bungo
GPU Educators Program Manager
NVIDIA Corporation | Academic Programs
developer.nvidia.com/educators
Office: +1 (512) 401-4505
Mobile: +1 (512) 293-7324
[email protected]
From: G. Jan Wilms [mailto:[email protected]]
Sent: Tuesday, October 11, 2016 8:17 PM
To: Joe Bungo
Subject: Re: GPU Teaching Kit QwikLAB Access
I have found the support libraries (libwb) to be very finicky, particularly the third-party hpp code in the vendor folder. On machines with an identical setup (Visual Studio 2013 Update 5, CUDA Toolkit 7.5), the wbTime_stop() function works fine on some machines and crashes on others. The following code is the culprit:
json11::Json json = json11::Json::object{
{"type", "timer"},
{"id", wbTimerNode_getId(node)},
{"session_id", wbTimerNode_getSessionId(node)},
{"data", wbTimerNode_toJSONObject(node)}};
std::cout << json.dump() << std::endl;
The assignment of "timer" to type in the json struct is invalid on some machines and causes the crash.
I have been able to work around it by commenting out the wbStop function call (or the assignment to type), but it is still puzzling.
Blessings,
-gjw
The link in the README explaining how to modify CMakeLists.txt won't work for the current course session: https://class.coursera.org/hetero-002/forum/thread?thread_id=83
On the web UI accessed via the Coursera class, I was getting occasional errors trying to run the data sets for the MP1 assignment. They seemed tied to server load, since rerunning the set fixed the errors. But of the 4 or 5 errors saying either that the data set output was not correct or that an exception occurred, none were recorded in my attempts tab at all. So I suspect that run-time errors or exceptions are not being recorded with attempts, because I only see compile errors or successful runs logged. Either that, or the server is just acting weird due to the current workload.
Building libwb fails on my system with the message:
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../x86_64-pc-linux-gnu/bin/ld: CMakeFiles/MP0.dir/wbTimer.cpp.o: undefined reference to symbol 'clock_gettime@@GLIBC_2.2.5'
/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../x86_64-pc-linux-gnu/bin/ld: note: 'clock_gettime@@GLIBC_2.2.5' is defined in DSO /lib64/librt.so.1 so try adding it to the linker command line
/lib64/librt.so.1: could not read symbols: Invalid operation
After adding -lrt to the linker flags the compilation succeeds. I guess this error is due to my glibc version being 2.16.
I suggest adding something along the lines of
include(CheckFunctionExists)
set(CMAKE_EXTRA_INCLUDE_FILES time.h)
CHECK_FUNCTION_EXISTS(clock_gettime HAVE_CLOCK_GETTIME)
if(NOT HAVE_CLOCK_GETTIME)
  find_library(LIBRT_LIBRARIES rt)
  if(NOT LIBRT_LIBRARIES)
    message(FATAL_ERROR "librt not found")
  else()
    set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${LIBRT_LIBRARIES}")
  endif()
endif()
to the CMakeLists.txt
The genRandom() family of functions takes integer min and max values, which makes it hard to create datasets with fine-grained boundaries.
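As an illustration of what a floating-point variant might look like, here is a minimal sketch built on the standard `<random>` header. `genRandomFloats` and its parameters are hypothetical names chosen for this example; they are not part of libwb's dataset generators.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not part of libwb): generate n floats drawn uniformly
// from the range bounded by minVal and maxVal. A fixed seed keeps the
// generated dataset reproducible between runs.
std::vector<float> genRandomFloats(std::size_t n, float minVal, float maxVal,
                                   unsigned seed) {
    assert(minVal < maxVal);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> dist(minVal, maxVal);
    std::vector<float> out(n);
    for (float &v : out) {
        v = dist(rng);
    }
    return out;
}
```

With bounds such as 0.25 and 0.75, this gives the kind of fine-grained range that integer min and max values cannot express.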
wbImport allocates data using malloc and doesn't provide any way to use cudaHostAlloc.
Either provide a switch to enable it, or provide an API that reads the inputLength without allocating or loading the data. Once inputLength is known, I can use cudaHostAlloc for the allocation. I will then need another API that only loads the data and doesn't calculate inputLength.
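The two-step API requested here could be sketched roughly as follows. Everything in this fragment is an assumption made for illustration: the function names are hypothetical, and the file layout (an element count first, followed by that many values) is assumed rather than taken from the libwb sources.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical step 1: read only the element count, without allocating or
// loading any data. ASSUMED layout: the first token in the file is the count.
// Returns -1 if the file cannot be opened or the count cannot be parsed.
long readVectorLength(const std::string &path) {
    std::ifstream in(path);
    long n = -1;
    if (!(in >> n) || n < 0) {
        return -1;
    }
    return n;
}

// Hypothetical step 2: load n values into a buffer the caller already owns.
// The buffer may come from any allocator, including pinned host memory.
bool loadVectorInto(const std::string &path, float *dst, std::size_t n) {
    assert(dst != nullptr || n == 0);
    std::ifstream in(path);
    long declared = 0;
    if (!(in >> declared) || declared < 0 ||
        static_cast<std::size_t>(declared) < n) {
        return false;
    }
    for (std::size_t i = 0; i < n; ++i) {
        if (!(in >> dst[i])) {
            return false;
        }
    }
    return true;
}
```

Between the two calls, the caller would be free to allocate the buffer with cudaHostAlloc instead of malloc, which is the use case described above.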
Please add print formatting support for wbLog. As an alternative, show STDOUT and STDERR on the website. It would be nice for some light debugging.
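Until something like this exists in the library, a printf-style helper can be written on the caller's side. The sketch below uses only the standard vsnprintf; `formatMessage` is a hypothetical name, and passing its result to wbLog assumes that wbLog accepts a std::string argument, which is not confirmed here.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not part of libwb): build a std::string from a
// printf-style format string and arguments.
std::string formatMessage(const char *fmt, ...) {
    assert(fmt != nullptr);
    va_list args;
    va_start(args, fmt);

    // First pass: measure the required length using a copy of the arguments.
    va_list copy;
    va_copy(copy, args);
    int needed = std::vsnprintf(nullptr, 0, fmt, copy);
    va_end(copy);
    if (needed < 0) {
        va_end(args);
        return std::string();
    }

    // Second pass: format into a buffer with room for the terminating NUL.
    std::vector<char> buf(static_cast<std::size_t>(needed) + 1);
    std::vsnprintf(buf.data(), buf.size(), fmt, args);
    va_end(args);
    return std::string(buf.data(), static_cast<std::size_t>(needed));
}
```

The formatted string could then be handed to the logger as a single argument.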
The issue is with the wbFile_read() function in wbFile.cpp. Windows-formatted CSV files have \r\n line endings, and the files are opened in the default text mode, so each \r\n is replaced by \n in the buffer. As a result, the value returned by res=fread() is less than the count (file length), which makes the function fail. The following comment and amendment can be applied as a quick fix. They remove the assertion, though, so it is probably better to perform binary reads and handle the \r\n EOLs explicitly.
/* if (res != count) {
     wbLog(ERROR, "Failed to read data from ", wbFile_getFileName(file));
     wbDelete(buffer);
     return NULL;
   }
   buffer[bufferlen - 1] = '\0'; // make valid C string
*/
buffer[size * res] = '\0'; // make valid C string
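The binary-read alternative mentioned above could look something like the following sketch. It is not the wbFile_read() implementation: `readTextFile` is a hypothetical standalone helper that opens the file with "rb" so the byte count matches the file size, then drops the \r of each \r\n pair.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of libwb): read a whole file in binary mode
// and normalize Windows line endings (\r\n) to \n.
bool readTextFile(const char *path, std::string &out) {
    assert(path != nullptr);
    std::FILE *fp = std::fopen(path, "rb");
    if (!fp) {
        return false;
    }

    // Binary mode: fread returns exactly the bytes stored on disk.
    std::string raw;
    char chunk[4096];
    std::size_t got;
    while ((got = std::fread(chunk, 1, sizeof chunk, fp)) > 0) {
        raw.append(chunk, got);
    }
    bool ok = !std::ferror(fp);
    std::fclose(fp);
    if (!ok) {
        return false;
    }

    // Skip a '\r' only when it is immediately followed by '\n'.
    out.clear();
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] == '\r' && i + 1 < raw.size() && raw[i + 1] == '\n') {
            continue;
        }
        out.push_back(raw[i]);
    }
    return true;
}
```

Because the read no longer depends on text-mode translation, a length check against the file size can stay in place on every platform.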
Division by zero in _MSC_VER macro line 23:
return ((uint64_t) counter.LowPart * NANOSEC / _hrtime_frequency) +
(((uint64_t) counter.HighPart * NANOSEC / _hrtime_frequency) << 32);
Full Call Stack:
MP0.exe!_hrtime() Line 24 + 0xd bytes C++
MP0.exe!wb_init() Line 40 C++
MP0.exe!wbArg_new() Line 10 C++
MP0.exe!wbArg_read(int argc, char * * argv) Line 82 + 0xd bytes C++
MP0.exe!main(int argc, char * * argv) Line 10 + 0x1c bytes C++
MP0.exe!__tmainCRTStartup() Line 555 + 0x19 bytes C
MP0.exe!mainCRTStartup() Line 371 C
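One plausible reading of this stack trace is that _hrtime_frequency was still zero when _hrtime() was first reached from wb_init(), although the report does not confirm this. A defensive conversion could be sketched as below. `ticksToNanos` and `kNanosPerSecond` are hypothetical names, and the tick count and frequency are assumed to be plain 64-bit values (on Windows they would normally come from QueryPerformanceCounter and QueryPerformanceFrequency).

```cpp
#include <cassert>
#include <cstdint>

const std::uint64_t kNanosPerSecond = 1000000000ULL;

// Hypothetical helper (not part of libwb): convert a tick count to
// nanoseconds. Returns false instead of dividing when the frequency is zero.
bool ticksToNanos(std::uint64_t ticks, std::uint64_t frequency,
                  std::uint64_t &nanos) {
    if (frequency == 0) {
        return false;  // frequency not initialized yet
    }
    // The remainder below is always smaller than frequency, so this bound
    // guarantees that (rem * kNanosPerSecond) cannot overflow.
    assert(frequency <= UINT64_MAX / kNanosPerSecond);

    // Split into whole seconds and a remainder to avoid overflowing
    // ticks * kNanosPerSecond for large tick counts.
    std::uint64_t whole = ticks / frequency;
    std::uint64_t rem = ticks % frequency;
    nanos = whole * kNanosPerSecond + (rem * kNanosPerSecond) / frequency;
    return true;
}
```

Working on the full 64-bit tick value also avoids treating the low and high halves of the counter separately.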
With de0d699, the inclusion of the OpenCL header, builds on Mavericks fail for me with clang complaining about SSE intrinsics.
I haven't had a chance to work out a fix, so I've resorted to commenting it out until I'm done with the current assignment. I will update this when I have a fix, if someone else doesn't get to it first.
I also have a minor patch to allow the cmake script to build for Mavericks as well. Unfortunately, I do not have a 10.8 or earlier machine to verify correctness on pre-Mavericks systems.