viennacl / viennacl-dev

Developer repository for ViennaCL. Visit http://viennacl.sourceforge.net/ for the latest releases.

License: Other

C 2.26% C++ 96.01% CMake 0.49% Python 0.08% Shell 0.04% Cuda 0.40% Makefile 0.01% TeX 0.66% MATLAB 0.04%

viennacl-dev's Introduction

Developer Repository for ViennaCL

Looking for ViennaCL releases? Visit http://viennacl.sourceforge.net/

This is the developer repository of ViennaCL, containing the latest features and changes. Feel free to clone the repository, send us your pull requests, or update the Wiki here at GitHub. All contributions are welcome. You might also want to subscribe to our developer mailing list. There are no 'stupid questions', so don't hesitate to get in touch with us.

To build the developer version of ViennaCL, simply clone the repository and issue the following commands (these steps are for Unix-based systems):

$> cd viennacl-dev
$> mkdir build && cd build
$> cmake ..
$> make

(Feel free to use parallel builds via make -j4, but keep in mind that each build job might require up to one GB of RAM.)

Follow similar steps on Windows:

  • Launch the CMake-GUI and point the source-directory to viennacl-dev and the build-directory to viennacl-dev/build.
  • Confirm that CMake should create the build-folder for you.
  • Click on Configure and select your compilation environment.
  • Provide any missing paths to Boost and/or OpenCL, or deselect ENABLE_UBLAS and/or ENABLE_OPENCL.
  • Click on Configure again and then on Generate.
  • You will now find the generated project files in the build-folder, which you can then open and build with your compiler environment.

(Feedback from developers on Windows regarding the build process of the developer version is welcome.)

System requirements for the developer version:

  • Boost libraries >= 1.45 (feel free to disable BUILD_TESTING and ENABLE_UBLAS in CMake in order to build without Boost)
  • CMake 2.8 or higher (for building the tests and examples)
  • A not-too-ancient C++ compiler

Optional:

  • OpenMP-enabled C++ compiler
  • One or more OpenCL SDKs (1.1 or higher)
  • CUDA toolkit (4.x or higher)
  • Eigen (3.0 or higher)
  • MTL 4

Sending Pull Requests

We strive for high code quality and maintainability. Before sending pull requests, please ensure that your contribution passes the following minimum requirements, which are commonly considered good practice in software engineering:

  • The new code passes all tests when running 'make test'.
  • The new code compiles cleanly at high warning levels (-Wall -pedantic, just enable ENABLE_PEDANTIC_FLAGS within CMake) on at least GCC and/or Clang, ideally also Visual Studio compilers. The more the better, but at least one of the compilers should have been tested.
  • For new functions or classes, please add Doxygen comments to the code. This makes it easier for others to quickly build on top of your code.
  • Don't use tabs. Configure your editor such that it uses two spaces instead.

Thanks! :-)

viennacl-dev's People

Contributors

albertz, albertzaharovits, bollig, cdeterman, ddemidov, franz-s, intelfx, josefweinbub, karlrupp, lvella, marty1885, matusi143, naibaf7, psanan, psyhtest, ptillet, robinchrist, rombur, sghkk, shanagr, smarthi, syst3mw0rm, tamuratak, tperry-amd, tsmithe, x-4321


viennacl-dev's Issues

Polish and document the generator's code

The generator is merged into the master branch, but the code can still be polished (some base classes to introduce, dirty code to remove, ...) and is yet to be documented.

Level scheduling for Block-ILU

Since synchronizations are cheap for block-ILU, it makes sense to extend the level scheduling logic from ILU to block-ILU.

Refactor SPAI

The current SPAI preconditioners should be refactored to support multiple compute backends. Also, better overlap of computations on the CPU and GPU should be provided.

Use base classes for reduced compilation times

Currently vector<>, vector_range<>, and vector_slice<> are entirely unrelated types, and similarly for matrix<>, matrix_range<>, and matrix_slice<>. To reduce compiler load and thus compilation times, each triplet can be unified via a common base class, e.g. vector_base and matrix_base. This will also help in reducing the number of necessary operator overloads.
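A minimal sketch of the intended hierarchy (the names vector_base and matrix_base follow the issue text; the member layout and the scale() helper are purely illustrative assumptions):

#include <cstddef>

// Sketch only: unify vector<>, vector_range<>, and vector_slice<> behind a
// common vector_base<> so operator overloads and kernels are instantiated
// once per scalar type instead of once per concrete vector type.
template <typename NumericT>
class vector_base
{
public:
  vector_base(std::size_t start, std::size_t stride, std::size_t size)
    : start_(start), stride_(stride), size_(size) {}

  std::size_t size() const { return size_; }

protected:
  std::size_t start_;   // offset into the underlying buffer
  std::size_t stride_;  // increment between entries (1 for plain vectors)
  std::size_t size_;    // number of logical entries
};

// Concrete types only add ownership/construction semantics:
template <typename NumericT>
class vector : public vector_base<NumericT>
{
public:
  explicit vector(std::size_t size) : vector_base<NumericT>(0, 1, size) {}
};

// Operations are then written once against the base class and automatically
// accept vectors, ranges, and slices:
template <typename NumericT>
void scale(vector_base<NumericT> & x, NumericT alpha);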

linalg::inner_prod issue

Thanks again for the quick resolution of the issue below. This one does not show up as a leak in the profiler, but it uses an extra 17 MB every 10k iterations. OS X 10.9.1, viennacl-dev at commit d9e55d4.

#include <iostream>
#include <viennacl/vector.hpp>
#include <viennacl/linalg/inner_prod.hpp>

void vcl_inner_prod_MemoryTest() {
    viennacl::vector<float> v1 = viennacl::scalar_vector<float>(42, 42.0f);
    float f;

    for (int ix = 0; ix < 10000000; ix++) {
        f = viennacl::linalg::inner_prod(v1, v1);
        viennacl::backend::finish();
        if (ix % 1000 == 0) std::cout << "Iter: " << ix << " result: " << f << std::endl;
    }
}

Sparse triangular solver

We would like to properly provide high-performance operations such as x = solve(A, y, upper_tag()); with sparse A. The level scheduling implemented for ILU certainly helps, but the factorization itself is still missing for a sparse direct solver. Ideas from sparse packages such as UMFPACK or SuperLU can be reused. Latency, however, can be a show-stopper.

Collision of direct solvers and gmres.hpp

Toby noted on viennacl-support (Feb 18, 20:28 CET) that including direct_solve.hpp and gmres.hpp in the same compilation unit causes ambiguity problems on GCC 4.8 (and possibly others). This needs to be fixed.

symbolic_vector and symbolic_matrix ?

In the doc, there is the following example:

// Instantiation of the symbolic variables
symbolic_vector<NumericT, 0> sX;
symbolic_matrix<NumericT, 1> sA;
symbolic_vector<NumericT, 2> sY;
symbolic_vector<NumericT, 3> sZ;

//Creation of the custom operation
custom_operation my_op( sX = prod(sA, inner_prod(sY, sY+sZ) * sY + sZ),
    "operation_name" );

However, I don't see those types defined anywhere in the code. So, is the documentation outdated or wrong?

(A bit OT: I want to do something like this. I also asked on SO about it.)

Possible memory leak

The following

#include <viennacl/matrix.hpp>
#include <viennacl/linalg/prod.hpp>

void VCLmemoryTest() {

    viennacl::matrix<float, viennacl::column_major> A;
    viennacl::matrix<float, viennacl::column_major> B = viennacl::identity_matrix<float>(1024);

    for (int i = 0; i < 1000000000; i++) {
        A = viennacl::linalg::prod(B, B);
    }
}

uses more and more memory. The output of the profiler is attached as a screenshot (2014-01-07, 8:32 am).

`prod` doesn't work with `matrix` and `compressed_matrix`

Code:

// c++ test_prod_sparse.cpp  -std=c++11

#include <viennacl/vector.hpp>
#include <viennacl/matrix.hpp>
#include <viennacl/compressed_matrix.hpp>
#include <viennacl/linalg/prod.hpp>
#include <viennacl/linalg/vector_operations.hpp>
#include <viennacl/linalg/matrix_operations.hpp>
#include <viennacl/scalar.hpp>
#include <viennacl/matrix_proxy.hpp>

int main() {
    using namespace viennacl;
    using namespace viennacl::linalg;

    auto a = matrix<float>(10,10);
    auto b = compressed_matrix<float>(10,10);
    auto v = prod(a, b);

    return 0;
}

Error:

az@azmacbookpro ~/P/N/NN-OCR> c++ test_prod_sparse.cpp  -std=c++11
test_prod_sparse.cpp:18:11: error: no matching function for call to 'prod'
        auto v = prod(a, b);
                 ^~~~
/usr/local/include/viennacl/linalg/prod.hpp:91:5: note: candidate template
      ignored: failed template argument deduction
    prod(std::vector< std::vector<T, A1>, A2 > const & matrix, VectorT c...
    ^
/usr/local/include/viennacl/linalg/prod.hpp:106:5: note: candidate template
      ignored: failed template argument deduction
    prod(std::vector< std::map<KEY, DATA, COMPARE, AMAP>, AVEC > const& ...
    ^
/usr/local/include/viennacl/linalg/prod.hpp:142:5: note: candidate template
      ignored: failed template argument deduction
    prod(viennacl::matrix_base<NumericT, F1> const & A,
    ^
/usr/local/include/viennacl/linalg/prod.hpp:158:5: note: candidate template
      ignored: failed template argument deduction
    prod(viennacl::matrix_base<NumericT, F1> const & A,
    ^
/usr/local/include/viennacl/linalg/prod.hpp:178:5: note: candidate template
      ignored: failed template argument deduction
    prod(viennacl::matrix_expression<const viennacl::matrix_base<NumericT, F1>,
    ^
/usr/local/include/viennacl/linalg/prod.hpp:201:5: note: candidate template
      ignored: failed template argument deduction
    prod(viennacl::matrix_expression<const viennacl::matrix_base<NumericT, F1>,
    ^
/usr/local/include/viennacl/linalg/prod.hpp:225:5: note: candidate template
      ignored: failed template argument deduction
    prod(viennacl::matrix_base<NumericT, F> const & matrix,
    ^
/usr/local/include/viennacl/linalg/prod.hpp:241:5: note: candidate template
      ignored: failed template argument deduction
    prod(viennacl::matrix_expression<const viennacl::matrix_base<NumericT, F>,
    ^
/usr/local/include/viennacl/linalg/prod.hpp:261:5: note: candidate template
      ignored: failed template argument deduction
    prod(const SparseMatrixType & mat,
    ^
/usr/local/include/viennacl/linalg/prod.hpp:275:5: note: candidate template
      ignored: failed template argument deduction
    prod(const SparseMatrixType & sp_mat,
    ^
/usr/local/include/viennacl/linalg/prod.hpp:292:5: note: candidate template
      ignored: failed template argument deduction
    prod(const SparseMatrixType & A,
    ^
/usr/local/include/viennacl/linalg/prod.hpp:310:5: note: candidate template
      ignored: failed template argument deduction
    prod(const StructuredMatrixType & mat,
    ^
1 error generated.

Implement the diag() operations

Implement a diag()-like MATLAB operator in the generator. diag(matrix) would return the vector of the diagonal elements, and diag(vector) a symbolic diagonal matrix whose diagonal elements are specified by the vector. This is useful, for example, for postprocessing an SVD and for computing the inverse or the matrix square root of the input.
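A hypothetical usage sketch of what the proposed operator could look like (the diag() calls are assumptions and commented out, since they do not exist yet):

#include <viennacl/matrix.hpp>
#include <viennacl/vector.hpp>

void diag_sketch()
{
    viennacl::matrix<float> A(100, 100);
    viennacl::vector<float> d(100);

    // Proposed, NOT yet available in ViennaCL:
    // d = viennacl::linalg::diag(A);   // extract the diagonal of A as a vector
    // A = viennacl::linalg::diag(d);   // build a diagonal matrix from a vector,
    //                                  // e.g. to form S^{-1/2} after an SVD
}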

Fix the BLAS3 autotuning procedure

While the BLAS3 autotuning procedure works well with the NVIDIA SDK, it crashes on the Intel MIC as well as on the latest version of the AMD APP SDK.

Fuse a scheduler with the kernel generator

For operations such as

x = y + z;
x = y - z;

there are currently two separate kernels launched, leading to unnecessary memory transfers. Expression templates are not enough to resolve this, so we need a micro-scheduler that fuses such operations and passes them on to a kernel generator facility.
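For illustration, a hand-fused host loop shows the intended effect of such fusion (a sketch of the idea only, not the actual scheduler or generator interface):

#include <cstddef>
#include <vector>

// Unfused: two separate passes (kernels), y and z are read from memory twice.
void unfused(std::vector<float> &x1, std::vector<float> &x2,
             const std::vector<float> &y, const std::vector<float> &z)
{
    for (std::size_t i = 0; i < y.size(); ++i) x1[i] = y[i] + z[i];
    for (std::size_t i = 0; i < y.size(); ++i) x2[i] = y[i] - z[i];
}

// Fused: one pass (kernel), y and z are read only once per entry.
void fused(std::vector<float> &x1, std::vector<float> &x2,
           const std::vector<float> &y, const std::vector<float> &z)
{
    for (std::size_t i = 0; i < y.size(); ++i)
    {
        const float yi = y[i], zi = z[i];
        x1[i] = yi + zi;
        x2[i] = yi - zi;
    }
}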

including `viennacl/generator/generate.hpp` produces errors

When I do

#include <viennacl/generator/generate.hpp>

I get the following error:

In file included from main.cpp:12:
In file included from /usr/local/include/viennacl/generator/generate.hpp:33:
In file included from /usr/local/include/viennacl/generator/profiles.hpp:35:
In file included from /usr/local/include/viennacl/generator/profile_base.hpp:30:
In file included from /usr/local/include/viennacl/ocl/kernel.hpp:32:
In file included from /usr/local/include/viennacl/ocl/backend.hpp:26:
/usr/local/include/viennacl/ocl/context.hpp:627:29: error: variable has
      incomplete type 'viennacl::ocl::kernel'
      viennacl::ocl::kernel temp(kernel_handle, *this, *p_context_, kern...
                            ^
/usr/local/include/viennacl/ocl/forwards.h:44:11: note: forward declaration of
      'viennacl::ocl::kernel'
    class kernel;
          ^
In file included from main.cpp:12:
In file included from /usr/local/include/viennacl/generator/generate.hpp:33:
In file included from /usr/local/include/viennacl/generator/profiles.hpp:35:
In file included from /usr/local/include/viennacl/generator/profile_base.hpp:30:
In file included from /usr/local/include/viennacl/ocl/kernel.hpp:32:
In file included from /usr/local/include/viennacl/ocl/backend.hpp:26:
/usr/local/include/viennacl/ocl/context.hpp:640:15: error: member access into
      incomplete type 'viennacl::ocl::kernel'
        if (it->name() == name)
              ^
/usr/local/include/viennacl/ocl/forwards.h:44:11: note: forward declaration of
      'viennacl::ocl::kernel'
    class kernel;
          ^
In file included from main.cpp:12:
In file included from /usr/local/include/viennacl/generator/generate.hpp:33:
In file included from /usr/local/include/viennacl/generator/profiles.hpp:35:
In file included from /usr/local/include/viennacl/generator/profile_base.hpp:30:
In file included from /usr/local/include/viennacl/ocl/kernel.hpp:32:
In file included from /usr/local/include/viennacl/ocl/backend.hpp:26:
/usr/local/include/viennacl/ocl/context.hpp:650:32: error: incomplete type
      'viennacl::ocl::kernel' named in nested name specifier
    inline void viennacl::ocl::kernel::set_work_size_defaults()
                ~~~~~~~~~~~~~~~^~~~~~~~
/usr/local/include/viennacl/ocl/forwards.h:44:11: note: forward declaration of
      'viennacl::ocl::kernel'
    class kernel;
          ^

Update pugiXML

The version of pugiXML shipped with ViennaCL 1.4.2 does not build on Mac OS, so an update is required:
#43

Tune block-ILU setup stage

The copy from GPU to CPU currently goes through two format conversions, even though a direct dump is possible. Also, OpenMP acceleration of the factor transposition is possible.

Properly check for self-assignments

Special cases like x = prod(A, x) require additional attention. There are checks in ViennaCL for this case, but they do not provide unified behavior: some create a temporary vector (good), others get stuck in assert()s (not so good). Some checks are also overly restrictive when ranges are used.
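A minimal sketch of the unified behavior suggested here, assuming a simplified aliasing check (ViennaCL would compare the underlying memory handles rather than object addresses):

#include <viennacl/matrix.hpp>
#include <viennacl/vector.hpp>
#include <viennacl/linalg/prod.hpp>

// Sketch: instead of asserting, fall back to a temporary whenever the result
// vector aliases the right-hand side operand of the matrix-vector product.
void safe_prod(viennacl::vector<float> &x,
               viennacl::matrix<float> const &A,
               viennacl::vector<float> const &y)
{
    if (&x == &y)  // simplified aliasing check
    {
        viennacl::vector<float> tmp(x.size());
        tmp = viennacl::linalg::prod(A, y);
        x = tmp;
    }
    else
        x = viennacl::linalg::prod(A, y);
}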

Fix matrix parameter tuner

Due to an internal change in the kernels used, the matrix parameter tuner is currently not working. This issue interacts with the ongoing kernel generator integration, so there may be a more general replacement in 1.5.0.

Assertion failure when copying empty matrix

Since ViennaCL 1.4.2 (the bug is not present in 1.4.1), the simple code

#include <viennacl/matrix.hpp>

int main() {
    viennacl::matrix<double> A;
    viennacl::matrix<double> B(A);
}

fails with

./viennacl/matrix.hpp:662: void viennacl::matrix_base<SCALARTYPE, F, SizeType, DistanceType>::resize(viennacl::matrix_base<SCALARTYPE, F, SizeType, DistanceType>::size_type, viennacl::matrix_base<SCALARTYPE, F, SizeType, DistanceType>::size_type, bool) [with SCALARTYPE = double; F = viennacl::row_major; SizeType = long unsigned int; DistanceType = long int; viennacl::matrix_base<SCALARTYPE, F, SizeType, DistanceType>::size_type = long unsigned int]: Assertion `(rows > 0 && columns > 0) && bool("Check failed in matrix::resize(): Number of rows and columns must be positive!")' failed.

Add BLAS1,2,3 tuning profiles for the Intel MIC platform

Now that the tuning procedures are (supposedly) bug-free, it would be nice to have some default profiles to add to the built-in database for the Intel MIC platform, simply by running the provided blas{1,2,3}_tuning targets (and seeing whether they crash!).

Dense Solver with Pivoting

The current LU factorization shows atrocious performance. With fast GEMM for submatrices available, we should be able to come up with a portable high-performance implementation at a strikingly high level of abstraction.

OpenMP: Use signed integers only

OpenMP up to version 2.5 only specifies parallel for loops for signed integer types, so std::size_t cannot be used as the loop variable here.
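A sketch of the required pattern, with the unsigned size cast once to a signed loop index (long is used here for illustration):

#include <cstddef>

void scale_openmp(double *x, double alpha, std::size_t n)
{
    // OpenMP <= 2.5 requires a signed loop variable, so convert the size once:
    long signed_n = static_cast<long>(n);

#ifdef _OPENMP
    #pragma omp parallel for
#endif
    for (long i = 0; i < signed_n; ++i)
        x[i] *= alpha;
}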

Fix GEMM for A^T * A, A * A^T, A * A... operation

In this case, A and A^T have different semantics in the kernel, but they refer to the same handle and are therefore considered equal by the generator... I am really not sure how to handle this. Plus, I'm pretty sure A * A^T and A * A can be implemented using a better kernel... Should I just forbid the handles of LHS and RHS to be the same in that case (and in a later version dispatch to different kernels)? I will try to find a way to handle this, but the problem seems to lie deep down in the generator's structure... I had really not anticipated that the same handle could refer to two different ways of accessing memory in the same kernel!

entry_proxy.hpp: `explicit entry_proxy(unsigned int mem_offset...`

This leads to the warning:

/usr/local/include/viennacl/vector.hpp:700:47: Implicit conversion loses integer precision: 'unsigned long' to 'unsigned int'

size_type should be used instead.

Same thing for explicit const_entry_proxy(unsigned int mem_offset, ...:

/usr/local/include/viennacl/matrix.hpp:565:46: Implicit conversion loses integer precision: 'vcl_size_t' (aka 'unsigned long') to 'unsigned int'

It seems that in many places, std::size_t is used directly instead of ...::size_type (or vcl_size_t). Is that on purpose? If so, it seems odd that vcl_size_t exists at all.

Also, in sparse_matrix_operations.hpp, in prod_impl, is the following intended:

unsigned int const * coords = detail::extract_raw_pointer<unsigned int>(mat.handle2());

Or should it use vcl_size_t instead?

2 threads with different contexts

Hello,
My program needs to have two threads and each thread needs to have its own context.
(The cg1.4xlarge instance has dual GPUs.) At the moment, when I run one thread with a context containing a GPU, my program runs fine. The program uses custom kernels.

When I run two threads it crashes with
terminate called after throwing an instance of 'viennacl::ocl::invalid_mem_object'
what(): ViennaCL: FATAL ERROR: CL_INVALID_MEM_OBJECT.

which I am pretty sure is because I am not setting up the classes to have completely separate contexts, programs, queues, etc.

Could you please indicate how best to implement the toy program below, which is mainly the multithreaded example you provided (which runs fine)?

(1) How do I add the program to the context? ctx.opencl_context() is const. Rather than a viennacl::context, should I use a viennacl::ocl::context?
(2) When I call viennacl::ocl::enqueue, do I have to specify the queue for that device in the context?

I have been getting great results with ViennaCL, just having a bit of trouble here.

template <typename NumericT>
class worker
{
public:
worker(std::size_t tid) : thread_id_(tid) {}

void operator()()
{
    std::size_t N = 6;

    viennacl::context ctx(viennacl::ocl::get_context(static_cast<long>(thread_id_)));
    viennacl::vector<NumericT> u = viennacl::scalar_vector<NumericT>(N, NumericT(1) * NumericT(thread_id_ + 1), ctx);
    viennacl::vector<NumericT> v = viennacl::scalar_vector<NumericT>(N, NumericT(2) * NumericT(thread_id_ + 1), ctx);
    viennacl::matrix<NumericT> A = viennacl::linalg::outer_prod(u, v);
    viennacl::vector<NumericT> x(u);

    u += v;
    NumericT result = viennacl::linalg::norm_2(u);

    std::stringstream ss;
    ss << "Result of thread " << thread_id_ << " on device " << viennacl::ocl::get_context(static_cast<long>(thread_id_)).devices()[0].name() << ": " << result << std::endl;
    ss << "  A: " << A << std::endl;
    ss << "  x: " << x << std::endl;

    message_ = ss.str();
}

std::string message() const { return message_; }

private:
std::string message_;
std::size_t thread_id_;
};

does not build on MacOSX

Scanning dependencies of target matrix_col_int-test-opencl
/Applications/Xcode.app/Contents/Developer/usr/bin/make -f tests/CMakeFiles/matrix_col_int-test-opencl.dir/build.make tests/CMakeFiles/matrix_col_int-test-opencl.dir/build
/usr/local/Cellar/cmake/2.8.12/bin/cmake -E cmake_progress_report /Users/az/Programmierung/viennacl-dev/build/CMakeFiles 44
[ 62%] Building CXX object tests/CMakeFiles/matrix_col_int-test-opencl.dir/src/matrix_col_int.cpp.o
cd /Users/az/Programmierung/viennacl-dev/build/tests && /usr/bin/c++    -I/Users/az/Programmierung/viennacl-dev -I/opt/local/include -I/usr/local/include -I/Users/az/Programmierung/viennacl-dev/external -I/Users/az/Programmierung/viennacl-dev/libviennacl/include    -DVIENNACL_WITH_OPENCL -o CMakeFiles/matrix_col_int-test-opencl.dir/src/matrix_col_int.cpp.o -c /Users/az/Programmierung/viennacl-dev/tests/src/matrix_col_int.cpp
In file included from /Users/az/Programmierung/viennacl-dev/tests/src/matrix_col_int.cpp:18:
In file included from /Users/az/Programmierung/viennacl-dev/tests/src/matrix_int.hpp:31:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/scalar.hpp:28:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/backend/memory.hpp:28:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/backend/mem_handle.hpp:32:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/backend/opencl.hpp:28:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/ocl/backend.hpp:26:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/ocl/context.hpp:36:
/Users/az/Programmierung/viennacl-dev/viennacl/ocl/kernel.hpp:232:20: error: 
      member reference base type 'const long' is not a structure or union
        assert(&val.handle().opencl_handle().context() == &handle_.conte...
                ~~~^~~~~~~
/usr/include/assert.h:93:25: note: expanded from macro 'assert'
    (__builtin_expect(!(e), 0) ? __assert_rtn(__func__, __FILE__, __LINE...
                        ^
/Users/az/Programmierung/viennacl-dev/viennacl/ocl/kernel.hpp:348:118: note: in
      instantiation of function template specialization
      'viennacl::ocl::kernel::arg<long>' requested here
  ...arg(4, t4); arg(5, t5); arg(6, t6); arg(7, t7); arg(8, t8); arg(9, t9);
                                                                 ^
/Users/az/Programmierung/viennacl-dev/viennacl/linalg/opencl/matrix_operations.hpp:222:32: note: 
      in instantiation of function template specialization
      'viennacl::ocl::kernel::operator()<viennacl::ocl::handle<_cl_mem *>,
      unsigned int, unsigned int, unsigned int, unsigned int, unsigned int,
      unsigned int, unsigned int, unsigned int, long>' requested here
        viennacl::ocl::enqueue(k(viennacl::traits::opencl_handle(mat),
                               ^
/Users/az/Programmierung/viennacl-dev/viennacl/linalg/matrix_operations.hpp:161:11: note: 
      in instantiation of function template specialization
      'viennacl::linalg::opencl::matrix_assign<long, viennacl::column_major>'
      requested here
          viennacl::linalg::opencl::matrix_assign(mat, s, clear);
          ^
/Users/az/Programmierung/viennacl-dev/viennacl/matrix.hpp:632:9: note: in
      instantiation of function template specialization
      'viennacl::linalg::matrix_assign<long, viennacl::column_major>' requested
      here
        viennacl::linalg::matrix_assign(*this, SCALARTYPE(0), true);
        ^
/Users/az/Programmierung/viennacl-dev/viennacl/matrix.hpp:261:11: note: in
      instantiation of member function 'viennacl::matrix_base<long,
      viennacl::column_major, unsigned long, long>::clear' requested here
          clear();
          ^
/Users/az/Programmierung/viennacl-dev/viennacl/matrix.hpp:757:105: note: in
      instantiation of member function 'viennacl::matrix_base<long,
      viennacl::column_major, unsigned long, long>::matrix_base' requested here
  ...columns, viennacl::context ctx = viennacl::context()) : base_type(rows, ...
                                                             ^
/Users/az/Programmierung/viennacl-dev/tests/src/matrix_int.hpp:675:19: note: in
      instantiation of member function 'viennacl::matrix<long,
      viennacl::column_major, 1>::matrix' requested here
    VCLMatrixType vcl_A_full(4 * dim_rows, 4 * dim_cols);
                  ^
/Users/az/Programmierung/viennacl-dev/tests/src/matrix_col_int.cpp:42:7: note: 
      in instantiation of function template specialization
      'run_test<viennacl::column_major, long>' requested here
  if (run_test<viennacl::column_major, long>(epsilon) != EXIT_SUCCESS)
      ^
In file included from /Users/az/Programmierung/viennacl-dev/tests/src/matrix_col_int.cpp:18:
In file included from /Users/az/Programmierung/viennacl-dev/tests/src/matrix_int.hpp:31:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/scalar.hpp:28:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/backend/memory.hpp:28:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/backend/mem_handle.hpp:32:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/backend/opencl.hpp:28:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/ocl/backend.hpp:26:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/ocl/context.hpp:36:
/Users/az/Programmierung/viennacl-dev/viennacl/ocl/kernel.hpp:234:26: error: 
      member reference base type 'const long' is not a structure or union
        cl_mem temp = val.handle().opencl_handle().get();
                      ~~~^~~~~~~
2 errors generated.
make[2]: *** [tests/CMakeFiles/matrix_col_int-test-opencl.dir/src/matrix_col_int.cpp.o] Error 1
make[1]: *** [tests/CMakeFiles/matrix_col_int-test-opencl.dir/all] Error 2
make: *** [all] Error 2

Refactor AMG

AMG is fairly important for users. It needs to be partly rewritten for higher efficiency and to support multiple compute backends.

Unified behavior for copy()

There are a few inconsistencies when using viennacl::copy(). Sometimes an empty object is resized accordingly, sometimes it is not. A single unified behavior is desirable.
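For illustration, one pattern that sidesteps the inconsistency today is to size the target explicitly before copying (a sketch, not a statement about which overloads resize and which do not):

#include <vector>
#include <viennacl/vector.hpp>

void copy_example()
{
    std::vector<double> host(100, 1.0);
    viennacl::vector<double> device;

    device.resize(host.size());  // explicit resize, so no overload has to do it
    viennacl::copy(host.begin(), host.end(), device.begin());
}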

Use of Doxygen 1.8 or above

Doxygen 1.8 allows for much better HTML docs (search box, navigation panel on the left). It lets us improve the accessibility of the documentation and refer directly to HTML pages rather than pointing at page numbers in the PDF. The manual can be included in the HTML docs directly, allowing better cross-links.

The current PDF manual could then be extracted from the Doxygen sources, but this presumably requires some extractor script.

Support for integer types

We should support viennacl::vector<int> and the like.
Pitfalls:

  • Operations on vectors of different types, e.g. adding a vector of one integer type to a vector of another
  • Result type of norm_2(x) for an integer vector?
  • Conversion between host types and OpenCL types (cl_int and int might have different binary formats). Fortunately, this is already addressed by the new viennacl::backend::typesafe_host_array<> implementation.

Support for complex numbers

Quite a number of applications rely on complex arithmetic. The difficulty is the lack of native support for complex_t in OpenCL, so all operations need to be emulated. Addition and subtraction are easy, but multiplication and division are tricky. Emulation of sqrt() and the like is also required.
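For reference, a host-side sketch of the emulation every kernel would have to replicate, using a plain two-component struct in place of a native complex type (division uses the usual conjugate trick):

#include <cmath>

// Emulated complex number as it would be stored in a buffer (e.g. as float2).
struct complex_f { float re, im; };

complex_f cmul(complex_f a, complex_f b)   // (a.re + i*a.im) * (b.re + i*b.im)
{
    complex_f r;
    r.re = a.re * b.re - a.im * b.im;
    r.im = a.re * b.im + a.im * b.re;
    return r;
}

complex_f cdiv(complex_f a, complex_f b)   // multiply by conj(b), divide by |b|^2
{
    float denom = b.re * b.re + b.im * b.im;
    complex_f r;
    r.re = (a.re * b.re + a.im * b.im) / denom;
    r.im = (a.im * b.re - a.re * b.im) / denom;
    return r;
}

complex_f csqrt(complex_f a)               // principal square root
{
    float mag = std::sqrt(a.re * a.re + a.im * a.im);
    complex_f s;
    s.re = std::sqrt((mag + a.re) / 2.0f);
    s.im = (a.im < 0.0f ? -1.0f : 1.0f) * std::sqrt((mag - a.re) / 2.0f);
    return s;
}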

Scaled Rank 1 update fails to compile for viennacl::scalar<> scaling factors

The following doesn't compile due to a missing template specialization

#include "viennacl/matrix.hpp"
#include "viennacl/vector.hpp"
int main(){
        viennacl::matrix<double> A(10,10);
    viennacl::vector<double> x(10);
    viennacl::scalar<double> alpha(2);
    A += alpha*viennacl::linalg::outer_prod(x,x);
}

This can be trivially solved by adding the following at viennacl/tools/tools.hpp:306:

template <typename ScalarType, typename T>
struct MATRIX_EXTRACTOR_IMPL<viennacl::matrix_expression<const viennacl::vector_base<ScalarType>, T, op_prod>,
                             viennacl::scalar<ScalarType> >
{
  typedef viennacl::matrix<ScalarType, viennacl::row_major>   ResultType;
};

However, this seems to overlap slightly with the CPU scalar case, and it is not impossible that several other parts of the code suffer from the same issue. Is this fix reasonable?

epsilon too small

[ 73%] Building CXX object tests/CMakeFiles/matrix_vector_int-test-opencl.dir/src/matrix_vector_int.cpp.o
cd /Users/az/Programmierung/viennacl-dev/build/tests && /usr/bin/c++    -I/Users/az/Programmierung/viennacl-dev -I/opt/local/include -I/usr/local/include -I/Users/az/Programmierung/viennacl-dev/external -I/Users/az/Programmierung/viennacl-dev/libviennacl/include    -DVIENNACL_WITH_OPENCL -o CMakeFiles/matrix_vector_int-test-opencl.dir/src/matrix_vector_int.cpp.o -c /Users/az/Programmierung/viennacl-dev/tests/src/matrix_vector_int.cpp
/Users/az/Programmierung/viennacl-dev/tests/src/matrix_vector_int.cpp:821:29: warning: implicit conversion from
      'double' to 'NumericT' (aka 'long') changes value from 9.999999999999999E-12 to 0 [-Wliteral-conversion]
         NumericT epsilon = 1.0E-11;
                  ~~~~~~~   ^~~~~~~
/Users/az/Programmierung/viennacl-dev/tests/src/matrix_vector_int.cpp:837:29: warning: implicit conversion from
      'double' to 'NumericT' (aka 'long') changes value from 9.999999999999999E-12 to 0 [-Wliteral-conversion]
         NumericT epsilon = 1.0E-11;
                  ~~~~~~~   ^~~~~~~
2 warnings generated.

This was probably not intended.

viennacl::linalg::element_prod() does not work in v1.4.2

Hello Karl, Philippe,

The following example produces compile errors (g++ -o vcl vcl.cpp) with v1.4.2:

#include <viennacl/vector.hpp>
#include <viennacl/linalg/prod.hpp>

int main() {
    viennacl::vector<double> x(100);
    viennacl::vector<double> y(100);
    viennacl::vector<double> z(100);

    x += viennacl::linalg::element_prod(y, z);
}

It does compile with 0e76809 or the generator_multi-devices branch, however. If the change is intentional, how can one achieve the same thing with 1.4.2?

Best regards,
Denis

viennacl::linalg::prod possibly not using context of matrices

Running viennacl-dev on an AWS cg1.4xlarge instance.

When I run the program below with the viennacl::ocl::switch_context(thread_id_); statement, it runs fine. When I comment out that statement, I get the following output:

0x7f4b70ac2d70
0x7f4b70ac2df0
9,9
terminate called after throwing an instance of 'viennacl::ocl::invalid_mem_object'
what(): ViennaCL: FATAL ERROR: CL_INVALID_MEM_OBJECT.
If you think that this is a bug in ViennaCL, please report it at [email protected] and supply at least the following information:

  • Operating System
  • Which OpenCL implementation (AMD, NVIDIA, etc.)
  • ViennaCL version
    Many thanks in advance!
    Aborted (core dumped)

It seems like viennacl::linalg::prod is not picking up the context from its operands.

//
// main.cpp
// NN-dual-gpu-test
//

#ifndef VIENNACL_WITH_OPENCL
#define VIENNACL_WITH_OPENCL
#endif

// include necessary system headers
#include <iostream>

// include basic scalar and vector types of ViennaCL
#include "viennacl/scalar.hpp"
#include "viennacl/vector.hpp"
#include "viennacl/matrix.hpp"
#include "viennacl/context.hpp"
#include "viennacl/linalg/prod.hpp"
#include "viennacl/ocl/device.hpp"
#include "viennacl/ocl/platform.hpp"
#include "viennacl/ocl/backend.hpp"

// include the generic inner product functions of ViennaCL
#include "viennacl/linalg/norm_2.hpp"

#include <thread>

using namespace std;

template <typename NumericT>
class worker
{
public:
worker(std::size_t tid) : thread_id_(tid) {}

void operator()()
{

    viennacl::context ctx(const_cast<viennacl::ocl::context&>(viennacl::ocl::get_context(static_cast<long>(thread_id_))));

    unsigned int rows = 9;
    unsigned int cols = 9;

    viennacl::matrix<float,viennacl::column_major> tmat(rows, cols,ctx);
    viennacl::matrix<float,viennacl::column_major> amat(rows, cols,ctx);
    viennacl::matrix<float,viennacl::column_major> rmat(rows, cols,ctx);

    for (int ix = 0; ix <rows ; ++ix) {
        for (int iy = 0; iy < cols; ++iy) {
            tmat(ix,iy) = ix+iy;
            amat(ix,iy) = ix+iy;

        }
    }

    //viennacl::ocl::switch_context(thread_id_);

    rmat = viennacl::linalg::prod(tmat,amat);
    viennacl::backend::finish();

    cout << rmat<<endl;

}

std::string message() const { return message_; }

private:
std::string message_;
std::size_t thread_id_;
};

int main()
{
//Change this type definition to double if your gpu supports that
typedef float ScalarType;

if (viennacl::ocl::get_platforms().size() == 0)
{
    std::cerr << "Error: No platform found!" << std::endl;
    return EXIT_FAILURE;
}

//
// Part 1: Setup first device for first context, second device for second context:
//
viennacl::ocl::platform pf = viennacl::ocl::get_platforms()[0];
std::vector<viennacl::ocl::device> const & devices = pf.devices();

// Set first device to first context:
viennacl::ocl::setup_context(0, devices[0]);

// Set second device for second context (use the same device for the second context if only one device available):
if (devices.size() > 1)
    viennacl::ocl::setup_context(1, devices[1]);
else
    viennacl::ocl::setup_context(1, devices[0]);

viennacl::backend::finish();


//cout << devices[0].full_info()<<endl;
cout << devices[0].id()<<endl;

//cout << devices[1].full_info()<<endl;
cout << devices[1].id()<<endl;

//
// Part 2: Now let two threads operate on two GPUs in parallel
//

worker<ScalarType> work_functor0(0);
worker<ScalarType> work_functor1(1);

std::thread worker_thread_0(work_functor0);
std::thread worker_thread_1(work_functor1);

worker_thread_0.join();
worker_thread_1.join();

std::cout << "!!!! TUTORIAL COMPLETED SUCCESSFULLY !!!!" << std::endl;

return EXIT_SUCCESS;

}

Create different programs for different paddings and dispatch at runtime

I am not exactly sure how to do this in a standards-compliant way (considering that different paddings require different use of local memory, etc.). I am thinking of a super-conservative profile (when no autotuning is used), and of modifying the autotuning procedure to output the best profile under the constraint local_mem < 16 kB, the best profile for 32x32 padding, and the overall best profile. Is that OK?

scalar with no initial value is not allocated?

I just noticed the following code fails with custom kernels.

#include <iostream>

#include <viennacl/ocl/backend.hpp>
#include <viennacl/scalar.hpp>

using namespace std;

int main(){

    try{
        viennacl::scalar<float> a;

        string prog =
                "__kernel void\n"
                "set(__global float *ret)\n"
                "{\n"
                "   *ret = 1;\n"
                "}";

        viennacl::ocl::program& ref = viennacl::ocl::current_context().add_program(prog, "prog");
        ref.add_kernel("set");

        viennacl::ocl::kernel& set = viennacl::ocl::get_kernel("prog", "set");
        viennacl::ocl::enqueue(set(a));

        cout << a << endl;
    } catch (const exception& e) {
        cerr << e.what() << endl;
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

It generates the following error message:

Assertion failed: (val_.get_active_handle_id() != viennacl::MEMORY_NOT_INITIALIZED && bool("Scalar not initialized, cannot read!")), function operator float, file ../../soft/ViennaCL-1.4.2/viennacl/scalar.hpp, line 278.
The program has unexpectedly finished.

Then I noticed that, inside scalar.hpp, the empty constructor does not actually allocate memory for the object (although the comment says it does!). Changing the code to the following (i.e. including the memory allocation) seems to fix my problem:

/** @brief Allocates the memory for the scalar, but does not set it to zero. */
scalar()
{
    viennacl::backend::memory_create(val_, sizeof(SCALARTYPE));
}  //No initialization yet in order to allow for global variables

I am not sure whether I am doing something wrong or whether this is actually a bug.

More sparse matrix formats

In addition to CSR, COO, ELL and HYB, we should add at least DIA. Also, further improvements can be obtained with custom (specialized) formats.
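For reference, a sketch of what DIA (diagonal) storage looks like (field names are illustrative, not the eventual ViennaCL interface):

#include <cstddef>
#include <vector>

// DIA format: every populated (off-)diagonal is stored as a dense column of
// length num_rows; well suited for banded matrices.
struct dia_matrix_sketch
{
    std::size_t num_rows;
    std::size_t num_cols;
    std::vector<int>    offsets;  // offset of each stored diagonal (0 = main, +1 = first super-diagonal, ...)
    std::vector<double> values;   // column-major: values[d * num_rows + i] holds A(i, i + offsets[d])
};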

Deal with flat copies of vector_base, matrix_base.

The following code creates surprising results if VectorType is vector_base rather than vector:

VectorType result = rhs;
viennacl::traits::clear(result);

Similar issues may arise with matrix_base. A good way of dealing with this is required: disallow the copy constructor of *_base? Always create a deep copy?

Improve performance of dense triangular solves

Currently only a single work group is used because synchronizations are required. For matrices above ~1000x1000 it makes sense to use panel-like updates (a rough sketch follows below), i.e.

  • solve a small triangular block on the diagonal using a single work group
  • batch-update the remaining system using high-performance matrix-vector multiplications
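A host-side sketch of the panel idea for a lower-triangular system (block size and row-major indexing are illustrative assumptions; on the GPU the small diagonal solve would map to a single work group and the update to a high-throughput matrix-vector kernel):

#include <cstddef>
#include <vector>

// Solve L * x = b for a dense lower-triangular L (row-major, n x n).
// x contains b on input and the solution on output.
void blocked_lower_solve(std::vector<double> const &L, std::vector<double> &x,
                         std::size_t n, std::size_t block = 64)
{
    for (std::size_t k = 0; k < n; k += block)
    {
        std::size_t kend = (k + block < n) ? (k + block) : n;

        // 1) Small triangular solve on the diagonal block (single work group on the GPU).
        for (std::size_t i = k; i < kend; ++i)
        {
            double s = x[i];
            for (std::size_t j = k; j < i; ++j)
                s -= L[i * n + j] * x[j];
            x[i] = s / L[i * n + i];
        }

        // 2) Batch-update the remaining rows with a matrix-vector product.
        for (std::size_t i = kend; i < n; ++i)
        {
            double s = 0.0;
            for (std::size_t j = k; j < kend; ++j)
                s += L[i * n + j] * x[j];
            x[i] -= s;
        }
    }
}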
