viennacl / viennacl-dev
Developer repository for ViennaCL. Visit http://viennacl.sourceforge.net/ for the latest releases.
License: Other
I am not exactly sure how to do that in a standards-compliant way (considering that different paddings require different uses of local memory, etc.). I am thinking about a super-conservative profile (when no autotuning is used), and modifying the autotuning procedure to output the best profile under the constraint local_mem < 16kB, the best profile for a 32x32 padding, and the overall best profile. Is that ok?
Hello Karl, Philippe,
The following example produces compile errors (g++ -o vcl vcl.cpp) with v1.4.2:
#include <viennacl/vector.hpp>
#include <viennacl/linalg/prod.hpp>
int main() {
  viennacl::vector<double> x(100);
  viennacl::vector<double> y(100);
  viennacl::vector<double> z(100);
  x += viennacl::linalg::element_prod(y, z);
}
It does compile with commit 0e76809 and on the generator_multi-devices branch. If the change is intentional, how can one do the same thing with 1.4.2?
Best regards,
Denis
In the doc, there is the following example:
// Instantiation of the symbolic variables
symbolic_vector<NumericT, 0> sX;
symbolic_matrix<NumericT, 1> sA;
symbolic_vector<NumericT, 2> sY;
symbolic_vector<NumericT, 3> sZ;
//Creation of the custom operation
custom_operation my_op( sX = prod(sA, inner_prod(sY, sY+sZ) * sY + sZ),
"operation_name" );
However, I don't see those types defined anywhere in the code. So, is the doc outdated or wrong?
(A bit OT: I want to do something like this. I also asked on SO about it.)
Copying from GPU to CPU currently goes via two format conversions, even though a direct dump is possible. Also, OpenMP acceleration of the factor transposition is possible.
Denis provided a bit of code already. Merging and examples required.
Allows for much better HTML docs (search box, navigation panel on the left). Allows us to improve the accessibility of the documentation and to refer directly to HTML pages rather than pointing at page numbers in the PDF. The manual can be included in the HTML docs directly, allowing better cross-links.
The current PDF manual can then be extracted from Doxygen, but this presumably requires some extractor script.
Scanning dependencies of target matrix_col_int-test-opencl
/Applications/Xcode.app/Contents/Developer/usr/bin/make -f tests/CMakeFiles/matrix_col_int-test-opencl.dir/build.make tests/CMakeFiles/matrix_col_int-test-opencl.dir/build
/usr/local/Cellar/cmake/2.8.12/bin/cmake -E cmake_progress_report /Users/az/Programmierung/viennacl-dev/build/CMakeFiles 44
[ 62%] Building CXX object tests/CMakeFiles/matrix_col_int-test-opencl.dir/src/matrix_col_int.cpp.o
cd /Users/az/Programmierung/viennacl-dev/build/tests && /usr/bin/c++ -I/Users/az/Programmierung/viennacl-dev -I/opt/local/include -I/usr/local/include -I/Users/az/Programmierung/viennacl-dev/external -I/Users/az/Programmierung/viennacl-dev/libviennacl/include -DVIENNACL_WITH_OPENCL -o CMakeFiles/matrix_col_int-test-opencl.dir/src/matrix_col_int.cpp.o -c /Users/az/Programmierung/viennacl-dev/tests/src/matrix_col_int.cpp
In file included from /Users/az/Programmierung/viennacl-dev/tests/src/matrix_col_int.cpp:18:
In file included from /Users/az/Programmierung/viennacl-dev/tests/src/matrix_int.hpp:31:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/scalar.hpp:28:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/backend/memory.hpp:28:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/backend/mem_handle.hpp:32:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/backend/opencl.hpp:28:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/ocl/backend.hpp:26:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/ocl/context.hpp:36:
/Users/az/Programmierung/viennacl-dev/viennacl/ocl/kernel.hpp:232:20: error:
member reference base type 'const long' is not a structure or union
assert(&val.handle().opencl_handle().context() == &handle_.conte...
~~~^~~~~~~
/usr/include/assert.h:93:25: note: expanded from macro 'assert'
(__builtin_expect(!(e), 0) ? __assert_rtn(__func__, __FILE__, __LINE...
^
/Users/az/Programmierung/viennacl-dev/viennacl/ocl/kernel.hpp:348:118: note: in
instantiation of function template specialization
'viennacl::ocl::kernel::arg<long>' requested here
...arg(4, t4); arg(5, t5); arg(6, t6); arg(7, t7); arg(8, t8); arg(9, t9);
^
/Users/az/Programmierung/viennacl-dev/viennacl/linalg/opencl/matrix_operations.hpp:222:32: note:
in instantiation of function template specialization
'viennacl::ocl::kernel::operator()<viennacl::ocl::handle<_cl_mem *>,
unsigned int, unsigned int, unsigned int, unsigned int, unsigned int,
unsigned int, unsigned int, unsigned int, long>' requested here
viennacl::ocl::enqueue(k(viennacl::traits::opencl_handle(mat),
^
/Users/az/Programmierung/viennacl-dev/viennacl/linalg/matrix_operations.hpp:161:11: note:
in instantiation of function template specialization
'viennacl::linalg::opencl::matrix_assign<long, viennacl::column_major>'
requested here
viennacl::linalg::opencl::matrix_assign(mat, s, clear);
^
/Users/az/Programmierung/viennacl-dev/viennacl/matrix.hpp:632:9: note: in
instantiation of function template specialization
'viennacl::linalg::matrix_assign<long, viennacl::column_major>' requested
here
viennacl::linalg::matrix_assign(*this, SCALARTYPE(0), true);
^
/Users/az/Programmierung/viennacl-dev/viennacl/matrix.hpp:261:11: note: in
instantiation of member function 'viennacl::matrix_base<long,
viennacl::column_major, unsigned long, long>::clear' requested here
clear();
^
/Users/az/Programmierung/viennacl-dev/viennacl/matrix.hpp:757:105: note: in
instantiation of member function 'viennacl::matrix_base<long,
viennacl::column_major, unsigned long, long>::matrix_base' requested here
...columns, viennacl::context ctx = viennacl::context()) : base_type(rows, ...
^
/Users/az/Programmierung/viennacl-dev/tests/src/matrix_int.hpp:675:19: note: in
instantiation of member function 'viennacl::matrix<long,
viennacl::column_major, 1>::matrix' requested here
VCLMatrixType vcl_A_full(4 * dim_rows, 4 * dim_cols);
^
/Users/az/Programmierung/viennacl-dev/tests/src/matrix_col_int.cpp:42:7: note:
in instantiation of function template specialization
'run_test<viennacl::column_major, long>' requested here
if (run_test<viennacl::column_major, long>(epsilon) != EXIT_SUCCESS)
^
In file included from /Users/az/Programmierung/viennacl-dev/tests/src/matrix_col_int.cpp:18:
In file included from /Users/az/Programmierung/viennacl-dev/tests/src/matrix_int.hpp:31:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/scalar.hpp:28:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/backend/memory.hpp:28:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/backend/mem_handle.hpp:32:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/backend/opencl.hpp:28:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/ocl/backend.hpp:26:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/ocl/context.hpp:36:
/Users/az/Programmierung/viennacl-dev/viennacl/ocl/kernel.hpp:234:26: error:
member reference base type 'const long' is not a structure or union
cl_mem temp = val.handle().opencl_handle().get();
~~~^~~~~~~
2 errors generated.
make[2]: *** [tests/CMakeFiles/matrix_col_int-test-opencl.dir/src/matrix_col_int.cpp.o] Error 1
make[1]: *** [tests/CMakeFiles/matrix_col_int-test-opencl.dir/all] Error 2
make: *** [all] Error 2
Running viennacl-dev on an AWS cg1.4xlarge instance.
When I run the program below with the viennacl::ocl::switch_context(thread_id_); statement, it runs fine. When I comment out the statement "viennacl::ocl::switch_context(thread_id_);", I get the following output:
0x7f4b70ac2d70
0x7f4b70ac2df0
9,9
terminate called after throwing an instance of 'viennacl::ocl::invalid_mem_object'
what(): ViennaCL: FATAL ERROR: CL_INVALID_MEM_OBJECT.
If you think that this is a bug in ViennaCL, please report it at [email protected] and supply at least the following information:
It seems like the viennacl::linalg::prod is not picking up the context from the objects.
//
// main.cpp
// NN-dual-gpu-test
//
// include necessary system headers
#include <iostream>
#include <thread>
// include basic scalar, vector, and matrix types of ViennaCL
#include <viennacl/ocl/backend.hpp>
#include <viennacl/matrix.hpp>
// include the generic product functions of ViennaCL
#include <viennacl/linalg/prod.hpp>
using namespace std;
template <typename NumericT>
class worker
{
public:
worker(std::size_t tid) : thread_id_(tid) {}
void operator()()
{
viennacl::context ctx(const_cast<viennacl::ocl::context&>(viennacl::ocl::get_context(static_cast<long>(thread_id_))));
unsigned int rows = 9;
unsigned int cols = 9;
viennacl::matrix<float,viennacl::column_major> tmat(rows, cols,ctx);
viennacl::matrix<float,viennacl::column_major> amat(rows, cols,ctx);
viennacl::matrix<float,viennacl::column_major> rmat(rows, cols,ctx);
for (int ix = 0; ix < rows; ++ix) {
for (int iy = 0; iy < cols; ++iy) {
tmat(ix,iy) = ix+iy;
amat(ix,iy) = ix+iy;
}
}
//viennacl::ocl::switch_context(thread_id_);
rmat = viennacl::linalg::prod(tmat,amat);
viennacl::backend::finish();
cout << rmat<<endl;
}
std::string message() const { return message_; }
private:
std::string message_;
std::size_t thread_id_;
};
int main()
{
//Change this type definition to double if your gpu supports that
typedef float ScalarType;
if (viennacl::ocl::get_platforms().size() == 0)
{
std::cerr << "Error: No platform found!" << std::endl;
return EXIT_FAILURE;
}
//
// Part 1: Setup first device for first context, second device for second context:
//
viennacl::ocl::platform pf = viennacl::ocl::get_platforms()[0];
std::vector<viennacl::ocl::device> const & devices = pf.devices();
// Set first device to first context:
viennacl::ocl::setup_context(0, devices[0]);
// Set second device for second context (use the same device for the second context if only one device available):
if (devices.size() > 1)
viennacl::ocl::setup_context(1, devices[1]);
else
viennacl::ocl::setup_context(1, devices[0]);
viennacl::backend::finish();
//cout << devices[0].full_info()<<endl;
cout << devices[0].id()<<endl;
//cout << devices[1].full_info()<<endl;
cout << devices[1].id()<<endl;
//
// Part 2: Now let two threads operate on two GPUs in parallel
//
worker<ScalarType> work_functor0(0);
worker<ScalarType> work_functor1(1);
std::thread worker_thread_0(work_functor0);
std::thread worker_thread_1(work_functor1);
worker_thread_0.join();
worker_thread_1.join();
std::cout << "!!!! TUTORIAL COMPLETED SUCCESSFULLY !!!!" << std::endl;
return EXIT_SUCCESS;
}
In this case, A and A^T have different semantics in the kernel, but refer to the same handle and are considered equal by the generator... I am really not sure how to handle this. Plus, I'm pretty sure A*A^T and A*A could be implemented using a better kernel... Should I just forbid the handles of LHS and RHS to be the same in that case (and in a later version dispatch to different kernels)? I will try to find a way to handle this, but the problem seems to lie deep down in the generator's structure... I had really not anticipated that the same handle could refer to two different ways of accessing memory in the same kernel!
Now that the tuning procedures are (supposedly) bug-free, it would be nice to have some default profiles to add to the built-in database for the Intel MIC platform, obtained by just running the provided blas{1,2,3}_tuning targets (and seeing whether they crash!).
[ 73%] Building CXX object tests/CMakeFiles/matrix_vector_int-test-opencl.dir/src/matrix_vector_int.cpp.o
cd /Users/az/Programmierung/viennacl-dev/build/tests && /usr/bin/c++ -I/Users/az/Programmierung/viennacl-dev -I/opt/local/include -I/usr/local/include -I/Users/az/Programmierung/viennacl-dev/external -I/Users/az/Programmierung/viennacl-dev/libviennacl/include -DVIENNACL_WITH_OPENCL -o CMakeFiles/matrix_vector_int-test-opencl.dir/src/matrix_vector_int.cpp.o -c /Users/az/Programmierung/viennacl-dev/tests/src/matrix_vector_int.cpp
/Users/az/Programmierung/viennacl-dev/tests/src/matrix_vector_int.cpp:821:29: warning: implicit conversion from
'double' to 'NumericT' (aka 'long') changes value from 9.999999999999999E-12 to 0 [-Wliteral-conversion]
NumericT epsilon = 1.0E-11;
~~~~~~~ ^~~~~~~
/Users/az/Programmierung/viennacl-dev/tests/src/matrix_vector_int.cpp:837:29: warning: implicit conversion from
'double' to 'NumericT' (aka 'long') changes value from 9.999999999999999E-12 to 0 [-Wliteral-conversion]
NumericT epsilon = 1.0E-11;
~~~~~~~ ^~~~~~~
2 warnings generated.
This was probably not intended.
We should support viennacl::vector and the like.
Pitfalls:
The current LU factorization shows atrocious performance. With fast GEMM for submatrices available, we should be able to come up with a portable high-performance implementation at a strikingly high level of abstraction.
Hello,
My program needs to have two threads, and each thread needs its own context.
(The cg1.4xlarge instance has dual GPUs.) At the moment, when I run one thread with a context containing a GPU, my program runs fine. The program uses custom kernels.
When I run two threads it crashes with
terminate called after throwing an instance of 'viennacl::ocl::invalid_mem_object'
what(): ViennaCL: FATAL ERROR: CL_INVALID_MEM_OBJECT.
which I am pretty sure is because I am not setting up the classes to have completely separate contexts, programs, queues, etc.
Could you please indicate how best to implement the toy program below, which is essentially the multithreads example you provided (which runs fine)?
(1) How do I add the program to the context? ctx.opencl_context() is const. Rather than a viennacl::context, should I use a viennacl::ocl::context?
(2) When I call viennacl::ocl::enqueue, do I have to specify the queue for that device in the context?
I have been getting great results with ViennaCL, just having a bit of trouble here.
template <typename NumericT>
class worker
{
public:
worker(std::size_t tid) : thread_id_(tid) {}
void operator()()
{
std::size_t N = 6;
viennacl::context ctx(viennacl::ocl::get_context(static_cast<long>(thread_id_)));
viennacl::vector<NumericT> u = viennacl::scalar_vector<NumericT>(N, NumericT(1) * NumericT(thread_id_ + 1), ctx);
viennacl::vector<NumericT> v = viennacl::scalar_vector<NumericT>(N, NumericT(2) * NumericT(thread_id_ + 1), ctx);
viennacl::matrix<NumericT> A = viennacl::linalg::outer_prod(u, v);
viennacl::vector<NumericT> x(u);
u += v;
NumericT result = viennacl::linalg::norm_2(u);
std::stringstream ss;
ss << "Result of thread " << thread_id_ << " on device " << viennacl::ocl::get_context(static_cast<long>(thread_id_)).devices()[0].name() << ": " << result << std::endl;
ss << " A: " << A << std::endl;
ss << " x: " << x << std::endl;
message_ = ss.str();
}
std::string message() const { return message_; }
private:
std::string message_;
std::size_t thread_id_;
};
Due to an internal change in the kernels used, the matrix parameter tuner is currently not working. This issue interacts with the ongoing kernel generator integration, so there may be a more general replacement in 1.5.0
When I do
#include <viennacl/generator/generate.hpp>
I get the following error:
In file included from main.cpp:12:
In file included from /usr/local/include/viennacl/generator/generate.hpp:33:
In file included from /usr/local/include/viennacl/generator/profiles.hpp:35:
In file included from /usr/local/include/viennacl/generator/profile_base.hpp:30:
In file included from /usr/local/include/viennacl/ocl/kernel.hpp:32:
In file included from /usr/local/include/viennacl/ocl/backend.hpp:26:
/usr/local/include/viennacl/ocl/context.hpp:627:29: error: variable has
incomplete type 'viennacl::ocl::kernel'
viennacl::ocl::kernel temp(kernel_handle, *this, *p_context_, kern...
^
/usr/local/include/viennacl/ocl/forwards.h:44:11: note: forward declaration of
'viennacl::ocl::kernel'
class kernel;
^
In file included from main.cpp:12:
In file included from /usr/local/include/viennacl/generator/generate.hpp:33:
In file included from /usr/local/include/viennacl/generator/profiles.hpp:35:
In file included from /usr/local/include/viennacl/generator/profile_base.hpp:30:
In file included from /usr/local/include/viennacl/ocl/kernel.hpp:32:
In file included from /usr/local/include/viennacl/ocl/backend.hpp:26:
/usr/local/include/viennacl/ocl/context.hpp:640:15: error: member access into
incomplete type 'viennacl::ocl::kernel'
if (it->name() == name)
^
/usr/local/include/viennacl/ocl/forwards.h:44:11: note: forward declaration of
'viennacl::ocl::kernel'
class kernel;
^
In file included from main.cpp:12:
In file included from /usr/local/include/viennacl/generator/generate.hpp:33:
In file included from /usr/local/include/viennacl/generator/profiles.hpp:35:
In file included from /usr/local/include/viennacl/generator/profile_base.hpp:30:
In file included from /usr/local/include/viennacl/ocl/kernel.hpp:32:
In file included from /usr/local/include/viennacl/ocl/backend.hpp:26:
/usr/local/include/viennacl/ocl/context.hpp:650:32: error: incomplete type
'viennacl::ocl::kernel' named in nested name specifier
inline void viennacl::ocl::kernel::set_work_size_defaults()
~~~~~~~~~~~~~~~^~~~~~~~
/usr/local/include/viennacl/ocl/forwards.h:44:11: note: forward declaration of
'viennacl::ocl::kernel'
class kernel;
^
I just noticed the following code fails with custom kernels.
#include <iostream>
#include <string>
#include <cstdlib>
#include <viennacl/ocl/backend.hpp>
#include <viennacl/scalar.hpp>
using namespace std;
int main(){
  try{
    viennacl::scalar<float> a;
    string prog =
      "__kernel void\n"
      "set(__global float *ret)\n"
      "{\n"
      " *ret = 1;\n"
      "}";
    viennacl::ocl::program& ref = viennacl::ocl::current_context().add_program(prog, "prog");
    ref.add_kernel("set");
    viennacl::ocl::kernel& set = viennacl::ocl::get_kernel("prog", "set");
    viennacl::ocl::enqueue(set(a));
    cout << a << endl;
  } catch (const exception& e) {
    cerr << e.what() << endl;
    return EXIT_FAILURE;
  }
  return EXIT_SUCCESS;
}
It generates the following error message:
Assertion failed: (val_.get_active_handle_id() != viennacl::MEMORY_NOT_INITIALIZED && bool("Scalar not initialized, cannot read!")), function operator float, file ../../soft/ViennaCL-1.4.2/viennacl/scalar.hpp, line 278.
The program has unexpectedly finished.
Then I noticed that inside scalar.hpp, the empty constructor does not actually allocate memory for the object (although the comment says it does!). Changing the code to the following (adding the memory allocation) seems to fix my problem:
/** @brief Allocates the memory for the scalar, but does not set it to zero. */
scalar()
{
viennacl::backend::memory_create(val_, sizeof(SCALARTYPE));
} //No initialization yet in order to allow for global variables
I am not sure if there is something I'm doing wrong or this is actually a bug.
Code:
// c++ test_prod_sparse.cpp -std=c++11
#include <viennacl/vector.hpp>
#include <viennacl/matrix.hpp>
#include <viennacl/compressed_matrix.hpp>
#include <viennacl/linalg/prod.hpp>
#include <viennacl/linalg/vector_operations.hpp>
#include <viennacl/linalg/matrix_operations.hpp>
#include <viennacl/scalar.hpp>
#include <viennacl/matrix_proxy.hpp>
int main() {
using namespace viennacl;
using namespace viennacl::linalg;
auto a = matrix<float>(10,10);
auto b = compressed_matrix<float>(10,10);
auto v = prod(a, b);
return 0;
}
Error:
az@azmacbookpro ~/P/N/NN-OCR> c++ test_prod_sparse.cpp -std=c++11
test_prod_sparse.cpp:18:11: error: no matching function for call to 'prod'
auto v = prod(a, b);
^~~~
/usr/local/include/viennacl/linalg/prod.hpp:91:5: note: candidate template
ignored: failed template argument deduction
prod(std::vector< std::vector<T, A1>, A2 > const & matrix, VectorT c...
^
/usr/local/include/viennacl/linalg/prod.hpp:106:5: note: candidate template
ignored: failed template argument deduction
prod(std::vector< std::map<KEY, DATA, COMPARE, AMAP>, AVEC > const& ...
^
/usr/local/include/viennacl/linalg/prod.hpp:142:5: note: candidate template
ignored: failed template argument deduction
prod(viennacl::matrix_base<NumericT, F1> const & A,
^
/usr/local/include/viennacl/linalg/prod.hpp:158:5: note: candidate template
ignored: failed template argument deduction
prod(viennacl::matrix_base<NumericT, F1> const & A,
^
/usr/local/include/viennacl/linalg/prod.hpp:178:5: note: candidate template
ignored: failed template argument deduction
prod(viennacl::matrix_expression<const viennacl::matrix_base<NumericT, F1>,
^
/usr/local/include/viennacl/linalg/prod.hpp:201:5: note: candidate template
ignored: failed template argument deduction
prod(viennacl::matrix_expression<const viennacl::matrix_base<NumericT, F1>,
^
/usr/local/include/viennacl/linalg/prod.hpp:225:5: note: candidate template
ignored: failed template argument deduction
prod(viennacl::matrix_base<NumericT, F> const & matrix,
^
/usr/local/include/viennacl/linalg/prod.hpp:241:5: note: candidate template
ignored: failed template argument deduction
prod(viennacl::matrix_expression<const viennacl::matrix_base<NumericT, F>,
^
/usr/local/include/viennacl/linalg/prod.hpp:261:5: note: candidate template
ignored: failed template argument deduction
prod(const SparseMatrixType & mat,
^
/usr/local/include/viennacl/linalg/prod.hpp:275:5: note: candidate template
ignored: failed template argument deduction
prod(const SparseMatrixType & sp_mat,
^
/usr/local/include/viennacl/linalg/prod.hpp:292:5: note: candidate template
ignored: failed template argument deduction
prod(const SparseMatrixType & A,
^
/usr/local/include/viennacl/linalg/prod.hpp:310:5: note: candidate template
ignored: failed template argument deduction
prod(const StructuredMatrixType & mat,
^
1 error generated.
Currently only a single work group is used because synchronizations are required. For matrices above ~1000x1000 it makes sense to use panel-like updates, i.e.
OpenMP up to version 2.5 only specifies parallel for-loops for signed integer types. The use of std::size_t is not sufficient here...
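A common workaround is to cast the std::size_t trip count to a signed type for the loop control; sketched below with a hypothetical sum_elements helper (the pragma is simply ignored when OpenMP is disabled):

```cpp
#include <cstddef>
#include <vector>

// Sketch: OpenMP 2.5 only accepts signed integer loop indices in
// '#pragma omp parallel for', so the std::size_t extent is cast to long.
double sum_elements(const std::vector<double>& v) {
  double sum = 0.0;
  const long n = static_cast<long>(v.size());  // signed trip count
  #pragma omp parallel for reduction(+ : sum)
  for (long i = 0; i < n; ++i)
    sum += v[static_cast<std::size_t>(i)];
  return sum;
}
```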
The following doesn't compile due to a missing template specialization
#include "viennacl/matrix.hpp"
#include "viennacl/vector.hpp"
int main(){
viennacl::matrix<double> A(10,10);
viennacl::vector<double> x(10);
viennacl::scalar<double> alpha(2);
A += alpha*viennacl::linalg::outer_prod(x,x);
}
This can be trivially solved by adding in viennacl/tools/tools.hpp:306
template <typename ScalarType, typename T>
struct MATRIX_EXTRACTOR_IMPL<viennacl::matrix_expression<const viennacl::vector_base<ScalarType>, T, op_prod>,
viennacl::scalar<ScalarType> >
{
typedef viennacl::matrix<ScalarType, viennacl::row_major> ResultType;
};
However, this seems to slightly overlap with the CPU scalar case, and it is not impossible that several other parts of the code suffer from the same issue. Is this fix reasonable?
Increase the robustness of ILU preconditioners by (optional?) pivoting. Requested by Christopher Batty in this thread:
https://sourceforge.net/p/viennacl/discussion/1143678/thread/d104427f/
Please comment if you're interested in this feature so that we can prioritize it accordingly.
This leads to the warning:
/usr/local/include/viennacl/vector.hpp:700:47: Implicit conversion loses integer precision: 'unsigned long' to 'unsigned int'
size_type should be used instead.
The same applies to explicit const_entry_proxy(unsigned int mem_offset, ...:
/usr/local/include/viennacl/matrix.hpp:565:46: Implicit conversion loses integer precision: 'vcl_size_t' (aka 'unsigned long') to 'unsigned int'
It seems that in many places, std::size_t is used directly instead of ...::size_type (or vcl_size_t). Is that on purpose? If so, it seems odd that vcl_size_t exists at all.
Also, in sparse_matrix_operations.hpp, in prod_impl, is this on purpose:
unsigned int const * coords = detail::extract_raw_pointer<unsigned int>(mat.handle2());
Or should it use vcl_size_t instead?
Currently vector<>, vector_range<>, and vector_slice<> are entirely unrelated types. Similarly for matrix<>, matrix_range<>, and matrix_slice<>. To reduce compiler load and thus compilation times, the triplets can be unified in a common base class, e.g. vector_base and matrix_base. This will also help in reducing the necessary operator overloads.
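For illustration, a heavily simplified plain-C++ sketch of the idea (hypothetical names, a raw host pointer instead of a ViennaCL memory handle): one vector_base describes any strided view via (start, stride, size), and ranges and slices are just parameterizations of it, so operators need to be overloaded only once for the base type.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of the proposed unification: a single base type
// carrying start/stride/size over a buffer.
template <typename T>
class vector_base {
public:
  vector_base(T* buf, std::size_t start, std::size_t stride, std::size_t size)
    : buf_(buf), start_(start), stride_(stride), size_(size) {}
  T& operator[](std::size_t i) { return buf_[start_ + i * stride_]; }
  std::size_t size() const { return size_; }
private:
  T* buf_;
  std::size_t start_, stride_, size_;
};

// A "range" is a contiguous sub-view (stride 1); a "slice" adds a stride.
template <typename T>
vector_base<T> make_range(std::vector<T>& v, std::size_t first, std::size_t len) {
  return vector_base<T>(v.data(), first, 1, len);
}

template <typename T>
vector_base<T> make_slice(std::vector<T>& v, std::size_t first,
                          std::size_t stride, std::size_t len) {
  return vector_base<T>(v.data(), first, stride, len);
}
```

Any operation written against vector_base then works unchanged for all three view types, which is what cuts down the operator overloads and compile times.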
Thanks again for the quick resolution of the issue below. This one does not show up as a leak in the profiler, but uses an extra 17MB every 10k iterations. OSX 10.9.1, viennacl-dev.
#include <iostream>
#include <viennacl/vector.hpp>
#include <viennacl/linalg/inner_prod.hpp>
using namespace std;

void vcl_inner_prod_MemoryTest() {
  viennacl::vector<float> v1 = viennacl::scalar_vector<float>(42, 42.0f);
  float f;
  for (int ix = 0; ix < 10000000; ix++) {
    f = viennacl::linalg::inner_prod(v1, v1);
    viennacl::backend::finish();
    if (ix % 1000 == 0) cout << "Iter:" << ix << endl;
  }
}
Special cases like x = prod(A,x) require additional attention. There are checks in ViennaCL for this case, but they don't provide a unified behavior: Some create a temporary vector (good), others get stuck in asserts() (not so good). Some checks are overly restrictive when ranges are used.
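One possible unified behavior, sketched with plain std::vector stand-ins (matvec and matvec_safe are hypothetical helpers, not ViennaCL API): detect the aliasing and silently fall back to a temporary. Note that a pointer-identity check only catches exact aliasing; overlapping ranges would additionally need a bounds comparison.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// Plain matrix-vector product writing into 'out' (assumes no aliasing).
void matvec(const Mat& A, const Vec& x, Vec& out) {
  for (std::size_t i = 0; i < A.size(); ++i) {
    out[i] = 0.0;
    for (std::size_t j = 0; j < x.size(); ++j)
      out[i] += A[i][j] * x[j];
  }
}

// Unified behavior: if the result aliases the input, compute into a
// temporary and swap, instead of asserting or producing garbage.
void matvec_safe(const Mat& A, const Vec& x, Vec& out) {
  if (&x == &out) {
    Vec tmp(out.size());
    matvec(A, x, tmp);
    out.swap(tmp);
  } else {
    matvec(A, x, out);
  }
}
```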
The current SPAI preconditioners should be refactored to support multiple compute backends. Also, a better overlapping of computations on CPU and GPU should be provided.
We should support viennacl::max(x) and viennacl::min(x) to better support a wider range of algorithms (cf. https://sourceforge.net/p/viennacl/discussion/1143678/thread/d0887e01/). Certainly makes sense for vectors, maybe also for matrices.
Since ViennaCL 1.4.2 (the bug is not here in 1.4.1), the simple code
int main(){
  viennacl::matrix<double> A;
  viennacl::matrix<double> B(A);
}
fails with
./viennacl/matrix.hpp:662: void viennacl::matrix_base<SCALARTYPE, F, SizeType, DistanceType>::resize(viennacl::matrix_base<SCALARTYPE, F, SizeType, DistanceType>::size_type, viennacl::matrix_base<SCALARTYPE, F, SizeType, DistanceType>::size_type, bool) [with SCALARTYPE = double; F = viennacl::row_major; SizeType = long unsigned int; DistanceType = long int; viennacl::matrix_base<SCALARTYPE, F, SizeType, DistanceType>::size_type = long unsigned int]: Assertion `(rows > 0 && columns > 0) && bool("Check failed in matrix::resize(): Number of rows and columns must be positive!")' failed.
Required for some flavors of GMRES.
We would like to properly support operations like x = solve(A, y, upper_tag()); with sparse A at good performance. The level scheduling implemented for LU certainly helps, but a factorization is still missing for a dense direct solver. Ideas from the various sparse packages like UMFPACK or SuperLU can be reused. Latency, however, can be a show-stopper.
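For reference, the level-scheduling idea mentioned above can be sketched in a few lines (a hypothetical compute_levels helper on an adjacency-list pattern, not the ViennaCL data layout): each row of a lower-triangular factor gets the level 1 + max(levels of the earlier rows it depends on), and all rows within one level can be solved in parallel.

```cpp
#include <cstddef>
#include <vector>

// Sketch of level scheduling for a sparse lower-triangular solve.
// 'deps[i]' lists the strictly-lower column indices j < i of row i.
// Rows are processed in increasing index order, so level[j] is final
// by the time row i reads it.
std::vector<int> compute_levels(const std::vector<std::vector<int> >& deps) {
  std::vector<int> level(deps.size(), 0);
  for (std::size_t i = 0; i < deps.size(); ++i)
    for (int j : deps[i])
      if (level[j] + 1 > level[i])
        level[i] = level[j] + 1;
  return level;
}
```

The solver then loops over levels sequentially and launches one parallel kernel per level covering all rows in that level.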
For tiny matrices (e.g. 4x5) there seem to be some problems.
The version of pugixml shipped in ViennaCL 1.4.2 does not build on Mac OS; an update is required:
#43
In addition to CSR, COO, ELL, and HYB, we should add at least DIA. Also, further improvements can be obtained with custom (specialized) formats.
While the BLAS3 autotuning procedure works well on the NVidia SDK, it crashes on the Intel MIC as well as on the latest version of the AMD App SDK.
The following
#include <viennacl/matrix.hpp>
#include <viennacl/linalg/prod.hpp>

void VCLmemoryTest() {
  viennacl::matrix<float, viennacl::column_major> A;
  viennacl::matrix<float, viennacl::column_major> B = viennacl::identity_matrix<float>(1024);
  for (int i = 0; i < 1000000000; i++) {
    A = viennacl::linalg::prod(B, B);
  }
}
uses more and more memory. The output of the profiler is here
For operations such as
x = y + z;
x = y - z;
there are currently two separate kernels launched, leading to unnecessary memory transfers. Expression templates are not enough to resolve this, so we need a micro-scheduler for fusing operations and passing them on to a kernel generator facility.
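The intended effect of such fusion can be illustrated on the CPU with plain loops (writing the second result into a hypothetical second vector b for illustration): the fused version traverses y and z once instead of twice, halving memory traffic for these bandwidth-bound operations.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Unfused: two "kernels", y and z are each read from memory twice.
void unfused(const Vec& y, const Vec& z, Vec& a, Vec& b) {
  for (std::size_t i = 0; i < y.size(); ++i) a[i] = y[i] + z[i];
  for (std::size_t i = 0; i < y.size(); ++i) b[i] = y[i] - z[i];
}

// Fused: one "kernel", y and z are read once; this is the kind of code
// a micro-scheduler feeding a kernel generator would emit.
void fused(const Vec& y, const Vec& z, Vec& a, Vec& b) {
  for (std::size_t i = 0; i < y.size(); ++i) {
    const double yi = y[i], zi = z[i];
    a[i] = yi + zi;
    b[i] = yi - zi;
  }
}
```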
The following code creates surprising results if VectorType is vector_base rather than vector:
VectorType result = rhs;
viennacl::traits::clear(result);
Similar issues may arise with matrix_base. A good way of dealing with this is required: disallow the copy constructor of *_base? Always create a deep copy?
Toby noted on viennacl-support (Feb 18, 20:28 CET) that the inclusion of direct_solve.hpp and gmres.hpp in the same compilation unit causes ambiguity problems on GCC 4.8 (and possibly others). This needs to be fixed.
Since synchronizations are cheap for block-ILU, it makes sense to extend the level scheduling logic from ILU to block-ILU.
Otherwise there's unnecessary copying. Not too hard to add this.
There are a few inconsistencies when using viennacl::copy(). Sometimes an empty object is resized accordingly, sometimes it is not. A single unified behavior is desirable.
This is fairly important for users. It needs to be partly rewritten for higher efficiency and to support multiple compute backends.
The generator is merged into the master branch, but the code can still be polished (some base classes to introduce, dirty code to remove...) and has yet to be documented.
It would be nice for users to pass STL-types to solvers directly, e.g. a std::vector<> and a sparse matrix of type std::vector<std::map<U, T> >. Only needs a bit of wrapper logic with respect to the sparse_matrix_adapter.
Implement a diag-like MATLAB operator in the generator: diag(matrix) would return the vector of diagonal elements, and diag(vector) a symbolic diagonal matrix whose diagonal elements are given by the vector. Useful, for example, for postprocessing an SVD and computing the inverse or matrix square root of the input.
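A host-side sketch of the two proposed overloads (plain std::vector types, not generator code):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// diag(matrix): extract the diagonal as a vector.
Vec diag(const Mat& A) {
  const std::size_t n = std::min(A.size(), A.empty() ? 0 : A[0].size());
  Vec d(n);
  for (std::size_t i = 0; i < n; ++i) d[i] = A[i][i];
  return d;
}

// diag(vector): build a diagonal matrix from a vector.
Mat diag(const Vec& v) {
  Mat D(v.size(), Vec(v.size(), 0.0));
  for (std::size_t i = 0; i < v.size(); ++i) D[i][i] = v[i];
  return D;
}
```

In the generator, both directions would of course stay symbolic so they fuse with surrounding operations instead of materializing the full diagonal matrix.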
Quite a number of applications rely on complex arithmetic. The difficulty is the lack of native support for complex_t in OpenCL, so all operations need to be emulated. Addition and subtraction are easy, but multiplication and division are tricky. Emulation of sqrt() and the like is also required.
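A sketch of such emulation on the host (a hypothetical cfloat pair-of-floats type; an OpenCL kernel would use float2 analogously). Division uses Smith's method, which scales by the larger component of the denominator to reduce overflow/underflow compared to the textbook formula:

```cpp
#include <cmath>

// Emulated complex arithmetic on a pair of floats, as an OpenCL kernel
// without a native complex type would have to do.
struct cfloat { float re, im; };

// (a.re + i*a.im) * (b.re + i*b.im)
cfloat cmul(cfloat a, cfloat b) {
  return {a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
}

// a / b via Smith's algorithm: divide through by the larger of |b.re|, |b.im|.
cfloat cdiv(cfloat a, cfloat b) {
  if (std::fabs(b.re) >= std::fabs(b.im)) {
    const float t = b.im / b.re;
    const float d = b.re + b.im * t;
    return {(a.re + a.im * t) / d, (a.im - a.re * t) / d};
  }
  const float t = b.re / b.im;
  const float d = b.re * t + b.im;
  return {(a.re * t + a.im) / d, (a.im * t - a.re) / d};
}
```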
Allows better integration into other languages such as Python. Also provides a performance-portable BLAS library.