viennacl / viennacl-dev
Developer repository for ViennaCL. Visit http://viennacl.sourceforge.net/ for the latest releases.
License: Other
I am not exactly sure how to do that in a standards-compliant way (considering that different paddings require different uses of local memory, etc.). I am thinking about a super-conservative profile (when no autotuning is used), and modifying the autotuning procedure to output the best profile under the constraint local_mem < 16kB, the best profile for a 32x32 padding, and the overall best profile. Is that ok?
Hello Karl, Philippe,
The following example produces compile errors (g++ -o vcl vcl.cpp) with v1.4.2:
#include <viennacl/vector.hpp>
#include <viennacl/linalg/prod.hpp>
int main() {
  viennacl::vector<double> x(100);
  viennacl::vector<double> y(100);
  viennacl::vector<double> z(100);
  x += viennacl::linalg::element_prod(y, z);
}
It does compile with commit 0e76809 and on the generator_multi-devices branch. If the change is intentional, how can one do the same thing with 1.4.2?
Best regards,
Denis
In the doc, there is the following example:
// Instantiation of the symbolic variables
symbolic_vector<NumericT, 0> sX;
symbolic_matrix<NumericT, 1> sA;
symbolic_vector<NumericT, 2> sY;
symbolic_vector<NumericT, 3> sZ;
//Creation of the custom operation
custom_operation my_op( sX = prod(sA, inner_prod(sY, sY+sZ) * sY + sZ),
"operation_name" );
However, I don't see those types defined anywhere in the code. So, is the doc outdated or wrong?
(A bit OT: I want to do something like this. I also asked on SO about it.)
Copying from GPU to CPU currently goes via two format conversions, even though a direct dump is possible. Also, OpenMP acceleration of the factor transposition is possible.
Denis provided a bit of code already. Merging and examples required.
Allows for much better HTML docs (search box, navigation panel on the left). Allows us to improve the accessibility of the documentation and to refer directly to HTML pages rather than pointing at page numbers in the PDF. The manual can be included in the HTML docs directly, allowing better cross-links.
The current PDF manual can then be extracted from Doxygen, but this presumably requires some extractor script.
Scanning dependencies of target matrix_col_int-test-opencl
/Applications/Xcode.app/Contents/Developer/usr/bin/make -f tests/CMakeFiles/matrix_col_int-test-opencl.dir/build.make tests/CMakeFiles/matrix_col_int-test-opencl.dir/build
/usr/local/Cellar/cmake/2.8.12/bin/cmake -E cmake_progress_report /Users/az/Programmierung/viennacl-dev/build/CMakeFiles 44
[ 62%] Building CXX object tests/CMakeFiles/matrix_col_int-test-opencl.dir/src/matrix_col_int.cpp.o
cd /Users/az/Programmierung/viennacl-dev/build/tests && /usr/bin/c++ -I/Users/az/Programmierung/viennacl-dev -I/opt/local/include -I/usr/local/include -I/Users/az/Programmierung/viennacl-dev/external -I/Users/az/Programmierung/viennacl-dev/libviennacl/include -DVIENNACL_WITH_OPENCL -o CMakeFiles/matrix_col_int-test-opencl.dir/src/matrix_col_int.cpp.o -c /Users/az/Programmierung/viennacl-dev/tests/src/matrix_col_int.cpp
In file included from /Users/az/Programmierung/viennacl-dev/tests/src/matrix_col_int.cpp:18:
In file included from /Users/az/Programmierung/viennacl-dev/tests/src/matrix_int.hpp:31:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/scalar.hpp:28:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/backend/memory.hpp:28:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/backend/mem_handle.hpp:32:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/backend/opencl.hpp:28:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/ocl/backend.hpp:26:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/ocl/context.hpp:36:
/Users/az/Programmierung/viennacl-dev/viennacl/ocl/kernel.hpp:232:20: error:
member reference base type 'const long' is not a structure or union
assert(&val.handle().opencl_handle().context() == &handle_.conte...
~~~^~~~~~~
/usr/include/assert.h:93:25: note: expanded from macro 'assert'
(__builtin_expect(!(e), 0) ? __assert_rtn(__func__, __FILE__, __LINE...
^
/Users/az/Programmierung/viennacl-dev/viennacl/ocl/kernel.hpp:348:118: note: in
instantiation of function template specialization
'viennacl::ocl::kernel::arg<long>' requested here
...arg(4, t4); arg(5, t5); arg(6, t6); arg(7, t7); arg(8, t8); arg(9, t9);
^
/Users/az/Programmierung/viennacl-dev/viennacl/linalg/opencl/matrix_operations.hpp:222:32: note:
in instantiation of function template specialization
'viennacl::ocl::kernel::operator()<viennacl::ocl::handle<_cl_mem *>,
unsigned int, unsigned int, unsigned int, unsigned int, unsigned int,
unsigned int, unsigned int, unsigned int, long>' requested here
viennacl::ocl::enqueue(k(viennacl::traits::opencl_handle(mat),
^
/Users/az/Programmierung/viennacl-dev/viennacl/linalg/matrix_operations.hpp:161:11: note:
in instantiation of function template specialization
'viennacl::linalg::opencl::matrix_assign<long, viennacl::column_major>'
requested here
viennacl::linalg::opencl::matrix_assign(mat, s, clear);
^
/Users/az/Programmierung/viennacl-dev/viennacl/matrix.hpp:632:9: note: in
instantiation of function template specialization
'viennacl::linalg::matrix_assign<long, viennacl::column_major>' requested
here
viennacl::linalg::matrix_assign(*this, SCALARTYPE(0), true);
^
/Users/az/Programmierung/viennacl-dev/viennacl/matrix.hpp:261:11: note: in
instantiation of member function 'viennacl::matrix_base<long,
viennacl::column_major, unsigned long, long>::clear' requested here
clear();
^
/Users/az/Programmierung/viennacl-dev/viennacl/matrix.hpp:757:105: note: in
instantiation of member function 'viennacl::matrix_base<long,
viennacl::column_major, unsigned long, long>::matrix_base' requested here
...columns, viennacl::context ctx = viennacl::context()) : base_type(rows, ...
^
/Users/az/Programmierung/viennacl-dev/tests/src/matrix_int.hpp:675:19: note: in
instantiation of member function 'viennacl::matrix<long,
viennacl::column_major, 1>::matrix' requested here
VCLMatrixType vcl_A_full(4 * dim_rows, 4 * dim_cols);
^
/Users/az/Programmierung/viennacl-dev/tests/src/matrix_col_int.cpp:42:7: note:
in instantiation of function template specialization
'run_test<viennacl::column_major, long>' requested here
if (run_test<viennacl::column_major, long>(epsilon) != EXIT_SUCCESS)
^
In file included from /Users/az/Programmierung/viennacl-dev/tests/src/matrix_col_int.cpp:18:
In file included from /Users/az/Programmierung/viennacl-dev/tests/src/matrix_int.hpp:31:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/scalar.hpp:28:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/backend/memory.hpp:28:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/backend/mem_handle.hpp:32:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/backend/opencl.hpp:28:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/ocl/backend.hpp:26:
In file included from /Users/az/Programmierung/viennacl-dev/viennacl/ocl/context.hpp:36:
/Users/az/Programmierung/viennacl-dev/viennacl/ocl/kernel.hpp:234:26: error:
member reference base type 'const long' is not a structure or union
cl_mem temp = val.handle().opencl_handle().get();
~~~^~~~~~~
2 errors generated.
make[2]: *** [tests/CMakeFiles/matrix_col_int-test-opencl.dir/src/matrix_col_int.cpp.o] Error 1
make[1]: *** [tests/CMakeFiles/matrix_col_int-test-opencl.dir/all] Error 2
make: *** [all] Error 2
Running viennacl-dev on an AWS cg1.4xlarge instance.
When I run the program below with the viennacl::ocl::switch_context(thread_id_); statement, it runs fine. When I comment out the statement "viennacl::ocl::switch_context(thread_id_);", I get the following output:
0x7f4b70ac2d70
0x7f4b70ac2df0
9,9
terminate called after throwing an instance of 'viennacl::ocl::invalid_mem_object'
what(): ViennaCL: FATAL ERROR: CL_INVALID_MEM_OBJECT.
If you think that this is a bug in ViennaCL, please report it at [email protected] and supply at least the following information:
It seems like the viennacl::linalg::prod is not picking up the context from the objects.
//
// main.cpp
// NN-dual-gpu-test
//
// include necessary system headers
#include <iostream>
#include <thread>
// include basic scalar, vector, and matrix types of ViennaCL
#include <viennacl/ocl/backend.hpp>
#include <viennacl/matrix.hpp>
// include the generic product functions of ViennaCL
#include <viennacl/linalg/prod.hpp>
using namespace std;
template <typename NumericT>
class worker
{
public:
worker(std::size_t tid) : thread_id_(tid) {}
void operator()()
{
viennacl::context ctx(const_cast<viennacl::ocl::context&>(viennacl::ocl::get_context(static_cast<long>(thread_id_))));
unsigned int rows = 9;
unsigned int cols = 9;
viennacl::matrix<float,viennacl::column_major> tmat(rows, cols,ctx);
viennacl::matrix<float,viennacl::column_major> amat(rows, cols,ctx);
viennacl::matrix<float,viennacl::column_major> rmat(rows, cols,ctx);
for (int ix = 0; ix < rows; ++ix) {
for (int iy = 0; iy < cols; ++iy) {
tmat(ix,iy) = ix+iy;
amat(ix,iy) = ix+iy;
}
}
//viennacl::ocl::switch_context(thread_id_);
rmat = viennacl::linalg::prod(tmat,amat);
viennacl::backend::finish();
cout << rmat<<endl;
}
std::string message() const { return message_; }
private:
std::string message_;
std::size_t thread_id_;
};
int main()
{
//Change this type definition to double if your gpu supports that
typedef float ScalarType;
if (viennacl::ocl::get_platforms().size() == 0)
{
std::cerr << "Error: No platform found!" << std::endl;
return EXIT_FAILURE;
}
//
// Part 1: Setup first device for first context, second device for second context:
//
viennacl::ocl::platform pf = viennacl::ocl::get_platforms()[0];
std::vector<viennacl::ocl::device> const & devices = pf.devices();
// Set first device to first context:
viennacl::ocl::setup_context(0, devices[0]);
// Set second device for second context (use the same device for the second context if only one device available):
if (devices.size() > 1)
viennacl::ocl::setup_context(1, devices[1]);
else
viennacl::ocl::setup_context(1, devices[0]);
viennacl::backend::finish();
//cout << devices[0].full_info()<<endl;
cout << devices[0].id()<<endl;
//cout << devices[1].full_info()<<endl;
cout << devices[1].id()<<endl;
//
// Part 2: Now let two threads operate on two GPUs in parallel
//
worker<ScalarType> work_functor0(0);
worker<ScalarType> work_functor1(1);
std::thread worker_thread_0(work_functor0);
std::thread worker_thread_1(work_functor1);
worker_thread_0.join();
worker_thread_1.join();
std::cout << "!!!! TUTORIAL COMPLETED SUCCESSFULLY !!!!" << std::endl;
return EXIT_SUCCESS;
}
In this case, A and A^T have different semantics in the kernel, but refer to the same handle and are considered equal by the generator... I am really not sure how to handle this. Plus, I'm pretty sure A*A^T and A*A could be implemented using a better kernel... Should I just forbid the handles of LHS and RHS to be the same in that case (and in a later version dispatch to different kernels)? I will try to find a way to handle this, but the problem seems to lie deep down in the generator's structure... I had really not anticipated that the same handle could refer to two different ways of accessing memory in the same kernel!
Now that the tuning procedures are (supposedly) bug-free, it would be nice to have some default profiles to add to the built-in database for the Intel MIC platform, obtained by just running the provided blas{1,2,3}_tuning targets (and seeing whether they crash!).
[ 73%] Building CXX object tests/CMakeFiles/matrix_vector_int-test-opencl.dir/src/matrix_vector_int.cpp.o
cd /Users/az/Programmierung/viennacl-dev/build/tests && /usr/bin/c++ -I/Users/az/Programmierung/viennacl-dev -I/opt/local/include -I/usr/local/include -I/Users/az/Programmierung/viennacl-dev/external -I/Users/az/Programmierung/viennacl-dev/libviennacl/include -DVIENNACL_WITH_OPENCL -o CMakeFiles/matrix_vector_int-test-opencl.dir/src/matrix_vector_int.cpp.o -c /Users/az/Programmierung/viennacl-dev/tests/src/matrix_vector_int.cpp
/Users/az/Programmierung/viennacl-dev/tests/src/matrix_vector_int.cpp:821:29: warning: implicit conversion from
'double' to 'NumericT' (aka 'long') changes value from 9.999999999999999E-12 to 0 [-Wliteral-conversion]
NumericT epsilon = 1.0E-11;
~~~~~~~ ^~~~~~~
/Users/az/Programmierung/viennacl-dev/tests/src/matrix_vector_int.cpp:837:29: warning: implicit conversion from
'double' to 'NumericT' (aka 'long') changes value from 9.999999999999999E-12 to 0 [-Wliteral-conversion]
NumericT epsilon = 1.0E-11;
~~~~~~~ ^~~~~~~
2 warnings generated.
This was probably not intended.
We should support viennacl::vector and the like.
Pitfalls:
The current LU factorization shows atrocious performance. With fast GEMM for submatrices available, we should be able to come up with a portable high-performance implementation at a strikingly high level of abstraction.
Hello,
My program needs to have two threads, and each thread needs its own context.
(The cg1.4xlarge instance has dual GPUs.) At the moment, when I run one thread with a context containing a GPU, my program runs fine. The program uses custom kernels.
When I run two threads it crashes with
terminate called after throwing an instance of 'viennacl::ocl::invalid_mem_object'
what(): ViennaCL: FATAL ERROR: CL_INVALID_MEM_OBJECT.
which I am pretty sure is because I am not setting up the classes to have completely separate contexts, programs, queues, etc.
Could you please indicate how best to implement the toy program below, which is essentially the multithreads example you provided (which runs fine)?
(1) How do I add the program to the context? ctx.opencl_context() is const. Rather than a viennacl::context, should I use a viennacl::ocl::context?
(2) When I call viennacl::ocl::enqueue, do I have to specify the queue for that device in the context?
I have been getting great results with ViennaCL, just having a bit of trouble here.
template <typename NumericT>
class worker
{
public:
worker(std::size_t tid) : thread_id_(tid) {}
void operator()()
{
std::size_t N = 6;
viennacl::context ctx(viennacl::ocl::get_context(static_cast<long>(thread_id_)));
viennacl::vector<NumericT> u = viennacl::scalar_vector<NumericT>(N, NumericT(1) * NumericT(thread_id_ + 1), ctx);
viennacl::vector<NumericT> v = viennacl::scalar_vector<NumericT>(N, NumericT(2) * NumericT(thread_id_ + 1), ctx);
viennacl::matrix<NumericT> A = viennacl::linalg::outer_prod(u, v);
viennacl::vector<NumericT> x(u);
u += v;
NumericT result = viennacl::linalg::norm_2(u);
std::stringstream ss;
ss << "Result of thread " << thread_id_ << " on device " << viennacl::ocl::get_context(static_cast<long>(thread_id_)).devices()[0].name() << ": " << result << std::endl;
ss << " A: " << A << std::endl;
ss << " x: " << x << std::endl;
message_ = ss.str();
}
std::string message() const { return message_; }
private:
std::string message_;
std::size_t thread_id_;
};
Due to an internal change in the kernels used, the matrix parameter tuner is currently not working. This issue interacts with the ongoing kernel generator integration, so there may be a more general replacement in 1.5.0
When I do
#include <viennacl/generator/generate.hpp>
I get the following error:
In file included from main.cpp:12:
In file included from /usr/local/include/viennacl/generator/generate.hpp:33:
In file included from /usr/local/include/viennacl/generator/profiles.hpp:35:
In file included from /usr/local/include/viennacl/generator/profile_base.hpp:30:
In file included from /usr/local/include/viennacl/ocl/kernel.hpp:32:
In file included from /usr/local/include/viennacl/ocl/backend.hpp:26:
/usr/local/include/viennacl/ocl/context.hpp:627:29: error: variable has
incomplete type 'viennacl::ocl::kernel'
viennacl::ocl::kernel temp(kernel_handle, *this, *p_context_, kern...
^
/usr/local/include/viennacl/ocl/forwards.h:44:11: note: forward declaration of
'viennacl::ocl::kernel'
class kernel;
^
In file included from main.cpp:12:
In file included from /usr/local/include/viennacl/generator/generate.hpp:33:
In file included from /usr/local/include/viennacl/generator/profiles.hpp:35:
In file included from /usr/local/include/viennacl/generator/profile_base.hpp:30:
In file included from /usr/local/include/viennacl/ocl/kernel.hpp:32:
In file included from /usr/local/include/viennacl/ocl/backend.hpp:26:
/usr/local/include/viennacl/ocl/context.hpp:640:15: error: member access into
incomplete type 'viennacl::ocl::kernel'
if (it->name() == name)
^
/usr/local/include/viennacl/ocl/forwards.h:44:11: note: forward declaration of
'viennacl::ocl::kernel'
class kernel;
^
In file included from main.cpp:12:
In file included from /usr/local/include/viennacl/generator/generate.hpp:33:
In file included from /usr/local/include/viennacl/generator/profiles.hpp:35:
In file included from /usr/local/include/viennacl/generator/profile_base.hpp:30:
In file included from /usr/local/include/viennacl/ocl/kernel.hpp:32:
In file included from /usr/local/include/viennacl/ocl/backend.hpp:26:
/usr/local/include/viennacl/ocl/context.hpp:650:32: error: incomplete type
'viennacl::ocl::kernel' named in nested name specifier
inline void viennacl::ocl::kernel::set_work_size_defaults()
~~~~~~~~~~~~~~~^~~~~~~~
/usr/local/include/viennacl/ocl/forwards.h:44:11: note: forward declaration of
'viennacl::ocl::kernel'
class kernel;
^
I just noticed the following code fails with custom kernels.
#include <iostream>
#include <string>
#include <cstdlib>
#include <viennacl/ocl/backend.hpp>
#include <viennacl/scalar.hpp>
using namespace std;
int main(){
  try{
    viennacl::scalar<float> a;
    string prog =
      "__kernel void\n"
      "set(__global float *ret)\n"
      "{\n"
      " *ret = 1;\n"
      "}";
    viennacl::ocl::program& ref = viennacl::ocl::current_context().add_program(prog, "prog");
    ref.add_kernel("set");
    viennacl::ocl::kernel& set = viennacl::ocl::get_kernel("prog", "set");
    viennacl::ocl::enqueue(set(a));
    cout << a << endl;
  } catch (const exception& e) {
    cerr << e.what() << endl;
    return EXIT_FAILURE;
  }
  return EXIT_SUCCESS;
}
It generates the following error message:
Assertion failed: (val_.get_active_handle_id() != viennacl::MEMORY_NOT_INITIALIZED && bool("Scalar not initialized, cannot read!")), function operator float, file ../../soft/ViennaCL-1.4.2/viennacl/scalar.hpp, line 278.
The program has unexpectedly finished.
Then I noticed that inside scalar.hpp, the empty constructor does not actually allocate memory for the object (although the comment says it does!). Changing the code to the following (adding the memory allocation) seems to fix my problem:
/** @brief Allocates the memory for the scalar, but does not set it to zero. */
scalar()
{
viennacl::backend::memory_create(val_, sizeof(SCALARTYPE));
} //No initialization yet in order to allow for global variables
I am not sure if there is something I'm doing wrong or this is actually a bug.
Code:
// c++ test_prod_sparse.cpp -std=c++11
#include <viennacl/vector.hpp>
#include <viennacl/matrix.hpp>
#include <viennacl/compressed_matrix.hpp>
#include <viennacl/linalg/prod.hpp>
#include <viennacl/linalg/vector_operations.hpp>
#include <viennacl/linalg/matrix_operations.hpp>
#include <viennacl/scalar.hpp>
#include <viennacl/matrix_proxy.hpp>
int main() {
using namespace viennacl;
using namespace viennacl::linalg;
auto a = matrix<float>(10,10);
auto b = compressed_matrix<float>(10,10);
auto v = prod(a, b);
return 0;
}
Error:
az@azmacbookpro ~/P/N/NN-OCR> c++ test_prod_sparse.cpp -std=c++11
test_prod_sparse.cpp:18:11: error: no matching function for call to 'prod'
auto v = prod(a, b);
^~~~
/usr/local/include/viennacl/linalg/prod.hpp:91:5: note: candidate template
ignored: failed template argument deduction
prod(std::vector< std::vector<T, A1>, A2 > const & matrix, VectorT c...
^
/usr/local/include/viennacl/linalg/prod.hpp:106:5: note: candidate template
ignored: failed template argument deduction
prod(std::vector< std::map<KEY, DATA, COMPARE, AMAP>, AVEC > const& ...
^
/usr/local/include/viennacl/linalg/prod.hpp:142:5: note: candidate template
ignored: failed template argument deduction
prod(viennacl::matrix_base<NumericT, F1> const & A,
^
/usr/local/include/viennacl/linalg/prod.hpp:158:5: note: candidate template
ignored: failed template argument deduction
prod(viennacl::matrix_base<NumericT, F1> const & A,
^
/usr/local/include/viennacl/linalg/prod.hpp:178:5: note: candidate template
ignored: failed template argument deduction
prod(viennacl::matrix_expression<const viennacl::matrix_base<NumericT, F1>,
^
/usr/local/include/viennacl/linalg/prod.hpp:201:5: note: candidate template
ignored: failed template argument deduction
prod(viennacl::matrix_expression<const viennacl::matrix_base<NumericT, F1>,
^
/usr/local/include/viennacl/linalg/prod.hpp:225:5: note: candidate template
ignored: failed template argument deduction
prod(viennacl::matrix_base<NumericT, F> const & matrix,
^
/usr/local/include/viennacl/linalg/prod.hpp:241:5: note: candidate template
ignored: failed template argument deduction
prod(viennacl::matrix_expression<const viennacl::matrix_base<NumericT, F>,
^
/usr/local/include/viennacl/linalg/prod.hpp:261:5: note: candidate template
ignored: failed template argument deduction
prod(const SparseMatrixType & mat,
^
/usr/local/include/viennacl/linalg/prod.hpp:275:5: note: candidate template
ignored: failed template argument deduction
prod(const SparseMatrixType & sp_mat,
^
/usr/local/include/viennacl/linalg/prod.hpp:292:5: note: candidate template
ignored: failed template argument deduction
prod(const SparseMatrixType & A,
^
/usr/local/include/viennacl/linalg/prod.hpp:310:5: note: candidate template
ignored: failed template argument deduction
prod(const StructuredMatrixType & mat,
^
1 error generated.
Currently only a single work group is used because synchronizations are required. For matrices above ~1000x1000 it makes sense to use panel-like updates, i.e.
OpenMP up to version 2.5 only specifies parallel for-loops for signed integer types. The use of std::size_t is not sufficient here...
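A common workaround is to cast the std::size_t trip count to a signed type for the loop control; sketched below with a hypothetical sum_elements helper (the pragma is simply ignored when OpenMP is disabled):

```cpp
#include <cstddef>
#include <vector>

// Sketch: OpenMP 2.5 only accepts signed integer loop indices in
// '#pragma omp parallel for', so the std::size_t extent is cast to long.
double sum_elements(const std::vector<double>& v) {
  double sum = 0.0;
  const long n = static_cast<long>(v.size());  // signed trip count
  #pragma omp parallel for reduction(+ : sum)
  for (long i = 0; i < n; ++i)
    sum += v[static_cast<std::size_t>(i)];
  return sum;
}
```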
The following doesn't compile due to a missing template specialization
#include "viennacl/matrix.hpp"
#include "viennacl/vector.hpp"
int main(){
viennacl::matrix<double> A(10,10);
viennacl::vector<double> x(10);
viennacl::scalar<double> alpha(2);
A += alpha*viennacl::linalg::outer_prod(x,x);
}
This can be trivially solved by adding in viennacl/tools/tools.hpp:306
template <typename ScalarType, typename T>
struct MATRIX_EXTRACTOR_IMPL<viennacl::matrix_expression<const viennacl::vector_base<ScalarType>, T, op_prod>,
viennacl::scalar<ScalarType> >
{
typedef viennacl::matrix<ScalarType, viennacl::row_major> ResultType;
};
However, this seems to slightly overlap with the CPU scalar case, and it is not impossible that several other parts of the code suffer from the same issue. Is this fix reasonable?
Increase the robustness of ILU preconditioners by (optional?) pivoting. Requested by Christopher Batty in this thread:
https://sourceforge.net/p/viennacl/discussion/1143678/thread/d104427f/
Please comment if you're interested in this feature so that we can prioritize it accordingly.
This leads to the warning:
/usr/local/include/viennacl/vector.hpp:700:47: Implicit conversion loses integer precision: 'unsigned long' to 'unsigned int'
size_type should be used instead.
The same applies to explicit const_entry_proxy(unsigned int mem_offset, ...:
/usr/local/include/viennacl/matrix.hpp:565:46: Implicit conversion loses integer precision: 'vcl_size_t' (aka 'unsigned long') to 'unsigned int'
It seems that in many places, std::size_t is used directly instead of ...::size_type (or vcl_size_t). Is that on purpose? If so, it seems odd that vcl_size_t exists at all.
Also, in sparse_matrix_operations.hpp, in prod_impl, is this on purpose:
unsigned int const * coords = detail::extract_raw_pointer<unsigned int>(mat.handle2());
Or should it use vcl_size_t instead?
Currently vector<>, vector_range<>, and vector_slice<> are entirely unrelated types. Similarly for matrix<>, matrix_range<>, and matrix_slice<>. To reduce compiler load and thus compilation times, the triplets can be unified in a common base class, e.g. vector_base and matrix_base. This will also help in reducing the necessary operator overloads.
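For illustration, a heavily simplified plain-C++ sketch of the idea (hypothetical names, a raw host pointer instead of a ViennaCL memory handle): one vector_base describes any strided view via (start, stride, size), and ranges and slices are just parameterizations of it, so operators need to be overloaded only once for the base type.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of the proposed unification: a single base type
// carrying start/stride/size over a buffer.
template <typename T>
class vector_base {
public:
  vector_base(T* buf, std::size_t start, std::size_t stride, std::size_t size)
    : buf_(buf), start_(start), stride_(stride), size_(size) {}
  T& operator[](std::size_t i) { return buf_[start_ + i * stride_]; }
  std::size_t size() const { return size_; }
private:
  T* buf_;
  std::size_t start_, stride_, size_;
};

// A "range" is a contiguous sub-view (stride 1); a "slice" adds a stride.
template <typename T>
vector_base<T> make_range(std::vector<T>& v, std::size_t first, std::size_t len) {
  return vector_base<T>(v.data(), first, 1, len);
}

template <typename T>
vector_base<T> make_slice(std::vector<T>& v, std::size_t first,
                          std::size_t stride, std::size_t len) {
  return vector_base<T>(v.data(), first, stride, len);
}
```

Any operation written against vector_base then works unchanged for all three view types, which is what cuts down the operator overloads and compile times.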
Thanks again for the quick resolution of the issue below. This one does not show up as a leak in the profiler, but uses an extra 17MB every 10k iterations. OSX 10.9.1, viennacl-dev.
#include <iostream>
#include <viennacl/vector.hpp>
#include <viennacl/linalg/inner_prod.hpp>
using namespace std;

void vcl_inner_prod_MemoryTest() {
  viennacl::vector<float> v1 = viennacl::scalar_vector<float>(42, 42.0f);
  float f;
  for (int ix = 0; ix < 10000000; ix++) {
    f = viennacl::linalg::inner_prod(v1, v1);
    viennacl::backend::finish();
    if (ix % 1000 == 0) cout << "Iter:" << ix << endl;
  }
}
Special cases like x = prod(A,x) require additional attention. There are checks in ViennaCL for this case, but they don't provide a unified behavior: Some create a temporary vector (good), others get stuck in asserts() (not so good). Some checks are overly restrictive when ranges are used.
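One possible unified behavior, sketched with plain std::vector stand-ins (matvec and matvec_safe are hypothetical helpers, not ViennaCL API): detect the aliasing and silently fall back to a temporary. Note that a pointer-identity check only catches exact aliasing; overlapping ranges would additionally need a bounds comparison.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// Plain matrix-vector product writing into 'out' (assumes no aliasing).
void matvec(const Mat& A, const Vec& x, Vec& out) {
  for (std::size_t i = 0; i < A.size(); ++i) {
    out[i] = 0.0;
    for (std::size_t j = 0; j < x.size(); ++j)
      out[i] += A[i][j] * x[j];
  }
}

// Unified behavior: if the result aliases the input, compute into a
// temporary and swap, instead of asserting or producing garbage.
void matvec_safe(const Mat& A, const Vec& x, Vec& out) {
  if (&x == &out) {
    Vec tmp(out.size());
    matvec(A, x, tmp);
    out.swap(tmp);
  } else {
    matvec(A, x, out);
  }
}
```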
The current SPAI preconditioners should be refactored to support multiple compute backends. Also, a better overlapping of computations on CPU and GPU should be provided.
We should support viennacl::max(x) and viennacl::min(x) to better support a wider range of algorithms (cf. https://sourceforge.net/p/viennacl/discussion/1143678/thread/d0887e01/). Certainly makes sense for vectors, maybe also for matrices.
Since ViennaCL 1.4.2 (the bug is not here in 1.4.1), the simple code
int main(){
  viennacl::matrix<double> A;
  viennacl::matrix<double> B(A);
}
fails with
./viennacl/matrix.hpp:662: void viennacl::matrix_base<SCALARTYPE, F, SizeType, DistanceType>::resize(viennacl::matrix_base<SCALARTYPE, F, SizeType, DistanceType>::size_type, viennacl::matrix_base<SCALARTYPE, F, SizeType, DistanceType>::size_type, bool) [with SCALARTYPE = double; F = viennacl::row_major; SizeType = long unsigned int; DistanceType = long int; viennacl::matrix_base<SCALARTYPE, F, SizeType, DistanceType>::size_type = long unsigned int]: Assertion `(rows > 0 && columns > 0) && bool("Check failed in matrix::resize(): Number of rows and columns must be positive!")' failed.
Required for some flavors of GMRES.
We would like to properly support operations like x = solve(A, y, upper_tag()); with sparse A at good performance. The level scheduling implemented for LU certainly helps, but a factorization is still missing for a dense direct solver. Ideas from the various sparse packages like UMFPACK or SuperLU can be reused. Latency, however, can be a show-stopper.
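For reference, the level-scheduling idea mentioned above can be sketched in a few lines (a hypothetical compute_levels helper on an adjacency-list pattern, not the ViennaCL data layout): each row of a lower-triangular factor gets the level 1 + max(levels of the earlier rows it depends on), and all rows within one level can be solved in parallel.

```cpp
#include <cstddef>
#include <vector>

// Sketch of level scheduling for a sparse lower-triangular solve.
// 'deps[i]' lists the strictly-lower column indices j < i of row i.
// Rows are processed in increasing index order, so level[j] is final
// by the time row i reads it.
std::vector<int> compute_levels(const std::vector<std::vector<int> >& deps) {
  std::vector<int> level(deps.size(), 0);
  for (std::size_t i = 0; i < deps.size(); ++i)
    for (int j : deps[i])
      if (level[j] + 1 > level[i])
        level[i] = level[j] + 1;
  return level;
}
```

The solver then loops over levels sequentially and launches one parallel kernel per level covering all rows in that level.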
For tiny matrices (e.g. 4x5) there seem to be some problems.
The version of pugixml shipped in ViennaCL 1.4.2 does not build on Mac OS; an update is required:
#43
In addition to CSR, COO, ELL, and HYB, we should add at least DIA. Also, further improvements can be obtained with custom (specialized) formats.
While the BLAS3 autotuning procedure works well on the NVidia SDK, it crashes on the Intel MIC as well as on the latest version of the AMD App SDK.
The following
#include <viennacl/matrix.hpp>
#include <viennacl/linalg/prod.hpp>

void VCLmemoryTest() {
  viennacl::matrix<float, viennacl::column_major> A;
  viennacl::matrix<float, viennacl::column_major> B = viennacl::identity_matrix<float>(1024);
  for (int i = 0; i < 1000000000; i++) {
    A = viennacl::linalg::prod(B, B);
  }
}
uses more and more memory. The output of the profiler is here
For operations such as
x = y + z;
x = y - z;
there are currently two separate kernels launched, leading to unnecessary memory transfers. Expression templates are not enough to resolve this, so we need a micro-scheduler for fusing operations and passing them on to a kernel generator facility.
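The intended effect of such fusion can be illustrated on the CPU with plain loops (writing the second result into a hypothetical second vector b for illustration): the fused version traverses y and z once instead of twice, halving memory traffic for these bandwidth-bound operations.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Unfused: two "kernels", y and z are each read from memory twice.
void unfused(const Vec& y, const Vec& z, Vec& a, Vec& b) {
  for (std::size_t i = 0; i < y.size(); ++i) a[i] = y[i] + z[i];
  for (std::size_t i = 0; i < y.size(); ++i) b[i] = y[i] - z[i];
}

// Fused: one "kernel", y and z are read once; this is the kind of code
// a micro-scheduler feeding a kernel generator would emit.
void fused(const Vec& y, const Vec& z, Vec& a, Vec& b) {
  for (std::size_t i = 0; i < y.size(); ++i) {
    const double yi = y[i], zi = z[i];
    a[i] = yi + zi;
    b[i] = yi - zi;
  }
}
```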
The following code creates surprising results if VectorType is vector_base rather than vector:
VectorType result = rhs;
viennacl::traits::clear(result);
Similar issues may arise with matrix_base. A good way of dealing with this is required: disallow the copy constructor of *_base? Always create a deep copy?
Toby noted on viennacl-support (Feb 18, 20:28 CET) that the inclusion of direct_solve.hpp and gmres.hpp in the same compilation unit causes ambiguity problems on GCC 4.8 (and possibly others). This needs to be fixed.
Since synchronizations are cheap for block-ILU, it makes sense to extend the level scheduling logic from ILU to block-ILU.
Otherwise there's unnecessary copying. Not too hard to add this.
There are a few inconsistencies when using viennacl::copy(). Sometimes an empty object is resized accordingly, sometimes it is not. A single unified behavior is desirable.
This is fairly important for users. It needs to be partly rewritten for higher efficiency and to support multiple compute backends.
The generator is merged into the master branch, but the code can still be polished (some base classes to introduce, dirty code to remove...) and has yet to be documented.
It would be nice for users to pass STL-types to solvers directly, e.g. a std::vector<> and a sparse matrix of type std::vector<std::map<U, T> >. Only needs a bit of wrapper logic with respect to the sparse_matrix_adapter.
Implement a diag-like MATLAB operator in the generator: diag(matrix) would return the vector of diagonal elements, and diag(vector) a symbolic diagonal matrix whose diagonal elements are given by the vector. Useful, for example, for postprocessing an SVD and computing the inverse or matrix square root of the input.
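A host-side sketch of the two proposed overloads (plain std::vector types, not generator code):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// diag(matrix): extract the diagonal as a vector.
Vec diag(const Mat& A) {
  const std::size_t n = std::min(A.size(), A.empty() ? 0 : A[0].size());
  Vec d(n);
  for (std::size_t i = 0; i < n; ++i) d[i] = A[i][i];
  return d;
}

// diag(vector): build a diagonal matrix from a vector.
Mat diag(const Vec& v) {
  Mat D(v.size(), Vec(v.size(), 0.0));
  for (std::size_t i = 0; i < v.size(); ++i) D[i][i] = v[i];
  return D;
}
```

In the generator, both directions would of course stay symbolic so they fuse with surrounding operations instead of materializing the full diagonal matrix.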
Quite a number of applications rely on complex arithmetic. The difficulty is the lack of native support for complex_t in OpenCL, so all operations need to be emulated. Addition and subtraction are easy, but multiplication and division are tricky. Emulation of sqrt() and the like is also required.
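A sketch of such emulation on the host (a hypothetical cfloat pair-of-floats type; an OpenCL kernel would use float2 analogously). Division uses Smith's method, which scales by the larger component of the denominator to reduce overflow/underflow compared to the textbook formula:

```cpp
#include <cmath>

// Emulated complex arithmetic on a pair of floats, as an OpenCL kernel
// without a native complex type would have to do.
struct cfloat { float re, im; };

// (a.re + i*a.im) * (b.re + i*b.im)
cfloat cmul(cfloat a, cfloat b) {
  return {a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
}

// a / b via Smith's algorithm: divide through by the larger of |b.re|, |b.im|.
cfloat cdiv(cfloat a, cfloat b) {
  if (std::fabs(b.re) >= std::fabs(b.im)) {
    const float t = b.im / b.re;
    const float d = b.re + b.im * t;
    return {(a.re + a.im * t) / d, (a.im - a.re * t) / d};
  }
  const float t = b.re / b.im;
  const float d = b.re * t + b.im;
  return {(a.re * t + a.im) / d, (a.im * t - a.re) / d};
}
```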
Allows better integration into other languages such as Python. Also provides a performance-portable BLAS library.