nvidia / amgx
Distributed multigrid linear solver library on GPU
Running the amgx_capi_multi example with n=2, on a machine with two identical GPUs, produces (after the 1st solve completes):
Caught amgx exception: Could not create the CUDENSE handle
The following error occurs:
polynomial_solver.cu(291): error C2668: "amgx::polynomial_solver::poly_postsmooth":
polynomial_solver.cu(351): error C2668: "amgx::polynomial_solver::poly_presmooth":
Hi,
I am trying to compile AmgX on my laptop (Quadro K1100). After making the changes discussed in #11 the project compiles, but running the examples fails with no kernel image is available for execution on the device. I checked the CMakeLists.txt, and of course it's because we are fixing CUDA_ARCH to sm_35 and greater. So I went back to my configuration and, based on this website, I put sm_30 in my config and compile procedure:
install_dir=${HOME}/apps/amgx
build_dir=build-gcc-ompi
# clean
rm -rf $build_dir
mkdir $build_dir
cd $build_dir
export OMPI_CC=/opt/cuda/bin/gcc
export OMPI_CXX=/opt/cuda/bin/g++
cmake \
-DCMAKE_INSTALL_PREFIX=$install_dir \
-DCMAKE_C_COMPILER=/opt/cuda/bin/gcc \
-DCUDA_ARCH="30" \
-DCMAKE_CXX_COMPILER=/opt/cuda/bin/g++ \
../ && make -j2 && make install
Alas, this breaks pretty early on, producing lots of errors (see the attached file). I am investigating it independently, but if you have any advice, please let me know.
I get the following error message:
"Caught amgx exception: No host implementation of the dense LU solver"
How can I fix this?
I use the following configuration (mode=hDDI):
config_version=2
solver(main)=FGMRES
main:max_iters=300
main:convergence=RELATIVE_MAX
main:tolerance=0.00000001
main:monitor_residual=1
main:preconditioner(amg)=AMG
main:print_solve_stats=1
amg:algorithm=CLASSICAL
amg:cycle=V
amg:max_iters=1
amg:max_levels=10
amg:smoother(amg_smoother)=BLOCK_JACOBI
amg:relaxation_factor=0.75
amg:presweeps=1
amg:postsweeps=2
amg:coarsest_sweeps=4
determinism_flag=1
Exact output:
AMGX version 2.0.0.130-opensource
Built on May 21 2019, 15:47:45
Compiled with CUDA Runtime 10.0, using CUDA driver 10.2
Cannot read file as JSON object, trying as AMGX config
Caught amgx exception: No host implementation of the dense LU solver
at: /home/lucas/Repositories/AMGX-master/core/include/solvers/dense_lu_solver.h:65
More information:
Ubuntu 18.04 (gcc 7.4)
CUDA 10.0
AMGX source: git master as of May 21st
Build using MKL 2019.3-062 and MAGMA 2.3.0
Is there any CAD software that can export to the Matrix Market format?
AMGX ships with many nice sample programs, including a simple implementation of Poisson-like equations. Currently, the source term (e.g. in A x = b, the array b) is set to 1.d0 everywhere.
This is a poor choice: integrating a constant source over larger and larger volumes gives a solution that grows without bound. As a consequence, the sample cases have terrible convergence rates and do not scale as the problem size is increased.
I suggest a better choice: set b = sin( 2 * pi * x / Lx ) * sin( 2 * pi * y / Ly ) * sin( 2 * pi * z / Lz ), which satisfies both periodic and homogeneous Dirichlet boundary conditions. This is more representative of how a Poisson solver is actually used.
Would AMGX support GPUs with compute capability 2.0? I see this PR by @niklaskarla (edit: apologies for tagging the wrong user ID) adds support for devices with CC 3.0. I'm wondering if a similar fix could accommodate a lower CC?
When I debug the amgx_capi project, it shows:
Usage: ./amgx_capi [-mode [hDDI | hDFI | hFFI | dDDI | dDFI | dFFI]] [-m file] [-c config_file] [-amg "variable1=value1 ... variable3=value3"]
-mode: select the solver mode
-m file: read matrix stored in the file
-c: set the amg solver options from the config file
-amg: set the amg solver options from the command line
Press any key to continue . . .
Independently, I got AmgX to compile on K80s on our cluster and I can now run the examples. It seems to me that the -mode switch is missing from README.md,
e.g. this
examples/amgx_capi -m ../examples/matrix.mtx -c ../core/configs/FGMRES_AGGREGATION.json
should probably be something like:
examples/amgx_capi -mode dDDI -m ../examples/matrix.mtx -c ../core/configs/FGMRES_AGGREGATION.json
Happy to provide an independent pull request with a README.md update and a tabular description of the modes, if you think it's useful.
-- The C compiler identification is GNU 7.3.0
-- The CXX compiler identification is GNU 7.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Found MPI_C: /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so (found version "3.1")
-- Found MPI_CXX: /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
This is a MPI build:TRUE
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: /usr/local/cuda (found version "10.1")
Cuda libraries: /usr/local/cuda/lib64/libcudart_static.a;-lpthread;dl;/usr/lib/x86_64-linux-gnu/librt.so
-- Configuring done
-- Generating done
-- Build files have been written to: /home/victor/AMGX/build
I also tried gcc and g++ 4.8.5, without success.
During the build, multiple warnings appear and the build eventually fails:
/home/victor/AMGX/base/include/matrix.h:247:3646: note: in C++11 destructors default to noexcept
/home/victor/AMGX/base/include/matrix.h:247:4053: warning: throw will always call terminate() [-Wterminate]
cusparseCheckError(cusparseDestroyMatDescr(cuMatDescr));
/home/victor/AMGX/base/include/matrix.h:247:4053: note: in C++11 destructors default to noexcept
CMakeFiles/Makefile2:165: recipe for target 'base/CMakeFiles/amgx_base.dir/all' failed
make[1]: *** [base/CMakeFiles/amgx_base.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2
I would greatly appreciate any help in diagnosing this problem and building the library.
My examples/amgx_capi prints
AMGX version 2.0.0.130-opensource
Built on May 17 2019, 14:26:12
Compiled with CUDA Runtime 9.2, using CUDA driver 10.1
The NVIDIA docs say that "CUDA driver" means the highest supported version, and "CUDA Runtime" the version actually used.
The problem is now that I cannot reproduce my previous results, when CUDA 10.1 was not yet installed. For example, using CG on bodyy6.mtx with a custom B vector now requires 843 iterations, versus 568 earlier. cfd2.mtx does not converge at all, where it only took 22 iterations before. Does anyone else have these problems?
If you'd like to run the experiments as well, unpack the zip in examples/ (replacing amgx_capi.c), make and run with
gcc -O2 -std=c99 amgx_capi.c -c -I/usr/local/cuda-9.2/include -I../base/include
g++ -O2 amgx_capi.o -o amgx_capi -L/usr/local/cuda-9.2/lib64 -L../build -ldl -L../lib -lamgxsh -Wl,-rpath=../build
./amgx_capi -m bodyy6.mtx -b B.ltx -x X.ltx -c CG.json
./amgx_capi -m cfd2.mtx -c CG.json
AMGX is currently able to provide an estimate of GPU memory usage. At the moment this is computed as the memory used by all processes, via the cudaMemGetInfo function.
It would be more useful to report only the memory used by the AMGX process.
To get the amount of memory used by the AMGX process, you could store the memory-in-use estimate at launch and subtract it from each "allocated = total - free" computed in the updateMaxMemoryUsage function.
iter Mem Usage (GB) residual rate
--------------------------------------------------------------
Ini 0.743436 1.242538e+02
0 0.743436 1.895453e+03 15.2547
1 0.7434 2.219963e+03 1.1712
2 0.7434 6.992442e+03 3.1498
3 0.7434 4.034371e+04 5.7696
4 0.7434 2.512463e+05 6.2276
5 0.7434 1.574859e+06 6.2682
6 0.7434 9.933583e+06 6.3076
7 0.7434 6.550438e+07 6.5942
8 0.7434 4.426426e+08 6.7575
9 0.7434 2.992345e+09 6.7602
10 0.7434 2.048499e+10 6.8458
11 0.7434 1.371790e+11 6.6966
12 0.7434 8.775767e+11 6.3973
13 0.7434 5.814756e+12 6.6259
14 0.7434 3.906154e+13 6.7177
15 0.7434 2.647010e+14 6.7765
16 0.7434 1.822857e+15 6.8865
17 0.7434 1.258167e+16 6.9022
18 0.7434 8.768401e+16 6.9692
19 0.7434 6.146868e+17 7.0102
20 0.7434 4.336527e+18 7.0549
21 0.7434 3.075281e+19 7.0916
22 0.7434 2.193094e+20 7.1314
23 0.7434 1.572653e+21 7.1709
24 0.7434 1.133802e+22 7.2095
25 0.7434 8.224542e+22 7.2539
26 0.7434 5.996229e+23 7.2907
27 0.7434 4.401880e+24 7.3411
28 0.7434 3.243756e+25 7.3690
29 0.7434 2.408748e+26 7.4258
30 0.7434 1.788772e+27 7.4261
31 0.7434 1.339881e+28 7.4905
32 0.7434 9.923855e+28 7.4065
33 0.7434 7.419991e+29 7.4769
34 0.7434 5.320212e+30 7.1701
35 0.7434 3.810510e+31 7.1623
36 0.7434 2.739372e+32 7.1890
37 0.7434 1.967098e+33 7.1808
38 0.7434 1.414827e+34 7.1925
39 0.7434 1.017845e+35 7.1941
40 0.7434 7.338657e+35 7.2100
41 0.7434 5.301943e+36 7.2247
42 0.7434 3.837148e+37 7.2372
43 0.7434 2.786919e+38 7.2630
44 0.7434 2.020863e+39 7.2512
45 0.7434 1.470666e+40 7.2774
46 0.7434 1.058089e+41 7.1946
47 0.7434 7.589583e+41 7.1729
48 0.7434 5.452218e+42 7.1838
49 0.7434 3.913964e+43 7.1787
50 0.7434 2.815332e+44 7.1930
51 0.7434 2.026757e+45 7.1990
52 0.7434 1.461508e+46 7.2111
53 0.7434 1.056404e+47 7.2282
54 0.7434 7.628804e+47 7.2215
55 0.7434 5.523249e+48 7.2400
56 0.7434 3.972145e+49 7.1917
57 0.7434 2.852312e+50 7.1808
58 0.7434 2.046261e+51 7.1740
59 0.7434 1.470226e+52 7.1849
60 0.7434 1.055143e+53 7.1767
61 0.7434 7.573274e+53 7.1775
62 0.7434 5.432389e+54 7.1731
63 0.7434 3.901670e+55 7.1822
64 0.7434 2.800355e+56 7.1773
65 0.7434 2.015150e+57 7.1961
66 0.7434 1.449547e+58 7.1932
67 0.7434 1.045759e+59 7.2144
68 0.7434 7.526693e+59 7.1973
69 0.7434 5.426749e+60 7.2100
70 0.7434 3.898031e+61 7.1830
71 0.7434 2.798482e+62 7.1792
72 0.7434 2.007781e+63 7.1745
73 0.7434 1.443555e+64 7.1898
74 0.7434 1.036727e+65 7.1818
75 0.7434 7.455786e+65 7.1917
76 0.7434 5.355224e+66 7.1826
77 0.7434 3.851526e+67 7.1921
78 0.7434 2.766181e+68 7.1820
79 0.7434 1.989039e+69 7.1906
80 0.7434 1.428224e+70 7.1805
81 0.7434 1.026730e+71 7.1889
82 0.7434 7.371549e+71 7.1796
83 0.7434 5.298678e+72 7.1880
84 0.7434 3.804295e+73 7.1797
85 0.7434 2.734669e+74 7.1884
86 0.7434 1.963625e+75 7.1805
87 0.7434 1.411857e+76 7.1901
88 0.7434 1.013979e+77 7.1819
89 0.7434 7.292470e+77 7.1919
90 0.7434 5.238013e+78 7.1828
91 0.7434 3.767246e+79 7.1921
92 0.7434 2.705717e+80 7.1822
93 0.7434 1.945582e+81 7.1906
94 0.7434 1.397137e+82 7.1811
95 0.7434 1.004480e+83 7.1896
96 0.7434 7.212811e+83 7.1806
97 0.7434 5.185486e+84 7.1893
98 0.7434 3.723612e+85 7.1808
99 0.7434 2.677180e+86 7.1897
--------------------------------------------------------------
Total Iterations: 100
Avg Convergence Rate: 6.9716
Final Residual: 2.677180e+86
Total Reduction in Residual: 2.154607e+84
Maximum Memory Usage: 0.743 GB
I started playing a little bit with matrices from the SuiteSparse Matrix Collection, and some matrices there come with separate files for the RHS vectors. At the moment I can't get them to work with the amgx_capi and amgx_mpi_capi applications.
The apps use the AMGX_read_system function to read the matrices. I found the following passage in the AmgX documentation:
%%MatrixMarket matrix coordinate real general
%%AMGX block_dimx(int) block_dimy(int) diagonal sorted rhs solution
%% mxn matrix with nnz non-zero elements
%% m=block_dimx*n_block_rows, n=block_dimy*n_block_cols
%% nnz=block_dimx*block_dimy*n_block_entrees
m(int) n(int) nnz(int)
1 1 a_11
1 2 a_12
...
i j a_ij
...
%% these two comment lines present only for the description (to be removed)
%% optional diagonal mx1
...
a_ii
...
%% these two comment lines present only for the description (to be removed)
%% optional rhs mx1
...
b_i
...
%% these two comment lines present only for the description (to be removed)
So, as an example, I cat atmosmodl.mtx atmosmodl_b.mtx > atmosmodl_Ab.mtx
and remove the comment lines in between. That doesn't seem to do anything, and I still get the message:
Reading data...
RHS vector was not found. Using RHS b=[1,…,1]^T
Also, I am concerned that the diagonal entries are spread independently within the first block of data.
Please advise. I am happy to delve deeper into the code, but I thought I'd check first.
The compiler I use is VS 2017, with CUDA 10. I have tried it on two computers and got the same errors every time.
The errors I get are:
calling a host function("std::_Iterator_base12::_Iterator_base12") from a host device function("std::_Iterator_base12::_Iterator_base12 [subobject]") is not allowed amgx_core C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\include\thrust\system\cuda\detail\assign_value.h
By the way, it takes me 4 to 6 hours to build the amgx_core project with MPI. Could I have set something up wrong?
Hi,
I tried to build AMGX on Ubuntu 16.04, but was not able to get very far.
Basic information about my system:
Ubuntu 16.04
gcc/g++ 5.4.0
CUDA 7.5
Compilation fails at 4%:
[ 4%] Building NVCC (Device) object base/CMakeFiles/amgx_base.dir/src/energymin/interpolators/amgx_base_generated_em_interpolator.cu.o
/usr/lib/gcc/x86_64-linux-gnu/5/include/mwaitxintrin.h(36): error: identifier "__builtin_ia32_monitorx" is undefined
Any suggestions on how to move forward? Thanks a lot!
@marsaev @niklaskarla and I have been seeing severe performance issues with cusparse<type>csrsv_analysis in CUDA 9.0 and CUDA 9.1. It is 20 times slower than in CUDA 8.0, just running the same conjugateGradientPrecond sample code on the same GPU with a sufficiently large matrix.
We know it is not AMGX-related (from what I can see, that specific function is not called anywhere in AMGX), but since a Windows AMGX build requires CUDA 9.0+, we are forced to use those versions. Is it a known bug? Could we direct it to the cuSPARSE API group?
Hi guys, how do I convert an OpenFOAM mesh matrix to the Matrix Market format for AMGX?
There are two errors in the 5-point 2D discretization of Poisson's equation. The A matrix is generated in the example code that ships with the AmgX library as follows:
1. for (int i = 0; i < n; i ++)
2. {
3. row_ptrs[i] = nnz;
4.
5. if (rank > 0 || i > ny)
6. {
7. col_indices[nnz] = (i + start_idx - ny);
8.
9. if (sizeof_m_val == 4)
10. {
11. ((float *)values)[nnz] = -1.f;
12. }
13. else if (sizeof_m_val == 8)
14. {
15. ((double *)values)[nnz] = -1.;
16. }
17.
18. nnz++;
19. }
20.
21. if (i % ny != 0)
22. {
23. col_indices[nnz] = (i + start_idx - 1);
24.
25. if (sizeof_m_val == 4)
26. {
27. ((float *)values)[nnz] = -1.f;
28. }
29. else if (sizeof_m_val == 8)
30. {
31. ((double *)values)[nnz] = -1.;
32. }
33.
34. nnz++;
35. }
36.
37. {
38. col_indices[nnz] = (i + start_idx);
39.
40. if (sizeof_m_val == 4)
41. {
42. ((float *)values)[nnz] = 4.f;
43. }
44. else if (sizeof_m_val == 8)
45. {
46. ((double *)values)[nnz] = 4.;
47. }
48.
49. nnz++;
50. }
51.
52. if ((i + 1) % ny == 0)
53. {
54. col_indices[nnz] = (i + start_idx + 1);
55.
56. if (sizeof_m_val == 4)
57. {
58. ((float *)values)[nnz] = -1.f;
59. }
60. else if (sizeof_m_val == 8)
61. {
62. ((double *)values)[nnz] = -1.;
63. }
64.
65. nnz++;
66. }
67.
68. if ( (rank != nranks - 1) || (i / ny != (nx - 1)) )
69. {
70. col_indices[nnz] = (i + start_idx + ny);
71.
72. if (sizeof_m_val == 4)
73. {
74. ((float *)values)[nnz] = -1.f;
75. }
76. else if (sizeof_m_val == 8)
77. {
78. ((double *)values)[nnz] = -1.;
79. }
80.
81. nnz++;
82. }
83. }
The two errors are in the following conditions:
Line 5. if (rank > 0 || i > ny)
Line 52. if ((i + 1) % ny == 0)
The correct statement should be:
Line 5. if (rank > 0 || i >= ny)
Line 52. if ((i + 1) % ny != 0)
After the above-mentioned corrections, the output for a 4x4 matrix is attached. The solution has been verified against the MATLAB solution.
Is there a way to easily insert debug prints in the library?
For example, I want to create a file with the X and B vector for every iteration. base/src/solver.cu:solve() contains the main iteration loop, and calls solve_iteration(b, x, xIsZero).
They are declared as 'Vector &' parameters; the solver I use (PBICGSTAB) receives 'VVector &' parameters.
I tried printf("%.5e\n", b[i]); in pbicgstab_solver.cu, but it warns 'non-POD class type passed through ellipsis' and prints zeros for b, where it should print ones. The value of x starts at zeros (the initial guess), but is 6.9e-310 (a subnormal value) in subsequent iterations.
I also tried printf("%.5e\n", b.pod()[i]);, but it results in a SIGSEGV.
Printing the types of 'x.pod()' and 'x.pod()[0]' results in 'N4amgx9PODVectorIdiEE' and 'd' respectively.
If monitor_residual is not set, or is set to 0 in the config, then AMGX_solver_get_status() silently returns 0 (success) regardless of convergence. Perhaps this should be made explicit in the reference documentation for AMGX_solver_get_status.
I'm working on a Nix recipe for AMGX; it currently looks like this. It seems to build without any issues. However, when I try to run an example I get the following:
$ examples/amgx_capi -m ../examples/matrix.mtx -c ../core/configs/FGMRES_AGGREGATION.json
AMGX version 2.0.0.130-opensource
Built on May 14 2018, 23:06:38
AMGX ERROR: file /tmp/nix-build-AmgX.drv-0/lafn8qxabfn95rh3bh3y0bi113kzwl8w-source/examples/amgx_capi.c line 245
AMGX ERROR: Error initializing amgx core.
Failed while initializing CUDA runtime in cudaRuntimeGetVersion
Any ideas?
I'm trying to test CG with DILU and ILU preconditioning for a solver comparison, and I'm having trouble getting them to work. I'm using the PCG_DILU.json provided by AMGX as the baseline config file, solving the provided example matrix (examples/matrix.mtx).
Environment:
CUDA 9.2
Ubuntu 18.04 LTS
PCG_DILU.json
{
"config_version": 2,
"solver": {
"preconditioner": {
"scope": "precond",
"solver": "MULTICOLOR_DILU"
},
"solver": "PCG",
"print_solve_stats": 1,
"obtain_timings": 1,
"max_iters": 20,
"monitor_residual": 1,
"scope": "main",
"tolerance": 1e-06,
"norm": "L2"
}
}
And I get the following results:
AMGX version 2.0.0.130-opensource
Built on Jun 27 2018, 09:03:34
Compiled with CUDA Runtime 9.2, using CUDA driver 9.2
Warning: No mode specified, using dDDI by default.
Reading data...
RHS vector was not found. Using RHS b=[1,…,1]^T
Solution vector was not found. Setting initial solution to x=[0,…,0]^T
Finished reading
iter Mem Usage (GB) residual rate
--------------------------------------------------------------
Ini 0 3.464102e+00
0 0 nan nan
1 0.0000 -nan -nan
2 0.0000 nan nan
3 0.0000 -nan -nan
4 0.0000 nan nan
5 0.0000 -nan -nan
6 0.0000 nan nan
7 0.0000 -nan -nan
8 0.0000 nan nan
9 0.0000 -nan -nan
10 0.0000 nan nan
11 0.0000 -nan -nan
12 0.0000 nan nan
13 0.0000 -nan -nan
14 0.0000 nan nan
15 0.0000 -nan -nan
16 0.0000 nan nan
17 0.0000 -nan -nan
18 0.0000 nan nan
19 0.0000 -nan -nan
--------------------------------------------------------------
Total Iterations: 20
Avg Convergence Rate: -nan
Final Residual: -nan
Total Reduction in Residual: -nan
Maximum Memory Usage: 0.000 GB
--------------------------------------------------------------
Total Time: 0.125037
setup: 0.000589824 s
solve: 0.124448 s
solve(per iteration): 0.00622238 s
For ILU preconditioning I made a minor modification to the original PCG_DILU.json:
{
"config_version": 2,
"solver": {
"preconditioner": {
"scope": "precond",
"ilu_sparsity_level": 0,
"coloring_level": 1,
"solver": "MULTICOLOR_ILU"
},
"solver": "PCG",
"print_solve_stats": 1,
"obtain_timings": 1,
"max_iters": 20,
"monitor_residual": 1,
"scope": "main",
"tolerance": 1e-06,
"norm": "L2"
}
}
This gives me the following output
AMGX version 2.0.0.130-opensource
Built on Jun 27 2018, 09:03:34
Compiled with CUDA Runtime 9.2, using CUDA driver 9.2
Warning: No mode specified, using dDDI by default.
Reading data...
RHS vector was not found. Using RHS b=[1,…,1]^T
Solution vector was not found. Setting initial solution to x=[0,…,0]^T
Finished reading
Caught amgx exception: Multicolor ILU smoother requires matrix to be reordered by color with ILU0 solver. Try setting reorder_cols_by_color=1 and insert_diag_while_reordering=1 in the multicolor_ilu solver scope in configuration file
at: /opt/software/repos/AMGX/core/src/solvers/multicolor_ilu_solver.cu:1902
Stack trace:
libamgxsh.so : amgx::multicolor_ilu_solver::MulticolorILUSolver_Base<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::pre_setup()+0x4ee
libamgxsh.so : amgx::multicolor_ilu_solver::MulticolorILUSolver_Base<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::solver_setup(bool)+0x8a
libamgxsh.so : amgx::Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::setup(amgx::Operator<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, bool)+0x1d4
libamgxsh.so : amgx::PCG_Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::solver_setup(bool)+0x46
libamgxsh.so : amgx::Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::setup(amgx::Operator<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, bool)+0x1d4
libamgxsh.so : amgx::Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::setup_no_throw(amgx::Operator<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, bool)+0x7c
libamgxsh.so : amgx::AMG_Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::setup(amgx::Matrix<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&)+0x5c
libamgxsh.so : amgx::AMGX_ERROR amgx::(anonymous namespace)::set_solver_with_shared<(AMGX_Mode)8193, amgx::AMG_Solver, amgx::Matrix>(AMGX_solver_handle_struct*, AMGX_matrix_handle_struct*, amgx::Resources*, amgx::AMGX_ERROR (amgx::AMG_Solver<amgx::TemplateMode<(AMGX_Mode)8193>::Type>::*)(std::shared_ptr<amgx::Matrix<amgx::TemplateMode<(AMGX_Mode)8193>::Type> >))+0xc9
libamgxsh.so : AMGX_solver_setup()+0x183
examples/amgx_capi : main()+0x4d7
/lib/x86_64-linux-gnu/libc.so.6 : __libc_start_main()+0xe7
examples/amgx_capi : _start()+0x2a
Caught amgx exception: Error, setup must be called before calling solve
at: /opt/software/repos/AMGX/base/src/solvers/solver.cu:598
Stack trace:
libamgxsh.so : amgx::Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::solve(amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, bool)+0x1fe1
libamgxsh.so : amgx::Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::solve_no_throw(amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::AMGX_STATUS&, bool)+0x82
libamgxsh.so : amgx::AMG_Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::solve(amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::AMGX_STATUS&, bool)+0x3d
libamgxsh.so : amgx::AMGX_ERROR amgx::(anonymous namespace)::solve_with<(AMGX_Mode)8193, amgx::AMG_Solver, amgx::Vector>(AMGX_solver_handle_struct*, AMGX_vector_handle_struct*, AMGX_vector_handle_struct*, amgx::Resources*, bool)+0xdf
libamgxsh.so : AMGX_solver_solve()+0x17f
examples/amgx_capi : main()+0x4eb
/lib/x86_64-linux-gnu/libc.so.6 : __libc_start_main()+0xe7
examples/amgx_capi : _start()+0x2a
As the output message states, I added reorder_cols_by_color=1 and insert_diag_while_reordering=1:
{
"config_version": 2,
"solver": {
"preconditioner": {
"scope": "precond",
"ilu_sparsity_level": 0,
"coloring_level": 1,
"reorder_cols_by_color": 1,
"insert_diag_while_reordering": 1,
"solver": "MULTICOLOR_ILU"
},
"solver": "PCG",
"print_solve_stats": 1,
"obtain_timings": 1,
"max_iters": 20,
"monitor_residual": 1,
"scope": "main",
"tolerance": 1e-06,
"norm": "L2"
}
}
Then I get the following output
AMGX version 2.0.0.130-opensource
Built on Jun 27 2018, 09:03:34
Compiled with CUDA Runtime 9.2, using CUDA driver 9.2
Warning: No mode specified, using dDDI by default.
Reading data...
RHS vector was not found. Using RHS b=[1,…,1]^T
Solution vector was not found. Setting initial solution to x=[0,…,0]^T
Finished reading
Caught amgx exception: Unsupported block size for Multicolor ILU solver, computeLUFactors
at: /opt/software/repos/AMGX/core/src/solvers/multicolor_ilu_solver.cu:1818
Stack trace:
libamgxsh.so : amgx::multicolor_ilu_solver::MulticolorILUSolver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::computeLUFactors()+0x88b
libamgxsh.so : amgx::multicolor_ilu_solver::MulticolorILUSolver_Base<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::solver_setup(bool)+0xa2
libamgxsh.so : amgx::Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::setup(amgx::Operator<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, bool)+0x1d4
libamgxsh.so : amgx::PCG_Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::solver_setup(bool)+0x46
libamgxsh.so : amgx::Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::setup(amgx::Operator<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, bool)+0x1d4
libamgxsh.so : amgx::Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::setup_no_throw(amgx::Operator<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, bool)+0x7c
libamgxsh.so : amgx::AMG_Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::setup(amgx::Matrix<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&)+0x5c
libamgxsh.so : amgx::AMGX_ERROR amgx::(anonymous namespace)::set_solver_with_shared<(AMGX_Mode)8193, amgx::AMG_Solver, amgx::Matrix>(AMGX_solver_handle_struct*, AMGX_matrix_handle_struct*, amgx::Resources*, amgx::AMGX_ERROR (amgx::AMG_Solver<amgx::TemplateMode<(AMGX_Mode)8193>::Type>::*)(std::shared_ptr<amgx::Matrix<amgx::TemplateMode<(AMGX_Mode)8193>::Type> >))+0xc9
libamgxsh.so : AMGX_solver_setup()+0x183
examples/amgx_capi : main()+0x4d7
/lib/x86_64-linux-gnu/libc.so.6 : __libc_start_main()+0xe7
examples/amgx_capi : _start()+0x2a
Caught amgx exception: Error, setup must be called before calling solve
at: /opt/software/repos/AMGX/base/src/solvers/solver.cu:598
Stack trace:
libamgxsh.so : amgx::Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::solve(amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, bool)+0x1fe1
libamgxsh.so : amgx::Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::solve_no_throw(amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::AMGX_STATUS&, bool)+0x82
libamgxsh.so : amgx::AMG_Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::solve(amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::AMGX_STATUS&, bool)+0x3d
libamgxsh.so : amgx::AMGX_ERROR amgx::(anonymous namespace)::solve_with<(AMGX_Mode)8193, amgx::AMG_Solver, amgx::Vector>(AMGX_solver_handle_struct*, AMGX_vector_handle_struct*, AMGX_vector_handle_struct*, amgx::Resources*, bool)+0xdf
libamgxsh.so : AMGX_solver_solve()+0x17f
examples/amgx_capi : main()+0x4eb
/lib/x86_64-linux-gnu/libc.so.6 : __libc_start_main()+0xe7
examples/amgx_capi : _start()+0x2a
Perhaps I'm doing something wrong when modifying the config file for the ILU preconditioner, but for DILU I'm using the original config provided with AMGX (and I get the same results with other solvers that use the DILU preconditioner).
Any help or feedback would be very much appreciated.
From the paper "AmgX: A Library for GPU Accelerated Algebraic Multigrid and Preconditioned Iterative Methods", I know that the AMGX library uses the CSR matrix format internally.
Can I provide data in the CSR format to AMGX directly?
Is CC 3.0 really not compatible? We have tested it and everything seems fine on a 660 Ti. Nonetheless, there are many occurrences of a hardcoded CC >= 3.5 requirement. Is it safe to change them to >= 3.0?
Hi,
I have built AMGX and it works fine. For verification, I am trying to reproduce the results in Table 2 on page S618 of the article on AMGX by M. Naumov et al. in SIAM J. Sci. Comput., vol. 37, no. 5, pp. S602-S626 (2015). I have tried different settings, but I cannot make the number of iterations agree for any of the cases (the hardware is different, so execution times will not agree). It is not clear from the paper exactly which settings AMGX is using: is it AMG alone, or AMG as a preconditioner for another iterative method? Is there by any chance a configuration file available for those runs?
Best Regards
Bjorn
I have a Titan V card and I've successfully compiled the code. But when I try to run the example I get the following errors. Does 'invalid device function' mean that Titan V is currently not supported? Thanks.
➜ build git:(master) examples/amgx_capi -m ../examples/matrix.mtx -c ../core/configs/FGMRES_AGGREGATION.json
AMGX version 2.0.0.130-opensource
Built on Mar 16 2018, 21:21:21
Compiled with CUDA Runtime 9.1, using CUDA driver 9.1
Warning: No mode specified, using dDDI by default.
Thrust failure: parallel_for failed: invalid device function
File and line number are not available for this exception.
Caught amgx exception: Cuda failure: 'invalid device function'
@marsaev
What is the current status regarding Windows builds? We have been trying to build the latest AMGX with both VS 2015/2017 and CUDA Toolkit 8/9, with no success. In https://github.com/bnase/AMGX you can find a fork with the latest fixes we have employed in order to build AMGX under Windows. It seems we have solved some, but not all, of the issues.
Could you compile a non-MPI build with MS VS 2017 + CMake 3.14.1 + CUDA 10.0? I can't solve it.
error LNK1104: cannot open file "D:\Users\HIT\Desktop\AMGX-master\build6\base\CMakeFiles\amgx_base.dir\src\Debug\amgx_base_generated_csr_multiply_sm20.cu.obj" amgxsh D:\Users\HIT\Desktop\AMGX-master\build6\LINK 1
error calling a host function("std::_Iterator_base12::_Iterator_base12") from a host device function("std::_Iterator_base12::_Iterator_base12 [subobject]") is not allowed amgx_base C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\include\thrust\system\cuda\detail\assign_value.h 78
Which unit tests are relevant and up to date?
We have tested extensively on different GPUs, with CUDA 8 vs 9.0, and quite a few tests keep failing under Linux. Besides the missing matrices, which we have filled in from either Matrix Market or proper use of generate_poisson, there are some tests that seem basic and keep failing. Find attached the logs of unit tests on an array of GPUs and configurations.
gtx970_9.0.log
gtx970.log
gtx1080ti.log
gtx660ti.log
All unit tests were compiled with CUDA 8.0, except gtx_970_9.0.log. The tests on the 660 Ti were performed after we changed some (possibly too strict? check #8) hardcoded CUDA_ARCH>=3.5 requirements to CUDA_ARCH>=3.0.
We will be more than happy to fix them, as long as we have a list of the tests that are supposed to work, because some seem very outdated or poorly maintained.
It takes hours. I don't know whether this is normal or a bug.
Hi! I am modifying the Poisson's equation solver example to incorporate a variable-coefficient Poisson's equation; however, I am unable to interpret the indexing scheme.
The precise issue is that, according to my understanding, every grid point in my original matrix (for which I am solving the 5-point discretized 2D Poisson's equation) should have 5 coefficients (which should be -1, -1, 4, -1 and -1); however, I find that the number of coefficients varies from 2 to 5 for different grid points.
To understand the example code before modifying it for the variable-coefficient case, I printed the grid-point index and its coefficient value for a simple 4x4 case with a single CPU and GPU process.
I ran the code as follows:
mpirun -np 1 examples/amgx_mpi_poisson5 -mode dDDI -p 4 4 -c ../core/configs/FGMRES_AGGREGATION_JACOBI.json
The output is attached herewith.
output.log
The example code is also attached.
amgx_mpi_poisson5pt.zip
Hello guys.
I am trying to figure out how to get multiple eigenvalues using your example 'eigen_examples/eigensolver.c'. Can it be done with a specific config file, or do I have to modify the code?
I suppose the algorithm works like this: starting from a random vector and modifying it through iterations, it gets closer and closer to an eigenvector.
Then I would have to start from different random vectors to obtain multiple eigenvalues, but that will certainly not give me all of the eigenvalues every time.
Is there any info I can read about these eigensolver algorithms, or do I have to get this information by reading all the code?
Cheers.
When I use amgx_capi to solve some sparse linear systems, it turns out that most of the default configs cannot be loaded through the API. A fallback default config is used instead, and the number of AMG levels is set to 1.
Cannot read file as JSON object, trying as AMGX config
Converting config string to current config version
For some large enough sparse systems, these default settings won't converge.
For instance, ../core/configs/FGMRES_AGGREGATION_JACOBI.json
will be loaded, however, ../core/configs/FGMRES_CLASSICAL_AGGRESSIVE_HMIS.json
and ../core/configs/AMG_CLASSICAL_L1_AGGRESSIVE_HMIS.json
will not.
Hi guys, I am trying to compile a simple program using AMGX. But when I try to initialize AMGX, e.g.:
#include <amgx_c.h>
int main()
{
    AMGX_initialize();
    return 0;
}
I get the error "undefined reference to AMGX_initialize".
What is happening?
Best regards.
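An "undefined reference" at this stage is a linker problem rather than a compiler one: the AMGX shared library (libamgxsh.so, the name that appears in the stack traces above) has to be passed to the linker. A hedged sketch, assuming an install prefix of $HOME/apps/amgx as used elsewhere in these reports (adjust paths to your setup):

```shell
# Hypothetical paths -- set AMGX_DIR to wherever AMGX was installed.
AMGX_DIR=$HOME/apps/amgx
gcc main.c -I"$AMGX_DIR/include" -L"$AMGX_DIR/lib" -lamgxsh -o main
# At run time the shared library must also be locatable:
export LD_LIBRARY_PATH="$AMGX_DIR/lib:$LD_LIBRARY_PATH"
```

The -lamgxsh flag is what resolves AMGX_initialize and the other AMGX_* symbols at link time.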
I tried many times, and can't find useful information on the Internet.
system information:
CUDA 10.0
Cmake cmake-3.13.0-rc3-win64-x64
IDE VS 2017
TOO many errors, shown below! This is driving me mad.
Found OpenMP_C: -openmp
Found OpenMP_CXX: -openmp
Found OpenMP: TRUE
Could NOT find MPI_C (missing: MPI_C_LIB_NAMES MPI_C_HEADER_DIR MPI_C_WORKS)
Could NOT find MPI_CXX (missing: MPI_CXX_LIB_NAMES MPI_CXX_HEADER_DIR MPI_CXX_WORKS)
Could NOT find MPI (missing: MPI_C_FOUND MPI_CXX_FOUND)
This is a MPI build:FALSE
CUDA_TOOLKIT_ROOT_DIR not found or specified
Could NOT find CUDA (missing: CUDA_TOOLKIT_ROOT_DIR CUDA_NVCC_EXECUTABLE CUDA_INCLUDE_DIRS CUDA_CUDART_LIBRARY)
Cuda libraries: CUDA_CUDART_LIBRARY-NOTFOUND
CMake Error at CMakeLists.txt:247 (STRING):
STRING sub-command REGEX, mode REPLACE needs at least 6 arguments total to
command.
CUDA_TOOLKIT_ROOT_DIR not found or specified
Could NOT find CUDA (missing: CUDA_TOOLKIT_ROOT_DIR CUDA_NVCC_EXECUTABLE CUDA_INCLUDE_DIRS CUDA_CUDART_LIBRARY)
CUDA_TOOLKIT_ROOT_DIR not found or specified
Could NOT find CUDA (missing: CUDA_TOOLKIT_ROOT_DIR CUDA_NVCC_EXECUTABLE CUDA_INCLUDE_DIRS CUDA_CUDART_LIBRARY)
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDA_CUDART_LIBRARY (ADVANCED)
linked by target "amgx" in directory D:/Users/HIT/Desktop/AMGX-master
linked by target "amgxsh" in directory D:/Users/HIT/Desktop/AMGX-master
linked by target "amgx_base" in directory D:/Users/HIT/Desktop/AMGX-master/base
linked by target "amgx_core" in directory D:/Users/HIT/Desktop/AMGX-master/core
linked by target "amgx_template_plugin" in directory D:/Users/HIT/Desktop/AMGX-master/template_plugin
linked by target "amgx_eigensolvers" in directory D:/Users/HIT/Desktop/AMGX-master/eigensolvers
linked by target "generate_poisson" in directory D:/Users/HIT/Desktop/AMGX-master/examples
linked by target "generate_poisson7_dist_renum" in directory D:/Users/HIT/Desktop/AMGX-master/examples
linked by target "amgx_tests_library" in directory D:/Users/HIT/Desktop/AMGX-master/tests
linked by target "amgx_tests_launcher" in directory D:/Users/HIT/Desktop/AMGX-master/tests
CUDA_TOOLKIT_INCLUDE (ADVANCED)
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/base
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/base
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/core
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/core
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/core
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/template_plugin
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/template_plugin
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/eigensolvers
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/eigensolvers
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/examples
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/examples
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/examples
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/examples
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/examples
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/examples
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/examples
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/examples
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/eigen_examples
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/eigen_examples
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/eigen_examples
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/eigen_examples
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/tests
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/tests
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/tests
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/tests
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/tests
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/tests
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/tests
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/tests
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/tests
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/tests
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/tests
used as include directory in directory D:/Users/HIT/Desktop/AMGX-master/tests
cublas_library
linked by target "amgx" in directory D:/Users/HIT/Desktop/AMGX-master
linked by target "amgxsh" in directory D:/Users/HIT/Desktop/AMGX-master
cusolver_library
linked by target "amgx" in directory D:/Users/HIT/Desktop/AMGX-master
linked by target "amgxsh" in directory D:/Users/HIT/Desktop/AMGX-master
linked by target "generate_poisson" in directory D:/Users/HIT/Desktop/AMGX-master/examples
linked by target "generate_poisson7_dist_renum" in directory D:/Users/HIT/Desktop/AMGX-master/examples
cusparse_library
linked by target "amgx" in directory D:/Users/HIT/Desktop/AMGX-master
linked by target "amgxsh" in directory D:/Users/HIT/Desktop/AMGX-master
linked by target "generate_poisson" in directory D:/Users/HIT/Desktop/AMGX-master/examples
linked by target "generate_poisson7_dist_renum" in directory D:/Users/HIT/Desktop/AMGX-master/examples
linked by target "amgx_tests_launcher" in directory D:/Users/HIT/Desktop/AMGX-master/tests
Configuring incomplete, errors occurred!
See also "D:/Users/HIT/Desktop/AMGX-master/build/CMakeFiles/CMakeOutput.log".
Yours,
Ning
Hi,
Thanks for open-sourcing AMGX! Looking forward to using it.
Meanwhile, I am trying to build it on my up-to-date Arch Linux laptop with a Quadro and CUDA 9.1. I am aiming for a distributed version, so I do the following:
install_dir=${HOME}/apps/amgx
export OMPI_CC=/opt/cuda/bin/gcc
export OMPI_CXX=/opt/cuda/bin/g++
cmake \
-DCMAKE_INSTALL_PREFIX=$install_dir \
../
I am getting a slightly confusing error about setting CUDA 9.5 flags inside CMakeLists.txt, even though 9.1 was correctly detected.
-- Found CUDA: /opt/cuda (found version "9.1")
Cuda libraries: /opt/cuda/lib64/libcudart_static.a;-lpthread;dl;/usr/lib/librt.so
CMake Error at CMakeLists.txt:247 (message):
Default flags for CUDA 9.5 are not set. Edit CMakeLists.txt to set them
Could you please comment on this and point me in the right direction?
git clone https://github.com/NVIDIA/AMGX.git
cd amgx; mkdir build; cd build; cmake ..
This fails because I don't pass any intended CUDA architectures; when removing CMakeLists.txt:247, cmake runs successfully.
Running 'make' then results in an error at around 7%:
amgx/core/src/classical/interpolators/distance2.cu(1343): error: identifier "sign" is undefined
detected during:
instantiation of "void amgx::distance2::compute_inner_sum_kernel<Value_type,CTA_SIZE,SMEM_SIZE,WARP_SIZE>(int, const int *, const int *, const Value_type *, const int *, const __nv_bool *, const int *, const int *, const int *, const int *, const Value_type *, const int *, Value_type *, int, int *, int *) [with Value_type=double, CTA_SIZE=256, SMEM_SIZE=128, WARP_SIZE=32]"
(2418): here
instantiation of "void amgx::Distance2_Interpolator<amgx::TemplateConfig<(AMGX_MemorySpace)1, t_vecPrec, t_matPrec, t_indPrec>>::generateInterpolationMatrix_1x1(amgx::Distance2_Interpolator<amgx::TemplateConfig<(AMGX_MemorySpace)1, t_vecPrec, t_matPrec, t_indPrec>>::Matrix_d &, amgx::Distance2_Interpolator<amgx::TemplateConfig<(AMGX_MemorySpace)1, t_vecPrec, t_matPrec, t_indPrec>>::IntVector &, amgx::Distance2_Interpolator<amgx::TemplateConfig<(AMGX_MemorySpace)1, t_vecPrec, t_matPrec, t_indPrec>>::BVector &, amgx::Distance2_Interpolator<amgx::TemplateConfig<(AMGX_MemorySpace)1, t_vecPrec, t_matPrec, t_indPrec>>::IntVector &, amgx::Distance2_Interpolator<amgx::TemplateConfig<(AMGX_MemorySpace)1, t_vecPrec, t_matPrec, t_indPrec>>::Matrix_d &, void *) [with t_vecPrec=(AMGX_VecPrecision)0, t_matPrec=(AMGX_MatPrecision)0, t_indPrec=(AMGX_IndPrecision)2]"
(2521): here
Line 1343 indeed calls the function sign(), which is defined, but only as a device function.
I do have CUDA 10 installed, but cuda-9.2 precedes cuda-10 in $PATH.
base/include/amgx_timer.h uses undefined variables (t1 and t2) in code that is activated during a Debug build.
AMGX/base/include/amgx_timer.h
Line 233 in 732338c
Where possible. Probably better to wait for CUDA 9.2 release for updated cuSolver functionality.
The GMRES solver produces a segfault when preconditioner="NOSOLVER":
[atrikut@node1144 examples]$ cat ../configs/core/GMRES.json
{
"config_version": 2,
"solver": {
"preconditioner": {
"scope": "amg",
"solver": "NOSOLVER"
},
"use_scalar_norm": 1,
"solver": "GMRES",
"print_solve_stats": 1,
"obtain_timings": 1,
"monitor_residual": 1,
"convergence": "RELATIVE_INI_CORE",
"scope": "main",
"tolerance": 1e-6,
"norm": "L2"
}
}
[atrikut@node1144 examples]$ ./amgx_capi -m matrix.mtx -c ../configs/core/GMRES.json
AMGX version 2.0.0.130-opensource
Built on May 17 2018, 05:29:30
Compiled with CUDA Runtime 8.0, using CUDA driver 9.0
Warning: No mode specified, using dDDI by default.
Reading data...
RHS vector was not found. Using RHS b=[1,…,1]^T
Solution vector was not found. Setting initial solution to x=[0,…,0]^T
Finished reading
iter Mem Usage (GB) residual rate
--------------------------------------------------------------
Ini 0 3.464102e+00
0 0 1.845471e+00 0.5327
1 0.0000 1.541877e+00 0.8355
2 0.0000 1.374225e+00 0.8913
3 0.0000 1.366903e+00 0.9947
4 0.0000 1.040855e+00 0.7615
5 0.0000 1.026638e+00 0.9863
6 0.0000 8.614123e-01 0.8391
7 0.0000 6.599583e-01 0.7661
8 0.0000 6.596676e-01 0.9996
9 0.0000 6.593714e-01 0.9996
10 0.0000 6.331763e-01 0.9603
Caught signal 11 - SIGSEGV (segmentation violation)
/home/atrikut/local/AMGX/build/libamgxsh.so : amgx::handle_signals(int)+0xbb
/lib64/libpthread.so.0 : ()+0xf680
/home/atrikut/local/AMGX/build/libamgxsh.so : amgx::Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::solve(amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, bool)+0x11
/home/atrikut/local/AMGX/build/libamgxsh.so : amgx::GMRES_Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::solve_iteration(amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, bool)+0x5d2
/home/atrikut/local/AMGX/build/libamgxsh.so : amgx::Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::solve(amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, bool)+0x554
/home/atrikut/local/AMGX/build/libamgxsh.so : amgx::Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::solve_no_throw(amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::AMGX_STATUS&, bool)+0x77
/home/atrikut/local/AMGX/build/libamgxsh.so : amgx::AMG_Solver<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >::solve(amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::Vector<amgx::TemplateConfig<(AMGX_MemorySpace)1, (AMGX_VecPrecision)0, (AMGX_MatPrecision)0, (AMGX_IndPrecision)2> >&, amgx::AMGX_STATUS&, bool)+0x3d
/home/atrikut/local/AMGX/build/libamgxsh.so : amgx::AMGX_ERROR amgx::(anonymous namespace)::solve_with<(AMGX_Mode)8193, amgx::AMG_Solver, amgx::Vector>(AMGX_solver_handle_struct*, AMGX_vector_handle_struct*, AMGX_vector_handle_struct*, amgx::Resources*, bool)+0x268
/home/atrikut/local/AMGX/build/libamgxsh.so : AMGX_solver_solve()+0x147
./amgx_capi : main()+0x34d
/lib64/libc.so.6 : __libc_start_main()+0xf5
./amgx_capi() [0x402233]
Segmentation fault (core dumped)
But when the preconditioner is not NOSOLVER:
[atrikut@node1144 examples]$ cat ../configs/core/GMRES.json
{
"config_version": 2,
"solver": {
"preconditioner": {
"scope": "amg",
"solver": "BLOCK_JACOBI"
},
"use_scalar_norm": 1,
"solver": "GMRES",
"print_solve_stats": 1,
"obtain_timings": 1,
"monitor_residual": 1,
"convergence": "RELATIVE_INI_CORE",
"scope": "main",
"tolerance": 1e-6,
"norm": "L2"
}
}
[atrikut@node1144 examples]$ ./amgx_capi -m matrix.mtx -c ../configs/core/GMRES.json
AMGX version 2.0.0.130-opensource
Built on May 17 2018, 05:29:30
Compiled with CUDA Runtime 8.0, using CUDA driver 9.0
Warning: No mode specified, using dDDI by default.
Reading data...
RHS vector was not found. Using RHS b=[1,…,1]^T
Solution vector was not found. Setting initial solution to x=[0,…,0]^T
Finished reading
iter Mem Usage (GB) residual rate
--------------------------------------------------------------
Ini 0 3.464102e+00
0 0 1.934535e+00 0.5585
1 0.0000 1.934535e+00 1.0000
2 0.0000 1.934535e+00 1.0000
3 0.0000 1.644110e+00 0.8499
4 0.0000 1.507196e+00 0.9167
5 0.0000 1.213641e+00 0.8052
6 0.0000 1.160986e+00 0.9566
7 0.0000 1.092385e+00 0.9409
8 0.0000 1.088166e+00 0.9961
9 0.0000 8.810365e-01 0.8097
10 0.0000 4.990786e-01 0.5665
11 0.0000 4.764949e-01 0.9547
12 0.0000 1.059929e-01 0.2224
13 0.0000 1.677930e-15 0.0000
--------------------------------------------------------------
Total Iterations: 14
Avg Convergence Rate: 0.0806
Final Residual: 1.677930e-15
Total Reduction in Residual: 4.843766e-16
Maximum Memory Usage: 0.000 GB
--------------------------------------------------------------
Total Time: 0.0461597
setup: 0.000248704 s
solve: 0.045911 s
solve(per iteration): 0.00327936 s
[atrikut@node1144 examples]$
Hi all,
How can I generate a bigger problem size instead of the sample input "matrix.mtx"?
Thanks
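One option, sketched from what already appears in these reports (the build produces a generate_poisson tool, and the MPI Poisson examples accept a grid size through -p), is to generate a larger Poisson problem instead of reading matrix.mtx. Treat the exact paths as assumptions:

```shell
# Larger 2D Poisson problem: a 256 x 256 grid instead of the 4 x 4
# used in the earlier report. Paths assume a run from the build tree.
mpirun -np 1 examples/amgx_mpi_poisson5 -mode dDDI -p 256 256 \
    -c ../core/configs/FGMRES_AGGREGATION_JACOBI.json
```

Increasing the two -p arguments scales the number of unknowns quadratically, so the problem size grows quickly.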
Is there currently a way to solve a system A*X=B, where X and B are not single vectors but multiple vectors, at once? The only part of the documentation I could find that approaches this topic talks about computing multiple right-hand sides in succession.
There are certainly applications where prior knowledge of multiple right-hand-side vectors b^i is given, and computing the solutions to them simultaneously could lead to efficiency gains.
Hello, I'm working on a Python interface to AMGX:
https://github.com/shwina/pyamgx
The project is new, but there is enough to set up and solve a system (single GPU). I'm looking for any advice/comments on the API and desired features. Thanks!
I was using VS 2015 Update 3 and CMake 3.13.1 to compile. After running into the problem in https://github.com/NVIDIA/AMGX/issues/36 and solving it, I got another problem while building:
1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\include\thrust/detail/allocator/allocator_traits.inl(230): error : calling a __host__ function("std::_Iterator_base12::_Iterator_base12") from a __host__ __device__ function("std::_Iterator_base12::_Iterator_base12 [subobject]") is not allowed
1>
1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\include\thrust/detail/allocator/allocator_traits.inl(230): error : calling a __host__ function("std::_Iterator_base12::~_Iterator_base12") from a __host__ __device__ function("std::_Iterator_base12::~_Iterator_base12 [subobject]") is not allowed
1>
1> 2 errors detected in the compilation of "C:/Users/i/AppData/Local/Temp/tmpxft_00005b44_00000000-10_comms_mpi_hostbuffer_stream.cpp1.ii".
1> comms_mpi_hostbuffer_stream.cu
1> CMake Error at amgx_base_generated_comms_mpi_hostbuffer_stream.cu.obj.Debug.cmake:283 (message):
1> Error generating file
1> C:/AMGX-master/build/base/CMakeFiles/amgx_base.dir/src/distributed/Debug/amgx_base_generated_comms_mpi_hostbuffer_stream.cu.obj
and
2>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\include\thrust/system/cuda/detail/assign_value.h(78): error : calling a __host__ function("std::_Iterator_base12::_Iterator_base12") from a __host__ __device__ function("std::_Iterator_base12::_Iterator_base12 [subobject]") is not allowed
2>
2>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\include\thrust/system/cuda/detail/assign_value.h(78): error : calling a __host__ function("std::_Iterator_base12::~_Iterator_base12") from a __host__ __device__ function("std::_Iterator_base12::~_Iterator_base12 [subobject]") is not allowed
2>
2> 2 errors detected in the compilation of "C:/Users/i/AppData/Local/Temp/tmpxft_00002bd0_00000001-10_matrix_analysis.cpp1.ii".
2> matrix_analysis.cu
2> CMake Error at amgx_core_generated_matrix_analysis.cu.obj.Debug.cmake:283 (message):
2> Error generating file
2> C:/AMGX-master/build/core/CMakeFiles/amgx_core.dir/src/Debug/amgx_core_generated_matrix_analysis.cu.obj
I was really confused that these errors occurred in the CUDA Thrust header files, and I couldn't figure it out.
I wonder if anyone else has met the same problem.
Hi,
Is anyone from the developers' team, or any user, at GTC 2018 in San Jose?
I would be interested in meeting developers / other users to better understand the road map of AMGX and to share experiences in using AMGX.
Best Regards,
Andrea Borsic
Hello,
In 2015, an AMGX adapter was created within Trilinos's MueLu package [E. Furst, A. Prokopenko, J. Hu, 'Creating an AMGX adapter within the MueLu package', CCR Summer Proceedings 2015]. It is, however, restricted to a single GPU. Do you know whether work is currently being done (or planned) to make it work with more than a single GPU?
Thank you very much.
With best regards,
Serge
Hello, I can't sort out how to extract the eigenvalues and eigenvectors from the AMGX_eigensolver_handle. I tried following the eigen_example; however, it is not clear how to get this information from the example.
Thank you,
Miguel