dmalhotra / pvfmm
A parallel kernel-independent FMM library for particle and volume potentials
Home Page: http://pvfmm.org
License: GNU Lesser General Public License v3.0
It seems that when one tries to auto-download PVFMM with FetchContent_Declare() in CMake, the resulting call to make is broken because pvfmm_config.h is not found: ${CMAKE_CURRENT_SOURCE_DIR} is not added through target_include_directories(), but the root-level CMakeLists.txt writes pvfmm_config.h to the root directory (seen here).
There seem to be two possible fixes:
1. If pvfmm_config.h in the project root is relevant to other parts of the build, simply adding
configure_file(pvfmm_config.h.in include/pvfmm_config.h @ONLY)
below line 70 addresses the problem in a non-elegant but sufficient way. This is probably the least-effort solution and makes some sense: files in PVFMM explicitly depend on pvfmm_config.h, so maybe it should live in include/ anyway.
2. Explicitly adding something like ${PROJECT_SOURCE_DIR} to the target_include_directories() calls also works, but this could have problematic effects on the install() calls if it is written without the proper generator expressions to switch locations between the build and install interfaces.
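A sketch of fix 2 with the generator expressions mentioned above. The target name pvfmm and the exact directories are assumptions for illustration, not necessarily what PVFMM's CMakeLists uses:

```cmake
# Hypothetical sketch: expose the directory containing the generated
# pvfmm_config.h only from the build tree, and the installed headers
# after install(), so install(EXPORT ...) does not embed build paths.
target_include_directories(pvfmm PUBLIC
  $<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}>          # generated pvfmm_config.h
  $<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include>  # regular headers
  $<INSTALL_INTERFACE:include>)                     # path after installation
```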
A minimal example to reproduce the error:
Root-level CMakeLists.txt:
cmake_minimum_required(VERSION 3.1)
set(CMAKE_CXX_STANDARD 14)
project(test)
include(cmake/PVFMM.cmake)
In cmake/PVFMM.cmake:
if(NOT TARGET PVFMM::PVFMM)
  include(FetchContent)
  FetchContent_Declare(
    PVFMM
    SOURCE_DIR ${CMAKE_BINARY_DIR}/_deps/PVFMM
    BINARY_DIR ${CMAKE_BINARY_DIR}/_deps/PVFMM
    GIT_REPOSITORY https://github.com/dmalhotra/pvfmm.git
    GIT_TAG ffec8376dac7e2df134e56c1a37f22051ec483bb
    GIT_SHALLOW TRUE
  )
  set(CMAKE_BUILD_TYPE Release CACHE INTERNAL "Release or debug mode")
  FetchContent_GetProperties(PVFMM)
  if(NOT pvfmm_POPULATED)
    FetchContent_Populate(PVFMM)
    message("pvfmm_SOURCE_DIR " ${pvfmm_SOURCE_DIR} " " ${pvfmm_BINARY_DIR})
    set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} ${CMAKE_CURRENT_SOURCE_DIR}/cmake)
    add_subdirectory(${pvfmm_SOURCE_DIR} ${pvfmm_BINARY_DIR})
    add_library(PVFMM::PVFMM INTERFACE IMPORTED)
    target_include_directories(PVFMM::PVFMM SYSTEM INTERFACE
      ${pvfmm_SOURCE_DIR}/include)
    target_link_libraries(PVFMM::PVFMM INTERFACE
      ${pvfmm_BINARY_DIR}/lib/libpvfmm.a)
  endif()
endif()
Then, in the root directory, run:
mkdir build
cd build
cmake ..
make
Thanks!!
When I compile the library, it cannot find sctl.hpp. How can I solve this issue?
/home/alex/dev/pvfmm/include/pvfmm_common.hpp:64:10: fatal error: sctl.hpp: No such file or directory
64 | #include <sctl.hpp>
| ^~~~~~~~~~
compilation terminated.
I've tried to compile the code using g++ (Debian 6.3.0-6), but it failed when the USE_SSE variable was defined, with the following message:
In file included from /home/uranix/pvfmm-dev/include/kernel.hpp:202:0,
from /home/uranix/pvfmm-dev/include/cheb_utils.hpp:12,
from /home/uranix/pvfmm-dev/src/cheb_utils.cpp:8:
/home/uranix/pvfmm-dev/include/kernel.txx: In function ‘void pvfmm::{anonymous}::stokesStressSSE(int, int, const double*, const double*, const double*, const double*, const double*, const double*, const double*, double*)’:
/home/uranix/pvfmm-dev/include/kernel.txx:2043:32: error: ‘T’ was not declared in this scope
double r = pvfmm::sqrt<T>(r2);
^
/home/uranix/pvfmm-dev/include/kernel.txx:2043:37: error: no matching function for call to ‘sqrt(double&)’
double r = pvfmm::sqrt<T>(r2);
^
In file included from /home/uranix/pvfmm-dev/include/pvfmm_common.hpp:62:0,
from /home/uranix/pvfmm-dev/include/cheb_utils.hpp:10,
from /home/uranix/pvfmm-dev/src/cheb_utils.cpp:8:
/home/uranix/pvfmm-dev/include/math_utils.hpp:29:15: note: candidate: template<class Real_t> Real_t pvfmm::sqrt(Real_t)
inline Real_t sqrt(const Real_t a){return ::sqrt(a);}
It looks like T was a template argument, but the function was later hardcoded to double while the body still refers to T.
The code does not compile with the standard setup sequence (autogen, configure, make) using Intel Composer XE 2015.0 on Ubuntu; it fails with a strange error possibly related to MIC.
The compiler log is attached:
output.txt
Dear All,
I am new to PVFMM and trying to use a non-symmetric kernel.
As a first test, I changed the laplace_potent_kernel to be non-symmetric (as given in the diff below).
Running the classic example1 with the potential kernel
const pvfmm::Kernel<double>& kernel_fn=pvfmm::LaplaceKernel<double>::potential();
tells me that the kernel is non-symmetric and then segfaults in the U2U part:
InitFMM_Pts {
no-symmetry for: laplace
LoadMatrices {
ReadFile {
}
Broadcast {
}
}
//some other output...
RunFMM {
UpwardPass {
S2U {
}
U2U {
*** Process received signal ***
Signal: Segmentation fault: 11 (11)
Signal code: (0)
Failing at address: 0x0
*** End of error message ***
Segmentation fault: 11
Do you have any tips or ideas about what is going on? Do you support non-symmetric kernels?
diff --git a/include/kernel.txx b/include/kernel.txx
index 7867086..f919a8b 100755
--- a/include/kernel.txx
+++ b/include/kernel.txx
@@ -1108,7 +1108,7 @@ void laplace_poten_uKernel(Matrix<Real_t>& src_coord, Matrix<Real_t>& src_value,
Vec_t r2= mul_intrin(dx,dx) ;
r2=add_intrin(r2,mul_intrin(dy,dy));
- r2=add_intrin(r2,mul_intrin(dz,dz));
+ //r2=add_intrin(r2,mul_intrin(dz,dz));
Vec_t rinv=RSQRT_INTRIN(r2);
tv=add_intrin(tv,mul_intrin(rinv,sv));
@@ -1405,8 +1405,8 @@ void laplace_grad(T* r_src, int src_cnt, T* v_src, int dof, T* r_trg, int trg_cn
template<class T> const Kernel<T>& LaplaceKernel<T>::potential(){
- static Kernel<T> potn_ker=BuildKernel<T, laplace_poten<T,1>, laplace_dbl_poten<T,1> >("laplace" , 3, std::pair<int,int>(1,1),
- NULL,NULL,NULL, NULL,NULL,NULL, NULL,NULL, &laplace_vol_poten<T>);
+ static Kernel<T> potn_ker=BuildKernel<T, laplace_poten<T,1> >("laplace" , 3, std::pair<int,int>(1,1),
+ NULL,NULL,NULL, NULL,NULL,NULL, NULL,NULL,NULL,false);
return potn_ker;
}
template<class T> const Kernel<T>& LaplaceKernel<T>::gradient(){
@@ -1418,8 +1418,8 @@ template<class T> const Kernel<T>& LaplaceKernel<T>::gradient(){
template<> inline const Kernel<double>& LaplaceKernel<double>::potential(){
typedef double T;
- static Kernel<T> potn_ker=BuildKernel<T, laplace_poten<T,2>, laplace_dbl_poten<T,2> >("laplace" , 3, std::pair<int,int>(1,1),
- NULL,NULL,NULL, NULL,NULL,NULL, NULL,NULL, &laplace_vol_poten<double>);
+ static Kernel<T> potn_ker=BuildKernel<T, laplace_poten<T,2> >("laplace" , 3, std::pair<int,int>(1,1),
+ NULL,NULL,NULL, NULL,NULL,NULL, NULL,NULL,NULL,false);
return potn_ker;
}
template<> inline const Kernel<double>& LaplaceKernel<double>::gradient(){
PS: I am running on macOS and compiled with
./configure MPICXX=mpic++ CXX=icpc CC=icc F77=ifort CXXFLAGS="-mavx -g -std=c++11" CFLAGS="-mavx -g" FFLAGS="-mavx -g" --prefix=/opt/intel/intel_lib/pvfmm-1.0.0 --with-openmp-flag="qopenmp" --with-fftw-include="${FFTW_INC}" --with-fftw-lib="-mkl" --with-blas="-mkl" --with-lapack="-mkl" --disable-doxygen-doc --disable-doxygen-dot --disable-doxygen-html
I'm not sure how to fix it, but your SVD algorithm isn't incrementing k0 in some cases; it then enters an infinite loop.
First found here: http://stackoverflow.com/questions/3856072/single-value-decomposition-implementation-c/25291714#25291714
Dear all,
I am starting with PVFMM and encountered an issue while trying to change the kernel evaluation. I tried to replace the evaluation of rinv = RSQRT_INTRIN with another provided intrinsic, rsqrt_approx_intrin (which, to my understanding, should be more accurate).
I first ran the program without any modification (downloaded 5 March 2018), and it goes through the example1 test smoothly:
./bin/exemple1 -N 4096
Maximum Absolute Error:2.47002e-05
Maximum Relative Error:4.5345e-09
Then, changing the evaluation of the rinv intrinsic in the kernel makes the algorithm blow up (after a full make clean):
./bin/exemple1 -N 4096
Maximum Absolute Error:3.07983e+09
Maximum Relative Error:565400
And running the potential kernel only gives me an error in the precomputation of the boundary conditions:
Cheb_Integ::Failed to converge.[6.93278e-09,-0.975,-0.975,-0.975]
Cheb_Integ::Failed to converge.[6.40075e-10,-0.975,-0.647222,-0.975]
Cheb_Integ::Failed to converge.[2.07315e-09,-0.975,-0.647222,-0.647222]
Cheb_Integ::Failed to converge.[2.08389e-09,-0.975,-0.647222,-0.319444]
...
Cheb_Integ::Failed to converge.[4.40053e-10,-0.975,0.00833333,0.663889]
Cheb_Integ::Failed to converge.[3.17001e-09,-0.975,0.00833333,0.991667]
Cheb_Integ::Failed to converge.[2.80505e-09,-0.975,0.00833333,1.31944]
Cheb_Integ::Failed to converge.[4.60546e-10,-0.975,0.00833333,1.64722]
Cheb_Integ::Failed to converge.[7.76104e-10,-0.975,0.336111,-0.975]
Cheb_Integ::Failed to converge.[8.40628e-11,-0.975,0.336111,-0.647222]
Cheb_Integ::Failed to converge.[1.85214e-09,-0.975,0.336111,-0.319444]
Cheb_Integ::Failed to converge.[4.40053e-10,-0.975,0.336111,0.00833333]
Here is the git diff of the downloaded code. I checked the kernel, and the values only differ from each other by 1e-4 to 1e-6.
Do you have a solution to this issue?
diff --git a/include/kernel.txx b/include/kernel.txx
index 7867086..5abd5df 100755
--- a/include/kernel.txx
+++ b/include/kernel.txx
@@ -1088,7 +1088,7 @@ void laplace_poten_uKernel(Matrix<Real_t>& src_coord, Matrix<Real_t>& src_value,
for(int i=0;i<NWTN_ITER;i++){
nwtn_scal=2*nwtn_scal*nwtn_scal*nwtn_scal;
}
- const Real_t OOFP = 1.0/(4*nwtn_scal*const_pi<Real_t>());
+ const Real_t OOFP = 1.0/(4*const_pi<Real_t>());
size_t src_cnt_=src_coord.Dim(1);
size_t trg_cnt_=trg_coord.Dim(1);
@@ -1110,7 +1110,7 @@ void laplace_poten_uKernel(Matrix<Real_t>& src_coord, Matrix<Real_t>& src_value,
r2=add_intrin(r2,mul_intrin(dy,dy));
r2=add_intrin(r2,mul_intrin(dz,dz));
- Vec_t rinv=RSQRT_INTRIN(r2);
+ Vec_t rinv=rsqrt_approx_intrin(r2);
tv=add_intrin(tv,mul_intrin(rinv,sv));
}
Vec_t oofp=set_intrin<Vec_t,Real_t>(OOFP);
EDIT
When using the potential kernel only, removing the scale-invariance boolean and the dbl_layer kernel let me reduce the error:
@@ -1405,8 +1405,8 @@ void laplace_grad(T* r_src, int src_cnt, T* v_src, int dof, T* r_trg, int trg_cn
template<class T> const Kernel<T>& LaplaceKernel<T>::potential(){
- static Kernel<T> potn_ker=BuildKernel<T, laplace_poten<T,1>, laplace_dbl_poten<T,1> >("laplace" , 3, std::pair<int,int>(1,1),
- NULL,NULL,NULL, NULL,NULL,NULL, NULL,NULL, &laplace_vol_poten<T>);
+ static Kernel<T> potn_ker=BuildKernel<T, laplace_poten<T,1> >("laplace" , 3, std::pair<int,int>(1,1),
+ NULL,NULL,NULL, NULL,NULL,NULL, NULL,NULL,NULL,false);
return potn_ker;
}
template<class T> const Kernel<T>& LaplaceKernel<T>::gradient(){
@@ -1418,8 +1418,8 @@ template<class T> const Kernel<T>& LaplaceKernel<T>::gradient(){
template<> inline const Kernel<double>& LaplaceKernel<double>::potential(){
typedef double T;
- static Kernel<T> potn_ker=BuildKernel<T, laplace_poten<T,2>, laplace_dbl_poten<T,2> >("laplace" , 3, std::pair<int,int>(1,1),
- NULL,NULL,NULL, NULL,NULL,NULL, NULL,NULL, &laplace_vol_poten<double>);
+ static Kernel<T> potn_ker=BuildKernel<T, laplace_poten<T,2> >("laplace" , 3, std::pair<int,int>(1,1),
+ NULL,NULL,NULL, NULL,NULL,NULL, NULL,NULL,NULL,false);
return potn_ker;
}
template<> inline const Kernel<double>& LaplaceKernel<double>::gradient(){
gives
Maximum Absolute Error:21.2006
Maximum Relative Error:0.0541196
PS: I am running on macOS and compiled with
./configure MPICXX=mpic++ CXX=icpc CC=icc F77=ifort CXXFLAGS="-mavx -g -std=c++11" CFLAGS="-mavx -g" FFLAGS="-mavx -g" --with-openmp-flag="qopenmp" --with-fftw-include="${FFTW_INC}" --with-fftw-lib="-mkl" --with-blas="-mkl" --with-lapack="-mkl" --disable-doxygen-doc --disable-doxygen-dot --disable-doxygen-html
where FFTW_INC refers to the include dir of MKL/FFTW.
I'm working on a benchmark that repeatedly calls PVFMM to evaluate a Biot-Savart point-to-point kernel as part of a Runge-Kutta time integration, with the source and target points and potentials changing between each invocation. Are there any particular strategies, documentation, or examples on how to cut down on or reuse initialization (tree/matrix initialization, tree construction, or other setup overheads) between successive invocations? Is this, for example, what fmm_pts.cpp is doing?
When running example2 with the parameters
./examples/bin/example2 -m 2 -q 0 -N 669
the example does not terminate. Instead there is just a single thread constantly allocating memory until the compute node runs out of memory (which in my case would be about 192 GB) and crashes.
When I am running the example like
./examples/bin/example2 -m 2 -q 0 -N 668
this behaviour does not occur. Instead the example terminates almost instantly and needs about 400 MB of memory.
Is this a bug, or am I missing something obvious here? The difference in the memory requirements seems extreme given the two scenarios. I have encountered this behaviour on two separate systems (as I wanted to see whether more memory would fix it).
If it is of any help, here is the configuration I used:
I am using the latest commit (6cd67bd) and built pvfmm with CUDA support:
./configure --with-cuda=/usr/local/cuda
Output of the configuration:
pvfmm-lib-configuration.txt
I can't seem to reopen #5 myself, so I opened a new one.
Bad news. Still seems to be happening with the following matrix:
{ {7044.7734691220212, 0, 0}, {0, -1.284570679187241e-322, 57.264113734770199}, {0, 0, 0} }
Hi, I am trying to compile your code and hit a segmentation fault when compiling with CUDA.
Software:
Centos 7, mpicxx(openmpi 1.10.0 + gcc 4.8.5), cuda-7.5, nvidia-driver 367.35
Hardware:
Xeon E5 2643 V3 x2, 128GB mem
pvfmm cloned from the github repo.
When compiled CPU-only, the examples run smoothly. When I configure it with CUDA, for example:
./configure MPICXX=/usr/lib64/openmpi/bin/mpicxx --prefix=/home_local/wyan_local/software/PVFMM/install --with-cuda=/usr/local/cuda
the examples throw a segmentation fault. For example, with 1 OpenMP thread:
example1 -N 512
gives
W-List {
}
U-List {
}
V-List {
}
D2H_Wait:LocExp {
Segmentation fault (core dumped)
I looked at the code a bit, and it seems the loop copying dev_ptr to host_ptr at line 681 in fmm_tree.txx causes the segmentation fault:
Profile::Tic("D2H_Wait:LocExp",this->Comm(),false,5);
if(device) if(setup_data[0+MAX_DEPTH*2].output_data!=NULL){
  Real_t* dev_ptr=(Real_t*)&fmm_mat->staging_buffer[0];
  Matrix<Real_t>& output_data=*setup_data[0+MAX_DEPTH*2].output_data;
  size_t n=output_data.Dim(0)*output_data.Dim(1);
  Real_t* host_ptr=output_data[0];
  output_data.Device2HostWait();
  #pragma omp parallel for
  for(size_t i=0;i<n;i++){
    host_ptr[i]+=dev_ptr[i];
  }
}
I have tried moving from OpenMPI 1.10 to 2.0 (latest), and to the latest MPICH. I have also configured pvfmm with different gcc/nvcc compile flags, from '-g -O0' to '-O2', to '-mtune=native' and '-gencode arch=compute_52,code=sm_52'. All combinations give the same segmentation fault.
Could you please help me locate the problem?
Thank you,