paboyle / Grid
Data parallel C++ mathematical object library
License: GNU General Public License v2.0
CXX Application.o
In file included from ../../../extras/Hadrons/Application.cc:31:
/dirac1/work/x03/paboyle/Grid-bgq/Grid/include/Grid/Hadrons/GeneticScheduler.hpp:169:21: error: no member named 'emplace' in 'std::multimap<int, std::vector<unsigned int, std::allocator >, std::less, std::allocator<std::pair<const int,
std::vector<unsigned int, std::allocator > > > >'
population_.emplace(func_(p), p);
~~~~~~~~~~~ ^
/dirac1/work/x03/paboyle/Grid-bgq/Grid/include/Grid/Hadrons/GeneticScheduler.hpp:132:9: note: in instantiation of member function 'Grid::Hadrons::GeneticScheduler::initPopulation' requested here
initPopulation();
^
../../../extras/Hadrons/Application.cc:210:23: note: in instantiation of member function 'Grid::Hadrons::GeneticScheduler::nextGeneration' requested here
scheduler.nextGeneration();
^
In file included from ../../../extras/Hadrons/Application.cc:31:
/dirac1/work/x03/paboyle/Grid-bgq/Grid/include/Grid/Hadrons/GeneticScheduler.hpp:203:25: error: no member named 'emplace' in 'std::multimap<int, std::vector<unsigned int, std::allocator >, std::less, std::allocator<std::pair<const int,
std::vector<unsigned int, std::allocator > > > >'
population_.emplace(func_(m), m);
~~~~~~~~~~~ ^
/dirac1/work/x03/paboyle/Grid-bgq/Grid/include/Grid/Hadrons/GeneticScheduler.hpp:140:9: note: in instantiation of member function 'Grid::Hadrons::GeneticScheduler::doMutation' requested here
doMutation();
^
../../../extras/Hadrons/Application.cc:210:23: note: in instantiation of member function 'Grid::Hadrons::GeneticScheduler::nextGeneration' requested here
scheduler.nextGeneration();
^
In file included from ../../../extras/Hadrons/Application.cc:31:
/dirac1/work/x03/paboyle/Grid-bgq/Grid/include/Grid/Hadrons/GeneticScheduler.hpp:183:21: error: no member named 'emplace' in 'std::multimap<int, std::vector<unsigned int, std::allocator >, std::less, std::allocator<std::pair<const int,
std::vector<unsigned int, std::allocator > > > >'
population_.emplace(func_(c1), c1);
~~~~~~~~~~~ ^
/dirac1/work/x03/paboyle/Grid-bgq/Grid/include/Grid/Hadrons/GeneticScheduler.hpp:148:9: note: in instantiation of member function 'Grid::Hadrons::GeneticScheduler::doCrossover' requested here
doCrossover();
^
../../../extras/Hadrons/Application.cc:210:23: note: in instantiation of member function 'Grid::Hadrons::GeneticScheduler::nextGeneration' requested here
scheduler.nextGeneration();
^
In file included from ../../../extras/Hadrons/Application.cc:31:
/dirac1/work/x03/paboyle/Grid-bgq/Grid/include/Grid/Hadrons/GeneticScheduler.hpp:184:21: error: no member named 'emplace' in 'std::multimap<int, std::vector<unsigned int, std::allocator >, std::less, std::allocator<std::pair<const int,
std::vector<unsigned int, std::allocator > > > >'
population_.emplace(func_(c2), c2);
~~~~~~~~~~~ ^
4 errors generated.
make[2]: *** [Application.o] Error 1
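For anyone hitting this: `emplace` on `std::multimap` is a C++11 addition that older libstdc++ releases (such as the one the BG/Q toolchain appears to use) never implemented. A portable workaround, sketched here with stand-in types rather than the actual GeneticScheduler ones, is to spell it with `insert`:

```cpp
#include <map>
#include <utility>
#include <vector>

// Stand-in for the scheduler's genome type (hypothetical name).
using Gene = std::vector<unsigned int>;

// insert(std::make_pair(...)) is the C++03-compatible equivalent of
// population.emplace(fitness, g); it works on old libstdc++ as well.
void addIndividual(std::multimap<int, Gene> &population, int fitness, const Gene &g) {
    // population.emplace(fitness, g);        // unsupported on old libstdc++
    population.insert(std::make_pair(fitness, g));  // portable spelling
}
```

The only cost is a possible extra copy of the value, which `emplace` would avoid; for a scheduler population this is unlikely to matter.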
We need to think about a sensible interface providing things like conserved currents etc.,
abstracting the differences between 5D formulations (DWF, Mobius, ContFrac, PartFrac)
and Wilson and other 4D approaches.
We also need to think about the interface to 4D props and sources.
Antonin has done some of this in his measurement code, but we need to standardise and include it.
Hi,
Do we really want to do this colour thing? I see many cons:
Pros: ??
Let me know what you think
It seems that the configure flag --enable-lapack just checks whether liblapack is present. It would be good if it also checked for MKL and continued if that is found, perhaps by test-compiling a small file that makes a LAPACK call with and without trailing underscores, and continuing if either succeeds.
I suggest adding a few more tests to the Travis CI:
Hi,
As the code is available publicly and will certainly receive attention, I think we should license it properly. There is already a GPLv3 text distributed in COPYING. It would be safer to mark all source files as prescribed by the FSF guys. I suggest that we add a license header to each individual source file. I propose the following header:
/*
* <filename>.cc, part of Grid
*
* Copyright (C) 2015 <author list>
*
* Grid is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Grid is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Grid. If not, see <http://www.gnu.org/licenses/>.
*/
Please tell me if you think another license is more suitable.
We should try to go through the different feature branches and see which ones are fully integrated into develop
and should be removed. I have spotted the following ones:
knl-stats
hirep
Please advise if the development of these features is considered finished.
Hi,
Recently my attention was attracted to the following extension of git to manage the development flow of a project (features, stable/unstable branches, ...): http://danielkummer.github.io/git-flow-cheatsheet/
I thought it was worth considering given the recent discussion and Guido's suggestion (which I completely support) of some form of quality control and maybe milestones.
Let me know what you think.
I am running on Cori and it crashed with the error:
NOARCH.splanc.x: ../../../src/Grid/lib/communicator/Communicator_base.cc:49: void *Grid::CartesianCommunicator::ShmBufferMalloc(unsigned long): Assertion `heap_bytes<MAX_MPI_SHM_BYTES' failed.
I have Grid configured with standard MPI comms, so by my understanding these shared buffer allocations for the comms buffers are not needed. However, it appears that Stencil.h doesn't check the MPI mode. It would be great if Stencil checked whether Grid is configured to use the hybrid MPI and, if not, just did regular allocs.
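A possible shape for the requested behaviour; the macro name and helper are hypothetical, not Grid's actual configuration symbols:

```cpp
#include <cstddef>
#include <cstdlib>

// Sketch: draw from the bounded shared-memory pool only when Grid was
// configured for the hybrid shm+MPI transport; otherwise do an ordinary
// heap allocation. GRID_COMMS_SHM and commsBufferAlloc are hypothetical.
inline void *commsBufferAlloc(std::size_t bytes) {
#ifdef GRID_COMMS_SHM
    return ShmBufferMalloc(bytes);   // bounded by MAX_MPI_SHM_BYTES
#else
    return std::malloc(bytes);       // plain MPI comms: regular alloc
#endif
}
```

With such a guard the assertion on MAX_MPI_SHM_BYTES could never fire in a standard-MPI build.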
Hi,
My laptop does not support AVX2 instructions. When I try to compile with AVXFMA4, the compilation crashes complaining that I am using AVX2 intrinsics unsupported by my machine.
So my question is: is that really an AVX1+FMA target, or did it become redundant with AVX2 (especially now that Peter added -mfma to AVX2)?
Develop has been generating an internal compiler error since a commit about 5 days ago, under Travis and GCC 5.
Just to start the discussion on real documentation. There are many possibilities; I can think of the following:
Paper-style PDF
Pro: can be put on arXiv, good for reaching the community
Con: quite "static", with a risk of the document becoming quickly obsolete
Documentation CMS
It seems that the developer community quite likes Sphinx, which was originally designed for Python documentation. A lot of examples can be found here.
Pro: nice to browse and search; more dynamic
Con: we need to learn how to use the thing (but that does not look that hard)
I have several issues compiling Grid on a Cray machine.
the automatically generated Makefile in lib does not get the correct include paths:
make[2]: Entering directory '/global/project/projectdirs/mpccc/tkurth/NESAP/GRID/build/lib'
CXX Init.o
CXX PerfCount.o
CXX algorithms/approx/MultiShiftFunction.o
CXX Log.o
CXX qcd/action/fermion/CayleyFermion5D.o
CXX qcd/action/fermion/ContinuedFractionFermion5D.o
CXX qcd/action/fermion/PartialFractionFermion5D.o
CXX qcd/action/fermion/WilsonFermion.o
CXX qcd/action/fermion/WilsonKernels.o
CXX qcd/action/fermion/WilsonFermion5D.o
CXX qcd/action/fermion/WilsonKernelsAsm.o
CXX qcd/action/fermion/WilsonKernelsHand.o
In file included from ../../src/lib/qcd/action/fermion/CayleyFermion5D.cc:32:0:
../../src/lib/Grid.h:62:46: fatal error: Grid/serialisation/Serialisation.h: No such file or directory
#include <Grid/serialisation/Serialisation.h>
I could fix that by manually adding -I flags in the generated Makefile.
The lib compilation uses gcc/g++ and not the compiler I selected. I want to use the Cray wrappers cc/CC to enable Cray MPI, but I got:
make[1]: Entering directory '/global/project/projectdirs/mpccc/tkurth/NESAP/GRID/build/lib'
CXX Init.o
In file included from ../../src/include/Grid/Communicator.h:31:0,
from ../../src/lib/Grid.h:72,
from ../../src/lib/Init.cc:44:
../../src/include/Grid/communicator/Communicator_base.h:35:17: fatal error: mpi.h: No such file or directory
#include <mpi.h>
That is expected, as gcc does not know about MPI. Please fix this so that CC/CXX is actually the CC/CXX specified by the user.
After fixing that, the next error is:
../../src/include/Grid/Stencil.h(276): error: a value of type "Grid::iScalar<Grid::iVector<Grid::iVector<Grid::vComplexF, 3>, 2>> *" cannot be used to initialize an entity of type "uint64_t={unsigned long}"
uint64_t cbase = & comm_buf[0];
To me this looks like some implicit casting the Intel compiler does not like. I think an explicit typecast would be healthy here.
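For illustration, the explicit conversion could look like the following (with a stand-in buffer in place of the real `comm_buf`; `reinterpret_cast` is the idiomatic C++ spelling of the C-style `(uint64_t)` cast):

```cpp
#include <cstdint>

// Making the pointer-to-integer conversion explicit documents the intent
// and satisfies compilers (like icpc) that reject the implicit form:
//   uint64_t cbase = & comm_buf[0];            // rejected
//   uint64_t cbase = (uint64_t)&comm_buf[0];   // accepted
inline uint64_t baseAddress(const void *p) {
    return reinterpret_cast<uint64_t>(p);
}
```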
After fixing these things, I finally get:
make[2]: Entering directory '/global/project/projectdirs/mpccc/tkurth/NESAP/GRID/build/lib'
make[2]: *** No rule to make target 'simd/Grid_empty.h', needed by 'all-am'. Stop.
That I don't know how to solve. Please advise.
Best
Thorsten
See pre-development branch
feature/Ls-vectorised-actions
I've concluded we really require easy access to dense matrix functionality.
There's been a few places:
Lanczos,
now also Mobius with s-vectorisation
where it is needed. Depending on Eigen seems the least ugly option.
Similarly, FFT is becoming important.
-- fourier accel gauge fixing
-- measurement mom projection
-- QED
-- Gauge fixed smearing by convolution
-- etc...
I made include/Grid a symlink to lib/ and made all includes go through it.
The headers remain with the source, side by side in the tree, but the logical include path is
include/Grid/
AC_LINK_FILES is used for the same effect in the build directory.
I added a prerequisites subdirectory that gets built first.
-- This downloads and caches (in the source tree) the Eigen and FFTW packages.
-- In the build directory, it untars and builds FFTW, and untars Eigen.
-- The Eigen headers, the FFTW header, and the compiled FFTW library are moved into
include/Grid/Eigen/
include/Grid/fftw3.h
These are therefore disambiguated from any system-installed versions kicking around,
as I want to avoid blowing in the wind as the various clusters' module systems control
software versions.
Thus the header-only Eigen, and the fftw3 header and library, get installed along with Grid,
but are pulled from their source repositories and cached in the Grid source tree
on the first checkout & build someone does from GitHub.
I've added subdirectories, and only core tests get built by default.
We could make a named travis test directory and run all tests in there.
Thoughts welcome...
The flag mpi-auto triggers a failure in the configure step on ARCHER.
The culprit is the LX_FIND_MPI macro, which seems unsupported by the Cray wrappers on that machine.
Configuring with just
--enable-comms=mpi
works.
Two options
Dear contributors,
I tried to run the benchmarks, but Benchmark_wilson failed with a segmentation fault.
Benchmark_dwf and Benchmark_zmm seem to have the same problem.
I am not sure whether this is a problem of Grid or of the gcc behind the Intel compiler.
version of the Grid:
master, as of Apr. 21 (d9b5e66)
$ ./Benchmark_wilson --debug-signals
||||||||||||||__
||||||||||||||__
|| | | | | | | | | | | | |_
| |_
|_ GGGG RRRR III DDDD _|
|_ G R R I D D _|
|_ G R R I D D _|
|_ G GG RRRR I D D _|
|_ G G R R I D D _|
|_ GGGG R R III DDDD _|
| |_
||||||||||||||__
||||||||||||||__
| | | | | | | | | | | | | |
Copyright (C) 2015 Peter Boyle, Azusa Yamaguchi, Guido Cossu, Antonin Portelli and other authors
Colours by Tadahito Boyle
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
Grid : Message : 0 ms : Grid is setup to use 1 threads
Grid : Message : 0 ms : Grid floating point word size is REALF4
Grid : Message : 0 ms : Grid floating point word size is REALD8
Grid : Message : 0 ms : Grid floating point word size is REAL4
Grid : Message : 1134 ms : Calling Dw
Caught signal 11
mem address 4
code 1
instruction 3c4c20e0fc
rdi 77af20
rsi 0
rbp 0
rbx 0
rdx 22088400
rax 0
rcx 20
rsp 7ffc40e26ed0
rip 3c4c20e0fc
r8 2eb4a67
r9 0
r10 7ffc40e26d40
r11 7ffc40e26cf0
r12 3c56cefa88
r13 0
r14 13c65c0
r15 0
BackTrace Strings: 0 ./Benchmark_wilson() [0x4366da]
BackTrace Strings: 1 /lib64/libc.so.6() [0x3c4ca326a0]
BackTrace Strings: 2 /lib64/ld-linux-x86-64.so.2() [0x3c4c20e0fc]
BackTrace Strings: 3 /lib64/ld-linux-x86-64.so.2() [0x3c4c2148f5]
BackTrace Strings: 4 /usr/lib64/libstdc++.so.6(_ZNSt6thread15_M_start_threadESt10shared_ptrINS_10_Impl_baseEE+0x97) [0x3c56ab65a7]
BackTrace Strings: 5 ./Benchmark_wilson() [0x441170]
BackTrace Strings: 6 ./Benchmark_wilson() [0x46c105]
BackTrace Strings: 7 ./Benchmark_wilson() [0x46c045]
BackTrace Strings: 8 ./Benchmark_wilson() [0x487d8e]
BackTrace Strings: 9 ./Benchmark_wilson() [0x487d00]
BackTrace Strings: 10 ./Benchmark_wilson() [0x48761b]
BackTrace Strings: 11 ./Benchmark_wilson() [0x406cbd]
BackTrace Strings: 12 /lib64/libc.so.6(__libc_start_main+0xfd) [0x3c4ca1ed5d]
BackTrace Strings: 13 ./Benchmark_wilson() [0x4032c9]
Here is my configuration:
$ ../configure CXX=icpc --enable-simd=AVX --enable-precision=single CXXFLAGS="-std=c++11 -O0 -debug inline-debug-info -g " --enable-comms=none
$ icpc -v
icpc version 16.0.1 (gcc version 4.8.2 compatibility)
The build gave plenty of warnings, like:
../../lib/simd/Grid_avx.h(521): warning #167: argument of type "__m128" is incompatible with parameter of type "__m128i"
_mm256_alignr_epi32(ret,in,tmp,n);
^
detected during instantiation of "__m256 Grid::Optimization::Rotate::tRotate(__m256) [with n=0]" at line 494
../../lib/simd/Grid_avx.h(521): warning #167: argument of type "__m128" is incompatible with parameter of type "__m128i"
_mm256_alignr_epi32(ret,in,tmp,n);
^
detected during instantiation of "__m256 Grid::Optimization::Rotate::tRotate(__m256) [with n=0]" at line 494
../../lib/simd/Grid_avx.h(521): warning #167: argument of type "__m128i" is incompatible with parameter of type "const __m128 &"
_mm256_alignr_epi32(ret,in,tmp,n);
^
detected during instantiation of "__m256 Grid::Optimization::Rotate::tRotate(__m256) [with n=0]" at line 494
(it continues for about 48000 lines)
The machine has an Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz, which supports AVX,
and 16GB memory.
Other benchmarks:
OK:
Benchmark_su3
Benchmark_memory_asynch
Benchmark_memory_bandwidth
Failed:
Benchmark_comms : Aborted (due to --enable-comms=none, I guess)
Benchmark_dwf : Segmentation fault (after "Calling Dw")
Benchmark_zmm : Segmentation fault (after "Calling Dw")
Best regards,
Issaku
Some versions of clang++/g++ fail to compile lib/algorithms/approx/Remez.cc.
Adding #include <stddef.h> before any other includes fixes it.
As per the title. I'm now in the stage of merging the smearing branch and I'm retesting everything, but I do not have the time to address this issue immediately. If someone wants to solve it in the meantime...
It was running in my branch before the merge
My configuration flags
../../Grid/configure --enable-precision=single --enable-simd=AVX CXXFLAGS="-mavx -fopenmp=libomp -O3 -std=c++11" LDFLAGS="-fopenmp=libomp" LIBS="-lgmp -lmpfr" --enable-comms=none
I just happened to look at the OpenSHMEM usage in the library; it looks like the collectives usage is a little buggy. As per the OpenSHMEM standard: "Every element of this array (here the pSync array) must be initialized with the value _SHMEM_SYNC_VALUE (in C/C++) or SHMEM_SYNC_VALUE (in Fortran) before any of the PEs in the Active set enter the reduction routine."
A random example from the library: in CartesianCommunicator::GlobalSumVector(double *d, int N), it looks like psync lacks initialization to SHMEM_SYNC_VALUE.
If you're always using C++11, then std::unordered_map may offer some performance improvement over std::map: best-case lookup complexity is O(1), since the underlying implementation uses a hash table, and you're not iterating over the header, so an ordered map isn't required.
On the other hand, loading a NERSC file is of course IO-bound, so a few extra cycles spent in std::map probably won't affect things much.
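A minimal sketch of the suggestion; the field names below are made up for illustration, not necessarily the actual NERSC header keys:

```cpp
#include <string>
#include <unordered_map>

// C++11 drop-in when iteration order is irrelevant: average O(1) lookup
// via hashing instead of O(log n) tree traversal with std::map.
std::unordered_map<std::string, std::string> parseHeader() {
    std::unordered_map<std::string, std::string> header;
    header["DIMENSION_1"]    = "16";        // hypothetical header fields
    header["FLOATING_POINT"] = "IEEE32BIG";
    return header;
}
```

The interface is nearly identical to std::map (`operator[]`, `at`, `find`, `count`), so the swap is mechanical.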
My compiler is complaining of invalid intrinsic calls in the Grid_avx.h code:
/home/ckelly/CPS/src/grid_gitflow/Grid/include/Grid/simd/Grid_avx.h: In static member function ‘static __m256i Grid::Optimization::PrecisionChange::StoH(__m256, __m256)’:
/home/ckelly/CPS/src/grid_gitflow/Grid/include/Grid/simd/Grid_avx.h:480:36: error: cannot convert ‘__m128i {aka __vector(2) long long int}’ to ‘__m128 {aka __vector(4) float}’ for argument ‘1’ to ‘__m256 _mm256_castps128_ps256(__m128)’
h = _mm256_castps128_ps256(ha);
^
/home/ckelly/CPS/src/grid_gitflow/Grid/include/Grid/simd/Grid_avx.h:481:38: error: cannot convert ‘__m128i {aka __vector(2) long long int}’ to ‘__m128 {aka __vector(4) float}’ for argument ‘2’ to ‘__m256 _mm256_insertf128_ps(__m256, __m128, int)’
h = _mm256_insertf128_ps(h,hb,1);
^
/home/ckelly/CPS/src/grid_gitflow/Grid/include/Grid/simd/Grid_avx.h:485:14: error: cannot convert ‘__m256 {aka __vector(8) float}’ to ‘__m256i {aka __vector(4) long long int}’ in return
return h;
^
/home/ckelly/CPS/src/grid_gitflow/Grid/include/Grid/simd/Grid_avx.h: In static member function ‘static void Grid::Optimization::PrecisionChange::HtoS(__m256i, __m256&, __m256&)’:
/home/ckelly/CPS/src/grid_gitflow/Grid/include/Grid/simd/Grid_avx.h:489:53: error: cannot convert ‘__m256i {aka __vector(4) long long int}’ to ‘__m256 {aka __vector(8) float}’ for argument ‘1’ to ‘__m128 _mm256_extractf128_ps(__m256, int)’
sa = _mm256_cvtph_ps(_mm256_extractf128_ps(h,0));
^
/home/ckelly/CPS/src/grid_gitflow/Grid/include/Grid/simd/Grid_avx.h:490:53: error: cannot convert ‘__m256i {aka __vector(4) long long int}’ to ‘__m256 {aka __vector(8) float}’ for argument ‘1’ to ‘__m128 _mm256_extractf128_ps(__m256, int)’
sb = _mm256_cvtph_ps(_mm256_extractf128_ps(h,1));
I have managed to fix all the issues by modifying the code to the following (starting Grid_avx.h:475):
static inline __m256i StoH (__m256 a,__m256 b) {
__m256i hi;
#ifdef USE_FP16
__m256 h;
__m128i ha = _mm256_cvtps_ph(a,0);
__m128 hha = _mm_cvtepi32_ps(ha);
__m128i hb = _mm256_cvtps_ph(b,0);
__m128 hhb = _mm_cvtepi32_ps(hb);
h = _mm256_castps128_ps256(hha);
h = _mm256_insertf128_ps(h,hhb,1);
hi = _mm256_cvtps_epi32(h);
#else
assert(0);
#endif
return hi;
}
static inline void HtoS (__m256i h,__m256 &sa,__m256 &sb) {
#ifdef USE_FP16
__m256 hh = _mm256_cvtepi32_ps(h);
__m128 hh0 = _mm256_extractf128_ps(hh,0);
__m128i hh0i = _mm_cvtps_epi32(hh0);
__m128 hh1 = _mm256_extractf128_ps(hh,1);
__m128i hh1i = _mm_cvtps_epi32(hh1);
sa = _mm256_cvtph_ps(hh0i);
sb = _mm256_cvtph_ps(hh1i);
#else
assert(0);
#endif
}
Unfortunately this involves a lot more instructions!
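One caveat with the workaround above: `_mm_cvtepi32_ps` and `_mm256_cvtepi32_ps` convert values rather than reinterpreting bits, and int32-to-float32 conversion is lossy above 2^24, so round-tripping the packed 16-bit half pairs through them may not be bit-exact. The `cast` intrinsics (e.g. `_mm256_castsi128_si256` combined with `_mm256_insertf128_si256`, staying in the integer domain throughout) reinterpret the bits for free and would avoid both the type errors and the extra instructions. A minimal SSE2 illustration of cast versus convert, on a float lane holding 1.0f:

```cpp
#include <immintrin.h>
#include <cstdint>

// cast = bit reinterpretation (free, lossless);
// cvt  = value conversion (costs an instruction, rounds the value).
inline void castVsConvert(uint32_t &bits, int32_t &value) {
    __m128 f = _mm_set1_ps(1.0f);
    bits  = (uint32_t)_mm_cvtsi128_si32(_mm_castps_si128(f)); // raw IEEE bits of 1.0f
    value = _mm_cvtsi128_si32(_mm_cvtps_epi32(f));            // numeric value 1
}
```

The cast route recovers the raw pattern 0x3F800000 while the convert route yields the integer 1, which is why the two must not be interchanged when the payload is packed FP16 data.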
I've finally got fed up with files prefixed with Grid_. What seemed like a good idea when there
were one or two files is blatantly dumb when we have a whole tree which has "Grid" in the
top directory name (where it should be) anyway.
I've started switching to C++-style capitalised names like
MobiusZolotarevFermion.h
Any reasons to prefer lower case, e.g. mobius_zolotarev_fermion.h?
I plan to do a global rename exercise soon, and think that having the file name reflect the capitalised
class name is simplest; I will do so unless there are objections persuading me otherwise.
What is the point of view of the developers on starting to write the doxygen annotations before the code becomes too big?
Any other suggestion about the documentation?
Hi Guys!
As you know, I have been experimenting with Grid last week.
I am using Grid as an external library, and I had a little pain making my code compile against the "installed" version of Grid.
I would assume that one is expected to pass to one's own program the flags
-I$grid_prefix/include -L$grid_prefix/install/lib -lGrid
(where $grid_prefix is the prefix passed when configuring Grid), in agreement with the typical package folder structure. If I then include Grid.h as per:
#include <Grid/Grid.h>
I get the error:
$grid_prefix/include/Grid/algorithms/approx/Remez.h:19:20: fatal error: Config.h: No such file or directory
#include <Config.h>
In fact, the file Config.h is in $grid_prefix/include/Grid, which is not in the search path.
If I pass instead
-I$grid_prefix/include/Grid
and include Grid.h as in:
#include <Grid.h>
what I obtain is the error:
$grid_prefix/include/Grid/Grid.h:63:24: fatal error: Grid/Timer.h: No such file or directory
#include <Grid/Timer.h>
because now "Grid/" is already part of the search path. So ultimately I need to compile the code with both paths:
-I$grid_prefix/include -I$grid_prefix/include/Grid
Provided that this combination is kept, everything goes fine, but I find this hardly intuitive.
Another issue is the fact that, as you distribute the Config.h file, the PACKAGE_NAME, PACKAGE_STRING, etc. macros clash with those used in my autotools-generated header. A typical solution that I've seen used is to wrap Config.h in a "true" include file, and then rename the package macros. For example, the c-lime library does the following:
#ifndef LIME_CONFIG_H
#define LIME_CONFIG_H
/* Undef the unwanted from the environment -- eg the compiler command line */
#undef PACKAGE
#undef PACKAGE_BUGREPORT
#undef PACKAGE_NAME
#undef PACKAGE_STRING
#undef PACKAGE_TARNAME
#undef PACKAGE_VERSION
#undef VERSION
/* Include the stuff generated by autoconf */
#include "lime_config_internal.h"
/* Prefix everything with LIME_ */
static const char* const LIME_PACKAGE = PACKAGE;
static const char* const LIME_PACKAGE_BUGREPORT = PACKAGE_BUGREPORT;
static const char* const LIME_PACKAGE_NAME = PACKAGE_NAME;
static const char* const LIME_PACKAGE_STRING = PACKAGE_STRING;
static const char* const LIME_PACKAGE_TARNAME = PACKAGE_TARNAME;
static const char* const LIME_PACKAGE_VERSION = PACKAGE_VERSION;
/* LIME_VERSION is already defined in lime_defs.h */
/* Undef the unwanted */
#undef PACKAGE
#undef PACKAGE_BUGREPORT
#undef PACKAGE_NAME
#undef PACKAGE_STRING
#undef PACKAGE_TARNAME
#undef PACKAGE_VERSION
#undef VERSION
#endif
We get the following error when we try to compile Grid on Intel
In file included from ../../../lib/qcd/action/Actions.h(44),
from ../../../lib/qcd/QCD.h(460),
from ../../../lib/Grid.h(78),
from ../../../lib/PerfCount.cc(29):
/usr/include/c++/5/bits/stl_iterator_base_types.h(154): error: name followed by "::" must be a class or namespace name
typedef typename _Iterator::iterator_category iterator_category;
Please find attached the full list of errors
$ icpc --version
icpc (ICC) 16.0.0 20150815
Copyright (C) 1985-2015 Intel Corporation. All rights reserved.
$ g++ --version
g++ (Ubuntu 5.1.1-4ubuntu12) 5.1.1 20150504
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
build_out.txt
That is going to be very painful, but I think that in the long run it can really pay off in terms of clarity for users and for the way other software will build on Grid. The idea would be to fully scan all the declarations to ensure const is used when appropriate, to reduce the risk of silent buggy variable changes. Posting here before propagating to the develop branch.
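As an illustration of the payoff (a generic example, not Grid code): a `const` reference parameter lets the compiler reject silent modification outright.

```cpp
#include <vector>

// The const qualifier turns an accidental write into a compile error
// instead of a silent bug:
int checksum(const std::vector<int> &v) {
    int s = 0;
    for (int x : v) s += x;
    // v.push_back(0);   // would not compile: v is const
    return s;
}
```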
Need a way to handle RNG management in QCD with multiple live grids.
SpaceTimeGrid class is perhaps a place holder for central lattice information,
could create a Grid hierarchy there.
We could make several simplifying assumptions:
i) 5d is never spread out; 4d RNG's suffice.
ii) Subdivided Grids make use of RNG's from the 0,0,0,0 subcell element of the finest grid.
We would only ever have to save and restore 4d RNG's then, and alternate routines
for RNG filling to index the corresponding RNG on a different grid.
Tradeoffs:
I want the RNG sequence to be independent of machine decomposition.
IroIro no longer does this, and gains from the Mersenne twister skip ahead by having
one RNG per node. This gives up both machine decomposition independence AND the
threading of RNG generation within an MPI task.
Could make RNG's live on a coarser grid. (Coarsest?). Suppresses RNG state volume.
CPS does a version of this with one RNG per hypercube.
Making this quite general -- filling a fine grid from a coarse-grid RNG that subdivides the fine
grid, allowing for 5d/4d -- would enable suppression of RNG state volume, while retaining the
ability to parallelise within a node and also retaining machine decomposition independence, provided
we do not subdivide too much.
I am tempted to expand lib/qcd/utils/SpaceTimeGrid.h/cc to retain a sequence of
global Grid objects for QCD running (Fermion Grid, Gauge Grid, RNGGrid) and
provide the subdivided RNG grid fill, save/restore etc...
Similarly retain the single serial RNG here.
Comments on this strategy welcome. With a Mersenne Twister implementation we can
take a single seed and skip-ahead instead of reseeding with random as is presently done with ranlux.
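The subdivided-fill idea can be sketched as follows. Names and the lexicographic convention are hypothetical, but the key property is that the owning RNG depends only on global coordinates, which is what makes the sequence independent of machine decomposition:

```cpp
#include <cstddef>
#include <vector>

// Map a fine-grid coordinate to the index of the coarse-grid RNG that owns
// its subcell, assuming each fine dimension is a multiple of the coarse one.
// Since the result depends only on global coordinates, any machine
// decomposition reproduces the same RNG assignment.
int coarseRNGIndex(const std::vector<int> &fineCoor,
                   const std::vector<int> &fineDims,
                   const std::vector<int> &coarseDims) {
    int idx = 0;
    for (std::size_t d = 0; d < fineDims.size(); ++d) {
        int block = fineDims[d] / coarseDims[d];          // subcell extent
        idx = idx * coarseDims[d] + fineCoor[d] / block;  // lexicographic index
    }
    return idx;
}
```

With this convention, saving/restoring only the coarse (e.g. 4d) RNG state suffices, and a fill routine on any grid that subdivides the RNG grid can look up its generator by global coordinate.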
I have the following problem:
[tkurth@gert01 GRID]$ tail -f slurm-3046972.out
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
Grid : Message : Requesting 134217728 byte stencil comms buffers
Grid : Message : Grid is setup to use 32 threads
Grid : Message : Making s innermost grids
Benchmark_dwf: ../../src/lib/communicator/Communicator_base.cc:49: void *Grid::CartesianCommunicator::ShmBufferMalloc(unsigned long): Assertion `heap_bytes<MAX_MPI_SHM_BYTES' failed.
ShmBufferMalloc exceeded shared heap size -- try increasing with --shm <MB> flag
Parameter specified in units of MB (megabytes)
Current value is 128
Benchmark_dwf: ../../src/lib/communicator/Communicator_base.cc:49: void *Grid::CartesianCommunicator::ShmBufferMalloc(unsigned long): Assertion `heap_bytes<MAX_MPI_SHM_BYTES' failed.
srun: error: nid02439: tasks 0-1: Aborted
srun: Terminating job step 3046972.0
The code hangs when it tries to make the innermost grids and then fails after 10 minutes. This is my run script:
[tkurth@gert01 GRID]$ cat benchmark_dwf.sh
#!/bin/bash
#SBATCH --ntasks-per-core=4
#SBATCH -N 1
#SBATCH -A mpccc
#SBATCH -p regular
#SBATCH -t 2:00:00
#SBATCH -C knl,quad,cache
export OMP_NUM_THREADS=32
export OMP_PLACES=threads
export OMP_PROC_BIND=spread
#MPI stuff
export MPICH_NEMESIS_ASYNC_PROGRESS=MC
export MPICH_MAX_THREAD_SAFETY=multiple
export MPICH_USE_DMAPP_COLL=1
srun -n 2 -c 136 --cpu_bind=cores ./install/grid_sp_mpi/bin/Benchmark_dwf --threads 32 --grid 32.32.32.32 --mpi 1.1.1.2 --dslash-asm --cacheblocking=4.2.2.1
[config.log.txt](https://github.com/paboyle/Grid/files/572824/config.log.txt)
[config.summary.txt](https://github.com/paboyle/Grid/files/572823/config.summary.txt)
commit c067051d5ff1a3f4c4dea0e72cc9b1b0ad092c7a
Merge: bc248b6 afdeb2b
Author: paboyle <[email protected]>
Date: Wed Nov 2 13:59:18 2016 +0000
Merge branch 'develop' into release/v0.6.0
KNL bin1, cray xc-40, intel 16.0.3.210
build script and configure
#!/bin/bash -l
#module loads
module unload craype-haswell
module load craype-mic-knl
module load cray-memkind
precision=single
comms=mpi
if [ "${precision}" == "single" ]; then
installpath=$(pwd)/install/grid_sp_${comms}
else
installpath=$(pwd)/install/grid_dp_${comms}
fi
mkdir -p build
cd build
../src/configure --prefix=${installpath} \
--enable-simd=KNL \
--enable-precision=${precision} \
--enable-comms=${comms} \
--host=x86_64-unknown-linux \
--enable-mkl \
CXX="CC" \
CC="cc"
#CXXFLAGS="-mkl -xMIC-AVX512 -std=c++11" \
#CFLAGS="-mkl -xMIC-AVX512 -std=c99" \
#LDFLAGS="-mkl -lmemkind"
make -j12
make install
cd ..
attached config.log
attached config.summary
no make.log, but hopefully it should not be necessary
A good example of how to use variadic macros; it simplifies the Hana library Bob discovered
and gives reflection (serialise/deserialise).
This is still dependent on Boost PP (preprocessor), but we can strip that out too.
This could give a better version of what I did in ukhadron.
Travis on Xcode 5 continues to fail due to the time it takes to compile in the virtual machine.
I tried to split the build matrix into single and double precision, but without success. The env options inside the matrix seem not to allow splitting the compilations, and the external env will be overridden.
Any ideas?
Hi,
I built Grid for the Bc-cluster at Fermilab (AMD Opteron 6320) using icc v16 with impi 5.1.3. No issues when building, but when executing the binary only the following message is printed:
Please verify that both the operating system and the processor support Intel(R) X87, CMOV, MMX, FXSAVE, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, POPCNT and AVX instructions.
Allowed flags from /proc/cpuinfo
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold bmi1
Outputs from configure and make (config.log, config.out) are attached. I'm mainly testing the configure script and cleaning-up my install.
CXX=icpc ../Grid/configure --enable-simd=AVX --enable-comms=mpi-auto
I have a working build using the same intel compilers on this machine. However, that was created by explicitly specifying compile options on September 26, 2016 (cf config.log-2016-09-26).
Thank you,
Oliver
PS: To upload files I attached suffix '.txt' to the filename.
I'm finding that on a small test I can get a substantial speed-up out
of using the BFM-style long-lived threads with barrier synchronisation.
e.g. Single node 8^4:
620GF/s -> 910 GF/s on KNL 7210 (SP)
62GF/s -> 72 GF/s on BG/Q node (DP)
On 16^4 local volumes on KNL the gain is minimal though.
If we act, this pushes us into a threading nightmare however, with many routines having
to accept multiple threads entering them.
I will think a little about whether the "parallel_for" macro can work around this and about the implications,
but if there is no easy common solution we are looking at having to make "DhopThread"
and "Dhop" routines and changing the CG and other solvers to run in thread/barrier mode.
This seems to be mandated by self-threading being 1.5x faster than OpenMP for loops.
I'll make the comparison a little more robust though. I'm tempted not to do this even if it means
a little less performance at the sweet spot on BG/Q, since the software cost is large, and perhaps
we should just accept it.
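The shape of the long-lived-thread scheme, as I understand it, is roughly the following. All names are hypothetical; this is a sketch of the technique, not the BFM or Grid implementation:

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Long-lived worker pool with a spin-barrier hand-off: threads are created
// once and woken by a generation counter, avoiding per-loop OpenMP fork/join.
class PersistentPool {
    std::vector<std::thread> workers_;
    std::function<void(int)> task_;
    std::atomic<long> generation_{0};
    std::atomic<int>  done_{0};
    std::atomic<bool> stop_{false};
    int nthreads_;
public:
    explicit PersistentPool(int n) : nthreads_(n) {
        for (int t = 0; t < n; ++t)
            workers_.emplace_back([this, t] {
                long seen = 0;
                for (;;) {
                    // spin until the master publishes new work (or shutdown)
                    while (generation_.load(std::memory_order_acquire) == seen)
                        if (stop_.load(std::memory_order_acquire)) return;
                    seen = generation_.load(std::memory_order_acquire);
                    task_(t);                                    // this thread's share
                    done_.fetch_add(1, std::memory_order_release);
                }
            });
    }
    // Called from the master thread; returns once all workers have finished.
    void run(std::function<void(int)> f) {
        task_ = std::move(f);
        done_.store(0, std::memory_order_relaxed);
        generation_.fetch_add(1, std::memory_order_release);     // wake workers
        while (done_.load(std::memory_order_acquire) != nthreads_) {}
    }
    ~PersistentPool() {
        stop_.store(true, std::memory_order_release);
        for (auto &w : workers_) w.join();
    }
};
```

On the small-volume numbers quoted above, the win comes from avoiding the per-loop OpenMP fork/join overhead; the cost is exactly the constraint already noted: everything called inside `run` must tolerate concurrent entry.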
Hi,
Through my recent development (HDF5, Gamma) I really struggled with Grid header structures. A lot of headers can only be included assuming a very specific sequence of previous includes done externally to the header itself. I was curious enough to try to follow include chains, and in some cases it is really involved.
That scares me a bit considering that Grid is growing fast, because this can go out of control rather quickly. The main issue is that the whole structure is becoming increasingly cryptic.
I am not advocating that we should change the include strategy, but rather that we consolidate it. One possible strategy could be: each header starts with #include <Grid/Global.h>, then #includes the thematic headers (Cartesian.h, Algorithms.h, ...) necessary for the definitions in the current header. Including the top-level Grid.h would still be fine. It is just about adding a bunch of Grid includes on top of each header to make them self-consistent and independent. This is what I am already doing in the measurement code: the loading order of headers can be permuted arbitrarily and any header is self-consistent in terms of definitions & declarations (i.e. it can be included alone without errors). Although this is not a huge change, it will be a complete pain to do. I would be happy to volunteer to do so, but I won't do anything without having your opinion. I would say the code would gain quite some readability and robustness.
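To make the proposal concrete, a hypothetical self-contained header under this scheme might look like the following (file and header names are illustrative only, not Grid's actual layout):

```cpp
// MyModule.h -- self-contained: compiles no matter where it is included from
#ifndef GRID_MY_MODULE_H
#define GRID_MY_MODULE_H

#include <Grid/Global.h>     // always first: global definitions
#include <Grid/Cartesian.h>  // thematic headers actually used below
#include <Grid/Algorithms.h>

namespace Grid {
    // declarations relying only on the includes above, so the header
    // can be included alone, in any order, without external setup
}

#endif // GRID_MY_MODULE_H
```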
Any thoughts?
I have started adding the '0.6.0' milestone to some issues. I think this system is useful to plan releases. Tell me if you think it is inconvenient.
You mention boost arrays in the TODO list, so I thought I'd mention this on the off-chance you hadn't already seen it. C++11 provides an array class template that's pretty much the same as the boost array type.
Add flops-count information to the solvers and HMC routines.
Added here as a memo for everyone.
Hi,
Would you mind checking if the zMobius CG converges when omega has imaginary components? I've modified Test_dwf_cg_prec (DomainWallFermionR -> ZMobiusFermionR) and it fails to converge.
In case you are wondering, omega(s) = 0.25 + 0.01i.
Compiling Grid with clang++-3.8 and AVX2 support appears to trigger a compiler error related to FMA support in the intrinsics.
$ cd Grid
$ git rev-parse --short HEAD
5e02392
In file included from qcd/action/fermion/CayleyFermion5D.cc:31:
In file included from ./Grid.h:68:
In file included from ./Simd.h:166:
In file included from ./simd/Grid_vector_types.h:47:
./simd/Grid_avx.h:240:14: error: always_inline function '_mm256_fmaddsub_ps' requires target feature 'fma', but would be
inlined into function 'operator()' that is compiled without support for 'fma'
return _mm256_fmaddsub_ps( a_real, b, a_imag ); // Ar Br , Ar Bi +- Ai Bi = ArBr-AiBi , ArBi+AiBr
^
./simd/Grid_avx.h:286:14: error: always_inline function '_mm256_fmaddsub_pd' requires target feature 'fma', but would be
inlined into function 'operator()' that is compiled without support for 'fma'
return _mm256_fmaddsub_pd( a_real, b, a_imag ); // Ar Br , Ar Bi +- Ai Bi = ArBr-AiBi , ArBi+AiBr
^
fatal error: error in backend: Cannot select: 0x76cec80: v8f32 = X86ISD::FMADDSUB 0x787a1d0, 0x7a523a0, 0x7a52600
===== system details ========
$ cat /etc/redhat-release
Scientific Linux release 7.2 (Nitrogen)
$ uname -a
Linux yosemite.fnal.gov 3.10.0-229.20.1.el7.x86_64 #1 SMP Wed Nov 4 10:08:36 CST 2015 x86_64 x86_64 x86_64 GNU/Linux
$ less /proc/cpuinfo
model name : Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
$ clang++ -v
clang version 3.8.0 (tags/RELEASE_380/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/james/installed/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/4.8.2
Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/4.8.5
Selected GCC installation: /usr/lib/gcc/x86_64-redhat-linux/4.8.5
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64
$ ./configure CXX=clang++ CXXFLAGS="-std=c++11 -O3 -mavx2 -fopenmp -lomp" --enable-simd=AVX2
Summary of configuration for grid v1.0
The following features are enabled:
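A plausible workaround, assuming the failure is just the missing fma target feature (I have not verified this on the machine above): pass -mfma alongside -mavx2, so that clang is allowed to emit the _mm256_fmaddsub_* intrinsics it is inlining.

```shell
# Assumed fix: enable the fma target feature explicitly
./configure CXX=clang++ CXXFLAGS="-std=c++11 -O3 -mavx2 -mfma -fopenmp -lomp" --enable-simd=AVX2
```

The E5-2667 v3 (Haswell) reported in /proc/cpuinfo supports FMA3, so enabling the flag should be safe on that host.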
For more clarity in the code structure, it would be better to have the .h files now in lib moved to an include/ directory at the root level.
G
Hi,
I ran four tests on an 8^4 lattice on the summit machine at UC Boulder and similar tests on pi0 at Fermilab. All jobs were running on 2 nodes, each with 1 MPI rank and 24 threads on summit (16 threads on pi0). The jobs differ by the type of file system used for writing the checkpoints (NFS or GPFS on summit; ZFS or Lustre at Fermilab) and whether I split the T or the Z direction (1 or 2 IO nodes).
summit
mpi layout             file system   SLURM-ID   write rate
1.1.1.2 (2 IO nodes)   NFS           462        280 MB/s
1.1.1.2 (2 IO nodes)   GPFS          461        0.05 MB/s
1.1.2.1 (1 IO node)    NFS           460        131 MB/s
1.1.2.1 (1 IO node)    GPFS          455        79 MB/s
pi0 Fermilab
mpi layout             file system   PBS-ID (last three)   write rate
1.1.1.2 (2 IO nodes)   ZFS           628                   228 MB/s
1.1.1.2 (2 IO nodes)   lustre        635                   0.002 MB/s
1.1.2.1 (1 IO node)    ZFS           626                   110 MB/s
1.1.2.1 (1 IO node)    lustre        627                   3-20 MB/s
Unfortunately, I didn't find performance values for the single I/O writing of the rng-files in the log-files;
the full log-files are however attached and carry the SLURM-ID / PBS-ID in the filename. Do I need some special flag for parallel file systems? (striping?)
The parallel read of the ckpoint at the beginning of the job seems OK for all cases, although in these tests not all jobs started from a checkpoint. On both machines Grid is compiled on NFS/ZFS.
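On the striping question: I cannot verify this on these machines, but on Lustre the default stripe count is often 1, which serialises a large single-writer file onto one OST and could explain the very poor Lustre rates above. Something like the following (directory path and stripe count are illustrative) is worth trying on the checkpoint directory before the run:

```shell
# Stripe new files in the checkpoint directory across 8 OSTs (illustrative values)
lfs setstripe -c 8 /path/to/ckpoint_dir
```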
Thank you,
Oliver
All norms and innerProducts are now high-precision norms using intermediate double accumulation.
The code Guido has written with a differentiated normHP should just call norm.
These should be bandwidth-limited anyway, so there is no reason not to make this the default.
Just for bookkeeping for 0.6 (Peter is aware of the origin of the issue): currently MPI freezes in an indefinite wait, and this should be corrected for 0.6.
The Intel compiler chokes on the syntax used in
lib/simd/Grid_avx.h(360): error: expression must have pointer-to-object type
return v1[0];
This is one example. It is necessary to go through a "conv" union to get element-by-element
access to the _mXXX vector intrinsic types in a way that works with Clang, G++ and Intel's compiler.
I have problems running the benchmark codes. I can collect several issues here:
Grid : Message : 24906 ms : 30 4 10368000 1198.7 2397.41
Grid : Message : 26629 ms : 30 8 20736000 1199.82 2399.64
Grid : Message : 30097 ms : 30 16 41472000 1208.64 2417.28
Grid : Message : 30599 ms : 32 1 3145728 710.73 1421.46
Grid : Message : 31088 ms : 32 2 6291456 1173.55 2347.1
Grid : Message : 32166 ms : 32 4 12582912 1159.37 2318.73
Grid : Message : 34211 ms : 32 8 25165824 1231.09 2462.19
Grid : Message : 38414 ms : 32 16 50331648 1210.31 2420.63
Grid : Message : 38500 ms : ====================================================================================================
Grid : Message : 38500 ms : = Benchmarking sequential halo exchange in 4 dimensions
Grid : Message : 38500 ms : ====================================================================================================
Grid : Message : 38500 ms : L Ls bytes MB/s uni MB/s bidi
srun: error: nid12126: task 23: Floating point exception
There might be a division by zero or something.
Grid : Message : 116 ms : Making s innermost grids
||||||||||||||__
||||||||||||||__
|_ | | | | | | | | | | | | _|
|_ _|
|_ GGGG RRRR III DDDD _|
|_ G R R I D D _|
|_ G R R I D D _|
|_ G GG RRRR I D D _|
|_ G G R R I D D _|
|_ GGGG R R III DDDD _|
|_ _|
||||||||||||||__
||||||||||||||__
| | | | | | | | | | | | | |
Copyright (C) 2015 Peter Boyle, Azusa Yamaguchi, Guido Cossu, Antonin Portelli and other authors
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
Grid : Message : 25 ms : Grid is setup to use 128 threads
Grid : Message : 116 ms : Making s innermost grids
Grid : Message : 15963 ms : Naive wilson implementation
Grid : Message : 15964 ms : Calling Dw
Benchmark_dwf: ../../src/lib/qcd/action/fermion/WilsonKernelsAsm.cc:46: void Grid::QCD::WilsonKernels::DiracOptAsmDhopSite(Impl::StencilImpl &, Grid::LebesgueOrder &, Impl::DoubledGaugeField &, std::vector<Impl::SiteHalfSpinor, Grid::alignedAllocatorImpl::SiteHalfSpinor> &, int, int, int, int, const Impl::FermionField &, Impl::FermionField &) [with Impl = Grid::QCD::WilsonImplGrid::Grid_simd<std::complex<double, __m512d>, Grid::QCD::FundamentalRep<3>, double>]: Assertion `0' failed.
srun: error: nid12151: task 0: Aborted
srun: Terminating job step 3027515.1
Can you help me here?
I can start working on the action parts (gauge and fermions).
We have to decide the target design functionality and style.
I have in mind something similar to what I have in IroIro++, a kind of lightweight version of the Chroma ones.
You can have a look here
https://github.com/coppolachan/IroIro/tree/master/lib/Action
and here for the corresponding HMC implementation (leapfrog, 2MN, multilevel)
https://github.com/coppolachan/IroIro/tree/master/lib/HMC
The Travis builds failing for Linux/clang come from the fact that the LLVM people closed their APT repository because the network traffic was too heavy: http://lists.llvm.org/pipermail/llvm-dev/2016-May/100303.html.
This is not surprising considering the increasing number of CI bots all downloading clang from their server over and over again.
Finding a plan B would be very painful, and some say this is just temporary and that the server will reopen.
I will keep an eye on it and figure something out if it is not solved on the LLVM side.
This is not an issue, more an information message (I don't know if the GitHub messages attached to the commits get broadcast to everyone, so I am also writing here).
In the latest commit I added a .gitignore file to keep autoconf files, compiled libraries and automake-generated makefiles out of commits. This allows contributors to skip reconfiguring the tools and keep their own environment.
For new users, I also added a simple utility called reconfigure_script (maybe it should be moved to the scripts dir) that runs all the autotools commands to set up the correct environment.
Add new file formats:
I want to be clear on this.
My source code is written to be readable by me.
I do NOT appreciate others reformatting source files and committing them, especially when
this is done brainlessly by an automatic tool; in any case the style of the author
is key. It does not matter if you do not like my style and choices.
A complete mess has been made of several key source files, and further
dozens of files have been pointlessly changed in a way that violates my
personal preference of not acquiring high levels of indentation when entering
the Grid or QCD namespace.
This floating indent, combined with a tool-based application of 80-character wrap,
creates unreadable code from code that was previously easily readable by the author.
Further, readability is the judgement of the prime author of a given area of the code, and it
is rude and inappropriate to reformat without consultation and agreement.
I am now spending several hours reverting code, a waste of time created by thoughtless
action.
The worst case is the thoughtless application of automatic formatting to a critical
file with braces and scopes inside ifdefs, which hopelessly confused the formatter.
You should NEVER commit without first applying git diff and satisfying yourself that
you are in complete control of the changes, with only a very few lines deliberately modified with genuine purpose.