paboyle / Grid
Data parallel C++ mathematical object library
License: GNU General Public License v2.0
CXX Application.o
In file included from ../../../extras/Hadrons/Application.cc:31:
/dirac1/work/x03/paboyle/Grid-bgq/Grid/include/Grid/Hadrons/GeneticScheduler.hpp:169:21: error: no member named 'emplace' in 'std::multimap<int, std::vector<unsigned int, std::allocator >, std::less, std::allocator<std::pair<const int,
std::vector<unsigned int, std::allocator > > > >'
population_.emplace(func_(p), p);
~~~~~~~~~~~ ^
/dirac1/work/x03/paboyle/Grid-bgq/Grid/include/Grid/Hadrons/GeneticScheduler.hpp:132:9: note: in instantiation of member function 'Grid::Hadrons::GeneticScheduler::initPopulation' requested here
initPopulation();
^
../../../extras/Hadrons/Application.cc:210:23: note: in instantiation of member function 'Grid::Hadrons::GeneticScheduler::nextGeneration' requested here
scheduler.nextGeneration();
^
In file included from ../../../extras/Hadrons/Application.cc:31:
/dirac1/work/x03/paboyle/Grid-bgq/Grid/include/Grid/Hadrons/GeneticScheduler.hpp:203:25: error: no member named 'emplace' in 'std::multimap<int, std::vector<unsigned int, std::allocator >, std::less, std::allocator<std::pair<const int,
std::vector<unsigned int, std::allocator > > > >'
population_.emplace(func_(m), m);
~~~~~~~~~~~ ^
/dirac1/work/x03/paboyle/Grid-bgq/Grid/include/Grid/Hadrons/GeneticScheduler.hpp:140:9: note: in instantiation of member function 'Grid::Hadrons::GeneticScheduler::doMutation' requested here
doMutation();
^
../../../extras/Hadrons/Application.cc:210:23: note: in instantiation of member function 'Grid::Hadrons::GeneticScheduler::nextGeneration' requested here
scheduler.nextGeneration();
^
In file included from ../../../extras/Hadrons/Application.cc:31:
/dirac1/work/x03/paboyle/Grid-bgq/Grid/include/Grid/Hadrons/GeneticScheduler.hpp:183:21: error: no member named 'emplace' in 'std::multimap<int, std::vector<unsigned int, std::allocator >, std::less, std::allocator<std::pair<const int,
std::vector<unsigned int, std::allocator > > > >'
population_.emplace(func_(c1), c1);
~~~~~~~~~~~ ^
/dirac1/work/x03/paboyle/Grid-bgq/Grid/include/Grid/Hadrons/GeneticScheduler.hpp:148:9: note: in instantiation of member function 'Grid::Hadrons::GeneticScheduler::doCrossover' requested here
doCrossover();
^
../../../extras/Hadrons/Application.cc:210:23: note: in instantiation of member function 'Grid::Hadrons::GeneticScheduler::nextGeneration' requested here
scheduler.nextGeneration();
^
In file included from ../../../extras/Hadrons/Application.cc:31:
/dirac1/work/x03/paboyle/Grid-bgq/Grid/include/Grid/Hadrons/GeneticScheduler.hpp:184:21: error: no member named 'emplace' in 'std::multimap<int, std::vector<unsigned int, std::allocator >, std::less, std::allocator<std::pair<const int,
std::vector<unsigned int, std::allocator > > > >'
population_.emplace(func_(c2), c2);
~~~~~~~~~~~ ^
4 errors generated.
make[2]: *** [Application.o] Error 1
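For anyone hitting this: `emplace` on `std::multimap` is a C++11 addition that older libstdc++ releases (such as the one the BG/Q toolchain appears to use) never implemented. A portable workaround, sketched here with stand-in types rather than the actual GeneticScheduler ones, is to spell it with `insert`:

```cpp
#include <map>
#include <utility>
#include <vector>

// Stand-in for the scheduler's genome type (hypothetical name).
using Gene = std::vector<unsigned int>;

// insert(std::make_pair(...)) is the C++03-compatible equivalent of
// population.emplace(fitness, g); it works on old libstdc++ as well.
void addIndividual(std::multimap<int, Gene> &population, int fitness, const Gene &g) {
    // population.emplace(fitness, g);        // unsupported on old libstdc++
    population.insert(std::make_pair(fitness, g));  // portable spelling
}
```

The only cost is a possible extra copy of the value, which `emplace` would avoid; for a scheduler population this is unlikely to matter.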
We need to think about a sensible interface providing things like conserved currents etc.,
abstracting the differences between 5D formulations (DWF, Mobius, ContFrac, PartFrac)
and Wilson and other 4D approaches.
We also need to think about the interface to 4D props and sources.
Antonin has done some of this in his measurement code, but we need to standardise and include it.
Hi,
Do we really want to do this colour thing? I see many cons:
Pros: ??
Let me know what you think
It seems that the configure flag --enable-lapack just checks whether liblapack is present. It would be good if it also checked for MKL and continued if that is found, perhaps by test-compiling a small file that makes a LAPACK call with and without trailing underscores, and continuing if either succeeds.
I suggest adding a few more tests to the Travis CI:
Hi,
As the code is available publicly and will certainly receive attention, I think we should license it properly. There is already a GPLv3 text distributed in COPYING. It would be safer to mark all source files as prescribed by the FSF guys. I suggest that we add a license header to each individual source file. I propose the following header:
/*
* <filename>.cc, part of Grid
*
* Copyright (C) 2015 <author list>
*
* Grid is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Grid is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Grid. If not, see <http://www.gnu.org/licenses/>.
*/
Please tell me if you think another license is more suitable.
We should try to go through the different feature branches and see which ones are fully integrated into develop
and should be removed. I have spotted the following ones:
knl-stats
hirep
Please advise if the development of these features is considered finished.
Hi,
Recently my attention was attracted to the following extension of git to manage the development flow of a project (features, stable/unstable branches, ...): http://danielkummer.github.io/git-flow-cheatsheet/
I thought it was worth considering given the recent discussion and Guido's suggestion (which I completely support) of some form of quality control and maybe milestones.
Let me know what you think.
I am running on Cori and it crashed with the error:
NOARCH.splanc.x: ../../../src/Grid/lib/communicator/Communicator_base.cc:49: void *Grid::CartesianCommunicator::ShmBufferMalloc(unsigned long): Assertion `heap_bytes<MAX_MPI_SHM_BYTES' failed.
I have Grid configured with standard MPI comms, so by my understanding these shared buffer allocations for the comms buffers are not needed. However, it appears that Stencil.h doesn't check the MPI mode. It would be great if Stencil checked whether Grid is configured to use the hybrid MPI and, if not, just did regular allocs.
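A possible shape for the requested behaviour; the macro name and helper are hypothetical, not Grid's actual configuration symbols:

```cpp
#include <cstddef>
#include <cstdlib>

// Sketch: draw from the bounded shared-memory pool only when Grid was
// configured for the hybrid shm+MPI transport; otherwise do an ordinary
// heap allocation. GRID_COMMS_SHM and commsBufferAlloc are hypothetical.
inline void *commsBufferAlloc(std::size_t bytes) {
#ifdef GRID_COMMS_SHM
    return ShmBufferMalloc(bytes);   // bounded by MAX_MPI_SHM_BYTES
#else
    return std::malloc(bytes);       // plain MPI comms: regular alloc
#endif
}
```

With such a guard the assertion on MAX_MPI_SHM_BYTES could never fire in a standard-MPI build.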
Hi,
My laptop does not support AVX2 instructions. When I try to compile with AVXFMA4, the compilation crashes complaining that I am using AVX2 intrinsics unsupported by my machine.
So my question is: is that really an AVX1+FMA target, or did it become redundant with AVX2 (especially now that Peter added -mfma to AVX2)?
Develop has been generating an internal compiler error since a commit about 5 days ago, under Travis and GCC 5.
Just to start the discussion on real documentation. There are many possibilities; I can think of the following:
Paper-style PDF
Pro: can be put on arXiv, good for reaching the community
Con: quite "static", with a risk of the document becoming quickly obsolete
Documentation CMS
It seems that the developer community quite likes Sphinx, which was originally designed for Python documentation. A lot of examples can be found here.
Pro: nice to browse and search; more dynamic
Con: we need to learn how to use the thing (but that does not look that hard)
I have several issues compiling Grid on a Cray machine.
the automatically generated Makefile in lib does not get the correct include paths:
make[2]: Entering directory '/global/project/projectdirs/mpccc/tkurth/NESAP/GRID/build/lib'
CXX Init.o
CXX PerfCount.o
CXX algorithms/approx/MultiShiftFunction.o
CXX Log.o
CXX qcd/action/fermion/CayleyFermion5D.o
CXX qcd/action/fermion/ContinuedFractionFermion5D.o
CXX qcd/action/fermion/PartialFractionFermion5D.o
CXX qcd/action/fermion/WilsonFermion.o
CXX qcd/action/fermion/WilsonKernels.o
CXX qcd/action/fermion/WilsonFermion5D.o
CXX qcd/action/fermion/WilsonKernelsAsm.o
CXX qcd/action/fermion/WilsonKernelsHand.o
In file included from ../../src/lib/qcd/action/fermion/CayleyFermion5D.cc:32:0:
../../src/lib/Grid.h:62:46: fatal error: Grid/serialisation/Serialisation.h: No such file or directory
#include <Grid/serialisation/Serialisation.h>
I could fix that by manually adding -I flags in the generated Makefile.
The lib compilation uses gcc/g++ and not the compiler I selected. I want to use the Cray wrappers cc/CC to enable Cray MPI, but I got:
make[1]: Entering directory '/global/project/projectdirs/mpccc/tkurth/NESAP/GRID/build/lib'
CXX Init.o
In file included from ../../src/include/Grid/Communicator.h:31:0,
from ../../src/lib/Grid.h:72,
from ../../src/lib/Init.cc:44:
../../src/include/Grid/communicator/Communicator_base.h:35:17: fatal error: mpi.h: No such file or directory
#include <mpi.h>
That is expected, as gcc does not know about MPI. Please fix this so that CC/CXX is actually the CC/CXX specified by the user.
After fixing that, the next error is:
../../src/include/Grid/Stencil.h(276): error: a value of type "Grid::iScalar<Grid::iVector<Grid::iVector<Grid::vComplexF, 3>, 2>> *" cannot be used to initialize an entity of type "uint64_t={unsigned long}"
uint64_t cbase = & comm_buf[0];
To me this looks like some implicit casting the Intel compiler does not like. I think an explicit typecast would be healthy here.
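For illustration, the explicit conversion could look like the following (with a stand-in buffer in place of the real `comm_buf`; `reinterpret_cast` is the idiomatic C++ spelling of the C-style `(uint64_t)` cast):

```cpp
#include <cstdint>

// Making the pointer-to-integer conversion explicit documents the intent
// and satisfies compilers (like icpc) that reject the implicit form:
//   uint64_t cbase = & comm_buf[0];            // rejected
//   uint64_t cbase = (uint64_t)&comm_buf[0];   // accepted
inline uint64_t baseAddress(const void *p) {
    return reinterpret_cast<uint64_t>(p);
}
```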
After fixing these things, I finally get:
make[2]: Entering directory '/global/project/projectdirs/mpccc/tkurth/NESAP/GRID/build/lib'
make[2]: *** No rule to make target 'simd/Grid_empty.h', needed by 'all-am'. Stop.
That I don't know how to solve. Please advise.
Best
Thorsten
See pre-development branch
feature/Ls-vectorised-actions
I've concluded we really require easy access to dense matrix functionality.
There's been a few places:
Lanczos,
now also Mobius with s-vectorisation
where it is needed. Depending on Eigen seems the least ugly option.
Similarly, FFT is becoming important.
-- fourier accel gauge fixing
-- measurement mom projection
-- QED
-- Gauge fixed smearing by convolution
-- etc...
I made include/Grid a symlink to lib/ and made all includes go through it.
The headers remain with the source, side by side in the tree, but the logical include path is
include/Grid/
AC_LINK_FILES is used for the same effect in the build directory.
I added a prerequisites subdirectory that gets built first.
-- This downloads and caches (in the source tree) the Eigen and FFTW packages.
-- In the build directory, it untars and builds FFTW, and untars Eigen.
-- The Eigen headers, the FFTW header, and the compiled FFTW library are moved into
include/Grid/Eigen/
include/Grid/fftw3.h
These are therefore disambiguated from any system-installed versions kicking around,
as I want to avoid blowing in the wind as the various clusters' module systems control
software versions.
Thus the header-only Eigen, and the fftw3 header and library, get installed along with Grid,
but are pulled from their source repositories and cached in the Grid source tree
on the first checkout & build someone does from GitHub.
I've added subdirectories, and only core tests get built by default.
We could make a named travis test directory and run all tests in there.
Thoughts welcome...
The flag mpi-auto triggers a failure in the configure step on ARCHER.
The culprit is the LX_FIND_MPI macro, which seems unsupported by the Cray wrappers on that machine.
Configuring with just
--enable-comms=mpi
works.
Two options
Dear contributors,
I tried to run the benchmarks, but Benchmark_wilson failed with a segmentation fault.
Benchmark_dwf and Benchmark_zmm seem to have the same problem.
I am not sure whether this is a problem of Grid or of the gcc behind the Intel compiler.
version of the Grid:
master, as of Apr. 21 (d9b5e66)
$ ./Benchmark_wilson --debug-signals
||||||||||||||__
||||||||||||||__
|| | | | | | | | | | | | |_
| |_
|_ GGGG RRRR III DDDD _|
|_ G R R I D D _|
|_ G R R I D D _|
|_ G GG RRRR I D D _|
|_ G G R R I D D _|
|_ GGGG R R III DDDD _|
| |_
||||||||||||||__
||||||||||||||__
| | | | | | | | | | | | | |
Copyright (C) 2015 Peter Boyle, Azusa Yamaguchi, Guido Cossu, Antonin Portelli and other authors
Colours by Tadahito Boyle
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
Grid : Message : 0 ms : Grid is setup to use 1 threads
Grid : Message : 0 ms : Grid floating point word size is REALF4
Grid : Message : 0 ms : Grid floating point word size is REALD8
Grid : Message : 0 ms : Grid floating point word size is REAL4
Grid : Message : 1134 ms : Calling Dw
Caught signal 11
mem address 4
code 1
instruction 3c4c20e0fc
rdi 77af20
rsi 0
rbp 0
rbx 0
rdx 22088400
rax 0
rcx 20
rsp 7ffc40e26ed0
rip 3c4c20e0fc
r8 2eb4a67
r9 0
r10 7ffc40e26d40
r11 7ffc40e26cf0
r12 3c56cefa88
r13 0
r14 13c65c0
r15 0
BackTrace Strings: 0 ./Benchmark_wilson() [0x4366da]
BackTrace Strings: 1 /lib64/libc.so.6() [0x3c4ca326a0]
BackTrace Strings: 2 /lib64/ld-linux-x86-64.so.2() [0x3c4c20e0fc]
BackTrace Strings: 3 /lib64/ld-linux-x86-64.so.2() [0x3c4c2148f5]
BackTrace Strings: 4 /usr/lib64/libstdc++.so.6(_ZNSt6thread15_M_start_threadESt10shared_ptrINS_10_Impl_baseEE+0x97) [0x3c56ab65a7]
BackTrace Strings: 5 ./Benchmark_wilson() [0x441170]
BackTrace Strings: 6 ./Benchmark_wilson() [0x46c105]
BackTrace Strings: 7 ./Benchmark_wilson() [0x46c045]
BackTrace Strings: 8 ./Benchmark_wilson() [0x487d8e]
BackTrace Strings: 9 ./Benchmark_wilson() [0x487d00]
BackTrace Strings: 10 ./Benchmark_wilson() [0x48761b]
BackTrace Strings: 11 ./Benchmark_wilson() [0x406cbd]
BackTrace Strings: 12 /lib64/libc.so.6(__libc_start_main+0xfd) [0x3c4ca1ed5d]
BackTrace Strings: 13 ./Benchmark_wilson() [0x4032c9]
Here is my configuration:
$ ../configure CXX=icpc --enable-simd=AVX --enable-precision=single CXXFLAGS="-std=c++11 -O0 -debug inline-debug-info -g " --enable-comms=none
$ icpc -v
icpc version 16.0.1 (gcc version 4.8.2 compatibility)
The build gave plenty of warnings, like:
../../lib/simd/Grid_avx.h(521): warning #167: argument of type "__m128" is incompatible with parameter of type "__m128i"
_mm256_alignr_epi32(ret,in,tmp,n);
^
detected during instantiation of "__m256 Grid::Optimization::Rotate::tRotate(__m256) [with n=0]" at line 494
../../lib/simd/Grid_avx.h(521): warning #167: argument of type "__m128" is incompatible with parameter of type "__m128i"
_mm256_alignr_epi32(ret,in,tmp,n);
^
detected during instantiation of "__m256 Grid::Optimization::Rotate::tRotate(__m256) [with n=0]" at line 494
../../lib/simd/Grid_avx.h(521): warning #167: argument of type "__m128i" is incompatible with parameter of type "const __m128 &"
_mm256_alignr_epi32(ret,in,tmp,n);
^
detected during instantiation of "__m256 Grid::Optimization::Rotate::tRotate(__m256) [with n=0]" at line 494
(it continues for about 48000 lines)
The machine has an Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz, which supports AVX,
and 16GB memory.
Other benchmarks:
OK:
Benchmark_su3
Benchmark_memory_asynch
Benchmark_memory_bandwidth
Failed:
Benchmark_comms : Aborted (due to --enable-comms=none, I guess)
Benchmark_dwf : Segmentation fault (after "Calling Dw")
Benchmark_zmm : Segmentation fault (after "Calling Dw")
Best regards,
Issaku
Some versions of clang++/g++ fail to compile lib/algorithms/approx/Remez.cc.
Adding #include <stddef.h> before any other includes fixes it.
As per the title. I'm now in the stage of merging the smearing branch and I'm retesting everything, but I do not have the time to address this issue immediately. If someone wants to solve it in the meantime...
It was running in my branch before the merge
My configuration flags
../../Grid/configure --enable-precision=single --enable-simd=AVX CXXFLAGS="-mavx -fopenmp=libomp -O3 -std=c++11" LDFLAGS="-fopenmp=libomp" LIBS="-lgmp -lmpfr" --enable-comms=none
I just happened to look at the OpenSHMEM usage in the library; it looks like the collectives usage is a little buggy. As per the OpenSHMEM standard: "Every element of this array (here the pSync array) must be initialized with the value _SHMEM_SYNC_VALUE (in C/C++) or SHMEM_SYNC_VALUE (in Fortran) before any of the PEs in the Active set enter the reduction routine."
A random example from the library: in CartesianCommunicator::GlobalSumVector(double *d, int N), it looks like psync lacks initialization to SHMEM_SYNC_VALUE.
If you're always using C++11, then std::unordered_map may offer some performance improvement over std::map: best-case lookup complexity is O(1), since the underlying implementation uses a hash table, and you're not iterating over the header, so an ordered map isn't required.
On the other hand, loading a NERSC file is of course IO-bound, so a few extra cycles spent in std::map probably won't affect things much.
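A minimal sketch of the suggestion; the field names below are made up for illustration, not necessarily the actual NERSC header keys:

```cpp
#include <string>
#include <unordered_map>

// C++11 drop-in when iteration order is irrelevant: average O(1) lookup
// via hashing instead of O(log n) tree traversal with std::map.
std::unordered_map<std::string, std::string> parseHeader() {
    std::unordered_map<std::string, std::string> header;
    header["DIMENSION_1"]    = "16";        // hypothetical header fields
    header["FLOATING_POINT"] = "IEEE32BIG";
    return header;
}
```

The interface is nearly identical to std::map (`operator[]`, `at`, `find`, `count`), so the swap is mechanical.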
My compiler is complaining of invalid intrinsic calls in the Grid_avx.h code:
/home/ckelly/CPS/src/grid_gitflow/Grid/include/Grid/simd/Grid_avx.h: In static member function ‘static __m256i Grid::Optimization::PrecisionChange::StoH(__m256, __m256)’:
/home/ckelly/CPS/src/grid_gitflow/Grid/include/Grid/simd/Grid_avx.h:480:36: error: cannot convert ‘__m128i {aka __vector(2) long long int}’ to ‘__m128 {aka __vector(4) float}’ for argument ‘1’ to ‘__m256 _mm256_castps128_ps256(__m128)’
h = _mm256_castps128_ps256(ha);
^
/home/ckelly/CPS/src/grid_gitflow/Grid/include/Grid/simd/Grid_avx.h:481:38: error: cannot convert ‘__m128i {aka __vector(2) long long int}’ to ‘__m128 {aka __vector(4) float}’ for argument ‘2’ to ‘__m256 _mm256_insertf128_ps(__m256, __m128, int)’
h = _mm256_insertf128_ps(h,hb,1);
^
/home/ckelly/CPS/src/grid_gitflow/Grid/include/Grid/simd/Grid_avx.h:485:14: error: cannot convert ‘__m256 {aka __vector(8) float}’ to ‘__m256i {aka __vector(4) long long int}’ in return
return h;
^
/home/ckelly/CPS/src/grid_gitflow/Grid/include/Grid/simd/Grid_avx.h: In static member function ‘static void Grid::Optimization::PrecisionChange::HtoS(__m256i, __m256&, __m256&)’:
/home/ckelly/CPS/src/grid_gitflow/Grid/include/Grid/simd/Grid_avx.h:489:53: error: cannot convert ‘__m256i {aka __vector(4) long long int}’ to ‘__m256 {aka __vector(8) float}’ for argument ‘1’ to ‘__m128 _mm256_extractf128_ps(__m256, int)’
sa = _mm256_cvtph_ps(_mm256_extractf128_ps(h,0));
^
/home/ckelly/CPS/src/grid_gitflow/Grid/include/Grid/simd/Grid_avx.h:490:53: error: cannot convert ‘__m256i {aka __vector(4) long long int}’ to ‘__m256 {aka __vector(8) float}’ for argument ‘1’ to ‘__m128 _mm256_extractf128_ps(__m256, int)’
sb = _mm256_cvtph_ps(_mm256_extractf128_ps(h,1));
I have managed to fix all the issues by modifying the code to the following (starting Grid_avx.h:475):
static inline __m256i StoH (__m256 a,__m256 b) {
__m256i hi;
#ifdef USE_FP16
__m256 h;
__m128i ha = _mm256_cvtps_ph(a,0);
__m128 hha = _mm_cvtepi32_ps(ha);
__m128i hb = _mm256_cvtps_ph(b,0);
__m128 hhb = _mm_cvtepi32_ps(hb);
h = _mm256_castps128_ps256(hha);
h = _mm256_insertf128_ps(h,hhb,1);
hi = _mm256_cvtps_epi32(h);
#else
assert(0);
#endif
return hi;
}
static inline void HtoS (__m256i h,__m256 &sa,__m256 &sb) {
#ifdef USE_FP16
__m256 hh = _mm256_cvtepi32_ps(h);
__m128 hh0 = _mm256_extractf128_ps(hh,0);
__m128i hh0i = _mm_cvtps_epi32(hh0);
__m128 hh1 = _mm256_extractf128_ps(hh,1);
__m128i hh1i = _mm_cvtps_epi32(hh1);
sa = _mm256_cvtph_ps(hh0i);
sb = _mm256_cvtph_ps(hh1i);
#else
assert(0);
#endif
}
Unfortunately this involves a lot more instructions!
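One caveat with the workaround above: `_mm_cvtepi32_ps` and `_mm256_cvtepi32_ps` convert values rather than reinterpreting bits, and int32-to-float32 conversion is lossy above 2^24, so round-tripping the packed 16-bit half pairs through them may not be bit-exact. The `cast` intrinsics (e.g. `_mm256_castsi128_si256` combined with `_mm256_insertf128_si256`, staying in the integer domain throughout) reinterpret the bits for free and would avoid both the type errors and the extra instructions. A minimal SSE2 illustration of cast versus convert, on a float lane holding 1.0f:

```cpp
#include <immintrin.h>
#include <cstdint>

// cast = bit reinterpretation (free, lossless);
// cvt  = value conversion (costs an instruction, rounds the value).
inline void castVsConvert(uint32_t &bits, int32_t &value) {
    __m128 f = _mm_set1_ps(1.0f);
    bits  = (uint32_t)_mm_cvtsi128_si32(_mm_castps_si128(f)); // raw IEEE bits of 1.0f
    value = _mm_cvtsi128_si32(_mm_cvtps_epi32(f));            // numeric value 1
}
```

The cast route recovers the raw pattern 0x3F800000 while the convert route yields the integer 1, which is why the two must not be interchanged when the payload is packed FP16 data.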
I've finally got fed up with files prefixed with Grid_. What seemed like a good idea when there
were one or two files is blatantly dumb when we have a whole tree which has "Grid" in the
top directory name (where it should be) anyway.
I've started switching to C++-style capitalised names like
MobiusZolotarevFermion.h
Any reasons to prefer lower case, e.g. mobius_zolotarev_fermion.h?
I plan to do a global rename exercise soon, and think that having the file name reflect the capitalised
class name is simplest; I will do so unless there are objections persuading me otherwise.
What is the point of view of the developers on starting to write the doxygen annotations before the code becomes too big?
Any other suggestion about the documentation?
Hi Guys!
As you know, I have been experimenting with Grid last week.
I am using Grid as an external library, and I had a little pain making my code compile against the "installed" version of Grid.
I would assume that one is expected to pass to one's own program the flags
-I$grid_prefix/include -L$grid_prefix/install/lib -lGrid
(where $grid_prefix is the prefix passed when configuring Grid), in agreement with the typical package folder structure. If I then include Grid.h as per:
#include <Grid/Grid.h>
I get the error:
$grid_prefix/include/Grid/algorithms/approx/Remez.h:19:20: fatal error: Config.h: No such file or directory
#include <Config.h>
In fact, the file Config.h is in $grid_prefix/include/Grid, which is not in the search path.
If I pass instead
-I$grid_prefix/include/Grid
and include Grid.h as in:
#include <Grid.h>
what I obtain is the error:
$grid_prefix/include/Grid/Grid.h:63:24: fatal error: Grid/Timer.h: No such file or directory
#include <Grid/Timer.h>
because now "Grid/" is already part of the search path. So ultimately I need to compile the code with both paths:
-I$grid_prefix/include -I$grid_prefix/include/Grid
Provided that this combination is kept, everything goes fine, but I find this hardly intuitive.
Another issue is the fact that, as you distribute the Config.h file, the PACKAGE_NAME, PACKAGE_STRING, etc. macros clash with those used in my autotools-generated header. A typical solution that I've seen used is to wrap Config.h in a "true" include file, and then rename the package macros. For example, the c-lime library does the following:
#ifndef LIME_CONFIG_H
#define LIME_CONFIG_H
/* Undef the unwanted from the environment -- eg the compiler command line */
#undef PACKAGE
#undef PACKAGE_BUGREPORT
#undef PACKAGE_NAME
#undef PACKAGE_STRING
#undef PACKAGE_TARNAME
#undef PACKAGE_VERSION
#undef VERSION
/* Include the stuff generated by autoconf */
#include "lime_config_internal.h"
/* Prefix everything with LIME_ */
static const char* const LIME_PACKAGE = PACKAGE;
static const char* const LIME_PACKAGE_BUGREPORT = PACKAGE_BUGREPORT;
static const char* const LIME_PACKAGE_NAME = PACKAGE_NAME;
static const char* const LIME_PACKAGE_STRING = PACKAGE_STRING;
static const char* const LIME_PACKAGE_TARNAME = PACKAGE_TARNAME;
static const char* const LIME_PACKAGE_VERSION = PACKAGE_VERSION;
/* LIME_VERSION is already defined in lime_defs.h */
/* Undef the unwanted */
#undef PACKAGE
#undef PACKAGE_BUGREPORT
#undef PACKAGE_NAME
#undef PACKAGE_STRING
#undef PACKAGE_TARNAME
#undef PACKAGE_VERSION
#undef VERSION
#endif
We get the following error when we try to compile Grid on Intel
In file included from ../../../lib/qcd/action/Actions.h(44),
from ../../../lib/qcd/QCD.h(460),
from ../../../lib/Grid.h(78),
from ../../../lib/PerfCount.cc(29):
/usr/include/c++/5/bits/stl_iterator_base_types.h(154): error: name followed by "::" must be a class or namespace name
typedef typename _Iterator::iterator_category iterator_category;
Please find attached the full list of errors
$ icpc --version
icpc (ICC) 16.0.0 20150815
Copyright (C) 1985-2015 Intel Corporation. All rights reserved.
$ g++ --version
g++ (Ubuntu 5.1.1-4ubuntu12) 5.1.1 20150504
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
build_out.txt
That is going to be very painful, but I think that in the long run it can really pay off in terms of clarity for users and for the way other software will build on Grid. The idea would be to fully scan all the declarations to ensure const is used when appropriate, to reduce the risk of silent buggy variable changes. Posting here before propagating to the develop branch.
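As an illustration of the payoff (a generic example, not Grid code): a `const` reference parameter lets the compiler reject silent modification outright.

```cpp
#include <vector>

// The const qualifier turns an accidental write into a compile error
// instead of a silent bug:
int checksum(const std::vector<int> &v) {
    int s = 0;
    for (int x : v) s += x;
    // v.push_back(0);   // would not compile: v is const
    return s;
}
```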
Need a way to handle RNG management in QCD with multiple live grids.
SpaceTimeGrid class is perhaps a place holder for central lattice information,
could create a Grid hierarchy there.
We could make several simplifying assumptions:
i) 5d is never spread out; 4d RNG's suffice.
ii) Subdivided Grids make use of RNG's from the 0,0,0,0 subcell element of the finest grid.
We would only ever have to save and restore 4d RNG's then, and alternate routines
for RNG filling to index the corresponding RNG on a different grid.
Tradeoffs:
I want the RNG sequence to be independent of machine decomposition.
IroIro no longer does this, and gains from the Mersenne twister skip ahead by having
one RNG per node. This gives up both machine decomposition independence AND the
threading of RNG generation within an MPI task.
Could make RNG's live on a coarser grid. (Coarsest?). Suppresses RNG state volume.
CPS does a version of this with one RNG per hypercube.
Making this quite general -- filling a fine grid from a coarse-grid RNG that subdivides the fine
grid, allowing for 5d/4d -- would enable suppression of RNG state volume, while retaining the
ability to parallelise within a node and also retaining machine decomposition independence, provided
we do not subdivide too much.
I am tempted to expand lib/qcd/utils/SpaceTimeGrid.h/cc to retain a sequence of
global Grid objects for QCD running (Fermion Grid, Gauge Grid, RNGGrid) and
provide the subdivided RNG grid fill, save/restore etc...
Similarly retain the single serial RNG here.
Comments on this strategy welcome. With a Mersenne Twister implementation we can
take a single seed and skip-ahead instead of reseeding with random as is presently done with ranlux.
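The subdivided-fill idea can be sketched as follows. Names and the lexicographic convention are hypothetical, but the key property is that the owning RNG depends only on global coordinates, which is what makes the sequence independent of machine decomposition:

```cpp
#include <cstddef>
#include <vector>

// Map a fine-grid coordinate to the index of the coarse-grid RNG that owns
// its subcell, assuming each fine dimension is a multiple of the coarse one.
// Since the result depends only on global coordinates, any machine
// decomposition reproduces the same RNG assignment.
int coarseRNGIndex(const std::vector<int> &fineCoor,
                   const std::vector<int> &fineDims,
                   const std::vector<int> &coarseDims) {
    int idx = 0;
    for (std::size_t d = 0; d < fineDims.size(); ++d) {
        int block = fineDims[d] / coarseDims[d];          // subcell extent
        idx = idx * coarseDims[d] + fineCoor[d] / block;  // lexicographic index
    }
    return idx;
}
```

With this convention, saving/restoring only the coarse (e.g. 4d) RNG state suffices, and a fill routine on any grid that subdivides the RNG grid can look up its generator by global coordinate.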
I have the following problem:
[tkurth@gert01 GRID]$ tail -f slurm-3046972.out
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
Grid : Message : Requesting 134217728 byte stencil comms buffers
Grid : Message : Grid is setup to use 32 threads
Grid : Message : Making s innermost grids
Benchmark_dwf: ../../src/lib/communicator/Communicator_base.cc:49: void *Grid::CartesianCommunicator::ShmBufferMalloc(unsigned long): Assertion `heap_bytes<MAX_MPI_SHM_BYTES' failed.
ShmBufferMalloc exceeded shared heap size -- try increasing with --shm <MB> flag
Parameter specified in units of MB (megabytes)
Current value is 128
Benchmark_dwf: ../../src/lib/communicator/Communicator_base.cc:49: void *Grid::CartesianCommunicator::ShmBufferMalloc(unsigned long): Assertion `heap_bytes<MAX_MPI_SHM_BYTES' failed.
srun: error: nid02439: tasks 0-1: Aborted
srun: Terminating job step 3046972.0
The code hangs when it tries to make the innermost grids and then fails after 10 minutes. This is my run script:
[tkurth@gert01 GRID]$ cat benchmark_dwf.sh
#!/bin/bash
#SBATCH --ntasks-per-core=4
#SBATCH -N 1
#SBATCH -A mpccc
#SBATCH -p regular
#SBATCH -t 2:00:00
#SBATCH -C knl,quad,cache
export OMP_NUM_THREADS=32
export OMP_PLACES=threads
export OMP_PROC_BIND=spread
#MPI stuff
export MPICH_NEMESIS_ASYNC_PROGRESS=MC
export MPICH_MAX_THREAD_SAFETY=multiple
export MPICH_USE_DMAPP_COLL=1
srun -n 2 -c 136 --cpu_bind=cores ./install/grid_sp_mpi/bin/Benchmark_dwf --threads 32 --grid 32.32.32.32 --mpi 1.1.1.2 --dslash-asm --cacheblocking=4.2.2.1
[config.log.txt](https://github.com/paboyle/Grid/files/572824/config.log.txt)
[config.summary.txt](https://github.com/paboyle/Grid/files/572823/config.summary.txt)
commit c067051d5ff1a3f4c4dea0e72cc9b1b0ad092c7a
Merge: bc248b6 afdeb2b
Author: paboyle <[email protected]>
Date: Wed Nov 2 13:59:18 2016 +0000
Merge branch 'develop' into release/v0.6.0
KNL bin1, cray xc-40, intel 16.0.3.210
build script and configure
#!/bin/bash -l
#module loads
module unload craype-haswell
module load craype-mic-knl
module load cray-memkind
precision=single
comms=mpi
if [ "${precision}" == "single" ]; then
installpath=$(pwd)/install/grid_sp_${comms}
else
installpath=$(pwd)/install/grid_dp_${comms}
fi
mkdir -p build
cd build
../src/configure --prefix=${installpath} \
--enable-simd=KNL \
--enable-precision=${precision} \
--enable-comms=${comms} \
--host=x86_64-unknown-linux \
--enable-mkl \
CXX="CC" \
CC="cc"
#CXXFLAGS="-mkl -xMIC-AVX512 -std=c++11" \
#CFLAGS="-mkl -xMIC-AVX512 -std=c99" \
#LDFLAGS="-mkl -lmemkind"
make -j12
make install
cd ..
attached config.log
attached config.summary
no make.log, but hopefully it should not be necessary
A good example of how to use variadic macros; it simplifies the Hana library Bob discovered
and gives reflection (serialise/deserialise).
This is still dependent on Boost PP (preprocessor), but we can strip that out too.
This could give a better version of what I did in ukhadron.
Travis on Xcode 5 continues to fail due to the time it takes to compile in the virtual machine.
I tried to split the build matrix into single and double precision, but without success. The env options inside the matrix seem not to allow splitting the compilations, and the external env will be overridden.
Any ideas?
Hi,
I built Grid for the Bc-cluster at Fermilab (AMD Opteron 6320) using icc v16 with impi 5.1.3. No issues when building, but when executing the binary only the following message is printed:
Please verify that both the operating system and the processor support Intel(R) X87, CMOV, MMX, FXSAVE, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, POPCNT and AVX instructions.
Allowed flags from /proc/cpuinfo
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold bmi1
Outputs from configure and make (config.log, config.out) are attached. I'm mainly testing the configure script and cleaning-up my install.
CXX=icpc ../Grid/configure --enable-simd=AVX --enable-comms=mpi-auto
I have a working build using the same intel compilers on this machine. However, that was created by explicitly specifying compile options on September 26, 2016 (cf config.log-2016-09-26).
Thank you,
Oliver
PS: To upload files I attached suffix '.txt' to the filename.
I'm finding that on a small test I can get a substantial speed-up out
of using the BFM-style long-lived threads with barrier synchronisation.
e.g. Single node 8^4:
620GF/s -> 910 GF/s on KNL 7210 (SP)
62GF/s -> 72 GF/s on BG/Q node (DP)
On 16^4 local volumes on KNL the gain is minimal though.
If we act, this pushes us into a threading nightmare however, with many routines having
to accept multiple threads entering them.
I will think a little about whether the "parallel_for" macro can work around this and about the implications,
but if there is no easy common solution we are looking at having to make "DhopThread"
and "Dhop" routines and changing the CG and other solvers to run in thread/barrier mode.
This seems to be mandated by self-threading being 1.5x faster than OpenMP for loops.
I'll make the comparison a little more robust though. I'm tempted not to do this even if it means
a little less performance at the sweet spot on BG/Q, since the software cost is large, and perhaps
we should just accept it.
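The shape of the long-lived-thread scheme, as I understand it, is roughly the following. All names are hypothetical; this is a sketch of the technique, not the BFM or Grid implementation:

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Long-lived worker pool with a spin-barrier hand-off: threads are created
// once and woken by a generation counter, avoiding per-loop OpenMP fork/join.
class PersistentPool {
    std::vector<std::thread> workers_;
    std::function<void(int)> task_;
    std::atomic<long> generation_{0};
    std::atomic<int>  done_{0};
    std::atomic<bool> stop_{false};
    int nthreads_;
public:
    explicit PersistentPool(int n) : nthreads_(n) {
        for (int t = 0; t < n; ++t)
            workers_.emplace_back([this, t] {
                long seen = 0;
                for (;;) {
                    // spin until the master publishes new work (or shutdown)
                    while (generation_.load(std::memory_order_acquire) == seen)
                        if (stop_.load(std::memory_order_acquire)) return;
                    seen = generation_.load(std::memory_order_acquire);
                    task_(t);                                    // this thread's share
                    done_.fetch_add(1, std::memory_order_release);
                }
            });
    }
    // Called from the master thread; returns once all workers have finished.
    void run(std::function<void(int)> f) {
        task_ = std::move(f);
        done_.store(0, std::memory_order_relaxed);
        generation_.fetch_add(1, std::memory_order_release);     // wake workers
        while (done_.load(std::memory_order_acquire) != nthreads_) {}
    }
    ~PersistentPool() {
        stop_.store(true, std::memory_order_release);
        for (auto &w : workers_) w.join();
    }
};
```

On the small-volume numbers quoted above, the win comes from avoiding the per-loop OpenMP fork/join overhead; the cost is exactly the constraint already noted: everything called inside `run` must tolerate concurrent entry.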
Hi,
Through my recent development (HDF5, Gamma) I really struggled with Grid header structures. A lot of headers can only be included assuming a very specific sequence of previous includes done externally to the header itself. I was curious enough to try to follow include chains, and in some cases it is really involved.
That scares me a bit considering that Grid is growing fast, because this can go out of control rather quickly. The main issue is that the whole structure is becoming increasingly cryptic.
I am not advocating that we should change the include strategy, but rather that we consolidate it. One possible strategy could be: each header starts with #include <Grid/Global.h>, then #includes the thematic headers (Cartesian.h, Algorithms.h, ...) necessary for the definitions in the current header. Including the top-level Grid.h would still be fine. It is just about adding a bunch of Grid includes on top of each header to make them self-consistent and independent. This is what I am already doing in the measurement code: the loading order of headers can be permuted arbitrarily and any header is self-consistent in terms of definitions & declarations (i.e. it can be included alone without errors). Although this is not a huge change, it will be a complete pain to do. I would be happy to volunteer to do so, but I won't do anything without having your opinion. I would say the code would gain quite some readability and robustness.
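To make the proposal concrete, a hypothetical self-contained header under this scheme might look like the following (file and header names are illustrative only, not Grid's actual layout):

```cpp
// MyModule.h -- self-contained: compiles no matter where it is included from
#ifndef GRID_MY_MODULE_H
#define GRID_MY_MODULE_H

#include <Grid/Global.h>     // always first: global definitions
#include <Grid/Cartesian.h>  // thematic headers actually used below
#include <Grid/Algorithms.h>

namespace Grid {
    // declarations relying only on the includes above, so the header
    // can be included alone, in any order, without external setup
}

#endif // GRID_MY_MODULE_H
```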
Any thoughts?
I have started adding the '0.6.0' milestone to some issues. I think this system is useful to plan releases. Tell me if you think it is inconvenient.
You mention boost arrays in the TODO list, so I thought I'd mention this on the off-chance you hadn't already seen it. C++11 provides an array class template that's pretty much the same as the boost array type.
Add flops-count information to the solvers and HMC routines.
Added here as a memo for everyone.
Hi,
Would you mind checking if the zMobius CG converges when omega has imaginary components? I've modified Test_dwf_cg_prec (DomainWallFermionR -> ZMobiusFermionR) and it fails to converge.
In case you are wondering, omega(s) = 0.25 + 0.01i.
Compiling Grid with clang++-3.8 and AVX2 support appears to trigger a compiler error related to FMA support in the intrinsics.
$ cd Grid
$ git rev-parse --short HEAD
5e02392
In file included from qcd/action/fermion/CayleyFermion5D.cc:31:
In file included from ./Grid.h:68:
In file included from ./Simd.h:166:
In file included from ./simd/Grid_vector_types.h:47:
./simd/Grid_avx.h:240:14: error: always_inline function '_mm256_fmaddsub_ps' requires target feature 'fma', but would be
inlined into function 'operator()' that is compiled without support for 'fma'
return _mm256_fmaddsub_ps( a_real, b, a_imag ); // Ar Br , Ar Bi +- Ai Bi = ArBr-AiBi , ArBi+AiBr
^
./simd/Grid_avx.h:286:14: error: always_inline function '_mm256_fmaddsub_pd' requires target feature 'fma', but would be
inlined into function 'operator()' that is compiled without support for 'fma'
return _mm256_fmaddsub_pd( a_real, b, a_imag ); // Ar Br , Ar Bi +- Ai Bi = ArBr-AiBi , ArBi+AiBr
^
fatal error: error in backend: Cannot select: 0x76cec80: v8f32 = X86ISD::FMADDSUB 0x787a1d0, 0x7a523a0, 0x7a52600
===== system details ========
$ cat /etc/redhat-release
Scientific Linux release 7.2 (Nitrogen)
$ uname -a
Linux yosemite.fnal.gov 3.10.0-229.20.1.el7.x86_64 #1 SMP Wed Nov 4 10:08:36 CST 2015 x86_64 x86_64 x86_64 GNU/Linux
$ less /proc/cpuinfo
model name : Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
$ clang++ -v
clang version 3.8.0 (tags/RELEASE_380/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/james/installed/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/4.8.2
Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/4.8.5
Selected GCC installation: /usr/lib/gcc/x86_64-redhat-linux/4.8.5
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64
$ ./configure CXX=clang++ CXXFLAGS="-std=c++11 -O3 -mavx2 -fopenmp -lomp" --enable-simd=AVX2
Summary of configuration for grid v1.0
The following features are enabled:
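A plausible workaround, assuming the failure is just the missing fma target feature (I have not verified this on the machine above): pass -mfma alongside -mavx2, so that clang is allowed to emit the _mm256_fmaddsub_* intrinsics it is inlining.

```shell
# Assumed fix: enable the fma target feature explicitly
./configure CXX=clang++ CXXFLAGS="-std=c++11 -O3 -mavx2 -mfma -fopenmp -lomp" --enable-simd=AVX2
```

The E5-2667 v3 (Haswell) reported in /proc/cpuinfo supports FMA3, so enabling the flag should be safe on that host.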
For more clarity in the code structure, it would be better to have the .h files now in lib moved to an include/ directory at the root level.
G
Hi,
I ran four tests on an 8^4 lattice on the summit machine at UC Boulder and similar tests on pi0 at Fermilab. All jobs were running on 2 nodes, each with 1 MPI rank and 24 threads on summit (16 threads on pi0). The jobs differ by the type of file system used for writing the checkpoints (NFS or GPFS on summit; ZFS or Lustre at Fermilab) and whether I split the T or the Z direction (1 or 2 IO nodes).
summit
mpi layout             file system   SLURM-ID   write rate
1.1.1.2 (2 IO nodes)   NFS           462        280 MB/s
1.1.1.2 (2 IO nodes)   GPFS          461        0.05 MB/s
1.1.2.1 (1 IO node)    NFS           460        131 MB/s
1.1.2.1 (1 IO node)    GPFS          455        79 MB/s
pi0 Fermilab
mpi layout             file system   PBS-ID (last three)   write rate
1.1.1.2 (2 IO nodes)   ZFS           628                   228 MB/s
1.1.1.2 (2 IO nodes)   lustre        635                   0.002 MB/s
1.1.2.1 (1 IO node)    ZFS           626                   110 MB/s
1.1.2.1 (1 IO node)    lustre        627                   3-20 MB/s
Unfortunately, I didn't find performance values for the single I/O writing of the rng-files in the log-files;
the full log-files are however attached and carry the SLURM-ID / PBS-ID in the filename. Do I need some special flag for parallel file systems? (striping?)
The parallel read of the ckpoint at the beginning of the job seems OK for all cases, although in these tests not all jobs started from a checkpoint. On both machines Grid is compiled on NFS/ZFS.
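On the striping question: I cannot verify this on these machines, but on Lustre the default stripe count is often 1, which serialises a large single-writer file onto one OST and could explain the very poor Lustre rates above. Something like the following (directory path and stripe count are illustrative) is worth trying on the checkpoint directory before the run:

```shell
# Stripe new files in the checkpoint directory across 8 OSTs (illustrative values)
lfs setstripe -c 8 /path/to/ckpoint_dir
```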
Thank you,
Oliver
All norms and innerProducts are now high-precision norms using intermediate double accumulation.
The code Guido has written with a differentiated normHP should just call norm.
These should be bandwidth-limited anyway, so there is no reason not to make this the default.
Just for bookkeeping for 0.6 (Peter is aware of the origin of the issue): currently MPI freezes in an indefinite wait, and this should be corrected for 0.6.
The Intel compiler chokes on the syntax used in
lib/simd/Grid_avx.h(360): error: expression must have pointer-to-object type
return v1[0];
This is one example. It is necessary to go through a "conv" union to get element-by-element
access to the _mXXX vector intrinsic types in a way that works with Clang, G++ and Intel's compiler.
I have problems running the benchmark codes. I can collect several issues here:
Grid : Message : 24906 ms : 30 4 10368000 1198.7 2397.41
Grid : Message : 26629 ms : 30 8 20736000 1199.82 2399.64
Grid : Message : 30097 ms : 30 16 41472000 1208.64 2417.28
Grid : Message : 30599 ms : 32 1 3145728 710.73 1421.46
Grid : Message : 31088 ms : 32 2 6291456 1173.55 2347.1
Grid : Message : 32166 ms : 32 4 12582912 1159.37 2318.73
Grid : Message : 34211 ms : 32 8 25165824 1231.09 2462.19
Grid : Message : 38414 ms : 32 16 50331648 1210.31 2420.63
Grid : Message : 38500 ms : ====================================================================================================
Grid : Message : 38500 ms : = Benchmarking sequential halo exchange in 4 dimensions
Grid : Message : 38500 ms : ====================================================================================================
Grid : Message : 38500 ms : L Ls bytes MB/s uni MB/s bidi
srun: error: nid12126: task 23: Floating point exception
There might be a division by zero or something.
Grid : Message : 116 ms : Making s innermost grids
||||||||||||||__
||||||||||||||__
|_ | | | | | | | | | | | | _|
|_ _|
|_ GGGG RRRR III DDDD _|
|_ G R R I D D _|
|_ G R R I D D _|
|_ G GG RRRR I D D _|
|_ G G R R I D D _|
|_ GGGG R R III DDDD _|
|_ _|
||||||||||||||__
||||||||||||||__
| | | | | | | | | | | | | |
Copyright (C) 2015 Peter Boyle, Azusa Yamaguchi, Guido Cossu, Antonin Portelli and other authors
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
Grid : Message : 25 ms : Grid is setup to use 128 threads
Grid : Message : 116 ms : Making s innermost grids
Grid : Message : 15963 ms : Naive wilson implementation
Grid : Message : 15964 ms : Calling Dw
Benchmark_dwf: ../../src/lib/qcd/action/fermion/WilsonKernelsAsm.cc:46: void Grid::QCD::WilsonKernels::DiracOptAsmDhopSite(Impl::StencilImpl &, Grid::LebesgueOrder &, Impl::DoubledGaugeField &, std::vector<Impl::SiteHalfSpinor, Grid::alignedAllocatorImpl::SiteHalfSpinor> &, int, int, int, int, const Impl::FermionField &, Impl::FermionField &) [with Impl = Grid::QCD::WilsonImplGrid::Grid_simd<std::complex<double, __m512d>, Grid::QCD::FundamentalRep<3>, double>]: Assertion `0' failed.
srun: error: nid12151: task 0: Aborted
srun: Terminating job step 3027515.1
Can you help me here?
I can start working on the action parts (gauge and fermions).
We have to decide the target design functionality and style.
I have in mind something similar to what I have in IroIro++, a kind of lightweight version of the Chroma ones.
You can have a look here
https://github.com/coppolachan/IroIro/tree/master/lib/Action
and here for the corresponding HMC implementation (leapfrog, 2MN, multilevel)
https://github.com/coppolachan/IroIro/tree/master/lib/HMC
The Travis builds failing for Linux/clang come from the fact that the LLVM people closed their APT repository because the network traffic was too heavy: http://lists.llvm.org/pipermail/llvm-dev/2016-May/100303.html.
This is not surprising considering the increasing number of CI bots all downloading clang from their server over and over again.
Finding a plan B would be very painful, and some say this is just temporary and that the server will reopen.
I will keep an eye on it and figure something out if it is not solved on the LLVM side.
This is not an issue, more an information message (I don't know if the GitHub messages attached to the commits get broadcast to everyone, so I am also writing here).
In the latest commit I added a .gitignore file to keep autoconf files, compiled libraries and automake-generated makefiles out of commits. This allows contributors to skip reconfiguring the tools and keep their own environment.
For new users, I also added a simple utility called reconfigure_script (maybe it should be moved to the scripts dir) that runs all the autotools commands to set up the correct environment.
Add new file formats:
I want to be clear on this.
My source code is written to be readable by me.
I do NOT appreciate others reformatting source files and committing them, especially when
this is done brainlessly by an automatic tool; in any case the style of the author
is key. It does not matter if you do not like my style and choices.
A complete mess has been made of several key source files, and further
dozens of files have been pointlessly changed in a way that violates my
personal preference of not acquiring high levels of indentation when entering
the Grid or QCD namespace.
This floating indent, combined with a tool-based application of 80-character wrap,
creates unreadable code from code that was previously easily readable by the author.
Further, readability is the judgement of the prime author of a given area of the code, and it
is rude and inappropriate to reformat without consultation and agreement.
I am now spending several hours reverting code, a waste of time created by thoughtless
action.
The worst case is the thoughtless application of automatic formatting to a critical
file with braces and scopes inside ifdefs, which hopelessly confused the formatter.
You should NEVER commit without first applying git diff and satisfying yourself that
you are in complete control of the changes, with only a very few lines deliberately modified with genuine purpose.