Coder Social home page Coder Social logo

Header mess needs tidying? about grid HOT 15 CLOSED

paboyle avatar paboyle commented on August 18, 2024 1
Header mess needs tidying?

from grid.

Comments (15)

aportelli avatar aportelli commented on August 18, 2024

Hi,

I am spending hours now to try to insert a new source file Gamma.cc which contains the constant arrays for the Dirac algebra. It is a mess, I get all kinds of compilation errors due to the header structure and I have to randomly guess what headers need to be included. One cheap solution is to do a header only implementation (that's what I did first) or to duplicate code. This is quite annoying and this is only going to get worse with time.

So feedback on the previous issue would be more than welcome. Again I am volunteering to clean that up but a lot of file are going to be modified so I don't want to go ahead by myself.

from grid.

aportelli avatar aportelli commented on August 18, 2024

Another stat: for most executable programs the build time is completely dominated by the preprocessor. I thought long compilation times were coming from the template structure but it is just about doing #include <Grid/Grid.h>. This is also going to grow with time and there is a serious risk we will end up in a situation similar to Chroma with build times going through the roof.

This can be avoided by only including 'thematic' headers (Algorithms, QCD, ...) and dropping the global header which can be part of the strategy I am mentioning above.

from grid.

chulwoo1 avatar chulwoo1 commented on August 18, 2024

Dear Atnonin,

I'm certainly sympathetic to the difficulties you ran into. Including the headers itself needs in each header is a common solution as you know. This with the usual trick of avoiding duplication usually ensures all the necessary headers will be included, whatever order you include them in the main program.

A main throwback of this approach of course is that the user has to include all the necessary headers in their main program, which can be tedious. Providing some thematic headers you described for convenience, in addition to the component headers may be a reasonable compromise.

from grid.

coppolachan avatar coppolachan commented on August 18, 2024

Dear Antonin

I totally agree, I think we also discussed a bit the topic some time ago.
I am also distressed by the structure of including everything into one file, Grid.h.

I agree with the separation into thematic headers that we maintain. I would go further and put these thematic headers into the include directory where users should find them (it is the most natural location where they will look into).

I am trying separate headers in the HMC sector but the code is too close to the original, so that there is not much freedom, unless we really do a full makeover.

G

from grid.

paboyle avatar paboyle commented on August 18, 2024

I don't see the difference between "Global.h" and "Grid.h".

The headers have a sequence; that is unavoidable.

Most people do not wish to delve into the sequence; that is something Antonin found difficult.

Exposing the sequence in all the files seems to complicate all the main programmes.

I need to see evidence that compile is preprocessor dominated; I see it as icpc dominated and specifically WilsonFermion5d and PartialFraction dominate the compile time, for example.

Consolidating in aggregates, which include all sequences of subincludes is sensible.

However, when there is a dependency, that has to be dealt with by the author unavoidably,
and this is either putting it in the right place in globals (localises sequence control) or
by having every single header know all its prerequisites.

The latter is quite a lot of distributed work, and often difficult to get right and verify all possible
sequences of includes users of the code can make.

from grid.

sunpho84 avatar sunpho84 commented on August 18, 2024

I would like to comment on a related issue.

Another main drawback of having all Grid files seeing the whole set of headers through Grid.h is that, whatever change is made to a given Grid header (even just erasing a single space) triggers automatically the recompilation of the whole code.

Given that IDE gets totally lost parsing individual headers (due to the issue mentioned by Antonino), this implies that to check the correct syntax of the tiniest modification requires 15 minutes of compilation or so, during which my laptop completely freezes, and I cannot do anything else until is done.

This could be improved by removing any internal dependency on Grid.h in .cc files, and including all headers needed to get the single .h files correctly parsed by an IDE, so that

  • syntax can checked at IDE scope instead that at compilation time,
  • the amount of code to recompile each time is reduced, and my laptop won't get an a heart attack.

Of course it is a bit tedious but it's not a too complicated change, I think even the last-arrived could get to the bottom.

from grid.

paboyle avatar paboyle commented on August 18, 2024

The use of a single global header is a tradeoff between fine grained rebuild granularity,
and simplicity presented to the programmer. Do you need to remember and expose
the dependencies for all the little sub-headers to the programmer, or just include Grid.h?

If the build time is manageable, a single header is simpler.

I'm not typically using an IDE as I'm usually logged on to MPI clusters
or parallel machines. Clearly easy development on laptops is a good thing, but the tail
should not necessarily wag the dog here.

The build times are not so bad on my latop.

machdep.cpu.brand_string: Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz

c001003:build peterboyle$ time make -j 8 >& /dev/null
real 2m24.389s

from grid.

sunpho84 avatar sunpho84 commented on August 18, 2024

I think there are two distinguished persons:

  • the user of the library: this poor guy certainly has not to remember the fine-grained header structure, and a single "Grid.h" is the nicest thing you can do to him (thematic headers can do the work as well of course)
  • the internal developer: this guy on the other hand should have a fair good understanding of the structure of the library, and managing sub-headers should be no problem to him. On the contrary, including only the actually needed header helps navigating the library dependencies

I think that having a single Grid.h included everywhere is quite unbalanced to the "simplicity presented to the programmer" side, and the "fine grained rebuild granularity" is totally lost. Where is the tradeoff?

from grid.

paboyle avatar paboyle commented on August 18, 2024

Antonin claimed the compile time is C preprocessor limited. This is wrong, and here is the proof.

Some data points; on KNL compile nodes (these are crap cores) the compile time
is dominated by two files; WilsonFermion5D.cc, PartialFraction5D.cc

There are 35 ".cc" files built to object in the library itself. Most of these build pretty quickly.
For example, I removed a file that does very little but include <Grid.h>; XmlIO.o and rebuilt.

Despite committing the "huge problem" of including Grid.h the overhead is not at all large
for files that don't instantiate the templates.

Haswell (clang)-- 3.6s
KNL (icpc) -- 7.8s

However, rebuilding the worst case WilsonFermion5D.o
Files that do instantiate the templates incur the overhead; but do so because they
need the templates.

Haswell (clang)-- 38.3s
KNL (icpc) -- 5m40s (!) KNL cores are pretty weak.

I would hazard that build times would be far better addressed by having
someone look in detail into what makes WilsonFermion5D and PartialFraction5D
by experimental elimination of code and finding out what is slow would help, and
I certainly don't accept idealogical statements that the single header is intrinsically bad.

You could reasonably claim that granular includes avoiding Grid.h on these
files only would suppress rebuild of these files to a strict minimum, and I'd be sympathetic
to that, but not to a general statement of principle.

These Fermion files that I find problematic depend on most of the code anyway:
-- Stencil,
-- Simd,
-- Tensors,
-- Lattice,
-- CShift.
-- Communicator
-- most of QCD/utils, spinor, etc...

Frankly, as the above comparison shows, this dependency is the reason they
actually are problematic in the first place, and is not easily eliminated.

The later files, after the action are only:

#include <Grid/qcd/smearing/Smearing.h>
#include <Grid/qcd/hmc/integrators/Integrator.h>
#include <Grid/qcd/hmc/integrators/Integrator_algorithm.h>
#include <Grid/qcd/hmc/HMC.h>
#include <Grid/parallelIO/NerscIO.h>
#include <Grid/qcd/hmc/NerscCheckpointer.h>
#include <Grid/qcd/hmc/HmcRunner.h>

So, the only header changes that have avoidable rebuild lie only under hmc, I/O etc...
or headers at the same level under Actions with no dependency.

That is only saving 6 headers, and the artificial coupling of the
actions to each other.

Granted the fake dependency on the HMC sector impacts Guido a lot ;
this could be fixed by a very small and limited change, and not a global change.
The coupling of the actions on realistically is only impacting me.

from grid.

aportelli avatar aportelli commented on August 18, 2024

In my original message I claimed that:

Another stat: for most executable programs the build time is completely dominated by the . preprocessor.

Maybe one should notice here that I am speaking about "executable programs", which is the final product people are interested it. This is very simple to see, on my laptop (Skylake i7) compiling the following:

#include <Grid/Grid.h>

int main(void)
{
  return 0;
}

takes ~4.5s for bot clang and icpc. On KNL with icpc it goes to 8s. Just running the preprocessing step takes about the same time (actually a bit more for clang !?). I think this is a bit much for an empty file and this is going to grow unbounded with time. For simple executables like the one I am looking at (measurement, ...) this is dominating the build time. Of course if you make the structure of the source file much more complex (like in these fermion sources you mentioned) you can manage to invert that.

This about just build performance issues, which was a secondary point. My main point was the Grid.hcreating a header dependency structure which is non-local, leading to dependencies which are hard to read/use for developers and impossible to parse for IDEs.

I proposed a simple edit that is about a couple of lines per file, this does not actually have to change the current include strategy and I volunteered to do it. I am still happy to do it but I am a bit puzzled at the strength of Peter's resistance. I proposed a simple improvement and I am certainly not interested by "ideological" debates about it. I think the current header design is problematic and although the effects are reasonable now, this might hit us hard in the back later. Maybe I am wrong, but I thought it was not useless to raise this point.

from grid.

paboyle avatar paboyle commented on August 18, 2024

I measure the timing of a compile to .o (2.0s ) of BinaryIO.cc and preprocessing only
(-E flag to cxx) as 0.19s , i.e. C preprocessor is only 10% of the time; parsing the included code
by the C++ compiler appears dominant to me.

I did a trial and partitioned the headers into:

"GridBase" with only the core, non-QCD library which is mainly header only
"QCD"
"ActionBase"
-> under this remove coupling between fermions & each other & the rest of HMC etc.
"Actions"
"HMC_aggregate"

The build time reduced from 2:28s to to 2:21s, which is pretty minor return on a days work.

The partitioning does give incremental compilation of the fermion action .cc files,
which is a benefit to software development in the pseudofermion and HMC sector.

from grid.

aportelli avatar aportelli commented on August 18, 2024

Hi Peter,

Again, the preprocessor time was not my main point and the compile times I mentioned are not Grid's one but external programs. My main point was about code readability which is unrelated to build performances. I think there is some constructive feedback here that this could be improved.

One additional issue concerning build performances with external programs are the part of the code using the "magic macros" which are particularly expensive to process. The amount of serialisable classes in the code is likely to increase in the future and concentrating all these includes in the same place does not sound like a good idea.

The GridBase you described is essentially identical to the Global.h file I proposed in my original message.

from grid.

paboyle avatar paboyle commented on August 18, 2024

I have committed a reorganisation into the feature/bgq-asm tree after merging it with develop.
-- Subdirectory aggregates, like Lattice.h etc... have all moved into their respective subdirectories.

  • Principle: aggregates are named after their own subdirectory.

thus, Tensor.h in tensor etc...

  • "Aggregate.h" is used to include everything in the directory by later, dependent files.

  • "AggregateCore.h" is used as required for common prerequisites for .cc files in that directory.

    lib/Grid.h includes all the aggregates

  • lib/GridCore.h plays the "Global.h/GridBase.h" role, includes the base includes

  • lib/GridQCDcore.h plays a QCD key types role

  • lib/action/Action.h

  • lib/action/FermionCore.h <--- support required for each Fermion action .cc file
    lib/action/Fermion.h <--- entire set of fermion actions used by later files, like HMC

I try to keep aggregates as big as possible without losing compile performance.

Unlike Antonin, I do not think locality of the dependencies is the ideal; centralising the sequence
as much as possible makes maintenance much easier, and changes avoid global editing.

from grid.

aportelli avatar aportelli commented on August 18, 2024

Hi Peter,

Thank you that sounds very good 😊

Unlike Antonin, I do not think locality of the dependencies is the ideal; centralising the sequence
as much as possible makes maintenance much easier, and changes avoid global editing.

I think it was objective to say that the header structure could be consolidated, which you just did. The local headers strategy proposal is much more subjective, defended by many OO programmers (one class, one file strategy) but also criticised in other contexts. That's the way as I am used to do things, but it is to a large extent a matter of taste.

from grid.

paboyle avatar paboyle commented on August 18, 2024

I believe this issue can be closed. It is all back in develop.

from grid.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.