qdp-jit's Introduction

qdp-jit's Issues

fconcepts build issue with GCC 9

My compiler is GCC 9, and I have disabled the C++20 features flag. However, the build still tries to compile the concepts code. Is there somewhere I can pass -fconcepts?
Thanks

qdp-jit/lib/../include/qdp_sum.h:198:24: error: ‘concept’ does not name a type; did you mean ‘concat’?
198 | template concept ConceptHasShift = HasShift::value;
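
One possible way to keep such a declaration building on pre-C++20 compilers is to guard it with the standard __cpp_concepts feature-test macro and fall back to a plain constexpr bool. The sketch below is only an illustration: HasShiftDemo, ConceptHasShiftDemo, WithShift and WithoutShift are hypothetical names, and the real HasShift trait in qdp_sum.h may be defined quite differently. As far as I know, -fconcepts on GCC 9 enables the Concepts TS rather than the C++20 syntax, so the flag alone may not be enough for this line.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the trait behind ConceptHasShift: it is true
// when T has a member function shift(int). Not the actual qdp-jit trait.
template <class T, class = void>
struct HasShiftDemo : std::false_type {};

template <class T>
struct HasShiftDemo<T, std::void_t<decltype(std::declval<T&>().shift(0))>>
    : std::true_type {};

#if defined(__cpp_concepts) && __cpp_concepts >= 201907L
// C++20 compilers: declare a real concept.
template <class T>
concept ConceptHasShiftDemo = HasShiftDemo<T>::value;
#else
// Older compilers: a constexpr bool usable with enable_if or if constexpr.
template <class T>
constexpr bool ConceptHasShiftDemo = HasShiftDemo<T>::value;
#endif

struct WithShift { int shift(int n) { return n; } };
struct WithoutShift {};

static_assert(ConceptHasShiftDemo<WithShift>, "shift() should be detected");
static_assert(!ConceptHasShiftDemo<WithoutShift>, "no shift() expected");
```

Code that uses the name inside requires-clauses would still need a C++20 compiler; the fallback only covers uses as a boolean constant.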

jit launch explicit geom error

Hi @fwinter, after I updated qdp-jit to the latest transpose bug fix (commit: 2b31be631645835febeb5accfe2cec120df40c05), a new issue appeared.

The Chroma test input.xml is as follows:

<?xml version="1.0"?>
<chroma>
  <Param>
    <InlineMeasurements>
      <elem>
        <Name>MAKE_SOURCE</Name>
        <Frequency>1</Frequency>
        <Param>
          <version>6</version>
          <Source>
            <version>3</version>
            <SourceType>SHELL_SOURCE</SourceType>
            <j_decay>3</j_decay>
            <t_srce>0 0 0 0</t_srce>
            <quark_smear_lastP>false</quark_smear_lastP>
            <SmearingParam>
              <wvf_kind>GAUGE_INV_GAUSSIAN</wvf_kind>
              <wvf_param>2.0</wvf_param>
              <wvfIntPar>30</wvfIntPar>
              <no_smear_dir>3</no_smear_dir>
            </SmearingParam>
            <Displacement>
              <version>1</version>
              <DisplacementType>NONE</DisplacementType>
            </Displacement>
          </Source>
        </Param>
        <NamedObject>
          <gauge_id>default_gauge_field</gauge_id>
          <source_id>sh_source</source_id>
        </NamedObject>
      </elem>
      <elem>
        <Name>PROPAGATOR</Name>
        <Frequency>1</Frequency>
        <Param>
          <version>10</version>
          <quarkSpinType>FULL</quarkSpinType>
          <obsvP>false</obsvP>
          <numRetries>1</numRetries>
          <FermionAction>
            <FermAct>CLOVER</FermAct>
            <Mass>-0.04</Mass>
            <clovCoeffR>1.2</clovCoeffR>
            <clovCoeffT>0.6</clovCoeffT>
            <AnisoParam>
              <anisoP>true</anisoP>
              <t_dir>3</t_dir>
              <xi_0>5</xi_0>
              <nu>1</nu>
            </AnisoParam>
            <FermionBC>
              <FermBC>SIMPLE_FERMBC</FermBC>
              <boundary>1 1 1 -1</boundary>
            </FermionBC>
          </FermionAction>
          <InvertParam>
            <invType>CG_INVERTER</invType>
            <RsdCG>1e-08</RsdCG>
            <MaxCG>2000</MaxCG>
          </InvertParam>
        </Param>
        <NamedObject>
          <gauge_id>default_gauge_field</gauge_id>
          <source_id>sh_source</source_id>
          <prop_id>sh_prop</prop_id>
        </NamedObject>
      </elem>
      <elem>
        <Name>ERASE_NAMED_OBJECT</Name>
        <Frequency>1</Frequency>
        <NamedObject>
          <object_id>sh_source</object_id>
        </NamedObject>
      </elem>
      <elem>
        <Name>SINK_SMEAR</Name>
        <Frequency>1</Frequency>
        <Param>
          <version>5</version>
          <Sink>
            <version>2</version>
            <SinkType>POINT_SINK</SinkType>
            <j_decay>3</j_decay>
            <Displacement>
              <version>1</version>
              <DisplacementType>NONE</DisplacementType>
            </Displacement>
          </Sink>
        </Param>
        <NamedObject>
          <gauge_id>default_gauge_field</gauge_id>
          <prop_id>sh_prop</prop_id>
          <smeared_prop_id>pt_sh_prop</smeared_prop_id>
        </NamedObject>
      </elem>
      <elem>
        <Name>ERASE_NAMED_OBJECT</Name>
        <Frequency>1</Frequency>
        <NamedObject>
          <object_id>sh_prop</object_id>
        </NamedObject>
      </elem>
      <elem>
        <Name>HADRON_SPECTRUM</Name>
        <Frequency>1</Frequency>
        <Param>
          <version>1</version>
          <MesonP>true</MesonP>
          <BaryonP>true</BaryonP>
          <CurrentP>false</CurrentP>
          <time_rev>false</time_rev>
          <mom2_max>6</mom2_max>
          <avg_equiv_mom>true</avg_equiv_mom>
        </Param>
        <NamedObject>
          <gauge_id>default_gauge_field</gauge_id>
          <sink_pairs>
            <elem>
              <first_id>pt_sh_prop</first_id>
              <second_id>pt_sh_prop</second_id>
            </elem>
          </sink_pairs>
        </NamedObject>
        <xml_file>hadspec.xml.t0</xml_file>
      </elem>
      <elem>
        <Name>ERASE_NAMED_OBJECT</Name>
        <Frequency>1</Frequency>
        <NamedObject>
          <object_id>pt_sh_prop</object_id>
        </NamedObject>
      </elem>
    </InlineMeasurements>
    <nrow>12 12 12 96</nrow>
  </Param>
  <RNG>
    <Seed>
      <elem>9996</elem>
      <elem>32552</elem>
      <elem>27027</elem>
      <elem>18583</elem>
    </Seed>
  </RNG>
  <Cfg>
    <cfg_type>WEAK_FIELD</cfg_type>
    <parallel_io>true</parallel_io>
  </Cfg>
</chroma>

Running mpirun -np 1 chroma -geom 1 1 1 1 -i input.xml then gives jit launch explicit geom error, grid=(2,1,1), block=(1024,1,1) for this 12^3 x 96 lattice. The problem does not occur at least at commit c54122030be2de6cb41b007c2d3d4c60145e0e2b, where I can run a 12^3 x 96 lattice on a single GPU.
Currently, if I change the lattice size to 12^3 x 48, the code runs successfully but prints qdp-jit/lib/qdp_cache.cc:468: void QDP::QDPCache::signoff(int): Assertion 'vecEntry.size() > id' failed. after the qdp-jit statistics.

The environment I used is CUDA 11.7, LLVM 13.0.0, and chroma (devel: 4b2e1171ac307b7f4273186543afad5b25b7bc00) on a Tesla V100-SXM2-32GB.

32-bit roll-over detected in numbits() in qdp_random.cc

Hi All,
I got an email from Eric Gregory at JSC with this comment:

In the course of this I am running some pretty large lattices and noticing a bug or at least a limitation of QDP.
At the moment I am running 128^3x512

This causes
int nbits = numbits(Layout::vol());
to hang in initRNG() in qdp_random.cc as the lattice volume is 2^30, which I guess is the limit of integer size. So the while loop

int numbits(int x)
{
  int num = 1;
  int iceiling = 2;
  while (iceiling <= x)
  {
    num++;
    iceiling *= 2;
  }
  return num;
}

never exits as iceiling rolls over and becomes 0 (in this case).
I am guessing, insofar as qdp_random.cc is identical between QDP-JIT and qdp++ this will be an issue there too...
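
One way to avoid the roll-over, sketched here as a suggestion rather than as the fix actually adopted in qdp-jit or qdp++, is to halve x instead of doubling a ceiling, so no intermediate value can exceed the input. The name numbits64 and the 64-bit parameter type are assumptions made for this sketch; the caller would also need Layout::vol() (or an equivalent) to deliver the volume without truncation.

```cpp
#include <cstdint>

// Same result as the original numbits() for every x it handled correctly:
// 1 for x <= 1, otherwise floor(log2(x)) + 1. Instead of doubling a
// ceiling (which overflows once it passes 2^30 in a 32-bit int), x is
// halved until it drops below 2, so nothing can roll over.
// numbits64 is a hypothetical name, not part of QDP.
constexpr int numbits64(std::int64_t x)
{
  int num = 1;
  while (x >= 2)
  {
    x /= 2;
    ++num;
  }
  return num;
}

// The case from the report: a lattice volume of 2^30 now terminates.
static_assert(numbits64(std::int64_t{1} << 30) == 31, "2^30 needs 31 bits");
```

With this form the loop runs at most 63 times for any 64-bit input, so volumes beyond 2^31 would also be handled if the rest of the RNG setup supports them.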

Build fail at XML

I built all the required packages, including xpath_reader itself, and then ran cmake on qdp-jit. Is there something special I need to do to get it to build the xpath_reader in other_libs, or to use the xpath_reader I have already built?
Thanks

Extract from the cmake output, showing that it found the requirements:
Using LLVMConfig.cmake in /QCDSolvers/libs/LLVM13/lib/cmake/llvm
-- Found MPI_C: /usr/lib64/mpi/gcc/openmpi4/lib64/libmpi.so (found version "3.1")
-- Found MPI_CXX: /usr/lib64/mpi/gcc/openmpi4/lib64/libmpi.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- Found LibXml2: /QCDSolvers/libs/xml2/lib64/libxm2.so

The make step is where it fails, complaining about XML:

[ 0%] Building CXX object other_libs/xpath_reader/lib/CMakeFiles/xmlWriter.dir/xml_simplewriter.cc.o
[ 0%] Building CXX object other_libs/xpath_reader/lib/CMakeFiles/xmlWriter.dir/attribute.cc.o
[ 1%] Building CXX object other_libs/xpath_reader/lib/CMakeFiles/xmlWriter.dir/xml_struct_writer.cc.o
gmake[2]: *** No rule to make target '/QCDSolvers/libs/xml2/lib64/libxm2.so', needed by 'other_libs/xpath_reader/lib/libxmlWriter.so'. Stop.

Explicitly link necessary LLVM libraries to target `jit`.

QDP-JIT links all available LLVM libraries to target jit by using LLVM_AVAILABLE_LIBS.

target_link_libraries( jit PUBLIC ${LLVM_AVAILABLE_LIBS} )

In some Linux distros such as Debian, the LLVM shipped via apt install llvm-dev puts both static and shared libraries into LLVM_AVAILABLE_LIBS, which requires additional packages to be installed even though they are not actually needed. The log below shows the result for the llvm-dev shipped with Debian 12, which is actually llvm-14.

message( STATUS "${LLVM_AVAILABLE_LIBS}" )
LLVMDemangle;LLVMSupport;LLVMTableGen;LLVMTableGenGlobalISel;LLVMCore;LLVMFuzzMutate;LLVMFileCheck;LLVMInterfaceStub;LLVMIRReader;LLVMCodeGen;LLVMSelectionDAG;LLVMAsmPrinter;LLVMMIRParser;LLVMGlobalISel;LLVMBinaryFormat;LLVMBitReader;LLVMBitWriter;LLVMBitstreamReader;LLVMDWARFLinker;LLVMExtensions;LLVMFrontendOpenACC;LLVMFrontendOpenMP;LLVMTransformUtils;LLVMInstrumentation;LLVMAggressiveInstCombine;LLVMInstCombine;LLVMScalarOpts;LLVMipo;LLVMVectorize;LLVMObjCARCOpts;LLVMCoroutines;LLVMCFGuard;LLVMLinker;LLVMAnalysis;LLVMLTO;LLVMMC;LLVMMCParser;LLVMMCDisassembler;LLVMMCA;LLVMObject;LLVMObjectYAML;LLVMOption;LLVMRemarks;LLVMDebuginfod;LLVMDebugInfoDWARF;LLVMDebugInfoGSYM;LLVMDebugInfoMSF;LLVMDebugInfoCodeView;LLVMDebugInfoPDB;LLVMSymbolize;LLVMDWP;LLVMExecutionEngine;LLVMInterpreter;LLVMJITLink;LLVMMCJIT;LLVMOrcJIT;LLVMOrcShared;LLVMOrcTargetProcess;LLVMRuntimeDyld;LLVMPerfJITEvents;LLVMTarget;LLVMAArch64CodeGen;LLVMAArch64AsmParser;LLVMAArch64Disassembler;LLVMAArch64Desc;LLVMAArch64Info;LLVMAArch64Utils;LLVMAMDGPUCodeGen;LLVMAMDGPUAsmParser;LLVMAMDGPUDisassembler;LLVMAMDGPUTargetMCA;LLVMAMDGPUDesc;LLVMAMDGPUInfo;LLVMAMDGPUUtils;LLVMARMCodeGen;LLVMARMAsmParser;LLVMARMDisassembler;LLVMARMDesc;LLVMARMInfo;LLVMARMUtils;LLVMAVRCodeGen;LLVMAVRAsmParser;LLVMAVRDisassembler;LLVMAVRDesc;LLVMAVRInfo;LLVMBPFCodeGen;LLVMBPFAsmParser;LLVMBPFDisassembler;LLVMBPFDesc;LLVMBPFInfo;LLVMHexagonCodeGen;LLVMHexagonAsmParser;LLVMHexagonDisassembler;LLVMHexagonDesc;LLVMHexagonInfo;LLVMLanaiCodeGen;LLVMLanaiAsmParser;LLVMLanaiDisassembler;LLVMLanaiDesc;LLVMLanaiInfo;LLVMMipsCodeGen;LLVMMipsAsmParser;LLVMMipsDisassembler;LLVMMipsDesc;LLVMMipsInfo;LLVMMSP430CodeGen;LLVMMSP430Desc;LLVMMSP430Info;LLVMMSP430AsmParser;LLVMMSP430Disassembler;LLVMNVPTXCodeGen;LLVMNVPTXDesc;LLVMNVPTXInfo;LLVMPowerPCCodeGen;LLVMPowerPCAsmParser;LLVMPowerPCDisassembler;LLVMPowerPCDesc;LLVMPowerPCInfo;LLVMRISCVCodeGen;LLVMRISCVAsmParser;LLVMRISCVDisassembler;LLVMRISCVDesc;LLVMRISCVInfo;LLVMSparcCodeGen;LLVMSparcAsmParser;LLVMSparcDisassembler;LLVMSparcDesc;LLVMSparcInfo;LLVMSystemZCodeGen;LLVMSystemZAsmParser;LLVMSystemZDisassembler;LLVMSystemZDesc;LLVMSystemZInfo;LLVMVECodeGen;LLVMVEAsmParser;LLVMVEDisassembler;LLVMVEInfo;LLVMVEDesc;LLVMWebAssemblyCodeGen;LLVMWebAssemblyAsmParser;LLVMWebAssemblyDisassembler;LLVMWebAssemblyDesc;LLVMWebAssemblyInfo;LLVMWebAssemblyUtils;LLVMX86CodeGen;LLVMX86AsmParser;LLVMX86Disassembler;LLVMX86TargetMCA;LLVMX86Desc;LLVMX86Info;LLVMXCoreCodeGen;LLVMXCoreDisassembler;LLVMXCoreDesc;LLVMXCoreInfo;LLVMM68kCodeGen;LLVMM68kInfo;LLVMM68kDesc;LLVMM68kAsmParser;LLVMM68kDisassembler;LLVMAsmParser;LLVMLineEditor;LLVMProfileData;LLVMCoverage;LLVMPasses;LLVMTextAPI;LLVMDlltoolDriver;LLVMLibDriver;LLVMXRay;LLVMWindowsManifest;LTO;MLIRSupportIndentedOstream;LLVMCFIVerify;LLVMDiff;LLVMExegesisX86;LLVMExegesisAArch64;LLVMExegesisPowerPC;LLVMExegesisMips;LLVMExegesis;LLVM;Remarks;Polly

Here LLVMDebuginfod requires CURL::libcurl, which means I have to install libcurl-openssl-dev and then add find_package(CURL REQUIRED) before find_package(LLVM "14.0" REQUIRED CONFIG).

Most of the libraries above are static, but some, such as LLVM and Polly, are actually shared libraries. Linking all of them to jit causes

$ ./t_basic 
: CommandLine Error: Option 'amdgpu-dump-hsa-metadata' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
[1]    2157571 IOT instruction  ./t_basic

$ ldd ./t_basic 
        linux-vdso.so.1 (0x00007ffc867fa000)
        libLLVM-14.so.1 => /usr/lib/llvm-14/lib/libLLVM-14.so.1 (0x00007f6414a00000)
        libcuda.so.1 => /usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.1 (0x00007f6412c00000)
        libmpi.so.40 => /lib/x86_64-linux-gnu/libmpi.so.40 (0x00007f641f97f000)
        libxml2.so.2 => /lib/x86_64-linux-gnu/libxml2.so.2 (0x00007f6412a54000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6412800000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6414921000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f641f95d000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f641261f000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f641facf000)
        libffi.so.8 => /lib/x86_64-linux-gnu/libffi.so.8 (0x00007f641f951000)
        libedit.so.2 => /lib/x86_64-linux-gnu/libedit.so.2 (0x00007f641f917000)
        libz3.so.4 => /lib/x86_64-linux-gnu/libz3.so.4 (0x00007f6410e00000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f641b3e1000)
        libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007f641b3ae000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f641f910000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f641b3a9000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f641b3a4000)
        libopen-rte.so.40 => /lib/x86_64-linux-gnu/libopen-rte.so.40 (0x00007f6412563000)
        libopen-pal.so.40 => /lib/x86_64-linux-gnu/libopen-pal.so.40 (0x00007f64124ac000)
        libhwloc.so.15 => /lib/x86_64-linux-gnu/libhwloc.so.15 (0x00007f641244f000)
        libicuuc.so.72 => /lib/x86_64-linux-gnu/libicuuc.so.72 (0x00007f6410c02000)
        liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f641b373000)
        libbsd.so.0 => /lib/x86_64-linux-gnu/libbsd.so.0 (0x00007f641b35b000)
        libevent_core-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_core-2.1.so.7 (0x00007f64148eb000)
        libevent_pthreads-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7 (0x00007f64148e6000)
        libudev.so.1 => /lib/x86_64-linux-gnu/libudev.so.1 (0x00007f6412a26000)
        libicudata.so.72 => /lib/x86_64-linux-gnu/libicudata.so.72 (0x00007f640ee00000)
        libmd.so.0 => /lib/x86_64-linux-gnu/libmd.so.0 (0x00007f6412442000)

ldd also shows that the executable depends on libLLVM-14.so.1, which is unnecessary.

I think we can filter out useless libraries from the list, and ensure that the final executables are fully statically or fully dynamically linked to LLVM.

FYI, I use

list(FILTER LLVM_AVAILABLE_LIBS INCLUDE REGEX "LLVM(MCJIT|.+(CodeGen|AsmParser))" )
target_link_libraries( jit PUBLIC ${LLVM_AVAILABLE_LIBS} ) 

to exclude unnecessary libraries, and the executable is statically linked against LLVM now.
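
To see which entries this pattern keeps, the same regular expression can be tried on a few names from the list above. The snippet below is only an illustration in C++ using std::regex with its default ECMAScript grammar, which I assume behaves like CMake's regex engine for this particular pattern; keptByFilter is a hypothetical helper, not part of qdp-jit or CMake. It uses std::regex_search because list(FILTER ... REGEX) matches anywhere in the string.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true when a library name would survive
//   list(FILTER LLVM_AVAILABLE_LIBS INCLUDE REGEX "LLVM(MCJIT|.+(CodeGen|AsmParser))")
// under the assumption stated above (ECMAScript semantics, substring match).
bool keptByFilter(const std::string& lib)
{
  static const std::regex pattern("LLVM(MCJIT|.+(CodeGen|AsmParser))");
  return std::regex_search(lib, pattern);
}

// A few spot checks against names taken from the list in this issue.
void checkFilterExamples()
{
  assert(keptByFilter("LLVMMCJIT"));          // kept: JIT engine
  assert(keptByFilter("LLVMNVPTXCodeGen"));   // kept: per-target code generator
  assert(keptByFilter("LLVMX86AsmParser"));   // kept: per-target asm parser
  assert(!keptByFilter("LLVMDebuginfod"));    // dropped: the libcurl dependency
  assert(!keptByFilter("LLVM"));              // dropped: shared libLLVM
  assert(!keptByFilter("Polly"));             // dropped: shared Polly
  assert(!keptByFilter("LLVMCodeGen"));       // dropped: ".+" needs a target prefix
}
```

Note that the generic LLVMCodeGen and LLVMAsmParser entries do not match, since .+ requires at least one character between LLVM and the suffix. With the static LLVM packages these presumably still get linked as dependencies of the per-target libraries.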

bug in transpose function

Hi @fwinter, we found what may be a bug in the transpose function of the current qdp-jit: it does not transpose the spin component. A simple test program from @SaltyChiang was used to verify this.

#include <cstdio>
#include <qdp.h>
#include <qdp_layout.h>
#include <qdp_multi.h>
#include <qdp_parscalar_specific.h>
#include <qdp_primcolormat.h>
#include <qdp_scalarsite_defs.h>

using namespace QDP;

template <class T, int N> void printMatrix(PColorMatrix<T, N> matrix) {
  printf("\n");
  for (int i = 0; i < N; ++i) {
    for (int j = 0; j < N; ++j) {
      printf("%12.8f+%12.8fi   ", matrix.elem(i, j).real().elem(),
             matrix.elem(i, j).imag().elem());
    }
    printf("\n");
  }
}

int main(int argc, char *argv[]) {
  QDP_initialize(&argc, &argv);

  const int latt_size[Nd] = {4, 4, 4, 8};
  multi1d<int> nrows(Nd);
  nrows = latt_size;
  Layout::setLattSize(nrows);
  Layout::create();

  LatticePropagator prop;
  gaussian(prop);
  LatticePropagator prop_T = transpose(prop); //! ERROR here
  printMatrix(prop.elem(0).elem(0, 1)); //* reference
  printMatrix(prop_T.elem(0).elem(1, 0)); //* should be the transpose of the previous one
  printMatrix(prop_T.elem(0).elem(0, 1)); //* should be very different

  QDP_finalize();
  return 0;
}

The commit we used is the recent 2a1c29ffa4360c38b088c2baa08eec2c0692d472 on the devel branch, built with gcc 11.2.1, cuda 11.7, and llvm 13.0.0. The output of the above code is

 -0.64040606+  0.24160731i     0.72971220+  0.89945116i     1.07925420+  0.83748773i   
  0.28545658+ -0.29035900i    -0.84708820+ -0.91764636i     0.57950565+ -1.20922577i   
 -0.62136288+  0.00338123i    -0.31409364+ -0.44601773i     0.15046804+  1.53841175i   

  1.40671455+  1.18308668i     0.14042949+ -0.79521487i     1.53403557+ -1.10512429i   
 -0.57295400+ -0.88400266i    -0.52849531+  0.53221092i     1.39610009+  0.41041381i   
 -0.15161463+ -1.79886656i    -0.72825843+  2.34345012i    -0.15578486+ -1.51035852i   

 -0.64040606+  0.24160731i     0.28545658+ -0.29035900i    -0.62136288+  0.00338123i   
  0.72971220+  0.89945116i    -0.84708820+ -0.91764636i    -0.31409364+ -0.44601773i   
  1.07925420+  0.83748773i     0.57950565+ -1.20922577i     0.15046804+  1.53841175i  

Apparently the transpose function did not transpose the spin component.

In an old build (around May 7, 2021) with gcc 7.3.1, cuda 11.0, and llvm 6.0.0, the transpose function works correctly on the spin component, and the output of the above code is

-0.64040606+  0.24160731i     0.72971220+  0.89945116i     1.07925420+  0.83748773i   
  0.28545658+ -0.29035900i    -0.84708820+ -0.91764636i     0.57950565+ -1.20922577i   
 -0.62136288+  0.00338123i    -0.31409364+ -0.44601773i     0.15046804+  1.53841175i   

 -0.64040606+  0.24160731i     0.28545658+ -0.29035900i    -0.62136288+  0.00338123i   
  0.72971220+  0.89945116i    -0.84708820+ -0.91764636i    -0.31409364+ -0.44601773i   
  1.07925420+  0.83748773i     0.57950565+ -1.20922577i     0.15046804+  1.53841175i   

  1.40671455+  1.18308668i     0.14042949+ -0.79521487i     1.53403557+ -1.10512429i   
 -0.57295400+ -0.88400266i    -0.52849531+  0.53221092i     1.39610009+  0.41041381i   
 -0.15161463+ -1.79886656i    -0.72825843+  2.34345012i    -0.15578486+ -1.51035852i
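
For reference, the index relation expected here can be written down without QDP. The sketch below is a standalone illustration on a single site, not qdp-jit code: SiteProp, makeLabelled, transposeSpinColor and transposeColorOnly are hypothetical names, and Ns = 4, Nc = 3 are assumed as in a standard build. The first function swaps both spin and color indices, as in the old output; the second swaps only color, which is consistent with what the newer build printed.

```cpp
#include <array>

constexpr int Ns = 4;  // spin components (assumed)
constexpr int Nc = 3;  // colors (assumed)

// One site of a propagator, indexed as site[s1][s2][c1][c2].
template <class T>
using SiteProp =
    std::array<std::array<std::array<std::array<T, Nc>, Nc>, Ns>, Ns>;

// Fill every element with a unique label so index swaps are easy to check.
constexpr SiteProp<int> makeLabelled()
{
  SiteProp<int> p{};
  for (int s1 = 0; s1 < Ns; ++s1)
    for (int s2 = 0; s2 < Ns; ++s2)
      for (int c1 = 0; c1 < Nc; ++c1)
        for (int c2 = 0; c2 < Nc; ++c2)
          p[s1][s2][c1][c2] = ((s1 * Ns + s2) * Nc + c1) * Nc + c2;
  return p;
}

// Expected behaviour: out[s2][s1][c2][c1] = in[s1][s2][c1][c2].
template <class T>
constexpr SiteProp<T> transposeSpinColor(const SiteProp<T>& in)
{
  SiteProp<T> out{};
  for (int s1 = 0; s1 < Ns; ++s1)
    for (int s2 = 0; s2 < Ns; ++s2)
      for (int c1 = 0; c1 < Nc; ++c1)
        for (int c2 = 0; c2 < Nc; ++c2)
          out[s2][s1][c2][c1] = in[s1][s2][c1][c2];
  return out;
}

// Behaviour matching the reported output: spin indices left untouched.
template <class T>
constexpr SiteProp<T> transposeColorOnly(const SiteProp<T>& in)
{
  SiteProp<T> out{};
  for (int s1 = 0; s1 < Ns; ++s1)
    for (int s2 = 0; s2 < Ns; ++s2)
      for (int c1 = 0; c1 < Nc; ++c1)
        for (int c2 = 0; c2 < Nc; ++c2)
          out[s1][s2][c2][c1] = in[s1][s2][c1][c2];
  return out;
}
```

In terms of the test program, printMatrix(prop_T.elem(0).elem(1, 0)) corresponds to reading out[1][0] from transposeSpinColor, which should be the color transpose of in[0][1].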
