jeffersonlab / qdp-jit

License: Other

Makefile 0.53% C++ 96.96% Perl 0.02% M4 0.64% Shell 1.78% C 0.07%

qdp-jit's Introduction

See the QDP-JIT wiki:

https://github.com/JeffersonLab/qdp-jit/wiki

qdp-jit's People

bjoo eromero-vlc cpviolator henrymonge utku-k-can clqcd

qdp-jit's Issues

`LLD_HAS_DRIVER` blocks compilation

The line here blocks compilation with ROCm based on LLVM 15. It seems that LLD_HAS_DRIVER is only available after LLVM 17.

qdp-jit/lib/qdp_llvm.cc

Lines 183 to 185 in e436239


	LLD_HAS_DRIVER(elf)

fconcepts build issue GCC 9

My compiler is GCC 9 and I have disabled the C++20 features flag. However, the build still tries to compile the concepts. Is there somewhere I can pass -fconcepts?
Thanks

qdp-jit/lib/../include/qdp_sum.h:198:24: error: ‘concept’ does not name a type; did you mean ‘concat’?
198 | template concept ConceptHasShift = HasShift::value;
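One way a header could stay buildable on compilers without C++20 concepts is to guard the concept behind the standard `__cpp_concepts` feature-test macro and fall back to a `constexpr bool` variable template. The sketch below is only an illustration under assumptions: the `HasShift` trait shown here is a hypothetical stand-in (the real one in qdp_sum.h is not reproduced in this issue), `WithShift` and `WithoutShift` are made-up test types, and C++17 is assumed for `std::void_t`.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the HasShift trait: true when T has a member
// function callable as shift(int). The real trait in qdp-jit may differ.
template <typename T, typename = void>
struct HasShift : std::false_type {};

template <typename T>
struct HasShift<T, std::void_t<decltype(std::declval<T&>().shift(0))>>
    : std::true_type {};

#if defined(__cpp_concepts) && __cpp_concepts >= 201907L
// C++20 concepts are available: declare a real concept.
template <typename T>
concept ConceptHasShift = HasShift<T>::value;
#else
// Older compilers: a plain boolean variable template with the same name.
template <typename T>
constexpr bool ConceptHasShift = HasShift<T>::value;
#endif

// Made-up types used only to exercise the trait.
struct WithShift { void shift(int) {} };
struct WithoutShift {};

static_assert(ConceptHasShift<WithShift>, "WithShift should be detected");
static_assert(!ConceptHasShift<WithoutShift>, "WithoutShift should not be detected");
```

With the fallback branch, `ConceptHasShift<T>` can still be used in `if constexpr` or `std::enable_if_t`, but not in a `requires` clause, so any code constraining templates with it would need the same guard.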

jit launch explicit geom error

Hi @fwinter, when I updated qdp-jit to the latest transpose bug fix (commit: 2b31be631645835febeb5accfe2cec120df40c05), a new issue appeared.

The chroma test input.xml is as follows:

<?xml version="1.0"?>
<chroma>
  <Param>
    <InlineMeasurements>
      <elem>
        <Name>MAKE_SOURCE</Name>
        <Frequency>1</Frequency>
        <Param>
          <version>6</version>
          <Source>
            <version>3</version>
            <SourceType>SHELL_SOURCE</SourceType>
            <j_decay>3</j_decay>
            <t_srce>0 0 0 0</t_srce>
            <quark_smear_lastP>false</quark_smear_lastP>
            <SmearingParam>
              <wvf_kind>GAUGE_INV_GAUSSIAN</wvf_kind>
              <wvf_param>2.0</wvf_param>
              <wvfIntPar>30</wvfIntPar>
              <no_smear_dir>3</no_smear_dir>
            </SmearingParam>
            <Displacement>
              <version>1</version>
              <DisplacementType>NONE</DisplacementType>
            </Displacement>
          </Source>
        </Param>
        <NamedObject>
          <gauge_id>default_gauge_field</gauge_id>
          <source_id>sh_source</source_id>
        </NamedObject>
      </elem>
      <elem>
        <Name>PROPAGATOR</Name>
        <Frequency>1</Frequency>
        <Param>
          <version>10</version>
          <quarkSpinType>FULL</quarkSpinType>
          <obsvP>false</obsvP>
          <numRetries>1</numRetries>
          <FermionAction>
            <FermAct>CLOVER</FermAct>
            <Mass>-0.04</Mass>
            <clovCoeffR>1.2</clovCoeffR>
            <clovCoeffT>0.6</clovCoeffT>
            <AnisoParam>
              <anisoP>true</anisoP>
              <t_dir>3</t_dir>
              <xi_0>5</xi_0>
              <nu>1</nu>
            </AnisoParam>
            <FermionBC>
              <FermBC>SIMPLE_FERMBC</FermBC>
              <boundary>1 1 1 -1</boundary>
            </FermionBC>
          </FermionAction>
          <InvertParam>
            <invType>CG_INVERTER</invType>
            <RsdCG>1e-08</RsdCG>
            <MaxCG>2000</MaxCG>
          </InvertParam>
        </Param>
        <NamedObject>
          <gauge_id>default_gauge_field</gauge_id>
          <source_id>sh_source</source_id>
          <prop_id>sh_prop</prop_id>
        </NamedObject>
      </elem>
      <elem>
        <Name>ERASE_NAMED_OBJECT</Name>
        <Frequency>1</Frequency>
        <NamedObject>
          <object_id>sh_source</object_id>
        </NamedObject>
      </elem>
      <elem>
        <Name>SINK_SMEAR</Name>
        <Frequency>1</Frequency>
        <Param>
          <version>5</version>
          <Sink>
            <version>2</version>
            <SinkType>POINT_SINK</SinkType>
            <j_decay>3</j_decay>
            <Displacement>
              <version>1</version>
              <DisplacementType>NONE</DisplacementType>
            </Displacement>
          </Sink>
        </Param>
        <NamedObject>
          <gauge_id>default_gauge_field</gauge_id>
          <prop_id>sh_prop</prop_id>
          <smeared_prop_id>pt_sh_prop</smeared_prop_id>
        </NamedObject>
      </elem>
      <elem>
        <Name>ERASE_NAMED_OBJECT</Name>
        <Frequency>1</Frequency>
        <NamedObject>
          <object_id>sh_prop</object_id>
        </NamedObject>
      </elem>
      <elem>
        <Name>HADRON_SPECTRUM</Name>
        <Frequency>1</Frequency>
        <Param>
          <version>1</version>
          <MesonP>true</MesonP>
          <BaryonP>true</BaryonP>
          <CurrentP>false</CurrentP>
          <time_rev>false</time_rev>
          <mom2_max>6</mom2_max>
          <avg_equiv_mom>true</avg_equiv_mom>
        </Param>
        <NamedObject>
          <gauge_id>default_gauge_field</gauge_id>
          <sink_pairs>
            <elem>
              <first_id>pt_sh_prop</first_id>
              <second_id>pt_sh_prop</second_id>
            </elem>
          </sink_pairs>
        </NamedObject>
        <xml_file>hadspec.xml.t0</xml_file>
      </elem>
      <elem>
        <Name>ERASE_NAMED_OBJECT</Name>
        <Frequency>1</Frequency>
        <NamedObject>
          <object_id>pt_sh_prop</object_id>
        </NamedObject>
      </elem>
    </InlineMeasurements>
    <nrow>12 12 12 96</nrow>
  </Param>
  <RNG>
    <Seed>
      <elem>9996</elem>
      <elem>32552</elem>
      <elem>27027</elem>
      <elem>18583</elem>
    </Seed>
  </RNG>
  <Cfg>
    <cfg_type>WEAK_FIELD</cfg_type>
    <parallel_io>true</parallel_io>
  </Cfg>
</chroma>

Running mpirun -np 1 chroma -geom 1 1 1 1 -i input.xml then fails with jit launch explicit geom error, grid=(2,1,1), block=(1024,1,1) for this 12^3 x 96 lattice. The problem does not occur at commit c54122030be2de6cb41b007c2d3d4c60145e0e2b, where I can run a 12^3 x 96 lattice on a single GPU.
If I change the lattice size to 12^3 x 48, the run completes, but the following assertion failure is printed after the qdp-jit statistics: qdp-jit/lib/qdp_cache.cc:468: void QDP::QDPCache::signoff(int): Assertion 'vecEntry.size() > id' failed.

The environment I used is cuda 11.7, llvm 13.0.0, chroma (devel: 4b2e1171ac307b7f4273186543afad5b25b7bc00) on a Tesla V100-SXM2-32GB.

32-bit roll-over detected in numbits() in qdp_random.cc

Hi All,
I got an email from Eric Gregory at JSC with this comment:

In the course of this I am running some pretty large lattices and noticing a bug or at least a limitation of QDP.
At the moment I am running 128^3x512

This causes
int nbits = numbits(Layout::vol());
to hang in initRNG() in qdp_random.cc as the lattice volume is 2^30, which I guess is the limit of integer size. So the while loop

int numbits(int x)
{
  int num = 1;
  int iceiling = 2;
  while (iceiling <= x)
  {
    num++;
    iceiling *= 2;
  }
  return num;
}

never exits as iceiling rolls over and becomes 0 (in this case).
I am guessing, insofar as qdp_random.cc is identical between QDP-JIT and qdp++ this will be an issue there too...
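A minimal sketch of one possible fix, not necessarily the change adopted upstream: keep the same loop but hold the running ceiling in a 64-bit integer, so doubling it cannot overflow for any volume that fits in a 32-bit int. The `std::int64_t` parameter type is an assumption made here for illustration; an `int` volume converts to it implicitly.

```cpp
#include <cstdint>

// Same counting loop as the original numbits(), but the running ceiling is
// 64 bits wide, so "iceiling *= 2" cannot wrap for any 32-bit input.
// Precondition (assumed here): 0 <= x < 2^62, so the ceiling itself stays
// within the range of std::int64_t.
constexpr int numbits(std::int64_t x)
{
  int num = 1;
  std::int64_t iceiling = 2;
  while (iceiling <= x)
  {
    ++num;
    iceiling *= 2;
  }
  return num;
}

// The failing case from the report: a volume of 2^30 now terminates.
static_assert(numbits(std::int64_t{1} << 30) == 31, "2^30 needs 31 bits");
```

For a volume of 2^30 the loop stops once the ceiling reaches 2^31, which a 64-bit integer holds without wrapping, and the function returns 31.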

Build fail at XML

I built all the required packages, including xpath_reader itself, and then ran cmake on qdp-jit. Is there something special I need to do to get it to build the xpath_reader in other_libs, or to use the xpath_reader I have already built?
Thanks

Extract from cmake, showing it found requirements.
Using LLVMConfig.cmake in /QCDSolvers/libs/LLVM13/lib/cmake/llvm
-- Found MPI_C: /usr/lib64/mpi/gcc/openmpi4/lib64/libmpi.so (found version "3.1")
-- Found MPI_CXX: /usr/lib64/mpi/gcc/openmpi4/lib64/libmpi.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- Found LibXml2: /QCDSolvers/libs/xml2/lib64/libxm2.so

Make is where it fails complaining about xml.

[ 0%] Building CXX object other_libs/xpath_reader/lib/CMakeFiles/xmlWriter.dir/xml_simplewriter.cc.o
[ 0%] Building CXX object other_libs/xpath_reader/lib/CMakeFiles/xmlWriter.dir/attribute.cc.o
[ 1%] Building CXX object other_libs/xpath_reader/lib/CMakeFiles/xmlWriter.dir/xml_struct_writer.cc.o
gmake[2]: *** No rule to make target '/QCDSolvers/libs/xml2/lib64/libxm2.so', needed by 'other_libs/xpath_reader/lib/libxmlWriter.so'. Stop.

QDP_ENABLE_CUDA_MANAGED_MEMORY doesn't set QDP_USE_CUDA_MANAGED_MEMORY

The CMake option QDP_ENABLE_CUDA_MANAGED_MEMORY has no effect: it does not set QDP_USE_CUDA_MANAGED_MEMORY, which is the variable used later in configure_file.

Explicitly link necessary LLVM libraries to target `jit`.

QDP-JIT links all available LLVM libraries to target jit by using LLVM_AVAILABLE_LIBS.

qdp-jit/lib/CMakeLists.txt

Line 100 in 7ffb650

target_link_libraries( jit PUBLIC ${LLVM_AVAILABLE_LIBS} )

On some Linux distros such as Debian, the LLVM shipped with apt install llvm-dev populates LLVM_AVAILABLE_LIBS with both static and shared libraries, and linking all of them requires installing additional packages that are not actually needed. The log below shows the result for the llvm-dev shipped with Debian 12, which is actually llvm-14.

message( STATUS "${LLVM_AVAILABLE_LIBS}" )
LLVMDemangle;LLVMSupport;LLVMTableGen;LLVMTableGenGlobalISel;LLVMCore;LLVMFuzzMutate;LLVMFileCheck;LLVMInterfaceStub;LLVMIRReader;LLVMCodeGen;LLVMSelectionDAG;LLVMAsmPrinter;LLVMMIRParser;LLVMGlobalISel;LLVMBinaryFormat;LLVMBitReader;LLVMBitWriter;LLVMBitstreamReader;LLVMDWARFLinker;LLVMExtensions;LLVMFrontendOpenACC;LLVMFrontendOpenMP;LLVMTransformUtils;LLVMInstrumentation;LLVMAggressiveInstCombine;LLVMInstCombine;LLVMScalarOpts;LLVMipo;LLVMVectorize;LLVMObjCARCOpts;LLVMCoroutines;LLVMCFGuard;LLVMLinker;LLVMAnalysis;LLVMLTO;LLVMMC;LLVMMCParser;LLVMMCDisassembler;LLVMMCA;LLVMObject;LLVMObjectYAML;LLVMOption;LLVMRemarks;LLVMDebuginfod;LLVMDebugInfoDWARF;LLVMDebugInfoGSYM;LLVMDebugInfoMSF;LLVMDebugInfoCodeView;LLVMDebugInfoPDB;LLVMSymbolize;LLVMDWP;LLVMExecutionEngine;LLVMInterpreter;LLVMJITLink;LLVMMCJIT;LLVMOrcJIT;LLVMOrcShared;LLVMOrcTargetProcess;LLVMRuntimeDyld;LLVMPerfJITEvents;LLVMTarget;LLVMAArch64CodeGen;LLVMAArch64AsmParser;LLVMAArch64Disassembler;LLVMAArch64Desc;LLVMAArch64Info;LLVMAArch64Utils;LLVMAMDGPUCodeGen;LLVMAMDGPUAsmParser;LLVMAMDGPUDisassembler;LLVMAMDGPUTargetMCA;LLVMAMDGPUDesc;LLVMAMDGPUInfo;LLVMAMDGPUUtils;LLVMARMCodeGen;LLVMARMAsmParser;LLVMARMDisassembler;LLVMARMDesc;LLVMARMInfo;LLVMARMUtils;LLVMAVRCodeGen;LLVMAVRAsmParser;LLVMAVRDisassembler;LLVMAVRDesc;LLVMAVRInfo;LLVMBPFCodeGen;LLVMBPFAsmParser;LLVMBPFDisassembler;LLVMBPFDesc;LLVMBPFInfo;LLVMHexagonCodeGen;LLVMHexagonAsmParser;LLVMHexagonDisassembler;LLVMHexagonDesc;LLVMHexagonInfo;LLVMLanaiCodeGen;LLVMLanaiAsmParser;LLVMLanaiDisassembler;LLVMLanaiDesc;LLVMLanaiInfo;LLVMMipsCodeGen;LLVMMipsAsmParser;LLVMMipsDisassembler;LLVMMipsDesc;LLVMMipsInfo;LLVMMSP430CodeGen;LLVMMSP430Desc;LLVMMSP430Info;LLVMMSP430AsmParser;LLVMMSP430Disassembler;LLVMNVPTXCodeGen;LLVMNVPTXDesc;LLVMNVPTXInfo;LLVMPowerPCCodeGen;LLVMPowerPCAsmParser;LLVMPowerPCDisassembler;LLVMPowerPCDesc;LLVMPowerPCInfo;LLVMRISCVCodeGen;LLVMRISCVAsmParser;LLVMRISCVDisassembler;LLVMRISCVDesc;LLVMRISCVInfo;LLVMSparcCodeGen;LLVMSparcAsmParser;LLVMSparcDisassembler;LLVMSparcDesc;LLVMSparcInfo;LLVMSystemZCodeGen;LLVMSystemZAsmParser;LLVMSystemZDisassembler;LLVMSystemZDesc;LLVMSystemZInfo;LLVMVECodeGen;LLVMVEAsmParser;LLVMVEDisassembler;LLVMVEInfo;LLVMVEDesc;LLVMWebAssemblyCodeGen;LLVMWebAssemblyAsmParser;LLVMWebAssemblyDisassembler;LLVMWebAssemblyDesc;LLVMWebAssemblyInfo;LLVMWebAssemblyUtils;LLVMX86CodeGen;LLVMX86AsmParser;LLVMX86Disassembler;LLVMX86TargetMCA;LLVMX86Desc;LLVMX86Info;LLVMXCoreCodeGen;LLVMXCoreDisassembler;LLVMXCoreDesc;LLVMXCoreInfo;LLVMM68kCodeGen;LLVMM68kInfo;LLVMM68kDesc;LLVMM68kAsmParser;LLVMM68kDisassembler;LLVMAsmParser;LLVMLineEditor;LLVMProfileData;LLVMCoverage;LLVMPasses;LLVMTextAPI;LLVMDlltoolDriver;LLVMLibDriver;LLVMXRay;LLVMWindowsManifest;LTO;MLIRSupportIndentedOstream;LLVMCFIVerify;LLVMDiff;LLVMExegesisX86;LLVMExegesisAArch64;LLVMExegesisPowerPC;LLVMExegesisMips;LLVMExegesis;LLVM;Remarks;Polly

Here LLVMDebuginfod requires CURL::libcurl, which means I have to install libcurl-openssl-dev and then add find_package(CURL REQUIRED) before find_package(LLVM "14.0" REQUIRED CONFIG).

Most of the libraries above are static, but some, such as LLVM and Polly, are actually shared libraries. Linking all of them to jit causes

$ ./t_basic 
: CommandLine Error: Option 'amdgpu-dump-hsa-metadata' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
[1]    2157571 IOT instruction  ./t_basic

$ ldd ./t_basic 
        linux-vdso.so.1 (0x00007ffc867fa000)
        libLLVM-14.so.1 => /usr/lib/llvm-14/lib/libLLVM-14.so.1 (0x00007f6414a00000)
        libcuda.so.1 => /usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.1 (0x00007f6412c00000)
        libmpi.so.40 => /lib/x86_64-linux-gnu/libmpi.so.40 (0x00007f641f97f000)
        libxml2.so.2 => /lib/x86_64-linux-gnu/libxml2.so.2 (0x00007f6412a54000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6412800000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6414921000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f641f95d000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f641261f000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f641facf000)
        libffi.so.8 => /lib/x86_64-linux-gnu/libffi.so.8 (0x00007f641f951000)
        libedit.so.2 => /lib/x86_64-linux-gnu/libedit.so.2 (0x00007f641f917000)
        libz3.so.4 => /lib/x86_64-linux-gnu/libz3.so.4 (0x00007f6410e00000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f641b3e1000)
        libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007f641b3ae000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f641f910000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f641b3a9000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f641b3a4000)
        libopen-rte.so.40 => /lib/x86_64-linux-gnu/libopen-rte.so.40 (0x00007f6412563000)
        libopen-pal.so.40 => /lib/x86_64-linux-gnu/libopen-pal.so.40 (0x00007f64124ac000)
        libhwloc.so.15 => /lib/x86_64-linux-gnu/libhwloc.so.15 (0x00007f641244f000)
        libicuuc.so.72 => /lib/x86_64-linux-gnu/libicuuc.so.72 (0x00007f6410c02000)
        liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f641b373000)
        libbsd.so.0 => /lib/x86_64-linux-gnu/libbsd.so.0 (0x00007f641b35b000)
        libevent_core-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_core-2.1.so.7 (0x00007f64148eb000)
        libevent_pthreads-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7 (0x00007f64148e6000)
        libudev.so.1 => /lib/x86_64-linux-gnu/libudev.so.1 (0x00007f6412a26000)
        libicudata.so.72 => /lib/x86_64-linux-gnu/libicudata.so.72 (0x00007f640ee00000)
        libmd.so.0 => /lib/x86_64-linux-gnu/libmd.so.0 (0x00007f6412442000)

ldd also shows that the executable depends on libLLVM-14.so.1, which is unnecessary.

I think we can filter out useless libraries from the list, and ensure that the final executables are fully statically or fully dynamically linked to LLVM.

FYI, I use

list(FILTER LLVM_AVAILABLE_LIBS INCLUDE REGEX "LLVM(MCJIT|.+(CodeGen|AsmParser))" )
target_link_libraries( jit PUBLIC ${LLVM_AVAILABLE_LIBS} )

to exclude unnecessary libraries, and the executable is statically linked against LLVM now.

bug in transpose function

Hi @fwinter, we found what may be a bug in the transpose function of the current qdp-jit: it does not transpose the spin component. A simple test program from @SaltyChiang was used to verify this.

#include <qdp.h>
#include <qdp_layout.h>
#include <qdp_multi.h>
#include <qdp_parscalar_specific.h>
#include <qdp_primcolormat.h>
#include <qdp_scalarsite_defs.h>

using namespace QDP;

template <class T, int N> void printMatrix(PColorMatrix<T, N> matrix) {
  printf("\n");
  for (int i = 0; i < N; ++i) {
    for (int j = 0; j < N; ++j) {
      printf("%12.8f+%12.8fi   ", matrix.elem(i, j).real().elem(),
             matrix.elem(i, j).imag().elem());
    }
    printf("\n");
  }
}

int main(int argc, char *argv[]) {
  QDP_initialize(&argc, &argv);

  const int latt_size[Nd] = {4, 4, 4, 8};
  multi1d<int> nrows(Nd);
  nrows = latt_size;
  Layout::setLattSize(nrows);
  Layout::create();

  LatticePropagator prop;
  gaussian(prop);
  LatticePropagator prop_T = transpose(prop); //! ERROR here
  printMatrix(prop.elem(0).elem(0, 1)); //* reference
  printMatrix(prop_T.elem(0).elem(1, 0)); //* should be the transpose of the previous one
  printMatrix(prop_T.elem(0).elem(0, 1)); //* should be very different

  QDP_finalize();
  return 0;
}

We used the recent commit 2a1c29ffa4360c38b088c2baa08eec2c0692d472 of the devel branch, with gcc 11.2.1, cuda 11.7, and llvm 13.0.0. The output of the above code is

 -0.64040606+  0.24160731i     0.72971220+  0.89945116i     1.07925420+  0.83748773i   
  0.28545658+ -0.29035900i    -0.84708820+ -0.91764636i     0.57950565+ -1.20922577i   
 -0.62136288+  0.00338123i    -0.31409364+ -0.44601773i     0.15046804+  1.53841175i   

  1.40671455+  1.18308668i     0.14042949+ -0.79521487i     1.53403557+ -1.10512429i   
 -0.57295400+ -0.88400266i    -0.52849531+  0.53221092i     1.39610009+  0.41041381i   
 -0.15161463+ -1.79886656i    -0.72825843+  2.34345012i    -0.15578486+ -1.51035852i   

 -0.64040606+  0.24160731i     0.28545658+ -0.29035900i    -0.62136288+  0.00338123i   
  0.72971220+  0.89945116i    -0.84708820+ -0.91764636i    -0.31409364+ -0.44601773i   
  1.07925420+  0.83748773i     0.57950565+ -1.20922577i     0.15046804+  1.53841175i

Apparently the transpose function did not transpose the spin component.

In an old build (around May 7 2021), with gcc 7.3.1, cuda 11.0, and llvm 6.0.0, the transpose function works correctly on the spin component, and the output of the above code is

-0.64040606+  0.24160731i     0.72971220+  0.89945116i     1.07925420+  0.83748773i   
  0.28545658+ -0.29035900i    -0.84708820+ -0.91764636i     0.57950565+ -1.20922577i   
 -0.62136288+  0.00338123i    -0.31409364+ -0.44601773i     0.15046804+  1.53841175i   

 -0.64040606+  0.24160731i     0.28545658+ -0.29035900i    -0.62136288+  0.00338123i   
  0.72971220+  0.89945116i    -0.84708820+ -0.91764636i    -0.31409364+ -0.44601773i   
  1.07925420+  0.83748773i     0.57950565+ -1.20922577i     0.15046804+  1.53841175i   

  1.40671455+  1.18308668i     0.14042949+ -0.79521487i     1.53403557+ -1.10512429i   
 -0.57295400+ -0.88400266i    -0.52849531+  0.53221092i     1.39610009+  0.41041381i   
 -0.15161463+ -1.79886656i    -0.72825843+  2.34345012i    -0.15578486+ -1.51035852i
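The behaviour the test expects can be written out on a single site with plain arrays, independent of QDP. This is only an illustrative sketch: the type aliases `ColorMatrix` and `SitePropagator` and the functions `transposeSite` and `checkTransposeSite` are names invented here, not part of the qdp-jit API. It assumes a propagator site is an Ns x Ns spin matrix whose entries are Nc x Nc color matrices, and that a full transpose swaps both index pairs.

```cpp
#include <array>
#include <cassert>
#include <complex>

constexpr int Ns = 4;  // number of spin components
constexpr int Nc = 3;  // number of colors

// Illustrative stand-ins for one site of a propagator (not QDP types).
using ColorMatrix = std::array<std::array<std::complex<double>, Nc>, Nc>;
using SitePropagator = std::array<std::array<ColorMatrix, Ns>, Ns>;

// Full transpose on one site: out(s1,s2)(c1,c2) = in(s2,s1)(c2,c1),
// i.e. both the spin indices and the color indices are swapped.
inline SitePropagator transposeSite(const SitePropagator& in)
{
  SitePropagator out{};
  for (int s1 = 0; s1 < Ns; ++s1)
    for (int s2 = 0; s2 < Ns; ++s2)
      for (int c1 = 0; c1 < Nc; ++c1)
        for (int c2 = 0; c2 < Nc; ++c2)
          out[s1][s2][c1][c2] = in[s2][s1][c2][c1];
  return out;
}

// Mirrors the check in the test program: the (1,0) spin block of the
// transpose must equal the color transpose of the (0,1) block of the input.
inline void checkTransposeSite()
{
  SitePropagator p{};
  for (int s1 = 0; s1 < Ns; ++s1)
    for (int s2 = 0; s2 < Ns; ++s2)
      for (int c1 = 0; c1 < Nc; ++c1)
        for (int c2 = 0; c2 < Nc; ++c2)
          p[s1][s2][c1][c2] = {double(s1 * Ns + s2), double(c1 * Nc + c2)};

  const SitePropagator t = transposeSite(p);
  for (int c1 = 0; c1 < Nc; ++c1)
    for (int c2 = 0; c2 < Nc; ++c2)
      assert(t[1][0][c1][c2] == p[0][1][c2][c1]);
}
```

In terms of the outputs above, the newer build behaves as if only the inner color swap were applied, which is why its third matrix (spin block (0,1) of prop_T) is the color transpose of the reference.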
