jeffersonlab / qdp-jit Goto Github PK
License: Other
See the QDP-JIT wiki: https://github.com/JeffersonLab/qdp-jit/wiki
The line here blocks compilation with ROCm based on LLVM 15. It seems that LLD_HAS_DRIVER
is only available after LLVM 17.
Lines 183 to 185 in e436239
My compiler is GCC 9 and I have disabled the C++20 features flag. However, the build still tries to compile the concepts. Is there somewhere I can pass -fconcepts?
Thanks
qdp-jit/lib/../include/qdp_sum.h:198:24: error: ‘concept’ does not name a type; did you mean ‘concat’?
198 | template concept ConceptHasShift = HasShift::value;
Hi @fwinter, after I updated qdp-jit to the latest transpose bug fix (commit 2b31be631645835febeb5accfe2cec120df40c05), a new issue appeared.
The Chroma test input.xml is as follows:
<?xml version="1.0"?>
<chroma>
  <Param>
    <InlineMeasurements>
      <elem>
        <Name>MAKE_SOURCE</Name>
        <Frequency>1</Frequency>
        <Param>
          <version>6</version>
          <Source>
            <version>3</version>
            <SourceType>SHELL_SOURCE</SourceType>
            <j_decay>3</j_decay>
            <t_srce>0 0 0 0</t_srce>
            <quark_smear_lastP>false</quark_smear_lastP>
            <SmearingParam>
              <wvf_kind>GAUGE_INV_GAUSSIAN</wvf_kind>
              <wvf_param>2.0</wvf_param>
              <wvfIntPar>30</wvfIntPar>
              <no_smear_dir>3</no_smear_dir>
            </SmearingParam>
            <Displacement>
              <version>1</version>
              <DisplacementType>NONE</DisplacementType>
            </Displacement>
          </Source>
        </Param>
        <NamedObject>
          <gauge_id>default_gauge_field</gauge_id>
          <source_id>sh_source</source_id>
        </NamedObject>
      </elem>
      <elem>
        <Name>PROPAGATOR</Name>
        <Frequency>1</Frequency>
        <Param>
          <version>10</version>
          <quarkSpinType>FULL</quarkSpinType>
          <obsvP>false</obsvP>
          <numRetries>1</numRetries>
          <FermionAction>
            <FermAct>CLOVER</FermAct>
            <Mass>-0.04</Mass>
            <clovCoeffR>1.2</clovCoeffR>
            <clovCoeffT>0.6</clovCoeffT>
            <AnisoParam>
              <anisoP>true</anisoP>
              <t_dir>3</t_dir>
              <xi_0>5</xi_0>
              <nu>1</nu>
            </AnisoParam>
            <FermionBC>
              <FermBC>SIMPLE_FERMBC</FermBC>
              <boundary>1 1 1 -1</boundary>
            </FermionBC>
          </FermionAction>
          <InvertParam>
            <invType>CG_INVERTER</invType>
            <RsdCG>1e-08</RsdCG>
            <MaxCG>2000</MaxCG>
          </InvertParam>
        </Param>
        <NamedObject>
          <gauge_id>default_gauge_field</gauge_id>
          <source_id>sh_source</source_id>
          <prop_id>sh_prop</prop_id>
        </NamedObject>
      </elem>
      <elem>
        <Name>ERASE_NAMED_OBJECT</Name>
        <Frequency>1</Frequency>
        <NamedObject>
          <object_id>sh_source</object_id>
        </NamedObject>
      </elem>
      <elem>
        <Name>SINK_SMEAR</Name>
        <Frequency>1</Frequency>
        <Param>
          <version>5</version>
          <Sink>
            <version>2</version>
            <SinkType>POINT_SINK</SinkType>
            <j_decay>3</j_decay>
            <Displacement>
              <version>1</version>
              <DisplacementType>NONE</DisplacementType>
            </Displacement>
          </Sink>
        </Param>
        <NamedObject>
          <gauge_id>default_gauge_field</gauge_id>
          <prop_id>sh_prop</prop_id>
          <smeared_prop_id>pt_sh_prop</smeared_prop_id>
        </NamedObject>
      </elem>
      <elem>
        <Name>ERASE_NAMED_OBJECT</Name>
        <Frequency>1</Frequency>
        <NamedObject>
          <object_id>sh_prop</object_id>
        </NamedObject>
      </elem>
      <elem>
        <Name>HADRON_SPECTRUM</Name>
        <Frequency>1</Frequency>
        <Param>
          <version>1</version>
          <MesonP>true</MesonP>
          <BaryonP>true</BaryonP>
          <CurrentP>false</CurrentP>
          <time_rev>false</time_rev>
          <mom2_max>6</mom2_max>
          <avg_equiv_mom>true</avg_equiv_mom>
        </Param>
        <NamedObject>
          <gauge_id>default_gauge_field</gauge_id>
          <sink_pairs>
            <elem>
              <first_id>pt_sh_prop</first_id>
              <second_id>pt_sh_prop</second_id>
            </elem>
          </sink_pairs>
        </NamedObject>
        <xml_file>hadspec.xml.t0</xml_file>
      </elem>
      <elem>
        <Name>ERASE_NAMED_OBJECT</Name>
        <Frequency>1</Frequency>
        <NamedObject>
          <object_id>pt_sh_prop</object_id>
        </NamedObject>
      </elem>
    </InlineMeasurements>
    <nrow>12 12 12 96</nrow>
  </Param>
  <RNG>
    <Seed>
      <elem>9996</elem>
      <elem>32552</elem>
      <elem>27027</elem>
      <elem>18583</elem>
    </Seed>
  </RNG>
  <Cfg>
    <cfg_type>WEAK_FIELD</cfg_type>
    <parallel_io>true</parallel_io>
  </Cfg>
</chroma>
Running mpirun -np 1 chroma -geom 1 1 1 1 -i input.xml then fails with the error
jit launch explicit geom error, grid=(2,1,1), block=(1024,1,1)
for this 12^3 x 96 lattice. The problem does not occur at commit c54122030be2de6cb41b007c2d3d4c60145e0e2b, at least: at that commit I can run a 12^3 x 96 lattice on a single GPU.
Currently, if I change the lattice size to 12^3 x 48, the code runs to completion, but after the qdp-jit statistics it prints
qdp-jit/lib/qdp_cache.cc:468: void QDP::QDPCache::signoff(int): Assertion 'vecEntry.size() > id' failed.
The environment I used is CUDA 11.7, LLVM 13.0.0, and Chroma (devel: 4b2e1171ac307b7f4273186543afad5b25b7bc00) on a Tesla V100-SXM2-32GB.
Hi All,
I got an email from Eric Gregory at JSC with this comment:
In the course of this I am running some pretty large lattices and noticing a bug or at least a limitation of QDP.
At the moment I am running 128^3x512.
This causes
int nbits = numbits(Layout::vol());
to hang in initRNG() in qdp_random.cc as the lattice volume is 2^30, which I guess is the limit of integer size. So the while loop
int numbits(int x)
{
  int num = 1;
  int iceiling = 2;
  while (iceiling <= x)
  {
    num++;
    iceiling *= 2;
  }
  return num;
}
never exits as iceiling rolls over and becomes 0 (in this case).
I am guessing that, insofar as qdp_random.cc is identical between QDP-JIT and QDP++, this will be an issue there too.
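One possible way around the hang, as a minimal sketch rather than the project's actual fix: for x = 2^30 the quoted loop doubles iceiling to 2^31, which does not fit in a 32-bit signed int, so the comparison can never fail. The variant below keeps the same contract (the smallest num >= 1 with 2^num > x) but halves x instead of doubling a ceiling, so no intermediate value can overflow. Taking the argument as std::int64_t is an assumption made here so that volumes beyond the int range could also be passed; the quoted code takes an int.

```cpp
#include <cstdint>

// Sketch of an overflow-free numbits(): returns the smallest num >= 1
// such that 2^num > x, matching the quoted loop wherever that loop terminates.
// The 64-bit parameter type is an assumption, not taken from qdp_random.cc.
int numbits(std::int64_t x)
{
  int num = 1;
  // Shrink x rather than grow a ceiling: x only gets smaller, so nothing overflows.
  while (x > 1)
  {
    x /= 2;
    ++num;
  }
  return num;
}
```

With this version a volume of 2^30 yields 31 bits, which is what the original loop would return if iceiling could hold 2^31.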
I built all the required packages, including xpath_reader itself, and then ran cmake on qdp-jit. Is there something special I need to do to get it to build the xpath_reader in other_libs, or to use the xpath_reader I have already built?
Thanks
Extract from the cmake output, showing that it found the requirements:
Using LLVMConfig.cmake in /QCDSolvers/libs/LLVM13/lib/cmake/llvm
-- Found MPI_C: /usr/lib64/mpi/gcc/openmpi4/lib64/libmpi.so (found version "3.1")
-- Found MPI_CXX: /usr/lib64/mpi/gcc/openmpi4/lib64/libmpi.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- Found LibXml2: /QCDSolvers/libs/xml2/lib64/libxm2.so
The failure happens at the make step, which complains about the XML library:
[ 0%] Building CXX object other_libs/xpath_reader/lib/CMakeFiles/xmlWriter.dir/xml_simplewriter.cc.o
[ 0%] Building CXX object other_libs/xpath_reader/lib/CMakeFiles/xmlWriter.dir/attribute.cc.o
[ 1%] Building CXX object other_libs/xpath_reader/lib/CMakeFiles/xmlWriter.dir/xml_struct_writer.cc.o
gmake[2]: *** No rule to make target '/QCDSolvers/libs/xml2/lib64/libxm2.so', needed by 'other_libs/xpath_reader/lib/libxmlWriter.so'. Stop.
The CMake option QDP_ENABLE_CUDA_MANAGED_MEMORY has no effect, because it does not set QDP_USE_CUDA_MANAGED_MEMORY, which is what configure_file uses later.
QDP-JIT links all available LLVM libraries to the target jit by using LLVM_AVAILABLE_LIBS.
Line 100 in 7ffb650
In some Linux distros such as Debian, the LLVM_AVAILABLE_LIBS list of the LLVM shipped via apt install llvm-dev contains both static and shared libraries, and it requires additional packages to be installed even though they are not actually needed. The log below shows the result for the llvm-dev shipped with Debian 12, which is actually llvm-14.
message( STATUS "${LLVM_AVAILABLE_LIBS}" )
LLVMDemangle;LLVMSupport;LLVMTableGen;LLVMTableGenGlobalISel;LLVMCore;LLVMFuzzMutate;LLVMFileCheck;LLVMInterfaceStub;LLVMIRReader;LLVMCodeGen;LLVMSelectionDAG;LLVMAsmPrinter;LLVMMIRParser;LLVMGlobalISel;LLVMBinaryFormat;LLVMBitReader;LLVMBitWriter;LLVMBitstreamReader;LLVMDWARFLinker;LLVMExtensions;LLVMFrontendOpenACC;LLVMFrontendOpenMP;LLVMTransformUtils;LLVMInstrumentation;LLVMAggressiveInstCombine;LLVMInstCombine;LLVMScalarOpts;LLVMipo;LLVMVectorize;LLVMObjCARCOpts;LLVMCoroutines;LLVMCFGuard;LLVMLinker;LLVMAnalysis;LLVMLTO;LLVMMC;LLVMMCParser;LLVMMCDisassembler;LLVMMCA;LLVMObject;LLVMObjectYAML;LLVMOption;LLVMRemarks;LLVMDebuginfod;LLVMDebugInfoDWARF;LLVMDebugInfoGSYM;LLVMDebugInfoMSF;LLVMDebugInfoCodeView;LLVMDebugInfoPDB;LLVMSymbolize;LLVMDWP;LLVMExecutionEngine;LLVMInterpreter;LLVMJITLink;LLVMMCJIT;LLVMOrcJIT;LLVMOrcShared;LLVMOrcTargetProcess;LLVMRuntimeDyld;LLVMPerfJITEvents;LLVMTarget;LLVMAArch64CodeGen;LLVMAArch64AsmParser;LLVMAArch64Disassembler;LLVMAArch64Desc;LLVMAArch64Info;LLVMAArch64Utils;LLVMAMDGPUCodeGen;LLVMAMDGPUAsmParser;LLVMAMDGPUDisassembler;LLVMAMDGPUTargetMCA;LLVMAMDGPUDesc;LLVMAMDGPUInfo;LLVMAMDGPUUtils;LLVMARMCodeGen;LLVMARMAsmParser;LLVMARMDisassembler;LLVMARMDesc;LLVMARMInfo;LLVMARMUtils;LLVMAVRCodeGen;LLVMAVRAsmParser;LLVMAVRDisassembler;LLVMAVRDesc;LLVMAVRInfo;LLVMBPFCodeGen;LLVMBPFAsmParser;LLVMBPFDisassembler;LLVMBPFDesc;LLVMBPFInfo;LLVMHexagonCodeGen;LLVMHexagonAsmParser;LLVMHexagonDisassembler;LLVMHexagonDesc;LLVMHexagonInfo;LLVMLanaiCodeGen;LLVMLanaiAsmParser;LLVMLanaiDisassembler;LLVMLanaiDesc;LLVMLanaiInfo;LLVMMipsCodeGen;LLVMMipsAsmParser;LLVMMipsDisassembler;LLVMMipsDesc;LLVMMipsInfo;LLVMMSP430CodeGen;LLVMMSP430Desc;LLVMMSP430Info;LLVMMSP430AsmParser;LLVMMSP430Disassembler;LLVMNVPTXCodeGen;LLVMNVPTXDesc;LLVMNVPTXInfo;LLVMPowerPCCodeGen;LLVMPowerPCAsmParser;LLVMPowerPCDisassembler;LLVMPowerPCDesc;LLVMPowerPCInfo;LLVMRISCVCodeGen;LLVMRISCVAsmParser;LLVMRISCVDisassembler;LLVMRISCVDesc;LLVMRISCVInfo;LLVMSparcCodeGen;LLVMSparcAsmParser;LLVMSparcDisassembler;LLVMSparcDesc;LLVMSparcInfo;LLVMSystemZCodeGen;LLVMSystemZAsmParser;LLVMSystemZDisassembler;LLVMSystemZDesc;LLVMSystemZInfo;LLVMVECodeGen;LLVMVEAsmParser;LLVMVEDisassembler;LLVMVEInfo;LLVMVEDesc;LLVMWebAssemblyCodeGen;LLVMWebAssemblyAsmParser;LLVMWebAssemblyDisassembler;LLVMWebAssemblyDesc;LLVMWebAssemblyInfo;LLVMWebAssemblyUtils;LLVMX86CodeGen;LLVMX86AsmParser;LLVMX86Disassembler;LLVMX86TargetMCA;LLVMX86Desc;LLVMX86Info;LLVMXCoreCodeGen;LLVMXCoreDisassembler;LLVMXCoreDesc;LLVMXCoreInfo;LLVMM68kCodeGen;LLVMM68kInfo;LLVMM68kDesc;LLVMM68kAsmParser;LLVMM68kDisassembler;LLVMAsmParser;LLVMLineEditor;LLVMProfileData;LLVMCoverage;LLVMPasses;LLVMTextAPI;LLVMDlltoolDriver;LLVMLibDriver;LLVMXRay;LLVMWindowsManifest;LTO;MLIRSupportIndentedOstream;LLVMCFIVerify;LLVMDiff;LLVMExegesisX86;LLVMExegesisAArch64;LLVMExegesisPowerPC;LLVMExegesisMips;LLVMExegesis;LLVM;Remarks;Polly
Here LLVMDebuginfod requires CURL::libcurl, which means I have to install libcurl-openssl-dev and then add find_package(CURL REQUIRED) before find_package(LLVM "14.0" REQUIRED CONFIG).
Most of the libraries above are static, but some, such as LLVM and Polly, are actually shared libraries. Linking all of them to jit causes
$ ./t_basic
: CommandLine Error: Option 'amdgpu-dump-hsa-metadata' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
[1] 2157571 IOT instruction ./t_basic
$ ldd ./t_basic
linux-vdso.so.1 (0x00007ffc867fa000)
libLLVM-14.so.1 => /usr/lib/llvm-14/lib/libLLVM-14.so.1 (0x00007f6414a00000)
libcuda.so.1 => /usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.1 (0x00007f6412c00000)
libmpi.so.40 => /lib/x86_64-linux-gnu/libmpi.so.40 (0x00007f641f97f000)
libxml2.so.2 => /lib/x86_64-linux-gnu/libxml2.so.2 (0x00007f6412a54000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6412800000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6414921000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f641f95d000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f641261f000)
/lib64/ld-linux-x86-64.so.2 (0x00007f641facf000)
libffi.so.8 => /lib/x86_64-linux-gnu/libffi.so.8 (0x00007f641f951000)
libedit.so.2 => /lib/x86_64-linux-gnu/libedit.so.2 (0x00007f641f917000)
libz3.so.4 => /lib/x86_64-linux-gnu/libz3.so.4 (0x00007f6410e00000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f641b3e1000)
libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007f641b3ae000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f641f910000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f641b3a9000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f641b3a4000)
libopen-rte.so.40 => /lib/x86_64-linux-gnu/libopen-rte.so.40 (0x00007f6412563000)
libopen-pal.so.40 => /lib/x86_64-linux-gnu/libopen-pal.so.40 (0x00007f64124ac000)
libhwloc.so.15 => /lib/x86_64-linux-gnu/libhwloc.so.15 (0x00007f641244f000)
libicuuc.so.72 => /lib/x86_64-linux-gnu/libicuuc.so.72 (0x00007f6410c02000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f641b373000)
libbsd.so.0 => /lib/x86_64-linux-gnu/libbsd.so.0 (0x00007f641b35b000)
libevent_core-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_core-2.1.so.7 (0x00007f64148eb000)
libevent_pthreads-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7 (0x00007f64148e6000)
libudev.so.1 => /lib/x86_64-linux-gnu/libudev.so.1 (0x00007f6412a26000)
libicudata.so.72 => /lib/x86_64-linux-gnu/libicudata.so.72 (0x00007f640ee00000)
libmd.so.0 => /lib/x86_64-linux-gnu/libmd.so.0 (0x00007f6412442000)
And ldd shows that the executable requires libLLVM-14.so.1, which is unnecessary.
I think we can filter out useless libraries from the list, and ensure that the final executables are fully statically or fully dynamically linked to LLVM.
FYI, I use
list(FILTER LLVM_AVAILABLE_LIBS INCLUDE REGEX "LLVM(MCJIT|.+(CodeGen|AsmParser))" )
target_link_libraries( jit PUBLIC ${LLVM_AVAILABLE_LIBS} )
to exclude unnecessary libraries, and the executable is statically linked against LLVM now.
Hi @fwinter, we found what may be a bug in the transpose function of the current qdp-jit: it does not transpose the spin component. A simple test program from @SaltyChiang was used to verify this.
#include <qdp.h>
#include <qdp_layout.h>
#include <qdp_multi.h>
#include <qdp_parscalar_specific.h>
#include <qdp_primcolormat.h>
#include <qdp_scalarsite_defs.h>
using namespace QDP;
template <class T, int N> void printMatrix(PColorMatrix<T, N> matrix) {
  printf("\n");
  for (int i = 0; i < N; ++i) {
    for (int j = 0; j < N; ++j) {
      printf("%12.8f+%12.8fi ", matrix.elem(i, j).real().elem(),
             matrix.elem(i, j).imag().elem());
    }
    printf("\n");
  }
}
int main(int argc, char *argv[]) {
  QDP_initialize(&argc, &argv);
  const int latt_size[Nd] = {4, 4, 4, 8};
  multi1d<int> nrows(Nd);
  nrows = latt_size;
  Layout::setLattSize(nrows);
  Layout::create();
  LatticePropagator prop;
  gaussian(prop);
  LatticePropagator prop_T = transpose(prop); //! ERROR here
  printMatrix(prop.elem(0).elem(0, 1)); //* reference
  printMatrix(prop_T.elem(0).elem(1, 0)); //* should be the transpose of the previous one
  printMatrix(prop_T.elem(0).elem(0, 1)); //* should be very different
  QDP_finalize();
  return 0;
}
The commit we used is the recent 2a1c29ffa4360c38b088c2baa08eec2c0692d472 on the devel branch, built with gcc 11.2.1, CUDA 11.7, and LLVM 13.0.0. The output of the above code is then
-0.64040606+ 0.24160731i 0.72971220+ 0.89945116i 1.07925420+ 0.83748773i
0.28545658+ -0.29035900i -0.84708820+ -0.91764636i 0.57950565+ -1.20922577i
-0.62136288+ 0.00338123i -0.31409364+ -0.44601773i 0.15046804+ 1.53841175i
1.40671455+ 1.18308668i 0.14042949+ -0.79521487i 1.53403557+ -1.10512429i
-0.57295400+ -0.88400266i -0.52849531+ 0.53221092i 1.39610009+ 0.41041381i
-0.15161463+ -1.79886656i -0.72825843+ 2.34345012i -0.15578486+ -1.51035852i
-0.64040606+ 0.24160731i 0.28545658+ -0.29035900i -0.62136288+ 0.00338123i
0.72971220+ 0.89945116i -0.84708820+ -0.91764636i -0.31409364+ -0.44601773i
1.07925420+ 0.83748773i 0.57950565+ -1.20922577i 0.15046804+ 1.53841175i
Apparently the transpose function transposed only the color component, not the spin component.
In an old build (around May 7, 2021) with gcc 7.3.1, CUDA 11.0, and LLVM 6.0.0, the transpose function works correctly on the spin component, and the output of the above code is
-0.64040606+ 0.24160731i 0.72971220+ 0.89945116i 1.07925420+ 0.83748773i
0.28545658+ -0.29035900i -0.84708820+ -0.91764636i 0.57950565+ -1.20922577i
-0.62136288+ 0.00338123i -0.31409364+ -0.44601773i 0.15046804+ 1.53841175i
-0.64040606+ 0.24160731i 0.28545658+ -0.29035900i -0.62136288+ 0.00338123i
0.72971220+ 0.89945116i -0.84708820+ -0.91764636i -0.31409364+ -0.44601773i
1.07925420+ 0.83748773i 0.57950565+ -1.20922577i 0.15046804+ 1.53841175i
1.40671455+ 1.18308668i 0.14042949+ -0.79521487i 1.53403557+ -1.10512429i
-0.57295400+ -0.88400266i -0.52849531+ 0.53221092i 1.39610009+ 0.41041381i
-0.15161463+ -1.79886656i -0.72825843+ 2.34345012i -0.15578486+ -1.51035852i
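The behaviour seen in the old build can be written down independently of QDP. The following is a minimal standalone sketch, not qdp-jit code: the type names SitePropagator and ColorMatrix and the function transposeSpinColor are hypothetical and only model one lattice site as a 4x4 spin matrix of 3x3 complex color matrices. It shows the index swap that the test program expects, namely that the (s1, s2) spin block of the result is the color transpose of the (s2, s1) spin block of the input.

```cpp
#include <array>
#include <complex>

constexpr int Ns = 4; // number of spin components (assumed, as in the usual 4D setup)
constexpr int Nc = 3; // number of colors (matches the 3x3 matrices printed above)

// Hypothetical plain-array stand-ins for one site of a propagator.
using ColorMatrix = std::array<std::array<std::complex<double>, Nc>, Nc>;
using SitePropagator = std::array<std::array<ColorMatrix, Ns>, Ns>;

// Full transpose on both index pairs:
//   out(s1, s2)(c1, c2) = in(s2, s1)(c2, c1)
SitePropagator transposeSpinColor(const SitePropagator& in)
{
  SitePropagator out{};
  for (int s1 = 0; s1 < Ns; ++s1)
    for (int s2 = 0; s2 < Ns; ++s2)
      for (int c1 = 0; c1 < Nc; ++c1)
        for (int c2 = 0; c2 < Nc; ++c2)
          out[s1][s2][c1][c2] = in[s2][s1][c2][c1];
  return out;
}
```

In this picture, the output of the newer build corresponds to swapping only c1 and c2 while leaving s1 and s2 in place, which is why its (0, 1) block of prop_T is the color transpose of the (0, 1) block of prop.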