Comments (19)

azrael417 avatar azrael417 commented on June 19, 2024 1

What worked was to run bootstrap.sh and then have the following modules loaded:

`[tkurth@cori08 ~]$ module list
Currently Loaded Modulefiles:

  1) modules/3.2.6.7         9) pmi/5.0.10-1.0000.11050.0.0.ari  17) atp/2.0.2
  2) nsg/1.2.0              10) dmapp/7.1.0-12.37                18) PrgEnv-intel/6.0.3
  3) modules/3.2.10.4       11) gni-headers/5.0.7-3.1            19) craype-mic-knl
  4) craype-network-aries   12) xpmem/0.1-4.5                    20) cray-shmem/7.4.0
  5) craype/2.5.5           13) job/1.5.5-3.58                   21) cray-mpich/7.4.0
  6) cray-libsci/16.06.1    14) dvs/2.5_0.9.0-2.155              22) intel/17.0.0.098
  7) udreg/2.3.2-4.6        15) alps/6.1.3-17.12                 23) altd/2.0
  8) ugni/6.0.12-2.1        16) rca/1.0.0-6.21                   24) cray-memkind`

Then run the following configure command:

../src/configure --prefix=${installpath} \
    --enable-simd=AVX512MIC \
    --enable-precision=double \
    --enable-comms=mpi \
    --host=x86_64-unknown-linux \
    CXX="CC" \
    CXXFLAGS="-mkl -xMIC-AVX512 -std=c++11 -I/project/projectdirs/mpccc/tkurth/NESAP/GRID/src/include" \
    CC="cc" \
    CFLAGS="-mkl -xMIC-AVX512 -std=c99 -I/project/projectdirs/mpccc/tkurth/NESAP/GRID/src/include" \
    LDFLAGS="-mkl -lmemkind"

Linking with -lmemkind can be required because, with the Intel 2016 and 2017 compilers, MKL makes use of hbw_alloc calls but does not link against libmemkind by default. If you don't link it and hit the "wrong routine", you will see segfaults.
In MKL 2017, this problem was solved for most of the standard BLAS/LAPACK routines, but not for the newly introduced deep-learning-optimized routines (convolution routines, pooling routines, etc.). I think GRID does not use any of those, but it is good to be safe and link properly from the start.
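
Since a missing -lmemkind only shows up at run time, a build script could guard against it before calling configure. The following is a minimal sketch under that assumption; `has_flag` is a hypothetical helper and not part of Grid or its configure script.

```shell
#!/bin/bash
# Sketch: warn early if the link flags do not request libmemkind explicitly.
# has_flag is a hypothetical helper, not part of Grid.
has_flag() {
    # $1 = flag to look for, $2 = whitespace-separated flag string
    local wanted="$1" flag
    for flag in $2; do
        if [ "$flag" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}

LDFLAGS="-mkl -lmemkind"   # same value as in the configure line above
if has_flag -lmemkind "$LDFLAGS"; then
    echo "LDFLAGS links libmemkind explicitly"
else
    echo "warning: -lmemkind missing from LDFLAGS" >&2
fi
```

The check only inspects the flag string; it does not prove that the linker can actually find the library.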

from grid.

coppolachan avatar coppolachan commented on June 19, 2024

Hello Thorsten,

  1. Did you run the bootstrap.sh script to generate the configure script?
  2. What is your configure command line?
    The configure command line should be
    CXX=<your compiler> ./configure ....
  3. We acknowledge this, a fix will be released soon.
  4. I think it is related to 1)


paboyle avatar paboyle commented on June 19, 2024
  1. Not surprised. Antonin reworked the build system, and I was worried that the complexities
    of modules and CC wrappers on the Crays might make something go wrong.

  2. Did you specify CXX=CC? I'm able to override with CXX=mpicxx, for example, on other machines,
    or with CXX=clang++-3.9, and am surprised you weren't able to override.

That said, Shoji seems to be able to compile on an XC40 just fine (except the missing typecast I committed last night). Please try develop again on that.

  1. Please specify the full configure command line, the configure output, and
    the output from "make V=1"

  2. Personally, I don't much like hiding the compile flow details by default,
    e.g.
    CXX Benchmark_comms.o
    CXXLD Benchmark_comms

and would prefer not to hide them by default, since there really is complexity, and pretending it
is all magic just makes things harder to debug. It will be the death of open source.

But others disagree with me so feedback from many people welcome to get a feel
for the average opinion.


paboyle avatar paboyle commented on June 19, 2024

p.s. I committed a patch to Travis for the typecast in Stencil.h

Important : are any of the NERSC Cray systems available for remote login and compile just now?

It would be good if we could try it ourselves, especially since Travis provides neither
the Intel compiler, nor the Cray wrappers so this is very hard to catch in our continuous
integration framework.


aportelli avatar aportelli commented on June 19, 2024

Hi Thorsten,

We cannot really help you if we don't have the specifics of the build. Please:

  • confirm that you are using the HEAD of develop
  • give us the configure command line
  • give us the configure summary (at the end of configure output)
  • give us the config.log file
  • give us the output of make V=1


azrael417 avatar azrael417 commented on June 19, 2024

Hello, using mpicxx or something similar is not a good option, as that would basically disable priority access to the Aries interconnect. To my knowledge, there is no good way of circumventing the Cray wrappers and static linking when one wants good performance at scale on an XC40.


azrael417 avatar azrael417 commented on June 19, 2024

Here are my build details:

commit:
commit 7af9b8731847667eaf3b2e33a2457b977a7254ae
Author: paboyle <[email protected]>
Date: Tue Oct 18 09:51:37 2016 +0100

build script:

#!/bin/bash

installpath=$(pwd)/install/grid_dp

mkdir -p build

cd build
../src/configure --prefix=${installpath} \
    --enable-simd=AVX512MIC \
    --enable-precision=double \
    --enable-comms=mpi \
    --host=x86_64-unknown-linux \
    CXX="CC" \
    CXXFLAGS="-mkl -xMIC-AVX512 -std=c++11 -I/project/projectdirs/mpccc/tkurth/NESAP/GRID/src/include" \
    CC="cc" \
    CFLAGS="-mkl -xMIC=AVX512 -std=c99 -I/project/projectdirs/mpccc/tkurth/NESAP/GRID/src/include" \
    LDFLAGS="-mkl -lmemkind"

make -j12

cd ..

configure output:

[tkurth@cori08 src (develop)]$ cat config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Grid configure 1.0, which was
generated by GNU Autoconf 2.63.  Invocation command line was

  $ ./configure 

## --------- ##
## Platform. ##
## --------- ##

hostname = cori12
uname -m = x86_64
uname -r = 3.12.51-52.39-default
uname -s = Linux
uname -v = #1 SMP Fri Jan 15 20:03:12 UTC 2016 (16f5bac)

/usr/bin/uname -p = x86_64
/bin/uname -X     = unknown

/bin/arch              = x86_64
/usr/bin/arch -k       = unknown
/usr/convex/getsysinfo = unknown
/usr/bin/hostinfo      = unknown
/bin/machine           = unknown
/usr/bin/oslevel       = unknown
/bin/universe          = unknown

PATH: /usr/common/software/darshan/3.0.1/bin
PATH: /usr/common/software/bin
PATH: /usr/common/mss/bin
PATH: /usr/common/nsg/bin
PATH: /global/homes/t/tkurth/MODULES/spack/bin
PATH: /usr/common/software/intel/compilers_and_libraries_2017.0.064/linux/bin/intel64
PATH: /opt/cray/pe/mpt/7.4.0/gni/bin
PATH: /opt/cray/rca/1.0.0-6.21/bin
PATH: /opt/cray/alps/6.1.3-17.12/sbin
PATH: /opt/cray/job/1.5.5-3.58/bin
PATH: /opt/cray/pe/pmi/5.0.10-1.0000.11050.0.0.ari/bin
PATH: /opt/cray/pe/craype/2.5.5/bin
PATH: /opt/cray/pe/modules/3.2.10.4/bin
PATH: /usr/syscom/nsg/sbin
PATH: /usr/syscom/nsg/bin
PATH: /opt/modules/3.2.6.7/bin
PATH: /global/homes/t/tkurth/bin
PATH: /usr/local/bin
PATH: /usr/bin
PATH: /bin
PATH: /usr/bin/X11
PATH: /usr/games
PATH: /usr/lib/mit/bin
PATH: /usr/lib/mit/sbin
PATH: /opt/cray/pe/bin
PATH: /global/homes/t/tkurth/src/xmldiff/bin


## ----------- ##
## Core tests. ##
## ----------- ##


## ---------------- ##
## Cache variables. ##
## ---------------- ##

ac_cv_env_CCC_set=
ac_cv_env_CCC_value=
ac_cv_env_CC_set=
ac_cv_env_CC_value=
ac_cv_env_CFLAGS_set=
ac_cv_env_CFLAGS_value=
ac_cv_env_CPPFLAGS_set=
ac_cv_env_CPPFLAGS_value=
ac_cv_env_CXXCPP_set=
ac_cv_env_CXXCPP_value=
ac_cv_env_CXXFLAGS_set=
ac_cv_env_CXXFLAGS_value=
ac_cv_env_CXX_set=
ac_cv_env_CXX_value=
ac_cv_env_LDFLAGS_set=
ac_cv_env_LDFLAGS_value=
ac_cv_env_LIBS_set=
ac_cv_env_LIBS_value=
ac_cv_env_build_alias_set=
ac_cv_env_build_alias_value=
ac_cv_env_host_alias_set=
ac_cv_env_host_alias_value=
ac_cv_env_target_alias_set=
ac_cv_env_target_alias_value=

## ----------------- ##
## Output variables. ##
## ----------------- ##

ACLOCAL=''
AMDEPBACKSLASH=''
AMDEP_FALSE=''
AMDEP_TRUE=''
AMTAR=''
AUTOCONF=''
AUTOHEADER=''
AUTOMAKE=''
AWK=''
BUILD_CHROMA_REGRESSION_FALSE=''
BUILD_CHROMA_REGRESSION_TRUE=''
BUILD_COMMS_MPI_FALSE=''
BUILD_COMMS_MPI_TRUE=''
BUILD_COMMS_NONE_FALSE=''
BUILD_COMMS_NONE_TRUE=''
BUILD_COMMS_SHMEM_FALSE=''
BUILD_COMMS_SHMEM_TRUE=''
BUILD_ZMM_FALSE=''
BUILD_ZMM_TRUE=''
CC=''
CCDEPMODE=''
CFLAGS=''
CPPFLAGS=''
CXX=''
CXXCPP=''
CXXDEPMODE=''
CXXFLAGS=''
CYGPATH_W=''
DEFS=''
DEPDIR=''
ECHO_C=''
ECHO_N='-n'
ECHO_T=''
EGREP=''
EXEEXT=''
GREP=''
INSTALL_DATA=''
INSTALL_PROGRAM=''
INSTALL_SCRIPT=''
INSTALL_STRIP_PROGRAM=''
LDFLAGS=''
LIBOBJS=''
LIBS=''
LTLIBOBJS=''
MAKEINFO=''
MKDIR_P=''
OBJEXT=''
OPENMP_CXXFLAGS=''
PACKAGE=''
PACKAGE_BUGREPORT='[email protected]'
PACKAGE_NAME='Grid'
PACKAGE_STRING='Grid 1.0'
PACKAGE_TARNAME='grid'
PACKAGE_VERSION='1.0'
PATH_SEPARATOR=':'
RANLIB=''
SET_MAKE=''
SHELL='/bin/sh'
SIMD_FLAGS=''
STRIP=''
USE_LAPACK_FALSE=''
USE_LAPACK_LIB_FALSE=''
USE_LAPACK_LIB_TRUE=''
USE_LAPACK_TRUE=''
VERSION=''
ac_ct_CC=''
ac_ct_CXX=''
am__fastdepCC_FALSE=''
am__fastdepCC_TRUE=''
am__fastdepCXX_FALSE=''
am__fastdepCXX_TRUE=''
am__include=''
am__isrc=''
am__leading_dot=''
am__quote=''
am__tar=''
am__untar=''
bindir='${exec_prefix}/bin'
build=''
build_alias=''
build_cpu=''
build_os=''
build_vendor=''
datadir='${datarootdir}'
datarootdir='${prefix}/share'
docdir='${datarootdir}/doc/${PACKAGE_TARNAME}'
dvidir='${docdir}'
exec_prefix='NONE'
host=''
host_alias=''
host_cpu=''
host_os=''
host_vendor=''
htmldir='${docdir}'
includedir='${prefix}/include'
infodir='${datarootdir}/info'
install_sh=''
libdir='${exec_prefix}/lib'
libexecdir='${exec_prefix}/libexec'
localedir='${datarootdir}/locale'
localstatedir='${prefix}/var'
mandir='${datarootdir}/man'
mkdir_p=''
oldincludedir='/usr/include'
pdfdir='${docdir}'
prefix='NONE'
program_transform_name='s,x,x,'
psdir='${docdir}'
sbindir='${exec_prefix}/sbin'
sharedstatedir='${prefix}/com'
sysconfdir='${prefix}/etc'
target=''
target_alias=''
target_cpu=''
target_os=''
target_vendor=''

## ----------- ##
## confdefs.h. ##
## ----------- ##

#define PACKAGE_NAME "Grid"

configure: caught signal 2
configure: exit 1

environment:
`[tkurth@cori08 src (develop)]$ module list
Currently Loaded Modulefiles:

  1) modules/3.2.6.7         7) udreg/2.3.2-4.6                  13) job/1.5.5-3.58       19) craype-mic-knl
  2) nsg/1.2.0               8) ugni/6.0.12-2.1                  14) dvs/2.5_0.9.0-2.155  20) cray-shmem/7.4.0
  3) modules/3.2.10.4        9) pmi/5.0.10-1.0000.11050.0.0.ari  15) alps/6.1.3-17.12     21) cray-mpich/7.4.0
  4) craype-network-aries   10) dmapp/7.1.0-12.37                16) rca/1.0.0-6.21       22) intel/17.0.0.098
  5) craype/2.5.5           11) gni-headers/5.0.7-3.1            17) atp/2.0.2            23) altd/2.0
  6) cray-libsci/16.06.1    12) xpmem/0.1-4.5                    18) PrgEnv-intel/6.0.3   24) cray-memkind`


azrael417 avatar azrael417 commented on June 19, 2024

This is the make output (relevant part):

make  all-am
make[2]: Entering directory '/global/project/projectdirs/mpccc/tkurth/NESAP/GRID/build/lib'
depbase=`echo Init.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\
g++ -DHAVE_CONFIG_H -I. -I../../src/lib    -I/global/project/projectdirs/mpccc/tkurth/NESAP/GRID/src/include -mavx512f -mavx512pf -mavx512er -mavx512cd -fopenmp  -O3  -std=c++11 -MT Init.o -MD -MP -MF $depbase.Tpo -c -o Init.o ../../src/lib/Init.cc &&\
mv -f $depbase.Tpo $depbase.Po
g++: error: unrecognized command line option '-mavx512f'
g++: error: unrecognized command line option '-mavx512pf'
g++: error: unrecognized command line option '-mavx512er'
g++: error: unrecognized command line option '-mavx512cd'
Makefile:1059: recipe for target 'Init.o' failed
make[2]: *** [Init.o] Error 1
make[2]: Leaving directory '/global/project/projectdirs/mpccc/tkurth/NESAP/GRID/build/lib'
Makefile:784: recipe for target 'all' failed
make[1]: *** [all] Error 2
make[1]: Leaving directory '/global/project/projectdirs/mpccc/tkurth/NESAP/GRID/build/lib'
Makefile:369: recipe for target 'all-recursive' failed
make: *** [all-recursive] Error 1

It tries to use g++/gcc, not CC/cc.
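
One way to confirm which compiler the generated build files picked up, without starting a full make, is to read the `CXX` assignment out of the generated Makefile. This is a rough sketch: `makefile_var` is a hypothetical helper, the `lib/Makefile` path is an assumption about the build directory layout, and it only handles single-line `NAME = value` assignments.

```shell
#!/bin/bash
# Sketch: print the value that a generated Makefile assigns to a variable.
# makefile_var is a hypothetical helper; it only handles single-line
# assignments of the form "NAME = value".
makefile_var() {
    # $1 = Makefile path, $2 = variable name
    sed -n "s/^$2 = //p" "$1" | head -n 1
}

# Example use from inside the build directory (the path is an assumption):
if [ -f lib/Makefile ]; then
    echo "CXX used by lib/Makefile: $(makefile_var lib/Makefile CXX)"
fi
```

If this prints g++ although CXX="CC" was passed, the configure run that produced the Makefile did not see the override.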


coppolachan avatar coppolachan commented on June 19, 2024
  1. The config.log would be useful.
  2. Also the final summary of the output of the configure step.
  3. What version is your gcc, and why are you not using the Intel compiler?


azrael417 avatar azrael417 commented on June 19, 2024

I have added the config log.
I want to use the Cray compiler wrappers CC/cc, which point to icpc and icc when PrgEnv-intel is loaded, but the lib build picks the GNU compilers instead. I think this is a bug: the build should use the compiler selected by the user.


coppolachan avatar coppolachan commented on June 19, 2024

We need the config.log from after your configure run. The one posted says that the command run was just
./configure
I do not think this is the one you ran.
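
To check which run a given config.log belongs to, the recorded invocation can be pulled from the top of the file. The snippet below is a small sketch that relies on the layout shown in the log posted above (an indented line starting with `$ `); `configure_invocation` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: print the command line recorded at the top of an Autoconf config.log.
# configure_invocation is a hypothetical helper name.
configure_invocation() {
    # The invocation is logged as an indented line starting with "  $ ".
    sed -n 's/^  \$ //p' "$1" | head -n 1
}

if [ -f config.log ]; then
    configure_invocation config.log
fi
```

Running it in both the source and the build directory shows whether a stale config.log from an earlier in-tree run is being looked at.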


azrael417 avatar azrael417 commented on June 19, 2024

Oh, maybe my build script is buggy


azrael417 avatar azrael417 commented on June 19, 2024

Here it is (updated, now memkind loaded)

config.txt


coppolachan avatar coppolachan commented on June 19, 2024

This still looks like a problem in the environment:
/usr/bin/ld: cannot find -lmemkind

configure correctly recognized icpc but some libs are missing, maybe not in the libraries path.
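
A quick way to test the "not in the libraries path" guess is to search a colon-separated directory list for the library file. This is only a sketch: `find_lib` is a hypothetical helper, and which variable actually holds the linker search path on a given system (LD_LIBRARY_PATH is used here) is an assumption.

```shell
#!/bin/bash
# Sketch: look for lib<name>.so or lib<name>.a in a colon-separated directory list.
# find_lib is a hypothetical helper; the variable that holds the search path on a
# given system (LIBRARY_PATH, LD_LIBRARY_PATH, ...) is an assumption.
find_lib() {
    # $1 = library name without the "lib" prefix, $2 = colon-separated directories
    local name="$1" dir old_ifs="$IFS"
    IFS=:
    for dir in $2; do
        if [ -e "$dir/lib$name.so" ] || [ -e "$dir/lib$name.a" ]; then
            IFS="$old_ifs"
            echo "$dir"
            return 0
        fi
    done
    IFS="$old_ifs"
    return 1
}

find_lib memkind "${LD_LIBRARY_PATH:-}" || echo "libmemkind not found in LD_LIBRARY_PATH"
```

If the library turns up in a directory the linker does not search, adding a matching -L option to LDFLAGS is one possible fix.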


azrael417 avatar azrael417 commented on June 19, 2024

I did that before but now I switched to intel 2016 and it seems to work. Previously I used a 2017 beta.


azrael417 avatar azrael417 commented on June 19, 2024

OK, we can close the issue; it seems to work.

Before we do that: the bin folder only contains the benchmarks, is that right?
Additionally, shall I run some benchmarks on Cori, Peter? If yes, which ones are you most interested in?


azrael417 avatar azrael417 commented on June 19, 2024

One last thing: I ran the comms benchmark test with:

srun -n 64 -c 68 --cpu_bind=cores numactl -p 1 ./Benchmark_comms --threads 64 --mpi 2.2.4.4 --grid 128.128.128.128

and it ran the test, but it failed when trying to compute the summary:

Grid : Message        : 24906 ms : 30       4       10368000        1198.7      2397.41
Grid : Message        : 26629 ms : 30       8       20736000        1199.82     2399.64
Grid : Message        : 30097 ms : 30       16      41472000        1208.64     2417.28
Grid : Message        : 30599 ms : 32       1       3145728     710.73      1421.46
Grid : Message        : 31088 ms : 32       2       6291456     1173.55     2347.1
Grid : Message        : 32166 ms : 32       4       12582912        1159.37     2318.73
Grid : Message        : 34211 ms : 32       8       25165824        1231.09     2462.19
Grid : Message        : 38414 ms : 32       16      50331648        1210.31     2420.63
Grid : Message        : 38500 ms : ====================================================================================================
Grid : Message        : 38500 ms : = Benchmarking sequential halo exchange in 4 dimensions
Grid : Message        : 38500 ms : ====================================================================================================
Grid : Message        : 38500 ms :   L           Ls         bytes       MB/s uni        MB/s bidi
srun: error: nid12126: task 23: Floating point exception

Did I choose the wrong parameters?
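
One basic sanity check on these parameters, assuming the product of the `--mpi` entries is meant to equal the number of ranks given to `srun -n`, can be sketched as follows. `rank_count` is a hypothetical helper and not a Grid utility, and the check does not by itself explain the floating point exception.

```shell
#!/bin/bash
# Sketch: multiply the dot-separated entries of a --mpi style decomposition.
# rank_count is a hypothetical helper, not a Grid utility.
rank_count() {
    local product=1 n old_ifs="$IFS"
    IFS=.
    for n in $1; do
        product=$((product * n))
    done
    IFS="$old_ifs"
    echo "$product"
}

ranks=64   # value passed to srun -n above
if [ "$(rank_count 2.2.4.4)" -eq "$ranks" ]; then
    printf '%s\n' "--mpi 2.2.4.4 matches $ranks ranks"
else
    printf '%s\n' "--mpi 2.2.4.4 does not match $ranks ranks"
fi
```

For the command above the product is 2 x 2 x 4 x 4 = 64, which matches `-n 64`, so a rank-count mismatch is probably not the cause.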


coppolachan avatar coppolachan commented on June 19, 2024

Can I ask you a couple of things?
Since the compilation issue seems solved, can you summarize the solution in a few lines, including your environment and the configure command?
It would be a good reference for other people in the same situation.

Could you open a new thread for the last request?


paboyle avatar paboyle commented on June 19, 2024

g++ is not yet known good on AVX512 intrinsics for us. Can you try current develop with ICPC?
Peter

