
mpiP's Introduction

mpiP 3.5

A light-weight MPI profiler.

Introduction

mpiP is a light-weight profiling library for MPI applications. Because it only collects statistical information about MPI functions, mpiP generates considerably less overhead and much less data than tracing tools. All the information captured by mpiP is task-local. It only uses communication during report generation, typically at the end of the experiment, to merge results from all of the tasks into one output file.

Downloading

The current version of mpiP can be accessed at https://github.com/LLNL/mpiP/releases/latest.

New Features & Bug Fixes

Version 3.5 provides several new features, including:

  • Multi-threaded support
  • Additional MPI-IO functions
  • Various updates including
    • New configuration options and tests
    • Updated test suite
    • Updated build behavior

Please see the ChangeLog for additional changes.

Configuring and Building mpiP

Dependencies

  • MPI installation
  • libunwind: for collecting stack traces
  • binutils: for address-to-source translation
  • glibc backtrace(): can also be used for stack tracing, but source line numbers may be inconsistent

Configuration

Several mpiP-specific configuration flags are available; run ./configure -h to list them. Standard configure variables, such as CC, can be used to specify MPI compiler wrapper scripts.
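
For example, a typical configuration might look like the following (the install prefix and wrapper names are illustrative and will vary by system):

$ ./configure --prefix=$HOME/mpiP CC=mpicc CXX=mpicxx F77=mpif90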

Build Make Targets

Target      Effect
[default]   Build libmpiP.so
all         Build the shared library and all tests
check       Use dejagnu to run and evaluate tests
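
For example, a typical build-and-test sequence might be the following (make install places the library under the configured prefix):

$ make          # build libmpiP.so
$ make check    # run the dejagnu test suite
$ make install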

Using mpiP

Using mpiP is very simple. Because it gathers MPI information through the MPI profiling layer, mpiP is a link-time library: you do not have to recompile your application to use it. You might, however, have to recompile with the '-g' option if you want mpiP to decode each callsite PC to a source file name and line number automatically. mpiP works without -g, but the source information in the report may be incomplete.

Instrumentation

Link the mpiP library into the executable. Its dependent libraries (for example libunwind and the binutils libraries) may need to be specified as well. If the link command includes the MPI library, order the mpiP library before the MPI library, as in -lmpiP -lmpi.
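
For example, a link line might look like the following sketch; the install path and the exact set of dependent libraries depend on how mpiP was configured:

$ mpicc -g -c app.c
$ mpicc app.o -o app -L/usr/local/mpiP/lib -lmpiP -lbfd -lunwind -lm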

Run Time Instrumentation

An uninstrumented executable can often be instrumented at run time by setting the LD_PRELOAD environment variable, as in export LD_PRELOAD=[path to mpiP]/libmpiP.so. Preloading libmpiP can interfere with the launcher, so it may need to be specified on the launch command instead, such as srun -n 2 --export=LD_PRELOAD=[path to mpiP]/libmpiP.so [executable].

mpiP Run Time Flags

The behavior of mpiP can be controlled at run time with the following flags, passed via the MPIP environment variable. Multiple flags can be delimited with spaces or commas.

Option   Description                                                                      Default
-c       Generate concise version of report, omitting callsite process-specific detail.
-d       Suppress printing of callsite detail sections.
-e       Print report data using floating-point format.
-f dir   Record output file in directory <dir>.                                           .
-g       Enable mpiP debug mode.                                                          disabled
-k n     Set callsite stack traceback depth to <n>.                                       1
-l       Use less memory to generate the report by using MPI collectives to generate
         callsite information on a callsite-by-callsite basis.
-n       Do not truncate full pathname of filename in callsites.
-o       Disable profiling at initialization. The application must enable profiling
         with MPI_Pcontrol() (see the example below).
-p       Point-to-point histogram reporting on message size and communicator used.
-r       Generate the report by aggregating data at a single task.                        default
-s n     Set hash table size to <n>.                                                      256
-t x     Set print threshold for report, where <x> is the MPI percentage of time
         for each callsite.                                                               0.0
-v       Generate both concise and verbose report output.
-x exe   Specify the full path to the executable.
-y       Collective histogram reporting on message size and communicator used.
-z       Suppress printing of the report at MPI_Finalize.

For example, to set the callsite stack walking depth to 2 and the report print threshold to 10%, define the MPIP environment variable as in any of the following examples:

$ export MPIP="-t 10.0 -k 2" (bash)

$ export MPIP=-t10.0,-k2 (bash)

$ setenv MPIP "-t 10.0 -k 2" (csh)

mpiP prints a message at initialization if it successfully finds the MPIP variable.
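
The -o flag defers data collection until the application requests it with MPI_Pcontrol(). The following is a minimal sketch of scoping measurement around a region of interest; MPI_Pcontrol() is standard MPI, and the 0/1 disable/enable values shown reflect mpiP's interpretation (see the User Guide for the full set of supported values):

#include <mpi.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);

  /* ... communication here is not profiled when mpiP runs with -o ... */

  MPI_Pcontrol(1);               /* enable mpiP data collection */
  MPI_Barrier(MPI_COMM_WORLD);   /* region of interest          */
  MPI_Pcontrol(0);               /* disable collection again    */

  MPI_Finalize();                /* report is written here unless -z is set */
  return 0;
}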

For more information on mpiP, please see the User Guide in the mpiP distribution.

License

Copyright (c) 2006, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Jeffery Vetter and Christopher Chambreau. UCRL-CODE-223450. All rights reserved.

This file is part of mpiP. For details, see http://llnl.github.io/mpiP.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the disclaimer below.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the disclaimer (as noted below) in the documentation and/or other materials provided with the distribution.

  • Neither the name of the UC/LLNL nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OF THE UNIVERSITY OF CALIFORNIA, THE U.S. DEPARTMENT OF ENERGY OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Additional BSD Notice

  1. This notice is required to be provided under our contract with the U.S. Department of Energy (DOE). This work was produced at the University of California, Lawrence Livermore National Laboratory under Contract No. W-7405-ENG-48 with the DOE.

  2. Neither the United States Government nor the University of California nor any of their employees, makes any warranty, express or implied, or assumes any liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately-owned rights.

  3. Also, reference herein to any specific commercial products, process, or services by trade name, trademark, manufacturer or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or the University of California, and shall not be used for advertising or product endorsement purposes.

mpiP's People

Contributors

artpol84, bkmgit, cchambreau, lee218llnl, milthorpe, roblatham00, rothpc

mpiP's Issues

./configure doesn't check for python

How to reproduce: run ./configure on a system where python is not in the PATH.
Then configure succeeds, but make fails.

$  make -j 8
python ./make-wrappers.py --xlate --arch=x86_64 --f77symbol symbol_ mpi.protos.txt make-wrappers.py
make: python: Command not found
Makefile:235: recipe for target 'mpiPi_def.h' failed

Aggregate Collective Time "MPI Time %" values are too small

I am trying to make sense of a set of mpiP profiles that were generated for our application, such as the one below (112 MPI ranks).

Namely, the MPI Time % column in the Aggregate Collective Time section does not seem to match the MPI% column in the Aggregate Time section. The former percentages seem too small, and their sum for a particular MPI operation does not equal the overall MPI% for that call.

Looking at the snippets from the example in the documentation (https://software.llnl.gov/mpiP/), a similar mismatch seems to be present for the Allreduce calls there.

@ Version                  : 3.5.0
@ MPIP Build date          : Jun  3 2021, 15:44:58
@ Start time               : 2021 06 11 11:17:11
@ Stop time                : 2021 06 11 11:21:34
@ Timer Used               : PMPI_Wtime
@ MPIP env var             : -c -d -p -y -k 0
@ Collector Rank           : 0
@ Collector PID            : 84251
@ Final Output Dir         : .
@ Report generation        : Single collector task

---------------------------------------------------------------------------
@--- Task Time Statistics (seconds) ---------------------------------------
---------------------------------------------------------------------------
                     AppTime           MPITime   MPI%   App Task   MPI Task
Max               262.443481         54.193995                18         18
Mean              262.300032         24.499705
Min               262.157525          9.101441               103         97
Stddev              0.143029         12.336197
Aggregate       29377.603587       2743.966912   9.34
---------------------------------------------------------------------------
@--- Aggregate Time (top twenty, descending, milliseconds) ----------------
---------------------------------------------------------------------------
Call                 Site       Time    App%    MPI%      Count
Allreduce               4   1.79e+06    6.10   65.33     134512
Iprobe                117   5.27e+05    1.79   19.20 1283478018
Test                  156   2.12e+05    0.72    7.74 1283497221
Start                 154   1.19e+05    0.40    4.33   30248784
File_open              56   6.74e+04    0.23    2.46        112
Wait                  179   1.97e+04    0.07    0.72   15138384
File_close             47   2.82e+03    0.01    0.10        112
Bcast                   8    2.4e+03    0.01    0.09        224
File_seek              62   1.01e+03    0.00    0.04        112
File_read              58       21.7    0.00    0.00          1
Recv_init             136         21    0.00    0.00      12592
Request_free          139       8.85    0.00    0.00      25184
Send_init             149       6.19    0.00    0.00      12592
---------------------------------------------------------------------------
@--- Aggregate Sent Message Size (top twenty, descending, bytes) ----------
---------------------------------------------------------------------------
Call                 Site      Count      Total       Avrg  Sent%
Allreduce               4     134512   2.96e+06         22  93.90
Bcast                   8        224   1.92e+05        859   6.10
---------------------------------------------------------------------------
@--- Aggregate Collective Time (top twenty, descending) -------------------
---------------------------------------------------------------------------
Call                 MPI Time %             Comm Size             Data Size
Allreduce                0.0785         64 -      127         32 -       63
Allreduce                0.0294         64 -      127          8 -       15
Bcast                   0.00698         64 -      127       1024 -     2047
Bcast                  3.27e-05         64 -      127          8 -       15
---------------------------------------------------------------------------
@--- Aggregate Point-To-Point Sent (top twenty, descending) ---------------
---------------------------------------------------------------------------
No point to point operations to report
---------------------------------------------------------------------------
@--- Aggregate I/O Size (top twenty, descending, bytes) -------------------
---------------------------------------------------------------------------
Call                 Site      Count      Total       Avrg   I/O%
File_read              58          1    4.1e+03    4.1e+03 100.00
---------------------------------------------------------------------------
@--- End of Report --------------------------------------------------------
---------------------------------------------------------------------------

Out-of-tree builds fail

Building mpiP outside of the source tree fails for me.

Reproducing the problem is simple:

cd mpiP-3.4.1
mkdir _build; cd _build
../configure ....
make shared
make install

I run into two issues:

  1. The header file mpiPi.h is not found during compilation.
    A quick fix is to have CFLAGS="-I.", but the build system should deal with this.
  2. Later on, I get the error:
/usr/bin/install -c doc/*txt doc/*html doc/README ...../share/doc/mpip ; \

/usr/bin/install: cannot stat ‘doc/*txt’: No such file or directory
/usr/bin/install: cannot stat ‘doc/*html’: No such file or directory
/usr/bin/install: cannot stat ‘doc/README’: No such file or directory

can not work with intelmpi

I built mpiP with Intel MPI and icc. Everything seems all right when I build libmpiP.so.

But when I run the NPB program mg.D.64 with my Intel-MPI-built mpiP, a segmentation fault happens in mpiPi_init.

Has anyone built mpiP with Intel MPI?

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
==== backtrace (tid:  30323) ====
 0 0x0000000000066aba MPI_Comm_rank()  ???:0
 1 0x000000000006bbac mpiPi_init()  /gpfs/home/cs/sunjw/addition/run/mpiP-master/mpiPi.c:93
 2 0x000000000006a302 _MPI_Init()  /gpfs/home/cs/sunjw/addition/run/mpiP-master/wrappers_special.c:45
 3 0x00000000004084e1 MAIN__()  mg.f:0
 4 0x000000000040a1e7 main()  ???:0
 5 0x0000000000022555 __libc_start_main()  ???:0
 6 0x00000000004011e9 _start()  ???:0
=================================

My intelmpi version is:

which mpirun
/opt/intel/2020/compilers_and_libraries_2020.0.166/linux/mpi/intel64/bin/mpirun
which icc
/opt/intel/2020/compilers_and_libraries_2020.0.166/linux/bin/intel64/icc

Communication matrix

Does mpiP output the communication matrix?
sending rank | receiving rank | bytes sent

Porting to OS X

I've been interested in using mpiP to try and understand some parallel bugs I've been having, and I like to use my laptop for small-scale debugging, so I started porting mpiP over to OS X.

I have a build that works without BFD or libelf/libdwarf support in spack/spack#7146. The patch therein is inelegant, and I suspect there is a simpler one that will work better, but it's a starting point. Would the devs on this project be interested in a patch if I file a PR?

I also have a branch of spack that builds mpiP on OS X with libelf/libdwarf support, and I can build examples, but libdwarf throws an error in mpip (Code 134, DW_DLE_ARANGE_OFFSET_BAD: The debug_arange entry has a DIE offset that is larger than the size of the .debug_info section). I suspect this error might be due to spack packages inconsistently using GNU libtool or Apple's BSD libtool, but was wondering if you guys might have more insight into that bug, or interest in that capability.

Possible data corruption for Fortran in the presence of MPI_IN_PLACE

A range of MPI operations allow the send buffer to be reused as the receive buffer by passing the special constant MPI_IN_PLACE as the send buffer. With Fortran applications this can lead to data corruption when run under mpiP.

The corruption can be demonstrated with this simple code:

PROGRAM sample_allreduce
  USE mpi
  IMPLICIT NONE

  INTEGER :: ierr
  INTEGER :: rank, rank_in_place
  INTEGER :: rank_sum

  CALL MPI_Init(ierr)
  CALL MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  rank_in_place = rank

  PRINT *, 'Rank: ', rank
  CALL MPI_Allreduce(rank, rank_sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD, ierr)                           
  CALL MPI_Allreduce(MPI_IN_PLACE, rank_in_place, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD, ierr)              
  PRINT *, 'Sum: ', rank_sum, ' - ', rank_in_place

  CALL MPI_Finalize(ierr)
END PROGRAM sample_allreduce

Executing without mpiP instrumentation leads to the expected output

$ mpirun -np 3 ./a.out                                                           
 Rank:            0
 Sum:            3  -            3
 Rank:            1
 Sum:            3  -            3
 Rank:            2
 Sum:            3  -            3

while executing with mpiP corrupts the data as

$ mpirun -np 3 env LD_PRELOAD=$HLRS_MPIP_ROOT/lib/libmpiP.so ./a.out
mpiP: 
mpiP: mpiP V3.5.0 (Build Mar 16 2023/14:16:24)
mpiP: 
 Rank:            0
 Sum:            3  -            0
 Rank:            1
 Sum:            3  -            0
 Rank:            2
 Sum:            3  -            0
mpiP: 
mpiP: Storing mpiP output in [./a.out.3.1905013.1.mpiP].
mpiP: 

Note that the second column (which used MPI_IN_PLACE) is "0" while it should be "3".

I guess that the underlying problem is missing or incorrect treatment of constants such as MPI_IN_PLACE in the transition from Fortran to C PMPI interfaces. A similar problem has been observed in other projects / tools using PMPI such as here. In fact, the code above is taken from that issue.

I have observed this behavior for mpiP v3.4.1 and v3.5 using GCC v10.2 with either OpenMPI v4.1.4 or HPE's MPI implementation MPT 2.26.

Also note that the code runs correctly when replacing use mpi with use mpi_f08, at least for OpenMPI (but not for MPT).

MPI 4.0 Support

Adding support for some of the functionality in the MPI 4.0 Standard, including:

  • Persistent Collectives
  • Partitioned Communications
  • Isendrecv and Isendrecv_replace

Compilation issue

Hi,

I am using topic/mt branch for profiling communication through threads in MVAPICH2.

Two issues.

  1. I cannot check out the topic/mt branch unless I delete the mpiPi.h file.
  2. The compilation fails with the following:
    checking whether MPI_Init is declared... no
    configure: error: "Failed to find declaration for MPI_Init!"

Is the topic/mt branch still under testing, or which branch should I use for thread profiling?

Thanks
Amit

Incorrect timing values with Python Code

While using mpiP with a Python-based application that uses Horovod + NCCL, I am seeing wrong timings reported by mpiP. A sample report along with system details is attached for reference. According to the Linux time command (time mpiexec ....), the application finishes in 6950 seconds, whereas the mpiP reports show an apptime of 13900 seconds for each rank. Does mpiP work correctly with Python + NCCL code, or am I missing something?

mpiP.zip

Thanks,
Amit

No mpiP report generated

Hi, I am using mpiP 3.5.0 to do profiling for WRFV-3.9.1.1. I built mpiP with Open MPI 4.0.5

./configure --prefix=$MPIP_ROOT --disable-libunwind CC=/path/to/mpicc CXX=/path/to/mpicxx F77=/path/to/mpif77

and pre-load it in my mpirun as

mpirun -x "LD_PRELOAD=/path/to/libmpiP.so" ....(other mpi arguments) wrf.exe

I can see the mpiP banner printed to stdout:

mpiP:
mpiP: mpiP V3.5.0 (Build Dec 18 2020/00:20:30)
mpiP:

But no final report is generated and no error message is printed. Could you provide any suggestions for debugging this issue? The application is written in Fortran 90, but when I used mpiP to profile the simple mpif90 program ring_usempi, I did not have this issue.

Getting callstack information

I'd like to know which functions in the code are making the MPI calls shown in the profile. I compiled the code with -g, but the function and source file names are still not shown. What additional steps do I need to take to get that information?
Thanks.

mpiP profiling data problem

Hi,

The mpiP profiling data has a callsite property that I do not understand.

I want to collect MPI trace data like this:
MPI_RANK {MPI function list}, e.g. process 0: {MPI_Send, MPI_Recv, ...}
How can I find the source and destination of each MPI function call, along with the message size, start time, and end time?

Cannot get through the Fortran part

Hi,

I am trying to build mpiP on the Argonne Polaris system, but running ./configure raises an error.

Do you have any idea to solve it?

Thanks
-Zhen

Below are the logs:
./configure;
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking target system type... x86_64-pc-linux-gnu

Note: mpiP sets cross_compiling to yes to keep configure from failing in
the case where test executables would need to be run as a parallel job.

checking for a BSD-compatible install... /usr/bin/install -c
checking for ranlib... ranlib
checking for python... python
checking for mpixlc... no
checking for mpiicc... no
checking for cmpicc... no
checking for mpcc... no
checking for mpicc... mpicc
checking for mpixlC... no
checking for mpiicpc... no
checking for mpiCC... no
checking for mpicxx... mpicxx
checking for mpixlf... no
checking for mpiifort... no
checking for mpifort... mpifort
checking for poe... no
checking for jsrun... no
checking for srun... no
checking for prun... no
checking for mpirun... no
checking for mpiexec... mpiexec
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... yes
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether mpicc accepts -g... yes
checking for mpicc option to accept ISO C89... none needed
checking for sqrt in -lm... yes
checking how to run the C preprocessor... mpicc -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking size of void*... 8
checking PIC flags... -fpic -DPIC
checking whether MPI_Init is declared... yes
checking for library containing dlopen... -ldl
checking for inflate in -lz... yes
checking for dcgettext__ in -lintl... no
checking for objalloc_create in -liberty... yes
checking bfd.h usability... yes
checking bfd.h presence... yes
checking for bfd.h... yes
checking for bfd_openr in -lbfd... yes
checking whether bfd_get_section_size is declared... no
checking whether bfd_get_section_vma is declared... no
checking for bfd_boolean... yes
checking demangle.h usability... yes
checking demangle.h presence... yes
checking for demangle.h... yes
checking for MPIR_ToPointer in -lmpi... no
checking for MPIR_ToPointer in -lmpich... no
checking for rts_get_timebase... no
checking whether _rtc is declared... no
checking for mread_real_time... no
checking for read_real_time... no
checking for MPI_Wtime... yes
checking libunwind.h usability... yes
checking libunwind.h presence... yes
checking for libunwind.h... yes
checking for MPI_File_open... no
checking for MPI_File_open in -lmpio... no
MPI I/O symbols not found. MPI I/O reporting deactivated.
checking for MPI_Win_allocate... no
MPI RMA symbols not found. MPI RMA reporting deactivated.
checking for MPI_Ibarrier... no
MPI NONBLOCKINGCOLLECTIVES symbols not found. MPI NONBLOCKINGCOLLECTIVES reporting deactivated.
checking /proc/self/maps... yes
checking for getline... no
getline not found so SO lookup is disabled.
checking fortran to C conversion... functions not found translation deactivated
checking ARM LSE... no
checking fortran symbols... could not determine F77 symbol names. Example error follows:
configure: flink.c:1:1: error: type specifier missing, defaults to 'int'; ISO C99 and later do not support implicit int [-Wimplicit-int]
main(){ FF(); return 0; }
^
int
flink.c:1:9: error: call to undeclared function 'f_fun'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
main(){ FF(); return 0; }
^
:1:12: note: expanded from here
#define FF f_fun
^
2 errors generated.
configure: error: giving up

Python not found during configure

Our system does not provide the command python, only python2 and python3. I believe this is becoming increasingly widespread.

The configure script accepts only python, and I do not see a way to change this via options or environment variables.

I suggest replacing

AC_CHECK_PROG(HAVE_PYTHON,[python],[python],[no])

with

AC_CHECK_PROGS(HAVE_PYTHON,[python python3 python2],[no])

(AC_CHECK_PROGS expects a whitespace-separated list of program names.)

Consider dropping support for python2.

deal with deleted MPI-1 functions

From https://sourceforge.net/p/mpip/mailman/message/36511483/:

I am excited to see mpiP and I hope it will help me debug some MPI performance issues in my application. I am running into issues trying to use it though. I suspect I am doing something basic wrong. Can you please help me?

If it matters, I have OpenMPI 4 on Ubuntu 18.04, gcc 7.3.
My OpenMPI is installed system-wide in /usr/local/openmpi. I installed mpiP like this:
./configure --prefix=/usr/local/mpiP/ --without-f77 && make && sudo make install

I have a test file with an MPI_Reduce call. I'm building it as follows:
mpicc -g reduce.c -o reduce -L/usr/local/mpiP/lib -lmpiP -lm

I am unable to successfully build the executable.
/usr/local/mpiP/lib/libmpiP.a(wrappers.o): In function `mpiPif_MPI_Attr_delete':
/home/ubuntu/mpiP-3.4.1/wrappers.c:489: undefined reference to `PMPI_Attr_delete'
/home/ubuntu/mpiP-3.4.1/wrappers.c:489: undefined reference to `PMPI_Attr_delete'
/usr/local/mpiP/lib/libmpiP.a(wrappers.o): In function `mpiPif_MPI_Attr_get':
/home/ubuntu/mpiP-3.4.1/wrappers.c:555: undefined reference to `PMPI_Attr_get'
/home/ubuntu/mpiP-3.4.1/wrappers.c:555: undefined reference to `PMPI_Attr_get'
/usr/local/mpiP/lib/libmpiP.a(wrappers.o): In function `mpiPif_MPI_Attr_put':
/home/ubuntu/mpiP-3.4.1/wrappers.c:621: undefined reference to `PMPI_Attr_put'
/home/ubuntu/mpiP-3.4.1/wrappers.c:621: undefined reference to `PMPI_Attr_put'
/usr/local/mpiP/lib/libmpiP.a(wrappers.o): In function `mpiPif_MPI_Keyval_create':
/home/ubuntu/mpiP-3.4.1/wrappers.c:4941: undefined reference to `PMPI_Keyval_create'
/home/ubuntu/mpiP-3.4.1/wrappers.c:4941: undefined reference to `PMPI_Keyval_create'
/usr/local/mpiP/lib/libmpiP.a(wrappers.o): In function `mpiPif_MPI_Keyval_free':
/home/ubuntu/mpiP-3.4.1/wrappers.c:5005: undefined reference to `PMPI_Keyval_free'
/home/ubuntu/mpiP-3.4.1/wrappers.c:5005: undefined reference to `PMPI_Keyval_free'
/usr/local/mpiP/lib/libmpiP.a(record_stack.o): In function `mpiPi_RecordTraceBack':
/home/ubuntu/mpiP-3.4.1/record_stack.c:51: undefined reference to `_Ux86_64_getcontext'
/home/ubuntu/mpiP-3.4.1/record_stack.c:57: undefined reference to `_ULx86_64_init_local'
/home/ubuntu/mpiP-3.4.1/record_stack.c:66: undefined reference to `_ULx86_64_step'
/home/ubuntu/mpiP-3.4.1/record_stack.c:73: undefined reference to `_ULx86_64_step'
/home/ubuntu/mpiP-3.4.1/record_stack.c:76: undefined reference to `_ULx86_64_get_reg'
/home/ubuntu/mpiP-3.4.1/record_stack.c:66: undefined reference to `_ULx86_64_step'
collect2: error: ld returned 1 exit status

This is needed because Open MPI has removed these deleted MPI-1 symbols.

I am working on a patch now.

VPATH builds fail with v3.5

Building mpiP in-tree works fine. But out-of-tree / VPATH builds fail.

To reproduce:

$> # extract tarball and cd into mpiP directory
$> mkdir mybuild
$> cd mybuild
$> ../configure 
$> make
../arch/arch.h:17:10: fatal error: arch/arch_x86_64.h: No such file or directory

It looks like the include path does not contain $(srcdir).

In fact, calling make as

$> make CPATH=$CPATH:../

serves as a workaround.

Issue while building : KeyError: 'LOGNAME'

Hello,

I am trying to build mpiP and see below error:

==> 'make'
python /home/guest/workarena/softwares/sources/spack/var/spack/stage/mpip-3.4.1-x7l5jk256ayuuirddcxdpbpytlnis3hq/mpiP-3.4.1/make-wrappers.py --xlate --arch=x86_64 --f77symbol symbol_ mpi.protos.txt make-wrappers.py
MPI Wrapper Generator ($Revision: 498 $)
-----*----- Parsing input file
-----*----- Parsing completed:  162  functions found.
-----*----- Beginning parameter optimization
-----*----- Generating structure files
Traceback (most recent call last):
  File "/home/guest/workarena/softwares/sources/spack/var/spack/stage/mpip-3.4.1-x7l5jk256ayuuirddcxdpbpytlnis3hq/mpiP-3.4.1/make-wrappers.py", line 1425, in <module>
    main()
  File "/home/guest/workarena/softwares/sources/spack/var/spack/stage/mpip-3.4.1-x7l5jk256ayuuirddcxdpbpytlnis3hq/mpiP-3.4.1/make-wrappers.py", line 1414, in main
    GenerateStructureFile()
  File "/home/guest/workarena/softwares/sources/spack/var/spack/stage/mpip-3.4.1-x7l5jk256ayuuirddcxdpbpytlnis3hq/mpiP-3.4.1/make-wrappers.py", line 745, in GenerateStructureFile
    olist = StandardFileHeader(sname)
  File "/home/guest/workarena/softwares/sources/spack/var/spack/stage/mpip-3.4.1-x7l5jk256ayuuirddcxdpbpytlnis3hq/mpiP-3.4.1/make-wrappers.py", line 712, in StandardFileHeader
    olist.append("/* Creator: " + os.environ["LOGNAME"] + "  */\n")
  File "/usr/lib/python2.7/UserDict.py", line 40, in __getitem__
    raise KeyError(key)
KeyError: 'LOGNAME'
Makefile:235: recipe for target 'mpiPi_def.h' failed
make: *** [mpiPi_def.h] Error 1
==> Error: ProcessError: Command exited with status 2:
    'make'

The LOGNAME environment variable is not set on this system (actually a container). It would be nice to handle this case.

Callsite timing only shows one rank

I launch the application with 8 ranks, but the callsite time statistics and callsite message sent statistics show one site with one rank. Why not one site with eight ranks?

Callsite representation (avoid pointing to wrappers.c)

For the following application (ring_c.txt), I have to set the stack depth:

export MPIP="-k 2"

to see the actual callsites and get the following output:

---------------------------------------------------------------------------
@--- Callsites: 6 ---------------------------------------------------------
---------------------------------------------------------------------------
 ID Lev File/Address        Line Parent_Funct             MPI_Call
  1   0 wrappers.c           515 MPI_Barrier              Barrier
  1   1 ring_c.c              81 main
  2   0 wrappers.c           515 MPI_Barrier              Barrier
  2   1 ring_c.c              78 main
  3   0 wrappers.c          6649 MPI_Recv                 Recv
  3   1 ring_c.c              52 main
  4   0 wrappers.c          7608 MPI_Send                 Send
  4   1 ring_c.c              60 main
  5   0 wrappers.c          6649 MPI_Recv                 Recv
  5   1 ring_c.c              71 main
  6   0 wrappers.c          7608 MPI_Send                 Send
  6   1 ring_c.c              39 main

The level-0 entries point into wrappers.c, i.e. mpiP (profiler tool) internals that are irrelevant to the application being analyzed.
Does it make sense to always shift one stack frame up to skip wrappers.c?

unable to find source line info for address

Sorry for leaving this message here, but I do not know how to solve this problem.

I have tried using both bfd and dwarf/elf to translate binary addresses to source line numbers, but both failed.

mpiP Configuration Summary

C compiler : mpicc
C++ compiler : mpicxx
Fortran compiler : mpif90

Timer : MPI_Wtime
Stack Unwinding : libunwind
Address to Source Lookup : bfd

MPI-I/O support : yes
MPI-RMA support : yes
MPI-NBC support : yes


mpiP Configuration Summary

C compiler : mpicc
C++ compiler : mpicxx
Fortran compiler : mpif90

Timer : MPI_Wtime
Stack Unwinding : libunwind
Address to Source Lookup : libelf/libdwarf

MPI-I/O support : yes
MPI-RMA support : yes
MPI-NBC support : yes


The source line information in the output always looks like this:

@--- Callsites: 4 ---------------------------------------------------------
---------------------------------------------------------------------------
 ID Lev File/Address        Line Parent_Funct             MPI_Call
  1   0 0x7fe7d3ae7b2a           [unknown]                Recv
  2   0 0x7f260692259a           [unknown]                Send
  3   0 0x7f1637a7859a           [unknown]                Send
  4   0 0x7f044e9dc59a           [unknown]                Send

Can someone point me to the right direction? Much appreciated!

ARM support

Is mpiP expected to work on ARM?

I am using mpiP with MPICH on an ARM machine, and it looks like mpiP starts up but then almost immediately dies with signal 11. I have no problem on x86. The application I am profiling is miniAMR (a DOE proxy app).

Don't think mpiP is detecting my installation of libunwind.

After cloning the repository, I am running configure as

./configure CPPFLAGS='-I /home/ckalahi/software/modules/linux-rocky8-x86_64/include/' LDFLAGS='-L /home/ckalahi/software/modules/linux-rocky8-x86_64/lib/' --with-libunwind='/home/ckalahi/software/modules/linux-rocky8-x86_64/lib/libunwind.so'
make
make install

When trying to run a program using mpiP, I am seeing

error while loading shared libraries: libunwind.so.8: cannot open shared object file: No such file or directory

Upon running

ldd libmpiP.so

I noticed the line

libunwind.so.8 => not found

I assumed that the --with-libunwind flag would let mpiP know where libunwind.so is installed, but that doesn't seem to be the case.

It's probably also worth noting that I am working on my school's cluster and don't have the ability to install libunwind to the usual directory. As such, I am trying to point mpiP to the location that I am able to install libunwind to.

MPI profiling for hanging code

mpiP generates the profile only if a program ends completely. Is it possible to generate the profile after some amount of time, since many codes either never complete (they generate intermediate files) or have programming issues? This would be helpful for debugging slow or hung applications, to identify which communication operations were used and in what message range. A simple environment variable might be implemented to trigger writing the mpiP report at a specified time.

Limit 20 : Aggregate Collective Time

I would like to get information on all collective operations rather than only the top 20 shown in the "Aggregate Collective Time" table. Is it possible to define an environment variable at run time, or change the code, to enable this?

eg.

---------------------------------------------------------------------------
@--- Aggregate Collective Time (top twenty, descending) -------------------
---------------------------------------------------------------------------
Call                 MPI Time %             Comm Size             Data Size
Allreduce                  4.08        256 -      511          8 -       15
Allreduce                  2.31        256 -      511          0 -        7
Barrier                  0.0468        256 -      511          0 -        7
Bcast                    0.0343        256 -      511        512 -     1023

Thread-multiple support

Hello,
We would like to see if there are any plans or interest in supporting multi-threaded mode in mpiP.
I'm interested in using it and could possibly help with the development.
At first sight it seems like a relatively simple change if TLS is used to isolate measurements in different threads.
At report time an additional step to bring everything together will be needed, but that is fine.
I may be missing some details, for example whether libbfd/libdwarf are thread-safe and/or lightweight in that mode. Please correct me if that could be an issue.
