darshan-hpc / darshan

Darshan I/O characterization tool

License: Other

C 49.60% Makefile 0.62% Shell 2.05% Perl 8.58% CSS 0.66% C++ 0.06% Fortran 0.20% Gnuplot 0.16% TeX 0.13% Python 16.01% Batchfile 0.03% Jupyter Notebook 18.79% M4 3.04% HTML 0.08%

darshan's Introduction

Darshan is a lightweight I/O characterization tool that transparently captures I/O access pattern information from HPC applications. Darshan can be used to tune applications for increased scientific productivity or to gain insight into trends in large-scale computing systems.

Please see the Darshan web page for more in-depth news and documentation.

The Darshan source tree is divided into two main parts:

darshan-runtime: to be installed on systems where you intend to instrument MPI applications. See darshan-runtime/doc/darshan-runtime.txt for installation instructions.
darshan-util: to be installed on systems where you intend to analyze log files produced by darshan-runtime. See darshan-util/doc/darshan-util.txt for installation instructions.

The darshan-test directory contains various test harnesses, benchmarks, patches, and unsupported utilities that are mainly of interest to Darshan developers.

darshan's Issues

poor error message if bzip support not available

In GitLab by @shanedsnyder on Sep 24, 2015, 16:25

If you compile darshan-util without bzip support and then try to run darshan-job-summary on a bzip'd log file, then this message is displayed:

Error: incompatible darshan file.
Error: expected version 2.01, but got BZh91AY&SY
Error: unable to read job information from log file.
Use of uninitialized value $starttime in localtime at /tmp/darshan-util/bin/darshan-job-summary.pl line 285.
This darshan log has no file records. No summary was produced.
Use of uninitialized value $jobid in concatenation (.) or string at /tmp/darshan-util/bin/darshan-job-summary.pl line 288.
    jobid:
Use of uninitialized value $uid in concatenation (.) or string at /tmp/darshan-util/bin/darshan-job-summary.pl line 289.
      uid:
Use of uninitialized value $starttime in concatenation (.) or string at /tmp/darshan-util/bin/darshan-job-summary.pl line 290.
starttime: Wed Dec 31 19:00:00 1969 ( )
Use of uninitialized value $runtime in concatenation (.) or string at /tmp/darshan-util/bin/darshan-job-summary.pl line 291.
  runtime: (seconds)
Use of uninitialized value $nprocs in concatenation (.) or string at /tmp/darshan-util/bin/darshan-job-summary.pl line 292.
   nprocs:
Use of uninitialized value $version in concatenation (.) or string at /tmp/darshan-util/bin/darshan-job-summary.pl line 293.
  version:

The utilities should detect this gracefully (by checking magic numbers in header, for example) and provide a more helpful error message.
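
The magic-number check suggested above could look roughly like the following. This is a minimal sketch in Python rather than the utilities' own C and Perl; the function names and message wording are hypothetical. The only facts relied on are that gzip streams begin with the bytes 0x1f 0x8b and bzip2 streams begin with "BZh", which matches the "BZh91AY&SY" string in the output above.

```python
# Leading bytes that identify common compressed formats.
GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"


def detect_compression(header: bytes) -> str:
    """Classify a log by its first few bytes (hypothetical helper)."""
    if header.startswith(GZIP_MAGIC):
        return "gzip"
    if header.startswith(BZIP2_MAGIC):
        return "bzip2"
    return "unknown"


def check_log_header(header: bytes, have_bzip2: bool) -> str:
    """Return an empty string if the log is readable, else a helpful message."""
    kind = detect_compression(header)
    if kind == "bzip2" and not have_bzip2:
        return ("Error: this log is bzip2-compressed, but this build "
                "was compiled without bzip2 support.")
    if kind == "unknown":
        return "Error: unrecognized log file format."
    return ""
```

Checking the header before parsing the job record would let the tool stop with one clear message instead of the cascade of uninitialized-value warnings.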

optimize darshan startup routines

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

Darshan scans mount points and collects information about them at startup. In the past we were forced to do this on every process in order to correctly match files to mount points at run time. This is no longer necessary, however.

We should eliminate the logic that collects device ids from each mount point. We should also modify the algorithm that collects the remaining information (mount point path and default block size) so that it is only collected at rank 0 and then broadcast to ranks 1 through N-1.

darshan-job-summary.pl fails when lots of files created

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

The darshan-job-summary.pl as installed on eureka fails to generate a pdf for log files containing lots of (93,000+) records.

you can sort of see for yourself (except the error reporting is horrible here: i'll open a new bug for that):

darshan-job-summary.pl /intrepid-fs0/logs/darshan/2010/10/1/mmin_nekcem_id314635_10-1-13849_4.darshan.gz

after 3 minutes there will be an error about unable to move summary.pdf

I had to hack up the perl script to keep the temp dir handy and run pdflatex myself. the error from pdflatex (once it is no longer diverted to the output file) is the fairly terse


Overfull \hbox (44.47777pt too wide) in paragraph at lines 95--96
 [][]
! Dimension too large.
\@currbox ->\bx@C

l.96 \end{figure*}

track per-process information in addition to per-file information

In GitLab by @shanedsnyder on Sep 24, 2015, 16:25

This ticket would be best done after modularizing the log format (see #46). In addition to storing per-file records, we could store a summary record per process (but spanning files) as well.

For example, we might want to know the total amount of time a process spent doing metadata, reads, and writes, as well as the number of bytes it read and wrote, regardless of how many files it opened or how many threads it used.

We could use this per-process data (in conjunction with a reduction step) to produce an immediate performance estimate without much post processing.

This also depends on accurate thread accounting in #81.

fprintf (and other fstream function) handling

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

Darshan doesn't detect data written to a file using fprintf. We would need to add wrappers for this and other related functions (fputs, fscanf, etc.).

address cuserid() issue on BG/Q

In GitLab by @shanedsnyder on Sep 24, 2015, 16:25

The current BG/Q environment doesn't support cuserid(), which Darshan uses to identify the userid for each job. There is a ticket open with IBM to track this issue. If IBM doesn't address it, our fallback plan on BG/Q will be to use the $USER environment variable, and possibly hash it to generate a unique numerical id for each user.
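
As a rough illustration of the fallback plan, the sketch below hashes the $USER value into a number. It is written in Python for brevity and is not Darshan code: the function names are hypothetical, and CRC-32 is only one possible choice of hash. Note that a hash gives a stable id but cannot guarantee uniqueness, since two different user names may collide.

```python
import os
import zlib


def numeric_id_for_user(username: str) -> int:
    """Map a user name to a stable 32-bit number (illustrative only)."""
    # CRC-32 is deterministic across runs and platforms, unlike Python's
    # built-in hash(), but distinct names can still collide.
    return zlib.crc32(username.encode("utf-8")) & 0xFFFFFFFF


def current_user_id(environ=os.environ) -> int:
    """Fall back to $USER; an unset variable maps to the empty name."""
    return numeric_id_for_user(environ.get("USER", ""))
```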

out-of-tree builds are not working in trunk

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

I'm hitting a few misc. errors when trying to build the current darshan trunk out of tree. Need to reproduce on one of our normal development platforms.

mpi wrappers no longer report mpich version

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

expected behavior: 'mpicc -v' should tell me something like "mpicc for MPICH2 version 1.4a1" or "mpicc for 1.1".

observed behavior: 'mpicc -v' gives me only compiler flags, mentioning nothing about the mpi implementation version.

fallout: the HDF5 configure checks do not find the string they expect and then fail to enable some significant optimizations.

documentation for new darshan util/runtime split

In GitLab by @shanedsnyder on Sep 24, 2015, 16:25

Put a readme file in the top level explaining the basics and pointing to the wiki. Update the wiki with new documentation for the upcoming version of darshan.

job header corruption in darshan-parser

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

New log files created in current trunk produce corrupt values for nprocs, version, and start and end time when processed with darshan-parser.

Add 'verbose' option to darshan-job-summary.pl

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

Add some type of option that will allow keeping the tmp directory created for all the generated files that darshan-job-summary.pl produces. This makes

have darshan-job-summary.pl output pdf using same base name as input file by default

In GitLab by @shanedsnyder on Sep 24, 2015, 16:25

Right now darshan-job-summary.pl outputs the pdf file in a file named summary.pdf in the cwd by default. It would be more helpful if it named the file based on the base name of the input log file instead.
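
One way the default name could be derived is sketched below in Python (the script itself is Perl). The helper name and the exact set of suffixes to strip are assumptions; the ticket only asks that the name follow the input log's base name.

```python
import os


def default_output_name(log_path: str) -> str:
    """Derive a PDF name from the log's base name (hypothetical helper)."""
    base = os.path.basename(log_path)
    # Strip a compression suffix first, then the .darshan suffix if present.
    for suffix in (".gz", ".bz2"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    if base.endswith(".darshan"):
        base = base[: -len(".darshan")]
    return base + ".pdf"
```

Under these assumptions, a log named example.darshan.gz would produce example.pdf in the current working directory.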

add darshan-configure utility to show link flags and other parameters

In GitLab by @shanedsnyder on Sep 24, 2015, 16:25

In particular, darshan-configure could at least show --pre-ld-flags and --post-ld-flags to indicate the manual link flags that must be added before and after existing link flags in order to manually add Darshan instrumentation.

Right now there isn't any good way to find these settings except to inspect a script produced by darshan-gen-cc.pl.

environment variable to enable timing

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

The darshan shutdown routine includes the ability to time various steps and print the results. This is normally used only by the cp-shutdown-bench program.

It would probably be a good idea to add an environment variable that can be checked at run time to enable this timing on any job run.

document steps/issues in using Darshan on Cray

In GitLab by @shanedsnyder on Sep 24, 2015, 16:25

Get Darshan running on a Cray machine and record what is necessary to make it work (both LD_PRELOAD method and preliminary static linking method).

Configure test for BG/P fails to identify need for _FILE_OFFSET_BITS.

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

The configure test on BG will result in the _LARGEFILE64_SOURCE macro being set but not the _FILE_OFFSET_BITS=64 macro. This results in off_t being defined as 32 bits instead of 64. This means the CP_MAX_BYTE_WRITTEN and CP_MAX_BYTE_READ can not be larger than 32 bits.

add configure script argument to workaround broken cuserid() function

In GitLab by @shanedsnyder on Sep 24, 2015, 16:25

We need a configure-time argument to work around cuserid() segmentation faults by replacing that logic with getenv("LOGNAME"). I don't see a way to handle this at run time.

control more parameters via environment variable

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

Suggested by Bill Barth:

Add the ability to control the logpath, jobid variable, and memory alignment parameters via environment variables at run time.
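
A minimal sketch of the override pattern follows, in Python for illustration. The variable names (EXAMPLE_LOGPATH and so on) and the default values are hypothetical placeholders, not names Darshan defines; the point is only that compile-time defaults stay in effect unless the environment supplies a value at run time.

```python
import os

# Hypothetical variable names and defaults, chosen only for illustration.
DEFAULTS = {"logpath": "/tmp", "jobid_var": "JOBID", "mem_alignment": 8}


def runtime_settings(environ=os.environ) -> dict:
    """Start from compile-time defaults and apply any run-time overrides."""
    settings = dict(DEFAULTS)
    if "EXAMPLE_LOGPATH" in environ:
        settings["logpath"] = environ["EXAMPLE_LOGPATH"]
    if "EXAMPLE_JOBID_VAR" in environ:
        settings["jobid_var"] = environ["EXAMPLE_JOBID_VAR"]
    if "EXAMPLE_MEM_ALIGNMENT" in environ:
        # Environment values are strings, so numeric settings need converting.
        settings["mem_alignment"] = int(environ["EXAMPLE_MEM_ALIGNMENT"])
    return settings
```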

evaluate how many jobs hit wall time on Intrepid

In GitLab by @shanedsnyder on Sep 24, 2015, 16:25

This is just an exercise to better understand why we lack coverage of some jobs/projects. One reason is if the job ends without calling MPI_Finalize. We should be able to detect these jobs based on job database information and cross reference against darshan coverage information.

Use of off_t in darshan wrapper

In GitLab by @shanedsnyder on Sep 24, 2015, 16:25

Since off_t can be defined as 32 or 64 bits in the calling program, some interfaces that use off_t could potentially fail. I'm not sure what happens in each scenario, but we should evaluate this.

use statfs() to detect file alignment when --enable-stat-at-open is not set

In GitLab by @shanedsnyder on Sep 24, 2015, 16:25

Right now we set alignment to -1 if we aren't able to stat the file at open time. As an alternative, we can call statfs() on all mounted file systems to get the default block size. This can be done on rank 0 at startup and broadcast to all processes.

We also need to add an exception for Lustre, because it appears to always set the block size in statfs() to 4K.

update install/upgrade scripts for alcf machines

In GitLab by @shanedsnyder on Sep 24, 2015, 16:25

The old scripts are in darshan-runtime/maint/. They are out of date in relation to how the code is organized now.

concurrent I/O from threads gets counted twice in timing

In GitLab by @shanedsnyder on Sep 24, 2015, 16:25

If two threads (in the same MPI process) access the same file concurrently, then the cumulative time counters are incremented too far.

We need to add a reference count to the run-time data structure to tell how many threads are accessing the same file at once. The time should not be incremented until the reference counter hits zero.

This does not require a log format change.
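
The reference-count idea can be sketched as follows. This is an illustrative Python model, not the Darshan runtime (which is C); the class and attribute names are hypothetical, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
import threading


class FileTimer:
    """Accumulate wall time during which at least one thread is in an I/O call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0        # threads currently inside an I/O call
        self._busy_since = 0.0  # when the count last rose from zero
        self.cumulative = 0.0   # total busy time, overlaps counted once

    def begin(self, now: float) -> None:
        with self._lock:
            if self._active == 0:
                self._busy_since = now
            self._active += 1

    def end(self, now: float) -> None:
        with self._lock:
            self._active -= 1
            # Only charge time once the last concurrent accessor finishes.
            if self._active == 0:
                self.cumulative += now - self._busy_since
```

With two overlapping calls spanning 0 to 3 and 1 to 4 seconds, summing per-call durations would report 6 seconds, while this approach reports the 4 seconds that actually elapsed.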

create demo test program of cuserid() problem on Beagle

In GitLab by @shanedsnyder on Sep 24, 2015, 16:25

The cuserid() call made by Darshan is causing a segfault on Beagle.
We need to reconfirm this using a standalone MPI program. We can also test it on the Hopper system at NERSC.

install darshan-job-summary.pl with make install target

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

The darshan-job-summary.pl is not currently installed by default. If it were, then we could place the associated perl modules etc. in a consistent location (the install prefix) and prevent problems that occur when the darshan-job-summary.pl is moved.

The script itself should be updated to print helpful messages if pdflatex is not found, gnuplot is not found, or gnuplot does not have pdf support.

Collisions on darshan generated file name

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

Ray Loy reported this with cobalt-subrun but could also happen in a normal mpirun environment.

Jobs started with cobalt-subrun all have the same jobid and user and possibly the same start/end time to the second resolution. This will cause one darshan log to be written but other mpiruns will have errors in the standard error complaining that the darshan log could not be written.

use darshan-gen-* scripts at make install time by default

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

The current Darshan releases install hand-coded mpi scripts for the IBM BG/P. We should convert to using the darshan-gen-cc.pl, darshan-gen-cxx.pl, and darshan-gen-fortran.pl scripts instead.

We should also make sure to include the XL compiler wrappers on BG/P.

detect and reduce partially shared files

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

At runtime, Darshan detects files that are shared across MPI_COMM_WORLD and reduces them to a single record. It also finds the min, max, and variance of both time and bytes moved for each process.

We do not have the same functionality on partially shared files (i.e., files in which only a subset of nodes open the same file). Detecting partially shared files and reducing them may be difficult at run time; we might want to consider building that functionality into darshan-parser.

darshan-job-summary.pl operation count bug

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

For large jobs, the darshan-job-summary.pl sometimes generates a faulty graph of operation counts in which almost all operations are zero.

support function wrapping via LD_PRELOAD

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

Add the ability to intercept both posix and mpi-io calls via LD_PRELOAD as an alternative to link-time wrappers.

test darshan instrumentation for PGI Fortran and C++ compilers on Cray

In GitLab by @shanedsnyder on Sep 24, 2015, 16:25

Things are working ok for the normal cc compiler, and similar changes should likewise work for Fortran and C++ in theory.

Add min/max fields for shared file time and bytes

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

Add counters that show the fastest and slowest ranks for a given file as well as the standard deviation.
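
To make the requested counters concrete, here is a small Python sketch of the reduction over per-rank values. The function and key names are hypothetical and do not correspond to Darshan's counter names; in the real runtime the per-rank values would be combined with an MPI reduction rather than gathered into lists.

```python
import statistics


def summarize_shared_file(times, nbytes):
    """times[i] and nbytes[i] are the I/O time and bytes moved by rank i."""
    ranks = range(len(times))
    fastest = min(ranks, key=lambda r: times[r])
    slowest = max(ranks, key=lambda r: times[r])
    return {
        "fastest_rank": fastest,
        "fastest_rank_bytes": nbytes[fastest],
        "slowest_rank": slowest,
        "slowest_rank_bytes": nbytes[slowest],
        # Population statistics: every rank is observed, not sampled.
        "time_stddev": statistics.pstdev(times),
        "bytes_stddev": statistics.pstdev(nbytes),
    }
```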

update darshan-gen-* scripts to use darshan-configure

In GitLab by @shanedsnyder on Sep 24, 2015, 16:25

I'm not certain about this ticket yet; we need to re-evaluate after implementing darshan-configure. It may make the existing compiler wrappers simpler if they do something like:

ld `darshan-configure --pre-ld-flags` NORMAL_ARGUMENTS `darshan-configure --post-ld-flags`

rather than hardcoding the full list in the generated script itself.

Darshan doesn't compile on Intrepid without --disable-ld-preload

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

Even though we don't use LD_PRELOAD on Intrepid we should at least be able to compile it. Right now we get the following error:

/bgsys/drivers/V1R4M2_200_2010-100508P/ppc/gnu-linux/lib/gcc/powerpc-bgp-linux/4.1.2/../../../../powerpc-bgp-linux/bin/ld: cannot find -lz

modularize log file format

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

In the future we need to track more high level library and data model information in Darshan. Right now, however, the log file format has to change every time we modify any counters and it gets larger with each interface we instrument.

It would be nice if the counters in darshan weren't just an array of integers, but instead had a concept of dividing the information into opaque sections for each type of instrumentation. That would allow us to experiment with new instrumentation (and parsers for that portion of the instrumentation) without breaking the entire file format. Old parsers could ignore any optional sections of the log file that it doesn't know how to parse.
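
One common way to get this property is a sequence of length-prefixed sections, sketched below in Python. The layout here (a 4-byte module id followed by a 4-byte payload length, little-endian) is purely an assumption for illustration and is not a proposed or actual Darshan log format.

```python
import struct

# Hypothetical section header: 4-byte module id, 4-byte payload length.
SECTION_HEADER = struct.Struct("<II")


def pack_section(module_id: int, payload: bytes) -> bytes:
    """Prefix an opaque payload with its module id and length."""
    return SECTION_HEADER.pack(module_id, len(payload)) + payload


def read_sections(blob: bytes, known_ids: set) -> dict:
    """Return payloads for known modules, skipping unknown sections."""
    sections = {}
    offset = 0
    while offset < len(blob):
        module_id, length = SECTION_HEADER.unpack_from(blob, offset)
        offset += SECTION_HEADER.size
        if module_id in known_ids:
            sections[module_id] = blob[offset:offset + length]
        # The length prefix lets a parser step over data it cannot decode.
        offset += length
    return sections
```

Because each section carries its own length, an older parser that does not recognize a module id can skip that section and still read everything after it.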

darshan-job-summary.pl variance table un-renderable if log file contains too many records

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

darshan-job-summary.pl should do a better job of reporting errors. this log file (/intrepid-fs0/logs/darshan/2010/10/1/mmin_nekcem_id314635_10-1-13849_4.darshan.gz) has a lot of records. the perl script generates the variance table, but that table contains 16954 lines.

the perl script runs pdflatex (twice) but does not check for errors. Ok, so you call it with -halt-on-error but if there is an error, the subsequent move of the output file fails, and fails in a very cryptic way.

suggested fix: either check the exit status of the final pdflatex or check for the existence of the summary.pdf file. in case of error, at least dump the latex.output2 file. since darshan deletes the temp dir, there's no record of what went wrong.
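
The suggested fix could take roughly the following shape. This is a Python sketch of the control flow only (the real script is Perl); the helper name and return convention are hypothetical. It checks both the exit status and the existence of the expected output, and returns the captured output on failure so it can be shown before the temp dir is deleted.

```python
import os
import subprocess


def run_and_verify(cmd, expected_output, log_path):
    """Run a command; return None on success or a diagnostic string on failure."""
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    if result.returncode == 0 and os.path.exists(expected_output):
        return None
    # Surface the captured output before any temporary directory is removed.
    with open(log_path) as log:
        captured = log.read()
    return f"{cmd[0]} failed (exit status {result.returncode}):\n{captured}"
```

A caller would invoke this for the final pdflatex run and print the returned text, if any, instead of attempting to move summary.pdf.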

Darshan support for Cray

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

Darshan can be made to work on Cray already by using the LD_PRELOAD method, but ideally we would support static linking as well via modifications to the cc and ftn scripts.

support mpi 1.x

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

Darshan does a few things that are particular to mpi 2.x. Notably the MPI_Type_get_envelope (and corresponding #defines for types that it produces) are only available in 2.x.

Note that on systems that have both 1.x and 2.x flavors of MPI installed, we will need to install multiple darshan libraries to support them.

valgrind error in log compression

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

Need to see if there is an off-by-one error in darshan or if libz is doing something unusual:

==5852== Invalid read of size 4
==5852==    at 0x404A27D: crc32 (in /lib/libz.so.1.2.3.3)
==5852==    by 0x404C85A: ??? (in /lib/libz.so.1.2.3.3)
==5852==    by 0x404E1BA: ??? (in /lib/libz.so.1.2.3.3)
==5852==    by 0x404CB16: deflate (in /lib/libz.so.1.2.3.3)
==5852==    by 0x805BF26: cp_log_compress (darshan-mpi-io.c:1687)
==5852==    by 0x804C173: darshan_shutdown (darshan-mpi-io.c:429)
==5852==    by 0x804C4A6: MPI_Finalize (darshan-mpi-io.c:504)
==5852==    by 0x804B430: main (in /home/pcarns/working/darshan-examples/mpi-io-test)
==5852==  Address 0x55db614 is 964 bytes inside a block of size 967 alloc'd
==5852==    at 0x4024F20: malloc (vg_replace_malloc.c:236)
==5852==    by 0x805C321: darshan_get_exe_and_mounts (darshan-mpi-io.c:1818)
==5852==    by 0x804BC3C: darshan_shutdown (darshan-mpi-io.c:303)
==5852==    by 0x804C4A6: MPI_Finalize (darshan-mpi-io.c:504)
==5852==    by 0x804B430: main (in /home/pcarns/working/darshan-examples/mpi-io-test)
==5852== 
==5852== Invalid read of size 1
==5852==    at 0x4026979: memcpy (mc_replace_strmem.c:497)
==5852==    by 0x404C6C9: ??? (in /lib/libz.so.1.2.3.3)
==5852==    by 0x404E1BA: ??? (in /lib/libz.so.1.2.3.3)
==5852==    by 0x404CB16: deflate (in /lib/libz.so.1.2.3.3)
==5852==    by 0x805BF26: cp_log_compress (darshan-mpi-io.c:1687)
==5852==    by 0x804C173: darshan_shutdown (darshan-mpi-io.c:429)
==5852==    by 0x804C4A6: MPI_Finalize (darshan-mpi-io.c:504)
==5852==    by 0x804B430: main (in /home/pcarns/working/darshan-examples/mpi-io-test)
==5852==  Address 0x55db617 is 0 bytes after a block of size 967 alloc'd
==5852==    at 0x4024F20: malloc (vg_replace_malloc.c:236)
==5852==    by 0x805C321: darshan_get_exe_and_mounts (darshan-mpi-io.c:1818)
==5852==    by 0x804BC3C: darshan_shutdown (darshan-mpi-io.c:303)
==5852==    by 0x804C4A6: MPI_Finalize (darshan-mpi-io.c:504)
==5852==    by 0x804B430: main (in /home/pcarns/working/darshan-examples/mpi-io-test)

expanded instrumentation for a high level library

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

Pick one of pnetcdf, hdf5, or damsel and add additional optional support in darshan. See #46.

track exact number of bytes moved via MPI

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

Right now we only track the total number of bytes read and written at the posix level. This may be different than the number of bytes read and written at the MPI level due to various MPI-IO optimizations.

For files opened via MPI, we could get more accurate performance estimates by using the bytes transferred at the MPI level rather than at the POSIX level.

retain full file paths in darshan logs

In GitLab by @shanedsnyder on Sep 24, 2015, 16:25

Right now darshan only records the last N characters (12?) of each file name. This was done mainly because we were overly conservative out of concern for memory overhead.

Modify Darshan to record complete paths, either by expanding the name field to PATH_MAX or by malloc'ing on demand.

We also need to record the CWD, so that in post processing we can make a good guess as to the full path even when the application opens relative paths.

realpath() and similar functions are not an option because they walk the path and stat each directory.
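
The post-processing guess described above can be done purely on the path text, as in this Python sketch. The function name is hypothetical. Unlike realpath(), nothing here touches the file system, so symbolic links are not resolved and a ".." that follows a symlinked directory may be collapsed incorrectly; the result is a best guess, as the ticket says.

```python
import posixpath


def guess_full_path(cwd: str, opened_path: str) -> str:
    """Combine a recorded CWD with the path passed to open(), without any stat calls."""
    # join() keeps opened_path as-is when it is already absolute;
    # normpath() collapses "." and ".." textually and never touches the file system.
    return posixpath.normpath(posixpath.join(cwd, opened_path))
```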

File ticket with CI (or NERSC) on cuserid() problem

In GitLab by @shanedsnyder on Sep 24, 2015, 16:25

Depends on #70. Assuming we can reproduce the problem, we should open a ticket and see if we can get Cray support involved.

switch from stat() to fstat() when --enable-stat-at-open is set

In GitLab by @shanedsnyder on Sep 24, 2015, 16:25

fstat() has a lower overhead on some file systems due to the fact that it does not require walking the name space.

Handle jobs that run to Wall Time limit

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

Some users choose to run jobs that will run to the wall time limit and allow the job to be killed by the scheduler. Since these jobs never call MPI_Finalize, a darshan log is never produced. It would be useful if darshan logs could be captured for these users.

document how to modify the Cray compiler scripts for Darshan support

In GitLab by @shanedsnyder on Sep 24, 2015, 16:25

This will be fairly simple after #74 is complete, but we should add notes to the darshan-runtime documentation indicating how to add Darshan instrumentation to the Cray compiler scripts for statically linked executables.

The short story is that cc invokes linux-cc which ends with a link command. The additional arguments must be added to that link command.

There is probably a similar change to make for fortran and c++ compilation.

improve performance measurement methodology

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

Right now there is a fair amount of work involved in computing the time spent performing I/O on a given process, and certain use scenarios (i.e., multithreaded concurrent I/O) can obfuscate the calculation.

We should add explicit support for measuring I/O time per process at least, and maybe also directly calculate a performance estimate at runtime.

mismatch in variance table

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

To reproduce using attached log file:

$ darshan-job-summary.pl darshan-summary-table-bug-example.darshan.gz --output darshan-summary-table-bug-example.pdf

$ darshan-parser darshan-summary-table-bug-example.darshan.gz |grep 392949181 |grep RANK_BYTES

-1	17827256362278409223	CP_FASTEST_RANK_BYTES	8388608	...392949181	/intrepid-fs0	gpfs
-1	17827256362278409223	CP_SLOWEST_RANK_BYTES	8388608	...392949181	/intrepid-fs0	gpfs
-1	17827256362278409223	CP_F_VARIANCE_RANK_BYTES	0.000000	...392949181	/intrepid-fs0	gpfs

If you compare that darshan-parser output to the 2nd entry in the variance table in the summary pdf, then you can see that the number of bytes doesn't match up.

performance test overhead across platforms

In GitLab by @shanedsnyder on Sep 24, 2015, 16:25

Run some tests to give a current snapshot of the % overhead that darshan introduces on BG/P, Cray, and Linux clusters. We have access to at least one example of all three. The test plan:

use IOR with a relatively small access size (256K)
test both shared and unique files
do 5 samples per data point (to combat noise across runs)
aim for 60 second runs
weak scaling
compare with and without darshan, both in terms of run time and ior reported performance

pwrite/pread don't record offset used

In GitLab by @shanedsnyder on Sep 24, 2015, 16:24

The pread and pwrite interfaces do not record the offset used in darshan. The question is: should pwrite/pread change the max offset values recorded?
