gvallee / collective_profiler
License: BSD 3-Clause "New" or "Revised" License
Enable the capability to select a call and compare others to it. For instance, be able to select 1->N individual calls and highlight the calls that differ in nature. At first, do not focus on actual values, just patterns.
We now have a good set of tests and we know exactly what output is expected. We should therefore have a tool that checks that everything is correct while executing these tests with a pre-defined number of ranks.
Data size: add the total, avg, min, and max. Right now we only have counts.
Check that the number of calls in the file matches the number of calls specified in the header.
We need timing data for late arrival and time spent in alltoallv calls. When a rank is late, extract counters. Requires #1
For random ranks/calls, extract counters and later check whether the counter files provide the exact same data.
Requires #1
We need to track which rank runs where. I personally believe it would be beneficial to track PIDs to correlate processes to ranks across different communicators.
Right now we save the bins' data separately because we do not have a good way at the moment to mix bins and patterns (bins are specific to a count file, not a call; we could change that but it would take time).
We could then update the analysis of sub-communicators' results to precisely detail bins for given patterns.
The steps the user must take to process the captured data are not documented. There is a rendered graph (.png) in the doc directory showing the data sets and programs, with arrows showing which data sets are inputs and outputs of each program. However, the middle row has some files that are supposedly the output of two different programs. If that is indeed correct, the user will be puzzled how that can be and will ask which program they have to run first to ensure that the correct final output is generated.
Since we cannot assume that MPI_Finalize() is called by the application, it might be better to rely on the shared library destructor:

void __attribute__((destructor)) exit_handler(void);

void exit_handler(void)
{
    log_data(...);
}
This step (step 5) generates one graph per call that was captured, so this can be O(10,000) graphs and takes hours to generate. It should therefore be parallelised, or consideration should be given to whether all the graphs need to be precalculated; they could instead be generated on the fly when a user wishes to view one. (A detail: the generation may be in two steps; if there is a data calculation step followed by a render step, I do not know which takes the time.)
All bin files seem to contain only 0.
Create a validation test.
Fix the problems.
Write a tool that extracts, from the counter file(s), the counters for a specific rank and alltoallv call.
This will be used for validation.
Requires #70
Once the format version is added to the meta-data, we can check if the current version of the post-mortem tool can handle it.
The code that handles counts and tracks counts across many calls seems to create problems. I need a good test to track this down. I label it as a bug because it is needed to track down a bug.
$ ../tools/cmd/profile/profile -dir .
* Step 1/5: analyzing counts...
Reading count files: 2/2
Analyzing alltoallv calls: 2/2
Bin creation: 2/2
Step completed in 3.433329ms
* Step 2/5: analyzing MPI communicator data...
Step completed in 110.164µs
* Step 3/5: create maps...
Gathering map data: 2/2
Step completed in 1.439437ms
* Step 4/5: analyzing timing files...
Step completed in 308.865µs
* Step 5/5: generating plots...
Plotting data for alltoallv calls: 1/1
panic: runtime error: index out of range [0] with length 0
goroutine 1 [running]:
github.com/gvallee/alltoallv_profiling/tools/internal/pkg/plot.write(0xc00000e158, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x529167, ...)
/home/gvallee/src/alltoall_profiling/tools/internal/pkg/plot/plot.go:350 +0x11a6
github.com/gvallee/alltoallv_profiling/tools/internal/pkg/plot.generateCallPlotScript(0x7fff1b387650, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/home/gvallee/src/alltoall_profiling/tools/internal/pkg/plot/plot.go:388 +0x4aa
github.com/gvallee/alltoallv_profiling/tools/internal/pkg/plot.generateCallDataFiles(0x7fff1b387650, 0x1, 0x7fff1b387650, 0x1, 0x0, 0x0, 0xc000010c30, 0xc000010d50, 0xc000010e40, 0xc0000112c0, ...)
/home/gvallee/src/alltoall_profiling/tools/internal/pkg/plot/plot.go:311 +0x566
github.com/gvallee/alltoallv_profiling/tools/internal/pkg/plot.CallsData(0x7fff1b387650, 0x1, 0x7fff1b387650, 0x1, 0x0, 0x0, 0xc000010c30, 0xc000010d50, 0xc000010e40, 0xc0000112c0, ...)
/home/gvallee/src/alltoall_profiling/tools/internal/pkg/plot/plot.go:444 +0xb4
main.plotCallsData(0x7fff1b387650, 0x1, 0xc000046810, 0x1, 0x1, 0xc000010ae0, 0xc000010ab0, 0xc000011230, 0xc000011260, 0x0, ...)
/home/gvallee/src/alltoall_profiling/tools/cmd/profile/profile.go:37 +0x2a2
main.main()
/home/gvallee/src/alltoall_profiling/tools/cmd/profile/profile.go:134 +0x13b9
We only have a parser for send counts and we need the same type of parser for the receive counts to have a fully featured validation tool. Required by #5
Right now, we can compare whether the send/recv counts for two calls are the same and, if so, we track which calls are associated with the counts instead of duplicating them. The same should be done within a call: when two ranks have the same counts, save which ranks have that specific count instead of duplicating it.
The progress message during step 5 reads "* Step 5/5: generating plots... Plotting data for alltoallv calls: 2110/1", but it is not clear what "/1" means. (The 2110 is the number of the captured call being rendered and is updated as each one is rendered.)
I am dealing with an app that does not call MPI_Finalize(). As a result, the profiling data is never written to the profile files, which therefore end up being empty. We need a way to force the dump of all the data during a specific alltoallv call, and to document how users can check whether MPI_Finalize() is actually called and, if not, find the number of alltoallv calls and set the library to dump the data at the end of the last alltoallv call.
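A minimal sketch of the forced-dump mechanism described above, assuming the library counts its own alltoallv interceptions. The environment variable name (A2A_DUMP_AFTER) and the function name are illustrative assumptions, not the library's actual API:

```c
#include <stdlib.h>

/* Hypothetical sketch: count intercepted alltoallv calls and decide
 * when to dump. A2A_DUMP_AFTER (name is an assumption) holds the
 * 1-based index of the last call, after which the wrapper would write
 * out all profiling data instead of waiting for MPI_Finalize(). */
static int a2a_call_count = 0;

static int should_dump_now(void)
{
    const char *s = getenv("A2A_DUMP_AFTER");
    if (s == NULL)
        return 0;               /* not set: rely on MPI_Finalize as usual */
    a2a_call_count++;
    return a2a_call_count == atoi(s);
}
```

The wrapper would call this at the end of every intercepted alltoallv and trigger the dump when it returns non-zero.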
The Alltoallv test from the OSU benchmarks is a good step up from the examples, so we should also use it for validation.
This is the high-level issue for that integration; sub-issues may come later when the work starts.
We need to support the following workflow:
Calculate the scaled amount of data sent and received.
From there, figure out the scale of the bandwidth.
Update the bandwidth data with the scale.
In other terms, we need to be able to set the scale and update the data, which the scale package does not currently support.
All send and receive counts are equal to zero except for one rank, the rank itself.
I now rely on the capability to split the tracing into chunks: profile calls 0-999, then 1000-1999, and so on. So now I need a tool that merges all these traces back together. Aside from the trace size, it should be easy: use the first file (0-999) as-is and add the other files one by one. When parsing a file, extract the counters and do a string comparison. If the pattern already exists, increment the call counter and add the call ID to the list of calls associated with the pattern.
A typical issue when looking at data after the fact is knowing what format was used. Without it, it is difficult to know which version of the tool is required to do post-mortem analysis. We already have a file to track the version of the data format, but we do not use it when generating data. We need to include it in the generated files. It would be okay to just include it in the file name for now.
Switch all sprintf calls to snprintf and handle potential errors and truncated results.
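A minimal sketch of the intended snprintf pattern. The helper name and file-name format are assumptions for illustration; the key points are checking the negative return (encoding error) and a return value >= the buffer size (truncation):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a per-rank file path into buf.
 * Returns 0 on success, -1 on encoding error or truncation. */
static int safe_format_path(char *buf, size_t size, const char *dir, int rank)
{
    int rc = snprintf(buf, size, "%s/counts.rank%d.txt", dir, rank);
    if (rc < 0)
        return -1;              /* encoding error */
    if ((size_t)rc >= size)
        return -1;              /* output was truncated */
    return 0;
}
```

Unlike sprintf, snprintf never writes past `size` bytes, and its return value tells the caller how long the full output would have been, which is what makes the truncation check possible.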
Right now the validation tests check that we can generate the profiles. We need to extend it to check if we can do post-mortem analysis and that we get the expected output.
Save patterns in a separate file.
Maps are saved to files but we cannot load them back from those files, resulting in a lot of recomputation when we bring up the WebUI.
Find a way to group patterns in 3 different groups:
The logging of data is based on the assumption that rank 0 on COMM_WORLD has all the data. We cannot make that assumption, or we may miss all the calls where world rank 0 is not rank 0 of the communicator used for the alltoallv calls.
Right now, the assumption here is that we do not deal with alltoallv calls on sub-communicators; otherwise we would need to deal with jobid and rank when calling SaveBins(). For now we set them to -1. Once this is fixed, we can simplify the code of SaveBins by removing the code specific to the case where the jobid and rank are not provided.
At the moment, every time we start the server, the bins are recreated, which takes time with a large dataset. Avoid this when the resulting files already exist.
Extend the current script to be able to extract all data about specific call(s).
Right now the metadata will say something like "[0-500]", which is actually incorrect because calls are zero-indexed. It should be [0-499].
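The fix amounts to formatting an inclusive, zero-indexed range; a small sketch (the helper name is a hypothetical, not existing code):

```c
#include <stdio.h>
#include <string.h>

/* Format a zero-indexed, inclusive call range for the metadata: the
 * first `count` calls starting at `start` are calls start through
 * start + count - 1, so 500 calls from 0 yield "[0-499]". */
static void format_call_range(char *buf, size_t size, int start, int count)
{
    snprintf(buf, size, "[%d-%d]", start, start + count - 1);
}
```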
Right now, we have a Bcast to send a unique ID to the ranks to make it easier to identify files. Use SLURM_JOB_ID instead, and document that even when Slurm is not used, the user is responsible for setting it in order to identify files.
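A sketch of reading the job ID from the environment instead of broadcasting it. The function name and the -1 fallback are assumptions; the fallback makes a missing SLURM_JOB_ID easy to spot in file names:

```c
#include <stdlib.h>

/* Hypothetical sketch: derive the file-name job ID from SLURM_JOB_ID.
 * When Slurm is not used, the user must set the variable themselves;
 * we fall back to -1 so an unset ID is obvious in the output files. */
static int get_job_id(void)
{
    const char *s = getenv("SLURM_JOB_ID");
    return s ? atoi(s) : -1;
}
```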
Right now we have a static size, MAX_TRACKED_CALLS, when tracking calls in the context of counts. If we reach the max, we display a message, but I need to switch to dynamic arrays that grow automatically.
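A minimal sketch of the growable array that would replace the static MAX_TRACKED_CALLS limit; the structure and function names are illustrative assumptions, and the capacity is doubled on each growth to keep appends amortized O(1):

```c
#include <stdlib.h>

/* Sketch of a growable call list replacing the static array. */
struct call_list {
    int   *calls;     /* tracked call IDs */
    size_t count;     /* number of IDs stored */
    size_t capacity;  /* allocated slots */
};

/* Append call_id, doubling the capacity whenever the array is full.
 * Returns 0 on success, -1 on allocation failure (list left intact). */
static int add_call(struct call_list *l, int call_id)
{
    if (l->count == l->capacity) {
        size_t new_cap = l->capacity ? l->capacity * 2 : 16;
        int *p = realloc(l->calls, new_cap * sizeof(*p));
        if (p == NULL)
            return -1;          /* out of memory: caller decides what to do */
        l->calls = p;
        l->capacity = new_cap;
    }
    l->calls[l->count++] = call_id;
    return 0;
}
```

Keeping the realloc result in a temporary pointer means an allocation failure does not leak or clobber the existing array.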