sstsimulator / sst-sqe Goto Github PK

View Code? Open in Web Editor NEW

4.0 16.0 13.0 18.66 MB

SST Software Quality, Testing and Engineering Resources

License: Other

C 5.01% Shell 78.36% Perl 1.03% CSS 1.30% HTML 7.88% Roff 5.89% Cuda 0.50% BitBake 0.03%

simulation testing sst discrete-event snl-build-tools

sst-sqe's People

Contributors

Stargazers

Watchers

Forkers

grantmackey-wdc grantmackey hughes-c shayan-taheri e10harvey abhi5658054 fryeguy52 gvoskuilen jpvandy allevin researcherben jwilso berquist

sst-sqe's Issues

openmpi is leaving around empty directories on /tmp

On sst=test, Gwen has less than 5000, jpvandy has more than 15,000 and jwilson has over 30,000.

(they are in subdirectories by user, so the number shown by "ls /tmp" is not large.)

The needed clean-up may well be other than in SQE

Merlin test Suite produces output on error that is abusively excessive.

On error, the Merlin test Suite produces tens of thousand of lines which simple hide the problem.

Ember Sweep time limit not handled correctly by SQE code.

Ubuntu 14.04 fails configure since the Jenkins and VM upgrades

I will attempt to attach the log file from the Ubuntu mainline run #296
Ubuntu.on.Jenkins.txt

DNS for sst-simulator.org needs to be redirected to sstsimulator.github.io

sstsimulator.github.io is the new website for sst simulator project. Currently the url sst-simulator.org is pointing to the old trac based wiki. The DNS for the url must be changed to point to the new website.

An error in the memHSieve test on MacOS sometimes takes out Jenkins Agent

The memHSieve test uses Ariel. On MacOS, currently there are problems with the deployment of Ariel. It appears that if the child that Ariel creates does not successfully start, Ariel clean-up via its emergency shutdown code issuing a kill(0, ...) that kills parents and children up (down) including the Jenkins agent that is the parent of the bash script. See the Ariel Issue.

memHSieve test fails with no useful output with mpirun

test_memHSieve
~/jenkins/workspace/SST__zzQandD_sst-test_use_mpirun/5_2016-03-15_02-02-00/devel/trunk/sst/elements/memHierarchy/Sieve/tests ~/jenkins/workspace/SST__zzQandD_sst-test_use_mpirun/5_2016-03-15_02-02-00/devel/trunk
sed: can't read /home/jwilso/jenkins/workspace/SST__zzQandD_sst-test_use_mpirun/5_2016-03-15_02-02-00/devel/trunk/sst/elements/memHierarchy/Sieve/tests/StatisticOutput.csv: No such file or directory
3 69 637 /home/jwilso/jenkins/workspace/SST__zzQandD_sst-test_use_mpirun/5_2016-03-15_02-02-00/devel/trunk/sst/elements/memHierarchy/Sieve/tests/StatisticOutput.csv.gold
wc: /home/jwilso/jenkins/workspace/SST__zzQandD_sst-test_use_mpirun/5_2016-03-15_02-02-00/devel/trunk/sst/elements/memHierarchy/Sieve/tests/StatisticOutput.csv: No such file or directory
3 69 637 total
1,3d0
< ComponentName, StatisticName, StatisticSubId, StatisticType, SimTime, Rank, BinsMinValue.u64, BinsMaxValue.u64, BinWidth.u32, TotalNumBins.u32, Sum.u64, SumSQ.u64, NumActiveBins.u32, NumItemsCollected.u64, NumItemsBinned.u64, NumOutOfBounds-MinValue.u64, NumOutOfBounds-MaxValue.u64, Bin0:0-4095.u64, Bin1:4096-8191.u64, Bin2:8192-12287.u64, Bin3:12288-16383.u64, Bin4:16384-20479.u64, Bin5:20480-24575.u64
< sieve, histogram_reads, , Histogram, 0, 0, 24575, 4096, 6, 16395, 268468325, 2, 2, 2, 0, 0, 1, 0, 0, 0, 1, 0
< sieve, histogram_writes, , Histogram, 0, 0, 24575, 4096, 6, 12267, 83534309, 1, 2, 2, 0, 0, 0, 2, 0, 0, 0, 0
shunit.ASSERT: Reference does not Match Output

Simulation is complete, simulated time: 5.5 ns

~/jenkins/workspace/SST__zzQandD_sst-test_use_mpirun/5_2016-03-15_02-02-00/devel/trunk
TESTSUITE testSuite_memHSieve.sh: Total Suite Wall Clock Time 1 seconds

(This was from SST__zzQandD_sst-test #5, run from branch, useMpirRun)

SQE testing time-limit processor only looks for a single SST invocation.

The DetailedNetwork test is an example of a test that does multiple sst invocations and can get it in a tight loop.

All Nighty tests failing from time out on git clone of Elements.

This showed up Thursday the 4th or Friday the 5th.

This has been addressed by changing the timeout from 90 seconds to 250 seconds.

The next level of relief will be to have the Jenkins runs do "shallow" clones, which will cut the time from two plus minutes to on the order of 17 seconds.

@nmhamster

The Ariel test Suite fails except on sst-test, because of faulty testing input.

The Sandy-bridge and Ivy-bridge tests were created using an existing static binary test object. That binary was created in 2014 (or before) on sst-test. It was built using pthreads download from m5threads. The binary object is still processed by Ariel seemingly happily on sst-test. The Ariel processing of it fails on Ubuntu 14.04. (The four other in the Suite, which are believed to pass, use a locally built stream object.)
@gvoskuilen: @nmhamster

Multi-file CramSim test data for 11 file download can be deleted when 7.0 is released.

CramSim does eleven checkouts, one per test. On December 14th, the Auto tester failed on the eighth (and all following) downloads. The mainline Nightly had also had a CramSim checkout failure. It failed on the first and then all it tried.
We need to get rid of the 11 file checkout. They are only about 6 kilobytes each before decompression (zip). Should tar them up and do a single checkout. This is an SQE only task—plus of course a repository change.

(I also want to add a stderr to stdout redirect in the CramSim Suite.)

Valgrind MemHSieve test hung

the test was manually killed after 12 hours. The test normally takes approximately 135 seconds.

The test, memHSieve is known to get occasional time limits. This case is an unacceptable failure of the time limit enforcer.
Here's the log file:
Capture-time-limit-May8.txt

Significant Slow down on El Capitan Xcode-8 from rogue tasks

Many "stream" tasks were being left behind following Ariel test time outs. It was assumed that these running tasks were causing the slow down. Restarting the VM cleared the problem, but it returned. The Ariel tests continued to time out, even if the stream task were immediately removed. It was then discovered that there had been two sstsim.x tasks running since November 21st. Removing those tasks allowed Ariel tests to run without time limits.

The issue remains. Where did the rouge tasks come from? What events cause them to be left behind? Is the problem related to Xcode-8, or is it just the location where the problem was encounter?

SimpleComponent test Failed on El Capitan with number of Ranks = 2

El Capitan nranks=2 SimpleComponent test

Failure that I have never noticed before on this test on any platform. Appears that the output from the two threads got intermixed.

There are a five test Suites that have code to manage this.

In this case, line-count/word-count/byte-count are identical. Two lines have shuffled output.

Ref file, last two lines (100,101);
Component Finished.
Simulation is complete, simulated time: 25 us
Output file of failing test – line 85:
Simulation is complete, Component Finished.
And line 101:
simulated time: 25 us

Auto tester does NOT fully detect Multi Thread errors.

Auto tester needs to add a (or some) multi-thread test(s) to it's list.

Currently the nightly runs are detecting one test failure on all multi-thread projects.

There are also two test failures on Multi Rank Projects, but this is not a concern because multi-rank is not included in the 6.0 release.

The trace-back code after a time limit is not support on some builders

See Elements issue #655.
The trace-back script explicitly requires Python-2.7, which is not the version included in the release on COE-RHEL-6.
@bjmoor
@jwilso
The trace-back invocation is still not handled correctly on the timeLimitEnforcer on El Capitan. (And that's my problem to deal with.)

Project names need to be changed to make them all consistant

The Jenkins Project names need to be consistent.
Want to delete "COE" from names.
Names like MultiThread... need to include OS, mpi and boost version.

The 2x2 test run on Ubuntu 16.04 is a mess of time-limits

Needs to be characterized and understood.

Need better detection of test Suites failures.

This is about a failure in the test Suite, not a failure in an individual test. The output for Jenkins is not generated until an individual test actually runs.

If a test Suite fails in an unanticipated manner, Jenkins likely doesn't detect it and so it's not flagged in the Summary records. It simply vanishes from the list of executed tests.

(When testing our test process on Out-of-Source build and denying write permission to the source, at least, 5 test Suites failed and silently vanished from the list.)

The deadlock determing code, from Branden is not supported on El Capitan and most VMs

On El Capitan, it is not being called with the pid of SST

On Linux, Red Hat 6, etal, The VM does not have Python 2.7, which is required by the script.

Trac: #275 Upon successful nightly build & test, publish make dist tarball and svn rev num

This ticket collects two related enhancements. Developers are often unsure when to do an svn update in their sandbox. They don't really know if the head of the trunk produced a good build or not. We need to publish in some very convenient place the svn revision number that was used for the last successfull build and test of the trunk during the nightly process.

In a similar vein, some non-developer users may need a version of the source code from the trunk in order to obtain a new feature or bug fix. They too are at risk if they simply check out the trunk head. Instead, provide a tarball of the last successful build and test of the trunk during the nightly process.

Looping can result fromTest Suite using nested python files, without return code processing

SST AutoTester Needs to Work on SST/Macro repo for Merging from PR to Devel

Want to have SST autotester work on the SST/Macro repo for merging PR into devel branch.

Email list servers called in website (sstsimulator.github.io) support page not accessible

On the sstsimulator.github.io website, the support page http://sstsimulator.github.io/SSTPages/SSTMainSupport/ calls out a number of email lists that the user can subscribe to. However the links are not accessible.

Here is one of the example links
https://sst-simulator.org/mailman/listinfo/sst-users

This is most likely a fallout of the direct DNS forward of sst-simulator.org to sstsimulator.github.io. If the DNS is changed from a direct forward to a CNAME record change, this problem most likely will be corrected.

Testing harness inserts duplicate thread count parameters in tests.

This appear to be totally benign, but should not be the way it's done. Is potentially confusing and a tiny change in process could alter the fact the the thread count used is the last onethat the harness inserts.

The process is not clean and continuing growth of options and needs increase the probability of future failure including that is not immediately recognized as wrong.

Time Limit processer on El Capitan failed to terminate memHSieve test.

Time limit processing is different between Linux and El Capitan.

Time Limit processor left job running (until manually killed) with a memHSieve test on 6.1 release branch on El Capitan.

TL Enforcer: TIME LIMIT test_memHSieve
TL Enforcer: test has exceed alloted time of 50 seconds.
Create Time Limit Flag file, /tmp/TimeFlag_40708_41063

I am 41063, I was called from 40708, my parent PID is 41016
UID PID PPID C STIME TTY TIME CMD
502 40708 51283 0 2:40AM ?? 0:00.01 /bin/bash ../sqe/buildsys/bamboo.sh sstmainline_config_macosx_no_gem5 openmpi-1.8 boost-1.56 clang-700.1.76
502 41016 40708 0 2:40AM ?? 0:00.02 /bin/bash ../sqe/buildsys/bamboo.sh sstmainline_config_macosx_no_gem5 openmpi-1.8 boost-1.56 clang-700.1.76

502 41072 41068 0 2:40AM ?? 0:00.03 /usr/local/module-pkgs/pin/pin-2.14-71313-clang.5.1-mac/intel64/bin/pinbin -p32 /usr/local/module-pkgs/pin/pin-2.14-71313-clang.5.1-mac/ia32/bin/pinbin -sigchld_handler 1 -follow_execv -t /Users/sstbuild/jenkins/workspace/SST__6.1.0_beta_OutOfSource_MacOSX_10.11_OMPI_1.8_Boost_1.56_clang_700.1.76_mainline/20_20/local/sst-elements/libexec/fesimple.dylib -p /sst_shmem_41068-0-16807 -v 1 -t 0 -i 1000000000 -c 16 -s 1 -m 1 -k 1 -d 0 -- ./ompsievetest
502 41187 41072 0 2:41AM ?? 0:11.06 ./ompsievetest
502 41301 41259 0 2:41AM ?? 0:00.03 /usr/local/module-pkgs/pin/pin-2.14-71313-clang.5.1-mac/intel64/bin/pinbin -p32 /usr/local/module-pkgs/pin/pin-2.14-71313-clang.5.1-mac/ia32/bin/pinbin -sigchld_handler 1 -follow_execv -t /Users/sstbuild/jenkins/workspace/SST__Nightly_OutOfSource_MacOSX_10.11_OMPI_1.8_Boost_1.56_clang_700.1.76_mainline/221_221/local/sst-elements/libexec/fesimple.dylib -p /sst_shmem_41259-0-16807 -v 1 -t 0 -i 1000000000 -c 16 -s 1 -m 1 -k 1 -d 0 -- ./ompsievetest
502 41304 41301 0 2:41AM ?? 0:11.15 ./ompsievetest
502 41413 41063 0 2:41AM ?? 0:00.00 grep ompsievetest
this might better go in the Suite
Identified routine: gettimeofday, replacing with Ariel equivalent...
Replacement complete.
Identified routine: malloc/_malloc, replacing with Ariel equivalent...
Identified routine: free/_free, replacing with Ariel equivalent...
Identified routine: malloc/_malloc, replacing with Ariel equivalent...
Identified routine: free/_free, replacing with Ariel equivalent...
4 86 1127 omps_list
OMP_PID = 41187
41304
/Users/sstbuild/jenkins/workspace/SST__6.1.0_beta_OutOfSource_MacOSX_10.11_OMPI_1.8_Boost_1.56_clang_700.1.76_mainline/20_20/devel/trunk/test/utilities/timeLimitEnforcer.sh: line 55: [: 41187: binary operator expected
Sat Jan 14 02:41:49 MST 2017

I am 41063, I was called from 41016, my parent PID is 41016
No corresponding child named "sst"

Look for a child named "python"
Identified routine: malloc/_malloc, replacing with Ariel equivalent...
Identified routine: free/_free, replacing with Ariel equivalent...
No corresponding child named "python"

Look for a child named "valgrind"
No corresponding child named "valgrind"

Sibling is 41067
Kill pid is 41068
502 41068 41067 0 2:40AM ?? 0:46.49 /Users/sstbuild/jenkins/workspace/SST__6.1.0_beta_OutOfSource_MacOSX_10.11_OMPI_1.8_Boost_1.56_clang_700.1.76_mainline/20_20/local/sst-core/bin/sst -n 1 /Users/sstbuild/jenkins/workspace/SST__6.1.0_beta_OutOfSource_MacOSX_10.11_OMPI_1.8_Boost_1.56_clang_700.1.76_mainline/20_20/devel/trunk/sst-elements/src/sst/elements/memHierarchy/Sieve/tests/sieve-test.py
502 41068 41067 0 2:40AM ?? 0:46.49 /Users/sstbuild/jenkins/workspace/SST__6.1.0_beta_OutOfSource_MacOSX_10.11_OMPI_1.8_Boost_1.56_clang_700.1.76_mainline/20_20/local/sst-core/bin/sst -n 1 /Users/sstbuild/jenkins/workspace/SST__6.1.0_beta_OutOfSource_MacOSX_10.11_OMPI_1.8_Boost_1.56_clang_700.1.76_mainline/20_20/devel/trunk/sst-elements/src/sst/elements/memHierarchy/Sieve/tests/sieve-test.py
It's still there! (41068)
Try a "kill -9"
502 41068 41067 0 2:40AM ?? 0:00.00 (sstsim.x)
Build was aborted
Aborted by Vandyke, John P

Need an automated Valgrind check against as many different sst tests as reasonable.

Si has previously requested automated Valgrind testing.
The number of "one-time-only" failures in the Nightly testing being seen means that ways to pursue validation is need as a way to attack tests failures that a simple rerun does not reproduce.

timeLimitEnforcer has an infinite loop edge case.

Move SQE reference files to Elements

Currently most reference files for Elements are located in the sst-sqe repo. This introduces a synchronization issue for the Auto-Tester as necessary changes to an element in repo sst-elements that changes its output will fail tests until the reference file in sst-sqe is changed.

Suggest that we move the reference files under the test directory for each specific element. Then (usually) only changes to the sst-element repo will be required.

Not testing with --disable-mpi

None of our test builds use '--disable-mpi'. That build mechanism should probably be tested.

Nightly build on Ubuntu 16.04 fails due to module initialization issue.

Latest Ubuntu update seem to have altered something with non-login module initialization.

The Jenkins runs are no longer loading modules. Build failure come from needing Boost-1.56

Red Hat builder is getting access errors on /dev/hfi1_0

Many test failures on the COE_RHEL builder

sst-coerhel7.sandia.gov.107064hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out.

The above line was appearing in the log of the failed tests. The time out was not the failure that the test Suite was aborting. The line was an extra line in the output file, which caused the Suite to abort the test because out did not match the gold file

@jwilso

The memHSieve testSuite test for Ariel is not valid

The file, libariel.so, can be built and exist without actually working with PIN.

"memHSieve requires Ariel", but existence of libariel.so is not the appropriate test.

Consequently, there are currently MacOS and Ubuntu failures of the memHS test.

As a stopgap measure, the memHS test Suite is going to be modified to only run on sst-test

Ember Sweep time limit hang

Time limit on Ember Sweep test 80 on El Capitan, number of ranks=2, run #445.
This resulted in a hang, which I believe is a SQE problem, not an Ember problem.
"""
test_EmberSweep_080_a23d85edc49a8319faea14f6104ed70b nr=2
80 run, 0 have failed
torus --shape=2 PingPong iterations=1 messageSize=0
~/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testSuites/emberSweep_folder ~/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk
Simulation is complete, simulated time: 4.12 us

Test Passed
80: Wall Clock Time 1 sec. torus --shape=2 PingPong iterations=1 messageSize=0

~/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk

TL Enforcer: TIME LIMIT test_EmberSweep_080_a23d85edc49a8319faea14f6104ed70b
TL Enforcer: test has exceed alloted time of 900 seconds.
Create Time Limit Flag file, /tmp/TimeFlag_10676_28931

I am 28931, I was called from 10676, my parent PID is 10709
UID PID PPID C STIME TTY TIME CMD
502 10676 73706 0 4:29AM ?? 0:00.02 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
502 10709 10676 0 4:29AM ?? 0:01.05 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testSuites/testSuite_EmberSweep.sh

502 54817 28931 0 4:54AM ?? 0:00.00 grep ompsievetest
this might better go in the Suite
0 0 0 /tmp/28931_omps_list
OMP_PID =
Thu May 18 04:54:31 MDT 2017

###############################################################
JOHNS sanity check
502 54769 53884 0 4:54AM ?? 0:00.04 mpirun -np 2 -map-by numa:pe=2 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=no /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
502 54770 54769 0 4:54AM ?? 0:00.56 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=no /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
502 54771 54769 0 4:54AM ?? 0:00.55 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=no /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
----------- first
502 54770 54769 0 4:54AM ?? 0:00.59 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=no /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
----------- all
502 54770 54769 0 4:54AM ?? 0:00.63 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=no /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
502 54771 54769 0 4:54AM ?? 0:00.62 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=no /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
54770
the pid of an sst is 54770
the pid of the mpirun is 54769
Check for Dead Lock
/Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testOutputs/test_ember_params.out:[0:0] SST Core: Signal handers will be registed for USR1, USR2, INT and TERM...
/Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testOutputs/test_ember_params.out:[0:0] SST Core: Signal handler registration is completed
/Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testOutputs/test_ember_params.out:[1:0] SST Core: Signal handers will be registed for USR1, USR2, INT and TERM...
/Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testOutputs/test_ember_params.out:[1:0] SST Core: Signal handler registration is completed
###############################################################
I am 28931, I was called from 10709, my parent PID is 10709
No corresponding child named "sst"

Look for a child named "python"
No corresponding child named "python"

Look for a child named "valgrind"
No corresponding child named "valgrind"

Sibling is 28944
I am 28931, I was called from 28944, my parent PID is 10709
No corresponding child named "sst"

Look for a child named "python"
No corresponding child named "python"

Look for a child named "valgrind"
No corresponding child named "valgrind"

Kill pid is

ps: option requires an argument -- p
usage: ps [-AaCcEefhjlMmrSTvwXx] [-O fmt | -o fmt] [-G gid[,gid...]]
[-g grp[,grp...]] [-u [uid,uid...]]
[-p pid[,pid...]] [-t tty[,tty...]] [-U user[,user...]]
ps [-L]
Invoke the traceback routine
$SST_ROOT/test/utilities/stackback.py

Must specify PIDs of SST, or specify --all to find SST PIDs
Use --help for more information

Thu May 18 04:54:41 MDT 2017
Return to timeLimitEnforcer

Can not find process to terminate, I am 28931, my parent was 10709
UID PID PPID C STIME TTY 502 10676 73706 0 4:29AM ?? 502 10709 10676 0 4:29AM ?? 502 12818 20000 0 4:10AM ?? 502 12824 12818 0 4:10AM ?? 502 14313 1 0 4:31AM ?? 502 19951 1 0 Tue10AM ?? 502 20000 1 0 Tue10AM ?? 502 28931 10709 0 4:39AM ?? 502 28944 10709 0 4:39AM ?? 502 28945 28944 0 4:39AM ?? 502 28946 28944 0 4:39AM ?? 0 28947 28945 0 4:39AM ?? 502 29431 73574 0 4:39AM ?? 502 29458 29431 0 4:39AM ?? 502 53498 1 0 2:04AM ?? 502 53499 1 0 2:04AM ?? 502 53500 1 0 2:04AM ?? 502 53501 1 0 2:04AM ?? 502 53504 1 0 2:04AM ?? 502 53820 12824 0 4:54AM ?? 502 53884 53820 0 4:54AM ?? 502 54896 29458 0 4:54AM ?? 502 54899 29458 0 4:54AM ?? 502 54900 54896 0 4:54AM ?? 502 55035 53884 0 4:54AM ?? 502 55037 55035 0 4:54AM ?? 502 55040 53884 0 4:54AM ?? 502 55041 55040 0 4:54AM ?? 502 55042 55040 0 4:54AM ?? 0 55126 28931 0 4:54AM ?? 502 60675 1 0 Tue11AM ?? 502 62995 1 0 Tue11AM ?? 502 72773 1 0 Tue11AM ?? 502 73568 20000 0 4:05AM ?? 502 73574 73568 0 4:05AM ?? 502 73700 20000 0 4:05AM ?? 502 73706 73700 0 4:05AM ?? 502 98283 1 0 4:24AM ?? 502 98648 1 0 4:24AM ?? pid =
TIME CMD
0:00.02 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
0:01.05 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
0:00.01 /bin/sh -xe /var/folders/k7/n40w4cyj4_l78z_fw6677ccm0000gp/T/hudson2016493627556830810.sh
0:00.31 /bin/bash ../sqe/buildsys/bamboo.sh sstmainline_config_macosx_no_gem5 openmpi-1.8 boost-1.56 clang-700.1.76
0:01.14 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.shared
0:03.43 /usr/sbin/cfprefsd agent
4:49.19 /Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/bin/java -jar /Users/sstbuild/jenkins/slave.jar -jnlpUrl https://jenkins-srn.sandia.gov:8443/computer/SST%20Mac%20OSX%2010.11%20Xcode%207%20(2)/slave-agent.jnlp -secret 17e052036837b2f06631093b7f59eddc3a336352a67333231e5e10768feceaa2
0:00.05 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
0:00.00 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
0:00.00 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
0:00.00 awk { if ( $3 == ENVIRON["__timerChild"] ) print $2 }
0:00.00 ps -ef -u sstbuild
0:00.03 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_MultiThread_MacOS-n2/594_594/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
0:01.49 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_MultiThread_MacOS-n2/594_594/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
0:47.20 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.shared
0:45.98 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.shared
0:41.20 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.shared
0:46.84 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.shared
0:40.47 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.shared
0:00.02 /bin/bash ../sqe/buildsys/bamboo.sh sstmainline_config_macosx_no_gem5 openmpi-1.8 boost-1.56 clang-700.1.76
0:00.07 /bin/bash ../sqe/buildsys/bamboo.sh sstmainline_config_macosx_no_gem5 openmpi-1.8 boost-1.56 clang-700.1.76
0:00.01 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_MultiThread_MacOS-n2/594_594/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
0:11.17 sst -n 2 --model-options=--topo=fattree --shape=9,9:9,9:18 --cmdLine="Init" --cmdLine="AllPingPong iterations=10 messageSize=10000 " --cmdLine="Fini" emberLoad.py
0:00.00 sleep 900
0:00.01 /bin/bash ../sqe/buildsys/bamboo.sh sstmainline_config_macosx_no_gem5 openmpi-1.8 boost-1.56 clang-700.1.76
0:00.00 sleep 200
0:00.05 mpirun -np 2 -map-by numa:pe=2 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=yes /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
0:01.96 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=yes /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
0:01.96 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=yes /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
0:00.00 ps -f -U sstbuild
0:00.13 /usr/sbin/distnoted agent
0:00.22 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdflagwriter
0:00.23 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker-sizing -c MDSSizingWorker -m com.apple.mdworker.sizing
0:00.01 /bin/sh -xe /var/folders/k7/n40w4cyj4_l78z_fw6677ccm0000gp/T/hudson6215251261761081075.sh
0:00.16 /bin/bash ../sqe/buildsys/bamboo.sh sstmainline_config_macosx_no_gem5 openmpi-1.8 boost-1.56 clang-700.1.76
0:00.01 /bin/sh -xe /var/folders/k7/n40w4cyj4_l78z_fw6677ccm0000gp/T/hudson6861595538799566695.sh
0:00.22 /bin/bash ../sqe/buildsys/bamboo.sh sstmainline_config_macosx_no_gem5 openmpi-1.8 boost-1.56 clang-700.1.76
0:01.11 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.shared
0:02.03 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.shared

            EXIT without killing my parents

Build was aborted
Aborted by Vandyke, John P
"""

Four test Suites missing from weekly Valgrind test

While examining a Messier test issue, it was noticed the Messier was not in the list for the weekly Valgrind test. More careful examination revels that Samba, memHierarchy-A and simpleCarWash Suites were also missing.

The developer resolved the Messier issue, from the Valgrind output.

memHA and simple Car Wash appear to be clean with respect to Valgrind.

There is a recurrent testing problem on MacOS with left behind excuting stream binaries

These executing tasks can soak up cpu cycles such that tests take a factor of ten or more times as much to complete. This is primarily observed on El Capitan, but that is our principal MacOS test environment. This task are created by (or for) Ariel, but test Suites other the "Ariel" Suite use Ariel. (Cannot reproduce the problem on Linux.)

Testing needs generic solution to case with a file per rank.

So far for tests that fail, because one file is expected and Multi-Rank spreads the output over a file per rank, individual Suites have had specific solutions inserted. Until memHSieve, these were all on stdout.

A subroutine, or Macro needs to be created to handle these cases, instead of spreading unique point solutions over the testSuites.

Linux utilitiy, patch, is not available on COE RHEL 7 VM

Building of qsim external is failing on command not found

Auto tester does not detect errors unique to out-of-source build

Testing this may be more expensive, than it merits. This would likely require an addition test Project because the variable is the actual build.

(A change that only broke out-of-source build did get auto-merged this month.)

The trace back script, stackback.py,T fails on COE_RHEL-6 for lack of Python-2.7

May 2nd During EmberSweep #58:
'''
test_EmberSweep_058_238ff71fbe090af2a8d8365cdfe9015f nt=1
58 run, 0 have failed
torus --shape=8x8x8 PingPong iterations=10 messageSize=10000
~/jenkins/workspace/SST__Nightly_COE_RHEL_6_OMPI-1.8_Boost-1.56_mainline_gcc-4.8.2/526_526/devel/trunk/test/testSuites/emberSweep_folder ~/jenkins/workspace/SST__Nightly_COE_RHEL_6_OMPI-1.8_Boost-1.56_mainline_gcc-4.8.2/526_526/devel/trunk

TL Enforcer: TIME LIMIT test_EmberSweep_058_238ff71fbe090af2a8d8365cdfe9015f
TL Enforcer: test has exceed alloted time of 900 seconds.
Create Time Limit Flag file, /tmp/TimeFlag_7637_10133

I am 10133, I was called from 7637, my parent PID is 7670
UID PID PPID C STIME TTY TIME CMD
sstbuild 7637 98256 0 05:07 ? 00:00:00 /bin/bash /home/sstbuild/jenkins/workspace/SST__Nightly_COE_RHEL_6_OMPI-1.8_Boost-1.56_mainline_gcc-4.8.2/526_526/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
sstbuild 7670 7637 0 05:07 ? 00:00:00 /bin/bash /home/sstbuild/jenkins/workspace/SST__Nightly_COE_RHEL_6_OMPI-1.8_Boost-1.56_mainline_gcc-4.8.2/526_526/devel/trunk/test/testSuites/testSuite_EmberSweep.sh

sstbuild 10255 10133 0 05:41 ? 00:00:00 grep ompsievetest
this might better go in the Suite
0 0 0 /tmp/10133_omps_list
OMP_PID =
Tue May 2 05:41:52 MDT 2017

Kill pid is 10136

      Invoke the traceback routine

$SST_ROOT/test/utilities/stackback.py 10136

/usr/bin/env: python2.7: No such file or directory

Tue May 2 05:41:52 MDT 2017
Return to timeLimitEnforcer

sstbuild 10136 7670 0 05:26 ? 00:00:01 sst -n 1 --model-options=--topo=torus --shape=8x8x8 --cmdLine="Init" --cmdLine="PingPong iterations=10 messageSize=10000 " --cmdLine="Fini" emberLoad.py
sstbuild 10136 7670 0 05:26 ? 00:00:01 sst -n 1 --model-options=--topo=torus --shape=8x8x8 --cmdLine="Init" --cmdLine="PingPong iterations=10 messageSize=10000 " --cmdLine="Fini" emberLoad.py
It's still there! (10136)
Try a "kill -9"
/home/sstbuild/jenkins/workspace/SST__Nightly_COE_RHEL_6_OMPI-1.8_Boost-1.56_mainline_gcc-4.8.2/526_526/devel/trunk/test/testTempFiles/bashIN: line 343: 10136 Killed sst -n 1 --model-options="--topo=torus --shape=8x8x8 --cmdLine="Init" --cmdLine="PingPong iterations=10 messageSize=10000 " --cmdLine="Fini"" emberLoad.py > tmp_file
Time Limit detected at 900 seconds
shunit.ASSERT: Time Limit detected at 900 seconds
~/jenkins/workspace/SST__Nightly_COE_RHEL_6_OMPI-1.8_Boost-1.56_mainline_gcc-4.8.2/526_526/devel/trunk
-------
'''

running "mpirun -np 2", two miranda tests time limit

Have not yet tried a major increase in time limit to see if they eventually finish.

miranda_singlestream went time limit at 200 seconds. (without multithread - 6 sec/ n=2 - 43 sec)

miranda_randomgen went time limit at nearly an hour, 3500 seconds. (w/o mt 130 sec/ n=2 - 495)

Static Build have not been updated since the core/element split

Is static build supposed to be resurrected or deleted as a possibility?

New failure mode for scheduler detailed Network test on Multi Rank

The Scheduler Detailed Network test was failing consistently on Multi-Rank tests, the same way as the test fails on COE-RHEL-7 and Ubuntu-16.04. See Issue 108. On June 9, the failure mode changed on Multi-Rank. This should be examined after Issue 108 in Ember is fixed and when Multi-Rank is fully supported.

SST_BEGIN_NEW_SUITE testSuite_scheduler_DetailedNetwork.sh
/home/jwilso/jenkins/workspace/SST__6.0.0_Pre_Multi_Rank_sst-test_full_n-2/5_2016-07-14_05-03-41/devel/trunk/sst-elements/src

Started at Thu Jul 14 05:27:53 MDT 2016
Be Patient. This test runs over 4 minutes on sst-test
IGNORE the word "FATAL" in output messages
-------
test_scheduler_DetailedNetwork
Setting Multi rank count to 2
init_cmd = "sst ./%s" %(options.schedPythonFile)
execcommand = "mpirun -np 2 sst "
execcommand = "mpirun -np 2 sst --stop-at " + StopAtTime
FATAL: Failed to start job #1 at guaranteed time
Time: 255740 Guarantee: 110000
46 259 1704 /home/jwilso/jenkins/workspace/SST__6.0.0_Pre_Multi_Rank_sst-test_full_n-2/5_2016-07-14_05-03-41/devel/trunk/test/testReferenceFiles/test_scheduler_DetailedNetwork.out
21 83 525 /home/jwilso/jenkins/workspace/SST__6.0.0_Pre_Multi_Rank_sst-test_full_n-2/5_2016-07-14_05-03-41/devel/trunk/test/testOutputs/test_DetailedNetwork.out
67 342 2229 total
Test 6 FAILED

Daily Report Summary has an Name Error on Mac Builds

I am seeing:

ERROR : SST__OutOfSource_MacOSX_10.10.5_OMPI_1.8_Boost_1.56_clang_700.1.81_mainline

This seems to be a name mis-match in configuration for email.

Detailed Network upgrade has been forwarded. Needs to be included in testing.

This probably has a "chicken and egg" issue between Elements and SQE

Nightly Status Email Bug with Failed Builds

In the email from 25th Feb 2016, the Ubuntu build failed but the marking is TESTFAIL when it should be FAILED.

TESTFAIL : SST__COE_RHEL_7_OMPI-1.10_Boost-1.56_mainline

PASSED : SST__memH_using_Ariel

TESTFAIL : SST__Ubuntu_16.04B_OMPI_1.6.5_Boost_1.56_mainline

TESTFAIL : SST__x-Multithread-MacOS-n2

TESTFAIL : SST__x-Multithread-Ubuntu-n2

But further below:

FAILED : Job SST__Ubuntu_16.04B_OMPI_1.6.5_Boost_1.56_mainline: Build # 36
Job - Is Running: False
Job - Is Good: False
Job - Status: FAILURE
Job - Duration: 0:28:05.475000

in Bamboo.sh, failure to find the command, module, does not return error flag.

An error message is written to stderr, but script execution continues. ModuleEx detects when "module" does not find the desired specific module, but if the command module is not found or execute correctly no error code is returned.

Sample log output:
OpenMPI (openmpi-1.8) selected
/home/jwilso/jenkins/workspace/memH_without_openMP/7_7/devel/trunk/test/utilities/moduleex.sh: line 26: module: command not found
/home/jwilso/jenkins/workspace/memH_without_openMP/7_7/devel/trunk/test/utilities/moduleex.sh: line 26: module: command not found
bamboo.sh: Boost 1.56 selected

Trac: #202 Provide nightly tarballs

We should provide, on our download site, a nightly tarball of the current state of SSTs trunk repository - the result of running make dist. #152 could then download and use that tarball as the basis for running tests.

Ubuntu 16.04 not connected to Jenkins server

Started by user Vandyke, John P
[EnvInject] - Loading node environment variables.
Building remotely on SST Ubuntu 16.04 (ubuntu16.04 sstresource) in workspace /home/sstbuild/jenkins/workspace/SST__Ubuntu_16.04_OMPI_1.10_Boost_1.56_mainline
FATAL: java.io.EOFException
hudson.remoting.RequestAbortedException: java.io.EOFException
at hudson.remoting.Request.abort(Request.java:303)
at hudson.remoting.Channel.terminate(Channel.java:847)
at org.jenkinsci.remoting.nio.NioChannelHub$3.run(NioChannelHub.java:614)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
at ......remote call to SST Ubuntu 16.04(Native Method)
at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1416)
at hudson.remoting.Request.call(Request.java:172)
at hudson.remoting.Channel.call(Channel.java:780)
at hudson.FilePath.act(FilePath.java:979)
at hudson.FilePath.act(FilePath.java:968)
at org.jenkinsci.plugins.gitclient.Git.getClient(Git.java:131)
at hudson.plugins.git.GitSCM.createClient(GitSCM.java:741)
at hudson.plugins.git.GitSCM.createClient(GitSCM.java:733)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1080)
at hudson.scm.SCM.checkout(SCM.java:485)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1269)
at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:607)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
at hudson.model.Run.execute(Run.java:1738)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:98)
at hudson.model.Executor.run(Executor.java:410)
Caused by: java.io.EOFException
at org.jenkinsci.remoting.nio.NioChannelHub$3.run(NioChannelHub.java:614)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
ERROR: Step ‘Publish JUnit test result report’ failed: no workspace for SST__Ubuntu_16.04_OMPI_1.10_Boost_1.56_mainline #96
ERROR: Step ‘E-mail Notification’ failed: no workspace for SST__Ubuntu_16.04_OMPI_1.10_Boost_1.56_mainline #96
Started calculate disk usage of build
Finished Calculation of disk usage of build in 0 seconds
ERROR: Step ‘Delete workspace when build is done’ failed: no workspace for SST__Ubuntu_16.04_OMPI_1.10_Boost_1.56_mainline #96
Finished: FAILURE

Test Suites should detect and report data down-load failures.

Presently data down load failures in a test Suite results in test failure, but doesn't immediately report the cause. Examples: January 12, on Ubuntu-14.04 both Sirius Zodiac (run #31) and multiple CramSim tests. CramSim data download has an additional problem described in SQE issue #489

sstsimulator / sst-sqe Goto Github PK

sst-sqe's People

Contributors

Stargazers

Watchers

Forkers

sst-sqe's Issues

Recommend Projects

Recommend Topics

Recommend Org