SST Software Quality, Testing and Engineering Resources
License: Other
On sst-test, Gwen has fewer than 5,000 files in /tmp, jpvandy has more than 15,000, and jwilson has over 30,000.
(They are in subdirectories by user, so the number shown by "ls /tmp" is not large.)
The needed clean-up may well lie outside of SQE.
On error, the Merlin test Suite produces tens of thousands of lines, which simply hide the problem.
I will attempt to attach the log file from the Ubuntu mainline run #296
Ubuntu.on.Jenkins.txt
sstsimulator.github.io is the new website for the SST simulator project. Currently the URL sst-simulator.org points to the old Trac-based wiki. The DNS for the URL must be changed to point to the new website.
The memHSieve test uses Ariel. On MacOS, there are currently problems with the deployment of Ariel. It appears that if the child that Ariel creates does not start successfully, Ariel cleans up via its emergency shutdown code by issuing a kill(0, ...), which kills the whole process group, parents and children alike, including the Jenkins agent that is the parent of the bash script. See the Ariel Issue.
test_memHSieve
~/jenkins/workspace/SST__zzQandD_sst-test_use_mpirun/5_2016-03-15_02-02-00/devel/trunk/sst/elements/memHierarchy/Sieve/tests ~/jenkins/workspace/SST__zzQandD_sst-test_use_mpirun/5_2016-03-15_02-02-00/devel/trunk
sed: can't read /home/jwilso/jenkins/workspace/SST__zzQandD_sst-test_use_mpirun/5_2016-03-15_02-02-00/devel/trunk/sst/elements/memHierarchy/Sieve/tests/StatisticOutput.csv: No such file or directory
3 69 637 /home/jwilso/jenkins/workspace/SST__zzQandD_sst-test_use_mpirun/5_2016-03-15_02-02-00/devel/trunk/sst/elements/memHierarchy/Sieve/tests/StatisticOutput.csv.gold
wc: /home/jwilso/jenkins/workspace/SST__zzQandD_sst-test_use_mpirun/5_2016-03-15_02-02-00/devel/trunk/sst/elements/memHierarchy/Sieve/tests/StatisticOutput.csv: No such file or directory
3 69 637 total
1,3d0
< ComponentName, StatisticName, StatisticSubId, StatisticType, SimTime, Rank, BinsMinValue.u64, BinsMaxValue.u64, BinWidth.u32, TotalNumBins.u32, Sum.u64, SumSQ.u64, NumActiveBins.u32, NumItemsCollected.u64, NumItemsBinned.u64, NumOutOfBounds-MinValue.u64, NumOutOfBounds-MaxValue.u64, Bin0:0-4095.u64, Bin1:4096-8191.u64, Bin2:8192-12287.u64, Bin3:12288-16383.u64, Bin4:16384-20479.u64, Bin5:20480-24575.u64
< sieve, histogram_reads, , Histogram, 0, 0, 24575, 4096, 6, 16395, 268468325, 2, 2, 2, 0, 0, 1, 0, 0, 0, 1, 0
< sieve, histogram_writes, , Histogram, 0, 0, 24575, 4096, 6, 12267, 83534309, 1, 2, 2, 0, 0, 0, 2, 0, 0, 0, 0
shunit.ASSERT: Reference does not Match Output
Simulation is complete, simulated time: 5.5 ns
~/jenkins/workspace/SST__zzQandD_sst-test_use_mpirun/5_2016-03-15_02-02-00/devel/trunk
TESTSUITE testSuite_memHSieve.sh: Total Suite Wall Clock Time 1 seconds
(This was from SST__zzQandD_sst-test #5, run from branch, useMpirRun)
The DetailedNetwork test is an example of a test that does multiple sst invocations and can get into a tight loop.
This showed up Thursday the 4th or Friday the 5th.
This has been addressed by changing the timeout from 90 seconds to 250 seconds.
The next level of relief will be to have the Jenkins runs do "shallow" clones, which will cut the clone time from two-plus minutes to on the order of 17 seconds.
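The shallow-clone idea can be sketched as follows. The repository here is a throwaway local one (the real Jenkins checkout URL is not shown); the point is that `--depth 1` fetches only the tip commit, which is where the time savings come from:

```shell
# Build a small repo with two commits, then shallow-clone it.
tmp=$(mktemp -d)
git init -q "$tmp/src"
git -C "$tmp/src" -c user.email=ci@example -c user.name=ci commit -q --allow-empty -m "r1"
git -C "$tmp/src" -c user.email=ci@example -c user.name=ci commit -q --allow-empty -m "r2"

# --depth 1 asks the transport for only the most recent commit.
# (file:// is used because a plain local path would bypass the transport.)
git clone -q --depth 1 "file://$tmp/src" "$tmp/shallow"
depth=$(git -C "$tmp/shallow" rev-list --count HEAD)
echo "commits fetched: $depth"
```

Only one commit of history is transferred, regardless of how deep the source repository is.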
The Sandy Bridge and Ivy Bridge tests were created using an existing static binary test object. That binary was created in 2014 (or before) on sst-test. It was built using a pthreads download from m5threads. The binary object is still processed by Ariel, seemingly happily, on sst-test. Ariel's processing of it fails on Ubuntu 14.04. (The four other tests in the Suite, which are believed to pass, use a locally built stream object.)
@gvoskuilen: @nmhamster
CramSim does eleven checkouts, one per test. On December 14th, the Auto tester failed on the eighth (and all following) downloads. The mainline Nightly also had a CramSim checkout failure; it failed on the first download and then on all it tried.
We need to get rid of the eleven-file checkout. The files are only about 6 kilobytes each before decompression (zip). We should tar them up and do a single checkout. This is an SQE-only task, plus of course a repository change.
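The bundling step is mechanical; a sketch with placeholder file names (the real CramSim test zips live in the repository) shows the shape of the change:

```shell
# Stand-ins for the eleven small per-test zips.
tmp=$(mktemp -d); cd "$tmp"
for i in $(seq 1 11); do echo "trace $i" > "cramsim_test_$i.zip"; done

# Bundle them once; the Suite would then do a single download of this tarball.
tar czf cramsim_tests.tar.gz cramsim_test_*.zip
rm cramsim_test_*.zip

# On the test machine: one extract recovers all eleven files.
tar xzf cramsim_tests.tar.gz
count=$(ls cramsim_test_*.zip | wc -l)
echo "files restored: $count"
```

One checkout replaces eleven, so a single transient download failure no longer fails eight tests in a row.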
(I also want to add a stderr-to-stdout redirect in the CramSim Suite.)
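The redirect itself is one idiom, shown here in miniature; the order matters, since `2>&1` must come after the file redirection for stderr to follow stdout into the same log:

```shell
# A command that writes to both streams, captured into one log file.
log=$(mktemp)
{ echo "normal output"; echo "error output" >&2; } > "$log" 2>&1
captured=$(wc -l < "$log")
echo "lines captured: $captured"
```

Written the other way around (`2>&1 > "$log"`), the stderr line would still escape to the terminal.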
The test was manually killed after 12 hours; it normally takes approximately 135 seconds.
The memHSieve test is known to hit occasional time limits. This case is an unacceptable failure of the time-limit enforcer.
Here's the log file:
Capture-time-limit-May8.txt
Many "stream" tasks were being left behind following Ariel test timeouts. It was assumed that these running tasks were causing the slowdown. Restarting the VM cleared the problem, but it returned. The Ariel tests continued to time out even when the stream tasks were immediately removed. It was then discovered that two sstsim.x tasks had been running since November 21st. Removing those tasks allowed Ariel tests to run without hitting time limits.
The issue remains. Where did the rogue tasks come from? What events cause them to be left behind? Is the problem related to Xcode 8, or is that just the location where the problem was encountered?
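Until the root cause is found, the discovery could be automated rather than accidental. A hedged sketch (the 86400-second threshold and the name patterns are assumptions drawn from this report):

```shell
# List processes matching the leftover task names that have been
# alive for more than a day (etimes is elapsed seconds, procps ps).
stale=$(ps -eo pid=,etimes=,comm= | awk '$2 > 86400 && $3 ~ /sstsim|stream/ {print $1}')
if [ -n "$stale" ]; then
    msg="stale pids: $stale"
else
    msg="no stale tasks found"
fi
echo "$msg"
```

Run at the start of each nightly, this would have caught the November sstsim.x stragglers months earlier.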
El Capitan nranks=2 SimpleComponent test
A failure that I have never noticed before on this test on any platform. It appears that the output from the two threads got intermixed.
There are five test Suites that have code to manage this.
In this case, the line, word, and byte counts are identical; two lines have shuffled output.
Reference file, last two lines (100, 101):
Component Finished.
Simulation is complete, simulated time: 25 us
Output file of failing test – line 85:
Simulation is complete, Component Finished.
And line 101:
simulated time: 25 us
The Auto tester needs to add one or more multi-thread tests to its list.
Currently the nightly runs are detecting one test failure on all multi-thread projects.
There are also two test failures on Multi-Rank projects, but this is not a concern because multi-rank is not included in the 6.0 release.
The Jenkins Project names need to be consistent.
We want to delete "COE" from names.
Names like MultiThread... need to include the OS, MPI, and Boost versions.
Needs to be characterized and understood.
This is about a failure in a test Suite, not a failure in an individual test. The output for Jenkins is not generated until an individual test actually runs.
If a test Suite fails in an unanticipated manner, Jenkins likely doesn't detect it, so it is not flagged in the Summary records. It simply vanishes from the list of executed tests.
(When testing our test process on an Out-of-Source build with write permission to the source denied, at least 5 test Suites failed and silently vanished from the list.)
On El Capitan, it is not being called with the PID of SST.
On Linux (Red Hat 6, et al.), the VM does not have Python 2.7, which is required by the script.
This ticket collects two related enhancements. Developers are often unsure when to do an svn update in their sandbox; they don't really know whether the head of the trunk produced a good build. We need to publish, in some very convenient place, the svn revision number of the last successful build and test of the trunk during the nightly process.
In a similar vein, some non-developer users may need a version of the source code from the trunk in order to obtain a new feature or bug fix. They too are at risk if they simply check out the trunk head. Instead, provide a tarball of the last successful build and test of the trunk from the nightly process.
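The publishing step for the first enhancement is small; a sketch under stated assumptions (paths are placeholders, and for the svn tree described here `svnversion` would replace the `git rev-parse` shown):

```shell
# Hypothetical end-of-nightly step, run only after build + test both pass:
# record the revision that just proved good.
tmp=$(mktemp -d)
git init -q "$tmp/trunk"
git -C "$tmp/trunk" -c user.email=ci@example -c user.name=ci \
    commit -q --allow-empty -m "nightly build + test passed"

rev=$(git -C "$tmp/trunk" rev-parse --short HEAD)
echo "$rev" > "$tmp/last_good_revision.txt"
published=$(cat "$tmp/last_good_revision.txt")
echo "published revision: $published"
```

Copying `last_good_revision.txt` to a web-visible location would give developers the "safe to update to" number this ticket asks for.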
We want the SST autotester to work on the SST/Macro repo for merging PRs into the devel branch.
On the sstsimulator.github.io website, the support page http://sstsimulator.github.io/SSTPages/SSTMainSupport/ calls out a number of email lists that the user can subscribe to. However, the links are not accessible.
Here is one of the example links
https://sst-simulator.org/mailman/listinfo/sst-users
This is most likely fallout from the direct DNS forward of sst-simulator.org to sstsimulator.github.io. If the DNS is changed from a direct forward to a CNAME record, this problem will most likely be corrected.
This appears to be totally benign, but it is not the way it should be done. It is potentially confusing, and a tiny change in the process could alter the fact that the thread count used is the last one that the harness inserts.
The process is not clean, and the continuing growth of options and needs increases the probability of future failures, including failures that are not immediately recognized as wrong.
Time limit processing differs between Linux and El Capitan.
The Time Limit processor left a job running (until manually killed) for a memHSieve test on the 6.1 release branch on El Capitan.
TL Enforcer: TIME LIMIT test_memHSieve
TL Enforcer: test has exceed alloted time of 50 seconds.
Create Time Limit Flag file, /tmp/TimeFlag_40708_41063
I am 41063, I was called from 40708, my parent PID is 41016
UID PID PPID C STIME TTY TIME CMD
502 40708 51283 0 2:40AM ?? 0:00.01 /bin/bash ../sqe/buildsys/bamboo.sh sstmainline_config_macosx_no_gem5 openmpi-1.8 boost-1.56 clang-700.1.76
502 41016 40708 0 2:40AM ?? 0:00.02 /bin/bash ../sqe/buildsys/bamboo.sh sstmainline_config_macosx_no_gem5 openmpi-1.8 boost-1.56 clang-700.1.76
502 41072 41068 0 2:40AM ?? 0:00.03 /usr/local/module-pkgs/pin/pin-2.14-71313-clang.5.1-mac/intel64/bin/pinbin -p32 /usr/local/module-pkgs/pin/pin-2.14-71313-clang.5.1-mac/ia32/bin/pinbin -sigchld_handler 1 -follow_execv -t /Users/sstbuild/jenkins/workspace/SST__6.1.0_beta_OutOfSource_MacOSX_10.11_OMPI_1.8_Boost_1.56_clang_700.1.76_mainline/20_20/local/sst-elements/libexec/fesimple.dylib -p /sst_shmem_41068-0-16807 -v 1 -t 0 -i 1000000000 -c 16 -s 1 -m 1 -k 1 -d 0 -- ./ompsievetest
502 41187 41072 0 2:41AM ?? 0:11.06 ./ompsievetest
502 41301 41259 0 2:41AM ?? 0:00.03 /usr/local/module-pkgs/pin/pin-2.14-71313-clang.5.1-mac/intel64/bin/pinbin -p32 /usr/local/module-pkgs/pin/pin-2.14-71313-clang.5.1-mac/ia32/bin/pinbin -sigchld_handler 1 -follow_execv -t /Users/sstbuild/jenkins/workspace/SST__Nightly_OutOfSource_MacOSX_10.11_OMPI_1.8_Boost_1.56_clang_700.1.76_mainline/221_221/local/sst-elements/libexec/fesimple.dylib -p /sst_shmem_41259-0-16807 -v 1 -t 0 -i 1000000000 -c 16 -s 1 -m 1 -k 1 -d 0 -- ./ompsievetest
502 41304 41301 0 2:41AM ?? 0:11.15 ./ompsievetest
502 41413 41063 0 2:41AM ?? 0:00.00 grep ompsievetest
this might better go in the Suite
Identified routine: gettimeofday, replacing with Ariel equivalent...
Replacement complete.
Identified routine: malloc/_malloc, replacing with Ariel equivalent...
Identified routine: free/_free, replacing with Ariel equivalent...
Identified routine: malloc/_malloc, replacing with Ariel equivalent...
Identified routine: free/_free, replacing with Ariel equivalent...
4 86 1127 omps_list
OMP_PID = 41187
41304
/Users/sstbuild/jenkins/workspace/SST__6.1.0_beta_OutOfSource_MacOSX_10.11_OMPI_1.8_Boost_1.56_clang_700.1.76_mainline/20_20/devel/trunk/test/utilities/timeLimitEnforcer.sh: line 55: [: 41187: binary operator expected
Sat Jan 14 02:41:49 MST 2017
I am 41063, I was called from 41016, my parent PID is 41016
No corresponding child named "sst"
Look for a child named "python"
Identified routine: malloc/_malloc, replacing with Ariel equivalent...
Identified routine: free/_free, replacing with Ariel equivalent...
No corresponding child named "python"
Look for a child named "valgrind"
No corresponding child named "valgrind"
Sibling is 41067
Kill pid is 41068
502 41068 41067 0 2:40AM ?? 0:46.49 /Users/sstbuild/jenkins/workspace/SST__6.1.0_beta_OutOfSource_MacOSX_10.11_OMPI_1.8_Boost_1.56_clang_700.1.76_mainline/20_20/local/sst-core/bin/sst -n 1 /Users/sstbuild/jenkins/workspace/SST__6.1.0_beta_OutOfSource_MacOSX_10.11_OMPI_1.8_Boost_1.56_clang_700.1.76_mainline/20_20/devel/trunk/sst-elements/src/sst/elements/memHierarchy/Sieve/tests/sieve-test.py
502 41068 41067 0 2:40AM ?? 0:46.49 /Users/sstbuild/jenkins/workspace/SST__6.1.0_beta_OutOfSource_MacOSX_10.11_OMPI_1.8_Boost_1.56_clang_700.1.76_mainline/20_20/local/sst-core/bin/sst -n 1 /Users/sstbuild/jenkins/workspace/SST__6.1.0_beta_OutOfSource_MacOSX_10.11_OMPI_1.8_Boost_1.56_clang_700.1.76_mainline/20_20/devel/trunk/sst-elements/src/sst/elements/memHierarchy/Sieve/tests/sieve-test.py
It's still there! (41068)
Try a "kill -9"
502 41068 41067 0 2:40AM ?? 0:00.00 (sstsim.x)
Build was aborted
Aborted by Vandyke, John P
Si has previously requested automated Valgrind testing.
The number of "one-time-only" failures being seen in Nightly testing means that ways to pursue validation are needed, as a way to attack test failures that a simple rerun does not reproduce.
Currently most reference files for Elements are located in the sst-sqe repo. This introduces a synchronization issue for the Auto-Tester: a necessary change to an element in the sst-elements repo that changes its output will fail tests until the reference file in sst-sqe is updated.
We suggest moving the reference files under the test directory for each specific element. Then (usually) only changes to the sst-elements repo will be required.
None of our test builds use '--disable-mpi'. That build mechanism should probably be tested.
The latest Ubuntu update seems to have altered something with non-login module initialization.
The Jenkins runs are no longer loading modules; build failures come from the missing Boost 1.56.
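A guard for this symptom can be sketched as follows. The init path is the common Linux location for environment-modules and is an assumption; sites place it elsewhere:

```shell
# Non-login shells don't necessarily source the module init file;
# do it explicitly before trying to load anything.
if ! type module >/dev/null 2>&1; then
    # Assumed location; adjust per site.
    [ -f /etc/profile.d/modules.sh ] && . /etc/profile.d/modules.sh
fi
if type module >/dev/null 2>&1; then
    module load boost-1.56
    status="module command available"
else
    status="module command missing"
fi
echo "$status"
```

Sourcing the init file in the harness itself, rather than relying on the shell's login behavior, would make the Jenkins runs immune to this class of OS update.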
Many test failures on the COE_RHEL builder
sst-coerhel7.sandia.gov.107064hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out.
The above line was appearing in the log of the failed tests. The timeout itself was not the reason the test Suite was aborting; the line was an extra line in the output file, which caused the Suite to fail the test because the output did not match the gold file.
The file, libariel.so, can be built and exist without actually working with PIN.
"memHSieve requires Ariel", but existence of libariel.so is not the appropriate test.
Consequently, there are currently MacOS and Ubuntu failures of the memHSieve test.
As a stopgap measure, the memHSieve test Suite is going to be modified to run only on sst-test.
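A better gate than the existence of libariel.so would be to check that PIN itself actually runs. A sketch, where `PIN_HOME` and the `pin` launcher name are assumptions drawn from the pinbin paths in the logs above:

```shell
# Gate memHSieve on a functional PIN rather than on a file's existence:
# libariel.so can be built and present without PIN working at all.
if [ -n "$PIN_HOME" ] && [ -x "$PIN_HOME/pin" ] \
        && "$PIN_HOME/pin" -version >/dev/null 2>&1; then
    ariel_ok="yes"
else
    ariel_ok="no"
fi
echo "ariel usable: $ariel_ok"
```

The Suite would then skip (rather than fail) on MacOS and Ubuntu hosts where PIN's deployment is broken.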
Time limit on Ember Sweep test 80 on El Capitan, number of ranks = 2, run #445.
This resulted in a hang, which I believe is an SQE problem, not an Ember problem.
"""
test_EmberSweep_080_a23d85edc49a8319faea14f6104ed70b nr=2
80 run, 0 have failed
torus --shape=2 PingPong iterations=1 messageSize=0
~/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testSuites/emberSweep_folder ~/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk
Simulation is complete, simulated time: 4.12 us
Test Passed
80: Wall Clock Time 1 sec. torus --shape=2 PingPong iterations=1 messageSize=0
~/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk
TL Enforcer: TIME LIMIT test_EmberSweep_080_a23d85edc49a8319faea14f6104ed70b
TL Enforcer: test has exceed alloted time of 900 seconds.
Create Time Limit Flag file, /tmp/TimeFlag_10676_28931
I am 28931, I was called from 10676, my parent PID is 10709
UID PID PPID C STIME TTY TIME CMD
502 10676 73706 0 4:29AM ?? 0:00.02 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
502 10709 10676 0 4:29AM ?? 0:01.05 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
502 54817 28931 0 4:54AM ?? 0:00.00 grep ompsievetest
this might better go in the Suite
0 0 0 /tmp/28931_omps_list
OMP_PID =
Thu May 18 04:54:31 MDT 2017
###############################################################
JOHNS sanity check
502 54769 53884 0 4:54AM ?? 0:00.04 mpirun -np 2 -map-by numa:pe=2 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=no /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
502 54770 54769 0 4:54AM ?? 0:00.56 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=no /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
502 54771 54769 0 4:54AM ?? 0:00.55 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=no /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
----------- first
502 54770 54769 0 4:54AM ?? 0:00.59 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=no /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
----------- all
502 54770 54769 0 4:54AM ?? 0:00.63 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=no /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
502 54771 54769 0 4:54AM ?? 0:00.62 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=no /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
54770
the pid of an sst is 54770
the pid of the mpirun is 54769
Check for Dead Lock
/Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testOutputs/test_ember_params.out:[0:0] SST Core: Signal handers will be registed for USR1, USR2, INT and TERM...
/Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testOutputs/test_ember_params.out:[0:0] SST Core: Signal handler registration is completed
/Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testOutputs/test_ember_params.out:[1:0] SST Core: Signal handers will be registed for USR1, USR2, INT and TERM...
/Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testOutputs/test_ember_params.out:[1:0] SST Core: Signal handler registration is completed
###############################################################
I am 28931, I was called from 10709, my parent PID is 10709
No corresponding child named "sst"
Look for a child named "python"
No corresponding child named "python"
Look for a child named "valgrind"
No corresponding child named "valgrind"
Sibling is 28944
I am 28931, I was called from 28944, my parent PID is 10709
No corresponding child named "sst"
Look for a child named "python"
No corresponding child named "python"
Look for a child named "valgrind"
No corresponding child named "valgrind"
Kill pid is
ps: option requires an argument -- p
usage: ps [-AaCcEefhjlMmrSTvwXx] [-O fmt | -o fmt] [-G gid[,gid...]]
[-g grp[,grp...]] [-u [uid,uid...]]
[-p pid[,pid...]] [-t tty[,tty...]] [-U user[,user...]]
ps [-L]
Invoke the traceback routine
$SST_ROOT/test/utilities/stackback.py
Must specify PIDs of SST, or specify --all to find SST PIDs
Use --help for more information
Thu May 18 04:54:41 MDT 2017
Return to timeLimitEnforcer
Can not find process to terminate, pid =
I am 28931, my parent was 10709
UID PID PPID C STIME TTY TIME CMD
502 10676 73706 0 4:29AM ?? 0:00.02 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
502 10709 10676 0 4:29AM ?? 0:01.05 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
502 12818 20000 0 4:10AM ?? 0:00.01 /bin/sh -xe /var/folders/k7/n40w4cyj4_l78z_fw6677ccm0000gp/T/hudson2016493627556830810.sh
502 12824 12818 0 4:10AM ?? 0:00.31 /bin/bash ../sqe/buildsys/bamboo.sh sstmainline_config_macosx_no_gem5 openmpi-1.8 boost-1.56 clang-700.1.76
502 14313 1 0 4:31AM ?? 0:01.14 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.shared
502 19951 1 0 Tue10AM ?? 0:03.43 /usr/sbin/cfprefsd agent
502 20000 1 0 Tue10AM ?? 4:49.19 /Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/bin/java -jar /Users/sstbuild/jenkins/slave.jar -jnlpUrl https://jenkins-srn.sandia.gov:8443/computer/SST%20Mac%20OSX%2010.11%20Xcode%207%20(2)/slave-agent.jnlp -secret 17e052036837b2f06631093b7f59eddc3a336352a67333231e5e10768feceaa2
502 28931 10709 0 4:39AM ?? 0:00.05 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
502 28944 10709 0 4:39AM ?? 0:00.00 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
502 28945 28944 0 4:39AM ?? 0:00.00 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
502 28946 28944 0 4:39AM ?? 0:00.00 awk { if ( $3 == ENVIRON["__timerChild"] ) print $2 }
0 28947 28945 0 4:39AM ?? 0:00.00 ps -ef -u sstbuild
502 29431 73574 0 4:39AM ?? 0:00.03 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_MultiThread_MacOS-n2/594_594/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
502 29458 29431 0 4:39AM ?? 0:01.49 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_MultiThread_MacOS-n2/594_594/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
502 53498 1 0 2:04AM ?? 0:47.20 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.shared
502 53499 1 0 2:04AM ?? 0:45.98 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.shared
502 53500 1 0 2:04AM ?? 0:41.20 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.shared
502 53501 1 0 2:04AM ?? 0:46.84 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.shared
502 53504 1 0 2:04AM ?? 0:40.47 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.shared
502 53820 12824 0 4:54AM ?? 0:00.02 /bin/bash ../sqe/buildsys/bamboo.sh sstmainline_config_macosx_no_gem5 openmpi-1.8 boost-1.56 clang-700.1.76
502 53884 53820 0 4:54AM ?? 0:00.07 /bin/bash ../sqe/buildsys/bamboo.sh sstmainline_config_macosx_no_gem5 openmpi-1.8 boost-1.56 clang-700.1.76
502 54896 29458 0 4:54AM ?? 0:00.01 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_MultiThread_MacOS-n2/594_594/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
502 54899 29458 0 4:54AM ?? 0:11.17 sst -n 2 --model-options=--topo=fattree --shape=9,9:9,9:18 --cmdLine="Init" --cmdLine="AllPingPong iterations=10 messageSize=10000 " --cmdLine="Fini" emberLoad.py
502 54900 54896 0 4:54AM ?? 0:00.00 sleep 900
502 55035 53884 0 4:54AM ?? 0:00.01 /bin/bash ../sqe/buildsys/bamboo.sh sstmainline_config_macosx_no_gem5 openmpi-1.8 boost-1.56 clang-700.1.76
502 55037 55035 0 4:54AM ?? 0:00.00 sleep 200
502 55040 53884 0 4:54AM ?? 0:00.05 mpirun -np 2 -map-by numa:pe=2 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=yes /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
502 55041 55040 0 4:54AM ?? 0:01.96 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=yes /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
502 55042 55040 0 4:54AM ?? 0:01.96 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=yes /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
0 55126 28931 0 4:54AM ?? 0:00.00 ps -f -U sstbuild
502 60675 1 0 Tue11AM ?? 0:00.13 /usr/sbin/distnoted agent
502 62995 1 0 Tue11AM ?? 0:00.22 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdflagwriter
502 72773 1 0 Tue11AM ?? 0:00.23 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker-sizing -c MDSSizingWorker -m com.apple.mdworker.sizing
502 73568 20000 0 4:05AM ?? 0:00.01 /bin/sh -xe /var/folders/k7/n40w4cyj4_l78z_fw6677ccm0000gp/T/hudson6215251261761081075.sh
502 73574 73568 0 4:05AM ?? 0:00.16 /bin/bash ../sqe/buildsys/bamboo.sh sstmainline_config_macosx_no_gem5 openmpi-1.8 boost-1.56 clang-700.1.76
502 73700 20000 0 4:05AM ?? 0:00.01 /bin/sh -xe /var/folders/k7/n40w4cyj4_l78z_fw6677ccm0000gp/T/hudson6861595538799566695.sh
502 73706 73700 0 4:05AM ?? 0:00.22 /bin/bash ../sqe/buildsys/bamboo.sh sstmainline_config_macosx_no_gem5 openmpi-1.8 boost-1.56 clang-700.1.76
502 98283 1 0 4:24AM ?? 0:01.11 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.shared
502 98648 1 0 4:24AM ?? 0:02.03 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.shared
EXIT without killing my parents
Build was aborted
Aborted by Vandyke, John P
"""
While examining a Messier test issue, it was noticed that Messier was not in the list for the weekly Valgrind test. More careful examination reveals that the Samba, memHierarchy-A, and simpleCarWash Suites were also missing.
The developer resolved the Messier issue from the Valgrind output.
memHA and simpleCarWash appear to be clean with respect to Valgrind.
These executing tasks can soak up CPU cycles such that tests take a factor of ten or more longer to complete. This is primarily observed on El Capitan, but that is our principal MacOS test environment. These tasks are created by (or for) Ariel, but test Suites other than the "Ariel" Suite use Ariel. (The problem cannot be reproduced on Linux.)
So far, for tests that fail because one output file is expected while Multi-Rank spreads the output over a file per rank, individual Suites have had specific solutions inserted. Until memHSieve, these were all on stdout.
A subroutine or macro needs to be created to handle these cases, instead of spreading unique point solutions across the testSuites.
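The shared helper this asks for could look roughly like the sketch below (the function name and per-rank naming convention `base.0`, `base.1`, ... are hypothetical). CSV outputs like memHSieve's StatisticOutput.csv may additionally need a stable sort before diffing, since rank order is not guaranteed:

```shell
# Concatenate per-rank output files into the single file the
# reference comparison expects, then remove the per-rank pieces.
merge_rank_outputs() {
    base="$1"
    cat "$base".[0-9]* > "$base" && rm -f "$base".[0-9]*
}

# Minimal demonstration with two fake rank files.
tmp=$(mktemp -d)
printf 'rank 0 line\n' > "$tmp/out.0"
printf 'rank 1 line\n' > "$tmp/out.1"
merge_rank_outputs "$tmp/out"
merged=$(wc -l < "$tmp/out")
echo "merged lines: $merged"
```

One such function in a shared include would replace the per-Suite point solutions described above.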
Building of the qsim external is failing with "command not found".
Testing this may be more expensive than it merits. It would likely require an additional test Project, because the variable is the actual build.
(A change that broke only the out-of-source build did get auto-merged this month.)
May 2nd During EmberSweep #58:
'''
test_EmberSweep_058_238ff71fbe090af2a8d8365cdfe9015f nt=1
58 run, 0 have failed
torus --shape=8x8x8 PingPong iterations=10 messageSize=10000
~/jenkins/workspace/SST__Nightly_COE_RHEL_6_OMPI-1.8_Boost-1.56_mainline_gcc-4.8.2/526_526/devel/trunk/test/testSuites/emberSweep_folder ~/jenkins/workspace/SST__Nightly_COE_RHEL_6_OMPI-1.8_Boost-1.56_mainline_gcc-4.8.2/526_526/devel/trunk
TL Enforcer: TIME LIMIT test_EmberSweep_058_238ff71fbe090af2a8d8365cdfe9015f
TL Enforcer: test has exceed alloted time of 900 seconds.
Create Time Limit Flag file, /tmp/TimeFlag_7637_10133
I am 10133, I was called from 7637, my parent PID is 7670
UID PID PPID C STIME TTY TIME CMD
sstbuild 7637 98256 0 05:07 ? 00:00:00 /bin/bash /home/sstbuild/jenkins/workspace/SST__Nightly_COE_RHEL_6_OMPI-1.8_Boost-1.56_mainline_gcc-4.8.2/526_526/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
sstbuild 7670 7637 0 05:07 ? 00:00:00 /bin/bash /home/sstbuild/jenkins/workspace/SST__Nightly_COE_RHEL_6_OMPI-1.8_Boost-1.56_mainline_gcc-4.8.2/526_526/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
sstbuild 10255 10133 0 05:41 ? 00:00:00 grep ompsievetest
this might better go in the Suite
0 0 0 /tmp/10133_omps_list
OMP_PID =
Tue May 2 05:41:52 MDT 2017
Kill pid is 10136
Invoke the traceback routine
$SST_ROOT/test/utilities/stackback.py 10136
/usr/bin/env: python2.7: No such file or directory
Tue May 2 05:41:52 MDT 2017
Return to timeLimitEnforcer
sstbuild 10136 7670 0 05:26 ? 00:00:01 sst -n 1 --model-options=--topo=torus --shape=8x8x8 --cmdLine="Init" --cmdLine="PingPong iterations=10 messageSize=10000 " --cmdLine="Fini" emberLoad.py
sstbuild 10136 7670 0 05:26 ? 00:00:01 sst -n 1 --model-options=--topo=torus --shape=8x8x8 --cmdLine="Init" --cmdLine="PingPong iterations=10 messageSize=10000 " --cmdLine="Fini" emberLoad.py
It's still there! (10136)
Try a "kill -9"
/home/sstbuild/jenkins/workspace/SST__Nightly_COE_RHEL_6_OMPI-1.8_Boost-1.56_mainline_gcc-4.8.2/526_526/devel/trunk/test/testTempFiles/bashIN: line 343: 10136 Killed sst -n 1 --model-options="--topo=torus --shape=8x8x8 --cmdLine="Init" --cmdLine="PingPong iterations=10 messageSize=10000 " --cmdLine="Fini"" emberLoad.py > tmp_file
Time Limit detected at 900 seconds
shunit.ASSERT: Time Limit detected at 900 seconds
~/jenkins/workspace/SST__Nightly_COE_RHEL_6_OMPI-1.8_Boost-1.56_mainline_gcc-4.8.2/526_526/devel/trunk
-------
'''
We have not yet tried a major increase in the time limit to see whether they eventually finish.
miranda_singlestream hit the time limit at 200 seconds (without multithreading: 6 sec; n=2: 43 sec).
miranda_randomgen hit the time limit at nearly an hour, 3500 seconds (without multithreading: 130 sec; n=2: 495 sec).
Is the static build supposed to be resurrected, or deleted as a possibility?
The Scheduler Detailed Network test was failing consistently on Multi-Rank tests, the same way the test fails on COE-RHEL-7 and Ubuntu 16.04. See Issue 108. On June 9, the failure mode changed on Multi-Rank. This should be examined after Issue 108 in Ember is fixed and when Multi-Rank is fully supported.
SST_BEGIN_NEW_SUITE testSuite_scheduler_DetailedNetwork.sh
/home/jwilso/jenkins/workspace/SST__6.0.0_Pre_Multi_Rank_sst-test_full_n-2/5_2016-07-14_05-03-41/devel/trunk/sst-elements/src
Started at Thu Jul 14 05:27:53 MDT 2016
Be Patient. This test runs over 4 minutes on sst-test
IGNORE the word "FATAL" in output messages
-------
test_scheduler_DetailedNetwork
Setting Multi rank count to 2
init_cmd = "sst ./%s" %(options.schedPythonFile)
execcommand = "mpirun -np 2 sst "
execcommand = "mpirun -np 2 sst --stop-at " + StopAtTime
FATAL: Failed to start job #1 at guaranteed time
Time: 255740 Guarantee: 110000
46 259 1704 /home/jwilso/jenkins/workspace/SST__6.0.0_Pre_Multi_Rank_sst-test_full_n-2/5_2016-07-14_05-03-41/devel/trunk/test/testReferenceFiles/test_scheduler_DetailedNetwork.out
21 83 525 /home/jwilso/jenkins/workspace/SST__6.0.0_Pre_Multi_Rank_sst-test_full_n-2/5_2016-07-14_05-03-41/devel/trunk/test/testOutputs/test_DetailedNetwork.out
67 342 2229 total
Test 6 FAILED
I am seeing:
ERROR : SST__OutOfSource_MacOSX_10.10.5_OMPI_1.8_Boost_1.56_clang_700.1.81_mainline
This seems to be a name mismatch in the email configuration.
This probably involves a "chicken-and-egg" dependency between Elements and SQE.
In the email from 25 Feb 2016, the Ubuntu build failed, but it is marked TESTFAIL when it should be FAILED.
TESTFAIL : SST__COE_RHEL_7_OMPI-1.10_Boost-1.56_mainline
PASSED : SST__memH_using_Ariel
TESTFAIL : SST__Ubuntu_16.04B_OMPI_1.6.5_Boost_1.56_mainline
TESTFAIL : SST__x-Multithread-MacOS-n2
TESTFAIL : SST__x-Multithread-Ubuntu-n2
But further below:
FAILED : Job SST__Ubuntu_16.04B_OMPI_1.6.5_Boost_1.56_mainline: Build # 36
Job - Is Running: False
Job - Is Good: False
Job - Status: FAILURE
Job - Duration: 0:28:05.475000
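The labeling rule the summary email should follow can be sketched as below. The function and its inputs are hypothetical, not the actual mailer code: a job whose build broke is FAILED; a job that built but whose tests failed is TESTFAIL.

```shell
#!/usr/bin/env bash
# Hedged sketch of the intended classification; `label_job` and its
# arguments are hypothetical, not the real report generator.
label_job() {
    local build_status="$1" test_status="$2"
    if [ "$build_status" != "SUCCESS" ]; then
        echo "FAILED"      # the build itself did not complete
    elif [ "$test_status" != "SUCCESS" ]; then
        echo "TESTFAIL"    # build succeeded, tests did not
    else
        echo "PASSED"
    fi
}

label_job FAILURE SUCCESS   # prints FAILED
label_job SUCCESS FAILURE   # prints TESTFAIL
```

Under this rule, the Ubuntu job above (Job - Status: FAILURE) would be reported as FAILED in the summary line, matching the detail section lower in the email.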
An error message is written to stderr, but script execution continues. ModuleEx detects when "module" fails to find the requested module, but if the "module" command itself is not found or does not execute correctly, no error code is returned.
Sample log output:
OpenMPI (openmpi-1.8) selected
/home/jwilso/jenkins/workspace/memH_without_openMP/7_7/devel/trunk/test/utilities/moduleex.sh: line 26: module: command not found
/home/jwilso/jenkins/workspace/memH_without_openMP/7_7/devel/trunk/test/utilities/moduleex.sh: line 26: module: command not found
bamboo.sh: Boost 1.56 selected
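One way to close that gap is sketched below: check that "module" exists before invoking it, and return a nonzero code when it does not. This is a hypothetical wrapper, not the actual moduleex.sh.

```shell
#!/usr/bin/env bash
# Hedged sketch (hypothetical wrapper, NOT the real moduleex.sh):
# propagate an error code when the `module` command itself is
# missing, so callers can react instead of silently continuing.
module_ex() {
    if ! type module >/dev/null 2>&1; then
        echo "moduleex: 'module' command not found" >&2
        return 127    # conventional "command not found" code
    fi
    module "$@"       # pass through to the real command
}

# On a machine without environment modules this path returns 127:
module_ex load openmpi-1.8 || echo "module failed with code $?"
```

With this shape, bamboo.sh could test the wrapper's exit status and abort the module-selection step instead of proceeding to the "Boost 1.56 selected" line as in the log above.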
We should provide, on our download site, a nightly tarball of the current state of SST's trunk repository - the result of running "make dist". #152 could then download and use that tarball as the basis for running tests.
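The proposed nightly job could be sketched as follows. Every path and the staging location are hypothetical; "make dist" is the standard Automake target that produces a source tarball.

```shell
#!/usr/bin/env bash
# Hedged sketch of the proposed nightly-tarball job; the source and
# staging paths are placeholders, not existing infrastructure.

# Date-stamp the tarball so successive nightlies are distinguishable.
nightly_name() {
    echo "sst-trunk-nightly-$(date +%Y-%m-%d).tar.gz"
}

build_nightly() {
    local src="$1" stage="$2"
    # Standard Automake flow: configure, then build the dist tarball.
    ( cd "$src" && ./configure >/dev/null && make dist >/dev/null ) || return 1
    cp "$src"/sst-*.tar.gz "$stage/$(nightly_name)"
}

nightly_name   # e.g. sst-trunk-nightly-2016-07-14.tar.gz
```

A downstream test job would then fetch the latest date-stamped tarball from the download site instead of checking out and packaging trunk itself.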
Started by user Vandyke, John P
[EnvInject] - Loading node environment variables.
Building remotely on SST Ubuntu 16.04 (ubuntu16.04 sstresource) in workspace /home/sstbuild/jenkins/workspace/SST__Ubuntu_16.04_OMPI_1.10_Boost_1.56_mainline
FATAL: java.io.EOFException
hudson.remoting.RequestAbortedException: java.io.EOFException
at hudson.remoting.Request.abort(Request.java:303)
at hudson.remoting.Channel.terminate(Channel.java:847)
at org.jenkinsci.remoting.nio.NioChannelHub$3.run(NioChannelHub.java:614)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
at ......remote call to SST Ubuntu 16.04(Native Method)
at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1416)
at hudson.remoting.Request.call(Request.java:172)
at hudson.remoting.Channel.call(Channel.java:780)
at hudson.FilePath.act(FilePath.java:979)
at hudson.FilePath.act(FilePath.java:968)
at org.jenkinsci.plugins.gitclient.Git.getClient(Git.java:131)
at hudson.plugins.git.GitSCM.createClient(GitSCM.java:741)
at hudson.plugins.git.GitSCM.createClient(GitSCM.java:733)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1080)
at hudson.scm.SCM.checkout(SCM.java:485)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1269)
at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:607)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
at hudson.model.Run.execute(Run.java:1738)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:98)
at hudson.model.Executor.run(Executor.java:410)
Caused by: java.io.EOFException
at org.jenkinsci.remoting.nio.NioChannelHub$3.run(NioChannelHub.java:614)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
ERROR: Step ‘Publish JUnit test result report’ failed: no workspace for SST__Ubuntu_16.04_OMPI_1.10_Boost_1.56_mainline #96
ERROR: Step ‘E-mail Notification’ failed: no workspace for SST__Ubuntu_16.04_OMPI_1.10_Boost_1.56_mainline #96
Started calculate disk usage of build
Finished Calculation of disk usage of build in 0 seconds
ERROR: Step ‘Delete workspace when build is done’ failed: no workspace for SST__Ubuntu_16.04_OMPI_1.10_Boost_1.56_mainline #96
Finished: FAILURE