sstsimulator / sst-sqe

SST Software Quality, Testing and Engineering Resources

License: Other

Languages: C 5.01%, Shell 78.36%, Perl 1.03%, CSS 1.30%, HTML 7.88%, Roff 5.89%, Cuda 0.50%, BitBake 0.03%
Topics: simulation, testing, sst, discrete-event, snl-build-tools

sst-sqe's People

Contributors: allevin, berquist, dogquixote, gvoskuilen, hughes-c, jpvandy, jwilso, nmhamster, researcherben, sst-autotester

sst-sqe's Issues

openmpi is leaving empty directories behind in /tmp

On sst-test, Gwen has fewer than 5,000, jpvandy has more than 15,000, and jwilson has over 30,000.

(They are in per-user subdirectories, so the count shown by "ls /tmp" is not large.)

The needed clean-up may well belong somewhere other than SQE. A possible clean-up sketch follows.
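A minimal bash sketch of such a clean-up, assuming the directories follow OpenMPI's openmpi-sessions-* naming and sit in per-user subdirectories of /tmp; both the layout and the name pattern should be verified against an affected builder before use:

# Report and remove empty OpenMPI session directories, one user at a time.
for userdir in /tmp/*/ ; do
    # Count candidates first, so the report matches the numbers quoted above.
    count=$(find "$userdir" -maxdepth 1 -type d -name 'openmpi-sessions-*' | wc -l)
    echo "$userdir: $count openmpi session directories"
    # -empty restricts the delete to directories left behind empty.
    find "$userdir" -maxdepth 1 -type d -name 'openmpi-sessions-*' -empty -delete
done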

An error in the memHSieve test on MacOS sometimes takes out Jenkins Agent

The memHSieve test uses Ariel. On MacOS there are currently problems with the deployment of Ariel. It appears that if the child process that Ariel creates does not start successfully, Ariel's emergency shutdown code issues a kill(0, ...) that kills parents and children alike, including the Jenkins agent that is the parent of the bash script. See the Ariel Issue.
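For illustration, a minimal bash sketch of the hazard and of a safer pattern; the function names and the bookkeeping are hypothetical, not Ariel's actual shutdown code:

# "kill -TERM 0" (the shell analogue of kill(0, ...)) signals the WHOLE
# process group, including the Jenkins agent if it shares that group.
# Safer pattern: record the PIDs we created and signal only those.
children=""

launch_child() {
    "$@" &
    children="$children $!"
}

emergency_shutdown() {
    # Signal only our own children, never the enclosing agent.
    for pid in $children ; do
        kill -TERM "$pid" 2>/dev/null
    done
}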

memHSieve test fails with no useful output with mpirun

test_memHSieve
~/jenkins/workspace/SST__zzQandD_sst-test_use_mpirun/5_2016-03-15_02-02-00/devel/trunk/sst/elements/memHierarchy/Sieve/tests ~/jenkins/workspace/SST__zzQandD_sst-test_use_mpirun/5_2016-03-15_02-02-00/devel/trunk
sed: can't read /home/jwilso/jenkins/workspace/SST__zzQandD_sst-test_use_mpirun/5_2016-03-15_02-02-00/devel/trunk/sst/elements/memHierarchy/Sieve/tests/StatisticOutput.csv: No such file or directory
3 69 637 /home/jwilso/jenkins/workspace/SST__zzQandD_sst-test_use_mpirun/5_2016-03-15_02-02-00/devel/trunk/sst/elements/memHierarchy/Sieve/tests/StatisticOutput.csv.gold
wc: /home/jwilso/jenkins/workspace/SST__zzQandD_sst-test_use_mpirun/5_2016-03-15_02-02-00/devel/trunk/sst/elements/memHierarchy/Sieve/tests/StatisticOutput.csv: No such file or directory
3 69 637 total
1,3d0
< ComponentName, StatisticName, StatisticSubId, StatisticType, SimTime, Rank, BinsMinValue.u64, BinsMaxValue.u64, BinWidth.u32, TotalNumBins.u32, Sum.u64, SumSQ.u64, NumActiveBins.u32, NumItemsCollected.u64, NumItemsBinned.u64, NumOutOfBounds-MinValue.u64, NumOutOfBounds-MaxValue.u64, Bin0:0-4095.u64, Bin1:4096-8191.u64, Bin2:8192-12287.u64, Bin3:12288-16383.u64, Bin4:16384-20479.u64, Bin5:20480-24575.u64
< sieve, histogram_reads, , Histogram, 0, 0, 24575, 4096, 6, 16395, 268468325, 2, 2, 2, 0, 0, 1, 0, 0, 0, 1, 0
< sieve, histogram_writes, , Histogram, 0, 0, 24575, 4096, 6, 12267, 83534309, 1, 2, 2, 0, 0, 0, 2, 0, 0, 0, 0
shunit.ASSERT: Reference does not Match Output

Simulation is complete, simulated time: 5.5 ns

~/jenkins/workspace/SST__zzQandD_sst-test_use_mpirun/5_2016-03-15_02-02-00/devel/trunk
TESTSUITE testSuite_memHSieve.sh: Total Suite Wall Clock Time 1 seconds

(This was from SST__zzQandD_sst-test #5, run from the branch useMpirRun. A missing-file guard sketch follows.)
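The sed/wc/diff cascade above fires because the Suite compares a StatisticOutput.csv that was never produced. A guard would turn that cascade into one clear failure; this is a sketch only, with a hypothetical variable name, and "fail" standing in for whatever the shunit harness actually uses to report an assertion:

csv="$outputs_dir/StatisticOutput.csv"   # $outputs_dir is hypothetical
if [ ! -f "$csv" ] ; then
    # Fail once, with the real cause, instead of letting sed/wc/diff cascade.
    fail "memHSieve produced no StatisticOutput.csv under mpirun; see sst output above"
    return
fi
# Only then run the comparison against StatisticOutput.csv.gold.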

The Ariel test Suite fails everywhere except on sst-test, because of faulty testing input.

The Sandy Bridge and Ivy Bridge tests were created using an existing static binary test object. That binary was created in 2014 (or before) on sst-test. It was built using a pthreads download from m5threads. Ariel still processes the binary object seemingly happily on sst-test, but its processing fails on Ubuntu 14.04. (The four other tests in the Suite, which are believed to pass, use a locally built stream object.)
@gvoskuilen: @nmhamster

Multi-file CramSim test data (11-file download) can be deleted when 7.0 is released.

CramSim does eleven checkouts, one per test. On December 14th, the Auto tester failed on the eighth (and all following) downloads. The mainline Nightly also had a CramSim checkout failure; it failed on the first download it tried and on all that followed.
We need to get rid of the 11-file checkout. The files are only about 6 kilobytes each before decompression (zip), so we should tar them up and do a single checkout. This is an SQE-only task, plus of course a repository change. A sketch of the single-checkout approach follows.

(I also want to add a stderr-to-stdout redirect in the CramSim Suite.)
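A sketch of the single-checkout replacement; the URL variable and archive name are placeholders, not the real repository layout:

# One tarball instead of eleven zip downloads.
wget -q -O CramSim_test_data.tar.gz "$CRAMSIM_DATA_URL" 2>&1
tar -xzf CramSim_test_data.tar.gz    # unpacks all eleven test inputs at once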

Valgrind MemHSieve test hung

The test was manually killed after 12 hours. It normally takes approximately 135 seconds.

The memHSieve test is known to hit occasional time limits, but this case is an unacceptable failure of the time-limit enforcer.
Here's the log file:
Capture-time-limit-May8.txt

Significant slowdown on El Capitan Xcode-8 from rogue tasks

Many "stream" tasks were being left behind following Ariel test time outs. It was assumed that these running tasks were causing the slow down. Restarting the VM cleared the problem, but it returned. The Ariel tests continued to time out, even if the stream task were immediately removed. It was then discovered that there had been two sstsim.x tasks running since November 21st. Removing those tasks allowed Ariel tests to run without time limits.

The issue remains. Where did the rouge tasks come from? What events cause them to be left behind? Is the problem related to Xcode-8, or is it just the location where the problem was encounter?
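A detection sketch that would have flagged the November 21st leftovers; the name patterns are taken from this issue, and the one-day threshold is an arbitrary assumption:

# Flag sstsim.x / stream processes whose elapsed time (etime) contains a
# days field ("dd-hh:mm:ss"), i.e. they have run for a day or more.
ps -axo pid,etime,comm | awk '$3 ~ /sstsim\.x|stream/ && $2 ~ /-/ { print "stale:", $0 }'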

SimpleComponent test Failed on El Capitan with number of Ranks = 2

El Capitan nranks=2 SimpleComponent test

This is a failure I have never noticed before on this test on any platform. It appears that the output from the two threads got intermixed.

There are five test Suites that have code to manage this; a sketch of the general approach follows the excerpt below.

In this case the line, word, and byte counts are identical; two lines simply have shuffled output.

  • Ref file, last two lines (100, 101):
    Component Finished.
    Simulation is complete, simulated time: 25 us

  • Output file of failing test – line 85:
    Simulation is complete, Component Finished.

  • And line 101:
    simulated time: 25 us
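A sketch of the general approach those Suites implement by hand: compare the files order-insensitively, which is valid only when the reference output's line order carries no meaning (the premise of this workaround). Variable names are hypothetical:

sort "$referenceFile" > "$referenceFile.sorted"
sort "$outFile" > "$outFile.sorted"
if ! diff "$referenceFile.sorted" "$outFile.sorted" ; then
    fail "output differs from reference even ignoring line order"
fi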

Auto tester does NOT fully detect Multi Thread errors.

The Auto tester needs to add one or more multi-thread tests to its list.

Currently the nightly runs are detecting one test failure on all multi-thread projects.

There are also two test failures on Multi Rank Projects, but this is not a concern because multi-rank is not included in the 6.0 release.

Need better detection of test Suite failures.

This is about a failure in the test Suite itself, not a failure in an individual test. The output for Jenkins is not generated until an individual test actually runs.

If a test Suite fails in an unanticipated manner, Jenkins likely doesn't detect it, so the failure is not flagged in the Summary records; the Suite simply vanishes from the list of executed tests.

(When testing our test process on an out-of-source build with write permission to the source denied, at least 5 test Suites failed and silently vanished from the list.)

Trac: #275 Upon successful nightly build & test, publish make dist tarball and svn rev num

This ticket collects two related enhancements. Developers are often unsure when to do an svn update in their sandbox; they don't really know whether the head of the trunk produced a good build. We need to publish, in some very convenient place, the svn revision number that was used for the last successful build and test of the trunk during the nightly process.

In a similar vein, some non-developer users may need a version of the source code from the trunk in order to obtain a new feature or bug fix. They too are at risk if they simply check out the trunk head. Instead, provide a tarball of the last successful build and test of the trunk during the nightly process. A sketch of such a publish step follows.
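A sketch of an end-of-nightly publish step, run only when build and test both passed; the publish directory is a placeholder, and svnversion is assumed to be available on the builder:

rev=$(svnversion "$SST_ROOT")                 # revision that was actually tested
make dist                                     # produces the sst-*.tar.gz tarball
cp sst-*.tar.gz "$PUBLISH_DIR/"               # $PUBLISH_DIR is hypothetical
echo "$rev" > "$PUBLISH_DIR/LAST_GOOD_REVISION"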

Email list servers called out on the website (sstsimulator.github.io) support page are not accessible

On the sstsimulator.github.io website, the support page http://sstsimulator.github.io/SSTPages/SSTMainSupport/ calls out a number of email lists that the user can subscribe to. However, the links are not accessible.

Here is one example link:
https://sst-simulator.org/mailman/listinfo/sst-users

This is most likely fallout from the direct DNS forward of sst-simulator.org to sstsimulator.github.io. If the DNS entry is changed from a direct forward to a CNAME record, this problem will most likely be corrected.

Testing harness inserts duplicate thread count parameters in tests.

This appears to be totally benign, but it is not how it should be done. It is potentially confusing, and a tiny change in the process could invalidate the assumption that the thread count actually used is the last one the harness inserts.

The process is not clean, and the continuing growth of options and needs increases the probability of a future failure, including one that is not immediately recognized as wrong. A sketch of a once-only insertion follows.
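A sketch of inserting the option exactly once; the option-string variable is hypothetical, and "-n" follows the sst invocations quoted elsewhere on this page (substitute whatever flag the harness actually duplicates):

case " $sut_options " in
    *" -n "*) ;;                                   # already present: leave it alone
    *) sut_options="$sut_options -n $threads" ;;   # insert exactly once
esac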

Time Limit processor on El Capitan failed to terminate a memHSieve test.

Time limit processing is different between Linux and El Capitan.

The Time Limit processor left the job running (until it was manually killed) with a memHSieve test on the 6.1 release branch on El Capitan. A note on the "binary operator expected" error follows the excerpt.

TL Enforcer: TIME LIMIT test_memHSieve
TL Enforcer: test has exceed alloted time of 50 seconds.
Create Time Limit Flag file, /tmp/TimeFlag_40708_41063

I am 41063, I was called from 40708, my parent PID is 41016
UID PID PPID C STIME TTY TIME CMD
502 40708 51283 0 2:40AM ?? 0:00.01 /bin/bash ../sqe/buildsys/bamboo.sh sstmainline_config_macosx_no_gem5 openmpi-1.8 boost-1.56 clang-700.1.76
502 41016 40708 0 2:40AM ?? 0:00.02 /bin/bash ../sqe/buildsys/bamboo.sh sstmainline_config_macosx_no_gem5 openmpi-1.8 boost-1.56 clang-700.1.76

502 41072 41068 0 2:40AM ?? 0:00.03 /usr/local/module-pkgs/pin/pin-2.14-71313-clang.5.1-mac/intel64/bin/pinbin -p32 /usr/local/module-pkgs/pin/pin-2.14-71313-clang.5.1-mac/ia32/bin/pinbin -sigchld_handler 1 -follow_execv -t /Users/sstbuild/jenkins/workspace/SST__6.1.0_beta_OutOfSource_MacOSX_10.11_OMPI_1.8_Boost_1.56_clang_700.1.76_mainline/20_20/local/sst-elements/libexec/fesimple.dylib -p /sst_shmem_41068-0-16807 -v 1 -t 0 -i 1000000000 -c 16 -s 1 -m 1 -k 1 -d 0 -- ./ompsievetest
502 41187 41072 0 2:41AM ?? 0:11.06 ./ompsievetest
502 41301 41259 0 2:41AM ?? 0:00.03 /usr/local/module-pkgs/pin/pin-2.14-71313-clang.5.1-mac/intel64/bin/pinbin -p32 /usr/local/module-pkgs/pin/pin-2.14-71313-clang.5.1-mac/ia32/bin/pinbin -sigchld_handler 1 -follow_execv -t /Users/sstbuild/jenkins/workspace/SST__Nightly_OutOfSource_MacOSX_10.11_OMPI_1.8_Boost_1.56_clang_700.1.76_mainline/221_221/local/sst-elements/libexec/fesimple.dylib -p /sst_shmem_41259-0-16807 -v 1 -t 0 -i 1000000000 -c 16 -s 1 -m 1 -k 1 -d 0 -- ./ompsievetest
502 41304 41301 0 2:41AM ?? 0:11.15 ./ompsievetest
502 41413 41063 0 2:41AM ?? 0:00.00 grep ompsievetest
this might better go in the Suite
Identified routine: gettimeofday, replacing with Ariel equivalent...
Replacement complete.
Identified routine: malloc/_malloc, replacing with Ariel equivalent...
Identified routine: free/_free, replacing with Ariel equivalent...
Identified routine: malloc/_malloc, replacing with Ariel equivalent...
Identified routine: free/_free, replacing with Ariel equivalent...
4 86 1127 omps_list
OMP_PID = 41187
41304
/Users/sstbuild/jenkins/workspace/SST__6.1.0_beta_OutOfSource_MacOSX_10.11_OMPI_1.8_Boost_1.56_clang_700.1.76_mainline/20_20/devel/trunk/test/utilities/timeLimitEnforcer.sh: line 55: [: 41187: binary operator expected
Sat Jan 14 02:41:49 MST 2017

I am 41063, I was called from 41016, my parent PID is 41016
No corresponding child named "sst"

Look for a child named "python"
Identified routine: malloc/_malloc, replacing with Ariel equivalent...
Identified routine: free/_free, replacing with Ariel equivalent...
No corresponding child named "python"

Look for a child named "valgrind"
No corresponding child named "valgrind"

Sibling is 41067
Kill pid is 41068
502 41068 41067 0 2:40AM ?? 0:46.49 /Users/sstbuild/jenkins/workspace/SST__6.1.0_beta_OutOfSource_MacOSX_10.11_OMPI_1.8_Boost_1.56_clang_700.1.76_mainline/20_20/local/sst-core/bin/sst -n 1 /Users/sstbuild/jenkins/workspace/SST__6.1.0_beta_OutOfSource_MacOSX_10.11_OMPI_1.8_Boost_1.56_clang_700.1.76_mainline/20_20/devel/trunk/sst-elements/src/sst/elements/memHierarchy/Sieve/tests/sieve-test.py
502 41068 41067 0 2:40AM ?? 0:46.49 /Users/sstbuild/jenkins/workspace/SST__6.1.0_beta_OutOfSource_MacOSX_10.11_OMPI_1.8_Boost_1.56_clang_700.1.76_mainline/20_20/local/sst-core/bin/sst -n 1 /Users/sstbuild/jenkins/workspace/SST__6.1.0_beta_OutOfSource_MacOSX_10.11_OMPI_1.8_Boost_1.56_clang_700.1.76_mainline/20_20/devel/trunk/sst-elements/src/sst/elements/memHierarchy/Sieve/tests/sieve-test.py
It's still there! (41068)
Try a "kill -9"
502 41068 41067 0 2:40AM ?? 0:00.00 (sstsim.x)
Build was aborted
Aborted by Vandyke, John P
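The line-55 error above ("[: 41187: binary operator expected") is the classic symptom of an unquoted shell variable holding more than one word: OMP_PID here contains two PIDs, 41187 and 41304. A sketch of the fix; the actual condition at line 55 of timeLimitEnforcer.sh is not shown in the log, so the test below is illustrative:

# With OMP_PID="41187 41304", an unquoted test expands to
#   [ 41187 41304 -ne 0 ]
# which produces the error above. One fix: treat the variable as a list.
for pid in $OMP_PID ; do
    if [ "$pid" -ne 0 ] ; then
        kill "$pid"
    fi
done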

Move SQE reference files to Elements

Currently most reference files for Elements are located in the sst-sqe repo. This introduces a synchronization issue for the Auto-Tester: a necessary change to an element in the sst-elements repo that changes its output will fail tests until the reference file in sst-sqe is also changed.

Suggest that we move the reference files under the test directory for each specific element. Then (usually) only changes to the sst-elements repo will be required.

Red Hat builder is getting access errors on /dev/hfi1_0

Many test failures on the COE_RHEL builder

sst-coerhel7.sandia.gov.107064hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out.

The above line was appearing in the log of the failed tests. The timeout itself was not the failure; the test Suite was aborting because the line was an extra line in the output file, so the output did not match the gold file. A filtering sketch follows.

@jwilso
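A filtering sketch: strip the transient hfi1 warning before the gold comparison, so an interconnect hiccup does not fail otherwise-passing tests. Variable names are hypothetical:

grep -v 'hfi_wait_for_device' "$outFile" > "$outFile.filtered"
diff "$outFile.filtered" "$referenceFile"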

The memHSieve testSuite test for Ariel is not valid

The file libariel.so can be built and exist without actually working with PIN.

"memHSieve requires Ariel", but the existence of libariel.so is not the appropriate test.

Consequently, there are currently MacOS and Ubuntu failures of the memHSieve test.

As a stopgap measure, the memHSieve test Suite is going to be modified to run only on sst-test. A runtime-check sketch follows.
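A sketch of a runtime gate that is stronger than checking for libariel.so; INTEL_PIN_DIRECTORY is assumed to be the variable the build already uses for PIN, and the skip mechanism shown is hypothetical:

pin_bin="$INTEL_PIN_DIRECTORY/pin"
if [ ! -x "$pin_bin" ] ; then
    echo "PIN launcher not found or not executable; skipping memHSieve Suite"
    return
fi
# An even stronger check would run a trivial binary under PIN and verify the
# exit status, since the launcher can exist yet fail to inject on this OS.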

Ember Sweep time limit hang

Time limit on Ember Sweep test 80 on El Capitan, number of ranks = 2, run #445.
This resulted in a hang, which I believe is an SQE problem, not an Ember problem. (A guard for the empty kill PID seen below follows the excerpt.)
"""
test_EmberSweep_080_a23d85edc49a8319faea14f6104ed70b nr=2
80 run, 0 have failed
torus --shape=2 PingPong iterations=1 messageSize=0
~/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testSuites/emberSweep_folder ~/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk
Simulation is complete, simulated time: 4.12 us

Test Passed
80: Wall Clock Time 1 sec. torus --shape=2 PingPong iterations=1 messageSize=0

~/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk

TL Enforcer: TIME LIMIT test_EmberSweep_080_a23d85edc49a8319faea14f6104ed70b
TL Enforcer: test has exceed alloted time of 900 seconds.
Create Time Limit Flag file, /tmp/TimeFlag_10676_28931

I am 28931, I was called from 10676, my parent PID is 10709
UID PID PPID C STIME TTY TIME CMD
502 10676 73706 0 4:29AM ?? 0:00.02 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
502 10709 10676 0 4:29AM ?? 0:01.05 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testSuites/testSuite_EmberSweep.sh

502 54817 28931 0 4:54AM ?? 0:00.00 grep ompsievetest
this might better go in the Suite
0 0 0 /tmp/28931_omps_list
OMP_PID =
Thu May 18 04:54:31 MDT 2017

###############################################################
JOHNS sanity check
502 54769 53884 0 4:54AM ?? 0:00.04 mpirun -np 2 -map-by numa:pe=2 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=no /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
502 54770 54769 0 4:54AM ?? 0:00.56 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=no /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
502 54771 54769 0 4:54AM ?? 0:00.55 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=no /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
----------- first
502 54770 54769 0 4:54AM ?? 0:00.59 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=no /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
----------- all
502 54770 54769 0 4:54AM ?? 0:00.63 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=no /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
502 54771 54769 0 4:54AM ?? 0:00.62 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=no /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
54770
the pid of an sst is 54770
the pid of the mpirun is 54769
Check for Dead Lock
/Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testOutputs/test_ember_params.out:[0:0] SST Core: Signal handers will be registed for USR1, USR2, INT and TERM...
/Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testOutputs/test_ember_params.out:[0:0] SST Core: Signal handler registration is completed
/Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testOutputs/test_ember_params.out:[1:0] SST Core: Signal handers will be registed for USR1, USR2, INT and TERM...
/Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testOutputs/test_ember_params.out:[1:0] SST Core: Signal handler registration is completed
###############################################################
I am 28931, I was called from 10709, my parent PID is 10709
No corresponding child named "sst"

Look for a child named "python"
No corresponding child named "python"

Look for a child named "valgrind"
No corresponding child named "valgrind"

Sibling is 28944
I am 28931, I was called from 28944, my parent PID is 10709
No corresponding child named "sst"

Look for a child named "python"
No corresponding child named "python"

Look for a child named "valgrind"
No corresponding child named "valgrind"

Kill pid is

ps: option requires an argument -- p
usage: ps [-AaCcEefhjlMmrSTvwXx] [-O fmt | -o fmt] [-G gid[,gid...]]
[-g grp[,grp...]] [-u [uid,uid...]]
[-p pid[,pid...]] [-t tty[,tty...]] [-U user[,user...]]
ps [-L]
Invoke the traceback routine
$SST_ROOT/test/utilities/stackback.py

Must specify PIDs of SST, or specify --all to find SST PIDs
Use --help for more information

Thu May 18 04:54:41 MDT 2017
Return to timeLimitEnforcer

Can not find process to terminate, pid =
I am 28931, my parent was 10709
UID PID PPID C STIME TTY TIME CMD
502 10676 73706 0 4:29AM ?? 0:00.02 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
502 10709 10676 0 4:29AM ?? 0:01.05 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
502 12818 20000 0 4:10AM ?? 0:00.01 /bin/sh -xe /var/folders/k7/n40w4cyj4_l78z_fw6677ccm0000gp/T/hudson2016493627556830810.sh
502 12824 12818 0 4:10AM ?? 0:00.31 /bin/bash ../sqe/buildsys/bamboo.sh sstmainline_config_macosx_no_gem5 openmpi-1.8 boost-1.56 clang-700.1.76
502 14313 1 0 4:31AM ?? 0:01.14 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.shared
502 19951 1 0 Tue10AM ?? 0:03.43 /usr/sbin/cfprefsd agent
502 20000 1 0 Tue10AM ?? 4:49.19 /Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/bin/java -jar /Users/sstbuild/jenkins/slave.jar -jnlpUrl https://jenkins-srn.sandia.gov:8443/computer/SST%20Mac%20OSX%2010.11%20Xcode%207%20(2)/slave-agent.jnlp -secret 17e052036837b2f06631093b7f59eddc3a336352a67333231e5e10768feceaa2
502 28931 10709 0 4:39AM ?? 0:00.05 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
502 28944 10709 0 4:39AM ?? 0:00.00 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
502 28945 28944 0 4:39AM ?? 0:00.00 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_Multi_Rank_ElCapitan_full_n-2/445_445/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
502 28946 28944 0 4:39AM ?? 0:00.00 awk { if ( $3 == ENVIRON["__timerChild"] ) print $2 }
0 28947 28945 0 4:39AM ?? 0:00.00 ps -ef -u sstbuild
502 29431 73574 0 4:39AM ?? 0:00.03 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_MultiThread_MacOS-n2/594_594/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
502 29458 29431 0 4:39AM ?? 0:01.49 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_MultiThread_MacOS-n2/594_594/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
502 53498 1 0 2:04AM ?? 0:47.20 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.shared
502 53499 1 0 2:04AM ?? 0:45.98 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.shared
502 53500 1 0 2:04AM ?? 0:41.20 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.shared
502 53501 1 0 2:04AM ?? 0:46.84 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.shared
502 53504 1 0 2:04AM ?? 0:40.47 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.shared
502 53820 12824 0 4:54AM ?? 0:00.02 /bin/bash ../sqe/buildsys/bamboo.sh sstmainline_config_macosx_no_gem5 openmpi-1.8 boost-1.56 clang-700.1.76
502 53884 53820 0 4:54AM ?? 0:00.07 /bin/bash ../sqe/buildsys/bamboo.sh sstmainline_config_macosx_no_gem5 openmpi-1.8 boost-1.56 clang-700.1.76
502 54896 29458 0 4:54AM ?? 0:00.01 /bin/bash /Users/sstbuild/jenkins/workspace/SST__Nightly_MultiThread_MacOS-n2/594_594/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
502 54899 29458 0 4:54AM ?? 0:11.17 sst -n 2 --model-options=--topo=fattree --shape=9,9:9,9:18 --cmdLine="Init" --cmdLine="AllPingPong iterations=10 messageSize=10000 " --cmdLine="Fini" emberLoad.py
502 54900 54896 0 4:54AM ?? 0:00.00 sleep 900
502 55035 53884 0 4:54AM ?? 0:00.01 /bin/bash ../sqe/buildsys/bamboo.sh sstmainline_config_macosx_no_gem5 openmpi-1.8 boost-1.56 clang-700.1.76
502 55037 55035 0 4:54AM ?? 0:00.00 sleep 200
502 55040 53884 0 4:54AM ?? 0:00.05 mpirun -np 2 -map-by numa:pe=2 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=yes /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
502 55041 55040 0 4:54AM ?? 0:01.96 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=yes /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
502 55042 55040 0 4:54AM ?? 0:01.96 /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/local/sst-core/bin/sst --model-options --TraceType=text --UseDramSim=yes /Users/sstbuild/jenkins/workspace/SST__7.0.0_beta_Multi_Rank_ElCapitan_full_n-2/16_16/devel/trunk/sst-elements/src/sst/elements/prospero/tests/array/trace-common.py
0 55126 28931 0 4:54AM ?? 0:00.00 ps -f -U sstbuild
502 60675 1 0 Tue11AM ?? 0:00.13 /usr/sbin/distnoted agent
502 62995 1 0 Tue11AM ?? 0:00.22 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdflagwriter
502 72773 1 0 Tue11AM ?? 0:00.23 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker-sizing -c MDSSizingWorker -m com.apple.mdworker.sizing
502 73568 20000 0 4:05AM ?? 0:00.01 /bin/sh -xe /var/folders/k7/n40w4cyj4_l78z_fw6677ccm0000gp/T/hudson6215251261761081075.sh
502 73574 73568 0 4:05AM ?? 0:00.16 /bin/bash ../sqe/buildsys/bamboo.sh sstmainline_config_macosx_no_gem5 openmpi-1.8 boost-1.56 clang-700.1.76
502 73700 20000 0 4:05AM ?? 0:00.01 /bin/sh -xe /var/folders/k7/n40w4cyj4_l78z_fw6677ccm0000gp/T/hudson6861595538799566695.sh
502 73706 73700 0 4:05AM ?? 0:00.22 /bin/bash ../sqe/buildsys/bamboo.sh sstmainline_config_macosx_no_gem5 openmpi-1.8 boost-1.56 clang-700.1.76
502 98283 1 0 4:24AM ?? 0:01.11 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.shared
502 98648 1 0 4:24AM ?? 0:02.03 /System/Library/Frameworks/CoreServices.framework/Frameworks/Metadata.framework/Versions/A/Support/mdworker -s mdworker -c MDSImporterWorker -m com.apple.mdworker.shared

            EXIT without killing my parents 

Build was aborted
Aborted by Vandyke, John P
"""

Four test Suites missing from weekly Valgrind test

While examining a Messier test issue, it was noticed that Messier was not in the list for the weekly Valgrind test. More careful examination reveals that the Samba, memHierarchy-A, and simpleCarWash Suites were also missing.

The developer resolved the Messier issue from the Valgrind output.

memHA and simpleCarWash appear to be clean with respect to Valgrind.

Testing needs a generic solution for the case of one output file per rank.

So far, for tests that fail because one output file is expected but Multi-Rank spreads the output over one file per rank, individual Suites have had specific solutions inserted. Until memHSieve, these were all on stdout.

A subroutine or macro needs to be created to handle these cases, instead of spreading unique point solutions across the testSuites. A sketch follows.
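A sketch of such a subroutine; the per-rank suffix scheme is an assumption to be matched against what SST actually emits:

# merge_rank_files <basename> <nranks>
# Concatenates per-rank files (basename.0, basename.1, ...) into the single
# file the reference comparison expects.
merge_rank_files() {
    base="$1" ; nranks="$2"
    : > "$base"                        # truncate the merged file
    rank=0
    while [ "$rank" -lt "$nranks" ] ; do
        cat "$base.$rank" >> "$base"
        rank=$((rank + 1))
    done
}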

The traceback script, stackback.py, fails on COE_RHEL-6 for lack of Python 2.7

May 2nd, during EmberSweep #58 (an interpreter-fallback sketch follows the excerpt):
'''
test_EmberSweep_058_238ff71fbe090af2a8d8365cdfe9015f nt=1
58 run, 0 have failed
torus --shape=8x8x8 PingPong iterations=10 messageSize=10000
~/jenkins/workspace/SST__Nightly_COE_RHEL_6_OMPI-1.8_Boost-1.56_mainline_gcc-4.8.2/526_526/devel/trunk/test/testSuites/emberSweep_folder ~/jenkins/workspace/SST__Nightly_COE_RHEL_6_OMPI-1.8_Boost-1.56_mainline_gcc-4.8.2/526_526/devel/trunk

TL Enforcer: TIME LIMIT test_EmberSweep_058_238ff71fbe090af2a8d8365cdfe9015f
TL Enforcer: test has exceed alloted time of 900 seconds.
Create Time Limit Flag file, /tmp/TimeFlag_7637_10133

I am 10133, I was called from 7637, my parent PID is 7670
UID PID PPID C STIME TTY TIME CMD
sstbuild 7637 98256 0 05:07 ? 00:00:00 /bin/bash /home/sstbuild/jenkins/workspace/SST__Nightly_COE_RHEL_6_OMPI-1.8_Boost-1.56_mainline_gcc-4.8.2/526_526/devel/trunk/test/testSuites/testSuite_EmberSweep.sh
sstbuild 7670 7637 0 05:07 ? 00:00:00 /bin/bash /home/sstbuild/jenkins/workspace/SST__Nightly_COE_RHEL_6_OMPI-1.8_Boost-1.56_mainline_gcc-4.8.2/526_526/devel/trunk/test/testSuites/testSuite_EmberSweep.sh

sstbuild 10255 10133 0 05:41 ? 00:00:00 grep ompsievetest
this might better go in the Suite
0 0 0 /tmp/10133_omps_list
OMP_PID =
Tue May 2 05:41:52 MDT 2017

Kill pid is 10136

      Invoke the traceback routine 

$SST_ROOT/test/utilities/stackback.py 10136

/usr/bin/env: python2.7: No such file or directory

Tue May 2 05:41:52 MDT 2017
Return to timeLimitEnforcer

sstbuild 10136 7670 0 05:26 ? 00:00:01 sst -n 1 --model-options=--topo=torus --shape=8x8x8 --cmdLine="Init" --cmdLine="PingPong iterations=10 messageSize=10000 " --cmdLine="Fini" emberLoad.py
sstbuild 10136 7670 0 05:26 ? 00:00:01 sst -n 1 --model-options=--topo=torus --shape=8x8x8 --cmdLine="Init" --cmdLine="PingPong iterations=10 messageSize=10000 " --cmdLine="Fini" emberLoad.py
It's still there! (10136)
Try a "kill -9"
/home/sstbuild/jenkins/workspace/SST__Nightly_COE_RHEL_6_OMPI-1.8_Boost-1.56_mainline_gcc-4.8.2/526_526/devel/trunk/test/testTempFiles/bashIN: line 343: 10136 Killed sst -n 1 --model-options="--topo=torus --shape=8x8x8 --cmdLine="Init" --cmdLine="PingPong iterations=10 messageSize=10000 " --cmdLine="Fini"" emberLoad.py > tmp_file
Time Limit detected at 900 seconds
shunit.ASSERT: Time Limit detected at 900 seconds
~/jenkins/workspace/SST__Nightly_COE_RHEL_6_OMPI-1.8_Boost-1.56_mainline_gcc-4.8.2/526_526/devel/trunk
-------
'''
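The "/usr/bin/env: python2.7: No such file or directory" failure is the script's shebang insisting on python2.7. A fallback sketch for the caller, assuming stackback.py itself runs under whichever interpreter is found:

for py in python2.7 python2 python ; do
    if command -v "$py" > /dev/null 2>&1 ; then
        "$py" "$SST_ROOT/test/utilities/stackback.py" "$kill_pid"
        break
    fi
done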

running "mpirun -np 2", two miranda tests time limit

We have not yet tried a major increase in the time limit to see if they eventually finish.

miranda_singlestream hit the time limit at 200 seconds. (Without multithread it takes 6 seconds; with n=2, 43 seconds.)

miranda_randomgen hit the time limit at nearly an hour, 3500 seconds. (Without multithread: 130 seconds; n=2: 495 seconds.)

New failure mode for scheduler detailed Network test on Multi Rank

The Scheduler Detailed Network test was failing consistently on Multi-Rank tests, in the same way the test fails on COE-RHEL-7 and Ubuntu 16.04 (see Issue 108). On June 9, the failure mode changed on Multi-Rank. This should be examined after Issue 108 in Ember is fixed and when Multi-Rank is fully supported.


SST_BEGIN_NEW_SUITE testSuite_scheduler_DetailedNetwork.sh
/home/jwilso/jenkins/workspace/SST__6.0.0_Pre_Multi_Rank_sst-test_full_n-2/5_2016-07-14_05-03-41/devel/trunk/sst-elements/src

Started at Thu Jul 14 05:27:53 MDT 2016
Be Patient. This test runs over 4 minutes on sst-test
IGNORE the word "FATAL" in output messages
-------
test_scheduler_DetailedNetwork
Setting Multi rank count to 2
init_cmd = "sst ./%s" %(options.schedPythonFile)
execcommand = "mpirun -np 2 sst "
execcommand = "mpirun -np 2 sst --stop-at " + StopAtTime
FATAL: Failed to start job #1 at guaranteed time
Time: 255740 Guarantee: 110000
46 259 1704 /home/jwilso/jenkins/workspace/SST__6.0.0_Pre_Multi_Rank_sst-test_full_n-2/5_2016-07-14_05-03-41/devel/trunk/test/testReferenceFiles/test_scheduler_DetailedNetwork.out
21 83 525 /home/jwilso/jenkins/workspace/SST__6.0.0_Pre_Multi_Rank_sst-test_full_n-2/5_2016-07-14_05-03-41/devel/trunk/test/testOutputs/test_DetailedNetwork.out
67 342 2229 total
Test 6 FAILED

Nightly Status Email Bug with Failed Builds

In the email from 25th Feb 2016, the Ubuntu build failed but its marking is TESTFAIL when it should be FAILED. (A sketch of corrected marking logic follows the excerpt below.)

TESTFAIL : SST__COE_RHEL_7_OMPI-1.10_Boost-1.56_mainline

PASSED : SST__memH_using_Ariel

TESTFAIL : SST__Ubuntu_16.04B_OMPI_1.6.5_Boost_1.56_mainline

TESTFAIL : SST__x-Multithread-MacOS-n2

TESTFAIL : SST__x-Multithread-Ubuntu-n2

But further below:

FAILED : Job SST__Ubuntu_16.04B_OMPI_1.6.5_Boost_1.56_mainline: Build # 36
Job - Is Running: False
Job - Is Good: False
Job - Status: FAILURE
Job - Duration: 0:28:05.475000
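A sketch of marking logic that checks build status before test counts; the variable names are hypothetical, not the actual status script's:

if [ "$build_ok" != "true" ] ; then
    mark="FAILED"                       # the build itself failed
elif [ "$failed_test_count" -gt 0 ] ; then
    mark="TESTFAIL"                     # build succeeded, tests failed
else
    mark="PASSED"
fi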

In bamboo.sh, failure to find the command "module" does not return an error flag.

An error message is written to stderr, but script execution continues. ModuleEx detects when "module" does not find the desired specific module, but if the module command itself is not found or does not execute correctly, no error code is returned. A guard sketch follows the sample log output.

Sample log output:
OpenMPI (openmpi-1.8) selected
/home/jwilso/jenkins/workspace/memH_without_openMP/7_7/devel/trunk/test/utilities/moduleex.sh: line 26: module: command not found
/home/jwilso/jenkins/workspace/memH_without_openMP/7_7/devel/trunk/test/utilities/moduleex.sh: line 26: module: command not found
bamboo.sh: Boost 1.56 selected
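A guard sketch for moduleex.sh; whether to "return" or "exit" depends on how the script is invoked, so treat this as illustrative:

# "type" also finds shell functions, which is how environment-modules
# normally defines "module".
if ! type module > /dev/null 2>&1 ; then
    echo "moduleex.sh: 'module' command not found" >&2
    return 1        # or exit 1 when the script is not sourced
fi
module "$@"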

Trac: #202 Provide nightly tarballs

We should provide, on our download site, a nightly tarball of the current state of SST's trunk repository (the result of running make dist). #152 could then download and use that tarball as the basis for running tests.

Ubuntu 16.04 not connected to Jenkins server

Started by user Vandyke, John P
[EnvInject] - Loading node environment variables.
Building remotely on SST Ubuntu 16.04 (ubuntu16.04 sstresource) in workspace /home/sstbuild/jenkins/workspace/SST__Ubuntu_16.04_OMPI_1.10_Boost_1.56_mainline
FATAL: java.io.EOFException
hudson.remoting.RequestAbortedException: java.io.EOFException
at hudson.remoting.Request.abort(Request.java:303)
at hudson.remoting.Channel.terminate(Channel.java:847)
at org.jenkinsci.remoting.nio.NioChannelHub$3.run(NioChannelHub.java:614)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
at ......remote call to SST Ubuntu 16.04(Native Method)
at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1416)
at hudson.remoting.Request.call(Request.java:172)
at hudson.remoting.Channel.call(Channel.java:780)
at hudson.FilePath.act(FilePath.java:979)
at hudson.FilePath.act(FilePath.java:968)
at org.jenkinsci.plugins.gitclient.Git.getClient(Git.java:131)
at hudson.plugins.git.GitSCM.createClient(GitSCM.java:741)
at hudson.plugins.git.GitSCM.createClient(GitSCM.java:733)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1080)
at hudson.scm.SCM.checkout(SCM.java:485)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1269)
at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:607)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
at hudson.model.Run.execute(Run.java:1738)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:98)
at hudson.model.Executor.run(Executor.java:410)
Caused by: java.io.EOFException
at org.jenkinsci.remoting.nio.NioChannelHub$3.run(NioChannelHub.java:614)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
ERROR: Step ‘Publish JUnit test result report’ failed: no workspace for SST__Ubuntu_16.04_OMPI_1.10_Boost_1.56_mainline #96
ERROR: Step ‘E-mail Notification’ failed: no workspace for SST__Ubuntu_16.04_OMPI_1.10_Boost_1.56_mainline #96
Started calculate disk usage of build
Finished Calculation of disk usage of build in 0 seconds
ERROR: Step ‘Delete workspace when build is done’ failed: no workspace for SST__Ubuntu_16.04_OMPI_1.10_Boost_1.56_mainline #96
Finished: FAILURE

Test Suites should detect and report data download failures.

Presently, a data download failure in a test Suite results in a test failure but doesn't immediately report the cause. Examples: on January 12, on Ubuntu 14.04, both Sirius Zodiac (run #31) and multiple CramSim tests failed this way. The CramSim data download has an additional problem, described in SQE issue #489. A download-wrapper sketch follows.
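A download-wrapper sketch that retries and surfaces the cause in the test log; the function name and retry counts are assumptions:

fetch() {
    url="$1" ; out="$2"
    if ! wget -q --tries=3 --timeout=30 -O "$out" "$url" ; then
        echo "DOWNLOAD FAILURE: $url"    # report the real cause immediately
        return 1
    fi
}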
