Comments (7)
Sadly this kind of problem can be very difficult to track down when using TAP. The easiest thing to do might be to sprinkle `chown/00.t` with a lot of `todomsg=` statements. Those will show up in the output, allowing you to deduce which expectations were skipped.
from pjdfstest.
@asomers Thanks for the help. That's really tedious ;) Well, I automated it.
- The patched script: chown_00.txt
- What I get locally: numbered_local.txt
- What I get on Travis: numbered_travis.txt
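The automation could look roughly like the following sketch (hypothetical, not the actual script used; it numbers every `expect` call with the `todo` keyword seen in the patched script):

```python
import re

def sprinkle_todos(script_text: str, os_name: str = "Linux",
                   tag: str = "PJD COUNT") -> str:
    """Insert a numbered `todo` line before every `expect` call so the
    TAP output reveals which expectation a given test number belongs to."""
    out, n = [], 0
    for line in script_text.splitlines():
        if re.match(r"\s*expect\b", line):
            n += 1
            # preserve the original indentation of the expect line
            indent = line[: len(line) - len(line.lstrip())]
            out.append(f'{indent}todo {os_name} "{tag} {n}"')
        out.append(line)
    return "\n".join(out)
```

Running this over a `.t` script produces output like the `numbered_local.txt` / `numbered_travis.txt` files above.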
Diffing the two with `diff numbered_local.txt numbered_travis.txt` results in:
```diff
1316,1323c1316
< ok 1316 # TODO PJD COUNT 229
< ok 1317 # TODO PJD COUNT 230
< ok 1318 # TODO PJD COUNT 231
< ok 1319 # TODO PJD COUNT 232
< ok 1320 # TODO PJD COUNT 233
< ok 1321 # TODO PJD COUNT 234
< ok 1322 # TODO PJD COUNT 236
< ok 1323 # TODO PJD COUNT 237
---
> ok 1316 # TODO PJD COUNT 229
```
In my patched test script, the (potential) issue is somewhere around line 690, i.e. here:
```sh
# [...]
todo Linux "PJD COUNT 229"
create_file ${type} ${n0}
ctime1=`${fstest} lstat ${n0} ctime`
sleep 1
todo Linux "PJD COUNT 230"
expect EPERM -u 65534 -- lchown ${n0} 65534 -1
todo Linux "PJD COUNT 231"
expect EPERM -u 65534 -g 65534 -- lchown ${n0} -1 65534
# [...]
```
Is there any sane reason that could explain why `create_file` would cause the test script to terminate? My suspicion is that I am just not catching the entire output of the test script ... something like a buffering issue. On the other hand, it's the same code that catches the output of all other test scripts, and only `chown/00.t` seems (randomly) affected. However, `chown/00.t` is, as far as I can tell, the longest-running test script in pjdfstest, which again makes it sound like a buffering issue. I'll investigate further tomorrow.
Yes, could be a buffering issue. What if the script is crashing? Do you collect its exit status?
@asomers Ok, I narrowed it down ... My fault, pjdfstest is doing what it is supposed to do.
Background: My test harness (i.e. my substitute for `prove`) is written in Python and runs with normal user privileges. pjdfstest's tests, however, require super-user privileges, and they also spawn their own sub-processes. Because of the latter, I isolate pjdfstest's tests into new process groups (`setsid` or `sudo -b`) directly underneath the operating system's `init` process. This allows me to kill an entire process group without losing my Python test harness if things go wrong. Why would I need to kill it? Well, every now and then I screw my filesystem code up, and some test might get caught in an endless loop or similar.
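The privileged kill could be sketched as follows (a hedged illustration; the helper names are mine, not the harness's actual functions, and determining the group ID beforehand is assumed):

```python
import subprocess

def kill_group_argv(pgid: int, sig: str = "KILL") -> list:
    # A negative PID after `--` makes kill(1) signal the whole process
    # group; sudo supplies the privileges the Python harness lacks.
    return ["sudo", "kill", f"-{sig}", "--", f"-{pgid}"]

def kill_test_group(pgid: int) -> int:
    # check=False: the group may already have exited by the time we fire.
    return subprocess.run(kill_group_argv(pgid), check=False).returncode
```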
The above architecture is slightly more complicated than what the standard library's `subprocess.Popen` was designed for. Killing another process (group) that runs with higher privileges is also beyond the standard library, so I wrote a few convenience functions, which basically call the appropriate shell commands. A `Popen` object's `communicate` call accepts a timeout, and so I set one: 90 seconds. If the timeout expires, a special exception is raised. When that happens, a somewhat "lengthy" procedure of mine kicks in to figure out the process group ID that needs to be killed. Once it has been determined, the corresponding pjdfstest test is killed and its output is collected from stdout and stderr.
Here is where the fun begins: the Python standard library only collects a sub-process's stdout and stderr until the timeout expires. Then my "lengthy" killing procedure starts, which takes tens to hundreds of milliseconds. In the meantime, the pjdfstest test may finish correctly. The return status is fine; I am "just" losing output.
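The pitfall can be avoided by draining the pipes a second time after the kill. A minimal sketch (function name and structure are mine, not the harness described above, which also sudo-kills a whole process group):

```python
import subprocess

def run_test(argv, timeout=120):
    """Run a test in its own session; on timeout, kill it and then call
    communicate() a SECOND time, so output produced between the timeout
    firing and the kill landing is not thrown away."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, start_new_session=True)
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()                    # real harness: sudo-kill the group
        out, err = proc.communicate()  # drain whatever arrived meanwhile
    return proc.returncode, out, err
```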
On Travis, `chown/00.t` would hit exactly the 90-second "sweet spot" with roughly 50/50 probability. Reducing the timeout to, say, 85 seconds led to tests failing properly with a timeout; increasing it to 95 seconds made them pass. I increased the default to 120 seconds.
Thanks for your help. The above write-up hopefully serves to prevent other people from falling into the same trap ...
Glad you got it fixed. BTW, we pjdfstest maintainers were just discussing the possibility of rewriting pjdfstest to use GoogleTest (or something else) instead of TAP. What do you think? Do you like TAP and prove, or would something else work better?
@asomers `prove` and TAP are bullet-proof, dead simple and easy to modify. Having tests or groups of tests of this kind in simple shell scripts is also nice and simple and, most importantly, allows easy, quick-and-dirty modifications during long debugging sessions ... meaning: I really like the simplicity of pjdfstest as it is.
My personal preference these days is pytest, even for non-Python projects. But that's just me. It's fairly easy to use, feature-rich and, thanks to Python, scriptable and extensible. On the other hand, if you know Perl, `prove` is not a bad choice either, but it has fewer features out of the box than pytest, as far as I can tell.
TAP as a reporting protocol sucks, no matter what framework you're using. If "sprinkling [scripts] with a lot of todomsg=" and similar workarounds had not been necessary, it would have saved me a lot of time and trouble in the past ;) If you want to keep the test suite as simple as it is, I'd put some effort into a better reporting protocol, so people can keep writing their own wrappers around it. For convenience, as a proof of concept, or as a starting point for your users, you could then offer "consumer" scripts or modules (substituting for `prove`) in Perl, Python and/or other scripting languages, which would, through the back door so to speak, offer integration options with many different test frameworks.
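Such a consumer module could start from something as small as a TAP test-line parser. A hypothetical Python sketch (real TAP has more corner cases, e.g. plans, YAML diagnostics and subtests):

```python
import re

# Matches lines like: "ok 1316 # TODO PJD COUNT 229" or "not ok 3 desc"
TAP_LINE = re.compile(
    r"^(not )?ok\b(?:\s+(\d+))?(?:\s+([^#]*))?(?:#\s*(TODO|SKIP)\b\s*(.*))?$"
)

def parse_tap_line(line):
    """Parse one TAP test line into (ok, number, description, directive,
    reason); return None for non-test lines such as plans or diagnostics."""
    m = TAP_LINE.match(line.strip())
    if not m:
        return None
    ok = m.group(1) is None
    num = int(m.group(2)) if m.group(2) else None
    desc = (m.group(3) or "").strip()
    directive = m.group(4)
    reason = (m.group(5) or "").strip()
    return ok, num, desc, directive, reason
```

With something like this, the todo-numbering trick above could be replaced by a wrapper that maps test numbers back to script lines automatically.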
I think part of the problem with the current design of pjdfstest is that there are test scripts which do far too much (example: check #867 out of 1000 failed; where is check #867 actually performed?), so debugging things or marking things as expected failures becomes tiresome. This issue is test-framework independent.
Also, quite bluntly, TAP is a simple protocol, but it's pretty ill-defined. For example, kyua supports TAP only in a limited sense; someone recently posted a bug on the GitHub page about how prove and kyua act differently when executing a broken test, suggesting that kyua's introspection and filtering weren't complete.
The nice thing about prove is that (in this case, because shell scripts drive the tests), the dependencies are minimal.
@asomers: I think we should invest in breaking down the tests instead of investing in switching test frameworks, until there comes a point where it logically makes sense to do so from a time perspective.