Comments (19)

inferno-chromium commented on August 19, 2024

Filed tracking bugs, thanks!

from clusterfuzz.

inferno-chromium commented on August 19, 2024

I think our background processes might need to be throttled, and we need to support this on both Linux and Mac.

jfoote commented on August 19, 2024

I think I may have run into a related issue where polymer-bundler was causing a lightweight system to hang (GCE n1-standard-1, 1 vCPU, 3.75 GB memory). I removed the & here to cause the invocations to run serially as a workaround FWIW.
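
A middle ground between launching everything with & and running everything serially is to cap how many invocations run at once. The following is a minimal Python 3 sketch of that idea, not necessarily how the project addressed it; run_throttled and max_parallel are hypothetical names, and the commands stand in for whatever the build script would otherwise start in the background.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_throttled(commands, max_parallel=1):
    """Run each command with at most `max_parallel` running at once.

    `commands` is a list of argument lists, as accepted by subprocess.call.
    Returns the exit codes in the same order as `commands`.
    """
    def run_one(command):
        # Blocks the worker thread until the child process exits.
        return subprocess.call(command)

    # The pool size is the throttle: no more than `max_parallel`
    # child processes are alive at any moment.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, commands))
```

With max_parallel=1 this reduces to the serial workaround described above; a larger value restores some parallelism on machines with more cores and memory.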

inferno-chromium commented on August 19, 2024

This should be fixed soon; Marty is looking into it.

mbarbella-chromium commented on August 19, 2024

This should be addressed by #146. Let me know if you still have any issues.

evverx commented on August 19, 2024

Speaking of OOMs, the Python unit tests, when run in verbose mode (python butler.py py_unittest -t appengine -v), seem to get stuck and are eventually killed by the OOM killer on the machine I mentioned in #134:

ok
test_lease_reach_latest (tests.appengine.common.tasks_test.LeaseTaskTest)
Test leasing a task and reaching the maximum lease time. ... WARNING:root:suspended generator _get_tasklet(context.py:344) raised BadRequestError(Error code: INVALID_ARGUMENT. Message: Key path element must not be incomplete: [Heartbeat: ])
WARNING:root:suspended generator get(context.py:760) raised BadRequestError(Error code: INVALID_ARGUMENT. Message: Key path element must not be incomplete: [Heartbeat: ])
Killed
Out of memory: Kill process 12437 (python) score 471 or sacrifice child
Out of memory: Kill process 12437 (python) score 489 or sacrifice child
Out of memory: Kill process 12437 (python) score 607 or sacrifice child

I'm not sure if it makes sense to fix it though because in general it doesn't seem to be a very good idea to try to run ClusterFuzz on lightweight machines.

inferno-chromium commented on August 19, 2024

Two things:

  1. For fuzzing, ClusterFuzz always runs on lightweight machines. Our 25,000 VMs are all n1-standard-1 (1 core, 4 GB) on Linux, and Windows is more like n1-standard-2 (2 cores, 8 GB). There is no issue there at all, and nothing heavyweight runs there.

  2. The point you mentioned is about local development: installing dependencies, running tests, deploying ClusterFuzz. We tested even that on an n1-standard-1 GCE VM and everything still works fine.
    Something is up with VirtualBox, which we didn't test. Basically, any OOM killer activity can be problematic. What was your VM config? Number of cores? Memory?
    python butler.py py_unittest -t appengine -v without -m runs tests serially (not in parallel), so a crash there makes me really doubt what is going on with that OOM killer. Can you debug more?

evverx commented on August 19, 2024

I'll try to take a closer look later. What I know so far is that the tests run in parallel mode (that is, with -m) pass too as long as -v is not used, so it's "-v" that somehow triggers it all. (I started to use "-v" to figure out why 16 tests failed. As it turned out, "jinja" couldn't find templates that I assume weren't there due to the "polymer-bundler" issue, even though "python polymer_bundler.py", run at the beginning, made sure that "App Engine templates are up to date".) I just destroyed the machine, launched a new one (which had 4GB RAM and 2 CPUs), built everything from scratch and ran the tests again. They passed without "-v".

evverx commented on August 19, 2024

I got rid of the global logger by removing everything apart from return logging.getLogger() from get_logger and 2 tests started to fail properly. Then I found out that BOT_NAME should be set explicitly so that "heartbeat" (I don't know what that is) won't fail due to "Key path element must not be incomplete". With BOT_NAME set explicitly just one test failed:

FAIL: test_lease_reach_latest (tests.appengine.common.tasks_test.LeaseTaskTest)
Test leasing a task and reaching the maximum lease time.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vagrant/clusterfuzz/src/python/tests/appengine/common/tasks_test.py", line 207, in test_lease_reach_latest
    self.assertEqual(3, message.modify_ack_deadline.call_count)
AssertionError: 3 != 2

I have no idea what I was doing but it seems to me that it has nothing to do with Virtualbox and the OOM killer. Looks like the global logger along with a couple of tests throwing exceptions in different threads causes this.
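
For reference, a test can pin BOT_NAME itself instead of relying on the caller's shell. This is a minimal Python 3 sketch, assuming the code under test reads the name from os.environ; get_bot_name and BotNameTest are hypothetical stand-ins for illustration, not ClusterFuzz code.

```python
import os
import unittest
from unittest import mock


def get_bot_name():
    """Hypothetical helper: read the bot name from the environment."""
    return os.environ.get('BOT_NAME', '')


class BotNameTest(unittest.TestCase):
    """Pins BOT_NAME for each test so the name is never empty."""

    def setUp(self):
        # patch.dict restores the original environment when stopped,
        # so the override does not leak into other tests.
        patcher = mock.patch.dict(os.environ, {'BOT_NAME': 'test-bot'})
        patcher.start()
        self.addCleanup(patcher.stop)

    def test_bot_name_is_set(self):
        self.assertEqual('test-bot', get_bot_name())
```

Setting the variable in setUp keeps the test independent of whether BOT_NAME=bot was passed on the command line.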

inferno-chromium commented on August 19, 2024

Please see #154, where I proposed a doc fix: we don't recommend running -u/-v with the -m switch. Something breaks with the logger in verbose mode with parallel test execution (that mode is untested and not recommended). -u/-v is only needed when a test fails, and we recommend using it without -m to see which test it was. Use -p to give a regex for quick reproduction.

evverx commented on August 19, 2024

I didn't use -v with -m. I ran the tests with -m once only to make sure that they didn't get stuck and trigger the OOM killer (and they didn't).

evverx commented on August 19, 2024

More precisely, I ran python butler.py py_unittest -t appengine -v. And it works just fine without -v. And the tests don't get stuck if the global logger is removed and -v is used.

evverx commented on August 19, 2024

Regarding -p, it doesn't help much:

$ python2 ./butler.py py_unittest -t appengine -p '*tasks_test*' -v
...
test_lease_exception (tests.appengine.common.tasks_test.LeaseTaskTest)
Test lease with an exception during the task. ... WARNING:root:suspended generator _get_tasklet(context.py:344) raised BadRequestError(Error code: INVALID_ARGUMENT. Message: Key path element must not be incomplete: [Heartbeat: ])
WARNING:root:suspended generator get(context.py:760) raised BadRequestError(Error code: INVALID_ARGUMENT. Message: Key path element must not be incomplete: [Heartbeat: ])
ok
test_lease_finish_before_latest (tests.appengine.common.tasks_test.LeaseTaskTest)
Test leasing a task and finishing before the latest time. ... WARNING:root:suspended generator _get_tasklet(context.py:344) raised BadRequestError(Error code: INVALID_ARGUMENT. Message: Key path element must not be incomplete: [Heartbeat: ])
WARNING:root:suspended generator get(context.py:760) raised BadRequestError(Error code: INVALID_ARGUMENT. Message: Key path element must not be incomplete: [Heartbeat: ])
ok
test_lease_reach_latest (tests.appengine.common.tasks_test.LeaseTaskTest)
Test leasing a task and reaching the maximum lease time. ... WARNING:root:suspended generator _get_tasklet(context.py:344) raised BadRequestError(Error code: INVALID_ARGUMENT. Message: Key path element must not be incomplete: [Heartbeat: ])
WARNING:root:suspended generator get(context.py:760) raised BadRequestError(Error code: INVALID_ARGUMENT. Message: Key path element must not be incomplete: [Heartbeat: ])
Killed

But with BOT_NAME set explicitly, the tests just fail:

BOT_NAME=bot python2 ./butler.py py_unittest -t appengine -p '*tasks_test*' -v
...
test_invalid_task (tests.appengine.common.tasks_test.RedoTestcaseTest)
Raise an exception on an invalid task. ... ok

======================================================================
FAIL: test_lease_reach_latest (tests.appengine.common.tasks_test.LeaseTaskTest)
Test leasing a task and reaching the maximum lease time.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vagrant/clusterfuzz/src/python/tests/appengine/common/tasks_test.py", line 207, in test_lease_reach_latest
    self.assertEqual(3, message.modify_ack_deadline.call_count)
AssertionError: 3 != 2

----------------------------------------------------------------------
Ran 14 tests in 3.952s

FAILED (failures=1, skipped=5)

inferno-chromium commented on August 19, 2024

Thanks for the last comment, we will take a look. Filed #155.

inferno-chromium commented on August 19, 2024

#156 should fix it; found the root cause. Thanks for the BOT_NAME tip.

inferno-chromium commented on August 19, 2024

@evverx - can you try #156? We even found the reason for the 3 != 2 flake. We noticed it on our CI as well (we didn't see it locally because it is flaky).

evverx commented on August 19, 2024

@inferno-chromium thank you!

I've just removed the "global logger" kludge, pulled #156 and run

python2 ./butler.py py_unittest -t appengine
python2 ./butler.py py_unittest -t appengine -v
BOT_NAME=bot python2 ./butler.py py_unittest -t appengine
BOT_NAME=bot python2 ./butler.py py_unittest -t appengine -v

All the tests have passed. python2 ./butler.py py_unittest -t appengine -m -v seems to be working too.

inferno-chromium commented on August 19, 2024

Thanks a lot @evverx, really appreciate your efforts in trying ClusterFuzz, reporting bugs and taking the extra step with debugging help!

evverx commented on August 19, 2024

I'm not sure if this is helpful, but there're two other issues I've come across (that I don't think need fixing):

  • Several "timeout" tests appear to be flaky, especially in verbose mode and especially in VMs. For example, test_results_no_timeout fails from time to time with
FAIL: test_results_no_timeout (tests.core.system.new_process_test.PosixProcessTest)
Test process execution results.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vagrant/clusterfuzz/src/python/tests/core/system/new_process_test.py", line 128, in test_results_no_timeout
    self.assertLess(abs(results.time_executed - 1.0), self.TIME_ERROR)
AssertionError: 0.11416292190551758 not less than 0.1

----------------------------------------------------------------------
Ran 12 tests in 5.461s
  • python2 ./butler.py go_unittest tries to use GCP even when INTEGRATION isn't set and fails with
Running: gsutil defstorageclass get gs://clusterfuzz-testing-vagrant || gsutil mb -p clusterfuzz-testing gs://clusterfuzz-testing-vagrant
| BucketNotFoundException: 404 gs://clusterfuzz-testing-vagrant bucket does not exist.
| Creating gs://clusterfuzz-testing-vagrant/...
| AccessDeniedException: 403 *@gmail.com does not have storage.buckets.create access to project 472119376969.
| Return code is non-zero (1).
| Exit.

in "local" mode.
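
To illustrate the first of the two issues above: the failing assertion compares the measured run time against an expected 1.0 second with a fixed tolerance of 0.1, so any scheduling delay beyond that on a loaded VM fails the test. A minimal sketch of that check, where within_tolerance is a hypothetical helper and the measured value assumes the process ran long rather than short:

```python
def within_tolerance(measured, expected, tolerance):
    """Return True if `measured` is strictly within `tolerance` of `expected`."""
    return abs(measured - expected) < tolerance


# The log above reports a deviation of about 0.114 from the expected 1.0 s.
measured = 1.0 + 0.11416292190551758

# Fails with the 0.1 tolerance from the log, passes with a looser one.
strict_ok = within_tolerance(measured, 1.0, 0.1)
loose_ok = within_tolerance(measured, 1.0, 0.25)
```

Here strict_ok is False and loose_ok is True; the 0.25 value is only an example, and widening the tolerance trades flakiness for a less precise timing check.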
