Comments (6)

tdcmeehan commented on June 29, 2024

I think this should be added to our CONTRIBUTING documentation: tests that rely on sleep should be avoided, because GC pauses and noisy neighbors make timing unpredictable in CI.

from presto.

elharo commented on June 29, 2024

Hypothesis (just a guess so far): we might be comparing a set as if it were a list. That is, the expected and actual values might sometimes match up and might sometimes be shuffled with respect to each other.
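If that hypothesis held, the difference would look roughly like the sketch below: `List.equals` is order-sensitive, while a set comparison is not. The class and method names are hypothetical and are not taken from the Presto test; note that converting to a `HashSet` also ignores duplicate elements.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical helper, for illustration only.
class OrderInsensitiveCompare {
    // Order-sensitive: true only if both lists hold equal elements in the same order.
    static boolean sameOrder(List<String> expected, List<String> actual) {
        return expected.equals(actual);
    }

    // Order-insensitive: true if both lists hold the same distinct elements.
    static boolean sameElements(List<String> expected, List<String> actual) {
        return new HashSet<>(expected).equals(new HashSet<>(actual));
    }

    static List<String> listOf(String... values) {
        return Arrays.asList(values);
    }
}
```

A test that asserts with the order-sensitive form against a collection whose iteration order is not guaranteed could pass or fail from run to run.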

aaneja commented on June 29, 2024

The issue here is that, on occasion, the background thread that polls the mocked quick stats provider gets stuck.
Running the same test 100 times (invocationCount=100), I see occasional invocations stall, and a jconsole thread dump returns:

Name: quick-stats-bg-fetch-0
State: RUNNABLE
Total blocked: 0  Total waited: 0

Stack trace: 
java.util.concurrent.ConcurrentHashMap.replaceNode(ConcurrentHashMap.java:1117)
   - locked java.util.concurrent.ConcurrentHashMap$ReservationNode@352ea8e
java.util.concurrent.ConcurrentHashMap.remove(ConcurrentHashMap.java:1097)
com.facebook.presto.hive.statistics.QuickStatsProvider.lambda$null$5(QuickStatsProvider.java:241)
com.facebook.presto.hive.statistics.QuickStatsProvider$$Lambda$571/1790495455.accept(Unknown Source)
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:792)
java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2153)
com.facebook.presto.hive.statistics.QuickStatsProvider.lambda$getQuickStats$7(QuickStatsProvider.java:241)
com.facebook.presto.hive.statistics.QuickStatsProvider$$Lambda$566/437542280.apply(Unknown Source)
java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
   - locked java.util.concurrent.ConcurrentHashMap$ReservationNode@352ea8e
com.facebook.presto.hive.statistics.QuickStatsProvider.getQuickStats(QuickStatsProvider.java:236)
com.facebook.presto.hive.statistics.QuickStatsProvider.lambda$getQuickStats$3(QuickStatsProvider.java:162)
com.facebook.presto.hive.statistics.QuickStatsProvider$$Lambda$564/1723518396.get(Unknown Source)
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
com.facebook.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)
com.facebook.airlift.concurrent.BoundedExecutor$$Lambda$565/2075352262.run(Unknown Source)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:750)

I'm looking into workarounds for this.
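The trace above shows `ConcurrentHashMap.remove` being called from a `whenComplete` callback while the same thread is still inside `computeIfAbsent`; both frames hold the same `ReservationNode`. That is consistent with a re-entrant update of the map from within the mapping function, which the `ConcurrentHashMap` documentation warns against. One possible way around it, sketched below, is to let the mapping function only create the future and to register the cleanup callback after `computeIfAbsent` has returned. This is a minimal sketch under assumptions, not the change made in Presto; `InProgressRegistry` and `getOrStart` are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical registry of in-progress fetches, for illustration only.
class InProgressRegistry {
    private final ConcurrentHashMap<String, CompletableFuture<String>> inProgress = new ConcurrentHashMap<>();

    CompletableFuture<String> getOrStart(String key, Supplier<CompletableFuture<String>> starter) {
        // The mapping function only creates the future; it never touches the map.
        CompletableFuture<String> future = inProgress.computeIfAbsent(key, k -> starter.get());

        // Cleanup is registered outside computeIfAbsent, so even if the future is
        // already complete and the callback runs synchronously on this thread,
        // the remove happens after the map's internal reservation is released.
        // The two-argument remove only drops the entry if it still maps to this future.
        future.whenComplete((value, error) -> inProgress.remove(key, future));
        return future;
    }

    int size() {
        return inProgress.size();
    }
}
```

Every caller attaches its own callback here; because the conditional `remove(key, future)` is idempotent, the extra callbacks are harmless.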


elharo commented on June 29, 2024

I continue to see flakes in this test. E.g. at https://productionresultssa13.blob.core.windows.net/actions-results/a2475549-d728-4cb0-8ead-5e5ea24e5bc1/workflow-job-run-620a21eb-788c-555f-a35f-b85781ff7ced/logs/job/job-logs.txt?rsct=text%2Fplain&se=2024-03-07T19%3A29%3A27Z&sig=6YbzHAJk3vyvwqB58ac5lpznqo7up6ooq49VqdxWWRY%3D&sp=r&spr=https&sr=b&st=2024-03-07T19%3A19%3A22Z&sv=2021-12-02

2024-03-07T15:25:07.0471788Z [ERROR] Tests run: 2940, Failures: 1, Errors: 0, Skipped: 90, Time elapsed: 2,732.953 s <<< FAILURE! - in TestSuite
2024-03-07T15:25:07.0474052Z [ERROR] com.facebook.presto.hive.statistics.TestQuickStatsProvider.quickStatsBuildTimeIsBounded  Time elapsed: 0.531 s  <<< FAILURE!
2024-03-07T15:25:07.0492214Z java.lang.AssertionError: expected [PartitionStatistics{basicStatistics=HiveBasicStatistics{fileCount=OptionalLong[42], rowCount=OptionalLong[4242], inMemoryDataSizeInBytes=OptionalLong.empty, onDiskDataSizeInBytes=OptionalLong.empty}, columnStatistics={column=HiveColumnStatistics{integerStatistics=Optional[IntegerStatistics{min=OptionalLong[-2147483648], max=OptionalLong[2147483647]}], doubleStatistics=Optional.empty, decimalStatistics=Optional.empty, dateStatistics=Optional.empty, booleanStatistics=Optional.empty, maxValueSizeInBytes=OptionalLong.empty, totalSizeInBytes=OptionalLong.empty, nullsCount=OptionalLong[0], distinctValuesCount=OptionalLong.empty}}}] but found [PartitionStatistics{basicStatistics=HiveBasicStatistics{fileCount=OptionalLong.empty, rowCount=OptionalLong.empty, inMemoryDataSizeInBytes=OptionalLong.empty, onDiskDataSizeInBytes=OptionalLong.empty}, columnStatistics={}}]
2024-03-07T15:25:07.0499784Z 	at org.testng.Assert.fail(Assert.java:110)
2024-03-07T15:25:07.0500904Z 	at org.testng.Assert.failNotEquals(Assert.java:1413)
2024-03-07T15:25:07.0501779Z 	at org.testng.Assert.assertEqualsImpl(Assert.java:149)
2024-03-07T15:25:07.0502634Z 	at org.testng.Assert.assertEquals(Assert.java:131)
2024-03-07T15:25:07.0506246Z 	at org.testng.Assert.assertEquals(Assert.java:643)
2024-03-07T15:25:07.0507905Z 	at com.facebook.presto.hive.statistics.TestQuickStatsProvider.quickStatsBuildTimeIsBounded(TestQuickStatsProvider.java:365)
2024-03-07T15:25:07.0509695Z 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2024-03-07T15:25:07.0514457Z 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2024-03-07T15:25:07.0519030Z 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2024-03-07T15:25:07.0520336Z 	at java.lang.reflect.Method.invoke(Method.java:498)
2024-03-07T15:25:07.0524594Z 	at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:135)
2024-03-07T15:25:07.0528477Z 	at org.testng.internal.invokers.TestInvoker.invokeMethod(TestInvoker.java:673)
2024-03-07T15:25:07.0530076Z 	at org.testng.internal.invokers.TestInvoker.invokeTestMethod(TestInvoker.java:220)
2024-03-07T15:25:07.0531708Z 	at org.testng.internal.invokers.MethodRunner.runInSequence(MethodRunner.java:50)
2024-03-07T15:25:07.0533397Z 	at org.testng.internal.invokers.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:945)
2024-03-07T15:25:07.0538402Z 	at org.testng.internal.invokers.TestInvoker.invokeTestMethods(TestInvoker.java:193)
2024-03-07T15:25:07.0543089Z 	at org.testng.internal.invokers.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
2024-03-07T15:25:07.0545038Z 	at org.testng.internal.invokers.TestMethodWorker.run(TestMethodWorker.java:128)
2024-03-07T15:25:07.0546656Z 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2024-03-07T15:25:07.0548239Z 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2024-03-07T15:25:07.0549508Z 	at java.lang.Thread.run(Thread.java:750)
2024-03-07T15:25:07.0550300Z 


aaneja commented on June 29, 2024

@elharo The call stack shows that the failing test is quickStatsBuildTimeIsBounded, which is different from the one before (testReadThruCaching). We can use this issue to track this failure too; I just wanted to point out the difference.

In general:
I think the issue here is that with a VM-based runner that can experience GC pauses, it's hard to predict a safe upper bound on the wait time when testing asynchronous behavior. I think my only recourse is to rework the tests and make some weaker assertions, while still providing sufficient coverage. I will work on this.
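One common way to take wall-clock guesses out of such a test is to make the mocked provider block on a latch that the test controls, then wait on the future itself instead of sleeping. The sketch below illustrates the idea only; `LatchControlledFetch`, `getWithBound`, and the string values are hypothetical and do not mirror the actual Presto tests.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical example, for illustration only.
class LatchControlledFetch {
    // Returns the fetched value if it is ready within the bound, otherwise the fallback.
    static String getWithBound(CompletableFuture<String> fetch, long timeoutMillis, String fallback) throws Exception {
        try {
            return fetch.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return fallback;
        }
    }

    static String[] demo() throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Stand-in for the mocked provider: it cannot finish until the test releases it.
            CompletableFuture<String> fetch = CompletableFuture.supplyAsync(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                }
                return "stats";
            }, executor);

            // The fetch is still blocked, so the bounded call must fall back,
            // however slow or fast the machine is.
            String beforeRelease = getWithBound(fetch, 10, "empty");

            release.countDown();

            // Wait on completion itself; the generous timeout only guards against a hang.
            String afterRelease = getWithBound(fetch, 30_000, "empty");
            return new String[] {beforeRelease, afterRelease};
        } finally {
            executor.shutdownNow();
        }
    }
}
```

With this shape, a GC pause can make the test slower but should not change which branch each assertion observes.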

In the meantime, if you do see any test failures from any GitHub runners, please link them on this issue with the GH Actions link, so I can track all failure points.

aaneja commented on June 29, 2024

I haven't seen any CI failures for these quick stats tests in a while, so closing this issue.

Please feel free to reopen it if you see any failures in Presto GitHub CI on mainline.
