Comments (7)
In Presto the buffer is not deleted on abort and getData requests for an aborted buffer are allowed.
I see. It would be nice to update documentation in https://prestodb.io/docs/current/develop/worker-protocol.html to clarify that.
Let me check facebookincubator/velox#9207
from presto.
A Velox problem?
[ERROR] com.facebook.presto.nativeworker.TestPrestoNativeGeneralQueriesThrift.testSystemTables Time elapsed: 0.877 s <<< FAILURE!
java.lang.AssertionError: Execution of 'actual' query failed: SELECT * FROM (SELECT DISTINCT regionkey FROM tmp_presto_8d2fe12547ca472093a6cdf415f47510) t INNER JOIN (SELECT regionkey FROM "tmp_presto_8d2fe12547ca472093a6cdf415f47510$partitions") p ON t.regionkey = p.regionkey
at org.testng.Assert.fail(Assert.java:98)
at com.facebook.presto.tests.QueryAssertions.assertQuery(QueryAssertions.java:178)
at com.facebook.presto.tests.QueryAssertions.assertQuery(QueryAssertions.java:106)
at com.facebook.presto.tests.AbstractTestQueryFramework.assertQuery(AbstractTestQueryFramework.java:154)
at com.facebook.presto.nativeworker.AbstractTestNativeGeneralQueries.testSystemTables(AbstractTestNativeGeneralQueries.java:1448)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:135)
at org.testng.internal.invokers.TestInvoker.invokeMethod(TestInvoker.java:673)
at org.testng.internal.invokers.TestInvoker.invokeTestMethod(TestInvoker.java:220)
at org.testng.internal.invokers.MethodRunner.runInSequence(MethodRunner.java:50)
at org.testng.internal.invokers.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:945)
at org.testng.internal.invokers.TestInvoker.invokeTestMethods(TestInvoker.java:193)
at org.testng.internal.invokers.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
at org.testng.internal.invokers.TestMethodWorker.run(TestMethodWorker.java:128)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: buffer != nullptr getData received after its buffer is deleted. Destination: 3, sequence: 0
at com.facebook.presto.tests.AbstractTestingPrestoClient.execute(AbstractTestingPrestoClient.java:124)
at com.facebook.presto.tests.DistributedQueryRunner.execute(DistributedQueryRunner.java:733)
at com.facebook.presto.tests.DistributedQueryRunner.execute(DistributedQueryRunner.java:701)
at com.facebook.presto.tests.QueryAssertions.assertQuery(QueryAssertions.java:175)
... 18 more
Caused by: VeloxRuntimeError: buffer != nullptr getData received after its buffer is deleted. Destination: 3, sequence: 0
at Unknown.# 0 _ZN8facebook5velox7process10StackTraceC1Ei(Unknown Source)
at Unknown.# 1
from presto.
It looks like a real problem in Velox / Prestissimo
I managed to capture a following sequence of events:
- A downstream task is scheduled before an upstream task is scheduled
- A downstream task sends a
getResult
request. Since the upstream task doesn't exist yet the request is getting parked and put on hold: https://github.com/prestodb/presto/blob/master/presto-native-execution/presto_cpp/main/TaskManager.cpp#L875 - When an upstream task is scheduled it tries to fulfill pending
getResult
requests: https://github.com/prestodb/presto/blob/master/presto-native-execution/presto_cpp/main/TaskManager.cpp#L524 - However in a meantime the downstream task has finished and sent a
abort
for it's buffers: https://github.com/prestodb/presto/blob/master/presto-native-execution/presto_cpp/main/TaskManager.cpp#L328 - The
getDataForResultRequests
fails ongetData
Full stack trace:
VeloxRuntimeError: buffer != nullptr getData received after its buffer is deleted. Destination: 15, sequence: 0
at Unknown.# 0 facebook::velox::VeloxException::VeloxException(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)(Unknown Source)
at Unknown.# 1 void facebook::velox::detail::veloxCheckFail<facebook::velox::VeloxRuntimeError, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(facebook::velox::detail::VeloxCheckFailArgs const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)(Unknown Source)
at Unknown.# 2 facebook::velox::exec::OutputBuffer::getData(int, unsigned long, long, std::function<void (std::vector<std::unique_ptr<folly::IOBuf, std::default_delete<folly::IOBuf> >, std::allocator<std::unique_ptr<folly::IOBuf, std::default_delete<folly::IOBuf> > > >, long, std::vector<long, std::allocator<long> >)>, std::function<bool ()>)(Unknown Source)
at Unknown.# 3 facebook::velox::exec::OutputBufferManager::getData(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, unsigned long, long, std::function<void (std::vector<std::unique_ptr<folly::IOBuf, std::default_delete<folly::IOBuf> >, std::allocator<std::unique_ptr<folly::IOBuf, std::default_delete<folly::IOBuf> > > >, long, std::vector<long, std::allocator<long> >)>, std::function<bool ()>)(Unknown Source)
at Unknown.# 4 facebook::presto::(anonymous namespace)::getData(std::shared_ptr<facebook::presto::PromiseHolder<std::unique_ptr<facebook::presto::Result, std::default_delete<facebook::presto::Result> > > >, std::weak_ptr<facebook::presto::http::CallbackRequestHandlerState>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, long, facebook::presto::protocol::DataSize, facebook::velox::exec::OutputBufferManager&)(Unknown Source)
at Unknown.# 5 facebook::presto::TaskManager::getDataForResultRequests(std::unordered_map<long, std::shared_ptr<facebook::presto::ResultRequest>, std::hash<long>, std::equal_to<long>, std::allocator<std::pair<long const, std::shared_ptr<facebook::presto::ResultRequest> > > > const&)(Unknown Source)
at Unknown.# 6 facebook::presto::TaskManager::createOrUpdateTask(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, facebook::velox::core::PlanFragment const&, std::vector<facebook::presto::protocol::TaskSource, std::allocator<facebook::presto::protocol::TaskSource> > const&, facebook::presto::protocol::OutputBuffers const&, std::shared_ptr<facebook::velox::core::QueryCtx>, long)(Unknown Source)
at Unknown.# 7 facebook::presto::TaskManager::createOrUpdateTask(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, facebook::presto::protocol::TaskUpdateRequest const&, facebook::velox::core::PlanFragment const&, std::shared_ptr<facebook::velox::core::QueryCtx>, long)(Unknown Source)
at Unknown.# 8 std::_Function_handler<std::unique_ptr<facebook::presto::protocol::TaskInfo, std::default_delete<facebook::presto::protocol::TaskInfo> > (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long), facebook::presto::TaskResource::createOrUpdateTask(proxygen::HTTPMessage*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)::$_18>::_M_invoke(std::_Any_data const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long&&)(Unknown Source)
at Unknown.# 9 _ZN5folly6detail8function14FunctionTraitsIFvRNS_7futures6detail8CoreBaseEONS_8Executor9KeepAliveIS7_EEPNS_17exception_wrapperEEE7callBigIZNS4_4CoreINS_4UnitEE11setCallbackIZNS4_10FutureBaseISH_E18thenImplementationIZNOS_6FutureISH_E9thenValueIZNS_3viaIZZN8facebook6presto12TaskResource22createOrUpdateTaskImplEPN8proxygen11HTTPMessageERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS13_EERKSt8functionIFSt10unique_ptrINSS_8protocol8TaskInfoESt14default_deleteIS1B_EERKS13_S1G_lEEENK3$_2clESW_RKSX_IS19_INS_5IOBufES1C_IS1M_EESaIS1O_EEPNSU_15ResponseHandlerESt10shared_ptrINSS_4http27CallbackRequestHandlerStateEEEUlvE_EENSN_INS_20isFutureOrSemiFutureIDTclclsr3stdE7declvalIT_EEEEE5InnerEEES9_OS21_EUlS26_E_EENSN_INS4_19valueCallableResultISH_S21_E10value_typeEEES26_EUlSA_ONS_3TryISH_EEE_NS4_25tryExecutorCallableResultISH_S2F_vEEEENSt9enable_ifIXntsrNT0_13ReturnsFutureE5valueENS2J_6ReturnEE4typeES26_S2J_NS4_18InlineContinuationEEUlSA_S2E_E_EEvS26_OS1V_INS_14RequestContextEES2N_EUlS6_SA_SC_E_EEvS6_SA_SC_RNS1_4DataE(Unknown Source)
at Unknown.# 10 void folly::detail::function::FunctionTraits<void ()>::callSmall<folly::Executor::KeepAlive<folly::Executor>::add<folly::futures::detail::CoreBase::doCallback(folly::Executor::KeepAlive<folly::Executor>&&, folly::futures::detail::State)::$_0>(folly::futures::detail::CoreBase::doCallback(folly::Executor::KeepAlive<folly::Executor>&&, folly::futures::detail::State)::$_0&&) &&::{lambda()#1}>(folly::detail::function::Data&)(Unknown Source)
at Unknown.# 11 folly::ThreadPoolExecutor::runTask(std::shared_ptr<folly::ThreadPoolExecutor::Thread> const&, folly::ThreadPoolExecutor::Task&&)(Unknown Source)
at Unknown.# 12 folly::CPUThreadPoolExecutor::threadRun(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)(Unknown Source)
at Unknown.# 13 void folly::detail::function::FunctionTraits<void ()>::callSmall<std::_Bind<void (folly::ThreadPoolExecutor::*(folly::ThreadPoolExecutor*, std::shared_ptr<folly::ThreadPoolExecutor::Thread>))(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)> >(folly::detail::function::Data&)(Unknown Source)
at Unknown.# 14 execute_native_thread_routine(Unknown Source)
at Unknown.# 15 start_thread(Unknown Source)
at Unknown.# 16 __clone3(Unknown Source)
from presto.
In Presto the buffer is not deleted on abort
and getData
requests for an aborted buffer are allowed. Same can be done in Velox (facebookincubator/velox#9207). It looks like some of the other operations are already allowed to come out of order (for example an acknowledge after a delete https://github.com/facebookincubator/velox/blob/main/velox/exec/OutputBuffer.cpp#L636).
An alternative would be to try to abort pending requests, but it might be more difficult due to inherent "raciness" of these operations. Also it is unclear if there are other sequences of requests possible that can lead to a query failure.
@mbasmanova @xiaoxmeng Thoughts?
from presto.
I'm seeing this issue in production from time to time as well: 20240324_170232_00988_y57ss
from presto.
In Presto the buffer is not deleted on abort and getData requests for an aborted buffer are allowed.
I see. It would be nice to update documentation in https://prestodb.io/docs/current/develop/worker-protocol.html to clarify that.
Noted, thanks!
from presto.
This is a known issue that happens ~10 times a day with failure message VeloxRuntimeError: buffer != nullptr getData received after its buffer is deleted. Destination: <N>, sequence: 0
where N is a integer. Thanks @arhimondr for debugging this complex issue.
from presto.
Related Issues (20)
- CI job SingleStore tests failing HOT 1
- Add ARM64 Support for Building Prestissimo Docker Image on Mac M1 HOT 3
- Backport https://github.com/prestodb/presto/pull/22926 into 0.285, 0.286 and 0.287
- Pushdown (partial) rowNumber under join
- Flaky test: TestMemoryManager.testReservedPoolDisabledMultiCoordinator
- Add documentation for Geospatial types in main types page HOT 1
- For each agg function with input param as <T>, Add an equivalent agg function with input param as array<T>
- [docs] Combine the descriptions of session property with configuration property for history based optimization
- singlestore-dockerized-tests job is failing often HOT 1
- Getting error while building in intelli idea HOT 1
- How to build a custom connector?
- How to build and run presto in intellij idea? HOT 1
- [native] Flaky test Taskmanager.buildSpilledDirectory Failrue
- Writer scaling fails for Parquet with smaller files HOT 5
- Flaky test: TestNoisySumGaussianLongAggregation.testNoisySumGaussianLongClippingSomeNoiseScaleWithinSomeStd() HOT 1
- Iceberg $changelog read fails on table with only one snapshot version.
- Pushdown partial TopN and RowNumber into UNION
- Inline cosntant cross joins
- Allow Presto Coordinator to ignore (not throw) negative runtime metrics.
- Update the MongoDB connector to support binData data type HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from presto.