Wrong results for filtered aggregates when run through SQL query, this was working on 25.0.0 but found not working on latest release 29.0.0, so has broken somewhere in between (apache/druid issue #16178, OPEN)

stamboli commented on June 13, 2024

Filtered aggregates return wrong results when run through a SQL query. This worked on 25.0.0 but does not work on the latest release, 29.0.0, so it broke somewhere in between.

Comments (7)

abhishekagarwal87 commented on June 13, 2024

Was approximate count distinct turned off when you ran this query?

stamboli commented on June 13, 2024

Yes. In the environment file I have druid_sql_planner_useApproximateCountDistinct=false

stamboli commented on June 13, 2024

environment.txt

abhishekagarwal87 commented on June 13, 2024

Can you set druid.sql.planner.useGroupingSetForExactDistinct to true and see if that fixes the issue? This bug might be the same as the one being discussed here: apache/calcite#3735 (comment)

abhishekagarwal87 commented on June 13, 2024

Though I am surprised that this query even worked in 25.0.0 without you setting druid.sql.planner.useGroupingSetForExactDistinct. It would have failed outright.

stamboli commented on June 13, 2024

No luck with this setting either :(
Surprisingly, as explained above, a single aggregation at a time works even without this flag.

stamboli commented on June 13, 2024

Looking at your test case, I wrote a query based on it, and it works:
SELECT
  COUNT(DISTINCT "City") FILTER (WHERE ("SampleSaleData"."__time" >= '2022-01-12T00:00:00.000Z') AND ("SampleSaleData"."__time" < '2022-01-13T00:00:00.000Z')) AS "P2-DistinctCities",
  COUNT(DISTINCT "City") FILTER (WHERE ("SampleSaleData"."__time" >= '2022-01-05T00:00:00.000Z') AND ("SampleSaleData"."__time" < '2022-01-06T00:00:00.000Z')) AS "P2-DistinctCities"
FROM
  SampleSaleData "SampleSaleData"

But this query is very specific to Druid. The solution we are building needs to work with multiple databases, and this query does not work with MySQL or Snowflake. The queries are built dynamically, so we would have to generate Druid-specific SQL. Until now, the CASE-based query that works with other traditional, standard databases used to work with Druid as well.
So overall, this failure is specifically related to CASE statements with multiple such aggregations.
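
For readers following along, the portable CASE-based form can be sketched roughly as below. This is only an illustration, not the exact failing query (which is not shown in this thread): it runs against Python's built-in sqlite3 module rather than Druid, the rows are made up, and the "P1-DistinctCities" alias is an assumption. Only the table name, the column names, and the time ranges are taken from the query above.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the thread.
rows = [
    ("2022-01-05T10:00:00.000Z", "Pune"),
    ("2022-01-05T11:00:00.000Z", "Mumbai"),
    ("2022-01-05T12:00:00.000Z", "Pune"),
    ("2022-01-12T09:00:00.000Z", "Delhi"),
    ("2022-01-20T09:00:00.000Z", "Chennai"),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE SampleSaleData ("__time" TEXT, "City" TEXT)')
conn.executemany("INSERT INTO SampleSaleData VALUES (?, ?)", rows)

# CASE without ELSE yields NULL for rows outside the range, and
# COUNT(DISTINCT ...) ignores NULLs, so each column counts only its own period.
case_query = """
SELECT
  COUNT(DISTINCT CASE WHEN "__time" >= '2022-01-12T00:00:00.000Z'
                       AND "__time" <  '2022-01-13T00:00:00.000Z'
                      THEN "City" END) AS "P1-DistinctCities",
  COUNT(DISTINCT CASE WHEN "__time" >= '2022-01-05T00:00:00.000Z'
                       AND "__time" <  '2022-01-06T00:00:00.000Z'
                      THEN "City" END) AS "P2-DistinctCities"
FROM SampleSaleData
"""
p1, p2 = conn.execute(case_query).fetchone()


def distinct_cities(start, end):
    """Reference answer computed in plain Python for the half-open range [start, end)."""
    return len({city for ts, city in rows if start <= ts < end})


print(p1, p2)
```

With two such aggregations in one SELECT, each column should match what the same aggregation returns when run on its own, which is the behaviour reported as broken in 29.0.0.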
