Coder Social home page Coder Social logo

Wrong results for filtered aggregates when run through SQL query, this was working on 25.0.0 but found not working on latest release 29.0.0, so has broken somewhere in between about druid HOT 7 OPEN

stamboli avatar stamboli commented on June 13, 2024
Wrong results for filtered aggregates when run through SQL query, this was working on 25.0.0 but found not working on latest release 29.0.0, so has broken somewhere in between

from druid.

Comments (7)

abhishekagarwal87 avatar abhishekagarwal87 commented on June 13, 2024

was the approximate distinct count turned off when you ran this query?

from druid.

stamboli avatar stamboli commented on June 13, 2024

from druid.

stamboli avatar stamboli commented on June 13, 2024

environment.txt

from druid.

abhishekagarwal87 avatar abhishekagarwal87 commented on June 13, 2024

can you set druid.sql.planner.useGroupingSetForExactDistinct to true and see if that fixes the issue? This bug might be same as what's being discussed here - apache/calcite#3735 (comment)

from druid.

abhishekagarwal87 avatar abhishekagarwal87 commented on June 13, 2024

Though I am surprised how did this query even work in 25.0.0 without you setting druid.sql.planner.useGroupingSetForExactDistinct. It would have failed outright.

from druid.

stamboli avatar stamboli commented on June 13, 2024

No luck with this setting too :(
Surprisingly as explained above even without this flag or so single aggregation at a time works

from druid.

stamboli avatar stamboli commented on June 13, 2024

Looking at your test case I formed query based on it which works.
SELECT
COUNT(DISTINCT "City") FILTER (WHERE ("SampleSaleData"."__time" >= '2022-01-12T00:00:00.000Z') AND ("SampleSaleData"."__time" < '2022-01-13T00:00:00.000Z')) AS "P2-DistinctCities",
COUNT(DISTINCT "City") FILTER (WHERE ("SampleSaleData"."__time" >= '2022-01-05T00:00:00.000Z') AND ("SampleSaleData"."__time" < '2022-01-06T00:00:00.000Z')) AS "P2-DistinctCities"
FROM
SampleSaleData "SampleSaleData"

But this query, very specific to druid. The solution we are building need to work with multiple DB this query is not working with MySQL as well as Snowflake, the queries are built dynamically, so a very specific druid SQL needs to be built. Until now CASE based query was worked with other traditional standard DBs used to work with druid as well.
So overall now this is failure is specifically related to CASE statements with multiple such aggregations

from druid.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.