
Comments (6)

jtaylor-sfdc avatar jtaylor-sfdc commented on July 30, 2024

Yes, from beginning of table to specified end point of 2013-02-17. Were you expecting something else?

from phoenix.

ryang-sfdc avatar ryang-sfdc commented on July 30, 2024

This is probably too speculative, but could region boundaries be employed to provide a lower bound?


jtaylor-sfdc avatar jtaylor-sfdc commented on July 30, 2024

I don't see how region boundaries help you in this case if the date is not the leading part of the key. If your schema is HOST CHAR(2), EVENT_DATE DATE, then MAX(EVENT_DATE) could be in any region and in any row, since the row order depends on HOST first. For example:
AA date
AB date - 1
AC date + 1
BA date - 2
ZZ date + 3

In the above case, the date could be anything, and the rows would still sort in that order.
Stats only help in this case to balance the load on parallelization. We couldn't really rely on the max/min in the stats being the answer to a query, because the stats are updated asynchronously at some configurable time interval. A new min or max may have snuck in after the last stats gathering was done.
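
To make the ordering point concrete, here is a minimal Python sketch (not Phoenix code). It sorts some made-up rows by a (HOST, EVENT_DATE) key, the way the row key would order them. The base date and the row values are assumptions chosen only to mirror the example above.

```python
from datetime import date, timedelta

base = date(2013, 2, 17)  # arbitrary illustrative date

# Hypothetical rows keyed by (HOST, EVENT_DATE), mirroring the example above.
rows = [
    ("ZZ", base + timedelta(days=3)),
    ("AC", base + timedelta(days=1)),
    ("AA", base),
    ("BA", base - timedelta(days=2)),
    ("AB", base - timedelta(days=1)),
]

# Row keys sort by HOST first, so the resulting date order is effectively arbitrary.
sorted_rows = sorted(rows)
hosts_in_key_order = [host for host, _ in sorted_rows]
dates_in_key_order = [d for _, d in sorted_rows]

# Because the dates are not ordered along the key, every row must be examined.
dates_are_sorted = dates_in_key_order == sorted(dates_in_key_order)
max_date = max(dates_in_key_order)
```

The rows come out in HOST order while the dates jump around, so no region boundary on this key says anything about where the largest date lives.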


jtaylor-sfdc avatar jtaylor-sfdc commented on July 30, 2024

There is one case I can think of where region boundaries may be helpful. If the HOST value is more like an enum with a few limited values, then the same HOST value might repeat for multiple region boundaries. In that case, for something like MAX(event_date), you could skip regions where HOST repeats until the last one. For example, say the region boundaries are:
NA 11111111
NA 22222222
NA 33333333
TX 11111111
TX 22222222
ZZ 11111111

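You could skip the first two NA regions, since you'd know the MAX(event_date) wouldn't be in them: within a single HOST value the rows are ordered by date, so the largest date for NA has to be in the last NA region. Then you could skip the first TX region too, and so on.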
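
The skipping rule can be sketched in a few lines of Python. This is only an illustration, not how Phoenix does or would implement it. It assumes the listed values are region start keys split into (HOST, remainder), and that the last region for each HOST actually holds at least one row for that HOST; `regions_to_scan` is a hypothetical helper name.

```python
def regions_to_scan(start_keys):
    """Return indices of regions that could hold the largest date for some HOST.

    start_keys: sorted list of (host, remainder) region start keys (assumed layout).
    A region is skipped when the next region starts with the same HOST, because
    any row for that HOST in the next region sorts later, i.e. has a later date.
    """
    keep = []
    for i, (host, _) in enumerate(start_keys):
        is_last = i == len(start_keys) - 1
        if is_last or start_keys[i + 1][0] != host:
            keep.append(i)
    return keep


# Region boundaries from the example above.
boundaries = [
    ("NA", "11111111"),
    ("NA", "22222222"),
    ("NA", "33333333"),
    ("TX", "11111111"),
    ("TX", "22222222"),
    ("ZZ", "11111111"),
]

kept = regions_to_scan(boundaries)
kept_keys = [boundaries[i] for i in kept]
```

With these boundaries only the third NA region, the second TX region, and the ZZ region remain. The overall MAX(event_date) would then be the largest of the per-region results.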


ryang-sfdc avatar ryang-sfdc commented on July 30, 2024

In this case, the rowkey is led by the date column. So ideally, only one region needs to be scanned, not the entire table up to the specified ceiling.

The only questionable part for me of depending on region boundaries is how to handle retrying speculation that's been made invalid by splits and whatnot.


jtaylor-sfdc avatar jtaylor-sfdc commented on July 30, 2024

If date is leading the row key, we can definitely do a better job, but it would really only apply if max is the only thing being selected. Right now for your original query, we'd scan every region up to to_date('2013-02-17 00:00:00'). We really only need to look at the region containing to_date('2013-02-17 00:00:00'). Would be good to generalize this. I'll morph this issue into that.
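
As a rough sketch of the idea, the Python below locates the one region whose key range contains the upper bound and reads the answer from there. It is an in-memory illustration, not Phoenix or HBase code. The region layout, the inclusive `<=` comparison, the fallback to earlier regions when the target region has no qualifying row, and the name `max_date_at_or_below` are all assumptions.

```python
from bisect import bisect_right
from datetime import date


def max_date_at_or_below(region_starts, regions, bound):
    """Find the largest date <= bound when the date leads the row key.

    region_starts[i] is the first key of regions[i]; regions[i] is a sorted
    list of dates. Returns (max_date, region_index) or (None, None).
    """
    # Index of the region whose key range contains `bound`.
    i = bisect_right(region_starts, bound) - 1
    # Walk backward only while the current region has no qualifying row.
    while i >= 0:
        candidates = [d for d in regions[i] if d <= bound]
        if candidates:
            return candidates[-1], i
        i -= 1
    return None, None


# Hypothetical table split into four regions by date.
region_starts = [date.min, date(2013, 1, 1), date(2013, 2, 1), date(2013, 3, 1)]
regions = [
    [date(2012, 12, 30)],
    [date(2013, 1, 5), date(2013, 1, 20)],
    [date(2013, 2, 3), date(2013, 2, 10), date(2013, 2, 25)],
    [date(2013, 3, 2)],
]

result = max_date_at_or_below(region_starts, regions, date(2013, 2, 17))
fallback = max_date_at_or_below(region_starts, regions, date(2013, 2, 1))
```

In the common case only the region containing the bound is touched. The backward walk covers the case where that region has no row at or below the bound, and a real implementation would also need to handle region splits happening between the lookup and the scan.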
