Comments (4)

jtaylor-sfdc commented on July 30, 2024

At a high level, we'll provide a way to declare a table as "salted" in our DDL and then transparently insert a byte at the beginning of the row key on upsert. For any query, at the point where we form the start/stop keys of the scan, we'll insert an "or" of each possible bucket value. This will cause the query to run against every bucket. As a special case, when we know the full row key we won't do this and will instead insert the single correct byte.

Detail on how to implement this:

  1. Define a SALT_BUCKETS property that can be defined in the CREATE TABLE statement to declare how many buckets there are:

    CREATE TABLE foo (
        host VARCHAR,
        date DATE
        CONSTRAINT pk PRIMARY KEY (host, date))
    SALT_BUCKETS=20
    
  2. Pull this out of the properties Map in MetaDataClient.createTable and create a new SALT_BUCKETS column in SYSTEM.TABLE to store this.

  3. On any Put (PRowImpl), hash the row key, take the result modulo saltBuckets, and store that value as the first byte of the row key.

  4. When we compute what the start/stop scan key is (WhereOptimizer), first add an "or" case (i.e. a List<byte[]> of each possible bucket byte, using the new mechanism being added by @ryang-sfdc for this issue) before we start computing the key. Include a special case at the end for the case of a fully qualified row key, since we don't need the "or" in that case.

  5. In PTable, add a single byte PColumn as the first column for the salt bucket value.

  6. In RowProjector, add one to the index passed in to skip over the salt bucket column.
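
To make steps 3 and 4 concrete, here is a minimal sketch of the key handling. It is only an illustration under stated assumptions: the class and method names are hypothetical (they are not Phoenix APIs), `Arrays.hashCode` stands in for whatever hash function we end up using, and the "or" is modeled as a plain `List<byte[]>` of salted keys, one per bucket.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; none of these names exist in Phoenix.
final class SaltSketch {

    // Step 3: derive the bucket from a hash of the unsalted row key.
    // Arrays.hashCode is a stand-in; the hash function is an assumption.
    static byte saltByte(byte[] rowKey, int saltBuckets) {
        if (saltBuckets < 1 || saltBuckets > 256) {
            throw new IllegalArgumentException("saltBuckets must fit in one byte");
        }
        // floorMod keeps the result in [0, saltBuckets) even for negative hashes.
        return (byte) Math.floorMod(Arrays.hashCode(rowKey), saltBuckets);
    }

    // Returns a new array holding the salt byte followed by the original key bytes.
    static byte[] prepend(byte salt, byte[] key) {
        byte[] salted = new byte[key.length + 1];
        salted[0] = salt;
        System.arraycopy(key, 0, salted, 1, key.length);
        return salted;
    }

    // Step 3: the row key as it would be written on upsert.
    static byte[] saltedKey(byte[] rowKey, int saltBuckets) {
        return prepend(saltByte(rowKey, saltBuckets), rowKey);
    }

    // Step 4: with only a partial key, emit one candidate key per bucket (the "or").
    static List<byte[]> scanKeysForAllBuckets(byte[] keyPrefix, int saltBuckets) {
        List<byte[]> keys = new ArrayList<>(saltBuckets);
        for (int bucket = 0; bucket < saltBuckets; bucket++) {
            keys.add(prepend((byte) bucket, keyPrefix));
        }
        return keys;
    }

    // Step 4 special case: a fully qualified row key maps to exactly one bucket.
    static List<byte[]> scanKeysForFullKey(byte[] fullRowKey, int saltBuckets) {
        List<byte[]> keys = new ArrayList<>(1);
        keys.add(saltedKey(fullRowKey, saltBuckets));
        return keys;
    }
}
```

With SALT_BUCKETS=20, `scanKeysForAllBuckets` yields 20 keys while `scanKeysForFullKey` yields one, which is the difference between a query that runs everywhere and a point get.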

Down the road, we can allow ALTER TABLE to increase, but not decrease, the number of salt buckets. This would make it so that a point get would need to go everywhere too, though, so it's debatable whether we'd want to allow it.
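
A small sketch of why changing the bucket count affects point gets, assuming the salt byte is the key hash modulo the bucket count and that existing rows are not rewritten when the count changes. The class and method names are hypothetical.

```java
// Hypothetical illustration; assumes salt = hash mod bucket count and no rewrite of old rows.
final class ResaltSketch {

    // Bucket chosen for a given key hash under a given bucket count.
    static int bucket(int keyHash, int saltBuckets) {
        return Math.floorMod(keyHash, saltBuckets);
    }

    // True when a row written under oldBuckets would be looked up
    // in a different bucket once the table uses newBuckets.
    static boolean moves(int keyHash, int oldBuckets, int newBuckets) {
        return bucket(keyHash, oldBuckets) != bucket(keyHash, newBuckets);
    }
}
```

For example, a key whose hash is 25 lands in bucket 5 with 20 buckets but would be looked up in bucket 25 with 30 buckets, so a point get that computed only the new salt byte would miss the existing row.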

from phoenix.

tonyhuang commented on July 30, 2024

Hi James, a couple of questions about this:

  1. Are we still going to guarantee that the order of return is the same as the order of insertion for a table with buckets turned on?
  2. Do we need to do anything when the statement has no "where" clause with it?

jtaylor-sfdc commented on July 30, 2024

Good questions. No, I think we can return the rows in any order (otherwise, we'd have to sort, as you've alluded to, which would be too expensive). If the user wants them sorted, they can add an ORDER BY clause.
With no WHERE clause, as far as I can tell, we wouldn't need to do anything.
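
To illustrate the ordering point, here is a rough sketch of what a plain scan over salted keys would return, assuming rows are stored in unsigned byte order of the salted row key. The names are hypothetical, the salt values are passed in by hand so the example stays deterministic, and `Arrays.compareUnsigned` needs Java 9 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of scan order over salted keys.
final class ScanOrderSketch {

    // Builds a salted key from an explicit bucket and a string key.
    static byte[] salted(int bucket, String key) {
        byte[] raw = key.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[raw.length + 1];
        out[0] = (byte) bucket;
        System.arraycopy(raw, 0, out, 1, raw.length);
        return out;
    }

    // Returns the unsalted keys in the order a scan over the salted keys would visit them.
    static List<String> scanOrder(List<byte[]> saltedKeys) {
        List<byte[]> sorted = new ArrayList<>(saltedKeys);
        // Assumption: storage order is unsigned lexicographic order of the full key.
        sorted.sort((a, b) -> Arrays.compareUnsigned(a, b));
        List<String> unsalted = new ArrayList<>();
        for (byte[] key : sorted) {
            // Skip the leading salt byte when decoding.
            unsalted.add(new String(key, 1, key.length - 1, StandardCharsets.UTF_8));
        }
        return unsalted;
    }
}
```

If "a", "b", and "c" happen to land in buckets 2, 0, and 1, the scan visits them as b, c, a. That is neither insertion order nor key order, which is why a user who needs sorted output would add an ORDER BY.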

jtaylor-sfdc commented on July 30, 2024

Implemented now. Great job, @tonyhuang
