Comments (4)
At a high level, we'll provide a way of declaring that a table is "salted" in our DDL and then transparently insert a byte at the beginning of the row key on upsert. For any query, we'll insert an "or" of each possible bucket value where we form the start/stop of the scan. This will cause the query to get run everywhere. As a special case, when we know the full row key we won't do this, but will insert the single correct byte instead.
Detail on how to implement this:
- Define a SALT_BUCKETS property that can be set in the CREATE TABLE statement to declare how many buckets there are:
  CREATE TABLE foo ( host VARCHAR, date DATE CONSTRAINT pk PRIMARY KEY (host, date)) SALT_BUCKETS=20
- Pull this out of the properties Map in MetaDataClient.createTable and create a new SALT_BUCKETS column in SYSTEM.TABLE to store it.
- On any Put (PRowImpl), hash the row key, take the result modulo saltBuckets, and store this as the first byte of the row key.
- When we compute the start/stop scan key (WhereOptimizer), first add an "or" case (i.e. a List<byte[]> of each possible bucket byte, using the new mechanism being added by @ryang-sfdc for this issue) before we start computing the key. Include a special case at the end for a fully qualified row key, since we don't need the "or" in that case.
- In PTable, add a single byte PColumn as the first column for the salt bucket value.
- In RowProjector, add one to the index passed in to skip over the salt bucket column.
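The upsert and scan steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Phoenix's actual code: the class `SaltSketch`, its method names, and the use of `Arrays.hashCode` as the hash are all hypothetical, and the sketch assumes at most 128 buckets and non-empty start/stop keys.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the salting scheme; not Phoenix's actual code.
final class SaltSketch {

    // Bucket for a row key: a hash of the key modulo the bucket count.
    // Arrays.hashCode is a stand-in hash (an assumption for illustration).
    // Assumes 1 <= saltBuckets <= 128 so the value fits in a signed byte.
    static byte saltByte(byte[] rowKey, int saltBuckets) {
        return (byte) Math.floorMod(Arrays.hashCode(rowKey), saltBuckets);
    }

    // Returns a copy of key with the given byte prepended.
    static byte[] prepend(byte first, byte[] key) {
        byte[] out = new byte[key.length + 1];
        out[0] = first;
        System.arraycopy(key, 0, out, 1, key.length);
        return out;
    }

    // Used on upsert, and for a point lookup when the full row key is known.
    static byte[] salt(byte[] rowKey, int saltBuckets) {
        return prepend(saltByte(rowKey, saltBuckets), rowKey);
    }

    // The "or" over buckets: one {start, stop} pair per possible salt byte.
    // Assumes non-empty start and stop keys; open-ended scans would need
    // extra handling that this sketch leaves out.
    static List<byte[][]> scanRanges(byte[] start, byte[] stop, int saltBuckets) {
        List<byte[][]> ranges = new ArrayList<>();
        for (int b = 0; b < saltBuckets; b++) {
            ranges.add(new byte[][] { prepend((byte) b, start), prepend((byte) b, stop) });
        }
        return ranges;
    }

    public static void main(String[] args) {
        byte[] start = "host1".getBytes(StandardCharsets.UTF_8);
        byte[] stop = "host2".getBytes(StandardCharsets.UTF_8);
        System.out.println("salted key length: " + salt(start, 20).length);
        System.out.println("ranges scanned: " + scanRanges(start, stop, 20).size());
    }
}
```

A range query fans out to every bucket through `scanRanges`, while a lookup with the full row key calls `salt` once and touches a single bucket, which is the special case described above.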
Down the road, we can allow ALTER TABLE to increase, but not decrease, the number of salt buckets. This would make it so that a point get would need to go everywhere too, though, so it's debatable whether we'd want to allow it.
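To see why a larger bucket count affects point gets, consider a small sketch. The class `ResizeSketch` and the hash value 25 are hypothetical, chosen only to illustrate the arithmetic: the same hash maps to a different bucket once the modulus changes, so a row written under the old count can no longer be located from the new count alone.

```java
// Hypothetical sketch; the hash value is made up for illustration.
final class ResizeSketch {

    // Bucket for a given hash under a given bucket count.
    static int bucket(int hash, int saltBuckets) {
        return Math.floorMod(hash, saltBuckets);
    }

    public static void main(String[] args) {
        int hash = 25; // stand-in for the hash of some row key
        System.out.println(bucket(hash, 20)); // prints 5: bucket when written with 20 buckets
        System.out.println(bucket(hash, 40)); // prints 25: bucket computed after growing to 40
    }
}
```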
from phoenix.
Hi James, a couple of questions about this:
- Are we still going to guarantee that the order of return is the same as the order of insertion for a table with buckets turned on?
- Do we need to do anything when the statement has no "where" clause?
from phoenix.
Good questions. No, I think we can return the rows in any order (otherwise, we'd have to sort, as you've alluded to, which would be too expensive). If the user wanted them sorted, they could add an order by clause.
With no where clause, as far as I can tell, we wouldn't need to do anything.
from phoenix.
Implemented now. Great job, @tonyhuang
from phoenix.