phoenix's Introduction

Phoenix is a SQL skin over HBase, delivered as a client-embedded JDBC driver, powering the HBase use cases at Salesforce.com. Phoenix targets low-latency queries (milliseconds), as opposed to batch operation via map/reduce.

We've moved to Apache

Phoenix is now an Apache incubator project. See the announcement here and come visit our new home here.

phoenix's People

Contributors

aaraujo, anoopsjohn, apurtell, arunsingh16, colorant, elilevine, fakeb0b0b0b, haitaoyao, ivarley, james-taylor, jamesrtaylor, jmlvanre, jtaylor-sfdc, jyates, kutschm, lhofhansl, mravi, mujtabachohan, ndimiduk, prashantkommireddi, ryang-sfdc, saikiran2012, samarthjain, simontoens, srau, svc-scm, tonyhuang, zizon, zmehra


phoenix's Issues

slow query

SELECT max(val) FROM mytable WHERE date >= TO_DATE(?) AND date <= TO_DATE(?) AND host LIKE 'baz-%' AND host NOT LIKE '%foobar%' GROUP BY ROUND(date,'minute',?)

The row key is composed of a non-nullable host followed by a date.

Compile WHERE k IN (1,2,3) into batched get

When an IN (or the equivalent OR) appears in a query on the leading row key columns, compile it into a batched get to retrieve the query results more efficiently. Currently, the scan key is instead set to the min and max values of the IN list.
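A minimal sketch of the proposed compilation step, in Python for illustration (all names here are hypothetical, not Phoenix internals):

```python
def plan_scan(in_values):
    """Sketch: compile `WHERE k IN (...)` on a leading row key column.

    Current behavior: one range scan from min(in_values) to max(in_values),
    which may read many irrelevant rows in between. Proposed: a batched
    point get per value.
    """
    keys = sorted(set(in_values))
    current = ("RANGE_SCAN", keys[0], keys[-1])   # what Phoenix does today
    proposed = ("BATCHED_GET", keys)              # one Get per key, batched
    return current, proposed

current, proposed = plan_scan([3, 1, 2, 2])
# current  -> ("RANGE_SCAN", 1, 3)
# proposed -> ("BATCHED_GET", [1, 2, 3])
```

Point gets win when the IN values are sparse relative to the key range, since the range scan must read and filter every row between the min and max.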

Support HBase v. 0.94.0

  1. When using the SQuirreL SQL client, I got this exception:

java.util.concurrent.ExecutionException: java.lang.RuntimeException: com.salesforce.phoenix.exception.PhoenixIOException: ERROR 101 (08000): Unexpected IO exception.
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232)
at java.util.concurrent.FutureTask.get(FutureTask.java:91)
at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand.awaitConnection(OpenConnectionCommand.java:132)
at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand.access$100(OpenConnectionCommand.java:45)
at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand$2.run(OpenConnectionCommand.java:115)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.RuntimeException: com.salesforce.phoenix.exception.PhoenixIOException: ERROR 101 (08000): Unexpected IO exception.
at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand.executeConnect(OpenConnectionCommand.java:171)
at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand.access$000(OpenConnectionCommand.java:45)
at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand$1.run(OpenConnectionCommand.java:104)
... 6 more
Caused by: com.salesforce.phoenix.exception.PhoenixIOException: ERROR 101 (08000): Unexpected IO exception.
at com.salesforce.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:707)
at com.salesforce.phoenix.query.ConnectionQueryServicesImpl.createTable(ConnectionQueryServicesImpl.java:725)
at com.salesforce.phoenix.schema.MetaDataClient.createTable(MetaDataClient.java:324)
at com.salesforce.phoenix.compile.CreateTableCompiler$1.execute(CreateTableCompiler.java:81)
at com.salesforce.phoenix.jdbc.PhoenixStatement$ExecutableCreateTableStatement.executeUpdate(PhoenixStatement.java:274)
at com.salesforce.phoenix.jdbc.PhoenixStatement.executeUpdate(PhoenixStatement.java:666)
at com.salesforce.phoenix.util.SchemaUtil.initMetaData(SchemaUtil.java:330)
at com.salesforce.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:806)
at com.salesforce.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:151)
at com.salesforce.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:158)
at net.sourceforge.squirrel_sql.fw.sql.SQLDriverManager.getConnection(SQLDriverManager.java:133)
at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand.executeConnect(OpenConnectionCommand.java:167)
... 8 more
Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException: SYSTEM.TABLE: tried to access method org.apache.hadoop.hbase.regionserver.HRegion.getLock(Ljava/lang/Integer;[BZ)Ljava/lang/Integer; from class com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
at org.apache.hadoop.hbase.client.ServerCallable.translateException(ServerCallable.java:228)
at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:166)
at org.apache.hadoop.hbase.ipc.ExecRPCInvoker.invoke(ExecRPCInvoker.java:79)
at $Proxy7.createTable(Unknown Source)
at com.salesforce.phoenix.query.ConnectionQueryServicesImpl$8.call(ConnectionQueryServicesImpl.java:729)
at com.salesforce.phoenix.query.ConnectionQueryServicesImpl$8.call(ConnectionQueryServicesImpl.java:726)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$4.call(HConnectionManager.java:1463)
... 5 more
Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.DoNotRetryIOException: SYSTEM.TABLE: tried to access method org.apache.hadoop.hbase.regionserver.HRegion.getLock(Ljava/lang/Integer;[BZ)Ljava/lang/Integer; from class com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl
at com.salesforce.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:49)
at com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.createTable(MetaDataEndpointImpl.java:343)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.regionserver.HRegion.exec(HRegion.java:4770)
at org.apache.hadoop.hbase.regionserver.HRegionServer.execCoprocessor(HRegionServer.java:3457)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1376)
Caused by: java.lang.IllegalAccessError: tried to access method org.apache.hadoop.hbase.regionserver.HRegion.getLock(Ljava/lang/Integer;[BZ)Ljava/lang/Integer; from class com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl
at com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.createTable(MetaDataEndpointImpl.java:307)
... 12 more

at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:995)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
at $Proxy6.execCoprocessor(Unknown Source)
at org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1.call(ExecRPCInvoker.java:75)
at org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1.call(ExecRPCInvoker.java:73)
at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:163)
... 10 more

Test environment:
hbase-0.94.5 (1 master, 2 region servers)
squirrel 3.4.0
java 6

  2. When running the command:
    ./psql.sh localhost ../examples/stock_symbol.sql

I got this output:

log4j:WARN No appenders could be found for logger (org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

After that, there is no response at all.

I do not know what the problem is. Please give me some advice.

Support the ability for a client to do query more

Clients often need the ability to "page" through query results that have more rows than fit on a single screen. SQL provides LIMIT and OFFSET, which are often used for this purpose; however, supporting OFFSET in a performant manner is not possible with HBase. Instead, an alternative way of supporting query-more functionality (suggested by @lhofhansl) is through row value constructors, which have been in the SQL spec since SQL-92. PostgreSQL supports them as well, with the benefits explained well here.
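As a sketch of how row value constructors enable query-more, here is keyset pagination in Python against SQLite (3.15+ supports row values) as a stand-in engine; the table and column names are illustrative:

```python
import sqlite3

# The composite predicate (host, date) > (?, ?) resumes a scan exactly
# after the last row seen, without OFFSET's linear skip cost.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (host TEXT, date INTEGER, PRIMARY KEY (host, date))")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(h, d) for h in ("a", "b") for d in (1, 2, 3)])

def page(last_host, last_date, size):
    """Fetch the next `size` rows strictly after (last_host, last_date)."""
    return conn.execute(
        "SELECT host, date FROM t WHERE (host, date) > (?, ?) "
        "ORDER BY host, date LIMIT ?",
        (last_host, last_date, size)).fetchall()

first = page("", 0, 2)      # [('a', 1), ('a', 2)]
nxt = page(*first[-1], 2)   # [('a', 3), ('b', 1)]
```

Since the predicate matches the primary key order, each page is a single seek plus a short forward scan, which maps naturally onto an HBase scan with a computed start key.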

Drop table timeout with large number of rows

DROP TABLE times out with a large number of rows (a million or more). Increasing the timeout does not seem like the right solution, as it can take 10+ minutes to delete a table with a couple of million rows. One solution could be to use the HBase drop table operation instead of deleting rows.

Take advantage of region boundaries for min/max queries

The row key is led by a non-null date.

explain select max(date) from foobar where date < to_date('2013-02-17 00:00:00')

[["PLAN"],
["CLIENT PARALLEL 7-WAY RANGE SCAN OVER FOOBAR TO (2013-02-17 00:00:00) EXCLUSIVE"],
[" SERVER FILTER BY FirstKeyOnlyFilter"],
[" SERVER AGGREGATE INTO SINGLE ROW"]]

Does this plan imply we scan from the beginning of the table up to the specified endpoint?

Will not connect to a non-default port

If I try connecting to a ZooKeeper node that is not on port 2181, the connect string's port is ignored and the default 2181 is still used.

$ java -jar phoenix-1.0-client.jar zk2:2222 examples/stock_symbol.sql

13/01/30 18:12:42 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=zk2:2181 sessionTimeout=180000 watcher=hconnection
13/01/30 18:12:42 INFO zookeeper.ClientCnxn: Opening socket connection to server /192.168.0.52:2181
13/01/30 18:12:42 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 25176@nn1
13/01/30 18:12:42 INFO client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
13/01/30 18:12:42 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect

Use fuzzy row filter to optimize LIKE queries

Use the technique outlined by Alex Baranau in his blog to optimize queries like this:

SELECT * FROM web_stats WHERE domain LIKE 'foo%' AND date >= :1 AND date < :2

assuming the PK is domain+date. In this case, the scan would have a start key of 'foo' and a stop key of 'fop', and the filter would:

  1. jump to [domain column value] + [:1]
  2. include all while [domain column value] is the same and [date column value] < :2
  3. continue doing the above while more rows

In the case where the cardinality of domains starting with 'foo' is low and the cardinality of the dates is high, this will improve performance a lot.
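A sketch of the start/stop key computation for such a prefix scan (Python for illustration; the stop key is exclusive, following the usual HBase scan convention):

```python
def prefix_range(prefix: bytes):
    """Derive the (start, stop) scan keys for a LIKE 'foo%' prefix.

    The stop key is the prefix with its last byte incremented, so b'foo'
    scans to b'fop' (exclusive). Trailing 0xff bytes cannot be incremented
    and are dropped first; an all-0xff prefix means "scan to end of table".
    """
    start = prefix
    p = bytearray(prefix)
    while p and p[-1] == 0xFF:
        p.pop()
    if not p:
        return start, None  # no upper bound
    p[-1] += 1
    return start, bytes(p)

prefix_range(b"foo")  # -> (b'foo', b'fop')
```

The fuzzy filter then handles the date bounds within each matching domain, skipping forward rather than reading every row in the range.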

Support nested child rows

Unlike with standard relational databases, HBase allows you the flexibility of dynamically creating as many key values in a row as you'd like. Phoenix could leverage this by providing a way to model child rows inside of a parent row. The child row would be comprised of the set of key values whose column qualifier is prefixed with a known name and appended with the primary key of the child row. Phoenix could hide all this complexity, and allow querying over the nested children through joining to the parent row.

Have "phoenix.query.maxGlobalMemoryBytes" equivalent for specifying it as a percentage

This is the total amount of memory that all threads may use, specified in bytes. It has to be changed each time new hardware (with more or less RAM) is used, for instance. It would be useful to have a property that specifies a percentage of memory instead of a fixed number of bytes.

For example: phoenix.query.maxGlobalMemoryPercentage=60 (the percentage of total memory all threads may use)
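A sketch of how the proposed percentage setting could resolve to the byte budget the existing property takes (Python for illustration; in the JVM the total would come from Runtime.maxMemory()):

```python
def max_global_memory_bytes(percentage: int, total_heap_bytes: int) -> int:
    """Resolve the proposed maxGlobalMemoryPercentage setting into the
    byte value that phoenix.query.maxGlobalMemoryBytes takes today.
    The function name is illustrative, not a Phoenix API."""
    if not 0 < percentage <= 100:
        raise ValueError("percentage must be in (0, 100]")
    return total_heap_bytes * percentage // 100

max_global_memory_bytes(60, 4 * 1024**3)  # 60% of a 4 GiB heap
```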

Cannot connect to HBase

I use the following command to connect to ZooKeeper using Phoenix, but I cannot connect to HBase to run the test.

java -jar phoenix-1.0-client.jar 192.168.1.198:2181 examples/stock_symbol.sql

The following error was returned from Phoenix.

When I use SQuirreL SQL, the same exception occurs. Please help me find out why. Thanks a lot.

HBase version: 0.94.1
ZooKeeper version: 3.4.3

The exception is as following:

13/02/22 16:53:13 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.3-1240972, built on 02/06/2012 10:48 GMT
13/02/22 16:53:13 INFO zookeeper.ZooKeeper: Client environment:host.name=wbtest02
13/02/22 16:53:13 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_09
13/02/22 16:53:13 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
13/02/22 16:53:13 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/java/jdk1.7/jre
13/02/22 16:53:13 INFO zookeeper.ZooKeeper: Client environment:java.class.path=phoenix-1.0-client.jar
13/02/22 16:53:13 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
13/02/22 16:53:13 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
13/02/22 16:53:13 INFO zookeeper.ZooKeeper: Client environment:java.compiler=
13/02/22 16:53:13 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
13/02/22 16:53:13 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
13/02/22 16:53:13 INFO zookeeper.ZooKeeper: Client environment:os.version=3.5.0-17-generic
13/02/22 16:53:13 INFO zookeeper.ZooKeeper: Client environment:user.name=hadoop
13/02/22 16:53:13 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop
13/02/22 16:53:13 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/hadoop/hbase
13/02/22 16:53:13 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=192.168.1.198:2181 sessionTimeout=180000 watcher=hconnection
13/02/22 16:53:13 INFO zookeeper.ClientCnxn: Opening socket connection to server /192.168.1.198:2181
13/02/22 16:53:13 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 32682@wbtest02
13/02/22 16:53:13 INFO client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
13/02/22 16:53:13 INFO zookeeper.ClientCnxn: Socket connection established to wbtest02/192.168.1.198:2181, initiating session
13/02/22 16:53:13 INFO zookeeper.ClientCnxn: Session establishment complete on server wbtest02/192.168.1.198:2181, sessionid = 0x13ce72e6de4023f, negotiated timeout = 40000
13/02/22 16:53:13 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=192.168.1.198:2181 sessionTimeout=180000 watcher=hconnection
13/02/22 16:53:13 INFO zookeeper.ClientCnxn: Opening socket connection to server /192.168.1.198:2181
13/02/22 16:53:13 INFO client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
13/02/22 16:53:13 INFO zookeeper.ClientCnxn: Socket connection established to wbtest02/192.168.1.198:2181, initiating session
13/02/22 16:53:13 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 32682@wbtest02
13/02/22 16:53:13 INFO zookeeper.ClientCnxn: Session establishment complete on server wbtest02/192.168.1.198:2181, sessionid = 0x13ce72e6de40240, negotiated timeout = 40000
13/02/22 16:53:13 INFO client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x13ce72e6de4023f
13/02/22 16:53:13 INFO zookeeper.ZooKeeper: Session: 0x13ce72e6de4023f closed
13/02/22 16:53:13 INFO zookeeper.ClientCnxn: EventThread shut down
13/02/22 16:53:13 WARN client.HConnectionManager$HConnectionImplementation: Error executing for row
java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException: SYSTEM.TABLE: tried to access method org.apache.hadoop.hbase.regionserver.HRegion.getLock(Ljava/lang/Integer;[BZ)Ljava/lang/Integer; from class com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
at java.util.concurrent.FutureTask.get(FutureTask.java:111)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processExecs(HConnectionManager.java:1453)
at com.salesforce.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:605)
at com.salesforce.phoenix.query.ConnectionQueryServicesImpl.createTable(ConnectionQueryServicesImpl.java:639)
at com.salesforce.phoenix.schema.MetaDataClient.createTable(MetaDataClient.java:315)
at com.salesforce.phoenix.compile.CreateTableCompiler$1.execute(CreateTableCompiler.java:78)
at com.salesforce.phoenix.jdbc.PhoenixStatement$ExecutableCreateTableStatement.executeUpdate(PhoenixStatement.java:271)
at com.salesforce.phoenix.jdbc.PhoenixStatement.executeUpdate(PhoenixStatement.java:657)
at com.salesforce.phoenix.util.SchemaUtil.initMetaData(SchemaUtil.java:314)
at com.salesforce.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:720)
at com.salesforce.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:95)
at com.salesforce.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:100)
at java.sql.DriverManager.getConnection(DriverManager.java:579)
at java.sql.DriverManager.getConnection(DriverManager.java:243)
at com.salesforce.phoenix.util.PhoenixRuntime.main(PhoenixRuntime.java:154)
Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException: SYSTEM.TABLE: tried to access method org.apache.hadoop.hbase.regionserver.HRegion.getLock(Ljava/lang/Integer;[BZ)Ljava/lang/Integer; from class com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
at org.apache.hadoop.hbase.client.ServerCallable.translateException(ServerCallable.java:228)
at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:166)
at org.apache.hadoop.hbase.ipc.ExecRPCInvoker.invoke(ExecRPCInvoker.java:79)
at $Proxy7.createTable(Unknown Source)
at com.salesforce.phoenix.query.ConnectionQueryServicesImpl$8.call(ConnectionQueryServicesImpl.java:643)
at com.salesforce.phoenix.query.ConnectionQueryServicesImpl$8.call(ConnectionQueryServicesImpl.java:640)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$4.call(HConnectionManager.java:1441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.DoNotRetryIOException: SYSTEM.TABLE: tried to access method org.apache.hadoop.hbase.regionserver.HRegion.getLock(Ljava/lang/Integer;[BZ)Ljava/lang/Integer; from class com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl
at com.salesforce.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:49)
at com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.createTable(MetaDataEndpointImpl.java:344)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.hbase.regionserver.HRegion.exec(HRegion.java:4887)
at org.apache.hadoop.hbase.regionserver.HRegionServer.execCoprocessor(HRegionServer.java:3478)
at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1389)
Caused by: java.lang.IllegalAccessError: tried to access method org.apache.hadoop.hbase.regionserver.HRegion.getLock(Ljava/lang/Integer;[BZ)Ljava/lang/Integer; from class com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl
at com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.createTable(MetaDataEndpointImpl.java:308)
... 11 more

at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1021)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
at $Proxy6.execCoprocessor(Unknown Source)
at org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1.call(ExecRPCInvoker.java:75)
at org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1.call(ExecRPCInvoker.java:73)
at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:163)
... 10 more

java.sql.SQLException: org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException: SYSTEM.TABLE: tried to access method org.apache.hadoop.hbase.regionserver.HRegion.getLock(Ljava/lang/Integer;[BZ)Ljava/lang/Integer; from class com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl
at com.salesforce.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:623)
at com.salesforce.phoenix.query.ConnectionQueryServicesImpl.createTable(ConnectionQueryServicesImpl.java:639)
at com.salesforce.phoenix.schema.MetaDataClient.createTable(MetaDataClient.java:315)
at com.salesforce.phoenix.compile.CreateTableCompiler$1.execute(CreateTableCompiler.java:78)
at com.salesforce.phoenix.jdbc.PhoenixStatement$ExecutableCreateTableStatement.executeUpdate(PhoenixStatement.java:271)
at com.salesforce.phoenix.jdbc.PhoenixStatement.executeUpdate(PhoenixStatement.java:657)
at com.salesforce.phoenix.util.SchemaUtil.initMetaData(SchemaUtil.java:314)
at com.salesforce.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:720)
at com.salesforce.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:95)
at com.salesforce.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:100)
at java.sql.DriverManager.getConnection(DriverManager.java:579)
at java.sql.DriverManager.getConnection(DriverManager.java:243)
at com.salesforce.phoenix.util.PhoenixRuntime.main(PhoenixRuntime.java:154)
Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException: SYSTEM.TABLE: tried to access method org.apache.hadoop.hbase.regionserver.HRegion.getLock(Ljava/lang/Integer;[BZ)Ljava/lang/Integer; from class com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
at org.apache.hadoop.hbase.client.ServerCallable.translateException(ServerCallable.java:228)
at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:166)
at org.apache.hadoop.hbase.ipc.ExecRPCInvoker.invoke(ExecRPCInvoker.java:79)
at $Proxy7.createTable(Unknown Source)
at com.salesforce.phoenix.query.ConnectionQueryServicesImpl$8.call(ConnectionQueryServicesImpl.java:643)
at com.salesforce.phoenix.query.ConnectionQueryServicesImpl$8.call(ConnectionQueryServicesImpl.java:640)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$4.call(HConnectionManager.java:1441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.DoNotRetryIOException: SYSTEM.TABLE: tried to access method org.apache.hadoop.hbase.regionserver.HRegion.getLock(Ljava/lang/Integer;[BZ)Ljava/lang/Integer; from class com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl
at com.salesforce.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:49)
at com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.createTable(MetaDataEndpointImpl.java:344)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.hbase.regionserver.HRegion.exec(HRegion.java:4887)
at org.apache.hadoop.hbase.regionserver.HRegionServer.execCoprocessor(HRegionServer.java:3478)
at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1389)
Caused by: java.lang.IllegalAccessError: tried to access method org.apache.hadoop.hbase.regionserver.HRegion.getLock(Ljava/lang/Integer;[BZ)Ljava/lang/Integer; from class com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl
at com.salesforce.phoenix.coprocessor.MetaDataEndpointImpl.createTable(MetaDataEndpointImpl.java:308)
... 11 more

at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1021)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
at $Proxy6.execCoprocessor(Unknown Source)
at org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1.call(ExecRPCInvoker.java:75)
at org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1.call(ExecRPCInvoker.java:73)
at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:163)
... 10 more

Support equi-joins

Support joins, starting with hash joins. Some work on this has already been done. See HashCache* and HashJoiningRegionObserver.

error reporting for unexpected NOT NULL constraint

NOT NULL is not supported for non-row-key columns. Instead of rejecting a CREATE TABLE statement with a syntax error, e.g. "expected right parenthesis but found NOT", could a more direct error message be provided?

Support Secondary Indexes

Allow users to create indexes through a new CREATE INDEX DDL command and then behind the scenes build multiple projections of the table (i.e. a copy of the table using re-ordered or different row key columns). Phoenix will take care of maintaining the indexes when DML commands are issued and will choose the best table to use at query time.

Gather and maintain stats for HBase tables in a designated HBase table

Our current stats gathering is far too simplistic: per client connection to a cluster, we only cache the min and max key for each table. Instead, we should:

  1. have a system table that stores the stats
  2. create a coprocessor that updates the stats during compaction (i.e. using the preCompactSelection, postCompactSelection, preCompact, postCompact methods)
  3. keep a kind of histogram - the key boundary of every N bytes within a region. Perhaps we can do a delta update on minor compaction and a complete update on major compaction.
  4. keep the min key/max key of a table in the stats table too

Support HFile generation from map/reduce jobs

Using the connectionless option for the PhoenixDriver, plus running the DDL so that the connection knows the structure of the table into which you're upserting, Phoenix may be used to generate an HFile from a map/reduce job (see ConnectionlessUpsertTest for an example).

However, PhoenixRuntime.getUncommittedMutations(Connection) returns the list of mutations in an unexpected order when null is set as a column value.

Allow in-place schema evolution

Phoenix supports adding and removing columns through the ALTER TABLE DDL command, but changing the data type of, or renaming, an existing column is not yet supported.

Support COUNT DISTINCT

Supporting COUNT DISTINCT will require returning more state to the client for the final merge operation. For this case, we need to keep a map in the aggregation coprocessors of each distinct value and its count. The key of the rows returned by the coprocessor will then include the bytes of the distinct value. The merge on the client side would more or less stay the same, since the client merge (GroupedAggregatingResultIterator) already does a final aggregation based on the keys of the rows it sees.

Another additional built-in function that could be added would be an approximate count distinct function - there are some interesting algorithms for this used by other open source projects.

Implement TABLESAMPLE clause

Support the standard SQL TABLESAMPLE clause by implementing a filter that uses a skip-next hint based on the region boundaries of the table to return only n rows per region.
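The idea can be illustrated with a small model (not the HBase Filter API): once n rows have been returned for the current region, a real filter would issue a seek hint to jump to the next region's start key rather than scanning the rest of the region.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of the skip-scan sampling idea: return at most
// rowsPerRegion keys per region, where regions are given by their start keys.
public class SampleScan {
    public static List<String> sample(List<String> sortedKeys,
                                      List<String> regionStartKeys,
                                      int rowsPerRegion) {
        List<String> out = new ArrayList<>();
        int region = 0, taken = 0;
        for (String key : sortedKeys) {
            // advance to the region containing this key
            while (region + 1 < regionStartKeys.size()
                    && key.compareTo(regionStartKeys.get(region + 1)) >= 0) {
                region++;
                taken = 0;
            }
            if (taken < rowsPerRegion) { // a real filter would SEEK past the rest
                out.add(key);
                taken++;
            }
        }
        return out;
    }
}
```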

Use stats to guide query parallelization

We're currently not using stats, beyond a table-wide min key/max key cached per client connection, to guide parallelization. If a query targets just a few regions, we don't know how to evenly divide the work among threads, because we don't know the data distribution. A separate issue targets gathering and maintaining the stats, while this issue is focused on using them.

The main changes are:

  1. Create a PTableStats interface that encapsulates the stats information (and implements the Writable interface so that it can be serialized back from the server).
  2. Add a stats member variable off of PTable to hold this.
  3. From MetaDataEndPointImpl, look up the stats row for the table in the stats table. If the stats have changed, return a new PTable with the updated stats information. We may want to cache the stats row and have the stats gatherer invalidate the cached row when it is updated, so we don't always have to do a scan for it. Additionally, it would be ideal if we could use the same split policy on the stats table that we use on the system table, to guarantee co-location of data (for the sake of caching).
  4. Modify the client-side parallelization (ParallelIterators.getSplits()) to use this information to guide how to chunk up the scans at query time.

This should help boost query performance, especially in cases where the data is highly skewed. It's likely the cause for the slowness reported in this issue: #47.
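Item 4 above can be sketched in miniature (keys shown as strings for simplicity; real code works on byte arrays): instead of cutting a scan range into equal-width key chunks, cut it at the stats guideposts that fall inside the range, so each chunk covers roughly the same amount of data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of guidepost-driven chunking: the returned list holds
// the boundary keys of each parallel scan chunk (start, interior guideposts
// that fall inside the range, stop).
public class ScanSplitter {
    public static List<String> splitPoints(String startKey, String stopKey,
                                           List<String> guideposts) {
        List<String> bounds = new ArrayList<>();
        bounds.add(startKey);
        for (String gp : guideposts) {
            if (gp.compareTo(startKey) > 0 && gp.compareTo(stopKey) < 0) {
                bounds.add(gp); // each adjacent pair becomes one scan chunk
            }
        }
        bounds.add(stopKey);
        return bounds;
    }
}
```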

Support HBase 0.94.4 and above

HBase 0.94.4 introduced some non-backward-compatible interface changes to RegionScanner, for which Phoenix has implementations. Detect this and do the right thing when running on HBase 0.94.4 or above. We can also take advantage of these interface changes to make our aggregate queries run faster.

Allow columns to be defined at query time

Sometimes defining a schema up front is not feasible. Instead, a subset of columns may be specified up front when creating the table while the rest would be specified at query time. One way of specifying this could be to define the columns in parens after the table in the FROM clause like this:

SELECT col1,col2,col3 FROM my_table(col2 VARCHAR, col3 INTEGER) WHERE col3 > 10

From an implementation point of view, this would not be too hard to do. Phoenix caches metadata for a table on the client-side through the PTable interface. We could create a new implementation associated with the statement context that delegates to the statically defined one, but allows new columns to be added.
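A toy version of that delegation idea might look like this (PTable itself is not reproduced; the class and method names are made up): column lookups consult the statically defined schema first, and fall back to the columns declared inline in the FROM clause for this statement only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a statement-scoped resolver that delegates to the
// static schema and layers query-time ("dynamic") columns on top.
public class DynamicColumnResolver {
    private final Map<String, String> staticColumns;            // name -> SQL type
    private final Map<String, String> dynamicColumns = new HashMap<>();

    public DynamicColumnResolver(Map<String, String> staticColumns) {
        this.staticColumns = staticColumns;
    }

    public void addDynamicColumn(String name, String sqlType) {
        dynamicColumns.put(name, sqlType);                      // FROM-clause column
    }

    public String resolveType(String name) {
        String type = staticColumns.get(name);                  // static schema wins
        return type != null ? type : dynamicColumns.get(name);
    }
}
```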

Support transparent salting of row key

To prevent hot spotting on writes:

  • “Salt” row key on upsert by mod-ing with cluster size
  • Query for fully qualified key by inserting salt byte
  • Range scan by concatenating results of scan over all possible salt bytes

Or alternately

  • Define column used for hash to derive row key prefix
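A minimal sketch of the first alternative (the hash function and bucket count are illustrative choices, not a spec): prepend a salt byte derived from a hash of the row key mod the number of buckets. Point lookups re-derive the salt from the key; range scans must fan out over every possible salt byte.

```java
// Minimal salting sketch: salt byte = hash(rowKey) mod bucket count,
// prepended to the row key so writes spread across regions.
public class Salter {
    public static byte saltByte(byte[] rowKey, int buckets) {
        int hash = 0;
        for (byte b : rowKey) {
            hash = 31 * hash + b;                    // any stable hash works
        }
        return (byte) ((hash & 0x7fffffff) % buckets); // mask keeps it non-negative
    }

    public static byte[] salted(byte[] rowKey, int buckets) {
        byte[] out = new byte[rowKey.length + 1];
        out[0] = saltByte(rowKey, buckets);          // salt prefix
        System.arraycopy(rowKey, 0, out, 1, rowKey.length);
        return out;
    }
}
```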

Add support for binary keys

It's not mentioned anywhere in the documentation how Phoenix handles binary keys.
I am wondering if it's possible to create scans based on user-defined factory classes that know the table's key format.

Multi-byte character support not working on Mac

See the following test failures that occur when run on a Mac (they pass on Linux, though):

```
Failed tests:
testSubstrFunction(com.salesforce.phoenix.end2end.VariableLengthPKTest): expected: but was:
testRegexReplaceFunction(com.salesforce.phoenix.end2end.VariableLengthPKTest): expected:<#[ # \u011c\u011e \u03d7\u03d8\u03db?]?> but was:<#[? # # #]?>
testRegexpSubstrFunction(com.salesforce.phoenix.end2end.VariableLengthPKTest): expected:<?[]?\u0115\u011c\u011e> but was:<?[??]?\u0115\u011c\u011e>
testLengthFunction(com.salesforce.phoenix.end2end.VariableLengthPKTest): SELECT length('\u025a\u0266\u0270\u0278') FROM BTABLE LIMIT 1 expected:<[4]> but was:<[8]>
```

Exceptions below:

```java
com.salesforce.phoenix.end2end.VariableLengthPKTest

testSubstrFunction(com.salesforce.phoenix.end2end.VariableLengthPKTest)
org.junit.ComparisonFailure: expected:<\u0192[é]> but was:<\u0192[]>
	at org.junit.Assert.assertEquals(Assert.java:125)
	at org.junit.Assert.assertEquals(Assert.java:147)
	at com.salesforce.phoenix.end2end.VariableLengthPKTest.testSubstrFunction(VariableLengthPKTest.java:960)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

testRegexReplaceFunction(com.salesforce.phoenix.end2end.VariableLengthPKTest)
org.junit.ComparisonFailure: expected:<#[ # \u0192ú\u0192û \u0153ó\u0153ò\u0153õ\u0153]¢> but was:<#[è # #]¢>
	at org.junit.Assert.assertEquals(Assert.java:125)
	at org.junit.Assert.assertEquals(Assert.java:147)
	at com.salesforce.phoenix.end2end.VariableLengthPKTest.testRegexReplaceFunction(VariableLengthPKTest.java:1019)
	(remaining frames identical to the first failure)

testRegexpSubstrFunction(com.salesforce.phoenix.end2end.VariableLengthPKTest)
org.junit.ComparisonFailure: expected:<\u0192[]í\u0192ï\u0192ú\u0192û> but was:<\u0192[è\u0192]í\u0192ï\u0192ú\u0192û>
	at org.junit.Assert.assertEquals(Assert.java:125)
	at org.junit.Assert.assertEquals(Assert.java:147)
	at com.salesforce.phoenix.end2end.VariableLengthPKTest.testRegexpSubstrFunction(VariableLengthPKTest.java:1081)
	(remaining frames identical to the first failure)

testLengthFunction(com.salesforce.phoenix.end2end.VariableLengthPKTest)
org.junit.ComparisonFailure: SELECT length('\u2026ö\u2026¶\u2026\u221e\u2026\u220f') FROM BTABLE LIMIT 1 expected:<[4]> but was:<[8]>
	at org.junit.Assert.assertEquals(Assert.java:125)
	at com.salesforce.phoenix.end2end.VariableLengthPKTest.testLengthFunction(VariableLengthPKTest.java:1336)
	(remaining frames identical to the first failure)
```

Fix findbug warnings

Run this command:

```
mvn compile site
```

and check out the warnings generated in target/site/findbugs.html.

Support derived tables

Add support for derived queries of the form:
SELECT * FROM ( SELECT company, revenue FROM Company ORDER BY revenue) LIMIT 10

Adding support for this requires a compile-time change as well as a runtime execution change. A first version of the compile-time change could limit aggregation to either the inner or the outer query, but not both. In this case, the inner and outer queries can be combined into a single query, with the outer select becoming just a remapping of a subset of the inner select's projection. A second version could handle aggregation in both the inner and outer select by performing the outer aggregation client-side (likely a less common scenario).

For the runtime execution change, the UngroupedAggregateRegionObserver would be modified to look for a new "TopNLimit" attribute with an int value in the Scan. This would control the maximum number of values for the coprocessor to hold on to as the scan is performed. Then the GroupedAggregatingResultIterator would be modified to keep the top N values received back from all the child iterators.
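The client-side half of that can be sketched as follows ("TopNLimit" is the proposed attribute, not an existing one; the merge here is a toy over integers): each region returns at most N ordered candidates, and the client keeps the global top N across all of them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of the client-side top-N merge: each region scan honors a
// hypothetical "TopNLimit" and returns at most n candidate rows; the
// client merges all per-region candidates and keeps the global top n
// (here: the n smallest values, as for ORDER BY ... ASC LIMIT n).
public class TopNMerge {
    public static List<Integer> topN(List<List<Integer>> perRegion, int n) {
        // max-heap of the n smallest values seen so far
        PriorityQueue<Integer> heap = new PriorityQueue<>(Comparator.reverseOrder());
        for (List<Integer> regionRows : perRegion) {
            for (int value : regionRows) {
                heap.offer(value);
                if (heap.size() > n) {
                    heap.poll();                     // drop the current largest
                }
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(null);                           // natural ascending order
        return result;
    }
}
```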

Support semi/anti-joins

A semi-join between two tables returns rows from the first table where one or more matches are found in the second table. The difference between a semi-join and a conventional join is that rows in the first table will be returned at most once. Even if the second table contains two matches for a row in the first table, only one copy of the row will be returned. Semi-joins are written using the EXISTS or IN constructs.

An anti-join is the opposite of a semi-join and is written using the NOT EXISTS or NOT IN constructs.

There's a pretty good write-up here on semi/anti joins.
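The semantics can be illustrated with a toy hash-based implementation (join keys modeled as strings): a semi-join keeps each left row at most once when any match exists on the right, even if the right side contains duplicates; an anti-join keeps the left rows with no match.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Toy illustration of semi-join (EXISTS / IN) and anti-join
// (NOT EXISTS / NOT IN) semantics over join-key values.
public class SemiJoin {
    public static List<String> semiJoin(List<String> left, List<String> rightKeys) {
        Set<String> matches = new HashSet<>(rightKeys);  // dedupes the right side
        return left.stream().filter(matches::contains).collect(Collectors.toList());
    }

    public static List<String> antiJoin(List<String> left, List<String> rightKeys) {
        Set<String> matches = new HashSet<>(rightKeys);
        return left.stream().filter(k -> !matches.contains(k)).collect(Collectors.toList());
    }
}
```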

Collect usage and performance metrics

I'd like to know how much CPU, physical I/O, logical I/O, wait time, blocking time, and transmission time was spent for each thread of execution across the HBase cluster, within coprocessors, and within the client's Phoenix thread pools for each query.

Here are some of the problems I want to solve:

  1. Every component has one or more configurable thread pools, and I have no idea how to gather data to make any decisions.
  2. Queries that I think should be fast turn out to be dog slow, e.g., select foo from bar where foo like 'abc%' group by foo. Without attaching a profiler to HBase, which most people won't bother with, it's not clear why it's slow.

ORDER BY with LIMIT evaluated in wrong order

Using the stock symbol example, if we do these queries:

SELECT SYMBOL FROM STOCK_SYMBOL ORDER BY SYMBOL DESC LIMIT 3;
SELECT SYMBOL FROM STOCK_SYMBOL ORDER BY SYMBOL ASC LIMIT 3;

One would expect different symbols in each set. Instead we get this result:

SYMBOL     
---------- 
GOOG       
CRM        
AAPL       
SYMBOL     
---------- 
AAPL       
CRM        
GOOG       

This implies that the LIMIT was applied first, then the ORDER BY. I could be wrong, but I believe most databases would not evaluate these expressions in that order.
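The expected evaluation order can be shown with a tiny model of the two queries (the symbol list is made-up sample data): sort the full result set first, then apply the limit to the already-sorted rows, so ASC and DESC with the same limit return different sets.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Models ORDER BY symbol [ASC|DESC] LIMIT n: sort the whole result set
// first, then take the first n rows of the sorted output.
public class OrderByLimit {
    public static List<String> query(List<String> symbols, boolean descending, int limit) {
        List<String> sorted = new ArrayList<>(symbols);
        Comparator<String> cmp = descending ? Comparator.reverseOrder()
                                            : Comparator.naturalOrder();
        sorted.sort(cmp);                                 // ORDER BY first
        return sorted.subList(0, Math.min(limit, sorted.size())); // then LIMIT
    }
}
```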

Show Table

It would be nice if there were a SHOW TABLES; command available.

-Christopher
