Comments (6)
No, I'm afraid it does not.
from happybase.
The short answer is no. The long answer is, well, a bit longer. :) Let me explain.
HBase is not a relational database, but a fancy distributed multidimensional key/value store (not completely correct, but this description suffices for now). HappyBase is just a nice API around the Thrift interface exposed by HBase, so that it can be used outside Java processes running on your cluster. The HBase Thrift interface, in turn, exposes (part of) the HBase API, which is modelled after the HBase data model (column families, qualifiers, timestamps, row keys).
This HBase data model is more "low level" than that of a traditional feature-rich RDBMS. Aggregation is not part of the HBase data model, nor is it provided by its access methods. This means features like count() and sum() have to be implemented client-side, i.e. in your application.
If you're not aggregating massive amounts of data, e.g. because you have already limited your selection to a specific range of your data (design your row keys well!), a Table.scan() with some client-side logic is what you need. Something like this might work for you (note: untested!):
import struct
import happybase
c = happybase.Connection(...)
t = c.table('mytable')
unpacker = struct.Struct('>I').unpack
s = t.scan(row_prefix='foobar')
total = sum(unpacker(row_data['some-column']) for row_key, row_data in s)
print("The sum is %d" % total)
On the other hand, if you need aggregate information over massive amounts of data, you should either calculate it up front by keeping counters (e.g. using table.counter_inc()), or calculate it using MapReduce jobs.
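As a rough illustration of the counter approach, the sketch below updates a running sum every time a value is written, so the total can later be read with a single counter_get() call instead of a scan. The row and column names ('totals', 'stats:sum', 'cf:input1') and the store_value() helper are hypothetical, and InMemoryTable is only a stand-in so the example runs without a cluster; with a real happybase.Table the same put(), counter_inc() and counter_get() calls would be used.

```python
import struct


def store_value(table, row_key, column, value, totals_row=b'totals',
                totals_column=b'stats:sum'):
    """Write one value and update a running sum in the same code path."""
    # The value itself, stored as an unsigned 32-bit big-endian integer.
    table.put(row_key, {column: struct.pack('>I', value)})
    # counter_inc() adds `value` to the counter cell and returns the new total.
    return table.counter_inc(totals_row, totals_column, value=value)


class InMemoryTable:
    """Hypothetical stand-in for happybase.Table (no HBase needed)."""

    def __init__(self):
        self.rows = {}
        self.counters = {}

    def put(self, row, data):
        self.rows.setdefault(row, {}).update(data)

    def counter_inc(self, row, column, value=1):
        key = (row, column)
        self.counters[key] = self.counters.get(key, 0) + value
        return self.counters[key]

    def counter_get(self, row, column):
        return self.counters.get((row, column), 0)


table = InMemoryTable()
for i, value in enumerate([10, 20, 12]):
    store_value(table, b'foobar-%d' % i, b'cf:input1', value)

running_total = table.counter_get(b'totals', b'stats:sum')
print("The sum is %d" % running_total)
```

Note that the put and the counter increment are two separate calls, so they are not atomic as a pair; a failure between them would leave the counter out of step with the stored data.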
If you're just interested in counting the rows in an HBase table because you want to know how much data you have: the HBase shell has a 'count' command for that. It just opens a scanner and counts all rows, so this is a very expensive (I/O heavy) operation.
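The same scan-and-count idea can be sketched from Python. This is a minimal example, not a faster alternative: it still touches every row. It assumes the filter string 'FirstKeyOnlyFilter() AND KeyOnlyFilter()' is accepted by your HBase Thrift server, which cuts the data sent back per row but not the number of rows scanned. The count_rows() helper is hypothetical, and InMemoryTable is only a stand-in so the example runs without a cluster.

```python
def count_rows(table, row_prefix=None):
    """Count rows client-side by scanning and discarding the results."""
    scanner = table.scan(
        row_prefix=row_prefix,
        # Assumption: the server supports these filters; they ask for only
        # the first cell of each row, with its value stripped.
        filter='FirstKeyOnlyFilter() AND KeyOnlyFilter()',
    )
    return sum(1 for _ in scanner)


class InMemoryTable:
    """Hypothetical stand-in for happybase.Table (ignores the filter)."""

    def __init__(self, rows):
        self.rows = rows

    def scan(self, row_prefix=None, filter=None):
        for key in sorted(self.rows):
            if row_prefix is None or key.startswith(row_prefix):
                yield key, self.rows[key]


table = InMemoryTable({b'foobar-1': {}, b'foobar-2': {}, b'other-1': {}})
row_count = count_rows(table, row_prefix=b'foobar')
print("Rows with prefix: %d" % row_count)
```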
Thanks wbolster. I have 2 questions:
- Does HappyBase support the coprocessors that come with HBase?
- I am new to Python. When I run the code you gave (I know it is untested), I get the error below. The column cf1:input1 is an integer. You can probably fix it by looking at it.
$ /usr/happybase/bin/python s1.py
Traceback (most recent call last):
  File "s1.py", line 9, in <module>
    total = sum(unpacker((data['cf:input1'])) for key, data in s)
TypeError: unsupported operand type(s) for +: 'int' and 'tuple'
Thanks
- No, HappyBase is limited to the HBase Thrift API. Coprocessors run inside your cluster; Thrift is mostly for access from the outside.
- Sorry, it should have read unpacker(...)[0] instead. Obviously this only works if your numbers are stored as unsigned 32-bit big-endian integers (the struct format in my example).
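To make the correction concrete: struct's unpack() always returns a tuple, even for a single value, so sum() ends up adding a tuple to its integer start value, which is the TypeError in the traceback. A minimal sketch of the fixed summation follows. The sum_column() helper and the sample rows are hypothetical; the rows imitate the (row_key, row_data) pairs that Table.scan() yields, so it runs without a cluster.

```python
import struct

# '>I' = unsigned 32-bit big-endian integer; unpack() always returns a tuple.
unpack_uint32 = struct.Struct('>I').unpack


def sum_column(rows, column):
    """Sum one column over (row_key, row_data) pairs, as from Table.scan()."""
    # [0] takes the single integer out of the 1-tuple returned by unpack().
    return sum(unpack_uint32(row_data[column])[0] for row_key, row_data in rows)


# Hypothetical scan results: values packed the same way they are unpacked.
rows = [
    (b'foobar-1', {b'cf:input1': struct.pack('>I', 1)}),
    (b'foobar-2', {b'cf:input1': struct.pack('>I', 2)}),
    (b'foobar-3', {b'cf:input1': struct.pack('>I', 3)}),
]

total = sum_column(rows, b'cf:input1')
print("The sum is %d" % total)
```

With a real table, the call would look like sum_column(t.scan(row_prefix=b'foobar'), b'cf:input1'). If the values were written in another encoding (for example as ASCII digits), the struct format has to change accordingly.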
@wbolster I have more than 2 lakh (200,000) records in HBase and I need to get the total row count of that table. Right now I'm using FirstKeyOnlyFilter() and KeyOnlyFilter(), but performance is poor. Is there any way I could get the total count of an HBase table?
I'm using Python 3.5 and HappyBase 1.2.0.
This is an HBase question, not a HappyBase one. But I'm afraid the answer is no...