Comments (18)
Complete answer/example:
scanner = table.scan(
    row_start=b'aaa',
    row_stop=b'bbb',
    filter=b'KeyOnlyFilter() AND FirstKeyOnlyFilter()',
)
for row_key, data in scanner:
    pass  # do something with row_key
from happybase.
Just figured out that filter=b'KeyOnlyFilter() AND FirstKeyOnlyFilter()'
is even better for your use case (counting rows).
from happybase.
No, this is not possible, and functionality like that is not in the Thrift API either.
I think you should rethink your design. Scanning rows like you suggested is horribly inefficient, since it results in a lot of useless I/O on the region servers (the data is still read from disk, even though it will not be used). A better option is to keep aggregate counters when inserting data (use Table.counter_inc() for that) and build your pagination using that information.
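A minimal sketch of the counter idea, assuming a happybase Table object is passed in. The counter row key b'stats' and column b'meta:row_count' are hypothetical names picked for illustration, and put_and_count(), total_rows() and page_count() are hypothetical helpers, not part of happybase:

```python
# Hypothetical location of the aggregate counter (not a happybase convention).
COUNTER_ROW = b'stats'
COUNTER_COLUMN = b'meta:row_count'


def put_and_count(table, row_key, data):
    """Store a row and bump the aggregate counter in the same code path."""
    table.put(row_key, data)
    # Table.counter_inc() increments the counter and returns the new value.
    return table.counter_inc(COUNTER_ROW, COUNTER_COLUMN)


def total_rows(table):
    """Read the aggregate count without scanning any data rows."""
    return table.counter_get(COUNTER_ROW, COUNTER_COLUMN)


def page_count(total, page_size):
    """Number of pages needed to show `total` rows (ceiling division)."""
    return (total + page_size - 1) // page_size
```

Note that this sketch counts every put, so overwriting an existing row key would be counted twice. It may also be preferable to keep the counter in a separate table so that it does not show up in scans of the data rows.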
from happybase.
Oh, and for the 'next page' link you should remember the last row key from the current page and scan from that row onwards.
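A rough sketch of that 'next page' approach, assuming a happybase Table object. fetch_page() is a hypothetical helper name, and appending a zero byte to the remembered key is one common way to resume strictly after it:

```python
def fetch_page(table, page_size, last_row_key=None):
    """Return (rows, last_key) for the page that follows last_row_key."""
    # Appending a zero byte gives the smallest key that sorts strictly
    # after last_row_key, so the previous page's last row is not repeated.
    row_start = None if last_row_key is None else last_row_key + b'\x00'
    rows = list(table.scan(row_start=row_start, limit=page_size))
    # Remember the last row key of this page for the 'next page' link.
    last_key = rows[-1][0] if rows else None
    return rows, last_key
```

The caller would pass the returned last_key back in as last_row_key to fetch the following page; an empty result means there are no more rows.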
from happybase.
Hi, out of curiosity: is your problem solved?
from happybase.
Not really. My problem is getting the row keys within a given range, not all row keys.
So it would look like get_all_row_keys(start_row, end_row)
and return [row_key_1, row_key_2, ..., row_key_last_index].
I was looking at KeyOnlyFilter() http://hbase.apache.org/book/thrift.html but that gives column keys too.
from happybase.
Have you looked at the part about scanners in the tutorial? That can be used to specify start and stop keys. Combine it with FirstKeyOnlyFilter to avoid sending complete rows (but only a single cell per row) over the wire. I think it's not going to get any better than that with the current Thrift API (and not with the Java API either).
from happybase.
I missed FirstKeyOnlyFilter. It looks like that should return only the row key plus the first column key and value, which is definitely better than getting all column keys (and values).
from happybase.
Code example (untested):
scanner = table.scan(row_start=b'aaa', row_stop=b'bbb', filter=b'FirstKeyOnlyFilter()')
row_keys = [key for key, data in scanner]
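Wrapped up as the get_all_row_keys(start_row, end_row) helper asked about in this thread, it might look like the sketch below. The function is hypothetical (not part of happybase) and takes the table as an extra argument; it uses the compound filter mentioned in this thread so that cell values are stripped as well:

```python
def get_all_row_keys(table, start_row, end_row):
    """Return the row keys from start_row up to (not including) end_row."""
    scanner = table.scan(
        row_start=start_row,
        row_stop=end_row,  # the stop row itself is excluded by the scan
        filter=b'KeyOnlyFilter() AND FirstKeyOnlyFilter()',
    )
    # Each item is a (row_key, data) pair; only the key is kept.
    return [row_key for row_key, _ in scanner]
```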
from happybase.
Yeah, I tested using my data and it works
from happybase.
Cool thanks!!
from happybase.
I have just opened issue #14. Ideas and patches welcome. :)
from happybase.
Haha! Sure, but I like what you have currently: a thin client that acts as a "pass-through" to the Thrift service on the HBase server. This way, you don't have to update the Python client API whenever the Thrift service updates the list of commands and filters it supports.
I will add some examples on how to construct proper filter string expressions.
from happybase.
Great, thanks. I agree with you about keeping up to date, but some helper functions might be useful nonetheless, mostly for properly escaping binary data and so on.
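As a rough idea of what such a helper could look like, here is a sketch. Both function names are hypothetical and nothing like this is confirmed to exist in happybase. It assumes that, in the HBase filter language, string arguments are wrapped in single quotes and an embedded single quote is escaped by doubling it; other escaping concerns for arbitrary binary data are not handled here:

```python
def quote_filter_arg(value):
    """Wrap a bytes value in single quotes for use in a filter string.

    Assumption: an embedded single quote is escaped by doubling it.
    """
    return b"'" + value.replace(b"'", b"''") + b"'"


def prefix_filter(prefix):
    """Build a PrefixFilter expression for the given row key prefix."""
    return b'PrefixFilter(' + quote_filter_arg(prefix) + b')'
```

The result could then be passed as the filter argument, for example table.scan(filter=prefix_filter(b'user-')).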
from happybase.
Yeah, I probably should have updated this thread, but I was already using the above compound filter.
from happybase.
I assumed so, but I posted it anyway for posterity and for others on the internet who may stumble upon this issue. :)
from happybase.
table.scan(filter=b'KeyOnlyFilter() AND FirstKeyOnlyFilter()')
works like a charm!
Thank you @wbolster!
from happybase.
table.scan(filter=b'KeyOnlyFilter() AND FirstKeyOnlyFilter()')
This is the most efficient approach I have found so far. Thanks, wbolster.
from happybase.