Coder Social home page Coder Social logo

Comments (18)

wbolster avatar wbolster commented on August 13, 2024 2

Complete answer/example:

scanner = table.scan(
    row_start=b'aaa',
    row_stop=b'bbb',
    filter=b'KeyOnlyFilter() AND FirstKeyOnlyFilter()',
)

for row_key, data in scanner:
    pass  # do something with row_key

from happybase.

wbolster avatar wbolster commented on August 13, 2024 1

Just figured out that filter=b'KeyOnlyFilter() AND FirstKeyOnlyFilter() is even better for your use case (counting rows).

from happybase.

wbolster avatar wbolster commented on August 13, 2024

No, this is not possible, and functionality like that is not in the Thrift API either.

I think you should rethink your design. Scanning rows like you suggested is horribly inefficient, since it results in a lot of useless I/O on the region servers (the data is still read from disk, even though it will not be used). A better option is to keep aggregate counters when inserting data (use Table.counter_inc() for that) and build your pagination using that information.

from happybase.

wbolster avatar wbolster commented on August 13, 2024

Oh, and for the 'next page' link you should remember the last row key from the current page and scan from that row onwards.

from happybase.

wbolster avatar wbolster commented on August 13, 2024

Hi, out of curiosity: is your problem solved?

from happybase.

vskr avatar vskr commented on August 13, 2024

Not really. My problem is getting row keys between given range, and not all row keys.

so it would look like get_all_row_keys(start_row, end_row):
and returns [row_key_1, row_key_2,....row_key_last_index]

I was looking at KeyOnlyFilter() http://hbase.apache.org/book/thrift.html but that gives column keys too

from happybase.

wbolster avatar wbolster commented on August 13, 2024

Have you looked at the part about scanners in the tutorial? That can be used to specify start and stop keys. Combine it with FirstKeyOnlyFilter to avoid sending complete rows (but only a single cell per row) over the wire. I think it's not going to get any better than that with the current Thrift API (and not with the Java API either).

from happybase.

vskr avatar vskr commented on August 13, 2024

Missed FirstKeyOnlyFilter, looks like that should return only row_keys and first column key and value, which is definitely better than getting all column keys (and values)

from happybase.

wbolster avatar wbolster commented on August 13, 2024

Code example (untested):

scanner = table.scan(row_start=b'aaa', row_stop=b'bbb', filter=b'FirstKeyOnlyFilter()')
row_keys = [key for key, data in scanner] 

from happybase.

vskr avatar vskr commented on August 13, 2024

Yeah, I tested using my data and it works

from happybase.

vskr avatar vskr commented on August 13, 2024

Cool thanks!!

from happybase.

wbolster avatar wbolster commented on August 13, 2024

I have just opened issue #14. Ideas and patches welcome. :)

from happybase.

vskr avatar vskr commented on August 13, 2024

haha! Sure, but I like what you have currently. Thin client which acts as "pass through" to Thrift service on hbase server. This way, you don't have to update python-client-api whenever Thrift service updates list of commands/filters it supports.

I will add some examples on how to construct proper filter string expressions

from happybase.

wbolster avatar wbolster commented on August 13, 2024

Great, thanks. I agree with you about keeping up to date, but some helper functions might be useful nonetheless, mostly for properly escaping binary data and so on.

from happybase.

vskr avatar vskr commented on August 13, 2024

Yeah, I probably should have updated this thread. But I was already using the above compound filter

from happybase.

wbolster avatar wbolster commented on August 13, 2024

I assumed so, but I posted it anyway for posteriority and for others on the internet who may stumble upon this issue. :)

from happybase.

 avatar commented on August 13, 2024
table.scan(filter=b'KeyOnlyFilter() AND FirstKeyOnlyFilter()')

works like a charm!
Thank you @wbolster!

from happybase.

bhalgat20 avatar bhalgat20 commented on August 13, 2024

table.scan(filter=b'KeyOnlyFilter() AND FirstKeyOnlyFilter()')

This the most efficient I found till now. Thanks wbolster

from happybase.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.