Comments (6)

kirang89 commented on August 13, 2024

No, I'm afraid it does not.

from happybase.

wbolster commented on August 13, 2024

The short answer is no. The long answer is, well, a bit longer. :) Let me explain.

HBase is not a relational database, but a fancy distributed multidimensional key/value store (not completely correct, but this description suffices for now). HappyBase is just a nice API around the Thrift interface exposed by HBase, so that it can be used outside Java processes running on your cluster. The HBase Thrift interface, in turn, exposes (part of) the HBase API, which is modelled after the HBase data model (column families, qualifiers, timestamps, row keys).

This HBase data model is more "low level" than that of a traditional feature-rich RDBMS. Aggregation is not part of the HBase data model, nor is it provided by its access methods. This means features like count() and sum() have to be implemented client-side, i.e. in your application.

If you're not aggregating massive amounts of data, e.g. because you limited your selection already to a specific range of your data (design your row keys well!), a Table.scan() with some client-side logic is what you need. Something like this might work for you (note: untested!):

import struct
import happybase

c = happybase.Connection(...)
t = c.table('mytable')

unpacker = struct.Struct('>I').unpack
s = t.scan(row_prefix='foobar')
total = sum(unpacker(row_data['some-column']) for row_key, row_data in s)

print("The sum is %d" % total)

On the other hand, if you need aggregate information over massive amounts of data, you should either calculate it up front by keeping counters (e.g. using table.counter_inc()), or calculate it using MapReduce jobs.
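The counter approach mentioned above could be sketched roughly as follows. This is only an illustration under assumptions: the `store_and_count` helper, the `stats` row key, the `cf:sum` and `cf:count` columns, and the `FakeTable` stand-in are all made up so the snippet runs without a cluster. With HappyBase you would pass a real `connection.table(...)` object, whose `put()`, `counter_inc()` and `counter_get()` methods are the ones mimicked here.

```python
import struct

# Big-endian unsigned 32-bit integers, matching the scan example.
UINT32 = struct.Struct('>I')


def store_and_count(table, row_key, value,
                    stats_row=b'stats',          # hypothetical row key
                    data_column=b'cf:input1',
                    sum_column=b'cf:sum',        # hypothetical column
                    count_column=b'cf:count'):   # hypothetical column
    """Write a value and keep running aggregates up to date.

    Note: the put and the two increments are separate operations,
    so they are not atomic as a group.
    """
    table.put(row_key, {data_column: UINT32.pack(value)})
    table.counter_inc(stats_row, sum_column, value)
    table.counter_inc(stats_row, count_column)


class FakeTable(object):
    """Hypothetical in-memory stand-in for happybase.Table (demo only)."""

    def __init__(self):
        self.rows = {}
        self.counters = {}

    def put(self, row, data):
        self.rows.setdefault(row, {}).update(data)

    def counter_inc(self, row, column, value=1):
        key = (row, column)
        self.counters[key] = self.counters.get(key, 0) + value
        return self.counters[key]

    def counter_get(self, row, column):
        return self.counters.get((row, column), 0)


table = FakeTable()
for i, value in enumerate([10, 12, 20]):
    store_and_count(table, ('foobar-%d' % i).encode('ascii'), value)

# Reading the aggregates is now a cheap lookup instead of a scan.
running_sum = table.counter_get(b'stats', b'cf:sum')
running_count = table.counter_get(b'stats', b'cf:count')
print("sum=%d count=%d" % (running_sum, running_count))
```

The trade-off is that every write costs a little more, while reading the aggregate no longer requires touching the data rows at all.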

If you're just interested in counting the rows in an HBase table because you want to know how much data you have: the HBase shell has a 'count' command for that. It just opens a scanner and counts all rows, so this is a very expensive (I/O heavy) operation.
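The same scan-and-count idea can be sketched from Python. The `count_rows` helper and the `FakeTable` stand-in are hypothetical; the filter string assumes the HBase Thrift filter language accepts `FirstKeyOnlyFilter() AND KeyOnlyFilter()`, which reduces the data sent per row but still scans every row, so it stays expensive on large tables.

```python
def count_rows(table, row_prefix=None):
    """Count rows by scanning them; O(number of rows) on the server.

    `table` is anything with a HappyBase-like scan() method. The filter
    only trims what each row returns; it does not avoid the full scan.
    """
    scanner = table.scan(
        row_prefix=row_prefix,
        filter=b'FirstKeyOnlyFilter() AND KeyOnlyFilter()',
    )
    return sum(1 for _ in scanner)


class FakeTable(object):
    """Hypothetical in-memory stand-in for happybase.Table (demo only)."""

    def __init__(self, rows):
        self.rows = rows

    def scan(self, row_prefix=None, filter=None):
        # The filter argument is accepted but ignored in this stand-in.
        for key in sorted(self.rows):
            if row_prefix is None or key.startswith(row_prefix):
                yield key, self.rows[key]


table = FakeTable({
    b'foobar-1': {b'cf:input1': b'\x00\x00\x00\x0a'},
    b'foobar-2': {b'cf:input1': b'\x00\x00\x00\x20'},
    b'other-1': {b'cf:input1': b'\x00\x00\x03\xe8'},
})
total_rows = count_rows(table)
prefixed_rows = count_rows(table, row_prefix=b'foobar')
print("rows=%d, with prefix=%d" % (total_rows, prefixed_rows))
```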

mike20007 commented on August 13, 2024

Thanks, wbolster. I have two questions:

  1. Does HappyBase support the coprocessors that come with HBase?

  2. I am new to Python. When I run the code you gave (I know it is untested), I get the error below.
    The column cf1:input1 holds an integer. You can fix it by looking at it.

$ /usr/happybase/bin/python s1.py
Traceback (most recent call last):
  File "s1.py", line 9, in <module>
    total = sum(unpacker((data['cf:input1'])) for key, data in s)
TypeError: unsupported operand type(s) for +: 'int' and 'tuple'

Thanks

wbolster commented on August 13, 2024

  1. No, HappyBase is limited to the HBase Thrift API. Coprocessors run inside your cluster; Thrift is mostly for access from the outside.
  2. Sorry, it should have read unpacker(...)[0] instead. Obviously this only works if your numbers are stored as unsigned 32-bit big endian integers (the struct format in my example).
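
With that correction applied, a minimal sketch of the client-side sum might look like this. The `sum_column` helper and the `FakeTable` stand-in are hypothetical names used only so the snippet runs without a cluster; with HappyBase you would pass `connection.table('mytable')` instead. It assumes the values are stored as unsigned 32-bit big-endian integers, and it uses byte strings for keys and column names, which Python 3 needs.

```python
import struct

# Big-endian unsigned 32-bit integers (the '>I' format from the example).
UINT32 = struct.Struct('>I')


def sum_column(table, column, row_prefix):
    """Sum one column over all rows whose key starts with row_prefix.

    `table` is anything with a HappyBase-like scan() method that yields
    (row_key, {column: value}) pairs.
    """
    total = 0
    for row_key, row_data in table.scan(row_prefix=row_prefix,
                                        columns=[column]):
        # unpack() always returns a tuple, hence the [0].
        total += UINT32.unpack(row_data[column])[0]
    return total


class FakeTable(object):
    """Hypothetical in-memory stand-in for happybase.Table (demo only)."""

    def __init__(self, rows):
        self.rows = rows

    def scan(self, row_prefix=None, columns=None):
        for key in sorted(self.rows):
            if row_prefix is not None and not key.startswith(row_prefix):
                continue
            data = self.rows[key]
            if columns is not None:
                data = {c: v for c, v in data.items() if c in columns}
            if data:  # rows without the requested columns are skipped
                yield key, data


table = FakeTable({
    b'foobar-1': {b'cf:input1': UINT32.pack(10)},
    b'foobar-2': {b'cf:input1': UINT32.pack(32)},
    b'other-1': {b'cf:input1': UINT32.pack(1000)},
})
total = sum_column(table, b'cf:input1', b'foobar')
print("The sum is %d" % total)
```

If the numbers are stored in another encoding (for example as decimal text), the `struct` format would have to change accordingly.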

dinesh1218 commented on August 13, 2024

@wbolster I have more than 2 lakh (200,000) records in HBase and I need to get the total row count of that table. Right now I'm using FirstKeyOnlyFilter() and KeyOnlyFilter(), but performance is poor. Is there any way I could get the total count of an HBase table?
I'm using Python 3.5 and HappyBase 1.2.0.

wbolster commented on August 13, 2024

This is an HBase question, not a HappyBase one, but I'm afraid the answer is no...
