
capjure

capjure is a persistence helper for HBase. It is written in Clojure and supports persisting native hash-maps. capjure works by shredding the hash-map being saved into pieces, based on its keys. Each piece is then mapped to a cell in HBase; the cell's column name is picked based on the map's keys and some capjure configuration.

This is best illustrated through examples – see below.

Contributions are welcome, as are recommendations for how to improve things.

Usage

These are the things you need to do to get capjure working:

  • include capjure on the classpath
  • decide on values for two vars: *hbase-master* and *primary-keys-config*
  • set up a binding form for all capjure-related calls, and bind the above two vars within it

hbase-master

This var should be bound to a string containing the hostname and port of the HBase master that you want to store objects into. The format looks like this:

"http://hbase-master.domain.com:60000"

primary-keys-config

This one is a little more complicated. This configuration object is used when capjure persists a nested hash-map. If you don't have this use case, you can skip this section.

primary-keys-config is a Clojure map with two keys, :encode and :decode (the key names used in the example configuration below). The value of each is an object created with the capjure-provided function config-keys. Encoders prepare values for persistence into HBase, while decoders reverse the operation during reads out of HBase.

config-keys

This function takes multiple config objects (one per key), each created with the config-for helper function.

config-for

This is where all the work happens. config-for accepts three parameters. The first two are a top-level key and a qualifier key. Think of the qualifier key as an inner key: in a nested hash-map, the value of a top-level key is itself another hash-map.

encoders

The third parameter is a function of one argument. What this function does depends on whether config-for is specifying an encoder or a decoder. For an encoder, the function is passed the inner hash-map (in the example below, a single car). Its return value is used as the column name, or qualifier, in the HBase table during storage. The outer key, combined with each remaining inner key, forms the column family (cars_make and cars_model in the example below). The return value should be built from the value of the qualifier key in the inner hash-map. In other words, it encodes that value into the string used as the column name.

decoders

When creating a configuration object for decoders, the third parameter is another function that reverses what the corresponding encoder did. It accepts a single parameter, the value the encoder produced, and should return the value that had been encoded into it.

These two complementary functions allow the values of the keys used as ‘primary’ to be encoded into, and decoded out of, the HBase table.
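The following minimal pure-Clojure sketch makes the encoder/decoder contract concrete by showing one complementary pair. It does not call capjure. The function names and the "lic-" prefix are hypothetical, chosen only so the decoder has something non-trivial to undo. The property that matters is this: decoding the encoder's output returns the original key value.

```clojure
;; Hypothetical encoder/decoder pair for the :license qualifier key.
;; The "lic-" prefix is invented purely for illustration.
(def license-prefix "lic-")

(defn encode-license
  "Encoder: receives one inner hash-map and returns the string to use
  as the HBase column name (qualifier)."
  [car-map]
  (str license-prefix (:license car-map)))

(defn decode-license
  "Decoder: receives the column name produced by encode-license and
  returns the original :license value."
  [column-name]
  (subs column-name (count license-prefix)))

(def car {:make "honda" :model "fit" :license "ah12001"})

(encode-license car)                   ;; => "lic-ah12001"
(decode-license (encode-license car))  ;; => "ah12001"
```

If the encoder stores the value unchanged, as in the example configuration below, the matching decoder is simply the identity function.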

Example

All this can be a bit confusing, but in practice it's quite easy. The following example shows how.

Let’s assume we want to persist the following car objects:


{:cars [{:make "honda"  :model "fit"   :license "ah12001"}
        {:make "toyota" :model "yaris" :license "xb34544"}]}


These two cars need to be persisted into a single row of the cars table (in HBase). capjure will convert them into a form that looks like this:


;; column name (family:qualifier) -> cell value
{"cars_model:ah12001" "fit"
 "cars_make:ah12001"  "honda"
 "cars_model:xb34544" "yaris"
 "cars_make:xb34544"  "toyota"}

This is done with a configuration like this:


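;; Encoder: given one car map, return the string used as the column
;; qualifier (here, the car's license number).
(def encoders (config-keys
  (config-for :cars :license (fn [car-map]
                               (car-map :license)))))

;; Decoder: reverse the encoder. The license was stored as-is, so the
;; column qualifier is returned unchanged.
(def decoders (config-keys
  (config-for :cars :license (fn [value]
                               value))))

(def keys-config {:encode encoders :decode decoders})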
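The conversion of the two cars into four cells can be approximated in a few lines of plain Clojure. This is a toy sketch written only against the example. It is not capjure's implementation, and shred-collection is a hypothetical name. The naming rule it applies is inferred from the four cells shown earlier: the family is the outer key plus "_" plus the inner key, the qualifier is the encoder output, and the primary key is not stored as its own cell.

```clojure
;; Toy re-implementation of the shredding in the cars example.
;; NOT capjure's code; shred-collection is a hypothetical helper.
(defn shred-collection
  "Flattens a seq of hash-maps stored under outer-key into a map of
  \"family:qualifier\" -> value. The family is <outer-key>_<inner-key>
  and the qualifier is whatever encoder returns for each inner map.
  The primary key itself is skipped, since it lives in the qualifier."
  [outer-key primary-key encoder maps]
  (into {}
        (for [m maps
              [k v] m
              :when (not= k primary-key)]
          [(str (name outer-key) "_" (name k) ":" (encoder m)) v])))

(def cars [{:make "honda"  :model "fit"   :license "ah12001"}
           {:make "toyota" :model "yaris" :license "xb34544"}])

;; Same encoder function as in the configuration above.
(shred-collection :cars :license (fn [car-map] (car-map :license)) cars)
;; => a map equal to
;;    {"cars_make:ah12001" "honda", "cars_model:ah12001" "fit",
;;     "cars_make:xb34544" "toyota", "cars_model:xb34544" "yaris"}
```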

API

the binding form

As described earlier, capjure needs two vars bound whenever one of the API functions is used. That looks like this:


(binding [*hbase-master* "hbase.test-site.net:60000" *primary-keys-config* keys-config] 
  ;capjure stuff goes here
)
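A binding form establishes thread-local values. These values are visible to any function called from within its body, and they revert when the body exits. The self-contained sketch below demonstrates this with stand-in vars that have the same names as capjure's. They are defined here with ^:dynamic, which Clojure 1.3 and later require for binding to rebind a var. They are not capjure's own definitions, and current-settings is a hypothetical helper.

```clojure
;; Stand-ins for capjure's two config vars (NOT capjure's definitions).
(def ^:dynamic *hbase-master* nil)
(def ^:dynamic *primary-keys-config* nil)

(defn current-settings
  "Hypothetical helper: reads both vars at call time, the way a library
  function called inside the binding form would."
  []
  [*hbase-master* *primary-keys-config*])

(binding [*hbase-master* "hbase.test-site.net:60000"
          *primary-keys-config* {:encode :some-encoders :decode :some-decoders}]
  (current-settings))
;; => ["hbase.test-site.net:60000" {:encode :some-encoders, :decode :some-decoders}]

(current-settings)
;; => [nil nil], the root values, once the binding form has exited
```

Because the values are read when each function is called, every capjure call has to run inside the binding form.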

Functions of interest

This is the most commonly used call. It pushes a hash-map into HBase:


(binding [*hbase-master* "hbase.test-site.net:60000" *primary-keys-config* keys-config]  
    (capjure-insert some-json-object "hbase_table_name" "some-row-id"))  

And this is the reverse of that, reading the row back as a hash-map:


(binding [*hbase-master* "hbase.test-site.net:60000" *primary-keys-config* keys-config]  
    (read-as-hydrated "hbase_table_name" "some-row-id"))  
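To show roughly what "hydrating" involves for the cars example, here is a toy inverse of the shredding, written in plain Clojure. hydrate-collection is hypothetical and not part of capjure. The sketch assumes the family:qualifier naming from the cars example and a pass-through decoder like the one in the configuration above. It makes no claim about how read-as-hydrated is actually implemented.

```clojure
(require '[clojure.string :as str])

;; Toy inverse of the cars-example shredding.
;; NOT capjure's read-as-hydrated; hydrate-collection is hypothetical.
(defn hydrate-collection
  "Rebuilds the inner hash-maps stored under outer-key from a map of
  \"family:qualifier\" -> value cells. decoder turns each qualifier
  back into the primary-key value."
  [outer-key primary-key decoder cells]
  (let [prefix (str (name outer-key) "_")]
    (->> cells
         ;; Split each column name into family and qualifier, keeping
         ;; only families that belong to outer-key.
         (keep (fn [[column value]]
                 (let [[family qualifier] (str/split column #":" 2)]
                   (when (and qualifier (str/starts-with? family prefix))
                     [(decoder qualifier)
                      (keyword (subs family (count prefix)))
                      value]))))
         ;; Group cells by decoded primary-key value: {pk {inner-key value}}.
         (reduce (fn [acc [pk k v]] (assoc-in acc [pk k] v)) {})
         ;; Put the primary key back into each inner map.
         (map (fn [[pk m]] (assoc m primary-key pk)))
         vec)))

(def cells {"cars_model:ah12001" "fit"
            "cars_make:ah12001"  "honda"
            "cars_model:xb34544" "yaris"
            "cars_make:xb34544"  "toyota"})

(hydrate-collection :cars :license (fn [value] value) cells)
;; => the two car maps, with :make, :model and :license restored
;;    (their order in the vector is not guaranteed)
```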

Other convenience functions

Here are some other useful functions:


row-exists? [hbase-table-name row-id-string]  
cell-value-as-string [row column-name]  
read-all-versions-as-strings [hbase-table-name row-id-string number-of-versions column-family-as-string]  
read-cell [hbase-table-name row-id column-name]  
rowcount [hbase-table-name & columns]  
delete-all [hbase-table-name & row-ids-as-strings]  

There are more, including functions to get column-family information and to clone tables.

capjure

= capture + clojure. Ahahaha.

Copyright 2009 Amit Rathore
