Coder Social home page Coder Social logo

ycui1984 / pg_thrift Goto Github PK

View Code? Open in Web Editor NEW
10.0 1.0 0.0 128 KB

Thrift support for postgreSQL

Makefile 0.40% C 90.36% PLpgSQL 9.24%
thrift zson thrift-support jsonb pg-thrift protobuf compresses dictionary pg-protobuf compact postgresql

pg_thrift's Introduction

Build Status

Thrift support for PostgreSQL (pg_thrift)

Motivation

One of advantages of document-oriented databases like MongoDB or Couchbase over RDBMSs is an ability to change the data scheme easily, fast and often. The traditional approach in RDBMS world involves doing an expensive ALTER TABLE operation, slow upgrade of an existing data, and stuff like this. This approach is often slow and inconvenient for application developers.

To solve this issue PostgreSQL provides JSON and JSONB datatypes. Unfortunately JSONB has a disadvantage of storing all documents keys, which is a lot of redundant data.

One possibility to to reduce JSONB redundancy is to use zson extension. It compresses JSONB documents using shared dictionary of common strings that appear in all or most documents. This approach has its limitations though. Particularly, since data schema evolves, the dictionary has to be updated from time to time. Also zson can affect the build-in mechanism of PostgreSQL of compressing data using PGLZ algorithm since this mechanism uses some heuristics to recognize data that compresses well. Thus sometimes zson can reduce the overall performance.

There is another extension(pg_protobuf). Basically it provides Protobuf support for PostgreSQL. It seems to solve all the issues described above and doesn't have any disadvantages of zson extension.

The idea of this project is to create a similar extension that would provide Thrift support. Some users may prefer Thrift to Protobuf or just use it by historical reasons. This project is a bit more complicated than pg_protobuf since unlike Protobuf Thrift support various encoding protocols. This project implements two encoding protocols (binary and compact) based on the document here (https://erikvanoosten.github.io/thrift-missing-specification/#_integer_encoding)

Install

Step1. Install PostgreSQL

brew install postgresql

Step2. Start PostgreSQL server

brew info postgres
postgres -D /usr/local/var/postgres

Step3. Connect with the default DB

psql postgres

Step4. Build plugin

make && make install

Step5. Run tests

make install && make installcheck

Step6. Confirm plugin has been installed

postgres=# select * from pg_available_extensions where name = 'pg_thrift';

Step7. Load pg_thrift extension

postgres=# create extension pg_thrift;

Step8. Confirm plugin has been loaded

postgres=# \dx

API

Thrift Binary Protocol API:

thrift_binary_get_bool          /* get bool from struct bytea */
thrift_binary_get_byte          /* get byte from struct bytea */
thrift_binary_get_double        /* get double from struct bytea */
thrift_binary_get_int16         /* get int16 from struct bytea */
thrift_binary_get_int32         /* get int32 from struct bytea */
thrift_binary_get_int64         /* get int64 from struct bytea */
thrift_binary_get_string        /* get string from struct bytea */
thrift_binary_get_struct_bytea  /* get struct bytea from struct bytea */
thrift_binary_get_list_bytea    /* get array of bytea from struct bytea */
thrift_binary_get_set_bytea     /* get array of bytea from struct bytea */
thrift_binary_get_map_bytea     /* get array of bytea from struct bytea */

parse_thrift_binary_boolean     /* get bool from bytea */
parse_thrift_binary_string      /* get string from bytea */
parse_thrift_binary_bytes       /* get bytes from bytea */
parse_thrift_binary_int16       /* get int16 from bytea */
parse_thrift_binary_int32       /* get int32 from bytea */
parse_thrift_binary_int64       /* get int64 from bytea */
parse_thrift_binary_double      /* get double from bytea */
parse_thrift_binary_list_bytea  /* get array of bytea from bytea */
parse_thrift_binary_map_bytea   /* get array of bytea from bytea */

Thrift Compact Protocol API:

thrift_compact_get_bool         /* get bool from struct bytea */
thrift_compact_get_byte         /* get byte from struct bytea */
thrift_compact_get_double       /* get double from struct bytea */
thrift_compact_get_int16        /* get int16 from struct bytea */
thrift_compact_get_int32        /* get int32 from struct bytea */
thrift_compact_get_int64        /* get int64 from struct bytea */
thrift_compact_get_string       /* get string from struct bytea */
thrift_compact_get_struct_bytea /* get struct bytea from struct bytea */
thrift_compact_get_list_bytea   /* get array of bytea from struct bytea */
thrift_compact_get_set_bytea    /* get array of bytea from struct bytea */
thrift_compact_get_map_bytea    /* get array of bytea from struct bytea */

parse_thrift_compact_boolean    /* get bool from bytea */
parse_thrift_compact_string     /* get string from bytea */
parse_thrift_compact_bytes      /* get bytes from bytea */
parse_thrift_compact_int16      /* get int16 from bytea */
parse_thrift_compact_int32      /* get int32 from bytea */
parse_thrift_compact_int64      /* get int64 from bytea */
parse_thrift_compact_double     /* get double from bytea */
parse_thrift_compact_list_bytea /* get array of bytea from bytea */
parse_thrift_compact_map_bytea  /* get array of bytea from bytea */

Thrift Binary Type

To ease the use of thrift type, custom data types are created. User provide json format as input, thrift bytes are stored. The custom type supports binary protocol now, but should be easy to extend to compact protocol.

thrift_binary_in                /* json to thrift binary bytes */
thrift_binary_out               /* thrift binary to json bytes */

API Use Case1. Parse field (using compact protocol):

--struct(id=[1, 2, 3, 4, 5])
SELECT parse_thrift_compact_int32(UNNEST(thrift_compact_get_set_bytea(E'\\x1a58020406080a00' :: bytea, 1)));
  parse_thrift_compact_int32
 ----------------------------
                           1
                           2
                           3
                           4
                           5
 (5 rows)


-- struct(id=123, phones=["123456", "abcdef"])  //item1
-- struct(id=456, phones=["123456", "abcdef"])  //item2
-- struct(id=123, items=[item1, item2])
SELECT parse_thrift_compact_string(UNNEST(thrift_compact_get_list_bytea(UNNEST(thrift_compact_get_list_bytea(E'\\x15f601192c15f601192b0c3132333435360c61626364656600159007192b0c3132333435360c6162636465660000', 2)), 2)));
  parse_thrift_compact_string
 -----------------------------
  123456
  abcdef
  123456
  abcdef
 (4 rows)

API Use Case2. Creating Index Based on Thrift Bytes (using binary protocol):

create extension pg_thrift;

create table thrift_index (x bytea);
-- store random data in the schema
-- struct { 1: string }
insert into thrift_index
select E'\\x0b00010000000c' || convert_to(substring(md5('' || random() || random()), 0, 14), 'utf-8') || E'\\x00' from generate_series(1,10000);

create table thrift_no_index (x bytea);
insert into thrift_no_index
select E'\\x0b00010000000c' || convert_to(substring(md5('' || random() || random()), 0, 14), 'utf-8') || E'\\x00' from generate_series(1,10000);

create index thrift_string_idx on thrift_index using btree(thrift_binary_get_string(x, 1));

select thrift_binary_get_string(x, 1) from thrift_index order by thrift_binary_get_string(x, 1) limit 10;

explain select thrift_binary_get_string(x, 1) from thrift_index order by thrift_binary_get_string(x, 1) limit 10;
---------------------------------------------------------------------------------------------------
Limit  (cost=0.29..0.88 rows=10 width=22)
   ->  Index Scan using thrift_string_idx on thrift_index  (cost=0.29..595.28 rows=10000 width=22)
(2 rows)

explain select thrift_binary_get_string(x, 1) from thrift_no_index order by thrift_binary_get_string(x, 1) limit 10;
----------------------------------------------------------------------------------
Limit  (cost=405.10..405.12 rows=10 width=22)
  ->  Sort  (cost=405.10..430.10 rows=10000 width=22)
        Sort Key: (thrift_binary_get_string(x, 1))
        ->  Seq Scan on thrift_no_index  (cost=0.00..189.00 rows=10000 width=22)
(4 rows)

API Use Case3. Using custom type:

postgres=# create table thrift_example(x thrift_binary);
CREATE TABLE
postgres=# insert into thrift_example values('{"type" : "int16", "value" : 60}');
INSERT 0 1
postgres=# select * from example;
              x              
-----------------------------
 {"type":"int16","value":60}
(1 row)

pg_thrift's People

Contributors

charles-cui avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.