Coder Social home page Coder Social logo

vtenc's Introduction

VTEnc

VTEnc is a compression algorithm for sorted lists of unsigned integers. This C library provides an interface to use VTEnc with different data types (8, 16, 32 and 64 bits) and with both lists and sets.

Results

This is a work-in-progress section. Last updated on: 2019-11-10

For reference, VTEnc has been tested on a laptop Ubuntu Desktop 19.04 with a Core i7-6700HQ CPU @ 2.60GHz x 8, using different data sets and comparing it with other integer compression algorithms. See Tests and Benchmarks below for further information.

  • Timestamps (ts.txt):
Algorithm Encoded Size Ratio % Encoding Speed Decoding Speed
VTEnc 21,686 0.0038 60.9155 G/s 734.54 M/s
Delta+FastPFor256 1,179,312 0.20 2.00714 G/s 4.75146 G/s
Delta+FastPFor128 2,306,544 0.40 1.9029 G/s 4.82925 G/s
Delta+BinaryPacking 4,552,280 0.79 8.5867 G/s 5.77439 G/s
Delta+VariableByte 144,285,504 25.0 4.86063 G/s 5.09461 G/s
Delta+VarIntGB 180,356,880 31.25 6.75428 G/s 9.2638 G/s
Copy 577,141,992 100.0 10.4087 G/s -
  • gov2.sorted:
Algorithm Encoded Size Ratio % Encoding Speed Decoding Speed
VTEnc 2,889,599,350 12.08 - -
memcpy 23,918,861,764 100.0 - -

API

The documented interface is found in the header file vtenc.h.

Examples

Encoding example:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "vtenc.h"

int main()
{
  const uint8_t in[] = {13, 14, 29, 39, 65, 80, 88, 106, 152, 154, 155, 177};
  const size_t in_len = sizeof(in) / sizeof(in[0]);
  const size_t out_cap = vtenc_list_max_encoded_size_u8(in_len); /* output capacity */
  uint8_t *out=NULL;
  size_t out_len=0;
  VtencErrorCode code;

  /* allocate `out_cap` bytes */
  out = (uint8_t *) malloc(out_cap * sizeof(uint8_t));
  if (out == NULL) {
    fprintf(stderr, "allocation error\n");
    return 1;
  }

  /* encode `in` list into `out` stream of bytes */
  code = vtenc_list_encode_u8(in, in_len, out, out_cap, &out_len);
  if (code != VtencErrorNoError) {
    fprintf(stderr, "failed encoding with error code: %u\n", code);
    free(out);
    return 1;
  }

  /* here `out` holds the encoded output, which is `out_len` bytes long */

  free(out);

  return 0;
}

Decoding example:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "vtenc.h"

int main()
{
  const uint8_t in[] = {0x0c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x90, 0x26,
    0x24, 0x8d, 0x75, 0xfd, 0x95, 0x83, 0x9b, 0x03};
  const size_t in_len = sizeof(in) / sizeof(in[0]);
  size_t out_len = vtenc_list_decoded_size_u8(in, in_len); /* output length */
  uint8_t *out=NULL;
  VtencErrorCode code;

  /* allocate `out_len` items */
  out = (uint8_t *) malloc(out_len * sizeof(uint8_t));
  if (out == NULL) {
    fprintf(stderr, "allocation error\n");
    return 1;
  }

  /* decode `in` stream of bytes into `out` list */
  code = vtenc_list_decode_u8(in, in_len, out, out_len);
  if (code != VtencErrorNoError) {
    fprintf(stderr, "failed decoding with error code: %u\n", code);
    free(out);
    return 1;
  }

  /* here `out` holds the decoded list of integers, of size `out_len` */

  free(out);

  return 0;
}

Limitations

For now, this library has only been tested on Linux and MacOS systems. It shouldn't be too difficult to support Windows as well and I'm planing to do so in future releases.

Building

To compile the library, run make in root directory. That will generate the static and the shared libraries (.a and .so files), ready to be included in your own projects.

Tests

This library is well covered with unit tests and with a few little programs to test different data sets. To build both the programs and the unit tests, run make test in the root directory. The following executable files will be created:

  • tests/unit/unit_tests: invoke it to run all the unit tests.

  • tests/timestamps: program to test ts.txt, a text file with a big list of timestamps.

  • tests/gov2: program to test gov2.sorted file, which is part of the "Document identifier data set" created by D. Lemire.

Benchmarks

The tables shown in the Results section are the result of running the tests included in the integer-compression-benchmarks repository.

License

This code is licensed under MIT license.

vtenc's People

Contributors

vteromero avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.