
graph-serializers's Issues

Improve Performance of Reading/Writing Ascii Characters when using compression

The basic algorithm for writing a compressed String uses a naive approach, encoding one character at a time, which is inherently slow:

for (int i = 0; i < len; i++) {
    c = value[i];
    if (c <= 0x007F) {
        // 1-byte (Ascii) sequence
        writeByte((byte) c);
    } else if (c > 0x07FF) {
        // 3-byte sequence
        writeByte((byte) (0xE0 | c >> 12 & 0x0F));
        writeByte((byte) (0x80 | c >> 6 & 0x3F));
        writeByte((byte) (0x80 | c >> 0 & 0x3F));
    } else {
        // 2-byte sequence
        writeByte((byte) (0xC0 | c >> 6 & 0x1F));
        writeByte((byte) (0x80 | c >> 0 & 0x3F));
    }
}

Likewise, reading text uses a similar approach, reading byte by byte:

while (ix < len) {
    c = readByte() & 0xFF;

    switch (c >> 4) {
    case 0:
    case 1:
    case 2:
    case 3:
    case 4:
    case 5:
    case 6:
    case 7:
        // 1-byte (Ascii) sequence
        chars[ix++] = (char) c;
        break;
    case 12:
    case 13:
        // 2-byte sequence
        chars[ix++] = (char) ((c & 0x1F) << 6 | readByte() & 0x3F);
        break;
    case 14:
        // 3-byte sequence
        chars[ix++] = (char) ((c & 0x0F) << 12 | (readByte() & 0x3F) << 6 | (readByte() & 0x3F) << 0);
        break;
    default:
        break;
    }
}

The code for writing Ascii characters can be partially vectorized by reading/writing 64 bits at a time with Unsafe. For example, each tuple of 8 Ascii characters can be packed into a single long that can be written in one shot with Unsafe::putLong. To read these characters back and store them in a char[], we spread the lower and upper 32 bits of that long into two 64-bit tuples, each consisting of four 16-bit (zero-extended 8-bit) values. For example, suppose we have:

char[] c={'A','B','C','D'};

These characters can be read in a single step and then packed into 32 bits:

//Read 4 16-bit values in a single step
//19140586183458881 = [0000000001000100,0000000001000011,0000000001000010,0000000001000001]
long v = U.getLong(c, ARRAY_CHAR_BASE_OFFSET);
//If all 4 are Ascii, pack them into 4 8-bit values
//1145258561 = [01000100,01000011,01000010,01000001]
long txt = (v & 0xFF) | (v >> 16 & 0xFF) << 8 | (v >> 32 & 0xFF) << 16 | (v >> 48 & 0xFF) << 24;

Then, in order to write these contents back to a char[], we recover the original value by zero-extending each 8-bit value back to 16 bits:

long txt = 1145258561; //[01000100,01000011,01000010,01000001]
//[0000000001000100,0000000001000011,0000000001000010,0000000001000001]
//zero-extend each byte back to 16 bits
long v = (txt & 0xFF) | ((txt & 0xFF00) << 8) | ((txt & 0xFF0000) << 16) | ((txt & 0xFF000000L) << 24);

U.putLong(c, ARRAY_CHAR_BASE_OFFSET, v); //writes 4 chars in a single step

There are three possible approaches:

  1. Fail-fast: we try to read/write Ascii and, as soon as a non-Ascii character is detected, we fall back to the ordinary slow path, starting over from scratch.
  2. Write-until-Ascii: we use the vectorized code for as long as possible, then fall back to the slow path starting from the position that failed.
  3. Encode-all: we attempt to vectorize even for non-Ascii characters. This is much more complicated since we face word-tearing-like issues, e.g., a character that encodes to 2 or 3 bytes might be split between the end of one word and the beginning of the next.

The fail-fast approach is certainly the simplest, but we end up losing performance if a non-Ascii character is found far from the start of the String. The write-until-Ascii approach is more complicated since we have to keep track of how many Ascii characters have been written successfully, so it will be more heavily branched. A minimal sketch of the fail-fast write path is shown below.
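
A minimal sketch of the fail-fast write path, assuming a little-endian platform, an Unsafe instance U, the usual ARRAY_CHAR_BASE_OFFSET constant and hypothetical writeByte/writeLong primitives on the output buffer (none of these names are taken from the actual code):

static final long NON_ASCII_MASK = 0xFF80FF80FF80FF80L; // any set bit means some char > 0x7F

boolean tryWriteAscii(char[] value, int len) {
    int i = 0;
    // process 8 chars (two longs, 16 bytes of char data) per iteration
    for (; i + 8 <= len; i += 8) {
        long lo = U.getLong(value, ARRAY_CHAR_BASE_OFFSET + (long) i * 2);
        long hi = U.getLong(value, ARRAY_CHAR_BASE_OFFSET + (long) (i + 4) * 2);
        if (((lo | hi) & NON_ASCII_MASK) != 0) {
            return false; // fail-fast: caller restarts with the ordinary slow path
        }
        // pack the 8 zero-extended 16-bit values into 8 bytes and write them in one shot
        long packed = (lo & 0xFF) | (lo >> 16 & 0xFF) << 8 | (lo >> 32 & 0xFF) << 16 | (lo >> 48 & 0xFF) << 24
                | (hi & 0xFF) << 32 | (hi >> 16 & 0xFF) << 40 | (hi >> 32 & 0xFF) << 48 | (hi >> 48 & 0xFF) << 56;
        writeLong(packed);
    }
    // tail: fewer than 8 chars left, write them one by one
    for (; i < len; i++) {
        char c = value[i];
        if (c > 0x7F) {
            return false;
        }
        writeByte((byte) c);
    }
    return true;
}

The write-until-Ascii variant would return (or record) the index at which the check failed instead of a plain boolean, so the slow path can resume from that position.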

Support Type-Level Hierarchy Declaration

As it is now, in order to optimize collections/maps with polymorphic entries, one must declare the hierarchy directly on the field:

class Node {

    static class Red extends Node {
    }

    static class Black extends Node {
    }
}

class Foo {

    @Optimize
    List<@Hierarchy(types={Node.Red.class,Node.Black.class}) Node> leaves;

    @Collection(optimize=true,shape=@Shape(hierarchy=@Hierarchy(types={Node.Red.class,Node.Black.class})))
    List<Node> roots;
}

Of course this gets tedious and repetitive. To cope with this we could use a type-level declaration plus a few conventions, e.g., if Node declares its children as:

@Hierarchy(types={Node.Red.class,Node.Black.class})
class Node {

}

then Foo can equivalently be declared as:

class Foo {

    @Optimize
    List<Node> leaves;

    @Collection(optimize=true)
    List<Node> roots;
}

The following conventions should be used:

  1. Any declaration of either @Shape with @Hierarchy (inside @Collection or @Map) or @Hierarchy in the TYPE_USE scope overrides a declaration of @Hierarchy in the TYPE scope.
  2. We shall only fall back to @Hierarchy in the TYPE scope if
    1. no type information could be extracted from the declarations (or the absence thereof) of @Shape or @Hierarchy, and
    2. the type wasn't declared as a @LeafNode in the TYPE_USE scope.

By concentrating the declaration of @Hierarchy in fewer places, possibly only in the topmost class for the most common cases, we can reduce the overhead of parsing hierarchies on a field-by-field basis. A rough sketch of the resulting lookup order is shown below.
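
For illustration, the lookup order could be resolved with plain reflection roughly as follows. This is only a sketch: it assumes @Hierarchy has runtime retention and targets both TYPE and TYPE_USE, it ignores the @Collection/@Shape override case, and the class and method names (HierarchyResolver, resolve) are made up; the actual code generator may resolve this differently.

import java.lang.reflect.AnnotatedParameterizedType;
import java.lang.reflect.AnnotatedType;
import java.lang.reflect.Field;

class HierarchyResolver {

    static Hierarchy resolve(Field field) {
        AnnotatedType at = field.getAnnotatedType();
        if (at instanceof AnnotatedParameterizedType) {
            AnnotatedType element = ((AnnotatedParameterizedType) at).getAnnotatedActualTypeArguments()[0];
            // 1. A TYPE_USE declaration (e.g. List<@Hierarchy(...) Node> leaves) always wins
            Hierarchy onUse = element.getAnnotation(Hierarchy.class);
            if (onUse != null) {
                return onUse;
            }
            // 2. Otherwise fall back to the TYPE scope declaration on the element class itself
            if (element.getType() instanceof Class) {
                return ((Class<?>) element.getType()).getAnnotation(Hierarchy.class);
            }
        }
        return null;
    }
}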

Add Basic Support for Compression - Collections, Arrays and Maps

Allow fine-grained control over compression for collections, arrays and maps.

Consider for example:

public class Foo {

    @Map(optimize=true)
    Map<String, @Compress @Hierarchy(complete=true,types={Long.class,String.class}) Object> map;
}

The resulting serializer emitted by the code generator should try to use compressing serializers, if possible, to serialize map values; a purely illustrative sketch is shown after the list below.

There are lots of combinations to be considered:

  1. Non-optimized versions
    1. Simple types, addressed by loadPrecompiled on bootstrap.
    2. Pojos. One could emit a Serializer as if the type were marked with @Fields(compressByDefault=true).
  2. Optimized versions
    1. As 1.1, using the static invocation mechanism.
    2. As 1.2, using the static invocation mechanism.
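
As a purely illustrative sketch (not the library's actual API), the generated value serializer for the map above might boil down to something like the following, with made-up primitives writeByte, writeVarLong and writeCompressedString standing in for the real buffer operations, and made-up tag values. Since the hierarchy is declared complete with {Long.class, String.class}, a 1-byte type tag is enough to dispatch:

void writeMapValue(Object value) {
    if (value == null) {
        writeByte((byte) 0);                   // null marker (hypothetical encoding)
    } else if (value instanceof Long) {
        writeByte((byte) 1);                   // type tag: Long
        writeVarLong((Long) value);            // variable-length (compressed) long
    } else {
        writeByte((byte) 2);                   // type tag: String
        writeCompressedString((String) value); // compressed string, as in the Ascii issue above
    }
}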
