apache / incubator-fury

A blazingly fast multi-language serialization framework powered by JIT and zero-copy.

Home Page: https://fury.apache.org/

License: Apache License 2.0

Languages: Java 70.68%, TypeScript 5.42%, Python 6.07%, C++ 5.10%, Cython 4.18%, Go 3.00%, Rust 1.81%, C 1.47%, Shell 0.66%, Starlark 0.62%, Scala 0.58%, JavaScript 0.41%
Topics: cross-language, fast, jit, multiple-language, serialization, zero-copy, java, python, cpp, golang

incubator-fury's People

Contributors

ayushrakesh, bytemain, caicancai, chaokunyang, cn-at-osmit, dependabot[bot], farmerworking, hieu-ht, iamahens, knutwannheden, laglangyue, leeco-cloud, liangliangsui, mof-dev-3, munoon, nandakumar131, pandalee99, phogh, pjfanning, pragmatwice, rainsongain, s31k31, shivam250702, smoothieewastaken, springrain, theweipeng, tisonkun, vesense, vidhijain27, xiguashu

incubator-fury's Issues

[Java] multi-key weak map support

Is your feature request related to a problem? Please describe.
The JDK's java.util.WeakHashMap supports only a single weak key, but sometimes we need a weak map whose key is an array of multiple weakly referenced items. In such cases, creating a temporary composite key and putting it into a WeakHashMap is not feasible, because nothing holds a strong reference to the temporary key. We need a new weak map that supports multi-part weak keys natively.

[Java] Add StringBuilder/StringBuffer serializer

Is your feature request related to a problem? Please describe.
Add StringBuilder/StringBuffer serializer

Describe the solution you'd like
Convert the StringBuilder/StringBuffer to a String, then serialize it using StringSerializer.
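
A minimal sketch of the conversion approach, assuming a plain java.nio.ByteBuffer as the target instead of Fury's own buffer and serializer classes. The class and method names (StringBuilderCodec, write, readBuilder, readBuffer) are hypothetical and only illustrate the idea of going through String.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class StringBuilderCodec {
  /** StringBuilder and StringBuffer are both CharSequence, so one write method covers both. */
  static void write(ByteBuffer out, CharSequence value) {
    // toString() copies the characters; this is the conversion cost mentioned in the issue.
    byte[] bytes = value.toString().getBytes(StandardCharsets.UTF_8);
    out.putInt(bytes.length).put(bytes);
  }

  static StringBuilder readBuilder(ByteBuffer in) {
    return new StringBuilder(readString(in));
  }

  static StringBuffer readBuffer(ByteBuffer in) {
    return new StringBuffer(readString(in));
  }

  private static String readString(ByteBuffer in) {
    byte[] bytes = new byte[in.getInt()];
    in.get(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }
}
```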

Describe alternatives you've considered
Converting to/from String has some cost; a better solution is to work on the inner data structure of StringBuilder/StringBuffer directly.
But StringBuilder/StringBuffer serialization is not common, so we can use the conversion first, then optimize later if truly needed.
Additional context
#89

[Java] CI support for java

Is your feature request related to a problem? Please describe.
Add Java CI support.

[Java] extract captured variables in lambda

Is your feature request related to a problem? Please describe.
Support extracting the variables captured by a lambda. Codegen can use this feature to extract dependent expressions when splitting big methods into smaller ones.

Describe the solution you'd like
When the lambda is Serializable, we can use java.lang.invoke.SerializedLambda#getCapturedArg to extract the captured variables.
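
One way this could look, as a sketch: serializable lambdas carry a synthetic private writeReplace() method that returns a SerializedLambda, which can be invoked reflectively. The helper class CapturedArgs and the SerializableSupplier interface are hypothetical names introduced for this example, not part of Fury.

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.function.Supplier;

final class CapturedArgs {
  /** A Supplier that is also Serializable, so lambdas targeting it are serializable. */
  interface SerializableSupplier<T> extends Supplier<T>, Serializable {}

  /** Returns the variables captured by a serializable lambda. */
  static Object[] extract(Serializable lambda) {
    try {
      // Synthetic method generated for serializable lambdas.
      Method writeReplace = lambda.getClass().getDeclaredMethod("writeReplace");
      writeReplace.setAccessible(true);
      SerializedLambda serialized = (SerializedLambda) writeReplace.invoke(lambda);
      Object[] captured = new Object[serialized.getCapturedArgCount()];
      for (int i = 0; i < captured.length; i++) {
        captured[i] = serialized.getCapturedArg(i);
      }
      return captured;
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Not a serializable lambda: " + lambda, e);
    }
  }

  public static void main(String[] args) {
    String prefix = "fury";
    int count = 3;
    SerializableSupplier<String> supplier = () -> prefix + count;
    // Prints the captured values of prefix and count.
    System.out.println(Arrays.toString(extract(supplier)));
  }
}
```

Primitive captures come back boxed, since getCapturedArg returns Object.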

[Java] add buffer callback

Is your feature request related to a problem? Please describe.
When a buffer can be serialized with zero copy, the buffer callback should be invoked to handle it.

If the buffer callback returns false, the given buffer is treated as out-of-band and is therefore zero-copied.
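
The contract above can be sketched as follows, assuming the callback is modeled as a java.util.function.Predicate over java.nio.ByteBuffer. The class BufferCallbackSketch, the writeBuffer method, and the one-byte flag layout are assumptions made for illustration, not Fury's actual API or wire format.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class BufferCallbackSketch {
  /**
   * Writes the payload in-band (copied into out) unless the callback returns false,
   * in which case the payload is left to the callback and not copied.
   * Returns true if the payload was written in-band.
   */
  static boolean writeBuffer(ByteBuffer out, ByteBuffer payload, Predicate<ByteBuffer> callback) {
    boolean inBand = callback == null || callback.test(payload);
    out.put((byte) (inBand ? 1 : 0)); // flag telling a reader where to find the data
    if (inBand) {
      out.putInt(payload.remaining());
      out.put(payload.duplicate()); // duplicate() leaves the payload's position untouched
    }
    return inBand;
  }

  public static void main(String[] args) {
    List<ByteBuffer> outOfBand = new ArrayList<>();
    ByteBuffer out = ByteBuffer.allocate(64);
    ByteBuffer large = ByteBuffer.wrap(new byte[32]);
    // Hypothetical policy: collect every buffer out-of-band instead of copying it.
    boolean inBand = writeBuffer(out, large, buffer -> {
      outOfBand.add(buffer);
      return false;
    });
    System.out.println("in-band=" + inBand + ", out-of-band buffers=" + outOfBand.size());
  }
}
```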

Additional context
#85

[Java] write/read duplicated enum string only once

Is your feature request related to a problem? Please describe.
When serializing multiple objects of the same type, the class name is written to the buffer multiple times. There should be a way to write the class name only once and write an id for every later occurrence.

Such class names are enumerable strings; there should be an abstraction that writes such strings only once.
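
A rough sketch of such an abstraction, assuming a java.nio.ByteBuffer target and a simple layout (a flag byte, then either the string or a previously assigned id). StringOnceWriter and StringOnceReader are hypothetical names, and the layout is an assumption for illustration only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StringOnceWriter {
  private final Map<String, Integer> ids = new HashMap<>();

  /** First occurrence: flag 0, length, UTF-8 bytes. Later occurrences: flag 1, id. */
  void write(ByteBuffer out, String s) {
    Integer id = ids.get(s);
    if (id != null) {
      out.put((byte) 1).putInt(id);
      return;
    }
    ids.put(s, ids.size()); // ids are assigned in first-seen order
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
    out.put((byte) 0).putInt(bytes.length).put(bytes);
  }
}

final class StringOnceReader {
  private final List<String> seen = new ArrayList<>();

  String read(ByteBuffer in) {
    if (in.get() == 1) {
      return seen.get(in.getInt()); // resolve the id assigned by the writer
    }
    byte[] bytes = new byte[in.getInt()];
    in.get(bytes);
    String s = new String(bytes, StandardCharsets.UTF_8);
    seen.add(s); // list index mirrors the writer's id
    return s;
  }
}
```

The writer and reader must see the strings in the same order, since ids are implicit positions rather than values stored with the string.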

Additional context
#70

[Java] add optimized map implementation

Is your feature request related to a problem? Please describe.
Serialization performs many hash lookups:

  • look up the serializer based on the object type
  • look up the reference if ref tracking is enabled

We need a very fast map implementation so that map lookups do not become a bottleneck.

Describe the solution you'd like
Use linear probing and Fibonacci hashing.

[Java] Extensible classloader support

Is your feature request related to a problem? Please describe.
The Fury Java JIT generates bytecode for each generated serializer class, which is then loaded as a class in a new or existing classloader.

Class definition and loading should ensure that not too many new classloaders are created, that new classes are eligible for GC, and that existing classloaders are not polluted.

Additional context

#28 #33

[Docs] fix readme syntax

Is your feature request related to a problem? Please describe.
The README has some syntax errors and is not readable.

[Java] unsafe field accessor

Is your feature request related to a problem? Please describe.
Getting a field value by reflection is slow; using sun.misc.Unsafe#getXXX(java.lang.Object, long) is much faster.
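
A sketch of what such an accessor might look like, assuming sun.misc.Unsafe is reachable through its theUnsafe field (as on common HotSpot JDKs; newer JDKs deprecate these methods and may print warnings). UnsafeFieldAccessor and its nested Point class are hypothetical names for this example.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class UnsafeFieldAccessor {
  private static final Unsafe UNSAFE = loadUnsafe();

  private final long offset;

  UnsafeFieldAccessor(Class<?> type, String fieldName) {
    try {
      // The offset is resolved once; later reads are plain memory loads.
      offset = UNSAFE.objectFieldOffset(type.getDeclaredField(fieldName));
    } catch (NoSuchFieldException e) {
      throw new IllegalArgumentException(e);
    }
  }

  int getInt(Object target) {
    return UNSAFE.getInt(target, offset);
  }

  Object getObject(Object target) {
    return UNSAFE.getObject(target, offset);
  }

  private static Unsafe loadUnsafe() {
    try {
      Field field = Unsafe.class.getDeclaredField("theUnsafe");
      field.setAccessible(true);
      return (Unsafe) field.get(null);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("sun.misc.Unsafe is not available", e);
    }
  }

  /** Sample type with private fields, used only for the demo below. */
  static final class Point {
    private final int x;
    private final String label;

    Point(int x, String label) {
      this.x = x;
      this.label = label;
    }
  }

  public static void main(String[] args) {
    Point p = new Point(7, "origin");
    System.out.println(new UnsafeFieldAccessor(Point.class, "x").getInt(p));
    System.out.println(new UnsafeFieldAccessor(Point.class, "label").getObject(p));
  }
}
```

Note that no type or access checks are performed: reading an int field with getObject, or passing an object of the wrong class, is undefined behavior rather than an exception.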

[Java] faster auto-growing object array

Is your feature request related to a problem? Please describe.
Java ArrayList is slow for our use case:

  • get/set perform index checks, which are sometimes unnecessary.
  • clear is slow if the list is very long.
  • A newly allocated ArrayList grows from the beginning, which may incur extra copy and memory allocation costs.

We should implement a faster auto-growing object array.

Describe the solution you'd like

  • Implement an ObjectArray that holds an Object[] array internally.
  • Skip index checks.
  • Use System.arraycopy from an array of null elements for clear.
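
The three points above could be sketched like this. The NIL block size, the growth factor, and the method names are assumptions for illustration; only the ObjectArray name and the System.arraycopy idea come from the issue.

```java
final class ObjectArray {
  // Never written to, so it always contains only nulls.
  private static final Object[] NIL = new Object[1024];

  private Object[] objects;
  private int size;

  ObjectArray(int initialCapacity) {
    objects = new Object[initialCapacity];
  }

  void add(Object element) {
    if (size == objects.length) {
      Object[] grown = new Object[Math.max(8, size * 2)];
      System.arraycopy(objects, 0, grown, 0, size);
      objects = grown;
    }
    objects[size++] = element;
  }

  // No check against size: callers are trusted to pass a valid index.
  Object get(int index) {
    return objects[index];
  }

  void set(int index, Object element) {
    objects[index] = element;
  }

  int size() {
    return size;
  }

  /** Nulls out the used slots by block-copying from NIL; the capacity is kept for reuse. */
  void clear() {
    for (int from = 0; from < size; from += NIL.length) {
      System.arraycopy(NIL, 0, objects, from, Math.min(NIL.length, size - from));
    }
    size = 0;
  }
}
```

Keeping the backing array after clear means a reused instance does not have to grow from a small capacity again.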

[Java] add long map support

Is your feature request related to a problem? Please describe.
A map with long keys backed by java.util.HashMap incurs boxing cost; a new map implementation is needed.

Describe the solution you'd like
Implement a new map with a long[] key array and an Object[] value array, using linear probing and Fibonacci hashing.
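
A minimal sketch of that design, under a few assumptions not stated in the issue: null values mark empty slots (so null values are rejected), the load factor is capped at 0.5, and removal is left out. The class name LongMap and its methods are hypothetical.

```java
final class LongMap<V> {
  // 2^64 divided by the golden ratio, the usual multiplier for Fibonacci hashing.
  private static final long GOLDEN = 0x9E3779B97F4A7C15L;

  private long[] keys;
  private Object[] values; // a null value marks an empty slot
  private int size;
  private int shift; // 64 - log2(capacity)

  LongMap() {
    init(16);
  }

  private void init(int capacity) { // capacity must be a power of two
    keys = new long[capacity];
    values = new Object[capacity];
    shift = 64 - Integer.numberOfTrailingZeros(capacity);
    size = 0;
  }

  /** Fibonacci hashing: multiply, then keep the top log2(capacity) bits. */
  private int slot(long key) {
    return (int) ((key * GOLDEN) >>> shift);
  }

  void put(long key, V value) {
    if (value == null) {
      throw new NullPointerException("null values are not supported");
    }
    if ((size + 1) * 2 > keys.length) {
      resize(); // keep the load factor at or below 0.5 so probing always terminates
    }
    int mask = keys.length - 1;
    int i = slot(key);
    while (values[i] != null && keys[i] != key) {
      i = (i + 1) & mask; // linear probing
    }
    if (values[i] == null) {
      size++;
    }
    keys[i] = key;
    values[i] = value;
  }

  @SuppressWarnings("unchecked")
  V get(long key) {
    int mask = keys.length - 1;
    int i = slot(key);
    while (values[i] != null) {
      if (keys[i] == key) {
        return (V) values[i];
      }
      i = (i + 1) & mask;
    }
    return null;
  }

  int size() {
    return size;
  }

  @SuppressWarnings("unchecked")
  private void resize() {
    long[] oldKeys = keys;
    Object[] oldValues = values;
    init(oldKeys.length * 2);
    for (int i = 0; i < oldKeys.length; i++) {
      if (oldValues[i] != null) {
        put(oldKeys[i], (V) oldValues[i]);
      }
    }
  }
}
```

Because keys live in a long[] there is no boxing on put or get, which is the cost the issue wants to avoid.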

Additional context
#42

[Java] IllegalArgumentException when IF operator has Return child

To Reproduce

 String code = new Expression.If(
        ExpressionUtils.eq(Expression.Literal.ofInt(1), new Expression.Reference("classId", PRIMITIVE_SHORT_TYPE, false)),
        new Expression.Return(Expression.Literal.True),
        new Expression.Return(Expression.Literal.False)).genCode(new CodegenContext()).code();

[Java] cross-language type id

Is your feature request related to a problem? Please describe.
Add type ids that are consistent between languages.

Describe the solution you'd like
Based on arrow type id: arrow/type_fwd.h

[Java] Janino compiler backend

Is your feature request related to a problem? Please describe.
The Fury JIT generates Java code from an expression tree; we need a way to compile that Java code into bytecode.

Describe the solution you'd like
We can use the Janino compiler to compile Java code into bytecode, since it is faster than the JDK compiler.

Describe alternatives you've considered
javax.tools.JavaCompiler is also feasible, but it is too slow and only generates class files.

Additional context
The Janino compiler doesn't support generics, so the generated code shouldn't contain generics.

#28

[Java] Meta shared mode serialization

Is your feature request related to a problem? Please describe.
For class forward/backward compatibility, Fury needs to send class meta to the peer every time, which is time-consuming and consumes more bandwidth.

Describe the solution you'd like
If the sender and receiver serialize serially within a certain context (e.g. a TCP connection), then some metadata (class name, field name, final field type information, etc.) can be shared between multiple requests in that context. This type information is sent to the other end during the first serialization in that context. The other end can then rebuild the same deserializer from it, so that it can still deserialize correctly when the fields on the serialization and deserialization sides are inconsistent. At the same time, subsequent serializations avoid unnecessary metadata overhead.

Additional context
#197

[Java] java license auto format has no blank line before package declaration

Describe the bug
Java license auto-format leaves no blank line before the package declaration, which conflicts with the checkstyle plugin.
To Reproduce

mvn -T10  clean license:format                            
mvn -T10  clean checkstyle:check                       

[Java] basic serialization framework

Is your feature request related to a problem? Please describe.
Implement the Java serialization framework for Fury. JIT serialization is not covered by this issue.

Describe the solution you'd like
The serialization framework includes the following classes:

  • Fury: serialization entry point for users
  • ClassResolver: read/write classes
  • ReferenceResolver: track references
  • EnumStringResolver: write/read duplicated strings only once
  • SerializationContext: holds context-related information, so that serializers can relate the serialization of different objects
  • Generics: Java generics info used to speed up serialization and reduce size

[Java] support speeding up inner serialization by using outer generics info

Is your feature request related to a problem? Please describe.
For a class with nested generics such as:

class Foo {
  List<Integer> intLists;
  Map<String, List<Long>> map;
}

If we push the Integer type to ListSerializer and the String and List<Long> types to MapSerializer, then ListSerializer knows that every element is an Integer. There is no need to look up the element serializer and write the element type for every element, which saves both space and time.

MapSerializer can use the same mechanism. Also, when serializing a List<Long> value, MapSerializer can push Long to ListSerializer, which makes nested list serialization more efficient too.

Java generics are erased at runtime, so a List instance does not carry its element type. We need a way to push and propagate those erased generics along the serialization.

Describe the solution you'd like

  • Use Guava TypeToken to extract generics
  • Create a Generics class to record the generics hierarchy and the current generics
  • Create a GenericType class to track child generics and bind serializers, reducing map lookup cost
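
To illustrate where the erased type arguments can still be recovered, here is a sketch that uses plain java.lang.reflect instead of Guava TypeToken (a swapped-in technique, chosen only to keep the example dependency-free). GenericsSketch and typeArguments are hypothetical names; Foo mirrors the class from the issue.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class GenericsSketch {
  static class Foo {
    List<Integer> intLists;
    Map<String, List<Long>> map;
  }

  /** Returns the declared type arguments of a field, or an empty array for raw types. */
  static Type[] typeArguments(Class<?> owner, String fieldName) {
    try {
      // Field declarations keep their generic signature even though instances are erased.
      Type type = owner.getDeclaredField(fieldName).getGenericType();
      return type instanceof ParameterizedType
          ? ((ParameterizedType) type).getActualTypeArguments()
          : new Type[0];
    } catch (NoSuchFieldException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    // Element type that could be pushed to a list serializer.
    System.out.println(Arrays.toString(typeArguments(Foo.class, "intLists")));
    // Key and value types that could be pushed to a map serializer.
    System.out.println(Arrays.toString(typeArguments(Foo.class, "map")));
  }
}
```

For the map field the second argument is itself a ParameterizedType, which is the nested information a map serializer would forward to the list serializer for its values.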

Additional context
#70

[Java] memory read/write buffer support

Is your feature request related to a problem? Please describe.
Serialization involves a lot of memory reads and writes, so a convenient and highly efficient util is necessary:

  • provide read/write index
  • support heap/off-heap memory.
  • support varint.
  • binary compare, swap, and copy methods.
  • little-endian access.

Describe the solution you'd like
Use sun.misc.Unsafe for efficient memory operations, and combine off-heap and heap memory in one class to avoid the cost of virtual method calls.
If the heap buffer is null, Unsafe resolves the location as an off-heap memory address; otherwise it resolves it as an offset within the heap array.
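
The single-class design can be sketched as below, assuming sun.misc.Unsafe is reachable via its theUnsafe field. MemoryBufferSketch and its methods are hypothetical names; the sketch does no bounds checking and uses the platform's native byte order, which is little-endian on common x86 and ARM hardware but is not enforced here.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class MemoryBufferSketch {
  private static final Unsafe UNSAFE = loadUnsafe();

  private final byte[] heap; // null when the buffer is off-heap
  private final long base;   // array base offset (heap) or absolute address (off-heap)

  private MemoryBufferSketch(byte[] heap, long base) {
    this.heap = heap;
    this.base = base;
  }

  static MemoryBufferSketch wrap(byte[] bytes) {
    return new MemoryBufferSketch(bytes, Unsafe.ARRAY_BYTE_BASE_OFFSET);
  }

  static MemoryBufferSketch allocateOffHeap(int size) {
    return new MemoryBufferSketch(null, UNSAFE.allocateMemory(size));
  }

  // One code path for both kinds of memory: with a null object,
  // Unsafe treats the long argument as an absolute address.
  void putInt(int index, int value) {
    UNSAFE.putInt(heap, base + index, value);
  }

  int getInt(int index) {
    return UNSAFE.getInt(heap, base + index);
  }

  /** Releases off-heap memory; a no-op for heap buffers. */
  void free() {
    if (heap == null) {
      UNSAFE.freeMemory(base);
    }
  }

  private static Unsafe loadUnsafe() {
    try {
      Field field = Unsafe.class.getDeclaredField("theUnsafe");
      field.setAccessible(true);
      return (Unsafe) field.get(null);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("sun.misc.Unsafe is not available", e);
    }
  }

  public static void main(String[] args) {
    MemoryBufferSketch onHeap = wrap(new byte[16]);
    onHeap.putInt(4, 42);
    MemoryBufferSketch offHeap = allocateOffHeap(16);
    offHeap.putInt(4, 42);
    System.out.println(onHeap.getInt(4) == offHeap.getInt(4));
    offHeap.free();
  }
}
```

Because both kinds of memory go through the same final class, there is no interface dispatch on the read/write path.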

Describe alternatives you've considered
Making the memory buffer an interface with off-heap and heap buffers as implementations is feasible, but incurs virtual method calls, which is unacceptable for such a perf-critical scenario.

[Java] basic type inference support

Is your feature request related to a problem? Please describe.
Java is a strongly typed language; class fields have types and generics. Using this type info, serialization performance and size can be improved notably.

Type inference performance is critical, since the first serialization infers the type info of object fields. If inference is slow, there may be latency spikes when serving requests, which is unacceptable.

Describe the solution you'd like

  • extract java generics based on guava TypeToken
  • parallel multi-threaded generics parsing
  • guava generics parsing speedup
  • descriptors cache
  • ignore fields annotated by @Ignore

Additional context
#29

[Java] Reference tracking support

Is your feature request related to a problem? Please describe.
A Java object graph may contain shared or circular references.
Serialization should support tracking such references to avoid writing duplicate data or hitting recursion errors.

At the same time, reference tracking needs a map to track refs, which is pretty slow, although we can use the optimized map in io.fury.collection. So there should be an option to disable ref tracking.

Describe the solution you'd like
ReferenceResolver is an abstract interface. MapReferenceResolver tracks references with a map; NoReferenceResolver ignores references.
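
A sketch of the two resolvers on the write side. The class names follow the issue, but the refId method and its contract are assumptions for this example, and java.util.IdentityHashMap stands in for the optimized map mentioned above.

```java
import java.util.IdentityHashMap;

interface ReferenceResolver {
  /**
   * Hypothetical contract: returns the id of an object that was already written,
   * or -1 if the object is new and must be written in full.
   */
  int refId(Object obj);
}

final class MapReferenceResolver implements ReferenceResolver {
  // Identity semantics: two equal but distinct objects are different references.
  private final IdentityHashMap<Object, Integer> ids = new IdentityHashMap<>();

  @Override
  public int refId(Object obj) {
    Integer id = ids.get(obj);
    if (id != null) {
      return id;
    }
    ids.put(obj, ids.size());
    return -1;
  }
}

final class NoReferenceResolver implements ReferenceResolver {
  @Override
  public int refId(Object obj) {
    return -1; // tracking disabled: every object is written in full
  }
}
```

With the no-op resolver a circular graph would recurse forever, so disabling tracking is only safe when the caller knows the graph is a tree.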

Describe alternatives you've considered
Binding a reference resolver to every type, i.e. implementing a hierarchical resolver, may have better performance in some cases?

Additional context
#70

[Java] add java serializer interface

Is your feature request related to a problem? Please describe.
Add a Java serializer interface, so that supporting a new type only requires implementing a serializer for that type.

Describe the solution you'd like

public abstract class Serializer<T> {
  public void write(MemoryBuffer buffer, T value) {
    throw new UnsupportedOperationException();
  }

  public T read(MemoryBuffer buffer) {
    throw new UnsupportedOperationException();
  }
  public void crossLanguageWrite(MemoryBuffer buffer, T value) {
    throw new UnsupportedOperationException();
  }

  public T crossLanguageRead(MemoryBuffer buffer) {
    throw new UnsupportedOperationException();
  }
}

Additional context
#70

[Java] JIT codegen framework

Is your feature request related to a problem? Please describe.
Add a Java JIT framework to speed up serialization.

Describe the solution you'd like
The implementation will be divided into:

[Java] add int array to avoid boxing cost

Is your feature request related to a problem? Please describe.
JDK ArrayList<Integer> has boxing overhead, which is unacceptable for perf-critical serialization scenarios. An auto-growing IntArray is needed in such cases.

Describe the solution you'd like
Implement an auto-growing IntArray that holds an int[] internally.
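
A minimal sketch of such a class. Only the IntArray name and the internal int[] come from the issue; the doubling growth policy and the method names are assumptions.

```java
import java.util.Arrays;

final class IntArray {
  private int[] elements;
  private int size;

  IntArray(int initialCapacity) {
    elements = new int[Math.max(1, initialCapacity)];
  }

  /** Appends a primitive int; no Integer object is ever allocated. */
  void add(int value) {
    if (size == elements.length) {
      elements = Arrays.copyOf(elements, size * 2);
    }
    elements[size++] = value;
  }

  int get(int index) {
    return elements[index];
  }

  int size() {
    return size;
  }

  /** Resets the size but keeps the backing array for reuse. */
  void clear() {
    size = 0;
  }

  int[] toArray() {
    return Arrays.copyOf(elements, size);
  }
}
```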

[Java] setup basic java code structure

Set up the basic Java code structure:

  • fury-core: core memory/collection/jit/serialization support
  • fury-format: readable/writable binary format
  • fury-test-core: reusable test utils across modules
  • fury-testsuite: complex test suites beyond unit tests

Expression IR for expressing code logic

Common IR: ValueExpression, ListExpression, Literal, Reference, Empty, Block, FieldValue, SetField, Cast, Invoke, StaticInvoke, NewInstance, NewArray, AssignArrayElem, If, IsNull, Not, Comparator, Arithmetic, Add, Subtract, ForEach, ZipForEach, ForLoop, ListFromIterable, Return

[Java] Fast string serialization support

Is your feature request related to a problem? Please describe.
Strings are very common in serialization, but due to their variable length and multiple encodings, string serialization is pretty slow and sometimes becomes the bottleneck of the whole serialization. We need a way to serialize strings fast.

The bottleneck mainly consists of:

  • String data serialization copy: copying the inner char[] / byte[] out for serialization.
  • String encoding: encoding char[] / byte[] into ascii/unicode16/utf8.
  • String decoding: decoding binary into ascii/unicode16 char[]/byte[].
  • String creation copy cost: java.lang.String copies the provided char[]/byte[] for immutability.

Describe the solution you'd like

  • Use sun.misc.Unsafe to extract the inner char[] / byte[]
  • Support ascii/unicode16/utf8 to minimize encoding cost
  • Add an encoding flag to the data to support multiple encodings
  • Use java.lang.invoke.MethodHandle to invoke the package-level zero-copy constructor with minimal cost

[Java] add tuples support

Is your feature request related to a problem? Please describe.
Java lacks tuple support, which is common in other languages such as C++/Python/Go. Tuples are useful as a common data structure for users and for Fury itself.

Describe the solution you'd like
Add tuple2/tuple3 support for now, other tuple classes can be added later.
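
A possible shape for the two-element case, as a sketch. The field names f0/f1 and the of factory are assumptions; a Tuple3 would follow the same pattern with one more field.

```java
import java.util.Objects;

final class Tuple2<T0, T1> {
  final T0 f0;
  final T1 f1;

  private Tuple2(T0 f0, T1 f1) {
    this.f0 = f0;
    this.f1 = f1;
  }

  static <T0, T1> Tuple2<T0, T1> of(T0 f0, T1 f1) {
    return new Tuple2<>(f0, f1);
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof Tuple2)) {
      return false;
    }
    Tuple2<?, ?> other = (Tuple2<?, ?>) o;
    return Objects.equals(f0, other.f0) && Objects.equals(f1, other.f1);
  }

  @Override
  public int hashCode() {
    return Objects.hash(f0, f1);
  }

  @Override
  public String toString() {
    return "(" + f0 + ", " + f1 + ")";
  }
}
```

Value-based equals and hashCode make the tuple usable as a composite map key.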

[Java] add java api annotation to mark api stability

Is your feature request related to a problem? Please describe.
Fury is in rapid development, and serialization is widely used.
We need a way to tell users which APIs are stable and which are expected to change.

Describe the solution you'd like
Add Java API annotations to mark API stability:

  • @Public is stable
  • @Internal is subject to change

[Java] add string utils

Is your feature request related to a problem? Please describe.
The Java JIT codegen needs to generate Java code strings, which requires some string utils such as format/stripBlankLines/capitalize/uncapitalize/isBlank.
Describe the solution you'd like
Copy capitalize/uncapitalize/isBlank from commons-lang and implement the others.
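
For orientation, a dependency-free sketch of four of these helpers. These are simplified versions written for this example, not copies of the commons-lang code, and the exact semantics of stripBlankLines (drop lines that are empty or whitespace-only) are an assumption.

```java
final class StringUtilsSketch {
  static boolean isBlank(CharSequence cs) {
    if (cs == null) {
      return true;
    }
    for (int i = 0; i < cs.length(); i++) {
      if (!Character.isWhitespace(cs.charAt(i))) {
        return false;
      }
    }
    return true;
  }

  static String capitalize(String s) {
    if (s == null || s.isEmpty()) {
      return s;
    }
    return Character.toUpperCase(s.charAt(0)) + s.substring(1);
  }

  static String uncapitalize(String s) {
    if (s == null || s.isEmpty()) {
      return s;
    }
    return Character.toLowerCase(s.charAt(0)) + s.substring(1);
  }

  /** Removes empty and whitespace-only lines, e.g. from generated source code. */
  static String stripBlankLines(String s) {
    StringBuilder sb = new StringBuilder();
    for (String line : s.split("\n")) {
      if (!isBlank(line)) {
        sb.append(line).append('\n');
      }
    }
    if (sb.length() > 0) {
      sb.setLength(sb.length() - 1); // drop the trailing newline
    }
    return sb.toString();
  }
}
```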

Describe alternatives you've considered
Adding commons-lang is OK, but it introduces a dependency, which we try to avoid since serialization is so commonly used.

[Doc] debugging doc

Is your feature request related to a problem? Please describe.
Binary protocol bugs are hard to debug; when there is a bug in the implementation, crashes sometimes happen. A detailed debugging doc is necessary for troubleshooting.

[Java] disable fury java logging more easily

Is your feature request related to a problem? Please describe.
Fury prints some logs for diagnostics. They are useful but can sometimes be annoying. We should support disabling logging.

Describe the solution you'd like
When logging is disabled, switch to org.slf4j.helpers.NOPLogger#NOP_LOGGER.

Describe alternatives you've considered
Configure log4j2.xml/log4j2.properties for the io.fury package.

[Java] add reflection common utils support

Is your feature request related to a problem? Please describe.
Serialization uses reflection frequently in codegen and serialization; a reflection utils class would be convenient for code reuse.

[Java] Optimize StringBuilder/StringBuffer serialization

Is your feature request related to a problem? Please describe.
In #93, we implemented StringBuilder/StringBuffer serialization by converting to/from java.lang.String, which has some copy cost. A better solution is to work on the inner data structure directly to avoid this copy.

Additional context
#92 #93

[Java] zero-copy support

Is your feature request related to a problem? Please describe.
Support zero-copy to avoid the cost of serializing large buffers.

Describe the solution you'd like
Python pickle5 out-of-band serialization is zero-copy; Fury can implement a similar protocol, but in a cross-language way.

[Java] JDK 17 string deserialization zero-copy

Is your feature request related to a problem? Please describe.
Because strong encapsulation is enabled by default in JDK 17, we can't get the String zero-copy constructor without some hacks, so string deserialization on JDK 17 incurs an extra copy when creating the String object.

Additional context
#90

[Java] add unsafe memory util support

Is your feature request related to a problem? Please describe.
Serialization performs frequent memory operations, and efficient memory access is necessary for performance. JDK Unsafe is an efficient util for this case.

[Java] support serializing enum by string

Is your feature request related to a problem? Please describe.
#97 implements enum serialization by writing the enum ordinal, which is fast. But when enum constants are reordered, deserialization will produce the wrong value.

Describe the solution you'd like
Support serialization by enum string, but in a configurable way.

By default, enums are serialized by ordinal, but this can be configured to use the enum string instead.
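
The configurable behavior might be sketched as follows, assuming a java.nio.ByteBuffer target. EnumCodec, its byName flag, and the length-prefixed UTF-8 layout are hypothetical and only illustrate the trade-off between the two modes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.time.DayOfWeek;

final class EnumCodec<E extends Enum<E>> {
  private final Class<E> type;
  private final E[] constants;
  private final boolean byName;

  EnumCodec(Class<E> type, boolean byName) {
    this.type = type;
    this.constants = type.getEnumConstants();
    this.byName = byName;
  }

  void write(ByteBuffer out, E value) {
    if (byName) {
      // Larger, but survives reordering of the enum constants.
      byte[] name = value.name().getBytes(StandardCharsets.UTF_8);
      out.putInt(name.length).put(name);
    } else {
      // Compact and fast, but tied to the declaration order.
      out.putInt(value.ordinal());
    }
  }

  E read(ByteBuffer in) {
    if (!byName) {
      return constants[in.getInt()];
    }
    byte[] name = new byte[in.getInt()];
    in.get(name);
    return Enum.valueOf(type, new String(name, StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocate(32);
    EnumCodec<DayOfWeek> codec = new EnumCodec<>(DayOfWeek.class, true);
    codec.write(buf, DayOfWeek.FRIDAY);
    buf.flip();
    System.out.println(codec.read(buf));
  }
}
```

Both sides must agree on the mode, since the two encodings are not self-describing in this sketch.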

Additional context
#96 #97

[Community] Getting involved guide

Is your feature request related to a problem? Please describe.
There should be a guide that helps users get involved.
