Coder Social home page Coder Social logo

migrant's Introduction

Migrant

Copyright (c) 2012-2022 Antmicro

View on Antmicro Open Source Portal

Introduction

Migrant is a serialization framework for .NET and Mono projects. Its aim is to provide an easy way to serialize complex graphs of objects, with minimal programming effort.

Directory organization

There are three main directories:

  • Migrant - contains the source code of the library;
  • Tests - contains unit tests (we're using NUnit);
  • PerformanceTester - contains performance assessment project;

There are two solution files - Migrant.sln, the core library, and MigrantWithTests.sln, combining both tests project and Migrant library.

Usage

Here we present some simple use cases of Migrant. They are written in pseudo-C#, but can be easily translated to other CLI languages.

Simple serialization

var stream = new MyCustomStream();
var myComplexObject = new MyComplexType(complexParameters);
var serializer = new Serializer();

serializer.Serialize(myComplexObject, stream);

stream.Rewind();

var myDeserializedObject = serializer.Deserialize<MyComplexType>(stream);

Open stream serialization

Normally each serialization data contains everything necessary for serialization like type descriptions. Those data, however, consume space and sometimes one would like to do consecutive serializations where each next one would reuse context just as if they happen as one. In other words, serializing one object and then another would be like serializing tuple of them. This also means that if there is a reference to an old object, it will be used instead of a new object serialized second time. Given open stream serialization session is tied to the stream. API is simple:

var serializer = new Serializer();
using(var osSerializer = serializer.ObtainOpenStreamSerializer(stream))
{
    osSerializer.Serialize(firstObject);
    osSerializer.Serialize(secondObject);
}

Here's deserialization:

var serializer = new Serializer();
using(var osSerializer = serializer.ObtainOpenStreamDeserializer(stream))
{
    var firstObject = osSerializer.Deserialize<MyObject>();
    var secondObject = osSerializer.Deserialize<MyObject>();
}

But what if you'd like to deserialize all object of a given type until the end of stream is reached? Here's the solution:

osDeserializer.DeserializeMany<T>()

which returns lazy IEnumerable<T>.

As default, all writes to the stream are buffered, i.e. one can be only sure that they are in the stream after calling Dispose() on open stream (de)serializer. This is, however, not useful when underlying stream is buffered independently or it is a network stream attached to a socket. In such cases one can disable buffering in Settings. Note that buffering also disables padding which is normally used to allow speculative reads. Therefore data written with buffered and unbuffered mode are not compatible.

Reference preservation

Let's assume you do open stream serialization. In first session, some object, let's name it A, is serialized. In the second session, among others, A is serialized again - the same exact instance. What should happen? We prefer when reference is preserved, so that when A is serialized in second session, actually only its id goes to the second session - pointing to the object from the first session. However, to provide such feature we have to keep reference to A between sessions - to check whether it is the same instance as previously serialized. This means that until open stream serializer is disposed, a reference to A is held, hence it won't be collected by GC. To prevent this unpleasant situation we could use weak references. Unfortunately, as it turned out, with frequent serialization of small objects this can completely kill Migrant's performance - which is a core value in our library. So, to sum it up we offer user to choose from three mentioned options with the help of Settings class. The possible values of an enum ReferencePreservation are:

  • DoNotPreserve - each session is treated as separate serialization considering references;
  • Preserve - object identity is preserved but they are strongly referenced between sessions;
  • UseWeakReference - best of both worlds conceptually, but can completely kill your performance.

Choose wisely.

Deep clone

var myComplexObject = new MyComplexType(complexParameters);
var myObjectCopy = Serializer.DeepCopy(myComplexObject);

Simple types to bytes

var myLongArray = new long[] { 1, 2, ... };
var myOtherArray = new long[myLongArray.Length];
var stream = new MyCustomStream();

using(var writer = new PrimitiveWriter(stream))
{
   foreach(var element in myLongArray)
   {
      writer.Write(element);
   }
}

stream.Rewind();

using(var reader = new PrimitiveReader(stream))
{
   for(var i=0; i<myLongArray.Length; i++)
   {
      myOtherArray[i] = reader.ReadInt64();
   }
}

Surrogates

var serializer = new Serializer();
var someObject = new SomeObject();
serializer.ForObject<SomeObject>().SetSurrogate(x => new AnotherObject());
serializer.Serialize(someObject, stream);

stream.Rewind();

var anObject = serializer.Deserialize<object>(stream);
Console.WriteLine(anObject.GetType().Name); // prints AnotherObject

One can also use a Type based API, i.e.

serializer.ForObject(typeof(SomeObject)).SetSurrogate(x => new AnotherObject());

What's the usage? The generic surrogates, which can match to many concrete types. Such surrogates are defined on open generic types. Here is an example:

serializer.ForObject(typeof(Tuple<>)).SetSurrogate(x => x.GetType().GetFields(BindingFlags.Instance | BindingFlags.NonPublic)[0].GetValue(x));

Version tolerance

What if some changes are made to the layout of the class between serialization and deserialization? Migrant can cope with that up to some extent. During creation of serializer you can specify settings, among which there is a version tolerance level. In the most restrictive (default) configuration, deserialization is possible if module ID (which is GUID generated when module is compiled) is the same as it was during serialization. In other words, serialization and deserialization must be done with the same assembly.

However, there is a way to weaken this condition by specifing flags from a special enumeration called VersionToleranceLevel:

  • AllowGuidChange - GUID values may differ between types which means that it can come from different compilations of the same library. This option alone effectively means that the layout of the class (fields, base class, name) does not change.
  • AllowFieldAddition - new version of the type can contain more fields than it contained during serialization. They are initialized with their default values.
  • AllowFieldRemoval - new version of the type can contain less fields than it contained during serialization.
  • AllowInheritanceChainChange - inheritance chain can change, i.e. base class can be added/removed.
  • AllowAssemblyVersionChange - assembly version (but not it's name or culture) can differ between serialized and deserializad types.

As GUID numbers may vary between compilations of the same code, it is obvious that more significant changes will also cause them to change. To simplify the API, AllowGuidChange value is implied by any other option.

Collections handling

By default Migrant tries to handle standard collection types serialization in a special way. Instead of writing to the stream all of private meta fields describing collection object, only items are serialized - in a similar way to arrays. During deserialization a collection is recreated by calling proper adders methods.

This approach limits stream size and allows to easily migrate between versions of .NET framework, as internal collection implementation may differ between them.

Described mechanisms works for the following collections:

  • List<>
  • ReadOnlyCollection<>
  • Dictionary<,>
  • HashSet<>
  • Queue<>
  • Stack<>
  • BlockingCollection<>
  • Hashtable

There is, however, an option to disable this feature and treat collections as normal user objects. To do this a flag treatCollectionAsUserObject in the Settings object must be set to true.

ISerializable support

By default Migrant does not use special serialization means for classes or structs that implement ISerializable. If you have already prepared your code for e.g. BinaryFormatter, then you can turn on ISerializable support in Settings. Note that this is suboptimal compared to normal Migrant's approach and is only thought as a compatibility layer which may be useful for the plug-in serialization framework substitution. Also note that it will also affect framework classes, resulting in a suboptimal performance. For example:

Dictionary, one million elements. Without ISerializable support: 0.30s, 13.25MB With ISeriazilable support: 6.94s, 13.25MB

Dictionary, 10000 instances with one element each. Without ISerializable support: 44.79ms, 184.01KB With ISeriazilable support: 0.91s, 6.36MB

To sum it up, such a support is meant to be phased out during time in your project.

IXmlSerializable support

As with ISerializable Migrant can also utilize implementation of IXmlSerializable. In that case data is written to memory stream using XmlSerializer and then taken as a binary blob (along with type name).

(De)serializing the Type type

Directly serializing Type is not possible, but you can use surrogates for that purpose.

One solution is to serialize the assembly qualified name of the type:

class MainClass
{
	public static void Main(string[] args)
	{
		var serializer = new Serializer();
		serializer.ForObject<Type>().SetSurrogate(type => new TypeSurrogate(type));
		serializer.ForSurrogate<TypeSurrogate>().SetObject(x => x.Restore());

		var typesIn = new [] { typeof(Type), typeof(int[]), typeof(MainClass) };

		var stream = new MemoryStream();
		serializer.Serialize(typesIn, stream);
		stream.Seek(0, SeekOrigin.Begin);
		var typesOut = serializer.Deserialize<Type[]>(stream);
		foreach(var type in typesOut)
		{
			Console.WriteLine(type);
		}
	}
}

public class TypeSurrogate
{
	public TypeSurrogate(Type type)
	{
		assemblyQualifiedName = type.AssemblyQualifiedName;
	}

	public Type Restore()
	{
		return Type.GetType(assemblyQualifiedName);
	}

	private readonly string assemblyQualifiedName;
}

If you would also like to use the same mechanisms of version tolerance that are used for normal types, you can go with this code:

class MainClass
{
	public static void Main(string[] args)
	{
		var serializer = new Serializer();
		serializer.ForObject<Type>().SetSurrogate(type => Activator.CreateInstance(typeof(TypeSurrogate<>).MakeGenericType(new [] { type })));
		serializer.ForSurrogate<ITypeSurrogate>().SetObject(x => x.Restore());

		var typesIn = new [] { typeof(Type), typeof(int[]), typeof(MainClass) };

		var stream = new MemoryStream();
		serializer.Serialize(typesIn, stream);
		stream.Seek(0, SeekOrigin.Begin);
		var typesOut = serializer.Deserialize<Type[]>(stream);
		foreach(var type in typesOut)
		{
			Console.WriteLine(type);
		}
	}
}

public class TypeSurrogate<T> : ITypeSurrogate
{
	public Type Restore()
	{
		return typeof(T);
	}
}

public interface ITypeSurrogate
{
	Type Restore();
}

Features

Migrant is designed to be easy to use. For most cases, the scenario consists of calling one method to serialize, and another to deserialize a whole set of interconnected objects. It's not necessary to provide any information about serialized types, only the root object to save. All of the other objects referenced by the root are serialized automatically. It works out of the box for value and reference types, complex collections etc. While serialization of certain objects (e.g. pointers) is meaningless and may lead to hard-to-trace problems, Migrant will gracefully fail to serialize such objects, providing the programmer with full information on what caused the problem and where is it located.

The output of the serialization process is a stream of bytes, intended to reflect the memory organization of the actual system. This data format is compact and thus easy to transfer via network. It's endianness-independent, making it possible to migrate the application's state between different platforms.

Many of the available serialization frameworks do not consider complex graph relations between objects. It's a common situation that serializing and deserializing two objects A and B referencing the same object C leaves you with two identical copies of C, one referenced by A and one referenced by B. Migrant takes such scenarios into account, preserving the identity of references, without further code decoration. Thanks to this, a programmer is relieved of implementing complex consistency mechanisms for the system and the resulting binary form is even smaller.

Migrant's ease of use does not prohibit the programmer from controlling the serialization behaviour in more complex scenarios. It is possible to hide some fields of a class, to deserialize objects using their custom constructors and to add hooks to the class code that will execute before or after (de)serialization. With little effort it is possible for the programmer to reimplement (de)serialization patterns for specific types.

Apart from the main serialization framework, we provide a mechanism to translate primitive .NET types (and some other) to their binary representation and push them to a stream. Such a form is very compact - Migrant uses the Varint encoding and the ZigZag encoding. For example, serializing an Int64 variable with value 1 gives a smaller representation than Int32 with value 1000. Although CLS offers the BinaryWriter class, it is known to be quite clumsy and not very elegant to use.

Another extra feature, unavailable in convenient form in CLI, is an ability to deep clone given objects. With just one method invocation, Migrant will return an object copy, using the same mechanisms as the rest of the serialization framework.

Serialization and deserialization is done using on-line generated methods for performance (a user can also use reflection instead if he wishes).

Migrant can also be configured to replace objects of given type with user provided objects during serialization or deserialization. The feature is known as Surrogates.

Performance benchmarks against other popular serialization frameworks are yet to be run, but initial testing is quite promising.

Download

To download a precompiled version of Migrant, use NuGet Package Manager.

Compilation

To compile the project, open the solution file in your IDE and hit compile. You may, alternatively, compile the library from the command line:

msbuild Migrant.sln

or, under Mono:

xbuild Migrant.sln

Coding style

If you intend to help us developing Migrant, you're most welcome!

Please adhere to our coding style rules. For MonoDevelop, they are included in the .sln files. For Visual Studio we provide a .vssettings file you can import to your IDE.

In case of any doubts, just take a look to see how we have done it.

Licence

Migrant is created by Antmicro and released on an MIT licence, which can be found in the LICENCE file in this directory.

migrant's People

Contributors

cellmer avatar jakubjatczak avatar jpierson avatar konrad-kruczynski avatar kzwarycz avatar mars-low avatar mateusz-holenko avatar mateuszkarlic avatar mgielda avatar mszprejda avatar p-woj avatar piotrzierhoffer avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

migrant's Issues

Warning during compilation

ObjectReader.cs(643,24): warning CS0414: The private field AntMicro.Migrant.ObjectReader.inlineRead' is assigned but its value is never used`

Way to save method

Proposed solution: method name + types of all parameter. We could also utilize usual cache thing here.

Verify the code of WriteMethodGenerator.GenerateWriteReference

It looks that there is no other option for formal type to be an actual type than the class to be sealed (or have a value type which is not an option here).

The question is why the CheckTransient is called twice sometimes? (When formalTypeIsActualType and formalType is transient).

Consider removing reflection like construction in ReadMethodGenerator

What I mean:

GenerateCodeCall<FieldInfo, object>((fi, target) => {
var ctorAttribute = (ConstructorAttribute)fi.GetCustomAttributes(false).First(x => x is ConstructorAttribute);
fi.SetValue(target, Impromptu.InvokeConstructor(fi.FieldType, ctorAttribute.Parameters));
                        });

but on the master branch (should be similar).

Surrogates

The surrogate is the object which is inserted in serialized data instead of some object. We are able to insert surrogate for an object of a given type and, conversely, insert an object for a surrogate (that is, first operation is performed during serialization and second during deserialization). API would be callback based, something like that:

  • serialization: SurrogateForObject<T>(Func<T, object> callback), where callback provides the surrogate;
  • deserialization: ObjectForSurrogate<T>(Func<T, object> callback), where callback provides object.

Naturally callback has to provide object that is assignable to formal type of a field, array element etc.

Callbacks should be provided in an immutable way, perhaps in Settings.

Collections fail to serialize

With

System.Collections.Generic.Dictionary<System.String,System.Collections.Generic.List<Emul8.Core.Symbol>> doesn't implement interface System.Collections.Generic.ICollection<System.Object>

(stack trace)
at (wrapper dynamic-method) System.Collections.Generic.Dictionary2<string, System.Collections.Generic.List1>.Write_Dictionary`2 (AntMicro.Migrant.ObjectWriter,AntMicro.Migrant.PrimitiveWriter,object) <0x00031>
at AntMicro.Migrant.ObjectWriter.WriteObjectInner (object) <0x0007c>

Address TODO comment

And I quote:
TODO: find all the usages and use verify or sth there
in line 40 of StampHelper.cs

Version tolerance: map between auto generated properties and their backing fields

So, currently we connect fields between version by their names. This is probably good approach most of the time but fails in the presence of auto generated properties and their backing fields. So we have to connect those two. I think the approach is to:

  • detect backing fields - I think it should be easy and one can simply judge basing on their name
  • find connection between backing fields and properties - not so straightforward, but we can scan properties, get IL of them and find which fields do they update - should be doable; in class stamp we would of course save name of the property, not field
  • OR we could maybe just use properties in such situation - save their name in the class stamp, but also use them as methods to obtain and save value - maybe better approach

What do you think?

Add support for ZigZag encoding

There should be an option to force the Serializer to use ZigZag encoding to shrink the size of negative number representation.

Deserializer should automatically detect used encoding.

Execute all tests on Windows

Please test Migrant on Windows, as if I remember correctly, we still haven't done that since last major change, i.e. generated deserialization.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.