
tape's Introduction

Tape

Welcome to the tape repository! This readme focuses on how tape is architected and how it works internally. For more general information about what tape is, see this other readme.

Tape is divided into the following layers:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ tape assist                         ┃
┣━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┫
┃ adapter generation ┃ taped-packages ┃
┣━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━┫
┃ adapter framework                   ┃
┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃ block framework                     ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
  • Block framework: Turns Blocks (declarative low-level primitives like Uint8Block or ListBlock) into bytes and the other way around.
  • Adapter framework: Provides the primitives for writing, registering and looking up adapters.
  • Adapter generation: Code generation of adapters based on annotations like @TapeClass. Also checks and ensures backwards-compatibility.
  • Taped-packages: Ecosystem of packages named taped_... for types from pub.dev packages. Maintained by the community.
  • Tape assist: A tool that helps you annotate classes and that finds and initializes taped-packages.

Apart from some community-maintained taped-packages, most of the layers exist within this repository. The top-level folders correspond to pub.dev packages:

  • tape contains things used during runtime to actually encode the values. That's the block and adapter framework.
  • tapegen contains things related to code generation and to improving the development experience – adapter generation and tape assist.

Writing custom adapters

Most of the time, using the adapter generator is fine. Here are some cases where you might need to create your own:

  • You're using a package that has a custom type and no taped-package exists for it.
  • You're publishing a package with a custom type and you want users to be able to tape that type without littering your original package with adapters.
  • You want the encoding to be more efficient.

If your type is a class, just extend TapeClassAdapter<TheType>. In the toFields(TheType object) method, return a Fields object that maps each field to a unique id. In fromFields, read the fields back by id, providing a default value for each:

class AdapterForTheType extends TapeClassAdapter<TheType> {

  @override
  Fields toFields(TheType object) {
    return Fields({
      0: object.someString,
      1: object.dynamicField,
      2: Int8(object.smallInt),
    });
  }

  @override
  TheType fromFields(Fields fields) {
    return TheType(
      someString: fields.get<String>(0, orDefault: ''),
      dynamicField: fields.get<dynamic>(1, orDefault: null),
      smallInt: fields.get<Int8>(2, orDefault: Int8.zero).toInt(),
    );
  }
}
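To make the role of `orDefault` concrete, here is a minimal, hypothetical stand-in for `Fields` (an illustration, not tape's actual implementation): values are looked up by field id, and when the encoded data lacks an id, for example because it was written by a version that didn't have that field, the caller's default is returned.

```dart
/// Hypothetical stand-in for tape's Fields class, for illustration only.
/// It maps field ids to already-decoded values.
class Fields {
  Fields(this._values);

  final Map<int, Object?> _values;

  /// Returns the value stored under [id], or [orDefault] if the encoding
  /// contains no such field (e.g. it was written by an older version).
  T get<T>(int id, {required T orDefault}) {
    if (!_values.containsKey(id)) return orDefault;
    return _values[id] as T;
  }
}
```

With this sketch, `Fields({0: 'apple'}).get<String>(0, orDefault: '')` yields `'apple'`, while `get<int>(2, orDefault: 0)` on the same object falls back to `0`.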

Publishing a taped-package

If you want to publish the package to pub.dev, consider naming it taped_<name of the original package>. For example, if your package is named sample, it would be taped_sample.
Adhering to this naming scheme allows tape assist to automatically find that package and suggest it to users when they add a @TapeClass annotation to a class that contains a field of a type from your package.

Also, you should give your adapter a negative type id so that it doesn't interfere with the adapters created by the end user. To reserve a type id, file a PR that adds it to the table of reserved type ids.

Additionally, add a tape.dart to your package root (so it can be imported with import 'package:taped_sample/tape.dart';) with the following content:

extension InitializeSample on TapeApi {
  void initializeSample() {
    registerAdapters({
      // Use your reserved type ids here.
      -4: AdapterForTheType(),
      -5: AdapterForOtherType(),
      // ... register further adapters here.
    });
  }
}

Behind the scenes: Searching for the right adapter

Note: This is not up-to-date.

Adapters are stored in a tree, like the following:

root node for objects to serialize
├─ virtual node for Iterable<Object>
│  ├─ AdapterForRunes
│  │  └─ AdapterForNull
│  └─ ...
├─ virtual node for int
│  ├─ AdapterForUint8
│  ├─ AdapterForInt8
│  ├─ AdapterForUint16
│  ├─ AdapterForInt16
│  ├─ AdapterForUint32
│  ├─ AdapterForInt32
│  └─ AdapterForInt64
├─ virtual node for bool
│  ├─ AdapterForTrueBool
│  └─ AdapterForFalseBool
├─ virtual node for String
│  ├─ AdapterForStringWithoutNullByte
│  └─ AdapterForArbitraryString
├─ AdapterForDouble
└─ ...

You can print a visualization of the adapter tree like this at any time by calling TypeRegistry.debugDumpTree().

Additionally, the TypeRegistry contains a map of shortcuts from types to nodes in the tree.
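The lookup can be pictured as a depth-first search. The sketch below is a simplified model under our own assumptions: each node carries a predicate, children are tried in order, and virtual nodes never encode anything themselves. `AdapterNode`, `findAdapter`, `findWithShortcuts` and `buildSampleTree` are hypothetical names rather than tape's API, and since this section is marked as outdated, the real registry may work differently.

```dart
/// A node in a simplified, hypothetical adapter tree.
class AdapterNode {
  AdapterNode(this.name, this.matches,
      {this.isVirtual = false, this.children = const []});

  final String name;

  /// Whether this node (or one of its children) can handle a value.
  final bool Function(Object? value) matches;

  /// Virtual nodes only group more specific adapters.
  final bool isVirtual;
  final List<AdapterNode> children;
}

/// Returns the most specific non-virtual node accepting [value], trying
/// children in order, or null if no adapter fits.
AdapterNode? findAdapter(AdapterNode node, Object? value) {
  if (!node.matches(value)) return null;
  for (final child in node.children) {
    final found = findAdapter(child, value);
    if (found != null) return found;
  }
  return node.isVirtual ? null : node;
}

/// Uses shortcuts from runtime types to subtrees to skip the top of the tree.
AdapterNode? findWithShortcuts(
    Map<Type, AdapterNode> shortcuts, AdapterNode root, Object? value) {
  return findAdapter(shortcuts[value.runtimeType] ?? root, value);
}

/// A small tree mirroring part of the diagram above.
AdapterNode buildSampleTree() {
  return AdapterNode('root', (v) => true, isVirtual: true, children: [
    AdapterNode('int', (v) => v is int, isVirtual: true, children: [
      AdapterNode('Uint8', (v) => v is int && v >= 0 && v <= 255),
      AdapterNode('Int8', (v) => v is int && v >= -128 && v <= 127),
      AdapterNode('Int64', (v) => v is int),
    ]),
    AdapterNode('bool', (v) => v is bool, isVirtual: true, children: [
      AdapterNode('TrueBool', (v) => v == true),
      AdapterNode('FalseBool', (v) => v == false),
    ]),
    AdapterNode('String', (v) => v is String, isVirtual: true, children: [
      AdapterNode('StringWithoutNullByte',
          (v) => v is String && !v.contains('\u0000')),
      AdapterNode('ArbitraryString', (v) => v is String),
    ]),
  ]);
}
```

Because children are tried in order, narrow adapters (like Uint8) have to come before broader ones (like Int64) in this model.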

Behind the scenes: How data is encoded

See The Life of a Fruit.

tape's People

Contributors

marcelgarus, jonaswanke


tape's Issues

Create repository for database

I'd love to tinker with building a rock-solid database based on tape.
I could think of two architecturally different approaches to a database:

  1. Write a low-level wrapper in another language, like Rust.
  2. Write everything in Dart.

Here, I want to shed some light on the tradeoffs of these two approaches.

Rust itself is generally much faster than Dart. Also, its ownership model makes it safe to spawn threads that operate on the same data structure, instead of falling back on Dart's monolithic Isolate model, where isolates don't share memory and have to communicate through message passing.
For example, that would make it possible to execute a query by spawning numerous threads that search the database in parallel.

Using a low-level wrapper also means we could use established database solutions like SQLite or IndexedDB.
These also provide amazing performance for queries as well as advanced indexing capabilities.

That being said, database performance is usually I/O-bound and Dart's RandomAccessFile would allow us to implement some indexing and query capabilities in Dart as well.
Together with spawning a separate isolate for each database instance, this would probably also result in reasonable performance, although likely an order of magnitude slower than a battle-tested native solution.

A great advantage of pure Dart, on the other hand, is that it doesn't require any native configuration or dealing with native build systems – the code automatically runs (almost) everywhere Dart runs, be that Windows, Linux, macOS, Android, iOS, or Fuchsia. Only web would need to be handled differently.

Writing everything in Dart would also mean less overhead dealing with the boundary between native code and Dart.

Implement Codec

Dart's Codec class (from dart:convert) fits tape's use case and is also implemented by, for example, JSON (JsonCodec).
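A minimal sketch of what implementing `Codec` could involve, assuming tape's encode and decode functions are simply injected. `TapeCodec` and `_FunctionConverter` are hypothetical names, and the injected functions are placeholders so that the example is self-contained; `Codec`, `Converter`, `encoder` and `decoder` are the real `dart:convert` types and getters.

```dart
import 'dart:convert';

/// Adapts a plain function to dart:convert's Converter interface.
class _FunctionConverter<S, T> extends Converter<S, T> {
  const _FunctionConverter(this._convert);

  final T Function(S input) _convert;

  @override
  T convert(S input) => _convert(input);
}

/// Hypothetical TapeCodec: a Codec<Object?, List<int>> whose encoding and
/// decoding would delegate to tape. The functions are injected here only
/// to keep the sketch self-contained.
class TapeCodec extends Codec<Object?, List<int>> {
  TapeCodec(this._encode, this._decode);

  final List<int> Function(Object? value) _encode;
  final Object? Function(List<int> bytes) _decode;

  @override
  Converter<Object?, List<int>> get encoder => _FunctionConverter(_encode);

  @override
  Converter<List<int>, Object?> get decoder => _FunctionConverter(_decode);
}
```

One benefit of the standard interface is composition: on the Dart VM, `codec.fuse(gzip)` (with `gzip` from `dart:io`) would yield a codec that also compresses the encoded bytes.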

Clean up packages

  • Remove unused files
  • Think about whether to include pubspec.lock or .packages

Decide on a name for taped-packages

Assume a package is named sample. Should the corresponding package containing TapeAdapters be named

  • taped_sample
  • sample_taped
  • sample_for_tape
  • something else?

Also, how should the documentation talk about all of these packages? Should they be called taped-packages? Just packages for tape? tape-ready packages? tape extension packages?

Make registry more modular

Currently, if someone publishes a package that internally uses tape to serialize types, its adapters also pollute the main namespace: either the types have to be registered (which is pointless if they're only used internally), or the package uses positive type ids, forcing users to register them.

Users should be able to create their own private TapeRegistry contained inside a package and use that to register and serialize types.
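A minimal sketch of the idea, assuming a registry is just an instance that owns its own id-to-adapter and type-to-id maps. `TapeRegistry` here is a hypothetical class, not tape's actual API, and adapters are typed as `Object` to keep the example self-contained.

```dart
/// Hypothetical registry: each instance is its own namespace of type ids,
/// so a package can keep a private one without touching the app's ids.
class TapeRegistry {
  final Map<int, Object> _adaptersById = {};
  final Map<Type, int> _idsByType = {};

  void register<T>(int typeId, Object adapter) {
    if (_adaptersById.containsKey(typeId)) {
      throw StateError('Type id $typeId is already used in this registry.');
    }
    _adaptersById[typeId] = adapter;
    _idsByType[T] = typeId;
  }

  /// The type id registered for [T] in this registry, if any.
  int? idFor<T>() => _idsByType[T];

  /// The adapter registered under [typeId] in this registry, if any.
  Object? adapterFor(int typeId) => _adaptersById[typeId];
}
```

In this model, an app registry and a package-private registry can both use type id 0 without conflict; only registering the same id twice in one registry fails.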

Support prefixed imports

That is, generating code for classes in files that use import 'other_file.dart' as other; and refer to types as other.SomeType.

Provide taped-package suggestion hints

If tape assist is running and users create a @TapeClass that has a field of an external type, e.g. Flutter's Color, encoding won't work without registering an AdapterFor<Color>. Instead of failing at runtime, it would be amazing if we could detect that an AdapterForColor is missing in tape.dart, look up the package of the type (package:flutter), check whether a taped_flutter package exists on pub.dev, and suggest it to the user.
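The name-derivation step of that flow follows directly from the taped_<name> naming scheme. A minimal sketch, assuming the library URI that declares the type is already known; `tapedPackageFor` is a hypothetical helper, and checking pub.dev for the package is not shown. Note that Flutter's Color is actually declared in `dart:ui` and re-exported by package:flutter, so SDK libraries like that would need a special-case mapping.

```dart
/// Hypothetical helper: derives the conventional taped-package name from
/// the library URI that declares a type. Only `package:` URIs qualify;
/// SDK libraries such as `dart:ui` yield null.
String? tapedPackageFor(Uri library) {
  if (library.scheme != 'package' || library.pathSegments.isEmpty) {
    return null;
  }
  // For package:sample/src/foo.dart, the first segment is the package name.
  return 'taped_${library.pathSegments.first}';
}
```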

Update tape.lock

After generating the adapters, we should update the tape.lock file to reflect the changes.

Implement tape init

  • generate a tape.dart (only if it doesn't exist yet)
  • hook the newly generated tape.dart into main.dart
  • update the tape version in the pubspec
  • add build_runner and tapegen as dev_dependencies in the pubspec
  • run pub get if the pubspec changed

Implement tapegen assist

  • generate annotations where needed
  • add onDefault: parameters to @TapeField annotations that don't have them
  • register TapeAdapters in tape.dart (unless marked with @doNotRegister)

Document values

People looking at the encoding may be confused by its trade-offs.
We should document the values of tape, which are (off the top of my head), in order of priority:

  • Provide a type-safe encoding of all possible Dart objects
  • Usability / great developer experience
  • Encoding speed
  • Encoding size

A new serialization format

The current serialization format is pretty close to the one hive uses, but we have the chance to develop our own custom, better format!

Here's how it currently works: For each adapter, the adapter id and the number of bytes are encoded followed by the actual bytes. That leaves adapters lots of freedom (they can just write bytes and read bytes). But at the cost of what?

The primary drawback is that the encoding is not self-descriptive. It's defined top-down (adapters below us can do whatever they want, we just give them the space to do so) rather than bottom-up (we define a few primitives that all adapters can use).
What would a bottom-up encoding look like? Just like JSON, we could define some basic building blocks as well as elements that combine/compose others. These elements could be the standard JSON ones (int, String, bool, List, Map), but also lower-level elements (think uint8, uint16, Uint8List, …) as well as higher-level elements (type-safe class, enum).
The downside is that we leave adapters less room for optimization. But I believe that's a good trade-off to make because users usually don't define custom adapters anyway and adapters can still use Uint8List as an escape hatch to define their own custom format.
Being able to debug and inspect the encoding is really important for stability. Encoding size is secondary, especially since we can still compress the encoding.

How a self-descriptive encoding could work

Let's look at the usual example class:

@TapeClass class Fruit {
  Fruit(this.name, this.amount);

  @TapeField(0) final String name;
  @TapeField(1) final int amount;
}

Also, let's assume that these are the type ids of the used types (all encodings that follow are hex-encoded, so every two hex digits form one byte):

Fruit:  00 00 00 00 00 00 00 00
String: ff ff ff ff ff ff ff ff
int:    ff ff ff ff ff ff ff fe

How would the old and new encoding encode the value Fruit('foo', 9)?

This is how the old encoding looks:

00 00 00 00 00 00 00 00 | 00 00 00 00 00 00 00 37  | 00 00 00 00 | ff ff ff ff ff ff ff ff | 00 00 00 00 00 00 00 07 | 00 00 00 03 | 66 6f 6f | 00 00 00 01  | ff ff ff ff ff ff ff fe | 00 00 00 00 00 00 00 08 | 00 00 00 00 00 00 00 09 |
Fruit type id           | encoding size (55 bytes) | name field  | String type id          | encoding size (7 bytes) | length      | f  o  o  | amount field | int type id             | encoding size (8 bytes) | integer value           |

In the new encoding, there is a fixed set of elements that can be encoded (class, int, String, enum etc.) and only some of them (class, enum) have type ids. So there's a level above those types. Let's call them elements and their respective ids element ids.
Because there's only a small set of elements, these ids only need one byte:

class:  00
int:    01
String: 02

The encoding would then look something like this:

00    | 00 00 00 00 00 00 00 00 | 00 00 00 00 | 02     | 00 00 00 03 | 66 6f 6f | 00 00 00 01  | 01  | 00 00 00 00 00 00 00 09 |
class | Fruit type id           | name field  | string | length      | f  o  o  | amount field | int | integer value           |
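Both byte layouts can be reproduced with a short sketch. It assumes exactly what the two tables show: 8-byte big-endian type ids, sizes and integers, 4-byte field ids and string lengths, and (in the new format) one-byte element ids. `encodeFruitOld` and `encodeFruitNew` are hypothetical helpers for this example only, and `ByteData.setInt64` is used, which works on the Dart VM but not when compiled to JavaScript.

```dart
import 'dart:convert';
import 'dart:typed_data';

// Type ids from the example (8 bytes each): Fruit = 00…00,
// String = ff…ff (-1 as int64), int = ff…fe (-2 as int64).
const fruitTypeId = 0;
const stringTypeId = -1;
const intTypeId = -2;

// One-byte element ids of the proposed format.
const classElement = 0x00;
const intElement = 0x01;
const stringElement = 0x02;

/// Big-endian 8-byte integer (ByteData defaults to big-endian).
Uint8List _int64(int value) =>
    (ByteData(8)..setInt64(0, value)).buffer.asUint8List();

/// Big-endian 4-byte unsigned integer (field ids and string lengths).
Uint8List _uint32(int value) =>
    (ByteData(4)..setUint32(0, value)).buffer.asUint8List();

/// Old format: type id, then payload length, then the payload itself.
Uint8List _framed(int typeId, List<int> payload) => (BytesBuilder()
      ..add(_int64(typeId))
      ..add(_int64(payload.length))
      ..add(payload))
    .takeBytes();

Uint8List encodeFruitOld(String name, int amount) {
  final nameBytes = utf8.encode(name);
  final fields = BytesBuilder()
    ..add(_uint32(0)) // field id 0: name
    ..add(_framed(stringTypeId, [..._uint32(nameBytes.length), ...nameBytes]))
    ..add(_uint32(1)) // field id 1: amount
    ..add(_framed(intTypeId, _int64(amount)));
  return _framed(fruitTypeId, fields.takeBytes());
}

/// New format: element ids instead of lengths; only classes carry type ids.
Uint8List encodeFruitNew(String name, int amount) {
  final nameBytes = utf8.encode(name);
  return (BytesBuilder()
        ..addByte(classElement)
        ..add(_int64(fruitTypeId))
        ..add(_uint32(0)) // field id 0: name
        ..addByte(stringElement)
        ..add(_uint32(nameBytes.length))
        ..add(nameBytes)
        ..add(_uint32(1)) // field id 1: amount
        ..addByte(intElement)
        ..add(_int64(amount)))
      .takeBytes();
}
```

For Fruit('foo', 9), the old encoding comes to 71 bytes (8-byte type id, 8-byte size, and the 55-byte payload), while the new one is 34 bytes.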

How would adapters change?

Rather than having imperative adapters that read and write to a TapeReader or TapeWriter, the new adapters would be declarative and simply declare the fields that the class has. (I didn't think about enums yet, but the old encoding doesn't work with them either. They would work quite similarly though.)

Here's a refresher on how the old adapter looks:

class AdapterForFruit extends AdapterFor<Fruit> {
  void write(TapeWriter writer, Fruit obj) {
    writer
      ..writeFieldId(0)
      ..write(obj.name)
      ..writeFieldId(1)
      ..write(obj.amount);
  }

  Fruit read(TapeReader reader) {
    final fields = <int, dynamic>{
      for (; reader.hasAvailableBytes;) reader.readFieldId(): reader.read(),
    };
    return Fruit(
      fields[0] as String,
      fields[1] as int,
    );
  }
}

And here's the new one:

class AdapterForFruit extends AdapterFor<Fruit> {
  Fields toFields(Fruit obj) {
    return Fields({
        0: obj.name,
        1: obj.amount,
    });
  }

  Fruit fromFields(Fields fields) {
    return Fruit(
      // (with default values)
      fields.get<String>(at: 0, or: ''),
      fields.get<int>(at: 1, or: 2),
    );
  }
}

Benefits

  • Buffer lengths don't need to be saved anymore because the encoding is self-describing. If a type gets removed, we're not left with a blob of bytes we don't understand; we can still parse the structure (what's a class, which fields it has, etc.), we just don't know which Dart type to associate with that structure.
  • The format is debuggable: just like with JSON, you can look at the fields and values all the way down, even if you don't know the types being encoded. Instead of strings and brackets, it's a binary-packed format with type ids, but one could offer a website where you drop in such an encoding and it prints the tree structure.
  • Adapters cannot misbehave and corrupt the encoding.
  • The APIs for clients are higher-level and thereby easier to use.
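The debuggability benefit can be illustrated with a minimal, hypothetical set of block types. `Block`, `IntBlock`, `StringBlock` and `ClassBlock` are names made up for this sketch, not tape's block framework (which has its own, such as Uint8Block and ListBlock). A generic printer that knows only these primitives can render the structure of Fruit('foo', 9) without knowing which Dart class type id 0 stands for.

```dart
/// Two spaces of indentation per nesting level.
String _indent(int depth) => '  ' * depth;

/// Hypothetical minimal block types for illustration only.
abstract class Block {
  const Block();

  /// Renders this block and its children as an indented tree.
  String describe([int depth = 0]);
}

class IntBlock extends Block {
  const IntBlock(this.value);
  final int value;

  @override
  String describe([int depth = 0]) => '${_indent(depth)}int $value';
}

class StringBlock extends Block {
  const StringBlock(this.value);
  final String value;

  @override
  String describe([int depth = 0]) => "${_indent(depth)}string '$value'";
}

/// A class is just a type id plus blocks keyed by field id.
class ClassBlock extends Block {
  const ClassBlock(this.typeId, this.fields);
  final int typeId;
  final Map<int, Block> fields;

  @override
  String describe([int depth = 0]) {
    final lines = ['${_indent(depth)}class (type id $typeId)'];
    fields.forEach((id, block) {
      lines.add('${_indent(depth + 1)}field $id:');
      lines.add(block.describe(depth + 2));
    });
    return lines.join('\n');
  }
}
```

For `ClassBlock(0, {0: StringBlock('foo'), 1: IntBlock(9)})`, `describe()` returns a five-line tree: `class (type id 0)`, then `field 0:` containing `string 'foo'` and `field 1:` containing `int 9`, each level indented by two more spaces.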

Implement tapegen doctor

  • show the OS and version
  • show the Dart version
  • show the Flutter version, if applicable (irrelevant)
  • show the tape version
  • show the tapegen version
  • show the versions of all used taped-packages
  • show a list of all TapeAdapters defined in the project
    • show their name
    • show the file path
    • show their registration status (including id)
    • show the tree hierarchy
  • show a GitHub link if it's a (public?) Git repo
  • show link to issue page with template selected
    Irrelevant in tapegen doctor itself. This should happen somewhere else, where an error actually occurs. Here, we don't have enough information to decide on an appropriate issue template.
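The first two items need nothing project-specific. A minimal sketch using the real `Platform` class from `dart:io`; `environmentReport` is a hypothetical name, and the tape, tapegen and taped-package versions are left out because they would have to be read from the project's files.

```dart
import 'dart:io';

/// Hypothetical helper collecting environment details a doctor command
/// could print: the OS (name and version) and the Dart version.
String environmentReport() {
  return [
    'OS: ${Platform.operatingSystem} ${Platform.operatingSystemVersion}',
    'Dart: ${Platform.version}',
  ].join('\n');
}
```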

Implement tapegen help

  • provide a general help screen
  • provide help for init
  • provide help for assist
  • provide help for doctor
  • provide help for help (Easter egg opportunity)

Think about where to put the CLI code

Tape consists of two packages (tape and tapegen). In which package should the CLI live (the CLI is basically just a bin folder next to the lib folder)?

Intuitively, I would put it in tape. That would make commands simpler (pub run tape assist instead of pub run tapegen assist) and would also work if tapegen isn't installed. This might be useful for people who don't want to rely on adapter generation (like, authors of taped-packages), but still want features apart from annotation autocomplete, like generating the boilerplate code.
This would also help people who didn't read the readme carefully and forgot to add build_runner and/or tapegen as dev_dependencies. If they open an issue, they could be asked to run pub run tape doctor, which could tell them about the missing dependencies.

A downside is that some indirect (non-dev) dependencies would be added to packages, including (but not limited to) analyzer, dart_style, args, some of which are pretty big dependencies. Using them as dependencies rather than dev_dependencies has some downsides, outlined on dart.dev:

Using dev dependencies makes dependency graphs smaller. That makes pub run faster, and makes it easier to find a set of package versions that satisfies all constraints.

Another option would be to put everything in tapegen, and while that would lead to no extra dependencies (all the needed packages are in dev_dependencies), it would force people who don't use the annotations to also depend on tapegen for convenience functions.

The third option would be to split the functionality up into both packages.
Some commands like pub run tape init would be supported by tape itself, while others like pub run tape assist would fail if tapegen isn't installed and otherwise just delegate to the corresponding tapegen command.
This would increase complexity, but we wouldn't need to include the extremely big packages as dependencies because we wouldn't be parsing Dart code. While analyzer and dart_style are not required in this case, we'd still need yaml, args, and some other minor ones though.

Note that we actually would benefit from parsing the Dart code in some cases. For example, we could implement a pub run tape check that checks if all TapeAdapters are registered.

Support subclasses

We should support serializing classes that extend other classes, like this:

class A {
  A(this.foo);

  final int foo;
}

class B extends A {
  B({required int foo, this.bar}) : super(foo);

  final int? bar;
}

class C extends A {
  C(int baz) : super(baz);
}

There are two parts to a solution for subclassing: Making it possible to write useful adapters for such classes and being able to generate such adapters automatically.

The first part – choosing an encoding

This part seems comparatively easy, but even this one is more difficult than it seems:

If we simply agreed that the superclass and subclass share a field id namespace, that would lead to difficult management problems: Imagine A uses field id 0 and B uses field id 1. If A adds a new field, it would need to choose id 2 (the next free one). Then, if B added a new field, it would need to use field id 3. Things are already pretty complicated (A is responsible for fields 0 and 2, B for 1 and 3), and that's with only two classes! Imagine having to find new field ids for new fields in a class with multiple inheritance levels below and above. That sounds like a pure nightmare to me.

Instead, the most promising solution I came up with so far would be to give each class its own field id namespace and have a block that is able to save multiple hierarchies – for example, if our class has two classes above it in the hierarchy, we could use a ListBlock containing three FieldsBlocks, the first one containing the fields of the top-most class, the second one the fields of the class below that and the third one the fields of our actual class. Field ids remain used only in their own class, which makes for a much more hygienic solution.
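A minimal sketch of the per-class namespace idea, using plain lists and maps as stand-ins for the ListBlock of FieldsBlocks. `fieldsOfA`, `fieldsOfB` and `bFromFields` are hypothetical helpers, and A and B repeat the example classes so the block is self-contained. Both classes use field id 0 without any conflict.

```dart
// Stand-ins for the example classes A and B.
class A {
  A(this.foo);
  final int foo;
}

class B extends A {
  B({required int foo, this.bar}) : super(foo);
  final int? bar;
}

/// One field map per class in the hierarchy, top-most class first.
/// Each map is its own field id namespace.
List<Map<int, Object?>> fieldsOfA(A a) => [
      {0: a.foo},
    ];

/// B reuses A's encoding for the upper level and adds its own level.
List<Map<int, Object?>> fieldsOfB(B b) => [
      ...fieldsOfA(b),
      {0: b.bar},
    ];

B bFromFields(List<Map<int, Object?>> levels) => B(
      foo: levels[0][0] as int,
      bar: levels[1][0] as int?,
    );
```

Adding a field to A only touches A's map, so B never has to know which ids A has already used.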

Instead of saving them all in a ListBlock, we could also introduce a new block that saves a tuple – our own field ids and the parent class turned into a block. During decoding, we simply let the parent block take care of the decoding and then read useful information from the created object. That would certainly be the most loosely coupled solution – we would not depend on the inner workings of the parent class anymore, only on its public interface. If the parent class decides to change its inheritance hierarchy, choose to encode itself using a custom-built adapter or whatever, we don't care. We just use the parent class object as-is. Of course, this doesn't work for abstract parent classes (argh! This is so complicated!).
Another downside of this approach is that we create more objects than necessary. In the example given at the top, in the AdapterForB we would create an instance of A, just to get its foo property.

I'll have to think about the encoding a bit more. Until then, here are my thoughts so far on the second part of the problem:

The second part – auto-generating adapters

In order to automatically generate adapters for subclasses, we need to understand the inner structure of the superclass. At the very least, we need to be able to parse its representation in the block layer to be able to turn it into constructor arguments.
We need to know which fields A has so that we can pass the right parameters into A's constructor so that the super call uses the right parameters.
Sadly, there's no perfect way to do this. Up until now, we picked constructors based on which of their parameters were initializing formals – that's the this.foo syntax in constructor parameters.
That doesn't work anymore with subclasses, because they accept values like String bar and then pass them to their superclass's constructor via super(bar).
That makes matching a bit more difficult. For example, the subclass could have the constructor

Bar({
  String first,
  String second,
  this.baz,
}) :
    super(first: second, second: first);

or even use calls like super(first: baz, second: baz). It's really difficult to decide what to do in such cases. It'll probably take some time until I figure this out…

When to specify default values?

Tape should stay compatible across encoding versions. When a field is removed, decoders from previous versions, which still expect that field, need to use a default value for it. How should we go about that?

  • a: Force users to specify default values upfront for every field.
  • b: Force users to specify default values when they remove fields.

Arguments:

  • ++b: When removing fields, developers have more experience and probably choose better default values.
  • -a: Changing default values may lead to weird behavior. For example, having a bool field with default value true in version one of your app, false in version two, and removed from that point on leads to two versions interpreting some bytes differently.
    • If users change the default value, we could make them aware of it and warn them about this behavior.
  • ++a: We could just remove the fields entirely without littering the code with removed fields or littering the encoded bytes with values. There is no technical debt in removing a field.

Decision: a

This is an open design decision. If you have any other arguments or options that weren't considered yet, please don't hesitate to comment.

Add tool for removing tape

Let's be real. If developers realize that tape is not the right tool for the job, we want to support them too. It would be great if there was a tapegen remove command (or something similar) that

  • removes all tape annotations from all classes
  • removes the tape.dart file
  • removes the initializeTape() call from main
  • removes imports of package:tape/tape.dart
  • removes tape from dependencies as well as taped-packages
  • removes tapegen from dev_dependencies
  • if there's no other dev dependency apart from build_runner, remove that too.
