Coder Social home page Coder Social logo

datatype99's Introduction

Datatype99

Safe, intuitive algebraic data types with exhaustive pattern matching & compile-time introspection facilities. No external tools required, pure C99/C++11.

Highlights

  • Type-safe. Improperly typed variants and non-exhaustive pattern matching are caught at compile-time.

  • Portable. Everything you need is a standard-conforming C99/C++11 preprocessor.

  • Predictable. Datatype99 comes with formal code generation semantics, meaning that the generated data layout is guaranteed to always be the same.

  • Comprehensible errors. Despite that Datatype99 is built upon macros, compilation errors are usually comprehensible.

Installation

  1. Download Datatype99 and Metalang99 (minimum supported version -- 1.2.0).
  2. Add datatype99 and metalang99/include to your include paths.
  3. #include <datatype99.h> beforehand.

PLEASE, use Datatype99 only with -ftrack-macro-expansion=0 (GCC) or something similar, otherwise it will throw your compiler to the moon. Precompiled headers are also very helpful.

Usage

(The full example: examples/binary_tree.c.)

A sum type is created using the datatype macro. I guess you have already caught the syntax but actually there exist one more kind of a variant: an empty variant which is expressed simply as (Foo). It holds no data.

Pattern matching is likewise intuitive. Just a few brief notes:

  • To match an empty variant, write of(Foo) { ... }.
  • To match the default case, i.e. when all other cases failed, write otherwise { ... }.
  • To ignore a variable inside of, write _: of(Foo, a, b, _, d).

Also, you can introspect your sum types at compile-time; see examples/derive/ for the examples.

Happy hacking!

Syntax and semantics

Having a well-defined semantics of the macros, you can write an FFI which is quite common in C.

EBNF syntax

<datatype>      ::= "datatype(" [ <derive-clause> "," ] <datatype-name> { "," <variant> }+ ")" ;
<record>        ::= "record(" [ <derive-clause> "," ] <record-name> { "," <field> } ")" ;
<datatype-name> ::= <ident> ;
<record-name>   ::= <ident> ;

<variant>       ::= "(" <variant-name> { "," <type> }* ")" ;
<field>         ::= "(" <type> "," <field-name> ")" ;
<variant-name>  ::= <ident> ;
<field-name>    ::= <ident> ;

<derive-clause> ::= "derive(" <deriver-name> { "," <deriver-name> }* ")" ;
<deriver-name>  ::= <ident> ;

<match>         ::= "match(" <lvalue> ")" { <arm> }+ ;
<matches>       ::= "matches(" <expr> "," <ident> ")" ;
<if-let>        ::= "ifLet(" <lvalue> "," <variant-name> "," <ident> { "," <ident> }* ")" <stmt> ;
<of>            ::= "of(" <variant-name> { "," <ident> }* ")" <stmt> ;
<otherwise>     ::= "otherwise" <stmt> ;

Semantics

(It might be helpful to look at the generated data layout of examples/binary_tree.c.)

datatype

  1. Before everything, the following type definition is generated:
typedef struct <datatype-name> <datatype-name>;
  1. For each non-empty variant, the following type definition is generated (the metavariable <type> ranges over a corresponding variant's types):
typedef struct <datatype-name><variant-name> {
    <type>0 _0;
    ...
    <type>N _N;
} <datatype-name><variant-name>;
  1. For each non-empty variant, the following type definitions to types of each field of <datatype-name><variant-name> are generated:
typedef <type>0 <variant-name>_0;
...
typedef <type>N <variant-name>_N;
  1. For each variant, the following type definition to a corresponding sum type is generated:
typedef struct <datatype-name> <variant-name>SumT;
  1. For each sum type, the following tagged union is generated (inside the union, only fields to structures of non-empty variants are generated):
typedef enum <datatype-name>Tag {
    <variant-name>0Tag, ..., <variant-name>NTag
} <datatype-name>Tag;

typedef union <datatype-name>Variants {
    char dummy;

    <datatype-name><variant-name>0 <variant-name>0;
    ...
    <datatype-name><variant-name>N <variant-name>N;
} <datatype-name>Variants;

struct <datatype-name> {
    <datatype-name>Tag tag;
    <datatype-name>Variants data;
};
Note on char dummy;

(char dummy; is needed to make the union contain at least one item, according to the standard, even if all variants are empty. Such a datatype would enforce strict type checking unlike plain C enums.)

  1. For each variant, the following function called a value constructor is generated:
inline static <datatype-name> <variant-name>(...) { /* ... */ }
  1. Now, when a sum type is fully generated, the derivation process takes place. Each deriver is invoked sequentially, from left to right, as
ML99_call(DATATYPE99_DERIVE_##<deriver-name>, v(<datatype-name>), variants...)

where

  • variants... is a list of variants represented as two-place tuples: (<variant-name>, types...), where
    • types... is a list of types of the corresponding variant.

Put simply, a deriver is a Metalang99-compliant macro which is meant to automatically generate something global for a sum type, like interface implementations or almost any other stuff. If you are acquainted with Rust, Datatype99's derive macros are conceptually the same as the derive attribute. From my experience, derive macros allow for really nice, declarative, type-safe APIs.

record

record represents a record type: it is simply a struct for which the derivation process is defined.

  1. The following structure is generated:
typedef struct <record-name> {
    <type>0 <field-name>0;
    ...
    <type>N <field-name>N;
} <record-name>;
  1. Each deriver is invoked sequentially, from left to right, as
ML99_call(DATATYPE99_RECORD_DERIVE_##<deriver-name>, v(<record-name>), fields...)

where

  • fields... is a list of fields represented as two-place tuples: (<type>, <field-name>).

match

match has the expected semantics: it sequentially tries to match the given instance of a sum type against the given variants, and, if a match has succeeded, it executes the corresponding statement and moves down to the next instruction (match(val) { ... } next-instruction;). If all the matches have failed, it executes the statement after otherwise and moves down to the next instruction.

A complete match construct results in a single C statement.

of

of accepts a matched variant name as a first argument and the rest of arguments comprise a comma-separated list of bindings.

  • A binding equal to _ is ignored.
  • A binding not equal to _ stands for a pointer to a corresponding data of the variant (e.g., let there be (Foo, T1, T2) and of(Foo, x, y), then x has the type T1 * and y is T2 *).

There can be more than one _ binding, however, non-_ bindings must be distinct.

To match an empty variant, write of(Bar).

matches

matches just tests an instance of a sum type for a given variant. If the given instance corresponds to the given variant, it expands to truthfulness, otherwise it expands to falsehood.

ifLet

ifLet tries to match the given instance of a sum type against the given variant, and, if a match has succeeded, it executes the corresponding statement.

Think of ifLet(<expr>, <variant-name>, vars...) { /* ... */ } as of an abbreviation of

match(<expr>) {
    of(<variant-name>, vars...) { /* ... */ }
    otherwise {}
}

A complete ifLet construct results in a single C statement.

Unit type

The unit type UnitT represents a type of a single value, unit_v (it should not be assigned to anything else). UnitT and unit_v are defined as follows:

typedef char UnitT;
static const UnitT unit_v = '\0';

Derive helper attributes

You can pass named arguments to a deriver; these are called derive helper attributes. They must be specified as object-like macros of the form:

#define <variant-name>_<namespace>_<attribute-name> attr(/* attribute value */)

where <namespace> is either <datatype-name>/<record-name> or <variant-name>/<field-name> for datatype/record-specific and variant/field-specific attributes, respectively.

To manipulate derive helper attributes, there are a few predefined macros:

  • DATATYPE99_attrIsPresent/DATATYPE99_ATTR_IS_PRESENT

    Accepts an attribute name and checks if it is present or not. It can be used to check the presence of an optional attribute.

  • DATATYPE99_attrValue/DATATYPE99_ATTR_VALUE

    Accepts an attribute name extracts its value. A provided attribute must be present.

  • DATATYPE99_assertAttrIsPresent

    Accepts an attribute name and emits a fatal error if the attribute is not present, otherwise results in emptiness. It can be used for mandatory attributes.

(The naming convention here is the same as of Metalang99.)

Miscellaneous

  • The macros DATATYPE99_MAJOR, DATATYPE99_MINOR, and DATATYPE99_PATCH stand for the corresponding components of a version of Datatype99.

  • If you do not want the shortened versions to appear (e.g., match without the prefix 99), define DATATYPE99_NO_ALIASES before #include <datatype99.h>.

  • For each macro using ML99_EVAL, Datatype99 provides its Metalang99-compliant counterpart which can be used inside derivers and other Metalang99-compliant macros:

Macro Metalang99-compliant counterpart
datatype DATATYPE99_datatype
record DATATYPE99_record
of DATATYPE99_of
ifLet DATATYPE99_ifLet

(An arity specifier and desugaring macro are provided for each of the above macros.)

  • There is a built-in deriver dummy which generates nothing. It is defined both for record and sum types.

Guidelines

Clang-Format issues

If you use Clang-Format, cancel formatting for a datatype definition using // clang-format off & // clang-format on to make it look prettier, as in the examples.

#undef derive helper attributes

Always #undef derive helper attributes after a corresponding datatype definition not to pollute your namespace.

Descriptive names

If the meaning of variant parameters is not clear from the context, give them descriptive names. This can be achieved in several ways:

// 1. Define type aliases to variant parameters.
typedef double XCoordinate;
typedef double YCoordinate;

typedef double Width;
typedef double Height;

datatype(
    Shape,
    (Point, XCoordinate, YCoordinate),
    (Rectangle, Width, Height)
);

// 2. Define separate structures.
typedef struct {
    double x, y;
} Point;

typedef struct {
    double width, height;
} Rectangle;

datatype(
    Shape,
    (MkPoint, Point),
    (MkRectangle, Rectangle)
);

Comparison:

  • The former option has more concise syntax: MkPoint(x, y) instead of MkPoint((Point){x, y}).
  • The latter option is more appropriate when the structures are to be used separately from the containing sum type.
  • The latter option allows for more graduate control over the data layout: you can accompain the structures with compiler-specific attributes, alignment properties like __attribute__ ((__packed__)), etc.

Pitfalls

  • Do not use break/continue inside statements provided to of and ifLet; use goto labels instead.
  • To specify an array as a variant parameter, you must put it into a separate struct; see examples/array_in_variant.c.
  • Bindings introduced by of are always mutable, so make sure you do not mutate them if the value passed to match is qualified as const.

Credits

Thanks to Rust and ML for their implementations of sum types.

Learning resources

FAQ

Q: Why use C instead of Rust/Zig/whatever else?

A:

  • Datatype99 can be integrated into existing code bases written in pure C.
  • Sometimes C is the only choice:
    • Some resource-constrained systems do not allow for a higher level programming language.
    • Some embedded devices only have C backend.
    • C has a stable ABI which is vital for some projects (e.g., plugin systems).

Q: Why not third-party code generators?

A: See Metalang99's README >>.

Q: How does it work?

A: In short, datatype expands to a tagged union with value constructors; match expands to a switch statement. To generate all this stuff, Metalang99 is used, a preprocessor metaprogramming library.

More on it in Compiling Algebraic Data Types in Pure C99.

Q: What is the difference between Datatype99 and Metalang99?

A: Metalang99 is a functional language for metaprogramming, whereas Datatype99 is an implementation of algebraic data types written in this language.

Q: What about compile-time errors?

A: Some kinds of syntactic errors are detected by the library itself (-E flag):

// !"Metalang99 error" (ML99_assertIsTuple): "Bar(int) must be (x1, ..., xN)"
datatype(A, (Foo, int), Bar(int));

The others are understandable as well:


Error: unknown type name specified in datatype

datatype(Foo, (FooA, NonExistingType));
playground.c:3:1: error: unknown type name ‘NonExistingType’
    3 | datatype(
      | ^~~~~~~~
playground.c:3:1: error: unknown type name ‘NonExistingType’
playground.c:3:1: error: unknown type name ‘NonExistingType’

Error: non-exhaustive match

match(*tree) {
    of(Leaf, x) return *x;
    // of(Node, lhs, x, rhs) return sum(*lhs) + *x + sum(*rhs);
}
playground.c: In function ‘sum’:
playground.c:6:5: warning: enumeration value ‘NodeTag’ not handled in switch [-Wswitch]
    6 |     match(*tree) {
      |     ^~~~~

Error: excess binders in of

match(*tree) {
    of(Leaf, x, excess) return *x;
    of(Node, lhs, x, rhs) return sum(*lhs) + *x + sum(*rhs);
}
playground.c: In function ‘sum’:
playground.c:15:9: error: unknown type name ‘Leaf_1’; did you mean ‘Leaf_0’?
   15 |         of(Leaf, x, excess) return *x;
      |         ^~
      |         Leaf_0
playground.c:15:9: error: ‘BinaryTreeLeaf’ has no member named ‘_1’; did you mean ‘_0’?
   15 |         of(Leaf, x, excess) return *x;
      |         ^~
      |         _0

Error: improperly typed variant arguments

BinaryTree tree = Leaf("hello world");
playground.c: In function ‘main’:
playground.c:18:28: warning: passing argument 1 of ‘Leaf’ makes integer from pointer without a cast [-Wint-conversion]
   18 |     BinaryTree tree = Leaf("hello world");
      |                            ^~~~~~~~~~~~~
      |                            |
      |                            char *
playground.c:6:1: note: expected ‘int’ but argument is of type ‘char *’
    6 | datatype(
      | ^~~~~~~~

Error: an undereferenced binder

int sum(const BinaryTree *tree) {
    match(*tree) {
        of(Leaf, x) return x; // x is int *
        of(Node, lhs, x, rhs) return sum(*lhs) + *x + sum(*rhs);
    }
}
playground.c: In function ‘sum’:
playground.c:17:28: warning: returning ‘Leaf_0 *’ {aka ‘int *’} from a function with return type ‘int’ makes integer from pointer without a cast [-Wint-conversion]
   17 |         of(Leaf, x) return x; // x is int *
      |                            ^

From my experience, nearly 95% of errors make sense.

If an error is not comprehensible at all, try to look at generated code (-E). Hopefully, the code generation semantics is formally defined so normally you will not see something unexpected.

Q: What about IDE support?

Suggestion

A: VS Code automatically enables suggestions of generated types but, of course, it does not support macro syntax highlightment.

Q: What compilers are tested?

A: Datatype99 is known to work on these compilers:

  • GCC
  • Clang
  • MSVC
  • TCC

Troubleshooting

warning: control reaches end of non-void function [-Wreturn-type]

This is a known false positive occurring when match is used to return control back to a caller. Unfortunately, we cannot fix it in the library itself, but there are two workarounds:

  1. Disable this warning explicitly. With GCC diagnostic pragmas (or the Clang's counterpart):
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wreturn-type"
int foo(void) {
    match(x) {
        of(Foo, foo) return X;
        of(Bar, bar) return Y;
    }
}
#pragma GCC diagnostic pop
  1. Assign a result variable inside branches and return it after match:
int foo(void) {
    int result = 0;

    match(x) {
        of(Foo, foo) result = X;
        of(Bar, bar) result = Y;
    }

    return result;
}

See issue 9.

datatype99's People

Contributors

hirrolot avatar nibirmarduk88 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.