Cérès is a Coq library for serialization to S-expressions.
S-expressions are uniform representations of structured data.
They are an alternative to plain strings as used by Show
in Haskell and
Debug
in Rust for example.
S-expressions are more amenable to programmatic consumption, avoiding custom
parsers and enabling flexible formatting strategies.
This S-expression...
(example
(message "I'm a teapot")
(code 418))
... corresponds to this sexp
in Coq.
Definition example : sexp :=
[ Atom "example"
; [ Atom "message" ; Atom (Str "I'm a teapot") ]
; [ Atom "code" ; Atom 418%Z ]
].
The tutorial module is a good place for a quick start.
Link to the rendered documentation.
This library offers a type sexp
with two constructors:
Inductive sexp :=
| Atom (a : atom)
| List (xs : list sexp)
.
Atoms are identifiers (Raw
), numbers (Num
) or strings (Str
).
Variant atom : Set :=
| Num (n : Z) (* Integers. *)
| Str (s : string) (* Literal strings. *)
| Raw (s : string) (* Simple atoms (e.g., ADT tags). *)
(* Should fit in this alphabet: [A-Za-z0-9'=+*/:!?@#$%^&<>.,|_~-]. *)
.
Two classes Serialize
and Deserialize
are provided to define canonical
serializers and deserializers between your types and S-expressions.
The following functions are provided for conversions to sexp
or directly to
string
:
Definition to_sexp {A} `{Serialize A} : A -> sexp.
Definition from_sexp {A} `{Deserialize A} : sexp -> error + A.
Definition to_string {A} `{Serialize A} : A -> string.
Definition from_string {A} `{Deserialize A} : string -> error + A.
Import the main module of the library.
Require Import Ceres.Ceres.
This exports:
CeresS
: the core definitions for S-expressions.CeresSerialize
: theSerialize
type class (sexp -> error + mytype
).CeresDeserialize
: theDeserialize
type class (mytype -> sexp
).CeresRoundtrip
: roundtrip properties for serializers and deserializers.CeresFormat
: format S-expressions as strings (sexp -> string
).CeresParser
: S-expression parser (string -> error + sexp
).
Other modules in the library:
CeresParserUtils
: low-level primitives for the S-expression parserCeresParserRoundtrip
,CeresParserRoundtripProof
: Correctness proof of the parser. Currently, only soundness is proved (i.e., parse-then-print roundtrip).
Internals:
CeresUtils
: miscellaneous.CeresParserInternal
: S-expression parser, internalsCeresString
: general string utilities.
The type of S-expressions is actually parameterized by the type of atoms.
Inductive sexp_ (A : Type) :=
| Atom_ (a : A)
| List (xs : list (sexp_ A))
.
By default, it is specialized to the atom
type, so that the main S-expression type is
sexp := sexp_ atom
.
Notation sexp := (sexp_ atom).
Notation Atom := (@Atom_ atom).
Coercion Num : Z >-> atom.
Coercion Raw : string >-> atom.
(* Destructors *)
Definition get_Num : atom -> option Z.
Definition get_Str : atom -> option string.
Definition get_Raw : atom -> option string.
Serializers can be defined as instances of the Serialize
type class.
Class Serialize (A : Type) : Type :=
to_sexp : A -> sexp.
S-expressions can be serialized to a string
. Thus, so can serializable types.
Definition to_string {A : Type} `{Serialize A} : A -> string.
For numeric types, it is sufficient to define a conversion to Z
as an
instance of Integral
.
Class Integral (A : Type) : Type :=
to_Z : A -> Z.
Instance Serialize_Integral (A : Type) : Integral A -> Serialize A.
Going the other way requires some additional error handling.
Class Deserialize (A : Type) : Type := ...
Definition from_sexp {A} `{Deserialize A} : sexp -> error + A.
Definition from_string {A} `{Deserialize A} : string -> error + A.
Again, a simplified interface for numeric types is thus provided,
with a SemiIntegral
class.
Class SemiIntegral (A : Type) : Type :=
from_Z : Z -> option A.
Instance Deserialize_SemiIntegral (A : Type) : SemiIntegral A -> Deserialize A.
The module CeresRoundtrip
defines some roundtripping properties
and lemmas to help prove them.
Class CompleteClass {A} `{Serialize A} `{Deserialize A} : Prop.
Class SoundClass {A} `{Serialize A} `{Deserialize A} : Prop.
There are no strong requirements on the encodings implemented by Serialize
and Deserialize
instances, but some utilities are provided for the following
default encoding for inductive types:
- Nullary constructors are atoms
con
. - Non-nullary constructors are lists
(con x y z)
.
Serialization is straightforward by pattern-matching.
For deserialization, the module CeresDeserialize.Deser
provides
some utilities.
unit
,bool
,sum
andoption
follow that standard encoding.(x, y) : prod A B
is encoded as(X Y)
whereX
andY
are the encodings ofx
andy
.[x; y; z] : list A
is encoded as(X Y Z)
."somestring" : string
is encoded as"somestring"
."c" : ascii
is encoded as"c"
.33 : Z
(orN
,nat
) is encoded as33
.
Recursive serializers and deserializers can be defined using explicit fixpoints.
For deserializers, that means to bind the expression explicitly, since that's
the decreasing argument, but immediately pass it as the last argument of
Deser.match_con
:
Definition Deserialize_unary : Deserialize nat :=
fix deser_nat (l : loc) (e : sexp) {struct e} :=
Deser.match_con "nat"
[ ("Z", 0%nat) ]
[ ("S", Deser.con1 S deser_nat) ] l e.
Extra directories:
coqdocjs
: a clone of coqdocjsdoc
: a clone of this repo'sgh-pages
branch
make html && rm -r doc/docs && mv html doc/docs
cd doc
git add docs && git commit -m "Update" && git push
- Real World OCaml, Chapter 17, Data Serialization with S-expressions.
- Down with Show!, a blog post by Harry Garrood advocating for using structured representations instead of strings.