This repository is a formalisation of the Michelson language using the Coq interactive theorem prover.
The simplest way to install the project is to use the OPAM package manager:
# Add the required opam repositories for Mi-Cho-Coq and its dependencies
opam repo add coq-released https://coq.inria.fr/opam/released
opam repo add coq-extra-dev https://coq.inria.fr/opam/extra-dev
# Install mi-cho-coq and its dependencies
opam install coq-mi-cho-coq
The following parts of the language are formalized:
- the type system
- the syntax
- the semantics
- the lexing
- the parsing
This formalisation is entirely based on the specification of Michelson.
In Michelson, an important distinction is made between comparable types and non-comparable ones. Comparable types can be used to index maps and only sets of elements in comparable types are allowed.
Here is the list of comparable types defined in ./src/michocoq/syntax_type.v:
Inductive simple_comparable_type : Set :=
| string
| nat
| int
| bytes
| bool
| mutez
| address
| key_hash
| timestamp.
Inductive comparable_type : Set :=
| Comparable_type_simple : simple_comparable_type -> comparable_type
| Cpair : simple_comparable_type -> comparable_type -> comparable_type.
Each comparable type is trivially a Michelson type, here is the full list of Michelson types as defined in ./src/michocoq/syntax_type.v:
Inductive type : Set :=
| Comparable_type (_ : simple_comparable_type)
| key
| unit
| signature
| option (a : type)
| list (a : type)
| set (a : comparable_type)
| contract (a : type)
| operation
| pair (a : type) (b : type)
| or (a : type) (_ : annot_o) (b : type) (_ : annot_o)
| lambda (a b : type)
| map (k : comparable_type) (v : type)
| big_map (k : comparable_type) (v : type)
| chain_id.
Coq lets us see the type simple_comparable_type
as a subtype of
comparable_type
and comparable_type
as a subtype of the type
type
using implicit coercions.
The syntax and typing of Michelson instructions is defined in file ./src/michocoq/syntax.v. We use a dependent inductive type to rule out ill-typed instructions.
Inductive instruction :
forall (self_type : Datatypes.option type) (tail_fail_flag : Datatypes.bool) (A B : Datatypes.list type), Set :=
| NOOP {A} : instruction A A
| FAILWITH {A B a} : instruction (a ::: A) B
| SEQ {A B C} : instruction A B -> instruction B C -> instruction A C
| IF_ {A B} : instruction A B -> instruction A B -> instruction (bool ::: A) B
| LOOP {A} : instruction A (bool ::: A) -> instruction (bool ::: A) A
...
The self_type
index represents the type of the parameter of the
whole contract, this index is needed for typing SELF
,
CREATE_CONTRACT
and LAMBDA
correctly.
The tail_fail_flag
index is a flag indicating whether or not the
instruction output type can be inferred from its input type. Most
instructions have this flag set to false
meaning that the type of
their input determines the type of their output. FAILWITH
, sequences
ending in FAILWITH
, and branching instructions (such as IF
) with
all branches ending in FAILWITH
have their tail_fail_flag
set to
true
. Instruction sequencing { A ; B }
(also written SEQ A B
)
requires that A
has its tail_fail_flag
set to false
. This
restriction is also present in the reference Michelson typechecker; it
permits efficient typechecking in a single pass through the contract
code.
Finally the A
and B
indexes are the types of respectively the
input and output stacks of the instruction.
The semantics of types is defined by interpreting them
by predefined Coq types using the following table (functions comparable_data
in ./src/michocoq/comparable.v data
in file ./src/michocoq/semantics.v:
Michelson | Coq |
---|---|
int | Z |
nat | N |
string | string |
bytes | string |
timestamp | Z |
mutez | int63 |
bool | bool |
unit | unit |
pair a b | a * b |
option a | option a |
or a b | sum a b |
list a | list a |
set a | set a (lt a) |
map a b | map a b (lt a) |
bigmap a b | idem |
lambda a b | {tff & instruction None tff [a] [b]} |
anything else | axiomatized |
with
Definition int63 :=
{t : int64.int64 | int64.sign t = false}
Definition set a lt :=
{l : list A | Sorted.StronglySorted lt l}
Definition map a b lt :=
set (a * b) (fun x y => lt (fst x) (fst y))
We formalize the semantics of Michelson by defining an evaluator in
file ./src/michocoq/semantics.v. Since evaluating a Michelson instruction might
fail (which Coq functions cannot), we wrap the return type of this
evaluator in the exception monad defined in file
./src/michocoq/error.v. Moreover, Coq forbids non-terminating function so we
use a common Coq trick to define the evalator on diverging
instructions such as LOOP
: we make the evaluator structurally
recursive on an extra argument of type Datatypes.nat
called the
fuel
of the evaluator.
Fixpoint eval {self_type} {tff} {env} {A : stack_type} {B : stack_type}
(i : instruction self_type tff A B) (fuel : Datatypes.nat) {struct fuel} :
stack A -> M (stack B) :=
match fuel with
| O => fun SA => Failed _ Out_of_fuel
| S n =>
match i, SA, env with
...
The env
parameter to eval
is a record providing the semantics of
the following instructions:
- forging operations:
CREATE_CONTRACT
,SET_DELEGATE
, andTRANSFER_TOKENS
: these cannot be defined directly because theoperation
type is axiomatized. - requesting information about the context:
AMOUNT
,BALANCE
,CHAIN_ID
,NOW
,SELF
,SENDER
, andSOURCE
- cryptography:
HASH_KEY
,BLAKE2B
,SHA256
,SHA512
, andCHECK_SIGNATURE
(not yet formalized) - serialization:
PACK
andUNPACK
(not yet formalized) - manipulation of addresses:
ADDRESS
,CONTRACT
, andIMPLICIT_ACCOUNT
(not yet formalized)
A Michelson instruction is called overloaded when it can be assigned
several types. For example, the NEG
instruction which replaces a
number at the top of the stack by its opposite can have either the
type int : 'S -> int : 'S
or the type nat : 'S -> int : 'S
.
To handle this source of ambiguity in our typed AST instruction
, we
use canonical structures, a Coq feature that let the Coq refiner
solve the ambiguity by inferring the missing piece of information from
the context.
Both versions of the NEG
instruction are defined in ./src/michocoq/syntax.v
as follows:
(* NEG takes either a nat or an int as argument *)
Inductive neg_variant : type -> Set :=
| Neg_variant_nat : neg_variant nat
| Neg_variant_int : neg_variant int.
Structure neg_struct (a : type) := Mk_neg { neg_variant_field : neg_variant a }.
Canonical Structure neg_nat : neg_struct nat :=
{| neg_variant_field := Neg_variant_nat |}.
Canonical Structure neg_int : neg_struct int :=
{| neg_variant_field := Neg_variant_int |}.
The structure neg_struct
is then used in the instruction
datatype as follows:
Inductive instruction : ... :=
...
| NEG {n} {s : neg_struct n} {S} : instruction (n ::: S) (int ::: S)
...
so the NEG
instruction receives a neg_struct
as an implicit argument
which is going to be provided by the canonical structure mechanism.
Finally, the evaluator in file ./src/michocoq/semantics.v uses this implicit
argument to call the correct function depending on the chosen type for
the NEG
instruction:
Definition neg a (v : neg_variant a) : data a -> data int :=
match v with
| Neg_variant_nat => fun x => (- Z.of_N x)%Z
| Neg_variant_int => fun x => (- x)%Z
end.
...
Fixpoint eval ... :=
match fuel with
| O => fun SA => Failed _ Out_of_fuel
| S n =>
match i, SA, env with
...
| @NEG _ _ s, (x, SA), _ => Return (neg _ (neg_variant_field _ s) x, SA)
...
An untyped version of the instruction
inductive type is provided in
file ./src/michocoq/untyped_syntax.v. Type checking and type inference
functions are defined in file ./src/michocoq/typer.v. They are proved
correct in file ./src/michocoq/untyper.v by proving they are inverses
of the trivial type-erasure function called untype_instruction
.
A Micheline parser is defined using Menhir in file ./src/michocoq/micheline_parser.vy; its input is streams of Micheline tokens that can be produced from a mere string using the lexer in ./src/michocoq/micheline_lexer.v. Micheline annotations are currently not supported.
To go in the other direction, pretty-printing functions are defined in file ./src/michocoq/micheline_pp.v.
Finally, the bridge between parsed Micheline nodes and untyped instructions is defined in files ./src/michocoq/micheline2michelson.v (that also unfolds macros) and ./src/michocoq/michelson2micheline.v.
The ./src/contracts_coq/ directory contains some examples of verified Michelson smart contracts.
- ./src/contracts_coq/boomerang.v is a simple example of a contract that sends back whatever amount is receives to the address that called it. It can be used for example to test a wallet’s ability to send funds without taking the risk to lose them.
- ./src/contracts_coq/deposit.v is an example of address-based
authentication. The contract stores the address of its manager (that
can either be an implicit account or a smart contract with a
parameter of type
unit
). This contract has two entrypoints: the one on the left does nothing and can be called by anybody, it can be used to send tokens to the deposit contract; the one on the right can only be called by the manager, it takes an amount in parameter and sends to the manager the requested amount. - ./src/contracts_coq/manager.v is the default smart contract that was
originated in place of all scriptless KT1 accounts during the
Babylon migration. Because of this, it is an extremely common smart
contract on the Tezos blockchain. It is quite similar to the deposit
contract; it stores the address of its manager (that has to be an
implicit account) and has two entrypoints, the default entrypoint
does nothing and can be used to send tokens to the contract from
anywhere; the other entrypoint, called
%do
can only be called by the manager. There are two differences with the deposit contract in the behaviour of the%do
entrypoint: first, it checks that no tokens was sent (the default entrypoint should be used for that), second its parameter is not simply an amount to send to the manager but a piece of code (of typelambda unit (list operation)
) to execute and emit as a result. This provides more genericity as it permits for example to change the delegate of the contract, to call other smart contract expecting a parameter, and to perform several transfers atomically. - ./src/contracts_coq/multisig.v is the multisig contract supported by
tezos-client
. A multisig contract is a smart contract that can be used to share ownership over a bag of tokens. It is an example of signature-based authentication. The contract stores a list of public keys that represent the owners of the contract, a number called the threshold and an anti-replay counter. It features three possible actions:- spending the contract tokens (by sending them to a
contract unit
) - updating the delegate (including stopping delegation)
- updating both the threshold and the keys of the owners
To run any of these actions, the following data has to be sent to the contract:
- the current value of the anti-replay counter
- the description of the action
- a list of optional signatures
The signatures must be given in the same order as the stored keys, all provided signature must be valid and the number of provided signatures must be greater or equal to the threshold. The signed data contains the
chain_id
of the chain running the contract, the address of the contract, the anti-replay counter, and the description of the action. - spending the contract tokens (by sending them to a
- ./src/contracts_coq/generic_multisig.v is to the multisig contract
what the manager contract is to the deposit contract: it features a
universally-callable default entrypoint that does nothing (to
receive tokens), it checks that no token is sent on the main
entrypoint, and it generalizes the spending and delegation actions
using a piece of code of type
lambda unit (list operation)
. - ./src/contracts_coq/vote.v is an example of a contract manipulating a map and implementing a paywall. It is a vote contract that stores a map from the available vote options (represented as strings) to their tallies. To vote for one of the available options, a transfer of 5tz (or more) to the contract is needed.