
Programming language for literate programming law specification

Home Page: https://catala-lang.org

License: Apache License 2.0

Makefile 0.76% Shell 0.16% OCaml 92.17% Python 2.21% JavaScript 0.97% Dockerfile 0.13% Nix 0.48% Emacs Lisp 0.35% Vim Script 0.69% ReScript 0.32% R 0.62% C 1.16%
programming-language legislative-texts

catala's Introduction

Catala logo

Catala

Explore the docs »
View Tutorial · Report Bug · Contribute · Join Zulip Chat


Catala is a domain-specific language for deriving faithful-by-construction algorithms from legislative texts. To learn quickly about the language and its features, you can jump right to the official Catala tutorial. You can join the Catala community on Zulip!



Concepts

Catala is a programming language adapted for socio-fiscal legislative literate programming. By annotating each line of the legislative text with its meaning in terms of code, one can derive an implementation of complex socio-fiscal mechanisms that enjoys a high level of assurance regarding the code-law faithfulness.

Concretely, you first have to gather all the laws, executive orders, previous cases, etc. that contain information about the socio-fiscal mechanism you want to implement. Then, you can proceed to annotate the text article by article, in your favorite text editor:

Screenshot

Once your code is complete and tested, you can use the Catala compiler to produce a lawyer-readable PDF version of your implementation. The Catala language has been specially designed in collaboration with law professionals to ensure that the code can be reviewed and certified correct by the domain experts, who in this case are lawyers rather than programmers.

Screenshot

The Catala language is special because its logical structure mimics the logical structure of the law. Indeed, the core concept of "definition-under-conditions" that builds on default logic has been formalized by Professor Sarah Lawsky in her article A Logic for Statutes. The Catala language is the only programming language to our knowledge that embeds default logic as a first-class feature, which is why it is the only language perfectly adapted to literate legislative programming.

Getting started

To get started, the best place is the language tutorial. A French version is also available, but it might be out of sync with the latest language features.

Note: bleeding-edge version

If you are interested in the latest development version, pre-built artifacts including binaries and API documentation can be found at https://catalalang.github.io/catala

Building and installation

Catala is available as an opam package! If opam is installed on your machine, simply execute:

opam install catala

To get the cutting-edge, latest version of Catala, you can also do

opam pin add catala --dev-repo

However, if you wish to get the latest developments of the compiler, you probably want to compile it from the sources of this repository or use nix. For that, see the dedicated readme.

Usage

Catala

If you have installed Catala, run catala --help for more information about the available command-line options. The man page is also available online. To get the development version of the help, run make help_catala after make build. The catala binary corresponds to the Catala compiler.

The top-level Makefile contains a lot of useful targets to run. To display them, use

    make help

Plugin backends

While the compiler has some builtin backends for Catala (Python, OCaml, etc.), it is also possible to add a custom backend to the Catala compiler without having to modify its source code. This plugin solution relies on dynamic linking: see the dedicated README.

Clerk

If you have installed Clerk, run clerk --help for more information about the available command-line options. To get the development version of the help, run make help_clerk after make build. The clerk binary corresponds to the Catala build system, which is responsible for testing among other things.

To get more information about Clerk, see the dedicated readme.

Catleg

Catleg is a command line utility providing useful integration with LégiFrance, the official repository of French legal documentation. See the dedicated repository for more information.

Documentation

Syntax cheat sheet

A complete and handy reference of the Catala syntax can be found in the cheat sheet (for French and English versions of the syntax).

Formal semantics

To audit the formal proof of the partial certification of the Catala compiler, see the dedicated readme.

Compiler documentation

The documentation is accessible online, both for the latest release and bleeding-edge version.

It is otherwise generated from the compiler source code using dune and odoc. Run

make doc

to generate the documentation, then open the doc/odoc.html file in any browser.

Examples

To explore the different programs written in Catala, see the dedicated readme.

API

To know how to use the code generated by the Catala compiler in your favorite programming language, head to the readme of the French law library. The corresponding pre-built examples are also available.

Contributing

To know how you can contribute to the project, see the dedicated readme.

Test suite

To know how to run or improve the Catala reference test suite, see the dedicated readme.

License

The compiler and all the code contained in this repository are released under the Apache License (version 2) unless another license is explicitly specified for a sub-directory.

Limitations and disclaimer

Catala is a research project from Inria, the French National Research Institute for Computer Science. The compiler is still unstable and some features are missing.

Pierre Catala

The language is named after Pierre Catala, a professor of law who pioneered French legaltech by creating a computer database of law cases, Juris-Data. The research group that he led in the late 1960s, the Centre d’études et de traitement de l’information juridique (CETIJ), also influenced the creation by Councillor of State Lucien Mehl of the Centre de recherches et développement en informatique juridique (CENIJ), which eventually transformed into the entity managing the LegiFrance website, acting as the public service of legislative documentation.

catala's People

Contributors

adelaett, aidaibca004, altgr, aminata-dev, denismerigoux, dependabot[bot], edwintorok, emilerolley, escott-mitre, jemsab, jmlx42, kescher, lilyaslm, numero7, phiseb, protz, r1km, risseraka, rmonat, rohanpadhye, rprimet, rubujeto, suhail-singh, svenski, twal, vincent-botbol, wozniakpl, yuelipicasso, zachwick, zapashcanon


catala's Issues

Added HTML links in "fixed by" meta-assertions

When the code says:

assertion foo fixed by law

It would be good in the HTML output to create a link on law to the law that defines foo. To get this information, one could alternatively:

  • create a pass on the surface AST that does crude name resolution based on the names of the variables and the scopes
  • retrieve that information from a later intermediate representation.

Alternative compilation scheme from default calculus to lambda calculus

The problem

Currently, the default calculus representation is translated to a lambda calculus with exceptions, according to a scheme described in section 4.3 of the Catala paper. This translation scheme has an advantage and a drawback:

  • The advantage is that this compilation scheme is fully general and accepts any default calculus program.
  • The drawback is that it requires the target lambda calculus to support catchable exceptions.

The current translation scheme is implemented here:

let rec translate_default (ctx : ctx) (exceptions : D.expr Pos.marked list)
(just : D.expr Pos.marked) (cons : D.expr Pos.marked) (pos_default : Pos.t) :
A.expr Pos.marked Bindlib.box =
let exceptions =
List.map (fun except -> thunk_expr (translate_expr ctx except) pos_default) exceptions
in
let exceptions =
A.make_app (handle_default pos_default)
[
Bindlib.box_apply
(fun exceptions -> (A.EArray exceptions, pos_default))
(Bindlib.box_list exceptions);
thunk_expr (translate_expr ctx just) pos_default;
thunk_expr (translate_expr ctx cons) pos_default;
]
pos_default
in
exceptions
and translate_expr (ctx : ctx) (e : D.expr Pos.marked) : A.expr Pos.marked Bindlib.box =
match Pos.unmark e with
| D.EVar v -> D.VarMap.find (Pos.unmark v) ctx
| D.ETuple (args, s) ->
Bindlib.box_apply
(fun args -> Pos.same_pos_as (A.ETuple (args, s)) e)
(Bindlib.box_list (List.map (translate_expr ctx) args))
| D.ETupleAccess (e1, i, s, ts) ->
Bindlib.box_apply
(fun e1 -> Pos.same_pos_as (A.ETupleAccess (e1, i, s, ts)) e)
(translate_expr ctx e1)
| D.EInj (e1, i, en, ts) ->
Bindlib.box_apply
(fun e1 -> Pos.same_pos_as (A.EInj (e1, i, en, ts)) e)
(translate_expr ctx e1)
| D.EMatch (e1, cases, en) ->
Bindlib.box_apply2
(fun e1 cases -> Pos.same_pos_as (A.EMatch (e1, cases, en)) e)
(translate_expr ctx e1)
(Bindlib.box_list (List.map (translate_expr ctx) cases))
| D.EArray es ->
Bindlib.box_apply
(fun es -> Pos.same_pos_as (A.EArray es) e)
(Bindlib.box_list (List.map (translate_expr ctx) es))
| D.ELit l -> Bindlib.box (Pos.same_pos_as (translate_lit l) e)
| D.EOp op -> Bindlib.box (Pos.same_pos_as (A.EOp op) e)
| D.EIfThenElse (e1, e2, e3) ->
Bindlib.box_apply3
(fun e1 e2 e3 -> Pos.same_pos_as (A.EIfThenElse (e1, e2, e3)) e)
(translate_expr ctx e1) (translate_expr ctx e2) (translate_expr ctx e3)
| D.EAssert e1 ->
Bindlib.box_apply (fun e1 -> Pos.same_pos_as (A.EAssert e1) e) (translate_expr ctx e1)
| D.EApp (e1, args) ->
Bindlib.box_apply2
(fun e1 args -> Pos.same_pos_as (A.EApp (e1, args)) e)
(translate_expr ctx e1)
(Bindlib.box_list (List.map (translate_expr ctx) args))
| D.EAbs (pos_binder, binder, ts) ->
let vars, body = Bindlib.unmbind binder in
let ctx, lc_vars =
Array.fold_right
(fun var (ctx, lc_vars) ->
let lc_var = A.Var.make (Bindlib.name_of var, pos_binder) in
let lc_var_expr = A.make_var (lc_var, pos_binder) in
(D.VarMap.add var lc_var_expr ctx, lc_var :: lc_vars))
vars (ctx, [])
in
let lc_vars = Array.of_list lc_vars in
let new_body = translate_expr ctx body in
let new_binder = Bindlib.bind_mvar lc_vars new_body in
Bindlib.box_apply
(fun new_binder -> Pos.same_pos_as (A.EAbs (pos_binder, new_binder, ts)) e)
new_binder
| D.EDefault (exceptions, just, cons) ->
translate_default ctx exceptions just cons (Pos.get_position e)

Objectives

For Catala to target low-level imperative languages like C, we have to eliminate catchable exceptions, which are not supported in those languages. In general, it is not possible to do so without an exception runtime. But it might be possible to get away without catchable exceptions when translating the default calculus programs generated by the Catala compiler.

Indeed, there is a structural invariant respected by the default calculus programs generated by the Catala compiler. As a reminder, default calculus programs have default terms of the form < e_1, ..., e_n | e_{just} :- e_{cons} >. The invariant can be expressed as follows: if a program e is generated by the Catala compiler, then the default terms contained in e are of the form X, described by < e_1, ..., e_n | e_{just} :- e_{cons} > where e_1, ..., e_n are themselves of form X and e_{just} and e_{cons} do not contain any defaults. Furthermore, all default terms of form X are encapsulated at the toplevel by an error_if_empty call that aborts if its argument is EmptyError.

New solution

Because of this structural invariant, we can propose a two-phase compilation scheme. In the first, general phase, the translation is the identity as long as you don't encounter a default term. But when you encounter a default term of form X encapsulated by error_if_empty, you enter phase 2 of the compilation scheme. The term to translate is akin to a default tree where all the leaves e_{just} and e_{cons} can be compiled using phase 1, since they supposedly don't contain any default term. Then, the default tree can be compiled recursively using a pure option type instead of an exception to translate the EmptyError case, on the model of process_exceptions in section 4.3 of the Catala paper. The case where two exceptions conflict will still have to be translated using a non-catchable exception, but non-catchable exceptions are fine for languages like C, where they can be translated as an early return.

This new compilation scheme can be implemented in the file src/catala/lcalc/compile_without_exceptions.ml.
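
To make the idea concrete, here is a minimal OCaml sketch of the phase-2 evaluation, standing outside the actual compiler: leaves are already-compiled thunks, EmptyError is modeled with an option type, and only the conflict case uses a non-catchable failure. All names (default_tree, eval_default, Conflict) are hypothetical and only meant to convey the shape of the translation.

```ocaml
(* Hypothetical simplification: leaves are already-compiled thunks that
   return [Some v] when they yield a value and [None] for EmptyError. *)
type 'a default_tree =
  | Leaf of (unit -> 'a option)
  | Node of 'a default_tree list * (unit -> bool) * (unit -> 'a option)
      (* exceptions, justification, consequence *)

exception Conflict (* non-catchable in the target: maps to abort / early return *)

let rec eval_default (t : 'a default_tree) : 'a option =
  match t with
  | Leaf thunk -> thunk ()
  | Node (exceptions, just, cons) -> (
      (* Evaluate the exceptions first, as in process_exceptions. *)
      match List.filter_map eval_default exceptions with
      | [ v ] -> Some v (* exactly one exception applies *)
      | _ :: _ :: _ -> raise Conflict (* two exceptions conflict *)
      | [] -> if just () then cons () else None)

(* At the top level, [error_if_empty] turns [None] into an abort. *)
let error_if_empty (t : 'a default_tree) : 'a =
  match eval_default t with Some v -> v | None -> failwith "EmptyError"
```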

Put also examples in English in the README

Currently, all the examples in the README are in French. This can be intimidating for English speakers.

On the landing page of the website, there is an example in English (Qualified employee discount), which could be added to the README.

Cannot generate website assets

~/Documents/git/catala master*                                                                                [13/4789]
❯ ./generate_website_assets.sh ../catala-website/assets                                                                
dune exec ../../src/catala.exe -- --debug --language=fr Makefile allocations_familiales.catala                         
Entering directory '/Users/nadim/Documents/git/catala'                                                                 
Entering directory '/Users/nadim/Documents/git/catala'                                                                 
[DEBUG] Reading files...                                   
[DEBUG] Parsing allocations_familiales.catala
[DEBUG] Parsing ./metadata.catala  
[DEBUG] Parsing ./securite_sociale_L.catala
[DEBUG] Parsing ./securite_sociale_R.catala  
[DEBUG] Parsing ./securite_sociale_D.catala                                                                            
[DEBUG] Parsing ./decrets_divers.catala                                                                                
dune exec ../../src/catala.exe -- --debug --language=fr \
        --wrap \                                           
        --pygmentize=../../syntax_highlighting/fr/pygments/pygments/env/bin/pygmentize \
        HTML \
        allocations_familiales.catala
Entering directory '/Users/nadim/Documents/git/catala'
Entering directory '/Users/nadim/Documents/git/catala'
[DEBUG] Reading files...
[DEBUG] Parsing allocations_familiales.catala
[DEBUG] Parsing ./metadata.catala
[DEBUG] Parsing ./securite_sociale_L.catala
[DEBUG] Parsing ./securite_sociale_R.catala
[DEBUG] Parsing ./securite_sociale_D.catala
[DEBUG] Parsing ./decrets_divers.catala
[DEBUG] Weaving literate program into HTML
[DEBUG] Pygmenting the code chunk in file ./metadata.catala, from 3:1 to 221:33
sh: ../../syntax_highlighting/fr/pygments/pygments/env/bin/pygmentize: No such file or directory
[ERROR] Weaving error: pygmentize command "../../syntax_highlighting/fr/pygments/pygments/env/bin/pygmentize -l catala_
fr -f html -O style=colorful,anchorlinenos=True,lineanchors="./metadata.catala",linenos=table,linenostart=3 -o /var/fol
ders/sy/c_zqf9w55vj6yzfq2yrbk_zw0000gn/T/catala_html_pygmentscf0964out /var/folders/sy/c_zqf9w55vj6yzfq2yrbk_zw0000gn/T
/catala_html_pygments2f2580in" returned with error code 127 
make: *** [allocations_familiales.html] Error 255
dune exec ../../src/catala.exe -- --debug --language=en Makefile english.catala
Entering directory '/Users/nadim/Documents/git/catala'
Entering directory '/Users/nadim/Documents/git/catala'
[DEBUG] Reading files...
[DEBUG] Parsing english.catala
dune exec ../../src/catala.exe -- --debug --language=en \
        --wrap \
        --pygmentize=../../syntax_highlighting/en/pygments/pygments/env/bin/pygmentize \
        HTML \
        english.catala
Entering directory '/Users/nadim/Documents/git/catala'
Entering directory '/Users/nadim/Documents/git/catala'
[DEBUG] Reading files...
[DEBUG] Parsing english.catala
[DEBUG] Weaving literate program into HTML
[DEBUG] Pygmenting the code chunk in file english.catala, from 4:1 to 9:48
sh: ../../syntax_highlighting/en/pygments/pygments/env/bin/pygmentize: No such file or directory
[ERROR] Weaving error: pygmentize command "../../syntax_highlighting/en/pygments/pygments/env/bin/pygmentize -l catala_en -f html -O style=colorful,anchorlinenos=True,lineanchors="english.catala",linenos=table,linenostart=4 -o /var/folders/sy/c_zqf9w55vj6yzfq2yrbk_zw0000gn/T/catala_html_pygments927227out /var/folders/sy/c_zqf9w55vj6yzfq2yrbk_zw0000gn/T/catala_html_pygments07468ein" returned with error code 127
make: *** [english.html] Error 255
make: `grammar.html' is up to date.
make: `catala.html' is up to date.
make: `legifrance_catala.html' is up to date.
cp: examples/allocations_familiales/allocations_familiales.html: No such file or directory
cp: examples/dummy_english/english.html: No such file or directory

And yes, I did run the following before:

make install-dependencies
make build
make pygments
source syntax_highlighting/en/pygments/pygments/env/bin/activate

Adding HTML and LaTeX links for labels and exceptions

Since Catala has a label system for rules, these labels should be clickable and interactive in the literate programming backends.

This involves modifying the src/catala/literate/html.ml and src/catala/latex.ml files. In order to know to which label the links should redirect, we might want to change src/catala/driver.ml to do the name resolution pass before generating the literate programming backends. Indeed, the name resolution context will contain the information necessary to resolve the label linking.

Implement maximum and minimum aggregation function

Catala supports maximum and minimum aggregation functions for collections. Right now, these are only present in the surface syntax.

One has to fill holes like this one:

| Ast.Aggregate (Ast.AggregateExtremum _) ->
    Errors.raise_spanned_error "Unsupported feature: minimum and maximum"
      (Pos.get_position op')

To do that, the expression maximum money for salary in salaries of (salary - $200) should desugar to

fold (fun acc salary -> let tmp = (salary - 200) in if acc < tmp then tmp else acc) <INIT> salaries

The problem lies within the <INIT> expression. What to put here? I guess this should be provided by the user, so we have to update the syntax to something like maximum money context $0 for salary in salaries of (salary - $200)
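
For reference, here is a small OCaml sketch of the semantics the desugared fold should have once the initial value is made explicit; money is modeled as an integer number of cents and all names are placeholders for this illustration only.

```ocaml
(* maximum money context $0 for salary in salaries of (salary - $200),
   written as a plain fold with an explicit, user-provided initial value. *)
let maximum_of ~(init : int) (f : int -> int) (collection : int list) : int =
  List.fold_left
    (fun acc x ->
      let tmp = f x in
      if acc < tmp then tmp else acc)
    init collection

let () =
  (* Salaries in cents: init = $0 and f salary = salary - $200. *)
  let result = maximum_of ~init:0 (fun s -> s - 200_00) [ 1000_00; 3000_00 ] in
  assert (result = 2800_00)
```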

Documentation for the lambda calculus language

Similar to the other .mld files in the folders of each intermediate representation, the lambda calculus should have its own .mld and appear in the general documentation diagram:

{v
   +---------------+
   |               |
   |  Surface AST  |
   |               |
   +---------------+
           |
           |  * Separate code from legislation
           |  * Remove syntactic sugars
           v
   +---------------+
   |               |
   | Desugared AST |
   |               |
   +---------------+
           |
           |  * Build rule trees for each definition
           |  * Order variable computations inside scope
           v
   +--------------------+
   |                    |
   | Scope language AST |
   |                    |
   +--------------------+
           |
           |  * Convert scopes into functions
           |  * Thunking of subscope arguments
           v
   +----------------------+
   |                      |
   | Default calculus AST |
   |                      |
   +----------------------+
v}

Exceptions of exceptions

Although written with much care, the algorithm that unflattens the exception structure of the rules into a tree does not work correctly:

let rec def_map_to_tree (def_info : Ast.ScopeDef.t)
(is_def_func : Scopelang.Ast.typ Pos.marked option) (def : Ast.rule Ast.RuleMap.t) :
rule_tree list =
(* first we look to the rules that don't have any exceptions *)
let has_no_exception (r : Ast.RuleName.t) _ =
not
(Ast.RuleMap.exists
(fun _ r' -> match r'.Ast.exception_to_rule with Some r_ex -> r_ex = r | None -> false)
def)
in
let no_exceptions = Ast.RuleMap.filter has_no_exception def in
(* Then, for each top-level rule (that has no exceptions), we build a rule tree *)
(* Among the top-level rules are the base rules that are exceptions to nothing *)
let base_rules, rules_that_are_exceptions =
Ast.RuleMap.partition (fun _ r -> Option.is_none r.Ast.exception_to_rule) no_exceptions
in
let base_trees : rule_tree Ast.RuleMap.t =
Ast.RuleMap.map
(fun r ->
(* we look at the the eventual rule of which r is an exception *)
match r.Ast.exception_to_rule with None -> Leaf r | Some _ -> assert false
(* should not happen *))
base_rules
in
(* Now let's deal with the rules that are exceptions but have no exception. We have to bucket
these, each bucket containing all the rules that are exception to the same rule *)
let exception_targets =
Ast.RuleMap.fold
(fun _ r acc ->
match r.Ast.exception_to_rule with
| None -> assert false (* should not happen *)
| Some r' -> Ast.RuleMap.add r' () acc)
rules_that_are_exceptions Ast.RuleMap.empty
in
(* In each bucket corresponding to an exception target, we have all the rules that are exceptions
to the target *)
let exception_trees =
Ast.RuleMap.mapi
(fun r' _ ->
(* we recursively call the function of a def where we have removed exception edges: this is
why the function should terminate *)
let def_rec =
Ast.RuleMap.map
(fun r ->
{
r with
Ast.exception_to_rule =
( match r.Ast.exception_to_rule with
| None -> None
| Some r'' -> if r'' = r' then None else Some r'' );
})
def
in
let def_rec =
Ast.RuleMap.filter (fun r _ -> not (Ast.RuleMap.mem r exception_targets)) def_rec
in
let exceptions = def_map_to_tree def_info is_def_func def_rec in
Node (exceptions, Ast.RuleMap.find r' def))
exception_targets
in
List.map snd (Ast.RuleMap.bindings base_trees)
@ List.map snd (Ast.RuleMap.bindings exception_trees)

The algorithm works as expected if there is only one base case and one or more exceptions to that base case, but fails when there are exceptions of exceptions.

This test case should return 2 for x:

@Test@
/*
new scope A:
  param x content int
scope A:
  label base_x
  def x := 0

  label exception_x
  exception base_x
  def x := 1

  exception exception_x
  def x := 2
*/

But right now it squeezes the 1 and 2 cases as exceptions for the 0 case.

def_map_to_tree should be completely rewritten in a clean way by constructing the actual graph of the exception structure for each variable. This graph can then be checked for cycles, solving #60. Then, the tree structure should be extracted by doing a graph traversal.
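
A possible shape for that rewrite, sketched in plain OCaml with a simplified rule representation (string names and a bare exception_to field) instead of the real Ast.RuleName.t / Ast.RuleMap.t types: first compute the edge relation "r is an exception to r'", then grow one tree per base rule, reporting cycles along the way.

```ocaml
module SMap = Map.Make (String)

(* Hypothetical, simplified rule representation: each rule optionally
   points to the rule it is an exception to. *)
type rule = { name : string; exception_to : string option }

type rule_tree = Node of rule * rule_tree list

(* [children rules] maps each rule name to the rules that are exceptions to it. *)
let children (rules : rule SMap.t) : rule list SMap.t =
  SMap.fold
    (fun _ r acc ->
      match r.exception_to with
      | None -> acc
      | Some target ->
          SMap.update target
            (fun l -> Some (r :: Option.value ~default:[] l))
            acc)
    rules SMap.empty

(* Build one tree per base rule (a rule that is an exception to nothing),
   recursing through the exception edges; the visited list guards against
   cycles, which should be reported as proper errors (see #60). *)
let rules_to_trees (rules : rule SMap.t) : rule_tree list =
  let edges = children rules in
  let rec build visited r =
    if List.mem r.name visited then failwith ("exception cycle at " ^ r.name)
    else
      let subs = Option.value ~default:[] (SMap.find_opt r.name edges) in
      Node (r, List.map (build (r.name :: visited)) subs)
  in
  rules
  |> SMap.filter (fun _ r -> r.exception_to = None)
  |> SMap.bindings
  |> List.map (fun (_, r) -> build [] r)
```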

Module system for Catala

The problem

Right now, the only abstraction mechanism in Catala is the scope, which corresponds roughly to a function. However, as programs get bigger and bigger, we will want to group functions into modules in order to split and compartmentalize implementations.

Even though it is possible to split an implementation across multiple files currently, this can only be done via a primitive textual include mechanism that should ultimately be complemented by a proper module system.

Design of a simple module system

Syntax

This proposal is heavily inspired by the F* and Rust module systems. Each file is its own module. The name of the module is introduced at the top of the file with the syntax (taking #84 into account):

> Module Foo

The name of the file must correspond to the ASCII snake case version of the module name.

Scopes and types declared in a module can be accessed without a prefix in the same module. To use types and scopes declared and defined in another module Bar, insert at the top of the file:

> Using Bar

Then simply refer to Bar's scopes and types with Bar.<name of type/scope>. Important: it is impossible to add rules and definitions to Bar inside Foo via a scope Bar: ....

Prefixes can be omitted for certain modules with the mention

> Using Bar implicit

For scopes or types with names defined in two distinct implicit modules, name resolution selects the one corresponding to the last implicit when traversing the file top to bottom.

Local aliases can be defined for modules with

> Using Bar as B

External module lookup.

When invoking the compiler with catala bar.catala_en, the compiler retrieves the list of the names of the external modules used in bar.catala_en and proceeds to find .catala_en files corresponding to those module names in the current directory. Additional lookup directories can be added with the -I option.
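
A minimal sketch of what that lookup could look like, assuming the module name is turned into its ASCII snake-case file name and searched in the current directory followed by the -I directories; module_to_filename and find_module are hypothetical helpers, not existing compiler code.

```ocaml
(* "FooBar" -> "foo_bar" (ASCII snake case), hypothetical helper. *)
let module_to_filename (m : string) : string =
  String.to_seq m
  |> Seq.fold_left
       (fun acc c ->
         if c >= 'A' && c <= 'Z' then
           (if acc = "" then acc else acc ^ "_")
           ^ String.make 1 (Char.lowercase_ascii c)
         else acc ^ String.make 1 c)
       ""

(* Search the current directory, then each -I directory, for the module. *)
let find_module ~(include_dirs : string list) ~(extension : string)
    (module_name : string) : string option =
  let file = module_to_filename module_name ^ extension in
  List.find_opt
    (fun dir -> Sys.file_exists (Filename.concat dir file))
    ("." :: include_dirs)
  |> Option.map (fun dir -> Filename.concat dir file)

(* find_module ~include_dirs:["lib"] ~extension:".catala_en" "Bar"
   returns Some "./bar.catala_en" if that file exists. *)
```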

Compilation units

For the OCaml backend and future backends, each Catala module should be translated to a different target language file. Hence, the -o option should take an output directory instead of a file, and the generated file names should be the same as the source files, but with a different extension.

Literate programming

When invoking Catala to produce a literate programming output from a particular file, the output should only contain that single file and not the modules on which this file depends. To produce a literate programming output containing multiple files, one can use the master file and dependency list syntax like

@@Master file@@
@@Include: preamble.catala_en@@
@@Include: section_121.catala_en@@
@@Include: section_132.catala_en@@
@@Include: section_1015.catala_en@@

Implementation

The implementation of this module system is going to vastly affect the compiler code. First, programs in the different intermediate representations should be represented as a list of modules (each of them containing types and scopes). The list should be ordered according to the topological order of the dependency graph between modules, which will have to be constructed. Cyclic dependencies between modules are forbidden. The passage from a collection of modules to an ordered list can be implemented in the surface -> desugared translation.

When parsing expressions like <expression>.x.y that are transformed into Dotted

| Dotted of expression Pos.marked * constructor Pos.marked option * ident Pos.marked
(** Dotted is for both struct field projection and sub-scope variables *)

the name resolution of x and y should be resolved in that order:

  • first, look up whether the string is a module name, in which case we expect a following Dotted qualifier that can be a scope or a type belonging to that module
  • second, look up whether the string is a struct name, in which case we expect a following Dotted qualifier containing a field name belonging to that struct
  • third, look up whether the string is a struct field or a subscope.

Before starting this implementation, fixing #47 seems to be a good prerequisite, as we don't want to bother with ordering the type and scope declarations correctly as long as they are cycle-free.

Add collection types into the language

Once #36 is solved, we can add a primitive polymorphic collection type into the language with its associated operators. This type should be present all the way from the surface syntax to the default calculus.

Once this type is added, the open question of #17 should be reactivated; maybe with a special map operator that maps a scope (viewed as a function) onto a collection.

Collections are fixed-length arrays, whose length can be provided at runtime. Their natural extraction target is C arrays.

Compute the eldest child for Allocations Familiales

The implementation of the allocations familiales is not yet finished. Among other things, the function for computing the eldest child in the list of children is missing.

champ d'application EnfantLePlusÂgé:
définition est_le_plus_âgé de enfant égal à
faux # TODO: changer!

A correct implementation would have to build at least on #52, but it would certainly also require an argmax/argmin functionality, since two children might have the exact same age and we have to arbitrarily pick between the two of them.

Add polymorphism into the language

The language does not have any polymorphism right now. Support for polymorphism should be added in the default calculus representation via a classic Big-Lambda type abstraction with type variables. Big lambdas should be explicitly annotated in order to avoid big-lambda inference in the type checking algorithm.

Then, the polymorphism should be reflected upwards in the intermediate representations, and reflected as minimally as possible in the surface syntax. At first thought, the surface syntax should not change at all.

Mixing defaults and functions

Right now, the example test/test_func/func.catala is not passing. It fails on this assertion: https://github.com/CatalaLang/catala/blob/master/src/catala/lambda_calculus/lambda_interpreter.ml#L193.

This assertion is triggered because the default for f wraps functions inside its justifications and consequences, as written here: https://github.com/CatalaLang/catala/blob/master/src/catala/catala_surface/desugaring.ml#L105. @nchataing you had set up things in a different manner but I changed them during the rewrite.

So two choices here:

  • either we keep the new desugaring style for default functions, but then we need to add a new rule in the lambda interpreter that first feeds the arguments to the functions inside the default and then evaluates it,
  • or we push the function in front of the default, so that we evaluate the function first and then we get to the default. This would require changing the type of defs in our scope language because right now we assume all the definitions are defaults.

I'm done implementing for the weekend, so if somebody wants to pick up here please do :)

Missing duration division operator

For the latest work on Section 121, there is a notion of a ratio of days of nonqualified use over days of ownership.

It would be convenient to have an operator /^: duration -> duration -> decimal
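
A minimal sketch of the intended semantics, with durations represented as a whole number of days and decimal modeled as a float for simplicity; the actual Catala runtime representations are richer, so this is only illustrative.

```ocaml
(* Hypothetical days-only duration representation, for illustration. *)
type duration = { days : int }

(* ( /^ ) : duration -> duration -> decimal (decimal modeled as float here). *)
let ( /^ ) (d1 : duration) (d2 : duration) : float =
  if d2.days = 0 then invalid_arg "division by a zero duration"
  else float_of_int d1.days /. float_of_int d2.days

(* Ratio of days of nonqualified use over days of ownership. *)
let () =
  let nonqualified = { days = 365 } and ownership = { days = 1460 } in
  assert (nonqualified /^ ownership = 0.25)
```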

Typecheck and test the US Tax Code

Right now, only section 132 of the US Tax Code example typechecks in Catala. Concerning the other sections:

Sections that typecheck should also be unit-tested on the model of section 132.

Better literate programming

Context

Right now, the surface Catala AST mixes code items as well as literate programming headings and items: https://github.com/CatalaLang/catala/blob/master/src/catala/catala_surface/catala_ast.ml#L192. This makes it hard to determine to which "legislative atom" a chunk of code belongs.

Indeed, if we are to compile Catala programs into programs that can explain why they got to their results, we should be able to link a position in a code block with a reference to a legislative atom: an article for French text, a code section for English texts, etc.

Goal

In order to do that, we need to change the flat structure of program_item into a tree-shaped structure where the headings and atom title would be nodes of the tree, and the code blocks would be the leaves. Then, when desugaring the AST, we can augment Pos.t with the additional information like Legislative Code XXX -- Section YYY --- Paragraph ZZZ -- ...

What to do ?

The main mission is to refactor Catala_ast.program_item into something more like:

type program_item = 
  | CodeBlock of code_block 
  | LawTex of law_text

type program_node = 
  | LawHeading of string * int * program_item list
  | LawArticle of law_article * program_item list 

Then, the parser and especially this rule https://github.com/CatalaLang/catala/blob/master/src/catala/catala_surface/parser.mly#L500 should be updated accordingly. Finally, Pos.t should be augmented with the information of the path of program_nodes leading to the chunk of code.

Update tutorial with collections

Update the English and French versions of the tutorial with a chapter about collections: the collection type, aggregation functions, length, etc.

No default target in Makefile

jonathan@absinthe:~/Code/catala/examples/us_tax_code (master) $ make
latexmk -f -C us_tax_code.tex
Latexmk: This is Latexmk, John Collins, 26 Dec. 2019, version: 4.67.
Latexmk: Could not find file 'us_tax_code.tex'
rm -rf us_tax_code.tex \
		us_tax_code.d \
		_minted-us_tax_code \
		us_tax_code.html

This is because GNU make picks the first non-pattern rule to be the default target, in this case, the clean target. I suggest adding an all target at the beginning, and pick a suitable default; alternatively, the all target could print a help message with a few suggested targets.

Syntactic sugar for one base case and exceptions

Right now, we have a label system to declare exceptions to rules:

label article_5
definition income_tax equals two_brackets.tax_formula of individual.income
# Then, you can declare the exception by referring back to the label
exception article_5
definition income_tax under condition
individual.income <=$ $10,000
consequence equals $0

But in many cases, when defining variable x of scope A, you only have one base case and several exceptions to it. To make the syntax easier to read in this case, we could have a syntactic sugar of the form:

definition income_tax equals two_brackets.tax_formula of individual.income  
  
# Then, you can declare the exception by referring back to the label 
exception definition income_tax under condition  
individual.income <=$ $10,000 
consequence equals $0 

The desugaring pass could make sure that there's only one definition without the exception tag, making it the base case. If there are multiple definitions without an exception tag, then the desugaring pass throws an error message saying you should revert to labels to disambiguate.

This involves modifying src/catala/catala_surface/parser.mly as well as src/catala/catala_surface/desugaring.ml
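
A rough OCaml sketch of the check the desugaring pass could perform, with a deliberately simplified definition type; the real pass works on the desugared AST rules and should raise a proper spanned error instead of failwith.

```ocaml
(* Hypothetical simplification: a definition is either tagged as an
   exception or untagged (candidate base case). *)
type definition = { is_exception : bool (* ... body, position, ... *) }

type exception_structure =
  | BaseCaseWithExceptions of definition * definition list

let desugar_definitions (defs : definition list) : exception_structure =
  match List.partition (fun d -> not d.is_exception) defs with
  | [ base ], exceptions -> BaseCaseWithExceptions (base, exceptions)
  | [], _ -> failwith "no base case: every definition is tagged as an exception"
  | _ :: _ :: _, _ ->
      failwith
        "multiple definitions without an exception tag: use labels to \
         disambiguate which one is the base case"
```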

Replace structural by nominal typing in default calculus

Right now, the following test:

new enum E:
  -- Case1
  -- Case2
new enum F:
  -- Case3
  -- Case4
new scope A:
  param x content E
  param y content bool
scope A:
  def x := Case1
  def y := x with Case3

has the following output in the Catala interpreter:

[RESULT] Computation successful! Results:
[RESULT] x = Case1 ()
[RESULT] y = true

This is a weird behavior and should be changed. It should return a type error saying that Case3 is not in the enum of x (which only has Case1 and Case2).

Why is it like this?

Because in the default calculus representation, I implement structural typing. Indeed, structs and enums have been desugared into plain product and sum types. This means that if two enums have the same structure (same number of fields of the same type in the same order), then they can be used interchangeably, hence the current interpreter result.

Replace with nominal typing

This should change, as it is confusing for users. To do so, one has to take these default calculus AST cases:

| ETuple of (expr Pos.marked * Uid.MarkedString.info option) list
(** The [MarkedString.info] is the former struct field name*)
| ETupleAccess of expr Pos.marked * int * Uid.MarkedString.info option * typ Pos.marked list
(** The [MarkedString.info] is the former struct field name *)
| EInj of expr Pos.marked * int * Uid.MarkedString.info * typ Pos.marked list
(** The [MarkedString.info] is the former enum case name *)
| EMatch of expr Pos.marked * (expr Pos.marked * Uid.MarkedString.info) list
(** The [MarkedString.info] is the former enum case name *)

And add to each case a Scopelang.StructName.t or Scopelang.EnumName.t that corresponds to the name of the struct/enum being introduced or eliminated. By doing that, the typechecking algorithm can check whether the enum or struct names match.
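
To see why the names need to be carried in the AST, here is a tiny self-contained OCaml illustration of the difference between the current structural check and the intended nominal check; all names here (enum_decl, structurally_compatible, nominally_compatible) are made up for the example.

```ocaml
(* Two enums with identical shapes must not unify once names are compared. *)
type enum_name = string
type enum_decl = { name : enum_name; constructors : string list }

let e : enum_decl = { name = "E"; constructors = [ "Case1"; "Case2" ] }
let f : enum_decl = { name = "F"; constructors = [ "Case3"; "Case4" ] }

(* Structural check: same arity => accepted (current, confusing behavior). *)
let structurally_compatible (a : enum_decl) (b : enum_decl) : bool =
  List.length a.constructors = List.length b.constructors

(* Nominal check: the names carried by the AST cases must match. *)
let nominally_compatible (a : enum_decl) (b : enum_decl) : bool =
  a.name = b.name

let () =
  assert (structurally_compatible e f);       (* accepted today *)
  assert (not (nominally_compatible e f))     (* rejected with the fix *)
```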

The struct and enum names are available at the right point in the scope language, so I expect changes in the translation to happen only here:

| EStruct (struct_name, e_fields) ->
let struct_sig = Ast.StructMap.find struct_name ctx.structs in
let d_fields, remaining_e_fields =
List.fold_right
(fun (field_name, _) (d_fields, e_fields) ->
let field_e =
try Ast.StructFieldMap.find field_name e_fields
with Not_found ->
Errors.raise_spanned_error
(Format.asprintf "Missing field for structure %a: \"%a\""
Ast.StructName.format_t struct_name Ast.StructFieldName.format_t field_name)
(Pos.get_position e)
in
let field_d = translate_expr ctx field_e in
let field_d =
Bindlib.box_apply
(fun field_d -> (field_d, Some (Ast.StructFieldName.get_info field_name)))
field_d
in
(field_d :: d_fields, Ast.StructFieldMap.remove field_name e_fields))
struct_sig ([], e_fields)
in
if Ast.StructFieldMap.cardinal remaining_e_fields > 0 then
Errors.raise_spanned_error
(Format.asprintf "The fields \"%a\" do not belong to the structure %a"
Ast.StructName.format_t struct_name
(Format.pp_print_list
~pp_sep:(fun fmt () -> Format.fprintf fmt ", ")
(fun fmt (field_name, _) ->
Format.fprintf fmt "%a" Ast.StructFieldName.format_t field_name))
(Ast.StructFieldMap.bindings remaining_e_fields))
(Pos.get_position e)
else
Bindlib.box_apply (fun d_fields -> Dcalc.Ast.ETuple d_fields) (Bindlib.box_list d_fields)
| EStructAccess (e1, field_name, struct_name) ->
let struct_sig = Ast.StructMap.find struct_name ctx.structs in
let _, field_index =
try List.assoc field_name (List.mapi (fun i (x, y) -> (x, (y, i))) struct_sig)
with Not_found ->
Errors.raise_spanned_error
(Format.asprintf "The field \"%a\" does not belong to the structure %a"
Ast.StructFieldName.format_t field_name Ast.StructName.format_t struct_name)
(Pos.get_position e)
in
let e1 = translate_expr ctx e1 in
Bindlib.box_apply
(fun e1 ->
Dcalc.Ast.ETupleAccess
( e1,
field_index,
Some (Ast.StructFieldName.get_info field_name),
List.map (fun (_, t) -> translate_typ ctx t) struct_sig ))
e1
| EEnumInj (e1, constructor, enum_name) ->
let enum_sig = Ast.EnumMap.find enum_name ctx.enums in
let _, constructor_index =
try List.assoc constructor (List.mapi (fun i (x, y) -> (x, (y, i))) enum_sig)
with Not_found ->
Errors.raise_spanned_error
(Format.asprintf "The constructor \"%a\" does not belong to the enum %a"
Ast.EnumConstructor.format_t constructor Ast.EnumName.format_t enum_name)
(Pos.get_position e)
in
let e1 = translate_expr ctx e1 in
Bindlib.box_apply
(fun e1 ->
Dcalc.Ast.EInj
( e1,
constructor_index,
Ast.EnumConstructor.get_info constructor,
List.map (fun (_, t) -> translate_typ ctx t) enum_sig ))
e1
| EMatch (e1, enum_name, cases) ->
let enum_sig = Ast.EnumMap.find enum_name ctx.enums in
let d_cases, remaining_e_cases =
List.fold_right
(fun (constructor, _) (d_cases, e_cases) ->
let case_e =
try Ast.EnumConstructorMap.find constructor e_cases
with Not_found ->
Errors.raise_spanned_error
(Format.asprintf
"The constructor %a of enum %a is missing from this pattern matching"
Ast.EnumConstructor.format_t constructor Ast.EnumName.format_t enum_name)
(Pos.get_position e)
in
let case_d = translate_expr ctx case_e in
let case_d =
Bindlib.box_apply
(fun case_d -> (case_d, Ast.EnumConstructor.get_info constructor))
case_d
in
(case_d :: d_cases, Ast.EnumConstructorMap.remove constructor e_cases))
enum_sig ([], cases)
in
if Ast.EnumConstructorMap.cardinal remaining_e_cases > 0 then
Errors.raise_spanned_error
(Format.asprintf "Patter matching is incomplete for enum %a: missing cases %a"
Ast.EnumName.format_t enum_name
(Format.pp_print_list
~pp_sep:(fun fmt () -> Format.fprintf fmt ", ")
(fun fmt (case_name, _) ->
Format.fprintf fmt "%a" Ast.EnumConstructor.format_t case_name))
(Ast.EnumConstructorMap.bindings remaining_e_cases))
(Pos.get_position e)
else
let e1 = translate_expr ctx e1 in
Bindlib.box_apply2
(fun d_fields e1 -> Dcalc.Ast.EMatch (e1, d_fields))
(Bindlib.box_list d_cases) e1

Lifting scope from an object to a list of objects

The problem

Let us say you're dealing with a household that can have many children. Here are your scopes and structures:

declaration structure Child : 
  data child_foo content integer 
  data child_bar content bool depends on amount

declaration scope HouseHold : 
  context household_foo content integer 
  context children content collection Child

declaration scope ChildScope : 
  context child content Child 

scope ChildScope :   
  definition child_foo equals 42

  definition child_bar of revenue equals revenue >= $1,000

We want the rules of the scope ChildScope to apply for each child of the children context variable, which is a collection. There is no language mechanism in Catala that lets you do that, other than to include inside HouseHold another user-defined scope, MultipleChildrenScope:

declaration scope MultipleChildrenScope : 
  context multiple_children content collection Child
  context child_foo content integer depends on Child 
  context child_bar content bool depends on amount depends on Child

scope MultipleChildrenScope : 
  definition child_foo of child under condition 
    child in multiple_children
  consequence equals 
     ???

But then, what can you put here? You need a way to have a copy of ChildScope for every child in the list. There is currently no surface language concept for doing that. What we need here is some kind of "scope functor".

A proposal for scope functors

Scopes are actually translated into functions later in the compiler, so we have a way to backport this functional semantics into the surface language to make those scope functors appear. We could have something like this:

declaration scope MultipleChildrenScope : 
  ...
  context child_scope scope ChildScope depends on child

The last child here is the name of the context parameter of ChildScope. Then, inside your expressions, you can invoke the functor:

scope MultipleChildrenScope : 
  definition child_foo of child under condition 
    child in multiple_children
  consequence equals 
     (child_scope of child).child_foo

Inside the compiler, this means that child_scope cannot be inlined statically anymore, since the function that it now represents will be called dynamically. Sub-scope inlining may not be the rule after all.

Out of date French documentation

The documentation for writing Catala with French keywords is out of date, mainly some keywords are missing compared to the English version.

Per Denis' remarks in PR#23, the reading guide should be written as HTML rather than in LaTeX because we want it to appear here: https://catala-lang.org/en/guide. There's already a tutorial for programmers that showcases most of the syntax here: https://catala-lang.org/en/examples/tutorial.

I would imagine the English versions are considered the reference versions. If that is indeed the case, it might be easiest to simply translate the English versions, taking special care to maintain the French keywords as they are already used.

What do you think? If that is ok with you I can help out by doing some translation.

Implement date collections merging

The problem

Currently, the operators for handling collections are insufficient to implement this function required for Section 121 of the US Tax Code:

scope PeriodMerge:
  # Placeholders, overwritten by caller
  definition periods1 equals []
  definition periods2 equals []
  # TODO: find a way to implement the merging of two collections of date
  # periods into a single non-overlapping collection of date periods such
  # that the output covers both input date ranges.
  definition output equals []

What to do?

We clearly need a fully fledged standard library for collections. Even then, it is unclear which functions would be sufficient to implement such an algorithm.

This situation is the usual DSL trap where we don't have the power of a full language to implement what we need. Multiple ways to get out of it:

  • keep adding standard library functions as our needs grow, in which case we will need a proper extensible system for doing so;
  • moving this merge_periods function to an input of the algorithm; this has the downside of not being able to write and run test cases fully in Catala.

I would favor the first solution, and am listening for proposals on how to set up such an extensible system :)
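
For concreteness, here is a rough OCaml sketch of the kind of merging function such a standard library entry would have to provide; dates are plain integers and periods are inclusive pairs purely for illustration, and none of these names correspond to an actual Catala runtime API.

```ocaml
(* Periods as inclusive [start; stop] pairs over an abstract ordered
   "date" (plain ints here, for illustration only). *)
type period = { start : int; stop : int }

(* Merge two collections of periods into a single, sorted,
   non-overlapping collection covering both inputs. *)
let merge_periods (ps1 : period list) (ps2 : period list) : period list =
  let sorted = List.sort (fun a b -> compare a.start b.start) (ps1 @ ps2) in
  List.fold_left
    (fun acc p ->
      match acc with
      | last :: rest when p.start <= last.stop ->
          (* Overlapping or adjacent: extend the last period. *)
          { last with stop = max last.stop p.stop } :: rest
      | _ -> p :: acc)
    [] sorted
  |> List.rev

let () =
  let merged =
    merge_periods [ { start = 1; stop = 5 } ] [ { start = 4; stop = 9 } ]
  in
  assert (merged = [ { start = 1; stop = 9 } ])
```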

Create a syntax cheat sheet

Right now, only the tutorial and the examples are reliable sources of documentation for the Catala syntax. https://catala-lang.org/en/formalisation contains a dump of the Catala grammar but with abstract tokens that are not tied to actual French or English keywords.

A better documentation should include a French and English syntax cheat sheet listing all the operators and structures available. A good way to build such a cheat sheet is to simply go through src/catala/catala_surface/ast.ml and do all the cases for expressions, operators, scopes, etc.

Is there an automated way of doing that for OCaml? I guess not.

Remove parser tokens for builtin functions

Right now, builtin functions have a parser token associated with them:

| INT_TO_DEC {
    (Builtin IntToDec, $sloc)
  }
| GET_DAY {
    (Builtin GetDay, $sloc)
  }
| GET_MONTH {
    (Builtin GetMonth, $sloc)
  }
| GET_YEAR {
    (Builtin GetYear, $sloc)
  }

Although this allows supporting multi-language frontends out of the box, it is cumbersome since it requires touching the parser each time we want to add a new builtin.

Instead, we should add a piece of code in the qident expression constructor:

| q = ident { let (q, q_pos) = q in (Ident q, q_pos) }

This piece of code would look up the parsed identifier in a multi-language list of identifiers corresponding to builtin functions, which would sit in the lexer*.ml files. Careful: one also has to check that scope parameters, structure fields or bound variables cannot be defined with a name corresponding to a builtin.
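
A minimal sketch of what that lookup table could look like, sitting next to the lexer; the builtin names mirror the parser actions quoted above, but the table, the surface spellings and resolve_ident are hypothetical.

```ocaml
(* Hypothetical per-language table mapping surface identifiers to builtins;
   the English variant would live alongside the English lexer. The surface
   spellings below are invented for the example. *)
type builtin = IntToDec | GetDay | GetMonth | GetYear

let builtins_en : (string * builtin) list =
  [
    ("integer_to_decimal", IntToDec);
    ("get_day", GetDay);
    ("get_month", GetMonth);
    ("get_year", GetYear);
  ]

(* Called from the qident rule: turn an identifier into a builtin when it
   matches the table, otherwise keep it as a plain identifier. *)
let resolve_ident (id : string) : [ `Builtin of builtin | `Ident of string ] =
  match List.assoc_opt id builtins_en with
  | Some b -> `Builtin b
  | None -> `Ident id
```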

Improve name resolution for top-level scope, structs and enums

Right now, name resolution is done in two passes (in file src/catala/catala_surface/name_resolution.ml):

  1. Look at scope, struct and enum declarations in order and create identifiers for them and their fields/cases/context
  2. Look at scope uses and link the identifiers to the ones created in 1.

This is nice but lacks flexibility for scopes, structs and enums. In particular, we want users to be able to declare their scopes anywhere in the code, without following a dependency order. For instance, if scope A has a subscope of type B, we don't want to force the user to declare B before A.

This problem can be solved by doing a three-pass name resolution with the following passes:

  1. Look at scope, struct and enum declarations and create identifiers for them
  2. Look at fields/cases/context and create identifiers for them, also possibly linking to the scope/struct/enum identifiers of 1.
  3. Look at scope uses and link the identifiers to the ones created in 1. and 2.

Add disambiguation syntax for enum constructors and struct fields

The problem

Currently, it is possible in Catala for structs to share field names and for enums to share constructor names. E.g:

declaration structure Foo:
  data field1 content int 
  data field2 content bool

declaration structure Bar:
  data field1 content date 
  data field3 content bool

But then, field access becomes ambiguous:

... baz.field1 # Is it Foo.field1 or Bar.field1 ?

Right now, an error is thrown in this case:

let disambiguate_constructor (ctxt : Name_resolution.context) (constructor : string Pos.marked list)
(pos : Pos.t) : Scopelang.Ast.EnumName.t * Scopelang.Ast.EnumConstructor.t =
let constructor =
match constructor with
| [ c ] -> c
| _ ->
Errors.raise_spanned_error "The deep pattern matching syntactic sugar is not yet supported"
pos
in
let possible_c_uids =
try Desugared.Ast.IdentMap.find (Pos.unmark constructor) ctxt.constructor_idmap
with Not_found ->
Errors.raise_spanned_error
"The name of this constructor has not been defined before, maybe it is a typo?"
(Pos.get_position constructor)
in
if Scopelang.Ast.EnumMap.cardinal possible_c_uids > 1 then
Errors.raise_spanned_error
(Format.asprintf
"This constuctor name is ambiguous, it can belong to %a. Desambiguate it by prefixing it \
with the enum name."
(Format.pp_print_list
~pp_sep:(fun fmt () -> Format.fprintf fmt " or ")
(fun fmt (s_name, _) -> Format.fprintf fmt "%a" Scopelang.Ast.EnumName.format_t s_name))
(Scopelang.Ast.EnumMap.bindings possible_c_uids))
(Pos.get_position constructor);
Scopelang.Ast.EnumMap.choose possible_c_uids

if Scopelang.Ast.StructMap.cardinal x_possible_structs > 1 then
Errors.raise_spanned_error
(Format.asprintf
"This struct field name is ambiguous, it can belong to %a. Desambiguate it by \
prefixing it with the struct name."
(Format.pp_print_list
~pp_sep:(fun fmt () -> Format.fprintf fmt " or ")
(fun fmt (s_name, _) ->
Format.fprintf fmt "%a" Scopelang.Ast.StructName.format_t s_name))
(Scopelang.Ast.StructMap.bindings x_possible_structs))
(Pos.get_position x)
else

However, it is not currently possible syntactically to disambiguate using Foo.field1 or Bar.field1; the only solution is to rename the fields.

How to fix it?

First, the parser has to be updated here:

| e = small_expression DOT i = ident {
    (Dotted (e, i), Pos.from_lpos $sloc)
  }

ident should be turned into something like option(constructor DOT) ident and the AST should be updated accordingly:

https://github.com/CatalaLang/catala/blob/master/src/catala/catala_surface/ast.ml#L266

Then, the desugaring should be updated. I don't know at this point whether we would need to update this part

| Dotted (e, x) -> (
match Pos.unmark e with
| Ident y when Name_resolution.is_subscope_uid scope ctxt y ->
(* In this case, y.x is a subscope variable *)
let subscope_uid : Scopelang.Ast.SubScopeName.t =
Name_resolution.get_subscope_uid scope ctxt (Pos.same_pos_as y e)
in
let subscope_real_uid : Scopelang.Ast.ScopeName.t =
Scopelang.Ast.SubScopeMap.find subscope_uid scope_ctxt.sub_scopes
in
let subscope_var_uid = Name_resolution.get_var_uid subscope_real_uid ctxt x in
Bindlib.box
( Scopelang.Ast.ELocation
(SubScopeVar (subscope_real_uid, (subscope_uid, pos), (subscope_var_uid, pos))),
pos )
| _ ->
(* In this case e.x is the struct field x access of expression e *)
let e = translate_expr scope ctxt e in
let x_possible_structs =
try Desugared.Ast.IdentMap.find (Pos.unmark x) ctxt.field_idmap
with Not_found ->
Errors.raise_spanned_error "This identifier should refer to a struct field"
(Pos.get_position x)
in
if Scopelang.Ast.StructMap.cardinal x_possible_structs > 1 then
Errors.raise_spanned_error
(Format.asprintf
"This struct field name is ambiguous, it can belong to %a. Desambiguate it by \
prefixing it with the struct name."
(Format.pp_print_list
~pp_sep:(fun fmt () -> Format.fprintf fmt " or ")
(fun fmt (s_name, _) ->
Format.fprintf fmt "%a" Scopelang.Ast.StructName.format_t s_name))
(Scopelang.Ast.StructMap.bindings x_possible_structs))
(Pos.get_position x)
else
let s_uid, f_uid = Scopelang.Ast.StructMap.choose x_possible_structs in
Bindlib.box_apply (fun e -> (Scopelang.Ast.EStructAccess (e, f_uid, s_uid), pos)) e )

Refactor runtime functions

Leftover of #73: src/catala/default_calculus/interpreter.ml and src/catala/runtime.ml have very similar code; the former should use the latter as a dependency to avoid duplication.

Syntax regression

In the old syntax, the following was accepted:

@§286-136 Penalty@
 
/*
 scope Penalty286_83_135:
 
   rule fine_ok under condition min_fine <=$ fine and fine <=$ max_fine

but with the new syntax, it seems to be rejected:

## §286-136 Penalty

```catala
scope Penalty286_83_135:

with error:

[ERROR] Syntax error at token "```catala"
[ERROR] Message: expected an article title, another heading or some text
[ERROR] 
[ERROR] Error token:
[ERROR]   --> Title17-MotorAndOtherVehicles.catala_en
[ERROR]    | 
[ERROR] 58 | ```catala
[ERROR]    | ^^^^^^^^^
[ERROR]    + 
[ERROR] 
[ERROR] Last good token:
[ERROR]   --> Title17-MotorAndOtherVehicles.catala_en
[ERROR]    | 
[ERROR] 56 | ## §286-136 Penalty
[ERROR]    | ^^^^^^^^^^^^^^^^^^^
[ERROR] 57 | 
[ERROR]    | 
[ERROR] 58 | ```catala
[ERROR]    | ^
[ERROR]    + 
make: *** [Title17-MotorAndOtherVehicles.run] Error 1

Change dates to ISO format

The problem

Right now, date literals are of the form |01/01/2021| in Catala. However, this format is confusing because Americans use the MM/DD/YYYY format instead of the logical DD/MM/YYYY format.

To remove this source of confusion, we should adopt the ISO format and write |YYYY-MM-DD|.

How to implement the solution

This is the code responsible for date literal parsing:

| VERTICAL d = date_int DIV m = date_int DIV y = date_int VERTICAL {
    (LDate {
      literal_date_day = (match !Utils.Cli.locale_lang with `En -> m | `Fr -> d);
      literal_date_month = (match !Utils.Cli.locale_lang with `En -> d | `Fr -> m);
      literal_date_year = y;
    }, Pos.from_lpos $sloc)
  }

It should be tweaked to implement the ISO format separated by dashes instead of DIV.

Implement quick pattern-check desugaring

The Catala surface syntax contains a handy sugar that allows checking whether a value of a sum type falls into one of the sum type's cases:

| TestMatchCase of expression Pos.marked * constructor Pos.marked

This AST pattern is not handled by desugaring yet:

The desugaring should be implemented in the following way. Suppose you have:

declaration enumeration Foo:
  -- Case1
  -- Case2
  -- Case3 content integer

Then x with pattern Case2 should be desugared to:

match x with pattern 
  -- Case1 : false 
  -- Case2: true 
  -- Case3 content y: false

One can take inspiration from

| MatchWith (e1, (cases, _cases_pos)) ->
let e1 = translate_expr scope ctxt e1 in
let cases_d, e_uid =
List.fold_left
(fun (cases_d, e_uid) (case, pos_case) ->
match Pos.unmark case.Ast.match_case_pattern with
| [ constructor ], binding ->
let possible_c_uids =
try Desugared.Ast.IdentMap.find (Pos.unmark constructor) ctxt.constructor_idmap
with Not_found ->
Errors.raise_spanned_error
"The name of this constructor has not been defined before, maybe it is a \
typo?"
(Pos.get_position constructor)
in
if e_uid = None && Scopelang.Ast.EnumMap.cardinal possible_c_uids > 1 then
Errors.raise_spanned_error
(Format.asprintf
"This constuctor name is ambiguous, it can belong to %a. Desambiguate it by \
prefixing it with the enum name."
(Format.pp_print_list
~pp_sep:(fun fmt () -> Format.fprintf fmt " or ")
(fun fmt (s_name, _) ->
Format.fprintf fmt "%a" Scopelang.Ast.EnumName.format_t s_name))
(Scopelang.Ast.EnumMap.bindings possible_c_uids))
(Pos.get_position constructor)
else
let e_uid, c_uid =
match e_uid with
| Some e_uid -> (
( e_uid,
try Scopelang.Ast.EnumMap.find e_uid possible_c_uids
with Not_found ->
Errors.raise_spanned_error
(Format.asprintf "This constructor is not part of the %a enumeration"
Scopelang.Ast.EnumName.format_t e_uid)
(Pos.get_position constructor) ) )
| None -> Scopelang.Ast.EnumMap.choose possible_c_uids
in
( match Scopelang.Ast.EnumConstructorMap.find_opt c_uid cases_d with
| None -> ()
| Some e_case ->
Errors.raise_multispanned_error
(Format.asprintf "The constructor %a has been matched twice:"
Scopelang.Ast.EnumConstructor.format_t c_uid)
[
(None, Pos.get_position case.match_case_expr);
(None, Pos.get_position (Bindlib.unbox e_case));
] );
let ctxt, (param_var, param_pos) =
match binding with
| None -> (ctxt, (Scopelang.Ast.Var.make ("_", Pos.no_pos), Pos.no_pos))
| Some param ->
let ctxt, param_var = Name_resolution.add_def_local_var ctxt param in
(ctxt, (param_var, Pos.get_position param))
in
let case_body = translate_expr scope ctxt case.Ast.match_case_expr in
let e_binder = Bindlib.bind_mvar (Array.of_list [ param_var ]) case_body in
let case_expr =
Bindlib.box_apply2
(fun e_binder case_body ->
Pos.same_pos_as
(Scopelang.Ast.EAbs
( param_pos,
e_binder,
[
Scopelang.Ast.EnumConstructorMap.find c_uid
(Scopelang.Ast.EnumMap.find e_uid ctxt.Name_resolution.enums);
] ))
case_body)
e_binder case_body
in
(Scopelang.Ast.EnumConstructorMap.add c_uid case_expr cases_d, Some e_uid)
| _ :: _, _ ->
Errors.raise_spanned_error
"The deep pattern matching syntactic sugar is not yet supported" pos_case
| [], _ -> assert false
(* should not happen *))
(Scopelang.Ast.EnumConstructorMap.empty, None)
cases
in
Bindlib.box_apply2
(fun e1 cases_d -> (Scopelang.Ast.EMatch (e1, Option.get e_uid, cases_d), pos))
e1
(LiftEnumConstructorMap.lift_box cases_d)
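
Putting the pieces together, here is a rough, self-contained OCaml sketch of the desugaring described above, using a toy expression type instead of the real Scopelang.Ast purely to show the shape of the generated match.

```ocaml
(* Toy AST, only what is needed to illustrate the sugar. *)
type expr =
  | Var of string
  | Lit of bool
  | Match of expr * (string * expr) list (* scrutinee, one branch per case *)

(* [x with pattern Case2] becomes a match over all constructors of the
   enum, returning true for Case2 and false everywhere else; the bound
   content of cases like Case3 is simply ignored. *)
let desugar_test_match_case (scrutinee : expr) (tested : string)
    (constructors : string list) : expr =
  Match (scrutinee, List.map (fun c -> (c, Lit (c = tested))) constructors)

let () =
  match desugar_test_match_case (Var "x") "Case2" [ "Case1"; "Case2"; "Case3" ] with
  | Match (_, branches) -> assert (List.assoc "Case2" branches = Lit true)
  | _ -> assert false
```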

Makefile does not offer suggestions

jonathan@absinthe:~/Code/catala/examples/us_tax_code (master) $ make section_121.run --debug
GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

This program built for i386-apple-darwin11.3.0
Reading makefiles...
Updating goal targets....
 File `section_121.run' does not exist.
Must remake target `section_121.run'.
dune exec --no-print-director ../../src/catala/catala.exe --  --language=en Makefile section_121.catala_en
dune exec --no-print-director ../../src/catala/catala.exe --  --language=en \
		Interpret \
		-s  \
		section_121.catala_en
catala: required argument FILE is missing
Usage: catala [OPTION]... BACKEND FILE
Try `catala --help' for more information.
make: *** [section_121.run] Error 1

Not sure there's a good way of dealing with this, except perhaps setting

SCOPE?=PleasePickADefaultScopeViaTheSCOPEVariable

in the shared Makefile.

Switch to markdown syntax for literate programming

Right now, the literate programming syntax of Catala uses the following operators:

  • @@ to open and close headings (with + for precedence)
  • @ to open and close article titles
  • /* and */ to delimit code sections

This syntax is non-standard and does not display well anywhere. Since we have to pick a syntax, we might as well pick one that enhances interoperability with existing tools.

Towards a markdown syntax

Markdown seems adapted to the level of formatting that we want for the legislative part. Plus, it's supported by most IDEs and has special GitHub/GitLab rendering, so it would help the display on the VCS.

With a markdown syntax, a Catala file could look like:

## Some law header

### Some law header subsection

#### [Article 6]

```
declaration structure:
   ...
```

The [...] notation is used to distinguish law headers from the titles of legislative articles. An article is the atom of legislation; all code blocks should belong to an article (see #29).

How to implement that

We need to change the parts of the lexers that deal with this syntax, such as:

| "@@", Star white_space, "Master file", Star white_space, "@@" -> MASTER_FILE
| "@@", Star white_space, "Begin metadata", Star white_space, "@@" -> BEGIN_METADATA
| "@@", Star white_space, "End metadata", Star white_space, "@@" -> END_METADATA
| ( "@@",
Star white_space,
"Include:",
Star white_space,
Plus (Compl '@'),
Star white_space,
Opt ('@', Star white_space, "p.", Star white_space, Plus '0' .. '9', Star white_space),
"@@" ) ->
let extract_components =
R.regexp "@@\\s*Include\\:\\s*([^@]+)\\s*(@\\s*p\\.\\s*([0-9]+)|)@@"
in
let get_component = R.get_substring (R.exec ~rex:extract_components (Utf8.lexeme lexbuf)) in
let name = get_component 1 in
let pages = try Some (int_of_string (get_component 3)) with Not_found -> None in
let pos = lexing_positions lexbuf in
if Filename.extension name = ".pdf" then LAW_INCLUDE (Ast.PdfFile ((name, pos), pages))
else LAW_INCLUDE (Ast.CatalaFile (name, pos))
| "@@", Plus (Compl '@'), "@@", Star '+' ->
let extract_code_title = R.regexp "@@([^@]+)@@([\\+]*)" in
let get_match = R.get_substring (R.exec ~rex:extract_code_title (Utf8.lexeme lexbuf)) in
let get_new_lines = R.regexp "\n" in
let new_lines_count =
try Array.length (R.extract ~rex:get_new_lines (Utf8.lexeme lexbuf)) with Not_found -> 0
in
for _i = 1 to new_lines_count do
new_line lexbuf
done;
let law_title = get_match 1 in
let precedence = String.length (get_match 2) in
LAW_HEADING (law_title, precedence)
| "@", Plus (Compl '@'), "@" ->
let extract_article_title = R.regexp "@([^@]+)@" in
let title = R.get_substring (R.exec ~rex:extract_article_title (Utf8.lexeme lexbuf)) 1 in
let get_new_lines = R.regexp "\n" in
let new_lines_count =
try Array.length (R.extract ~rex:get_new_lines (Utf8.lexeme lexbuf)) with Not_found -> 0
in
(* the -1 is here to compensate for Sedlex's automatic newline detection around token *)
for _i = 1 to new_lines_count - 1 do
new_line lexbuf
done;
LAW_ARTICLE (title, None, None)

Off the top of my head, no changes to the parser should be required. EDIT: should BEGIN_CODE and END_CODE be merged together? Or we can say that BEGIN_CODE should always be ```catala whereas END_CODE is just ```.
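
For reference, here is a rough sketch, not actual compiler code, of what markdown-oriented rules could look like in the same sedlex/Re style as the snippet above (white_space, the R helpers and the LAW_HEADING/LAW_ARTICLE/BEGIN_CODE/END_CODE tokens are reused from it; the exact regular expressions would need refinement):

(* Hedged sketch: markdown headings replace @@...@@, the number of '#' characters gives
   the precedence, and "[Article ...]" headings become LAW_ARTICLE tokens. *)
| Plus '#', Star white_space, '[', Plus (Compl ']'), ']' ->
    (* e.g. "#### [Article 6]" *)
    let extract_article = R.regexp "#+\\s*\\[([^\\]]+)\\]" in
    let title = R.get_substring (R.exec ~rex:extract_article (Utf8.lexeme lexbuf)) 1 in
    LAW_ARTICLE (title, None, None)
| Plus '#', Star white_space, Plus (Compl '\n') ->
    (* e.g. "## Some law header" *)
    let extract_heading = R.regexp "(#+)\\s*([^\n]+)" in
    let get_match = R.get_substring (R.exec ~rex:extract_heading (Utf8.lexeme lexbuf)) in
    LAW_HEADING (get_match 2, String.length (get_match 1))
| "```catala" -> BEGIN_CODE
| "```" -> END_CODE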

Externally-implemented modules for Catala

The problem

In this issue, we assume that Catala has a module system as described in #88. The problem we want to solve is exemplified by #81: we want Catala programs to be able to rely on abstract scopes and types which cannot be defined or implemented within the limits of the DSL.

Design of an external implementation system

Syntax

To simplify the implementation, we only allow whole modules to be externally implemented. Externally implemented modules have to be defined in standard .catala* files, except that the first line, which carries the module name, should read:

> Module Foo external

While types and scopes have to be declared in such a file, it is impossible to write any scope usage that might introduce rules or definitions. Modules that use Foo can then use any of the externally implemented types and scopes defined in Foo.

Compilation

When compiling an externally implemented module to a backend like OCaml, the Catala compiler shall create a header file like *.mli or *.h. It is then up to the user to provide a manually-written implementation in the target language.
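
For illustration only, the OCaml header generated for a hypothetical external module Foo declaring a single scope Bar might look like the following; the naming conventions here are assumptions about the backend, not its actual output:

(* foo.mli -- hypothetical header derived from "> Module Foo external".
   All names below are illustrative placeholders. *)

(* Input and output structures derived from the declaration of scope Bar *)
type bar_in = { individual_income_in : Runtime.money }
type bar_out = { is_eligible_out : bool }

(* The implementation of this function must be written by hand in foo.ml *)
val bar : bar_in -> bar_out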

Interpreter

See below

Precedence issues

To be fixed by cleaning up the parser...

This does not parse (the not operator needs to bind tighter):

  exception definition max_days under condition
    paragraph_a_applies and
    not paragraph_b_applies and

Associativity for = and != is not left

Will add more as I find them

Bug in the Makefile

Currently, an error occurs when trying to execute make tests just after a make clean:

(Screenshot from 2021-03-16 showing the error output.)

Typecheck and test Allocations familiales

The code of the Allocations familiales example is old and should be completely rewritten. While resolving #36 and #37 is needed to handle the list of children and get the case study running in its entirety, some parts of the example can already be written in Catala using small scopes that can be unit-tested.

Polymorphic operators

Right now, binary and unary operators are specialized depending on their type: +$ for money addition, -@ for date subtraction, etc. This is cumbersome; ideally we would want to write + for adding money, dates, integers, decimals, etc.

Why is it like this?

The need for specialized operators stems from Catala's type system, and more precisely from the default calculus typing algorithm. It uses classical Hindley-Milner inference with algorithm W. In classical Hindley-Milner inference, operators either operate on a fixed type, or are completely polymorphic. This is the typing procedure for Catala operators right now:

let op_type (op : A.operator Pos.marked) : typ Pos.marked UnionFind.elem =
  let pos = Pos.get_position op in
  let bt = UnionFind.make (TLit TBool, pos) in
  let it = UnionFind.make (TLit TInt, pos) in
  let rt = UnionFind.make (TLit TRat, pos) in
  let mt = UnionFind.make (TLit TMoney, pos) in
  let dut = UnionFind.make (TLit TDuration, pos) in
  let dat = UnionFind.make (TLit TDate, pos) in
  let any = UnionFind.make (TAny (Any.fresh ()), pos) in
  let array_any = UnionFind.make (TArray any, pos) in
  let any2 = UnionFind.make (TAny (Any.fresh ()), pos) in
  let arr x y = UnionFind.make (TArrow (x, y), pos) in
  match Pos.unmark op with
  | A.Ternop A.Fold -> arr (arr any2 (arr any any2)) (arr any2 (arr array_any any2))
  | A.Binop (A.And | A.Or) -> arr bt (arr bt bt)
  | A.Binop (A.Add KInt | A.Sub KInt | A.Mult KInt | A.Div KInt) -> arr it (arr it it)
  | A.Binop (A.Add KRat | A.Sub KRat | A.Mult KRat | A.Div KRat) -> arr rt (arr rt rt)
  | A.Binop (A.Add KMoney | A.Sub KMoney) -> arr mt (arr mt mt)
  | A.Binop (A.Add KDuration | A.Sub KDuration) -> arr dut (arr dut dut)
  | A.Binop (A.Sub KDate) -> arr dat (arr dat dut)
  | A.Binop (A.Add KDate) -> arr dat (arr dut dat)
  | A.Binop (A.Div KMoney) -> arr mt (arr mt rt)
  | A.Binop (A.Mult KMoney) -> arr mt (arr rt mt)
  | A.Binop (A.Lt KInt | A.Lte KInt | A.Gt KInt | A.Gte KInt) -> arr it (arr it bt)
  | A.Binop (A.Lt KRat | A.Lte KRat | A.Gt KRat | A.Gte KRat) -> arr rt (arr rt bt)
  | A.Binop (A.Lt KMoney | A.Lte KMoney | A.Gt KMoney | A.Gte KMoney) -> arr mt (arr mt bt)
  | A.Binop (A.Lt KDate | A.Lte KDate | A.Gt KDate | A.Gte KDate) -> arr dat (arr dat bt)
  | A.Binop (A.Lt KDuration | A.Lte KDuration | A.Gt KDuration | A.Gte KDuration) ->
      arr dut (arr dut bt)
  | A.Binop (A.Eq | A.Neq) -> arr any (arr any bt)
  | A.Binop A.Map -> arr (arr any any2) (arr array_any any2)
  | A.Unop (A.Minus KInt) -> arr it it
  | A.Unop (A.Minus KRat) -> arr rt rt
  | A.Unop (A.Minus KMoney) -> arr mt mt
  | A.Unop (A.Minus KDuration) -> arr dut dut
  | A.Unop A.Not -> arr bt bt
  | A.Unop A.ErrorOnEmpty -> arr any any
  | A.Unop (A.Log _) -> arr any any
  | A.Unop A.Length -> arr array_any it
  | A.Unop A.IntToRat -> arr it rt
  | Binop (Mult (KDate | KDuration)) | Binop (Div (KDate | KDuration)) | Unop (Minus KDate) ->
      Errors.raise_spanned_error "This operator is not available!" pos

Why can't we make all operators completely polymorphic? Because it does not make any sense to + two enumerations together, or to * two dates.

Hindley-Milner inference with constraints

What we really want is a type system where we can say: + operates only on a fixed list of types. This corresponds to Hindley-Milner inference with constraints, a notoriously hard problem. Some libraries exist to perform this kind of typing, like https://gitlab.inria.fr/fpottier/inferno, but overall it is much more complicated than the classical inference I'm doing right now.

Because I don't want to spend too much time on Catala's type system, I prefer we stick to classical inference and hence we're stuck with the default calculus operators as they are.

A compromise

But this does not mean we can't have what we want! Indeed, we can keep the differentiated default calculus operators but have undifferentiated surface syntax operators. We can write |01/01/2020| - |01/01/2019| and have the compiler figure out that - is actually -@ during desugaring.

This system is nice because it does not complicate the default calculus. Instead, we rely on a desugaring pass, sitting somewhere in Desugared or Scopelang, that specializes the operators using the information provided by an additional, conservative, high-level type-checking pass performed at that level.

If this pass does not manage to determine the type of the operator, we send an error message back to the user saying that disambiguation is needed and that they actually have to write -@.
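
As a minimal sketch of the idea (the function name, its placement, and the error wording are assumptions, not actual compiler code), the specialization step for - could be a simple table over the operand types determined by the conservative pass:

(* Hedged sketch: turn the undifferentiated surface "-" into a concrete default-calculus
   operator once the operand types are known; fall back to asking the user otherwise. *)
let specialize_minus (pos : Pos.t) (lhs : typ) (rhs : typ) : A.binop =
  match lhs, rhs with
  | TLit TInt, TLit TInt -> A.Sub KInt
  | TLit TRat, TLit TRat -> A.Sub KRat
  | TLit TMoney, TLit TMoney -> A.Sub KMoney
  | TLit TDuration, TLit TDuration -> A.Sub KDuration
  | TLit TDate, TLit TDate -> A.Sub KDate (* dates subtract into a duration *)
  | _ ->
      Errors.raise_spanned_error
        "Unable to disambiguate this subtraction, please write the specialized operator (e.g. -@)"
        pos

With such a table, |01/01/2020| - |01/01/2019| would silently become a -@ subtraction, and the user would only be asked to disambiguate when the pass cannot settle on a single row.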

What are your thoughts on this?

Detect cycles in exceptions

Right now, the algorithm that builds the default tree from the list of rules defining a variable expects that there are no cycles in the label and exception declarations.

(** Transforms a flat list of rules into a tree, taking into account the priorities declared
    between rules

    {e Invariant:} there are no exception cycles

    {e Invariant:} there are no dangling exception pointers in the rules *)
let rec def_map_to_tree (def_info : Ast.ScopeDef.t)
    (is_def_func : Scopelang.Ast.typ Pos.marked option) (def : Ast.rule Ast.RuleMap.t) :
    rule_tree list =

This requirement is not enforced, so we should build a dependency graph whose vertices are labels and whose edges are exceptions, and check for cycles in this graph for each scope variable (defined by a list of rules).

The code for this graph checking could live in src/catala/desugared/dependency.ml.
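
A hedged sketch of that check, assuming the compiler's existing ocamlgraph dependency and a placeholder vertex module (the real one would wrap whatever identifies a label or rule):

(* Hedged sketch for src/catala/desugared/dependency.ml: vertices are labels/rules,
   edges mean "is an exception to"; any cycle (including a self-loop) is an error.
   ExceptionVertex and the error reporting are placeholders, not the actual code. *)
module ExceptionVertex = struct
  type t = Ast.RuleName.t (* placeholder identifier for a label or rule *)
  let compare = compare
  let hash = Hashtbl.hash
  let equal x y = compare x y = 0
end

module ExcGraph = Graph.Persistent.Digraph.Concrete (ExceptionVertex)
module ExcSCC = Graph.Components.Make (ExcGraph)

let check_for_exception_cycles (g : ExcGraph.t) : unit =
  List.iter
    (fun scc ->
      (* a strongly connected component with more than one vertex is a cycle;
         a self-loop is also rejected *)
      if List.length scc > 1 || List.exists (fun v -> ExcGraph.mem_edge g v v) scc then
        Errors.raise_spanned_error "Cyclic dependency detected between exceptions!" Pos.no_pos)
    (ExcSCC.scc_list g)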
