One big motivation I've had for going with dependent types in the first place is the idea of being able to use the same language feature (dependent records) for both modules and data types. This is similar to what is proposed in 1ML, but I would prefer not to jump through the hoops of forcing the language into System Fω, and rather just go straight to dependent types, seeing as they are such a handy tool to have in the first place.
These would be super handy for representing modules, e.g.:
Graph (a : Type) = Record {
    Node : Type,
    Edge : Type,
    has-edge : a -> Node -> Node -> Bool,
    edges : a -> Node -> List Edge,
}
MyGraph = DArray (Int, Int)
MyGraph-graph : Graph MyGraph
MyGraph-graph = record {
    Node = Int,
    Edge = (Int, Int),
    has-edge = _,
    edges = _,
}
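For comparison, here is a hedged sketch of the same idea in Lean 4, which already uses dependent records (structures) for this kind of thing. The names are mine and this is not a proposal for this language's syntax:

```lean
-- Sketch (Lean 4): the "module as dependent record" idea from above.
structure Graph (a : Type) where
  Node : Type
  Edge : Type
  hasEdge : a → Node → Node → Bool
  edges : a → Node → List Edge

-- An instance for an edge-list graph over Nat nodes:
def listGraph : Graph (List (Nat × Nat)) where
  Node := Nat
  Edge := Nat × Nat
  hasEdge := fun g a b => g.contains (a, b)
  edges := fun g a => g.filter (fun e => e.1 == a)
```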
This would also allow us to hide the length of arrays to get something like Rust's Vec type:
DArray (a : Type) = Record {
    len : U64,
    data : Array a len,
}
Relationship with Sigma types
A very simple way of representing dependent records would be to use dependent pairs (sigma types):
Unit : Type
{} : Unit
{e1, e2} : (x : t1) * t2
left : (p : (x : t1) * t2) -> t1
right : (p : (x : t1) * t2) -> t2 [x := left p]
Where t2 may depend on x : t1 (so the result type of right p substitutes left p for x).
The trouble with this formulation is that labels are not significant when comparing for equality at the type and value level.
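To make that concrete, here is a hedged Lean 4 sketch (names mine) of the pair encoding, reusing the DArray example from above. It also shows how the sigma hides the length, and why labels carry no weight in this encoding:

```lean
-- Sketch (Lean 4): a record { len : Nat, data : Fin len → a } encoded
-- as a sigma type. The length is existentially hidden, as desired:
def DArraySig (a : Type) : Type := Σ len : Nat, Fin len → a

-- A differently-"labelled" record { count : Nat, items : Fin count → a }
-- erases to the very same sigma type — the labels are gone:
def BagSig (a : Type) : Type := Σ count : Nat, Fin count → a

-- The two "record types" are definitionally equal:
example : DArraySig Nat = BagSig Nat := rfl
```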
Structural vs Nominal
I would kind of prefer to go structural rather than nominal with this, but it seems that most dependently typed languages use nominal records. I'm guessing this is because it makes checking if two record types are compatible much easier. If we went structural we might need to do some complex subtyping and coercions to handle the fact that fields might be mentioned out of order.
In his thesis on Agda, Ulf Norell says:
In practise, however, it is a good idea to let each record declaration introduce a new type. This means that two record types declared to have the same fields will be different, but they will have the same elements. One advantage of this is that it significantly improves the efficiency of checking equality between record types—instead of comparing the types of all the fields, it is enough to compare the names. It is also good programming practise to keep intentionally different types separate in the type system.
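A hedged Lean 4 illustration of the nominal behaviour Norell describes (the names are mine):

```lean
-- Sketch (Lean 4): each `structure` declaration is a fresh nominal type.
structure P where
  x : Nat
  y : Nat

structure Q where
  x : Nat
  y : Nat

-- P and Q have identical fields, but `example : P = Q := rfl` is
-- rejected: the checker compares the names P and Q, not the fields.
-- Converting between them requires an explicit coercion:
def convert (p : P) : Q := { x := p.x, y := p.y }
```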
Data layout
It would be nice if records were efficiently packed in memory by default, like Rust does it. The unboxed types issue #18 is related.
A potential issue with a non-uniform memory representation is that, as @ollef points out, if you have a record { x : A, y : B x, z : C }, then the memory offset of z depends on the value of x. We might be able to do some size analysis though, and require fields whose sizes are not statically known to be stored behind a pointer indirection.
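A hedged Rust sketch of that workaround (the record above isn't expressible in Rust, so `Box<[u8]>` stands in for the value-dependent field):

```rust
// Sketch: if the size of `y` depended on the value of `x`, the offset of
// `z` would vary at runtime. Boxing `y` restores a fixed layout: the field
// is always pointer-sized, so `z`'s offset is statically known again.
struct Rec {
    x: usize,      // the value that `y`'s size conceptually depends on
    y: Box<[u8]>,  // indirection: pointer-sized regardless of `x`
    z: u32,        // offset now independent of `x`
}

fn offset_of_z(r: &Rec) -> usize {
    (std::ptr::addr_of!(r.z) as usize) - (r as *const Rec as usize)
}

fn main() {
    let small = Rec { x: 1, y: vec![0u8; 1].into_boxed_slice(), z: 7 };
    let large = Rec { x: 1000, y: vec![0u8; 1000].into_boxed_slice(), z: 7 };
    // Same offset despite wildly different `y` payloads:
    assert_eq!(offset_of_z(&small), offset_of_z(&large));
}
```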
If we allow structural types, data layout might become trickier. We might need to have a consistent 'normalized field order' for records. Things would become even trickier if row polymorphism was brought into the mix.
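A hedged sketch of the 'normalized field order' idea: compare record types as sorted (label, type) lists. Note that this naive version ignores the dependency order that dependent fields would impose on the layout:

```rust
// Sketch: structural equality of record types via a normalized field order.
// Fields are (label, type) pairs; types are plain strings for illustration.
fn structurally_equal(a: &[(&str, &str)], b: &[(&str, &str)]) -> bool {
    let mut a = a.to_vec();
    let mut b = b.to_vec();
    // Normalize: sort fields by label so declaration order is irrelevant.
    a.sort();
    b.sort();
    a == b
}

fn main() {
    let g1 = [("Edge", "Type"), ("Node", "Type")];
    let g2 = [("Node", "Type"), ("Edge", "Type")];
    assert!(structurally_equal(&g1, &g2)); // same fields, different order
    assert!(!structurally_equal(&g1, &[("Node", "Type")]));
}
```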
Other stuff to consider wrt. data layout is struct-of-arrays and array-of-structs. Perhaps our compile time reflection would be powerful enough to auto-derive this, but languages like Jai have this built in.
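For reference, the two layouts sketched in Rust (field names are mine):

```rust
// Array-of-structs: one contiguous buffer of whole records.
#[derive(Clone, Copy)]
struct Particle { pos: f32, vel: f32 }
type ParticlesAoS = Vec<Particle>;

// Struct-of-arrays: one buffer per field, better for per-field scans.
struct ParticlesSoA { pos: Vec<f32>, vel: Vec<f32> }

impl ParticlesSoA {
    // The transformation a compiler (or reflection) would auto-derive:
    fn from_aos(aos: &ParticlesAoS) -> Self {
        ParticlesSoA {
            pos: aos.iter().map(|p| p.pos).collect(),
            vel: aos.iter().map(|p| p.vel).collect(),
        }
    }
}

fn main() {
    let aos: ParticlesAoS =
        vec![Particle { pos: 1.0, vel: 2.0 }, Particle { pos: 3.0, vel: 4.0 }];
    let soa = ParticlesSoA::from_aos(&aos);
    assert_eq!(soa.pos, vec![1.0, 3.0]);
    assert_eq!(soa.vel, vec![2.0, 4.0]);
}
```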
This paper talks about some interesting ways to parameterise over data layout: https://www.doc.ic.ac.uk/~scd/ShapesOnwards.pdf
Here is a book going through Data Oriented Design.
Row polymorphism
It would be nice to support row polymorphism, but I have no idea how this would look.
Haven't thought much about it. That paper uses subtyping. For row typing you might be able to do something like { Γ; Θ = { y₀ : B₀; ⋯ yₙ : Bₙ }; Δ[Γ, Θ] }, where Γ and Δ are treated as two separate row variables, Δ depends on Γ and Θ, and Θ is closed with respect to Γ. Not sure though.
First class labels
Labels are also interesting - I've never really liked the 'type level string' nature of record labels, especially once row polymorphism comes into play. Would be nice to have these be namespaced, sort of like how clojure.spec does it. Not sure what the ergonomics would look like though.
Lenses
Would we want to auto-generate these? If we had first class labels and row polymorphism I could see us generating polymorphic lenses in a library, like how PureScript does it.
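A hedged sketch of what a generated lens could desugar to, as a plain getter/setter pair (the PureScript libraries are profunctor-based; all names here are mine):

```rust
// Sketch: a first-class lens as a get/set pair over an immutable record.
struct Lens<S, A> {
    get: fn(&S) -> A,
    set: fn(&S, A) -> S,
}

#[derive(Clone, PartialEq, Debug)]
struct Point { x: i32, y: i32 }

fn main() {
    // The lens a derive mechanism might generate for the `x` field:
    let x_lens: Lens<Point, i32> = Lens {
        get: |p| p.x,
        set: |p, x| Point { x, ..p.clone() },
    };

    let p = Point { x: 1, y: 2 };
    assert_eq!((x_lens.get)(&p), 1);
    assert_eq!((x_lens.set)(&p, 10), Point { x: 10, y: 2 });
}
```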
References