ssm-lang / sslang Goto Github PK

A language built atop the Sparse Synchronous Model

License: BSD 3-Clause "New" or "Revised" License

Haskell 51.71% Makefile 0.79% C 32.24% CMake 0.03% Shell 2.30% Yacc 2.07% JavaScript 0.70% CSS 5.69% HTML 0.47% Python 0.20% Logos 3.50% Nix 0.31%

sslang's People

Contributors

Stargazers

Watchers

sslang's Issues

Lambda lift breaks recursive let definitions

In regression-tests/tests/hello13.ssl, the following function for printing a list of characters fails to compile. It complains that the puts__ variable is unbound, the error presumably originating when it is recursively called.

puts_ putc s =
  let puts__ ss =
        match ss
          Cons c s = putc c
                     puts__ s
          Nil = ()
  puts__ s

// TypeError (ErrorMsg "Unbound variable: Var (VarId puts__) (Type [])")

We should allow recursive nested functions a la Haskell. This could be achieved by adding the name to the environment at the moment it is declared, like letrec bindings in Racket or OCaml, but I'm not sure if would be negative consequences to that. We could also add an explicit rec keyword like those languages do, but the result might be aesthetically subpar.

newnew1 test case is broken

The generated code segfaults for some reason. We should look into this, but I don't want this to block #104

Memory leak due to wrong main function signature

As discussed in today's weekly meeting, we discovered that the following program causes a memory leak because the main function is missing the cin argument:

main ( cout : & Int ) -> () =
//main : &Int -> & Int -> () =
  //fun (cint : &Int) (cout : &Int)
    after 10 , (cout : & Int) <- 65
    wait (cout : & Int)

We commented out the lines of code that would make the program correctly print out the character 'A'. Currently, this program is printing output to cin.

Here's the error produced in runtests.log:

###### Testing memory-leak-main-func
stack exec sslc -- tests/memory-leak-main-func.ssl > out/memory-leak-main-func.c
sizeof(struct ssm_mm) = 4
page size 4096
pages allocated 2
objects allocated 4
objects freed 3
live objects 1
4 pools
pool   0: pages   0  block-size    16  free-blocks     0
pool   1: pages   1  block-size    64  free-blocks    63
pool   2: pages   1  block-size   256  free-blocks    16
pool   3: pages   0  block-size  1024  free-blocks     0

FAILED: 1 live objects leaked at the end
###### FAILED

It's not desirable for users to encounter a memory leak when using the wrong function signature for main, and they should be notified through a different type of error with a communicative error message.

Scanner Breaks on Character Literals

It looks like one of the two most recent commits to the scanner causes a bug on character literals.
Input:

main cin cout =
 let x = 'h' // character literals should turn into integers, right?
 after 1, cout <- x
 ()

Expected Output:

Running at current head of main:

./runtests.sh tests/check_single_quotes.ssl
check_single_quotes...FAILED
...
###### Testing check_single_quotes
stack exec sslc -- tests/check_single_quotes.ssl > out/check_single_quotes.c
ParseError (ErrorMsg "Could not escape string: ''")
###### FAILED

If I roll back three commits, to commit: e4d8bf2

./runtests.sh tests/check_single_quotes.ssl
check_single_quotes...OK

Postfix function type annotations don't make it past type inference

In the following example:

foo (w: Int) (z: Int) -> Int =
  w + z

I had to annotate the function args "inline". The following equivalent:

foo w z: Int -> Int -> Int =
  w + z

Results in this error being thrown: TypeError "Cannot change empty Ann type to Classes type".

Correctly elide type annotations

If we write something like:

id (a: t) (b: t) -> t = b

This should be the same as writing:

id a b: t -> t -> t = b

That is, the type variable t across the annotations for a, b, and the return type should be quantified as the same name.

However, the lowering pass lowers these as three separate type annotations, which means they technically become separately quantified, since they reside in three separate type annotations.

I fear this kind of scoping might be tricky to get recover in the type-checking phase, so I should fix the lowering phase to collapse the three annotations into a single type annotation.

@XijiaoLi something to keep in mind while you work on type inference; to be safe, stick with the postfix type annotation syntax (using :) for now.

Use type environment for type inference

Currently, the type inference pass is embrassingly pure. It just uses whatever information is locally available about a node to assign it a type, if possible. This is why the hello world example we have right now has explicit type annotations on variable names.

The end-goal here is to achieve full-blast type inference (as well as type-checking). Thus a natural first step is to actually look up the types previously determined of variables, and store that information in an environnment. Something like the following should work:

main(led : & Int) =
  loop
    after 50 , led <- 1
    wait led
    after 50 , led <- 0
    wait led

Logically, this is a very straightforward, but should also force us to restructure our code with respect to a state monad (similar to the one I have in Codegen.Codegen at the moment). We can later expand upon this state to actually perform type inference and checking later on.

Blink 3 test case hangs

I think @sedwards-lab mentioned that this might be due to some race condition in the POSIX backend, when it tries to quit. This is really annoying for testing locally.

Construct IR syntax embedded directly into language front-end

We need some "ugly but usable" interaface that allows a programmer to directly construct IR constructs. This allows us to rapidly prototype changes to the IR subsystem without having to make corresponding design decisions and code changes to the front end of the compiler.

Use "odd numbers" to represent integer values

We have settled on representing all values using (32-bit) words; these may either represent literal values where possible, as is the case with integers, enums, and other small primitve types, or pointers to heap-allocated objects.

The memory manager needs to distinguish between these types of values because it must be able to recursively drop pointers embedded in heap-allocated structures. Whether a heap-allocated structure contains pointers can depend on the variant (e.g., Some bigObj : Maybe BigObj contains a pointer to BigObj while Nothing : Maybe BigObj does not, so this determination must be made at runtime.

Since pointers will always point to word-aligned values in the heap, we are guaranteed that the LSB is 0 for pointers. We can safely represent up to 31 bits of information by value, as long as we encode them such that the LSB for these values is always 1. For example. to represent an integer value x, we encode it as x << 1 | 1.

As a first step toward using words to represent a mix of values and pointers, we need to first adjust codegen to encode values using this scheme. It also needs to decode values back into integers to perform arithmetic before re-encoding them.

Add type annotation syntax for lambdas

Right now, if we want to type-annotate a lambda, e.g., fun x { x }, we need to write it postfix, i.e.,

(fun x
  x): a -> a

However, this can get kind of cumbersome. @sedwards-lab suggested that to keep the syntax uniform, we should also support the following:

fun (x: a) -> a
  x
fun x : a -> a
  x

This should of course generalize to lambdas with multiple arguments.

Parallel Expressions

"yield abstraction" might not be the best name for this, but that's what I've called it for now in our IR pipeline (and it does nothing right now). Perhaps "par lift" or "par abstraction" is a better term.

But @Rewbert actually brought up this issue as it was something he thought of while working on ssm-edsl.

Say we have something like this:

par do wait x
       after 1, y <- ()
    do after 1, x <- ()
       wait y

This expression alone already has some fairly interesting temporal behavior, and is something we do allow in our language. But codegen doesn't know how to deal with this, because it only knows how to deal with parallel function calls.

That can be achieved by lifting these into immediately applied lambdas:

par (\() -> wait x
            after 1, y <- ()) ()
    (\() -> wait y
            after 1, x <- ()) ()

This should be done before lambda lifting (#32) to ensure that these newly introduced lambdas are lifted to the top-level so that they are compatible with Codegen's compilation scheme.

Generalization bugs with type annotations

This test case passes when it shouldn't, while this one complains about an over-general type annotation when it's clearly not more general.

There's definitely some generalization bugs lurking in the typechecker. I don't currently have the bandwidth to fix this myself, but should do so at some point.

Don't rely on mangled names for AST equivalence checks

when we compare the two programs, we don't really want to compare the mangled names... it suggests that we should have some kind of more sophisticated IR comparison mechanisms modulo alpha renaming.

this was a comment on the following code in lambda lifting unit tests:

        nestedToLifted = lowerAndLift nonLiftedProgram
        liftedToLifted = lowerAndLift liftedProgram
    nestedToLifted `shouldBe` liftedToLifted

Lambda-lifting

We need to convert lambdas (aka closures) into top-level function definitions. So this:

f = \x -> (\() -> x) ()

should be converted to

type Fenv a = Fenv { x : a }

f__anon1 = \fenv -> \() -> fenv.x

f = \x -> f__anon (Fenv a) ()

That is, any lambdas that don't appear at the top-level of a top-level function definition should be hoisted into a name-mangled top-level definition and given an extra "environment" parameter that includes all variables captured by closure.

Support time-related primitives

We should be able to query the current time and time a scheduled variable was last-updated. In fact, ssm-runtime already supports this, and codegen even has bindings for it. I tried hacking this in with 072fb9a (for now and last, whose syntax is prefix @@) and 152d994 (adding in Time as an alias for U64).

But there are several problems, leading me to remove these features from the scanner in 11752b8:

Timestamps cause mad leaks. This is because they are 64-bit values and are stored on the heap. Reading them causes allocation, and for some reason they aren't being cleaned up properly.
We can't do any kind of arithmetic or comparisons with time. Arithmetic operators like +, -, <, and == are hard-coded as I32 -> I32 -> I32. To support time arithmetic, we would need them to be aware of U64 somehow. This involves type classes or some other kind of overloading, which we're not ready for. We would also need to do codegen for them differently because we would need to read the value out of the heap using ssm_time_read() rather than ssm_unmarshal().
We can't even "read" the time. An easy hack out of the above issue is to add primitives to read the lower and upper 32 bits out of the 64-bit timestamp, using ssm_time_read(), but we would need syntax for this.

Codegen might not know when to yield

here is an especially tricky case that our current codegen scheme doesn't account for:

f x : Ref Int -> Ref Int -> Int =
  wait x -- could be anything side-effectful
  fun y
    deref x + deref y

g x y : Ref Int -> Ref Int -> Int =
  deref x * deref y

p f g a : (Ref Int -> Ref Int -> Int) -> (Ref Int -> Ref Int -> Int) -> Ref Int -> _ =
  par
    f a -- needs yield
    g a -- needn't yield
  
q =
  let a = new 1
  after sec 2, a <- 3
  p f g a

The problem is we need to figure out when we need to do a yield (i.e., act->pc = CTR; return; case CTR: ...). We need to yield when we call another function so that the scheduler can switch us to that other execution context. This is both the case for regular (sequential) function applications and parallel function applications (as illustrated in the example above).

The way we currently handle a par statement is that we assume all operands are full function applications. (We similarly assume all sequential function calls end up fully-applied.) Then we always insert a yield point after such function calls.

However, this does not account for partial function applications. In the above example, f is a function that, when partially-applied, does something side-effectful; g is just a regular function. Thus, the caller should yield after applying the former, but not the latter. However, which is which cannot be distinguished at compile-time (the problem is we currently assume it can be): the function p cannot actually distinguish at compile-time whether the two closures given as its arguments require a yield after applying the first argument. We know as programmers because we can see that q calls p with f and g, but this is undecidable in general (e.g., q may decide to flip f and g dependent on some undecidable condition).

This is not yet a problem for our current compiler since our runtime just doesn't even support closures at all, but once it does, we will need to adapt codegen with this consideration in mind.

Add tuple literal syntax

I encountered this issue while working on #58 . Apparently we just don't have any syntax for tuples yet.

This is a pretty easy fix, and should not be too difficult, but will require touching a few files that may conflict with others. Since we can always fake tuples using ADTs, this is not strictly necessary, so I'm deeming this a lower priority compared to getting ADT up and running.

Add syntax for type-related facilities

We need the following:

syntax for ADTs (@EmilySillars)
syntax for type classes definitions, instance definitions, and type class constraints (@leoqiao18)
syntax for type variables (@XijiaoLi)
syntax for record type getters and setters

I tagged some relevant authors here since they will likely find this syntax useful for constructing test cases as they continue to work on the implementation, but @hmontero1205 and I can also help out.

For the first cut, we should just use syntax similar to what Haskell uses, without idiosyncratic keywords like data, newtype, etc., and with support from our lexer's layoutNL system:

type Option a =
  Some a
  None

class Eq a
  eq : a -> a -> Bool

inst Eq Bool
  eq a b = // definition

not (a: a) (b: b): Bool where (Eq a, Eq b) = // definition

Something like that (perhaps people have thoughts?)

As for the record types, we do need to think a little more carefully about how the field accessors are designed to be used, and how that plays with the assignment operation. OCaml and Haskell set an example of constructing functions out of the field names which access the variable, but they each make talking about deeply nested fields awkward in different circumstances. I'd like to avoid this awkwardness from the get-go and try to have something like Haskell's lenses baked into the language syntax (rather than relying on typeclass wizardry to achieve readable syntax).

Desugaring patterns

At the moment, LowerAst just bails out when it finds any pattern that isn't a wildcard or an identifier. So the following two (written in OCaml syntax for brevity and syntax highlighting) are both not possible:

let (p, q) = pqdef in pqscope

let f (x, y) = fdef in fscope

These should desugar to, respectively

let __pq = pqdef in match __pq with (p, q) -> pqscope

let f __xy = match __xy with (x, y) -> fdef in fscope

Then there's the issue of aliases:

let p@q = pqdef in pqscope

let f x@y = fdef in fscope

which should desguar to

let p = pqdef in let q = p in pqscope

let f x = let y = x in fdef in fscope

These may get tricky in the presence of mutual recursion/parallel let-bindings of variables, and so should probably come after #30 .

Just as a note: I'm not sure how best to desugar mutually recursive name aliases:

let rec f@f' x = fdef
    and g@g' x = gdef
    and h@h' x = hdef
in
fghscope

This seems to lead to a combinatorial explosion:

let rec f' x = let g = g' in let h = h' in fdef
    and g' x = let f = f' in let h = h' in gdef
    and h' x = let f = f' in let g = g' in hdef
in
let f = f' in
let g = g' in
let h = h' in
fghscope

But for now, that original example is just not allowed by our syntax (and not representable in our AST either).

Syntax for specifying operator precedence and associativity

Right now, we are lexing and parsing operators like +, -, etc., and then doing some late-stage parsing to assemble OpRegions into proper expression-level applications in AST. But we're doing so based on some hard-coded rules.

We should be exposing some kind of syntax/mechanism for users to control this late-stage parsing, so that they may define their own late-stage parsers for this kind of thing. Basically, something like Haskell's infixl and infixr constructs.

Optimize ADT structs for reuse

Background Info

The current SSLANG object model features ADTs as a union (called ssm_value_t) of either a "packed_val" integer or "heap_ptr" to a struct on the heap. In the latter case, the struct on the heap will be a generic memory management header followed by an array of ssm_value_t fields.

By personalizing the amount of space the memory manager allocates for each ADT instance, and saving the "actual" size of its fields array in the header, we are able to specialize a generic SSLANG object structure for different ADT instances.
For example, given the following adt definition,

type Color
  RGB Int Int Int         
  CMYK Int Int Int Int
  White
  Black
  Sparkly

C implementations of each instance will have an ssm_mm header followed by a different number of fields:

let bg = RGB 192 192 192  
// 
//    .-----------------.              .-----------------.
// bg | xxxxxxx0        | -----------> | struct ssm_mm   | - tag set to RGB; val_count set to 3 for 3 fields
//    .-----------------.              |-----------------|
//                                     | ssm_value_t     | - contains 192 
//                                     |-----------------|
//                                     | ssm_value_t     | - contains 192 
//                                     |-----------------|
//                                     | ssm_value_t     | - contains 192 
//                                     .-----------------.

let ink = Black        
// 
//     .-----------------.
// ink | 00000111        | - tag set to Black and then marshalled
//     .-----------------.

let dot = RGB 204 255 204 
// 
//     .-----------------.             .-----------------.
// dot | xxxxxxx0        | ----------> | struct ssm_mm   | - tag set to RGB; val_count set to 3 for 3 fields
//     .-----------------.             |-----------------|
//                                     | ssm_value_t     | - contains 204 
//                                     |-----------------|
//                                     | ssm_value_t     | - contains 255
//                                     |-----------------|
//                                     | ssm_value_t     | - contains 204
//                                     .-----------------.

Optimization Idea

In this example, an instance of ADT Color gets assigned a value three different times.

//draw (brush : & Color) -> () =
// ...                      // ommitted for brevity

main ( brush : & Color ) -> () =
  brush <- RGB 192 192 192
  draw brush       
  brush <- Black 
  draw brush  
  brush <- RGB 204 255 204
  draw brush
  ()

In our current implementation, a new Color objected is allocated for each heap_ptr assignment:

  acts->__tmp_0 = ssm_new_adt(3U, RGB);                        // allocate new DCon w/ space for 3 fields
  ssm_adt_field(acts->__tmp_0, 0U) = ssm_marshal(192);
  ssm_adt_field(acts->__tmp_0, 1U) = ssm_marshal(192);
  ssm_adt_field(acts->__tmp_0, 2U) = ssm_marshal(192);
  ssm_assign(acts->brush, actg->priority, acts->__tmp_0);
  // draw 
  ssm_assign(acts->brush, actg->priority, ssm_marshal(Black)); //  nullary DCon
  // draw
  acts->__tmp_1 = ssm_new_adt(3U, RGB);                        //  allocate new DCon w/ space for 3 fields
  ssm_adt_field(acts->__tmp_1, 0U) = ssm_marshal(204);
  ssm_adt_field(acts->__tmp_1, 1U) = ssm_marshal(255);
  ssm_adt_field(acts->__tmp_1, 2U) = ssm_marshal(204);
  ssm_assign(acts->brush, actg->priority, acts->__tmp_1);
  // draw

If an ADT's value changes a lot during the course of a program, this pattern could be costly, requiring lots of repetitious deallocations and reallocations. To reduce needless memory freeing and reallocation, we can change the representation of SSLANG ADTs so if an ADT isn't always a packed value, it will be allocated a struct only the first time it's assigned a value; this struct would contain its max number of fields.
Then our generated code could change to something like:

  acts->__tmp_0 = ssm_new_adt(3U, RGB);              //  allocate new DCon w/ space for 3 (max # of) fields
  ssm_adt_field(acts->__tmp_0, 0U) = ssm_marshal(192);
  ssm_adt_field(acts->__tmp_0, 1U) = ssm_marshal(192);
  ssm_adt_field(acts->__tmp_0, 2U) = ssm_marshal(192);
  ssm_assign(acts->brush, actg->priority, acts->__tmp_0);
  // draw
  - change brush's ssm_mm->tag field to Black
  // draw
  - change brush's ssm_mm->tag field to RGB
  //  assign 3 fields
  ssm_adt_field(acts->brush, 0U) = ssm_marshal(204);
  ssm_adt_field(acts->brush, 1U) = ssm_marshal(255);
  ssm_adt_field(acts->brush, 2U) = ssm_marshal(204);
  // draw

Preventing ssm_drop from segfaulting

When the memory manager performs ssm_drop, it relies on val_count to signal when it can stop checking for fields to recurse on.
For each word after the header until val_count is reached, the memory manager checks whether the given word is even. If the word is even, it recurses on this field because it is a pointer to a heap object. If the word is odd, it skips this field because it's a packed value. Once val_count becomes the maximum number of fields in an ADT, it's possible for an ADT representation to contain garbage values in some of its fields if it's a smaller instance. In this situation it becomes possible for the memory manager to recurse on even garbage values and segfault.
By setting all the fields to an odd value before assignment, we prevent the possibility of drop accidentally recursing on even garbage values.

  acts->__tmp_0 = ssm_new_adt(3U, RGB);                    //  allocate new DCon w/ space for 3 (max # of) fields
  ssm_adt_field(acts->__tmp_0, 0U) = 0xffffffff;               //  set to odd value before assigning
  ssm_adt_field(acts->__tmp_0, 1U) = 0xffffffff;               //  set to odd value before assigning
  ssm_adt_field(acts->__tmp_0, 2U) = 0xffffffff;               //  set to odd value before assigning
  ssm_adt_field(acts->__tmp_0, 0U) = ssm_marshal(192);
  ssm_adt_field(acts->__tmp_0, 1U) = ssm_marshal(192);
  ssm_adt_field(acts->__tmp_0, 2U) = ssm_marshal(192);
  ssm_assign(acts->brush, actg->priority, acts->__tmp_0);
  - change brush's ssm_mm->tag field to Black
  ssm_adt_field(acts->brush, 0U) = 0xffffffff;               //  set to odd value to eliminate garbage
  ssm_adt_field(acts->brush, 1U) = 0xffffffff;               //  set to odd value to eliminate garbage
  ssm_adt_field(acts->brush, 2U) = 0xffffffff;               //  set to odd value to eliminate garbage
  - change brush's ssm_mm->tag field to RGB
  //  assign 3 fields
  ssm_adt_field(acts->brush, 0U) = 0xffffffff;               //  set to odd value to before assigning
  ssm_adt_field(acts->brush, 1U) = 0xffffffff;               //  set to odd value to before assigning
  ssm_adt_field(acts->brush, 2U) = 0xffffffff;               //  set to odd value to before assigning
  ssm_adt_field(acts->brush, 0U) = ssm_marshal(204);
  ssm_adt_field(acts->brush, 1U) = ssm_marshal(255);
  ssm_adt_field(acts->brush, 2U) = ssm_marshal(204);

Operators vs type operators

I began to run some of the old unit tests, and found that the scanner test was failing because & was being lexed as TAmpersand rather than TOp "&".

In #14, I hard-coded the scanner test to expect TAmpersand instead, but this puts more of a strain on the parser to account for all these special operators as edge cases. At some point, we should enumerate all the type operators we expect to use, and make sure we are treating them consistently in the front-end.

I see three solutions:

A general, thorough solution is to just allow full-blast type-level operators, and just throw them to the same ParseOperators logic we're already using for expressions. Frankly, I think this is the most elegant, since most of the prior work on that can be reused. Then we can expose this as a feature for those maverick users who want their own notation for types. Of course, this is blocking on #15.
We could have the parser explicitly also look for the TOp "&" token when parsing type expressions, or for TAmpersand when parsing expressions. But this doesn't scale well if we add more operators, not to mention it's not very elegant, so I'm hoping we don't adopt this solution. But this can be done immediately and temporarily.
We could rethink our grammar for type operators and designate a complentary set of tokens that can be used as type operators vs expression operators. If designed well, this could be beneficial to the reader, the same way that it is beneficial that types and data constructors are always upper case and type variables and variables are always lower case in Haskell; but if designed poorly, this could lead us toward something inscrutible.

As a side note, we should decide on what notation to use for the "reference" type operator (a pre-requisite of solution 3). I'm in favor of using a type operator rather than something alphanumeric since it's a built-in type hard-wired into our programming model, and will appear everywhere. For this, I default to the prefix & symbol but I can be persuaded otherwise.

Constructs for non-sequential control flow

I just realized that so far we haven't added return into the parser yet... It's not totally clear cut to me how this should be done, so I wanted to write down a few thoughts/propose some ideas here.

I think we should make return a layout keyword, and parse for 'return' '{' expr '}'. This looks odd at first, and is not what I had in mind, but here is why I don't like the alternatives where it isn't a layout keyword, and return expressions are strictly concatenative:

If return is at the same precedence as application (juxtaposition), writing return f x is a type error, so if we are forced to write return (f x) or return $ f x. I think it should be at a higher precedence compared to other atomic expressions because something like f return shouldn't make any sense.
If return is magically at a higher precedence than application (juxtaposition), something like return return a b becomes ambiguous.

What are people's thoughts?

Monomorphization

Codegen expects a monomorphized type system, so at the moment there's a monomoprhize :: Expr Poly.Type -> Expr Flat.Type function that's doing that. But all it's really doing is trivially mapping types that already happen to be nullary polytypes, and throwing out anything that involves any kind of type application.

A proper monomorphization pass will also need to type-specialize polymorphic variables/functions, on an as-needed basis.

We also need to come up with a sane type name mangling scheme, to convert type applications like Option Int into Option_Int or something of the sort. We need to convince ourselves that such a scheme will never introduce name conflicts where there were none previously.

Prettify pretty printer

The pretty printer output is not pretty---it's time to write an actual pretty printer for the IR.

Add builtin booleans

Right now, conditional statements use I32 conditions, and boolean operators like == and <= return I32 results, just like their C counterparts. We should really commit to some kind of builtin Boolean type that avoids this hack.

blank line or comment at top of .ssl file fails

When writing a SSLANG source program, if I leave a blank line at the top of the file or try to make a comment the first line in the file, the compiler throws a parsing error. Below is an example of this behavior.
Example program:

// a comment here
type Tree
  TwoChildren Tree Tree  
  Leaf                   
  OneChild Int Tree

main ( cout : & Int ) -> () =
  let d = Leaf 
  let zzz = match d 
                  Leaf = 65
                  _ = 66

// correct output should be 65 'A'
  after 10 , (cout : & Int) <- zzz
  wait (cout : & Int)

Error ouput from runtests.log:

###### Testing comment--first-line
stack exec sslc -- tests/comment--first-line.ssl > out/comment--first-line.c
ParseError (ErrorMsg "at Token (Span {tokPos = 18, tokLen = 0, tokLine = 2, tokCol = 1},TDBar)")
###### FAILED

This is likely a very simple fix, but I haven't had time to investigate it deeply yet so I am documenting it here for now.

Sequentialize parallel let-bindings

The user may write:

let x = xdef
    y = ydef
xyscope

for brevity. Of course, this isn't a lazy language, so xdef cannot actually reference y and ydef cannot actually reference x, but so long as that is the case, this is semantically equivalent to writing:

let x = xdef
let y = ydef
xyscope

Indeed, LowerAst will only work on the second form but not the first. We need to implement a desugaring pass that does this for us.

Furthermore, something like:

let f x = if x { h x } else { () }
    g x = if x { h x } else { () }
    h x = if x { g x } else { () }
fghscope

can also be partially sequentialized:

let g x = if x { h x } else { () }
    h x = if x { g x } else { () }
let f x = if x { h x } else { () }
fghscope

This may make type-checking/inference much easier, depending on how mutual recursion is deal with, so this is something we would want done too.

In summary:

sequentialize parallel variable bindings
best-effort sequentialize parallel function bindings

match indentation error

The following snippet code of code should be valid in SSLANG, but throws an indentation error instead.

  let zzz = 
    match d 
          Leaf = 65
          _ = 66

I have only been able to get match statements to compile when the word "match" is on the same line as the assignment operator. Writing matches like this takes more horizontal space and is especially inconvenient when writing nested matches.

  let zzz = match d 
                  Leaf = 65
                  _ = 66

Here is a full example program (with the version of matching that works commented out),

type Tree
  TwoChildren Tree Tree  
  Leaf                   
  OneChild Int Tree

//main ( cout : & Int ) -> () =
//  let d = Leaf 
//  let zzz = match d 
//                  Leaf = 65
//                  _ = 66

main ( cout : & Int ) -> () =
  let d = Leaf 
  let zzz = 
    match d 
          Leaf = 65
          _ = 66

// correct output should be 65 'A'
  after 10 , (cout : & Int) <- zzz
  wait (cout : & Int)

and the error produced in runtests.log:

###### Testing match-indentation
stack exec sslc -- tests/match-indentation.ssl > out/match-indentation.c
ParseError (ErrorMsg "Cannot start block at lower indentation than before")
###### FAILED

Support break statements

The code and syntax for this is actually already there ("completed" in 152d994), but I took it out in 669814e.

The problem is break completely breaks (ROFL) reference counting, because break ing out of a loop forgoes any inserted drops. The fix might be to rewrite how codegen handles loops, or to rewrite how codegen handles drops. We may even need to rewrite most of codegen.

multiple argument functions fail on AST lowering pass

When writing a SSLANG source program, if I define a function with multiple arguments, the lowerAst IR pass throws an error. Below is an example of this behavior.
Example program:

type Color
  RGB Int Int Int  
  CMYK Int Int Int Int
  White
  Black
  Sparkly


// main (brush : & Color)-> () =

main ( cout : &Int , brush : &Color)-> () = // fails
  brush <- RGB 192 192 192
  // draw brush       
  brush <- Black
  // draw brush
  brush <- RGB 204 255 204
  // draw  brush
  ()

Error ouput from runtests.log:

###### Testing colors
stack exec sslc -- tests/colors.ssl > out/colors.c
sslc: pattern should be desguared into pattern match
CallStack (from HasCallStack):
  error, called at src/IR/LowerAst.hs:225:18 in sslang-0.1.0.0-4TMoE2od9JzCqrGeolufpl:IR.LowerAst
###### FAILED

CI takes a really long time to run

For example, in this run, it takes 2 whole minutes to set up Stack: https://github.com/ssm-lang/sslang/runs/4577980018?check_suite_focus=true . I'm not sure why it needs to take this long (it takes about a quarter of the time to build our compiler, rebuild our compiler, and run all tests). Most of the time is apparently spent processing 155600 (!) cabal files.

Not very important at the moment, but it would be nice to figure out why it's taking so long.

Ar does not work on macOS

@maxhhelman ran into this issue when we were pair programming. If I remember correctly, ar on macOS does not recognize the -U flag (and I think Zach ran across this issue before too).

It looks like -U is the default behavior anyway: https://www.man7.org/linux/man-pages/man1/ar.1.html

We should adjust the ARFLAGS in the ssm-runtime Makefile, maybe just remove that flag if it's not actually necessary, so that we can support building on macOS.

Fix Haddock errors and set up hook to automatically deploy doc site

We should always aim to keep documentation presentable on the main branch. We should have a hook that does the following:

Automatically rebuilds the docs using stack haddock, and deploy it to a docs branch that we can serve as a navigable webiste using GitHub pages.
Automatically try to build the docs during a PR, and block the PR if documentation cannot be built (e.g., there are missing links, haddock cannot parse, etc.).