Coder Social home page Coder Social logo

dibyendumajumdar / ravi-compiler Goto Github PK

View Code? Open in Web Editor NEW
64.0 5.0 5.0 1.9 MB

Parser and compiler for Ravi and Lua

License: Other

CMake 0.39% C 96.67% Lua 2.69% Shell 0.16% Dockerfile 0.09%
lua parser lexer c compiler compiler-construction ast abstract-syntax-tree intermediate-representation

ravi-compiler's Introduction

build

ravi-compiler

A compiler for Ravi and Lua that processes Lua/Ravi source code and generates C code.

Goals

  • Create a re-usable Lua/Ravi lexer/parser.
  • Define conventional linear intermediate representation (IR).
  • Generate C code from Lua/Ravi source.
  • Support Ahead of time (AOT) compilation.
  • The generated code can be executed by Ravi. Since the generated code depends on various VM structures it is not binary compatible with Lua, but in theory one can modify the code relatively easily to work with Lua 5.3. Lua 5.4 has modifications to the call stack used by Lua which makes it harder to port to.

Modules

The compiler library consists of distinct modules:

  • lexer (alpha) - responsible for tokenizing an input buffer.
  • parser (alpha) - responsible for generating abstract syntax tree (AST).
  • AST lowerer (alpha) - currently transforms generic for loops to while loops.
  • typechecker (alpha) - responsible for assigning types to variables when possible.
  • AST simplifier (alpha) - responsible for performing some initial simplifications such as constant folding, and string concatenation re-writing.
  • linearizer (alpha) - responsible for constructing a linear IR representation of the AST.
  • optimizer (WIP) - responsible for improving the code, this doesn't do much right now.
  • codegenerator (alpha) - responsible for generating C code. Each input is translated to a standalone C file that can be compiled using any C compiler. Ravi can compile this at runtime using MIR C JIT compiler. For AOT compilation, dynamic library needs to be created and a special loader needs to be used that treats the shared library as a compiled version of Lua chunk. The generated C code doesn't use the Lua C call api, as it is designed to look like Lua code.

Status

Limitations

  • No support for var args

Change Log

  • 12-nov-2022 More bug fixes, e.g. repeat statement was incorrectly compiled
  • 10-July-2022 Many bug fixes to do with how the virtual registers are allocated
  • 12-Oct-2021 Initial proof of concept for new embedded C syntax
  • 22-Jun-2021 Increased coverage of Lua syntax to cover string concatenations and generic for loops.
  • 17-Jan-2021 The code is now C++ compliant so we can compile everything in C++ or C.
  • 28-Nov-2020 We can generate code for a large subset of Ravi language and run the compiled code from Ravi.
  • 01-Dec-2020 The generated code is now also suitable for AOT compilation but requires special loading facility in Ravi.

LICENSE

The project is available under MIT license. It includes code from chibicc copyrighted by Rui Ueyama.

Documentation

Documentation is coming soon.

For now you can look at following:

Why

Lua's inbuilt parser and code generator is a work of art, very compact and low overhead but extremely fast. It uses minimal memory and produces bytecodes as it parses the source code (single pass compiler). This is great for Lua and Ravi given the use cases of these languages, but makes the parser and code generator hard to understand, play with, or reuse in tools such as IDEs. It also makes it harder to perform any advanced type checking or performance optimizations.

This project will create a new parser and code generator that is not a replacement for the default one in Lua/Ravi but can be used for more specialised code generation, as well as as a means of understanding how the parser and code generator works.

Technology

This project is written in C for maximum portability like Lua.

Building

You will need CMake 3.12 or greater. The build steps are fairly simple on Linux:

mkdir build
cd build
cmake ..
make 

Try it out!

The compiler can be run using the trun command line utility. Example:

trun "return 'hello'"

To see the C output, try:

trun --gen-C "print 'hello world'"

Testing

At the moment we have a simple test driver programs: trun. The driver takes a string or file input which must be a valid Lua/Ravi chunk of code, and outputs the AST, the result of type checking, linear IR output if supported, and the CFG as a dot file. Example of the output can be found in the tests/expected folder.

Suppose trun was built in build folder then you can run the tests as follows:

cd tests && sh truntests.sh ../build/trun

The test script compares the output to the expected output. Any difference will cause the test script to fail.

ravi-compiler's People

Contributors

dibyendumajumdar avatar mingodad avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

ravi-compiler's Issues

Several problems detected by valgrind

valgrind --track-origins=yes ./tastwalk 
==6721== Memcheck, a memory error detector
==6721== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==6721== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==6721== Command: ./tastwalk
==6721== 
No code to process
==6721== Use of uninitialised value of size 8
==6721==    at 0x4E53440: raviX_destroy_compiler (parser.c:1754)
==6721==    by 0x10BD66: main (tastwalk.c:349)
==6721==  Uninitialised value was created by a stack allocation
==6721==    at 0x10BC24: main (tastwalk.c:321)
==6721== 
==6721== Use of uninitialised value of size 8
==6721==    at 0x4E53456: raviX_destroy_compiler (parser.c:1756)
==6721==    by 0x10BD66: main (tastwalk.c:349)
==6721==  Uninitialised value was created by a stack allocation
==6721==    at 0x10BC24: main (tastwalk.c:321)
==6721== 
==6721== Use of uninitialised value of size 8
==6721==    at 0x4E53466: raviX_destroy_compiler (parser.c:1757)
==6721==    by 0x10BD66: main (tastwalk.c:349)
==6721==  Uninitialised value was created by a stack allocation
==6721==    at 0x10BC24: main (tastwalk.c:321)
==6721== 
==6721== Invalid read of size 8
==6721==    at 0x4E56C53: raviX_destroy_linearizer (linearizer.c:133)
==6721==    by 0x4E53474: raviX_destroy_compiler (parser.c:1757)
==6721==    by 0x10BD66: main (tastwalk.c:349)
==6721==  Address 0x5f6b6c617700703f is not stack'd, malloc'd or (recently) free'd
==6721== 
==6721== 
==6721== Process terminating with default action of signal 11 (SIGSEGV)
==6721==  General Protection Fault
==6721==    at 0x4E56C53: raviX_destroy_linearizer (linearizer.c:133)
==6721==    by 0x4E53474: raviX_destroy_compiler (parser.c:1757)
==6721==    by 0x10BD66: main (tastwalk.c:349)
==6721== 
==6721== HEAP SUMMARY:
==6721==     in use at exit: 0 bytes in 0 blocks
==6721==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==6721== 
==6721== All heap blocks were freed -- no leaks are possible
==6721== 
==6721== For lists of detected and suppressed errors, rerun with: -s
==6721== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
/home/mingo/bin/myvalgrind: line 2:  6721 Segmentation fault      (core dumped) valgrind $*
valgrind --track-origins=yes ./tgraph 
==6747== Memcheck, a memory error detector
==6747== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==6747== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==6747== Command: ./tgraph
==6747== 
==6747== Conditional jump or move depends on uninitialised value(s)
==6747==    at 0x4E4ACD7: raviX_add_node (graph.c:202)
==6747==    by 0x4E4AA99: raviX_init_graph (graph.c:147)
==6747==    by 0x108D27: test1 (tgraph.c:30)
==6747==    by 0x109345: main (tgraph.c:146)
==6747==  Uninitialised value was created by a heap allocation
==6747==    at 0x4C31E83: malloc (vg_replace_malloc.c:307)
==6747==    by 0x4E4486C: raviX_malloc (allocate.c:48)
==6747==    by 0x4E4499F: blob_alloc (allocate.c:85)
==6747==    by 0x4E44B89: raviX_allocator_allocate (allocate.c:139)
==6747==    by 0x4E4ACC9: raviX_add_node (graph.c:201)
==6747==    by 0x4E4AA99: raviX_init_graph (graph.c:147)
==6747==    by 0x108D27: test1 (tgraph.c:30)
==6747==    by 0x109345: main (tgraph.c:146)
==6747== 
==6747== Conditional jump or move depends on uninitialised value(s)
==6747==    at 0x4E4AD01: raviX_add_node (graph.c:203)
==6747==    by 0x4E4AA99: raviX_init_graph (graph.c:147)
==6747==    by 0x108D27: test1 (tgraph.c:30)
==6747==    by 0x109345: main (tgraph.c:146)
==6747==  Uninitialised value was created by a heap allocation
==6747==    at 0x4C31E83: malloc (vg_replace_malloc.c:307)
==6747==    by 0x4E4486C: raviX_malloc (allocate.c:48)
==6747==    by 0x4E4499F: blob_alloc (allocate.c:85)
==6747==    by 0x4E44B89: raviX_allocator_allocate (allocate.c:139)
==6747==    by 0x4E4ACC9: raviX_add_node (graph.c:201)
==6747==    by 0x4E4AA99: raviX_init_graph (graph.c:147)
==6747==    by 0x108D27: test1 (tgraph.c:30)
==6747==    by 0x109345: main (tgraph.c:146)
==6747== 
==6747== Conditional jump or move depends on uninitialised value(s)
==6747==    at 0x4E4ACD7: raviX_add_node (graph.c:202)
==6747==    by 0x4E4AAAA: raviX_init_graph (graph.c:148)
==6747==    by 0x108D27: test1 (tgraph.c:30)
==6747==    by 0x109345: main (tgraph.c:146)
==6747==  Uninitialised value was created by a heap allocation
==6747==    at 0x4C31E83: malloc (vg_replace_malloc.c:307)
==6747==    by 0x4E4486C: raviX_malloc (allocate.c:48)
==6747==    by 0x4E4499F: blob_alloc (allocate.c:85)
==6747==    by 0x4E44B89: raviX_allocator_allocate (allocate.c:139)
==6747==    by 0x4E4ACC9: raviX_add_node (graph.c:201)
==6747==    by 0x4E4AA99: raviX_init_graph (graph.c:147)
==6747==    by 0x108D27: test1 (tgraph.c:30)
==6747==    by 0x109345: main (tgraph.c:146)
==6747== 
==6747== Conditional jump or move depends on uninitialised value(s)
==6747==    at 0x4E4AD01: raviX_add_node (graph.c:203)
==6747==    by 0x4E4AAAA: raviX_init_graph (graph.c:148)
==6747==    by 0x108D27: test1 (tgraph.c:30)
==6747==    by 0x109345: main (tgraph.c:146)
==6747==  Uninitialised value was created by a heap allocation
==6747==    at 0x4C31E83: malloc (vg_replace_malloc.c:307)
==6747==    by 0x4E4486C: raviX_malloc (allocate.c:48)
==6747==    by 0x4E4499F: blob_alloc (allocate.c:85)
==6747==    by 0x4E44B89: raviX_allocator_allocate (allocate.c:139)
==6747==    by 0x4E4ACC9: raviX_add_node (graph.c:201)
==6747==    by 0x4E4AA99: raviX_init_graph (graph.c:147)
==6747==    by 0x108D27: test1 (tgraph.c:30)
==6747==    by 0x109345: main (tgraph.c:146)
==6747== 
==6747== Conditional jump or move depends on uninitialised value(s)
==6747==    at 0x4E4ACD7: raviX_add_node (graph.c:202)
==6747==    by 0x4E4ADA1: raviX_add_edge (graph.c:215)
==6747==    by 0x108D41: test1 (tgraph.c:31)
==6747==    by 0x109345: main (tgraph.c:146)
==6747==  Uninitialised value was created by a heap allocation
==6747==    at 0x4C31E83: malloc (vg_replace_malloc.c:307)
==6747==    by 0x4E4486C: raviX_malloc (allocate.c:48)
==6747==    by 0x4E4499F: blob_alloc (allocate.c:85)
==6747==    by 0x4E44B89: raviX_allocator_allocate (allocate.c:139)
==6747==    by 0x4E4ACC9: raviX_add_node (graph.c:201)
==6747==    by 0x4E4AA99: raviX_init_graph (graph.c:147)
==6747==    by 0x108D27: test1 (tgraph.c:30)
==6747==    by 0x109345: main (tgraph.c:146)
==6747== 
==6747== Conditional jump or move depends on uninitialised value(s)
==6747==    at 0x4E4AD01: raviX_add_node (graph.c:203)
==6747==    by 0x4E4ADA1: raviX_add_edge (graph.c:215)
==6747==    by 0x108D41: test1 (tgraph.c:31)
==6747==    by 0x109345: main (tgraph.c:146)
==6747==  Uninitialised value was created by a heap allocation
==6747==    at 0x4C31E83: malloc (vg_replace_malloc.c:307)
==6747==    by 0x4E4486C: raviX_malloc (allocate.c:48)
==6747==    by 0x4E4499F: blob_alloc (allocate.c:85)
==6747==    by 0x4E44B89: raviX_allocator_allocate (allocate.c:139)
==6747==    by 0x4E4ACC9: raviX_add_node (graph.c:201)
==6747==    by 0x4E4AA99: raviX_init_graph (graph.c:147)
==6747==    by 0x108D27: test1 (tgraph.c:30)
==6747==    by 0x109345: main (tgraph.c:146)
==6747== 
digraph {
L0 -> L1
L1 -> L2
}
...

valgrind --track-origins=yes ./tparse > tparse.log 2>&1
==6783== Memcheck, a memory error detector
==6783== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==6783== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==6783== Command: ./tparse
==6783==
==6783== Conditional jump or move depends on uninitialised value(s)
==6783== at 0x4E4906E: raviX_ptrlist_iter_prev (ptrlist.c:107)
==6783== by 0x4E4F944: search_for_variable_in_block (parser.c:210)
==6783== by 0x4E4FCF6: search_for_variable (parser.c:306)
==6783== by 0x4E50005: new_symbol_reference (parser.c:382)
==6783== by 0x4E510E5: parse_primary_expression (parser.c:821)
==6783== by 0x4E5116C: parse_suffixed_expression (parser.c:840)
==6783== by 0x4E528AC: parse_expression_statement (parser.c:1461)
==6783== by 0x4E52CA6: parse_statement (parser.c:1564)
==6783== by 0x4E52CEC: parse_statement_list (parser.c:1578)
==6783== by 0x4E51A4D: parse_block (parser.c:1121)
==6783== by 0x4E51DE2: parse_forbody (parser.c:1223)
==6783== by 0x4E51F32: parse_fornum_statement (parser.c:1244)
==6783== Uninitialised value was created by a heap allocation
==6783== at 0x4C31E83: malloc (vg_replace_malloc.c:307)
==6783== by 0x4E4486C: raviX_malloc (allocate.c:48)
==6783== by 0x4E4499F: blob_alloc (allocate.c:85)
==6783== by 0x4E44B89: raviX_allocator_allocate (allocate.c:139)
==6783== by 0x4E49471: raviX_ptrlist_add (ptrlist.c:192)
==6783== by 0x4E4F3C9: add_symbol (parser.c:52)
==6783== by 0x4E4FC0A: add_upvalue_in_function (parser.c:281)
==6783== by 0x4E4FE32: add_upvalue_in_levels_upto (parser.c:335)
==6783== by 0x4E4FF34: add_upvalue_for_ENV (parser.c:360)
==6783== by 0x4E53071: parse_lua_chunk (parser.c:1666)
==6783== by 0x4E531D6: raviX_parse (parser.c:1694)
==6783== by 0x109068: main (tparse.c:64)
==6783==
==6783== Conditional jump or move depends on uninitialised value(s)
==6783== at 0x4E48F20: raviX_ptrlist_iter_next (ptrlist.c:71)
==6783== by 0x4E4F9E0: search_upvalue_in_function (parser.c:232)
...



Add special upvalue for _ENV

Our upvalue implementation so far assumes that an upvalue references a local symbol in an enclosing function. However _ENV does not fall into this category as there is no local variable of that name in the main chunk. So we need to extend the concept of upvalue to support this special case.

Liveness analysis

It is not clear yet how useful this is, but it is fun learning how to do it.

Make the Ravi cast tokens keywords

The Ravi cast operators are really keywords but the lexer treats them as identifiers due to the way we parse the input.

We are specifically concerned with tokens TOK_TO_INTEGER to TOK_TO_CLOSURE.

Note that related to this we also have an issue with recognizing keywords - we are unnecessarily considering some tokens as keywords.

Dubious assumption in lexer.c

In lexer.c::raviX_create_string there is a test to see if we passed the reserved words:

for (int i = 0; i < ARRAY_SIZE(luaX_tokens); i++) {
			if (luaX_tokens[i][0] == '<')
				break; // We have gone past reserved keywords
			if (strcmp(luaX_tokens[i], s) == 0) {
				new_string->reserved = i; /* save index of the keyword */
				break;
			}
		}

And this is the actual keyword list:

static const char *const luaX_tokens[] = {
    "and",	 "break",  "do",      "else",	  "elseif", "end",	"false",     "for",	"function",
    "goto",	 "if",	   "in",      "local",	  "defer",  "nil",	"not",	     "or",	"repeat",
    "return",	 "then",   "true",    "until",	  "while",  "//",	"..",	     "...",	"==",
    ">=",	 "<=",	   "~=",      "<<",	  ">>",	    "::",	"@integer",  "@number", "@integer[]",
    "@number[]", "@table", "@string", "@closure", "<eof>",  "<number>", "<integer>", "<name>",	"<string>"};

The test if (luaX_tokens[i][0] == '<') doesn't seem to be correct probably if (luaX_tokens[i][0] == '/') is what we want ?

Coding convention change

  • Rename structs to use CamelCase names
  • Use typedefs for struct names
  • Ensure all public functions have raviX_ prefix

Implement generic for statement

From Lua 5.3 manual:

The generic for statement works over functions, called iterators. On each iteration, the iterator function is called to produce a new value, stopping when this new value is nil. The generic for loop has the following syntax:

	stat ::= for namelist in explist do block end
	namelist ::= Name {‘,’ Name}

A for statement like

     for var_1, ···, var_n in explist do block end

is equivalent to the code:

     do
       local f, s, var = explist
       while true do
         local var_1, ···, var_n = f(s, var)
         if var_1 == nil then break end
         var = var_1
         block
       end
     end

Note the following:

  • explist is evaluated only once. Its results are an iterator function, a state, and an initial value for the first iterator variable.
  • f, s, and var are invisible variables. The names are here for explanatory purposes only.
  • You can use break to exit a for loop.
  • The loop variables var_i are local to the loop; you cannot use their values after the for ends. If you need these values, then assign them to other variables before breaking or exiting the loop.

Add line number info to the AST

Currently we do not store the line numbers in the AST which means we cannot report errors with the line number information

Implement escape analysis

For a start if a local variable is referenced in an up-value then we can mark is escaped.
It would mean that this variable needs to be present on Lua stack when functions are called.

String concatenation expressions need special treatment in the parser

Lua tries to optimize string concatenations by performing them in batch.

Our current parser AST expresses string concatenation as regular binary op - but this means that we process each op one by one.
We need to add an AST Node that expresses string concatenations as an expression list. We can do this in an AST transformation step.

Mixing use of malloc/calloc and raviX_allocator_allocate

Looking through the code I can see that there some places where there is direct call to malloc/calloc instead of raviX_allocator_allocate and in some of then there is no check of the result.

Usages of Function malloc(size_t): [3 occurrences]
ravi-compiler
allocate.c
    48:  ptr = malloc(size);
hash_table.c
  112:  ht = (HashTable *) malloc(sizeof(*ht));
set.c
  112:  set = (Set *) malloc(sizeof(*set));
Usages of Function calloc(size_t,size_t): [15 occurrences]
ravi-compiler
bitset.c
    64:  bm->varr = (bitset_el_t *) calloc(bm->size, sizeof(bitset_el_t));
df_liveness.c
    54:  struct liveness_info *liveness_info = (struct liveness_info *)calloc(1, sizeof(struct liveness_info));
dominator.c
    62:  DominatorTree *state = (DominatorTree *)calloc(1, sizeof(DominatorTree));
    64:  state->IDOM = (GraphNode **)calloc(state->N, sizeof(GraphNode *));
graph.c
  142:  Graph *g = (Graph *)calloc(1, sizeof(Graph));
  384:  GraphNode **nodes = (GraphNode **) calloc(N, sizeof(GraphNode *));
hash_table.c
  122:  ht->table = (HashEntry *) calloc(ht->size, sizeof(*ht->table));
  215:  table = (HashEntry *) calloc(hash_sizes[new_size_index].size, sizeof(*ht->table));
lexer.c
  276:  LexerState *ls = (LexerState *)calloc(1, sizeof(LexerState));
linearizer.c
  109:  LinearizerState *linearizer = (LinearizerState *)calloc(1, sizeof(LinearizerState));
membuf.c
    45:  mb->buf = (char *)calloc(1, initial_size);
parser.c
1720:  CompilerState *container = (CompilerState *)calloc(1, sizeof(CompilerState));
set.c
  122:  set->table = (SetEntry *) calloc(set->size, sizeof(*set->table));
  219:  table = (SetEntry *) calloc(hash_sizes[new_size_index].size, sizeof(*set->table));
tcommon.c
  114:  char *buffer = (char *)calloc(1, len + 10);

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.