
elvm's Introduction

ELVM Compiler Infrastructure


ELVM is similar to LLVM but dedicated to esoteric languages. This project consists of two components: a frontend and backends. Currently, the only frontend we have is a modified version of 8cc. The modified 8cc translates C code to an internal representation format called ELVM IR (EIR). Unlike LLVM bitcode, EIR is designed to be extremely simple, which makes it more feasible to write a translator from EIR to an esoteric language.

Currently, there are 60 backends:

  1. Aheui
  2. Awk (by @dubek)
  3. Bash
  4. Befunge
  5. Binary Lambda Calculus (by @woodrush)
  6. Brainfuck
  7. C
  8. C++14 constexpr (compile-time) (by @kw-udon)
  9. C++ Template Metaprogramming (compile-time) (by @kw-udon) (WIP)
  10. C# (by @masaedw)
  11. C-INTERCAL
  12. CMake (by @ooxi)
  13. CommonLisp (by @youz)
  14. Conway's Game of Life (via QFTASM) (by @woodrush)
  15. Crystal (compile-time) (by @MakeNowJust)
  16. Emacs Lisp
  17. F# (by @masaedw)
  18. Forth (by @dubek)
  19. Fortran (by @samcoppini)
  20. Go (by @shogo82148)
  21. Go text/template (Gomplate) (by @Syuparn)
  22. Grass (by @woodrush)
  23. HeLL (by @esoteric-programmer)
  24. J (by @dubek)
  25. Java
  26. JavaScript
  27. Kinx (by @Kray-G)
  28. Lambda calculus (by @woodrush)
  29. Lazy K (by @woodrush)
  30. LLVM IR (by @retrage)
  31. LOLCODE (by @gamerk)
  32. Lua (by @retrage)
  33. Octave (by @inaniwa3)
  34. Perl5 (by @mackee)
  35. PHP (by @zonuexe)
  36. Piet
  37. Python
  38. Ruby
  39. Scheme syntax-rules (by @zeptometer)
  40. Scratch3.0 (by @algon-320)
  41. SQLite3 (by @youz)
  42. SUBLEQ (by @gamerk)
  43. Swift (by @kwakasa)
  44. Tcl (by @dubek)
  45. TeX (by @hak7a3)
  46. TensorFlow (WIP)
  47. Turing machine (by @ND-CSE-30151)
  48. Unlambda (by @irori)
  49. Universal Lambda (by @woodrush)
  50. Vim script (by @rhysd)
  51. WebAssembly (by @dubek)
  52. WebAssembly System Interface (by @sanemat)
  53. Whirl (by @samcoppini)
  54. W-Machine (by @jcande)
  55. Whitespace
  56. arm-linux (by @irori)
  57. i386-linux
  58. sed

The above list contains languages which are known to be difficult to program in, but with ELVM, you can create programs in such languages. For example, you can easily create Brainfuck programs by writing C code. One of the interesting test cases ELVM has is a tiny Lisp interpreter. All of the above backends pass this test, which means you can run Lisp in any of the above languages.

Moreover, 8cc and ELVM themselves are written in C, so we can run a C compiler written in any of the above languages to compile the ELVM toolchain itself, though such compilation takes a long time in some esoteric languages.

A demo site

http://shinh.skr.jp/elvm/8cc.js.html

As mentioned above, the ELVM toolchain itself runs on all supported language backends. The demo above runs the ELVM toolchain on JavaScript (and is therefore slow).

Example big programs

ELVM internals

ELVM IR

  • Harvard architecture, not von Neumann (supporting self-modifying code would be hard)
  • 6 registers: A, B, C, D, SP, and BP
  • Ops: mov, add, sub, load, store, setcc, jcc, putc, getc, and exit
  • Pseudo-ops: .text, .data, .long, and .string
  • mul/div/mod are implemented by __builtin_* functions
  • No bit operations
  • No floating point arithmetic
  • sizeof(char) == sizeof(int) == sizeof(void*) == 1
  • The word size is backend-dependent, but most backends use 24-bit words
  • A single program counter value may contain multiple operations

See ELVM.md for more detail.
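To make the "no bit operations" and "mul/div/mod via builtins" points concrete, here is a minimal host-side sketch of how multiplication and division can be built from nothing but add, sub, and comparisons, with 24-bit wraparound modeled explicitly. The names sketch_mul and sketch_divmod are hypothetical and this is not the code in ELVM's libc; it only illustrates one technique (doubling tables plus greedy subtraction, i.e. shift-free binary long division), assuming 24-bit words and a host with 32-bit unsigned int.

```c
#include <assert.h>

/* Assumption: 24-bit words, the common case for ELVM backends. */
#define WORD_BITS 24
#define WORD_MOD 16777216u /* 2^24 */

/* Hypothetical helper: a * b using only add, sub and compare.
   Inputs are assumed to be below 2^24. */
unsigned sketch_mul(unsigned a, unsigned b) {
  unsigned pow2[WORD_BITS], multiple[WORD_BITS];
  int n = 0;
  unsigned p = 1, m = a;
  /* Tabulate 2^k and a * 2^k by repeated doubling (x + x). */
  while (n < WORD_BITS && p <= b) {
    pow2[n] = p;
    multiple[n] = m;
    n++;
    p += p;
    m += m; /* may wrap on the host; harmless for the result mod 2^24 */
  }
  /* Greedily take the largest remaining power of two out of b. */
  unsigned result = 0;
  while (n-- > 0) {
    if (b >= pow2[n]) {
      b -= pow2[n];
      result += multiple[n];
    }
  }
  return result % WORD_MOD; /* model 24-bit wraparound on the host */
}

/* Hypothetical helper: a / b (returned) and a % b (stored in *rem).
   b must be non-zero; a and b are assumed to be below 2^24. */
unsigned sketch_divmod(unsigned a, unsigned b, unsigned* rem) {
  unsigned pow2[WORD_BITS + 1], multiple[WORD_BITS + 1];
  int n = 0;
  unsigned p = 1, m = b;
  /* Tabulate 2^k and b * 2^k while b * 2^k still fits into a. */
  while (n <= WORD_BITS && m <= a) {
    pow2[n] = p;
    multiple[n] = m;
    n++;
    p += p;
    m += m;
  }
  /* Subtract the largest multiples first, as in binary long division. */
  unsigned quotient = 0;
  while (n-- > 0) {
    if (a >= multiple[n]) {
      a -= multiple[n];
      quotient += pow2[n];
    }
  }
  *rem = a;
  return quotient;
}
```

For example, sketch_mul(4096, 4096) yields 0 because 2^24 wraps to zero in a 24-bit word, which is the same wraparound that shows up in some of the issues further below.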

Directories

shinh/8cc's eir branch is the frontend C compiler.

ir/ directory has a parser and an interpreter of ELVM IR. ELVM IR has

target/ directory has backend implementations. Code in this directory uses the IR parser to generate backend code.

libc/ directory has an incomplete libc implementation which is necessary to run tests.

Notes on language backends

Brainfuck

Running a Lisp interpreter on Brainfuck was the original motivation for this project (bflisp). ELVM IR was designed with Brainfuck in mind, but it turned out that such a simple IR is suitable for other esoteric languages as well.

As Brainfuck is slow, this project contains a Brainfuck interpreter/compiler in tools/bfopt.cc. You can also use other optimized Brainfuck implementations such as tritium. Note that you need an implementation with 8-bit cells. For tritium, you need to specify the -b flag.
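The 8-bit cell requirement presumably means the generated code expects cell arithmetic to wrap modulo 256. The minimal interpreter below illustrates what that means by using unsigned char cells, so + on 255 gives 0 and - on 0 gives 255. It is a sketch, not tools/bfopt.cc: it does no optimization, assumes balanced brackets, does no tape bounds checking, and stores 0 when , hits end of input (implementations differ on that convention). bf_run is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAPE_SIZE 30000

/* Hypothetical minimal Brainfuck interpreter with 8-bit wrapping cells.
   Writes at most out_cap bytes of output and returns how many were written. */
size_t bf_run(const char* prog, const char* input, char* out, size_t out_cap) {
  unsigned char tape[TAPE_SIZE] = {0};
  size_t ptr = 0, out_len = 0, n = strlen(prog);
  for (size_t pc = 0; pc < n; pc++) {
    switch (prog[pc]) {
      case '>': ptr++; break;
      case '<': ptr--; break;
      case '+': tape[ptr]++; break; /* 255 + 1 wraps to 0 */
      case '-': tape[ptr]--; break; /* 0 - 1 wraps to 255 */
      case '.':
        if (out_len < out_cap) out[out_len++] = (char)tape[ptr];
        break;
      case ',': /* end of input stores 0 (an assumed convention) */
        tape[ptr] = *input ? (unsigned char)*input++ : 0;
        break;
      case '[':
        if (tape[ptr] == 0) { /* skip forward to the matching ']' */
          int depth = 1;
          while (depth > 0) {
            pc++;
            if (prog[pc] == '[') depth++;
            else if (prog[pc] == ']') depth--;
          }
        }
        break;
      case ']':
        if (tape[ptr] != 0) { /* jump back to the matching '[' */
          int depth = 1;
          while (depth > 0) {
            pc--;
            if (prog[pc] == ']') depth++;
            else if (prog[pc] == '[') depth--;
          }
        }
        break;
    }
  }
  return out_len;
}
```

For instance, running "-." prints a single byte with value 255, which an implementation with wider cells would print differently (or not at all).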

Unlambda

This backend was contributed by @irori. See also 8cc.unl.

This backend is tested with @irori's interpreter. tools/rununl.sh automatically downloads it.

C-INTERCAL

This backend uses 16bit registers and address space, though ELVM's standard is 24bit. Due to the lack of address space, you cannot compile large C programs using 8cc on C-INTERCAL.

This backend won't be tested by default because C-INTERCAL is slow. Use

$ CINT=1 make i

to run them. Note you may need to adjust tools/runi.sh.

You can make faster executables by doing something like

$ cp out/fizzbuzz.c.eir.i fizzbuzz.i && ick fizzbuzz.i
$ ./fizzbuzz

But compilation takes much more time as it uses gcc instead of tcc.

Piet

This backend also has 16bit address space. There's the same limitation as C-INTERCAL's.

This backend won't be tested by default because npiet is slow. Use

$ PIET=1 make piet

to run them.

Befunge

BefLisp, which translates LLVM bitcode to Befunge, has very similar code. The interpreter, tools/befunge.cc is mostly Befunge-93, but its address space is extended to make Befunge-93 Turing-complete.

Whitespace

This backend is tested with @koturn's Whitespace implementation.

Emacs Lisp

This backend is somewhat more interesting than other non-esoteric backends. You can run a C compiler on Emacs:

  • M-x load-file tools/elvm.el
  • open test/putchar.c (or write C code without #include)
  • M-x 8cc
  • Now you'll see ELVM IR. You need to prepend a backend name (`el' for example) as the first line.
  • M-x elc
  • M-x eval-buffer
  • M-x elvm-main

Vim script

This backend was contributed by @rhysd. You can run a C compiler on Vim:

  • Open test/hello.c (or write your C code)
  • :source /path/to/out/8cc.vim
  • Now you can see ELVM IR in the buffer
  • Please prepend a backend name (vim for Vim) to the first line
  • :source /path/to/out/elc.vim
  • You can see Vim script code as the compilation result in the current buffer
  • You can :source to run the code

You can find more descriptions and released vim script in 8cc.vim.

TeX

This backend was contributed by @hak7a3. See also 8cc.tex.

C++14 constexpr (compile-time)

This backend was contributed by @kw-udon. You can find more descriptions in constexpr-8cc.

sed

This backend is very slow, so only a limited set of tests runs by default. You can run all of them with

$ FULL=1 make sed

but it could take years to run all the tests. I believe the C compiler in sed works, but I haven't confirmed it yet. You can try the Lisp interpreter instead:

$ FULL=1 make out/lisp.c.eir.sed.out.diff
$ echo '(+ 4 3)' | time sed -n -f out/lisp.c.eir.sed

This backend should support both GNU sed and BSD sed, so it is more portable than sedlisp, though much slower. Also note that, due to a limitation of BSD sed, programs cannot output non-ASCII characters or NUL.

HeLL

This backend was contributed by @esoteric-programmer. HeLL is an assembly language for Malbolge and Malbolge Unshackled. Use LMFAO to build the Malbolge Unshackled program from HeLL. This backend won't be tested by default because Malbolge Unshackled is extremely slow. Use

$ HELL=1 make hell

to run them. Note you may need to adjust tools/runhell.sh.

This backend does not support all 8-bit characters for I/O, because the I/O of Malbolge Unshackled uses Unicode code points instead of single bytes in getc/putc calls. Further, the Malbolge Unshackled interpreter automatically converts newlines read from stdin, which cannot be reverted in a platform-independent way. The backend converts newlines in the input back to Linux (LF) encoding and applies a modulo 256 operation to all input and output, but this cannot fully compensate for these issues. You should limit I/O to ASCII characters to avoid unexpected behaviour or crashes.

This backend may be replaced by a Malbolge Unshackled backend in the future.
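The input normalization described above can be sketched on the host side as follows. This is a hypothetical C illustration of the idea (fold CR LF and lone CR into LF, then reduce each code point modulo 256), not the HeLL code the backend actually emits; normalize_input is an invented name. It also shows why the mapping is lossy: a code point such as U+0141 comes out as the same byte as 'A'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: normalize n code points from `in` into bytes in `out`.
   CR LF and lone CR both become LF; every value is reduced modulo 256.
   Returns the number of bytes written (at most n). */
size_t normalize_input(const int* in, size_t n, unsigned char* out) {
  size_t len = 0;
  for (size_t i = 0; i < n; i++) {
    int c = in[i];
    if (c == '\r') {
      if (i + 1 < n && in[i + 1] == '\n') i++; /* CR LF -> LF */
      c = '\n';                                /* lone CR -> LF */
    }
    out[len++] = (unsigned char)(((c % 256) + 256) % 256);
  }
  return len;
}
```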

TensorFlow

Thanks to control flow operations such as tf.while_loop and tf.cond, a TensorFlow graph is Turing complete. This backend translates EIR to Python code which constructs a graph equivalent to the source EIR. This backend is very slow and uses a huge amount of memory. I've never seen 8cc.c.eir.tf work, but lisp.c.eir.tf does work. You can test this backend with

$ TF=1 make tf

TODO: Reduce the size of the graph and run 8cc

Scratch 3.0

Scratch is a visual programming language.

Internally, a Scratch program consists of a JSON file that represents the program, plus resources such as images or sounds. They are zip-archived, and you can import/export them from the project page (create a new one from here).

You can use tools/gen_scratch_sb3.sh to generate complete project files from output of this backend, and tools/run_scratch.js to execute programs from command line (npm 'scratch-vm' package is required).

You can try "fizzbuzz_fast" sample from here.

Example (for test/basic.eir)

First, generate scratch project.

$ ./out/elc -scratch3 test/basic.eir > basic.scratch3
$ ./tools/gen_scratch_sb3.sh basic.scratch3
$ ls basic.scratch3.sb3
basic.scratch3.sb3
Execute it from Web browser
  1. Visit https://scratch.mit.edu/projects/editor.
  2. Click a menu item: "File".
  3. Click "Load from your computer".
  4. Select and upload the generated project file: basic.scratch3.sb3.
  5. Wait until the project is loaded. (It takes a long time for a heavy project.)
  6. Click the "Green Flag"

From the Web editor, to input special characters (LF, EOF, etc.) you have to input them explicitly by following:

Special character                               Representation
LF                                              ＼n
EOF                                             ＼0
Other character with codepoint XXX (decimal)    ＼dXXX

Note that the escape character is ＼ (U+FF3C), not \.

For normal ASCII characters, you can just put them into the input field.
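If you want to prepare input for the web editor programmatically, the table above translates into a small encoder. This is a sketch under two assumptions not stated in the table: that XXX is written as exactly three decimal digits, and that every byte outside printable ASCII should use the ＼dXXX form. scratch_escape is a hypothetical helper, not part of tools/.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* UTF-8 bytes of U+FF3C FULLWIDTH REVERSE SOLIDUS, the escape character. */
static const char ESC[] = "\xEF\xBC\xBC";

/* Hypothetical helper: encode bytes for the Scratch web editor's input field.
   Assumption: other characters use exactly three decimal digits (dXXX form).
   out needs room for 7 * n + 5 bytes (7 per escaped byte, 4 for EOF, NUL). */
void scratch_escape(const unsigned char* in, size_t n, int append_eof, char* out) {
  size_t len = 0;
  out[0] = '\0';
  for (size_t i = 0; i < n; i++) {
    unsigned char c = in[i];
    if (c == '\n') {
      len += (size_t)sprintf(out + len, "%sn", ESC); /* LF */
    } else if (c >= 0x20 && c < 0x7f) {
      out[len++] = (char)c; /* printable ASCII is passed through */
      out[len] = '\0';
    } else {
      len += (size_t)sprintf(out + len, "%sd%03u", ESC, (unsigned)c); /* decimal code */
    }
  }
  if (append_eof) {
    len += (size_t)sprintf(out + len, "%s0", ESC); /* EOF */
  }
}
```

The escape bytes are kept in a separate string because writing them inline before a hex-digit character such as d or 0 would extend the \x escape in C source.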

Execute it from command line
  1. First install the npm package "scratch-vm" under the tools directory:
$ cd tools
$ npm install scratch-vm
  2. Run it with tools/run_scratch.js:
$ echo -n '' | nodejs ./run_scratch.js ../basic.scratch3.sb3
!!@X

Conway's Game of Life

This backend was contributed by @woodrush based on QFTASM. See tools/qftasm/README.md for its details. Further implementation details are described in the Lisp in Life project.

Binary Lambda Calculus

This backend was contributed by @woodrush. Implementation details are described in the LambdaVM and lambda-8cc repositories.

The output of this backend is an untyped lambda calculus term written in binary lambda calculus notation. The output program runs on the IOCCC 2012 "Most Functional" interpreter written by @tromp, in byte-oriented mode, which is the default mode.

This backend outputs a sequence of 0/1s written in ASCII. This bit stream must be packed into a byte stream before passing it to the interpreter, which can be done using tools/packbits.c. Please see tools/runblc.sh for usage details.
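Conceptually, packing turns each run of eight ASCII '0'/'1' characters into one byte. The sketch below assumes most-significant-bit-first order and zero padding of a final partial byte, and it skips any other characters such as newlines; check tools/packbits.c for the conventions it actually uses. pack_bits is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: pack ASCII '0'/'1' characters into bytes.
   Assumptions: most significant bit first; a final partial byte is padded
   with zero bits; any other character (e.g. a newline) is ignored.
   Returns the number of bytes written to out. */
size_t pack_bits(const char* bits, unsigned char* out) {
  size_t nbytes = 0;
  int nbits = 0;
  unsigned cur = 0;
  for (const char* p = bits; *p != '\0'; p++) {
    if (*p != '0' && *p != '1') continue;
    cur = (cur << 1) | (unsigned)(*p == '1');
    if (++nbits == 8) {
      out[nbytes++] = (unsigned char)cur;
      cur = 0;
      nbits = 0;
    }
  }
  if (nbits > 0) {
    out[nbytes++] = (unsigned char)(cur << (8 - nbits));
  }
  return nbytes;
}
```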

This backend is tested with the interpreter uni, a fast implementation of the "Most Functional" interpreter written in C++ by @melvinzhang. This interpreter significantly speeds up the running time of large programs such as 8cc.c. tools/runblc.sh automatically clones and builds uni when the tests are run.

Lambda Calculus

This backend was contributed by @woodrush. This backend outputs an untyped lambda calculus term written in plain text, such as \x.(x x).

The I/O model used in this backend is identical to the one used in the Binary Lambda Calculus backend. The backend's output program is a lambda calculus term that takes a string as an input and returns a string. Here, strings are encoded into lambda calculus terms using Scott encoding and Church encoding, so the entire computation only consists of the beta-reduction of lambda calculus terms. Further implementation details are described in the LambdaVM and lambda-8cc repositories. Note that the backend's output program is assumed to be evaluated using a lazy evaluation strategy.

This backend is tested with the interpreter uni, written by @melvinzhang. The blc tool written by @tromp is also used to convert plain text lambdas into binary lambda calculus notation, the format accepted by uni. Both tools are automatically cloned and built via tools/runlam.sh when the tests are run.

Lazy K

The Lazy K backend was contributed by @woodrush. Implementation details are described in the LambdaVM and lambda-8cc repositories.

This backend is tested with the Lazy K interpreter lazyk written by @irori. Interactive programs require the -u option which disables standard output buffering, used as lazyk -u [input file]. The interpreter is automatically cloned and built via tools/runlazy.sh when the tests are run.

Universal Lambda

The Universal Lambda backend was contributed by @woodrush. Implementation details are described in the LambdaVM repository.

This backend is tested with the Universal Lambda interpreter clamb written by @irori. Interactive programs require the -u option which disables standard output buffering, used as clamb -u [input file]. The interpreter is automatically cloned and built via tools/runulamb.sh when the tests are run.

The output of this backend is an untyped lambda calculus term written in the binary lambda calculus notation. The output program is written as a sequence of 0/1s in ASCII. The bit stream must be packed into a byte stream before passing it to the interpreter. This can be done using tools/packbits.c. Please see tools/runulamb.sh for usage details.

Grass

The Grass backend was contributed by @woodrush. Implementation details are described in the GrassVM and LambdaVM repositories.

This backend is tested with the Grass interpreter grass.ml, originally written by @ytomino and modified by @youz and @woodrush. The modifications are described in the GrassVM repository.

Future works

I'm interested in

  • adding more backends (e.g., 16bit CPU, Malbolge Unshackled, ...)
  • running more programs (e.g., lua.bf or mruby.bf?)
  • supporting more C features (e.g., bit operations)
  • eliminating unnecessary code in 8cc

Adding a backend shouldn't be extremely difficult. PRs are welcome!

See also

This project is a sequel to bflisp.

Acknowledgement

I'd like to thank Rui Ueyama for his easy-to-hack compiler and suggesting the basic idea which made this possible.

elvm's People

Contributors

algon-320, davidweichiang, dubek, earthcomputer, esoteric-programmer, gamerk, hak7a3, irori, keiichiw, kray-g, kwakasa, makenowjust, masaedw, mego, minoki, nwtgck, ooxi, retrage, rhysd, samcoppini, sanemat, serprex, shinh, shogo82148, syuparn, woodrush, youz, yshl, yuta-aoyagi, zonuexe


elvm's Issues

8cc does not work on macOS

% make
Skip building js due to lack of nodejs
Skip building tex due to lack of tex
Skip building i due to lack of ick
out/8cc -S -I. -Ilibc -Iout out/24_cmp.c -o out/24_cmp.c.eir.tmp && mv out/24_cmp.c.eir.tmp out/24_cmp.c.eir
Usage: 8cc [ -E ][ -a ] [ -h ] <file>


  -I<path>          add to include path
  -E                print preprocessed source code
  -D name           Predefine name as a macro
  -D name=def
  -S                Stop before assembly (default)
  -c                Do not run linker (default)
  -U name           Undefine name
  -fdump-ast        print AST
  -fdump-stack      Print stacktrace
  -fno-dump-source  Do not emit source code as assembly comment
  -o filename       Output to the specified file
  -g                Do nothing at this moment
  -Wall             Enable all warnings
  -Werror           Make all warnings into errors
  -O<number>        Does nothing at this moment
  -m64              Output 64-bit code (default)
  -w                Disable all warnings
  -h                print this help

One of -a, -c, -E or -S must be specified.

make: *** [out/24_cmp.c.eir] Error 1

Grass backend

As suggested by @shinh in #118 (comment), a Grass backend would be exciting.

I'm currently developing a Grass backend at my repo GrassVM, based on LambdaVM. The VM itself is already working, and I've managed to make rot13.w.

The remaining task is to generate the assembly listing and the memory initialization list in ELVM using C. The current GrassVM code uses plant included in @susisu's Grassy toolkit to generate rot13.w, which transpiles an OCaml-like language to Grass.

I currently believe that generating the assembly list faces a tradeoff of either using a lot of the stack or requiring a tremendous amount of code size having lots of W and ws. To optimize the code size, we would use lots of vs to refresh the De Bruijn index once in a while, but that would use a lot of the main environment stack. To optimize the stack size, we would squeeze a lot of code in one v definition clause, but that would increase the De Bruijn index in the sub-environment, probably making the code size increase quadratically with the instruction length. I'll first try if the code size optimized version works. I'll be happy to discuss ideas and collaborate on building this backend.

Build errors with GCC 7

Starting with version 7, GCC generates a warning for case statements that fall through without a break, such as line 419ff of ir/ir.c:

  switch (op) {
    case LOAD:
    case STORE:
      if (g_split_basic_block_by_mem) {
        p->pc++;
        p->prev_boundary = true;
      }

Since the default compilation options of the Makefile include -Werror, this prevents building most of the toolchain. The easiest fix might be to add -Wno-implicit-fallthrough to COMMONFLAGS.
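Besides disabling the warning globally, GCC 7 also accepts an explicit marker at each intentional fall-through: a comment such as /* fall through */ placed right before the next case label, or the statement __attribute__((fallthrough));. The -W flag in the Makefile is the old spelling of -Wextra, which in GCC 7 enables -Wimplicit-fallthrough=3, a level that recognizes such comments. The sketch below uses the comment form; the types, the global, and the shared ADD tail are hypothetical stand-ins, since the snippet above is cut off before the code it falls into.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the real types in ir/ir.c. */
typedef enum { LOAD, STORE, ADD } Op;
typedef struct {
  int pc;
  bool prev_boundary;
} Prog;
static bool g_split_basic_block_by_mem = true;

void step(Prog* p, Op op) {
  switch (op) {
    case LOAD:
    case STORE:
      if (g_split_basic_block_by_mem) {
        p->pc++;
        p->prev_boundary = true;
      }
      /* fall through */
    case ADD:
      p->pc++; /* hypothetical shared tail, for illustration only */
      break;
  }
}
```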

Latest commit fails tests

The latest commit, adding backend-specific CLI options, fails tests when building.

[ERROR] parse.c:684: (null): Integer expression expected, but got gv=MCF_CACHE_DEPTH
build.mk:5: recipe for target 'out/elc.c.eir' failed
make: *** [out/elc.c.eir] Error 1

Stupid question, but how do I compile C files to eir?

I tried using 8cc in out/8cc to compile a c file but all I get when I run 8cc is

Usage: 8cc [ -E ][ -a ] [ -h ]

-I add to include path
-E print preprocessed source code
-D name Predefine name as a macro
-D name=def
-S Stop before assembly (default)
-c Do not run linker (default)
-U name Undefine name
-fdump-ast print AST
-fdump-stack Print stacktrace
-fno-dump-source Do not emit source code as assembly comment
-o filename Output to the specified file
-g Do nothing at this moment
-Wall Enable all warnings
-Werror Make all warnings into errors
-O Does nothing at this moment
-m64 Output 64-bit code (default)
-w Disable all warnings
-h print this help

One of -a, -c, -E or -S must be specified.
I still get this message even if I specify -a.
What am I doing wrong?

Increment/decrement on pointers may be incorrect

The compiler always emits ADD A, 1 for the increment operator. This is incorrect if the operand is a pointer to something with size != 1. Although all primitive values in ELVM have size == 1, structures can still have a different size.
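In C, p++ must advance a pointer by sizeof(*p). The sketch below checks that on the host with a three-int struct; under ELVM's sizeof(int) == 1 the same struct would have size 3 (assuming no padding), so the increment would need to be ADD A, 3 rather than ADD A, 1. Vec3 and increment_distance are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative struct: three words in ELVM (sizeof(int) == 1), typically
   12 bytes on a host with 4-byte int. */
typedef struct {
  int x, y, z;
} Vec3;

/* Returns how far (in host bytes) p++ moves a Vec3 pointer. */
ptrdiff_t increment_distance(Vec3* p) {
  char* before = (char*)p;
  p++; /* C scales the increment by sizeof(Vec3), not by 1 */
  return (char*)p - before;
}
```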

Bootstrapping with ELVM and 8cc.

Sorry for the practical story.
Is it possible to bootstrap 8cc with ELVM and build a compiler structure with 8cc based on ELVM?

Running app gives incorrect output

For the following code:

#include <stdio.h>

static const int SIZE = 8;
static int* board;
static int count = 0;

int main()
{
    board = malloc(SIZE * sizeof(*board));
    nQueens(0);
    printf("%d", count);
    free(board);
    return 0;
}

int nQueens(int row)
{
    for(int i=0; i<SIZE; i++)
    {
        for(int j=0;j<row;j++)
        {
            if(board[j] == i || row - j == abs(i - board[j]))
            {
                goto OUTER;
            }
        }
        board[row] = i;
        if(row >= SIZE - 1)
        {
            ++count;
            return 0;
        }
        nQueens(row + 1);
        OUTER:
        ;
    }
    return 0;
}

int abs(int input){
    return input<0?-input:input;
}

The online demo prints 2113, while the same code compiled with gcc prints the correct answer, 92.
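For comparison, here is a self-contained version of the same backtracking search that declares its functions before use, avoids both the goto and a user-defined abs, and passes the board explicitly. It is a sketch for narrowing down the problem, not a fix: which construct in the original triggers the wrong count has not been established here. Note that the original also calls malloc/free without including <stdlib.h> and calls nQueens and abs before declaring them. count_queens, place, and iabs are hypothetical names.

```c
#include <assert.h>

#define MAX_N 16

/* Absolute value without relying on stdlib's abs. */
static int iabs(int v) { return v < 0 ? -v : v; }

/* Count solutions by placing one queen per row, backtracking on conflicts. */
static int place(int* board, int n, int row) {
  if (row == n) return 1; /* every row has a queen: one solution */
  int count = 0;
  for (int col = 0; col < n; col++) {
    int ok = 1;
    for (int j = 0; j < row; j++) {
      /* same column, or same diagonal as the queen in row j */
      if (board[j] == col || row - j == iabs(col - board[j])) {
        ok = 0;
        break;
      }
    }
    if (ok) {
      board[row] = col;
      count += place(board, n, row + 1);
    }
  }
  return count;
}

/* Returns the number of n-queens solutions for 1 <= n <= MAX_N, else 0. */
int count_queens(int n) {
  int board[MAX_N];
  if (n < 1 || n > MAX_N) return 0;
  return place(board, n, 0);
}
```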

ir/ir.c doesn't compile with clang

This is not a bug as ir_error() never returns, but clang doesn't know that. :-/

cc -c -I. -std=gnu99 -m32 -W -Wall -W -Werror -MMD -MP -O -g -Wno-missing-field-initializers -Wno-missing-field-initializers ir/ir.c -o out/ir.o
ir/ir.c:343:12: error: variable 'argc' is used uninitialized whenever 'if' condition is false
      [-Werror,-Wsometimes-uninitialized]
  else if (op == (Op)DATA) {
           ^~~~~~~~~~~~~~
ir/ir.c:351:23: note: uninitialized use occurs here
  for (int i = 0; i < argc; i++) {
                      ^~~~
ir/ir.c:343:8: note: remove the 'if' if its condition is always true
  else if (op == (Op)DATA) {
       ^~~~~~~~~~~~~~~~~~~~
ir/ir.c:326:11: note: initialize the variable 'argc' to silence this warning
  int argc;
          ^
           = 0
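Two common ways to silence this are to initialize argc as clang suggests, or to tell the compiler that the error function never returns. The sketch below shows the second pattern using the GCC/clang __attribute__((noreturn)) extension, which fits the -std=gnu99 build. Whether it removes the warning in ir/ir.c depends on where ir_error() is called relative to the if/else chain, which the output above doesn't show; fatal_error and operand_count are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for ir_error(): marked noreturn so the compiler
   knows control never comes back from it. */
__attribute__((noreturn)) static void fatal_error(const char* msg) {
  fprintf(stderr, "%s\n", msg);
  exit(1);
}

/* Every branch that returns normally assigns argc; the remaining branch
   ends in a noreturn call, so no path reaches the return uninitialized. */
int operand_count(int is_jump, int is_data) {
  int argc;
  if (is_jump) {
    argc = 1;
  } else if (is_data) {
    argc = 2;
  } else {
    fatal_error("unknown op");
  }
  return argc;
}
```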

8cc cannot find header files: stdio.h and others

When I'm trying to do anything from C involving STDIO or other C header files, I just get a report that the header file cannot be found, despite being in /usr/include/.

Command run: ~/elvm2/out/8cc -Ilibc -S wwvsim.c -o wwvsim.eir
Output: [ERROR] cpp.c:716: wwvsim.c:25:1: cannot find header file: stdio.h
Note this happens for EVERY header file. How do I fix this?

System: Devuan GNU/Linux 4 (chimaera) x86_64
Kernel: 5.10.0-18-amd64
Extra info:
/usr/include/stdio.h: File exists
In the C file I am trying to convert, stdio is included as #include <stdio.h>
Help is greatly appreciated. Thank you in advance!

Fails to build on Ubuntu 16.04

I cloned the latest master (25fade6) and built on a stock Ubuntu 16.04 (gcc 5.4.0, etc.):

$ make
Skip building js due to lack of nodejs
Skip building el due to lack of emacs
Skip building cl due to lack of sbcl
Skip building cpp due to lack of g++-6
Skip building i due to lack of ick
cp test/24_cmp2.c out/24_cmp2.c.tmp && mv out/24_cmp2.c.tmp out/24_cmp2.c
git submodule update --init
Submodule '8cc' (https://github.com/shinh/8cc) registered for path '8cc'
Submodule 'Whitespace' (https://github.com/koturn/Whitespace) registered for path 'Whitespace'
Submodule 'tinycc' (http://repo.or.cz/tinycc.git) registered for path 'tinycc'
Cloning into '8cc'...
remote: Counting objects: 4541, done.
remote: Compressing objects: 100% (187/187), done.
remote: Total 4541 (delta 112), reused 0 (delta 0), pack-reused 4354
Receiving objects: 100% (4541/4541), 1.42 MiB | 1.19 MiB/s, done.
Resolving deltas: 100% (3026/3026), done.
Checking connectivity... done.
Submodule path '8cc': checked out '842752b089019bf883e21604a98e712b55fd7727'
Cloning into 'Whitespace'...
remote: Counting objects: 54, done.
remote: Total 54 (delta 0), reused 0 (delta 0), pack-reused 54
Unpacking objects: 100% (54/54), done.
Checking connectivity... done.
Submodule path 'Whitespace': checked out '16be2c0617a6f7846c53802e1a4cb382ccf7dc8a'
Cloning into 'tinycc'...
remote: Counting objects: 8915, done.
remote: Compressing objects: 100% (2610/2610), done.
remote: Total 8915 (delta 6260), reused 8864 (delta 6217)
Receiving objects: 100% (8915/8915), 2.89 MiB | 531.00 KiB/s, done.
Resolving deltas: 100% (6260/6260), done.
Checking connectivity... done.
Submodule path 'tinycc': checked out 'c948732efaf823f36d05608fe716bfcc4a98b70c'
touch out/git_submodule.stamp
make -C 8cc && cp 8cc/8cc out/8cc
make[1]: Entering directory '/home/ondrej/repos/elvm/8cc'
cc -Wall -Wno-strict-aliasing -std=gnu11 -g -I. -O0 -DBUILD_DIR='"/home/ondrej/repos/elvm/8cc"'   -c -o main.o main.c
cc -Wall -Wno-strict-aliasing -std=gnu11 -g -I. -O0 -DBUILD_DIR='"/home/ondrej/repos/elvm/8cc"'   -c -o cpp.o cpp.c
cc -Wall -Wno-strict-aliasing -std=gnu11 -g -I. -O0 -DBUILD_DIR='"/home/ondrej/repos/elvm/8cc"'   -c -o debug.o debug.c
cc -Wall -Wno-strict-aliasing -std=gnu11 -g -I. -O0 -DBUILD_DIR='"/home/ondrej/repos/elvm/8cc"'   -c -o dict.o dict.c
cc -Wall -Wno-strict-aliasing -std=gnu11 -g -I. -O0 -DBUILD_DIR='"/home/ondrej/repos/elvm/8cc"'   -c -o gen.o gen.c
cc -Wall -Wno-strict-aliasing -std=gnu11 -g -I. -O0 -DBUILD_DIR='"/home/ondrej/repos/elvm/8cc"'   -c -o lex.o lex.c
cc -Wall -Wno-strict-aliasing -std=gnu11 -g -I. -O0 -DBUILD_DIR='"/home/ondrej/repos/elvm/8cc"'   -c -o vector.o vector.c
cc -Wall -Wno-strict-aliasing -std=gnu11 -g -I. -O0 -DBUILD_DIR='"/home/ondrej/repos/elvm/8cc"'   -c -o parse.o parse.c
cc -Wall -Wno-strict-aliasing -std=gnu11 -g -I. -O0 -DBUILD_DIR='"/home/ondrej/repos/elvm/8cc"'   -c -o buffer.o buffer.c
cc -Wall -Wno-strict-aliasing -std=gnu11 -g -I. -O0 -DBUILD_DIR='"/home/ondrej/repos/elvm/8cc"'   -c -o map.o map.c
cc -Wall -Wno-strict-aliasing -std=gnu11 -g -I. -O0 -DBUILD_DIR='"/home/ondrej/repos/elvm/8cc"'   -c -o error.o error.c
cc -Wall -Wno-strict-aliasing -std=gnu11 -g -I. -O0 -DBUILD_DIR='"/home/ondrej/repos/elvm/8cc"'   -c -o path.o path.c
cc -Wall -Wno-strict-aliasing -std=gnu11 -g -I. -O0 -DBUILD_DIR='"/home/ondrej/repos/elvm/8cc"'   -c -o file.o file.c
cc -Wall -Wno-strict-aliasing -std=gnu11 -g -I. -O0 -DBUILD_DIR='"/home/ondrej/repos/elvm/8cc"'   -c -o set.o set.c
cc -Wall -Wno-strict-aliasing -std=gnu11 -g -I. -O0 -DBUILD_DIR='"/home/ondrej/repos/elvm/8cc"'   -c -o encoding.o encoding.c
cc -o 8cc main.o cpp.o debug.o dict.o gen.o lex.o vector.o parse.o buffer.o map.o error.o path.o file.o set.o encoding.o 
make[1]: Leaving directory '/home/ondrej/repos/elvm/8cc'
cc -c -I. -std=gnu99 -m32 -W -Wall -W -Werror -MMD -MP -O -g -Wno-missing-field-initializers -Wno-missing-field-initializers ir/ir.c -o out/ir.o
In file included from /usr/include/stdio.h:27:0,
                 from ./ir/ir.h:4,
                 from ir/ir.c:1:
/usr/include/features.h:367:25: fatal error: sys/cdefs.h: No such file or directory
compilation terminated.
Makefile:62: recipe for target 'out/ir.o' failed
make: *** [out/ir.o] Error 1

Is this a bug, or am I missing some package in my Ubuntu installation?

The demo site does not accept input

I see that ELVM supports input, but the demo site does not: input can only be passed using DevTools. It may be worth adding an input field to the demo site.

Will 9cc be supported?

9cc is a successor to 8cc; it is designed to be extremely easy to understand while still generating reasonably efficient assembly. Will 9cc be modified to generate ELVM IR?

Missing libc/sys/stat.h

$ make rb
...
cp 8cc/*.h 8cc/*.inc 8cc/include/*.h out
cat 8cc/buffer.c 8cc/cpp.c 8cc/debug.c 8cc/dict.c 8cc/encoding.c 8cc/error.c 8cc/file.c 8cc/gen.c 8cc/lex.c 8cc/main.c 8cc/map.c 8cc/parse.c 8cc/path.c 8cc/set.c 8cc/vector.c > out/8cc.c.tmp && mv out/8cc.c.tmp out/8cc.c
out/8cc -S -I. -Ilibc -Iout out/8cc.c -o out/8cc.c.eir.tmp && mv out/8cc.c.eir.tmp out/8cc.c.eir
[ERROR] cpp.c:716: out/8cc.c:1713:1: cannot find header file: sys/stat.h
build.mk:5: recipe for target 'out/8cc.c.eir' failed
make: *** [out/8cc.c.eir] Error 1

$ touch libc/sys/stat.h
Then the build passed.

Is this project alive?

GitHub shows that there haven't been any updates since October 13, 2022. Is this project still being maintained?

Volatile Semantics

Greetings.....so as far as I can tell, this compiler doesn't encode volatile semantics in the IR....(maybe I am mistaken...if so let me know).

So I guess ultimately the problem is this:

volatile unsigned char x; void foo() { while (x != 12){} }
The c backend would generate something equivalent to this:

unsigned char x; void foo() { while (x != 12){} }

And then the compiler that you feed this into (which could be an optimizing compiler) will say....oops x is never changed, so I am going to just optimize out all but the first check on the loop (https://godbolt.org/z/CXsSfg)

Anyways.....let me know what your plans are in this regard. (A legitimate response can be that it is outside scope.....)

EOF is not working as intended

While I was trying to investigate some issues with fgets implementation, I've noticed that the source of the issue is EOF definition itself.

int c = getchar();
if (c < 0) {
  printf("%x\n", c);
  printf("%x\n", EOF);
}

After running the code snippet above, I have got the following result:

-1
ffffff

As you can see, EOF is 24-bit instead of 32-bit. Therefore, if (c == EOF) is always false. This can be seen in the generated EIR code as well:
16777215 is simply 0xffffff. This causes some bugs. Even though interpreting the EIR output with eli works fine for some cases, it still fails if you try to convert EIR to Whitespace for instance.

Since I haven't been able to pinpoint issue yet, I have made temporary fix by changing the condition from c == EOF to c < 0. I will make a pull request soon.

By the way, printf might have issues with hexadecimal representation as well since it literally printed back -1 instead of ffffffff.
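The numbers above are consistent with 24-bit words: -1 reduced modulo 2^24 is 16777215 (0xffffff), so the two values are the same word and only compare differently when one side is treated as signed and the other as unsigned. The helpers below model that on a host machine; to_word24 and as_signed24 are hypothetical names, not ELVM functions, and the 24-bit word size is the common case rather than a guarantee for every backend.

```c
#include <assert.h>

#define WORD_MOD 16777216L /* 2^24: assumed ELVM word size */

/* Reduce a host integer to an unsigned 24-bit word value. */
long to_word24(long v) {
  long r = v % WORD_MOD;
  return r < 0 ? r + WORD_MOD : r;
}

/* Reinterpret a 24-bit word value as a signed (two's complement) number. */
long as_signed24(long w) {
  return w >= WORD_MOD / 2 ? w - WORD_MOD : w;
}
```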

Syscall, etc.

Any plans to add (limited) support for syscalls, etc.?

Invalid behavior with reverse loops

Iterator overflows at reverse loops

Test code:

#include "libc/_raw_print.h"

int main()
{
    for (int i = 0; i <= 5; i++)
    {
        print_int(i);
        putchar(' ');
    }
    
    putchar('\n');

    for (int i = 5; i >= 0; i--)
    {
        print_int(i);
        putchar(' ');
    }
}

compile: out/8cc -Ilibc -S test.c -o test.o
run: out/eli test.o

program output:

0 1 2 3 4 5
5 4 3 2 1 0 16777215 16777214 16777213 (...)
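The output suggests that i >= 0 never becomes false once i wraps from 0 to 16777215, i.e. the comparison behaves as unsigned on 24-bit words. One possible workaround, not verified against eli, is to test the counter before decrementing it, so the loop never depends on a negative value comparing below zero. countdown is a hypothetical helper; on a standard C compiler it produces the same 5..0 sequence as the original loop.

```c
#include <assert.h>

/* Writes 5, 4, ..., 0 into out and returns how many values were written.
   The condition checks i before decrementing, so the loop ends when i is 0
   and never relies on i >= 0 becoming false. */
int countdown(int* out) {
  int n = 0;
  for (int i = 6; i-- > 0;) {
    out[n++] = i;
  }
  return n;
}
```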

Building with make takes an absurd amount of time and space

After running make for about 60 hours, 108 GB of disk space was taken up, and the tests were not finished. I have some suggestions to make building and testing better:

  • Make a configure script to set the languages to build (via the presence of tools on the system and command-line flags like --enable-c and --disable-tm), which will generate a Makefile from a template (via automake).
  • Refactor the Makefile to be more modular:
    • Separate building and testing into different targets (build and test).
    • Create targets for building and testing individual languages (e.g. build-cpp and test-js).
    • Create a test-full target that runs expensive tests (e.g. 3-stage bootstrap tests), which are not included in the regular test target.
    • Properly clean up test files after each test to reduce filesystem impact.

Test fails on WSL

The full console output MIGHT be downloadable from this link, but the revision is 709bea7 and probably the most interesting console output lines are:

(diff -u out/elc.c.eir.out out/elc.c.eir.tex.out > out/elc.c.eir.tex.out.diff.tmp && mv out/elc.c.eir.tex.out.diff.tmp out/elc.c.eir.tex.out.diff) || (cat out/elc.c.eir.tex.out.diff.tmp ; false)
--- out/elc.c.eir.out   2021-04-22 08:11:40.740335000 +0300
+++ out/elc.c.eir.tex.out       2021-04-22 11:34:47.693720200 +0300
@@ -1,63 +1,2 @@
 === test/elc.in ===
-var main = function(getchar, putchar) {
-var a = 0;
-var b = 0;
-var c = 0;
-var d = 0;
-var bp = 0;
-var sp = 0;
-var pc = 0;
-var mem = new Int32Array(1 << 24);
-mem[0] = 1;
-var running = true;
-
-var func0 = function() {
- while (0 <= pc && pc < 512 && running) {
-  switch (pc) {
-  case -1:  // dummy
-   break;
-
-  case 0:
-   if (true) pc = 1 - 1;
-   break;
-
-  case 1:
-   a = getchar();
-   if (a == 0) pc = 3 - 1;
-   break;
-
-  case 2:
-   putchar(a);
-   if (true) pc = 1 - 1;
-   break;
-
-  case 3:
-   running = false; break;
-  }
-  pc++;
- }
-};
-
-while (running) {
- switch (pc / 512 | 0) {
- case 0:
-  func0();
-  break;
- }
-}
-};
-if (typeof require != 'undefined') {
- var sys = require('sys');
- var input = null;
- var ip = 0;
- var getchar = function() {
-  if (input === null)
-   input = require('fs').readFileSync('/dev/stdin');
-  return input[ip++] | 0;
- };
- var putchar = function(c) {
-  sys.print(String.fromCharCode(c & 255));
- };
- main(getchar, putchar);
-}

diff.mk:4: recipe for target 'out/elc.c.eir.tex.out.diff' failed
make: *** [out/elc.c.eir.tex.out.diff] Error 1

I used

make -j 3

for building. The environment:

tsw1@DESKTOP-H12EA8D:~/m_local/bin_p/ELVM/v2021_04_08/elvm$ uname -a
Linux DESKTOP-H12EA8D 4.4.0-19041-Microsoft #488-Microsoft Mon Sep 01 13:43:00 PST 2020 x86_64 GNU/Linux
tsw1@DESKTOP-H12EA8D:~/m_local/bin_p/ELVM/v2021_04_08/elvm$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/6/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 6.3.0-18+deb9u1' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-6 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)

undefined sym: __builtin_mul

When using the online demo, attempting to assemble the ELVM IR frequently fails and just prints:

undefined sym: __builtin_mul

It seems to occur whenever multiplication is needed. For example, the code below produces the error above:

int main(void) { printf("%d", 6 * 3); }

This might not be limited to the online demo, but I suspect it is.
