- A compiler for a small subset of C.
- Written in Rust.
- Targets x86_64.
- I wrote this following Nora Sandler's Writing a C Compiler series, which in turn is based on Abdulaziz Ghuloum’s An Incremental Approach to Compiler Construction.
- I also picked up some rusty ideas from
  - Tristan Hume's Writing a Compiler in Rust
  - Shuhei Kagawa's Writing an Interpreter and a Compiler in Rust
- and some x86 guidance from
  - kesgin's Intel Instruction Set pages
  - Doeppner's x64 Cheat Sheet
  - the Compiler Explorer
Compilation is broken down into 9 steps.
- lexing (`lexer`): converts source code into a token stream.
- parsing (`parser`): converts the token stream into an abstract syntax tree (AST).
- (not implemented yet) semantic analysis (`sema`): performs semantic analysis on the AST, that is:
  - ensuring that variables are defined before use; and
  - type checking.
- (not implemented yet) optimization (`astopt`): performs optimizations on the AST.
- intermediate code generation (`codegen`): generates pseudo-assembly from the AST.
- (not implemented yet) optimization (`asmopt`): performs optimizations on the pseudo-assembly.
- generate object files (`x86_64_gen`): converts pseudo-assembly to x86_64.
- (not implemented yet) optimization (`x86_64opt`): performs optimizations on the generated x86_64.
- linking/loading: generates an executable from the x86_64. This is currently performed inside `main.rs` using `gcc`.
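To make the implemented stages concrete, here is a minimal, self-contained sketch of the lexing, parsing, codegen, and x86_64 generation path for programs of the form `int main() { return <expr>; }`, where `<expr>` is an integer literal with optional unary `-`, `~`, or `!` (roughly what stages 1 and 2 of the tests cover). Every type and function name here (`Token`, `Expr`, `PseudoInstr`, `lex`, `parse`, `codegen`, `emit`, `compile`) is hypothetical and not this repo's actual API; semantic analysis, the optimization passes, and linking are skipped, and the lexer takes a whitespace-splitting shortcut that only works for this tiny subset.

```rust
// Hypothetical sketch: lex -> parse -> codegen -> x86_64 for `int main() { return <expr>; }`.
#[derive(Debug, Clone, PartialEq)]
enum Token { Int, Main, Return, LParen, RParen, LBrace, RBrace, Semi, Minus, Tilde, Bang, Num(i64) }
#[derive(Debug, Clone, Copy, PartialEq)]
enum UnOp { Neg, BitNot, LogNot }
/// AST: <expr> ::= <int> | ("-" | "~" | "!") <expr>
#[derive(Debug, PartialEq)]
enum Expr { Num(i64), Unary(UnOp, Box<Expr>) }
/// Pseudo-assembly for a machine with a single accumulator register.
#[derive(Debug, PartialEq)]
enum PseudoInstr { LoadImm(i64), Unary(UnOp), Ret }

/// Lexing: source text -> token stream. Shortcut: pad punctuation with spaces and
/// split on whitespace, which is enough for this tiny subset.
fn lex(src: &str) -> Result<Vec<Token>, String> {
    let spaced: String = src
        .chars()
        .map(|c| if "(){};-~!".contains(c) { format!(" {c} ") } else { c.to_string() })
        .collect();
    spaced
        .split_whitespace()
        .map(|w| match w {
            "int" => Ok(Token::Int),
            "main" => Ok(Token::Main), // a real lexer would emit a generic identifier token
            "return" => Ok(Token::Return),
            "(" => Ok(Token::LParen), ")" => Ok(Token::RParen),
            "{" => Ok(Token::LBrace), "}" => Ok(Token::RBrace),
            ";" => Ok(Token::Semi), "-" => Ok(Token::Minus),
            "~" => Ok(Token::Tilde), "!" => Ok(Token::Bang),
            _ => w.parse::<i64>().map(Token::Num).map_err(|_| format!("unexpected token `{w}`")),
        })
        .collect()
}

/// Recursive descent for <expr>: a unary operator applies to the expression after it.
fn parse_expr(toks: &[Token], pos: &mut usize) -> Result<Expr, String> {
    let tok = toks.get(*pos).cloned();
    *pos += 1;
    let op = match tok {
        Some(Token::Num(n)) => return Ok(Expr::Num(n)),
        Some(Token::Minus) => UnOp::Neg,
        Some(Token::Tilde) => UnOp::BitNot,
        Some(Token::Bang) => UnOp::LogNot,
        other => return Err(format!("expected an expression, found {other:?}")),
    };
    Ok(Expr::Unary(op, Box::new(parse_expr(toks, pos)?)))
}

/// Parsing: token stream -> AST of the returned expression.
fn parse(toks: &[Token]) -> Result<Expr, String> {
    use Token::*;
    let prefix = [Int, Main, LParen, RParen, LBrace, Return];
    if !toks.starts_with(&prefix) { return Err("expected `int main() { return`".into()); }
    let mut pos = prefix.len();
    let body = parse_expr(toks, &mut pos)?;
    if toks[pos..] != [Semi, RBrace] { return Err("expected `; }` after the expression".into()); }
    Ok(body)
}

/// Codegen: AST -> pseudo-assembly (operand first, then the operator applied to it).
fn codegen(e: &Expr, out: &mut Vec<PseudoInstr>) {
    match e {
        Expr::Num(n) => out.push(PseudoInstr::LoadImm(*n)),
        Expr::Unary(op, inner) => { codegen(inner, out); out.push(PseudoInstr::Unary(*op)); }
    }
}

/// x86_64 generation: pseudo-assembly -> Intel-syntax assembly; the value lives in rax.
fn emit(prog: &[PseudoInstr]) -> String {
    let mut text = String::from(".intel_syntax noprefix\n.globl main\nmain:\n");
    for ins in prog {
        match ins {
            PseudoInstr::LoadImm(n) => text += &format!("    mov rax, {n}\n"),
            PseudoInstr::Unary(UnOp::Neg) => text += "    neg rax\n",
            PseudoInstr::Unary(UnOp::BitNot) => text += "    not rax\n",
            // !x: compare with 0, zero rax (mov leaves the flags alone), set al if equal
            PseudoInstr::Unary(UnOp::LogNot) => text += "    cmp rax, 0\n    mov rax, 0\n    sete al\n",
            PseudoInstr::Ret => text += "    ret\n",
        }
    }
    text
}

/// Driver: run the stages in order (sema, optimization, and linking are left out).
fn compile(src: &str) -> Result<String, String> {
    let ast = parse(&lex(src)?)?;
    let mut prog = Vec::new();
    codegen(&ast, &mut prog);
    prog.push(PseudoInstr::Ret);
    Ok(emit(&prog))
}

fn main() {
    // -~2 == 3, so the linked program would exit with status 3.
    print!("{}", compile("int main() { return -~2; }").unwrap_or_else(|e| format!("error: {e}\n")));
}
```

On Linux, saving the printed text to a `.s` file and handing it to `gcc` should produce an executable, mirroring the linking step above. Keeping the pseudo-assembly separate from the x86_64 text is what would let an ARM or WASM backend reuse `lex`, `parse`, and `codegen` unchanged.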
I wanna learn Rust. I have played around with Rust in the past, but never did anything non-trivial with it. This seemed like a good project for Rust (pattern matching, enums, speed). I started this project in OCaml (the language Nora's series recommends) and quickly switched, because OCaml didn't seem that much better than Rust for this, and I wanted to learn Rust.
I like C. C is simple. My long-term intention is to mess with security stuff (like Spectre mitigations; see the wishlist below), and C seems to be the best language for that (much of the widely deployed cryptography software, such as OpenSSL, is written in C).
I have not run into version-specific stuff yet, but when I do, I am gonna pick ISO C18.
I wanna learn x86_64. I have just started playing CTFs, and x86_64 knowledge is invaluable when disassembling binaries.
I find Intel syntax easier to read, so the assembly produced is in Intel syntax. I started off with AT&T syntax, which is the one Nora's series uses, but quickly switched because reading the produced assembly was harder than I wanted it to be.
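The differences that make AT&T output harder to read are mostly mechanical: AT&T puts the source operand first, prefixes registers with `%` and immediates with `$`, encodes the operand size as a mnemonic suffix (`movq`), and writes memory operands as `disp(%base)` rather than `[base + disp]`. The small helper below renders the same `mov` both ways; `Operand`, `intel_mov`, and `att_mov` are illustrative names only, not part of this compiler.

```rust
// Hypothetical illustration: the same 64-bit `mov` written in Intel and AT&T syntax.
enum Operand {
    Reg(&'static str),                     // e.g. "rax"
    Imm(i64),                              // immediate value
    Mem { base: &'static str, disp: i64 }, // memory at base + disp
}

fn intel(op: &Operand) -> String {
    match op {
        Operand::Reg(r) => r.to_string(),
        Operand::Imm(n) => n.to_string(),
        // `{disp:+}` always prints the sign, giving e.g. "[rbp-8]"
        Operand::Mem { base, disp } => format!("qword ptr [{base}{disp:+}]"),
    }
}

fn att(op: &Operand) -> String {
    match op {
        Operand::Reg(r) => format!("%{r}"),
        Operand::Imm(n) => format!("${n}"),
        Operand::Mem { base, disp } => format!("{disp}(%{base})"),
    }
}

/// Intel: destination first, no sigils, size implied by the operands.
fn intel_mov(dst: &Operand, src: &Operand) -> String {
    format!("mov {}, {}", intel(dst), intel(src))
}

/// AT&T: source first, `%`/`$` sigils, size suffix on the mnemonic.
fn att_mov(dst: &Operand, src: &Operand) -> String {
    format!("movq {}, {}", att(src), att(dst))
}

fn main() {
    let rax = Operand::Reg("rax");
    let two = Operand::Imm(2);
    let slot = Operand::Mem { base: "rbp", disp: -8 };
    println!("{:<28} | {}", intel_mov(&rax, &two), att_mov(&rax, &two));
    println!("{:<28} | {}", intel_mov(&slot, &rax), att_mov(&slot, &rax));
}
```

Running it prints `mov rax, 2` next to `movq $2, %rax`, and `mov qword ptr [rbp-8], rax` next to `movq %rax, -8(%rbp)`.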
- Implement Spectre mitigations, like retpoline (which targets variant 2) or speculative load hardening (which targets variant 1). In particular, I would like to be able to define `secret int x;` and then have code whose running time does not depend on the value of `x`. Crypto peeps call this "constant-time", but I want constant time even in the presence of speculative execution by the processor. It might be useful to take a look at the static analysis tool mentioned in this paper.
- Building on the last point, if I define a variable as `secret`, then it should be cleared before its memory is freed. This would prevent secret data from lingering in the heap or on the stack.
- Implement register allocation. Currently, all return values are stored on the stack. Check whether each one can be kept in a register instead, and if so, keep it there. This should save tonnes of time at runtime, since `store`s and `load`s are much more expensive than register accesses.
- Implement auto-vectorization, à la LLVM.
- Implement fixed-width math. Currently, everything is 64-bit. For instance, `int x` is a 64-bit int (the spec allows this!). Implement `uint8_t`, `uint16_t`, `uint32_t`, and `uint64_t`.
- It would be cool to compile to LLVM and then make use of its passes.
- It would be nice to be able to compile to WASM, so one can run C on the web and other WASM targets. (If this seems absurd, ask yourself: why do we use Docker?)
- It would be cool to compile to ARM. This shouldn't be that difficult cuz of the pseudo-assembly, but I dunno, I haven't tried. I would also need to find a way to link and run ARM executables on my computer (QEMU looks promising).
I have been testing this compiler against nlsandler's Write a C Compiler! tests. `master` passes stages 1-4 of the tests.