language-capstone's Introduction

Capstone

Compiler frontend for the Capstone language, built on LLVM.

Design

The Capstone compiler is a frontend built on LLVM. It is assembled from compartmentalized components, each encapsulated as a class.

Components

Lexer → (tokens) → Parser → (ast) → Validator → (ast) → Generator

Lexer and tokens

The lexer tokenizes the source code to provide a meaningful representation to the parser. The data is represented as an iterable series of tokens.
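
As a rough illustration of what "an iterable series of tokens" can look like, here is a minimal C++ sketch. The names TokenKind, Token, and tokenize are hypothetical and are not taken from lexer.h or token.h; the real lexer distinguishes many more token types (multi-character operators, string literals, comments).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token categories; the real list lives in token.h.
enum class TokenKind { Identifier, Number, Symbol };

struct Token {
    TokenKind kind;
    std::string text;
};

// Splits source text into identifiers, integer literals, and
// single-character symbols, skipping whitespace.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (std::isalpha(c) || c == '_') {
            const std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
            }
            tokens.push_back({TokenKind::Identifier, src.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            const std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) {
                ++i;
            }
            tokens.push_back({TokenKind::Number, src.substr(start, i - start)});
        } else {
            // Anything else is treated as a one-character symbol.
            tokens.push_back({TokenKind::Symbol, std::string(1, src[i])});
            ++i;
        }
    }
    return tokens;
}
```

With this sketch, the input `var x: i32 = 5;` yields seven tokens: var, x, :, i32, =, 5, and ;.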

I (Justus) wrote the current lexer a while ago (about 7 months), and it could use a rewrite, but it will work for now. This is the benefit of compartmentalizing components: the lexer can be replaced later without touching the rest of the compiler.

Files:

  • lexer.h: The lexer header declaration.
  • lexer.cc: The lexer source implementation.
  • token.h: The header of token types.

Parser and AST

The parser is responsible for grouping the tokens into a meaningful representation of program evaluation and control flow. The parser uses an instance of the lexer and transforms the token stream into a tree called the abstract syntax tree (AST).
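
To make the idea of a tree concrete, here is a small C++ sketch of how the expression 1 + 2 * 3 could be held as an AST. Node, leaf, binary, and render are hypothetical names for illustration only; the actual node classes are generated from ./src/ast.template and will look different.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical AST node: a leaf holds a literal or name, an inner node
// holds a binary operator and its two operands.
struct Node {
    std::string text;
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
};

std::unique_ptr<Node> leaf(std::string text) {
    auto node = std::make_unique<Node>();
    node->text = std::move(text);
    return node;
}

std::unique_ptr<Node> binary(std::string op,
                             std::unique_ptr<Node> left,
                             std::unique_ptr<Node> right) {
    auto node = std::make_unique<Node>();
    node->text = std::move(op);
    node->left = std::move(left);
    node->right = std::move(right);
    return node;
}

// Renders the tree with explicit parentheses so the grouping is visible.
std::string render(const Node& node) {
    if (!node.left || !node.right) return node.text;
    return "(" + render(*node.left) + " " + node.text + " " +
           render(*node.right) + ")";
}
```

Building `binary("+", leaf("1"), binary("*", leaf("2"), leaf("3")))` and rendering it gives `(1 + (2 * 3))`: the multiplication sits deeper in the tree, so it is evaluated first.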

The AST node source code is generated using a Python script (./scripts/ast_gen.py) from a declaration in ./src/ast.template.

Files:

  • parser.h: The parser header declaration.
  • parser.cc: The parser source implementation.
  • ast.h: The declaration of the node classes for the AST.
  • ast.cc: The implementation of the AST node methods.

Reserved Words

Capstone has 19 reserved words plus 2 alternate spellings (final for const, and let for var). A reserved word can be a keyword, a statement, a modifier, a literal, or more than one of these.

  • break: Statement to break out of a loop.
  • class: Keyword to declare a class.
  • const or final: Keyword to declare an immutable variable or to specify parameter immutability.
  • continue: Statement to skip to the next pass of a loop.
  • else: Keyword to declare the "else" body of an "if" statement.
  • false: Literal for boolean false.
  • for: Keyword to declare a "for" loop.
  • func: Keyword to declare a function.
  • if: Keyword to declare an "if" statement.
  • import: Keyword to use classes and functions from another package.
  • null: Literal for a null reference.
  • private: Modifier to mark a field accessible only in the class scope.
  • protected: Modifier to mark a field accessible only in the class and child class scope.
  • public: Modifier to mark a field accessible outside the class scope.
  • return: Statement to return from a function.
  • static: Modifier to mark a field as static rather than per-instance.
  • true: Literal for boolean true.
  • var or let: Keyword to declare a mutable variable.
  • while: Keyword to declare a "while" loop.
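
One common way for a lexer to tell reserved words apart from ordinary identifiers is a lookup table. The C++ sketch below assumes that final and let are plain aliases for const and var, which is one reading of the list above; Reserved and lookupReserved are hypothetical names, not part of the Capstone sources.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical canonical names for the 19 reserved words listed above.
enum class Reserved {
    Break, Class, Const, Continue, Else, False, For, Func, If, Import,
    Null, Private, Protected, Public, Return, Static, True, Var, While
};

// Returns the reserved word a spelling maps to, or std::nullopt when the
// spelling is an ordinary identifier.
std::optional<Reserved> lookupReserved(const std::string& word) {
    static const std::unordered_map<std::string, Reserved> table = {
        {"break", Reserved::Break},         {"class", Reserved::Class},
        {"const", Reserved::Const},         {"final", Reserved::Const},
        {"continue", Reserved::Continue},   {"else", Reserved::Else},
        {"false", Reserved::False},         {"for", Reserved::For},
        {"func", Reserved::Func},           {"if", Reserved::If},
        {"import", Reserved::Import},       {"null", Reserved::Null},
        {"private", Reserved::Private},     {"protected", Reserved::Protected},
        {"public", Reserved::Public},       {"return", Reserved::Return},
        {"static", Reserved::Static},       {"true", Reserved::True},
        {"var", Reserved::Var},             {"let", Reserved::Var},
        {"while", Reserved::While},
    };
    const auto it = table.find(word);
    if (it == table.end()) return std::nullopt;
    return it->second;
}
```

Mapping both spellings to one enumerator keeps the alias handling in the lexer, so later stages only ever see Const or Var.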

Operators

In order of precedence, highest first:

Type                 Operators
Unary                ! $ # @ []
Multiplicative       * / %
Additive             + -
Shift                << >>
Relational           < > <= >= <=>
Equality             == !=
Bitwise (ordered)    & ^ |
Logical (ordered)    && ||
Ternary              ? :
Assignment           = += -= *= /= %= <<= >>= &= ^= |=
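
A precedence table like this maps naturally onto precedence climbing. The C++ sketch below is an illustration, not the parser in parser.cc: it assumes the table is listed from highest to lowest precedence, treats every binary operator as left-associative, works on pre-split tokens with integer literals only, and leaves out the unary, ternary, and assignment rows. The names precedenceOf, apply, climb, and evaluate are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Binding strength of each binary operator (larger binds tighter).
// Returns -1 for anything that is not a binary operator in the table.
int precedenceOf(const std::string& op) {
    static const std::map<std::string, int> table = {
        {"*", 10}, {"/", 10}, {"%", 10},
        {"+", 9},  {"-", 9},
        {"<<", 8}, {">>", 8},
        {"<", 7},  {">", 7},  {"<=", 7}, {">=", 7}, {"<=>", 7},
        {"==", 6}, {"!=", 6},
        {"&", 5},  {"^", 4},  {"|", 3},
        {"&&", 2}, {"||", 1},
    };
    const auto it = table.find(op);
    return it == table.end() ? -1 : it->second;
}

// Applies an operator to two integers. <=> needs type information, so it
// is rejected here along with any unknown operator.
long apply(const std::string& op, long a, long b) {
    if (op == "*") return a * b;
    if (op == "/") return a / b;
    if (op == "%") return a % b;
    if (op == "+") return a + b;
    if (op == "-") return a - b;
    if (op == "<<") return a << b;
    if (op == ">>") return a >> b;
    if (op == "<") return a < b;
    if (op == ">") return a > b;
    if (op == "<=") return a <= b;
    if (op == ">=") return a >= b;
    if (op == "==") return a == b;
    if (op == "!=") return a != b;
    if (op == "&") return a & b;
    if (op == "^") return a ^ b;
    if (op == "|") return a | b;
    if (op == "&&") return a && b;
    if (op == "||") return a || b;
    throw std::invalid_argument("unsupported operator: " + op);
}

// Parses and evaluates operators whose precedence is at least minPrec.
// Assumes well-formed input: operand, operator, operand, ...
long climb(const std::vector<std::string>& tokens, std::size_t& pos, int minPrec) {
    long lhs = std::stol(tokens[pos++]);
    while (pos < tokens.size()) {
        const std::string op = tokens[pos];
        const int prec = precedenceOf(op);
        if (prec < minPrec) break;
        ++pos;
        // prec + 1 makes operators of equal precedence group to the left.
        const long rhs = climb(tokens, pos, prec + 1);
        lhs = apply(op, lhs, rhs);
    }
    return lhs;
}

long evaluate(const std::vector<std::string>& tokens) {
    std::size_t pos = 0;
    return climb(tokens, pos, 1);
}
```

Under these assumptions, `evaluate({"1", "+", "2", "*", "3"})` gives 7, and `evaluate({"1", "<<", "2", "+", "1"})` gives 8 because + binds tighter than <<.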

Uncommon Operators

Size of

The sizeof operator is expressed as the $ character. It is used to find the size of a variable, returned as an unsigned 32-bit integer (u32). The size is currently given in bits, but the unit is yet to be finalized. Usage:

var x: i32 = 5;
var y: u32 = $x;
// y = 32

Type of

This is not finalized to be in the language! The typeof operator is expressed using the # character. It is used to find the data type or class of a variable. It returns the class.

Length of

The lengthof operator is expressed with the @ character. It is used to find the length of an array, and returns an unsigned 32-bit integer (u32). Usage:

var x: i32[] = [1, 2, 3, 4, 5];
var y: u32 = @x;
// y = 5

Instance of

The instanceof operator is used to determine whether a variable is an instance of a certain type. It returns a boolean as a single-bit unsigned integer (u1). It is represented with the "spaceship" operator: <=>.

class MyObject { }

class AnotherObject { }

func main(args: String[]) u0 {
    var x: MyObject = MyObject();
    var y: AnotherObject = AnotherObject();
    var a: u1 = x <=> MyObject; // a = true
    var b: u1 = y <=> MyObject; // b = false
}

Builtin Types

Primitives

Primitive data types are named with a single-character kind followed by a numeric size in bits, for example i32 or f64.

Character    Meaning             Sizes
i            Signed integer      8, 16, 32, 64
u            Unsigned integer    0, 1, 8, 16, 32, 64
f            Floating point      16, 32, 64, 128
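
The naming scheme is regular enough to validate mechanically. The following C++ sketch checks a name against the table above; PrimitiveType and parsePrimitive are hypothetical names, and the real compiler may resolve types quite differently.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical description of a primitive type such as i32 or f64.
struct PrimitiveType {
    char kind;      // 'i', 'u', or 'f'
    unsigned bits;  // size in bits
};

// Recognizes exactly the kind/size combinations from the table above.
// Returns std::nullopt for any other name.
std::optional<PrimitiveType> parsePrimitive(const std::string& name) {
    if (name.empty()) return std::nullopt;

    const char kind = name[0];
    std::vector<unsigned> sizes;
    switch (kind) {
        case 'i': sizes = {8, 16, 32, 64}; break;
        case 'u': sizes = {0, 1, 8, 16, 32, 64}; break;
        case 'f': sizes = {16, 32, 64, 128}; break;
        default: return std::nullopt;
    }

    // Comparing against the canonical spelling rejects forms like "i032".
    const std::string suffix = name.substr(1);
    for (const unsigned bits : sizes) {
        if (suffix == std::to_string(bits)) {
            return PrimitiveType{kind, bits};
        }
    }
    return std::nullopt;
}
```

For example, `parsePrimitive("u1")` succeeds with 1 bit, while `parsePrimitive("i1")` and `parsePrimitive("f8")` are rejected because those sizes are not listed for their kind.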

Type aliases

There are a few semi-reserved words that are builtin aliases for existing types.

Alias    Type    Why
void     u0      Specifies returning no data
bool     u1      Boolean data
char     u8      Represents a Unicode character
Builtin classes

There are two builtin classes: String and Error.

String

Under the hood, strings are represented as an array of characters: char[]. In addition, the String class provides useful methods, like the string classes in Java or C++. What makes the String class special is that it:

  • can be assigned from a string literal
  • uses + for concatenation
  • uses == for direct comparison

Error

Error handling is done through multiple return values, similar to Go. However, the Error class eliminates the ugly nil checking.

func functionThatCanError(x: i32) (u32, Error) {
    if (x < 0) return null, Error("Parameter cannot be negative"); // An error with a string argument is a failure.
    return x, Error(); // An error with no string argument is a success.
}

func main(args: String[]) void {

    var x, err = functionThatCanError(-5);
    if (err.failed()) {
        // handle the error
    }

    var y, _ = functionThatCanError(-5);
    // Ignore the error, y will be null
}
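
For readers more familiar with C++, the same convention can be approximated as follows. This is only an analogue of the behaviour described above, not how the Capstone runtime implements Error: the message() accessor and the use of std::optional to stand in for a null value are assumptions made for this sketch.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Hypothetical C++ analogue of Capstone's Error class: default-constructed
// means success, constructed with a message means failure.
class Error {
public:
    Error() = default;
    explicit Error(std::string message) : message_(std::move(message)) {}

    bool failed() const { return message_.has_value(); }

    // Returns the failure message, or an empty string on success.
    std::string message() const { return message_.value_or(""); }

private:
    std::optional<std::string> message_;
};

// Mirrors functionThatCanError: on failure the value is empty, standing in
// for Capstone's null.
std::pair<std::optional<std::uint32_t>, Error> functionThatCanError(std::int32_t x) {
    if (x < 0) {
        return {std::nullopt, Error("Parameter cannot be negative")};
    }
    return {static_cast<std::uint32_t>(x), Error()};
}
```

A caller can then write `auto [value, err] = functionThatCanError(-5);` and branch on `err.failed()`, which matches the shape of the Capstone example.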

Current TODO

  • Multiple declaration
  • Enum declaration

Contributors

  • juicestus
