Coder Social home page Coder Social logo

rakhithjk / scratch-code-ast Goto Github PK

View Code? Open in Web Editor NEW

This project forked from sigalor/scratch-code-ast

0.0 1.0 0.0 3.55 MB

The abstract syntax tree implementation for scratch-code.

License: Apache License 2.0

Makefile 1.01% C++ 97.60% Python 1.06% Shell 0.32%

scratch-code-ast's Introduction

Overview

This project serves as the abstract syntax tree for scratch-code. It is able to represent basic structures of imperative programming languages, mainly focused on C.

Components

scratch-code-ast makes heavy use of object-orientation. This makes it possible to generalize instances easily. To specialize them again, while avoiding object slicing, the project uses std::shared_ptr in almost every case. This also has the advantage that objects, which often need to be referenced multiple times, do not need to be copied expensively.

Inheritance graph

When you look at the file include/AST.hpp, you can see the class hierarchy of scratch-code-ast. In the following, there is a inheritance graph kindly generated by Doxygen and Graphviz DOT.

inheritance graph

Class descriptions

As you can see, the ast::Node class is the ancestor of all other classes. Descriptions all of the classes are:

Class name Description
Node is the ancestor of all other classes and contains the parent and id
    Statement bundles class instances which are able to appear in the parsed source code
        VariableDefinition defines a variable, consisting of a type and a name
        FunctionDefinition defines a function, consisting of a return type, name, argument names and types and statement list
        Value bundles class instances which can be used as a single value
            LValue means locatable value, i.e. references a ast::VariableDefinition which has a specific and user-defined point in memory
            RValue is the opposite of ast::LValue, i.e. you do not really know where this value is actually stored as it is temporary
                RValueValue represents a simple literal, either bool (true or false), int (e.g. 42), real (e.g. 3.14159) or string (e.g. "Example string")
                FunctionCall represents calling a function previously defined by FunctionDefinition, is an RValue, as a returned value has this property
            Operation bundles unary and binary operations
                UnaryOperation combines a ast::Value with an unary operator (see below)
                BinaryOperation combines two instances of ast::Value, left hand side and right hand side, with a binary operator (see below)
        ControlFlowStatement bundles statements which are able to change the flow of execution, i.e. make it possible for it to have multiple paths
            Conditional represents if [else if]* [else]? statements, i.e. conditional objects which consist of multiple conditions (if and else if) and a final alternative (else)
            ControllableLoop bundles loops which are controllable by a ast::LoopControlStatement
                ForLoop a classic for loop, consisting of an initialization, condition, afterthought and statements
                WhileLoop a classic while loop, consisting of a condition and statements
            LoopControlStatement can control a ast::ControllableLoop with break or continue
            ReturnStatement belongs to a ast::FunctionDefintion and makes this function return the specified value, or nothing for void
    StatementList is essentially just an array of ast::Statement, used to integrate it better into the ast namespace
    VariableDefinitionList an array of ast::VariableDefinition
    ValueList an array of ast::Value

Variable types

In the strongly-typed enumeration ast::Lexer::ParsedVariableType in include/LexerTokenDefinitions.hpp, there are multiple variable types defined. scratch-code-ast has very strict typing, so these are the only types available.

Type name Default value Description
void (none) can only be used for function return types, not for variables. Indicates that the function returns no value
bool false a Boolean value, i.e. has only two states: true or false
int 0 an integer, which has no specific value range and may be positive or negative
real 0.0 a real value, i.e. a floating point value
string "" a list of characters which forms a text

Other concepts

LValues and RValues: Value categories

scratch-code-ast relies on the concept of lvalues and rvalues. Every ast::Value results in one of these types, which are called value categories in the code. The difference between these two value categories may be hard to grasp at the beginning, but you will soon notice the necessity of this concept.

In general, you can imagine lvalues as values that have a name. When you define a variable and call it by its name later in your code, you get an lvalue. This was already mentioned in the short ast::LValue class description above.

On the other hand, an rvalue is a temporary value. It cannot be accessed again after its statement is finished.

For another explanation, look at a simple assignment like a = 42; (of course a has been defined before). Here, a is an lvalue which references the previously defined variable a. Thus, you can assign something to it, in this case 42. Next, look at the incorrect statement a+3 = 123;. This makes no sense, as the left hand side of the assignment operator is now an rvalue, because a+3 is just temporary. Thus, you can only assign to lvalues. That's why this concept is essential for scratch-code-ast.

IDs

In the file include/AST.hpp you can see different hexadecimal numbers as a comment besides the class predeclarations. These are the ids used by scratch-code-ast. When you look at any class definition (let's take ast::LValue as an example), you always have the static member static const int uniqueId; and something like const int LValue::uniqueId = 0x00001311; in the final implementation. These ids are, of course, unique for a class and follow a specific pattern.

Reading the ids is done from right to left. The left-most hexadecimal digit shows the class's index on the very first level. As ast::Node is the ancestor of all other classes, it has the id 0x00000001. On the next level, there are the classes ast::Statement with 0x00000011, ast::StatementList with 0x00000021, ast::VariableDefinitionList with 0x00000031 and ast::ValueList with 0x00000041. As you can see, ast::Node has exactly four direct children and their ids are set by their index. As they all are on the second level, the changed digit is the second one from the right.

It's the same with the children of ast::Statement. As they are on the third level, only the third digit from the right is changed.

As you can see, this way it is possible to directly get the ancestors of a class when you only have the id. When you get an id like 0x00022411, you directly know that the class that has it is derived from ast::Node, ast::Statement, ast::ControlFlowStatement, ast::ControllableLoop and finally is an ast::WhileLoop.

Typecasts

In scratch-code-ast, you may convert any type to any other type. The following table shows what happens during the different kinds of typecasts.

To
bool int real string
From bool (none) 1 when true, 0 when false 1.0 when true, 0.0 when false "true" when true, "false" when false
int false when 0, true otherwise (none) value is kept decimal value as text
float false when 0.0, true otherwise value is floored to the next integer (none) decimal value as text
string true when "true", false otherwise try to parse as int, 0 on failure try to parse as real, 0 on failure (none)

Operators

There are some things to say about different kinds of operators.

  • First of all, the logical ones: LogicalNot, LogicalAnd and LogicalOr. When you apply these to a value, this value is always implicitly casted to bool beforehand, i.e. if(!a) means if(!((bool)a)). As you can see in the Allowed input types for unary operators below, a typecast to bool is possible for all types different from void, i.e. all types that imply a value.
  • Next, comparison operators like LessThan, LessThanOrEqual, GreaterThan, GreaterThanOrEqual are evaluated by implicitly casting the values to int, except that real is kept as it is.
  • The binary Add operator is not only applicable to numeric types (i.e. int and real, but also to string. In this case, it means string concatenation.
  • The difference between the prefix and postfix increment and decrement operators is rather hard to get, but if you would really like to understand it, look at this Stack Overflow answer (it is not by me though).

To decrease the width of the following tables, valcat is used instead of value category, lhs instead of left hand side and rhs instead of right hand side.

Unary operators

Another strongly-typed enumeration in include/LexerTokenDefinitions.hpp is ParsedUnaryOperation. It contains the operators used for the ast::UnaryOperation class. They all have almost the same meaning in C, making a separate explanation not necessary. The input valcat and output valcat column contents were extracted from the functions ast::Lexer::getRequiredValueCategory and ast::Lexer::getResultingValueCategory defined in src/LexerTokenDefinitions.cpp. The contents of the output types column were taken from the ast::Lexer::getResultingType function from the same file. The position is given relative to the input value, e.g. for a LogicalNot, ~a is correct, while a~ is not.

Name Symbol Position Allowed input types Input valcat Output valcat Output type
LogicalNot ! before bool, int, real, string (any) rvalue bool
BitwiseNot ~ before bool, int (any) rvalue (input type kept)
PrefixIncrement ++ before int, real lvalue lvalue (input type kept)
PrefixDecrement -- before int, real lvalue lvalue (input type kept)
PostfixIncrement ++ after int, real lvalue rvalue (input type kept)
PostfixDecrement -- after int, real lvalue rvalue (input type kept)
UnaryPlus + before int, real (any) rvalue (input type kept)
UnaryMinus - before int, real (any) rvalue (input type kept)
TypecastBool (bool) before bool, int, real, string (any) rvalue bool
TypecastInt (int) before bool, int, real, string (any) rvalue int
TypecastReal (real) before bool, int, real, string (any) rvalue real
TypecastString (string) before bool, int, real, string (any) rvalue string

Binary operators

Similarly to unary operators, binary operators are defined in the strongly-typed enumeration ParsedBinaryOperation in include/LexerTokenDefinitions.hpp as well. They are used for the ast::BinaryOperation class. Again, there is no difference to C. The value category and type information also comes from the same file. Note that, for an ast::BinaryOperation, the types of both values need to be the same. If they are not, that's what typecasts are for.

Name Symbol Allowed input types Lhs input valcat Rhs input valcat Output valcat Output type
Add + int, real, string (any) (any) rvalue (input type kept)
Subtract - int, real (any) (any) rvalue (input type kept)
Multiply * int, real (any) (any) rvalue (input type kept)
Divide / int, real (any) (any) rvalue (input type kept)
Modulo % int, real (any) (any) rvalue (input type kept)
BitwiseAnd & bool, int (any) (any) rvalue (input type kept)
BitwiseOr ` ` bool, int (any) (any) rvalue
BitwiseXor ^ bool, int (any) (any) rvalue (input type kept)
BitshiftLeft << int (any) (any) rvalue (input type kept)
BitshiftRight >> int (any) (any) rvalue (input type kept)
Assignment = bool, int, real, string lvalue (any) lvalue (input type kept)
AddAssignment += int, real, string lvalue (any) lvalue (input type kept)
SubtractAssignment -= int, real lvalue (any) lvalue (input type kept)
MultiplyAssignment *= int, real lvalue (any) lvalue (input type kept)
DivideAssignment /= int, real lvalue (any) lvalue (input type kept)
ModuloAssignment %= int, real lvalue (any) lvalue (input type kept)
BitwiseAndAssignment &= bool, int lvalue (any) lvalue (input type kept)
BitwiseOrAssignment ` =` bool, int lvalue (any) lvalue
BitwiseXorAssignment ^= bool, int lvalue (any) lvalue (input type kept)
BitshiftLeftAssignment <<= int lvalue (any) lvalue (input type kept)
BitshiftRightAssignment >>= int lvalue (any) lvalue (input type kept)
LogicalAnd && bool, int, real, string (any) (any) rvalue bool
LogicalOr ` ` bool, int, real, string (any) (any)
LessThan < bool, int, real, string (any) (any) rvalue bool
LessThanOrEqual <= bool, int, real, string (any) (any) rvalue bool
GreaterThan > bool, int, real, string (any) (any) rvalue bool
GreaterThanOrEqual >= bool, int, real, string (any) (any) rvalue bool
Equal == bool, int, real, string (any) (any) rvalue bool
NotEqual != bool, int, real, string (any) (any) rvalue bool

Operator precedence

Just like in C, operators in scratch-code-ast have precedence. For example, because multiplication has a higher precedence (i.e. a lower number in the following table) than addition, 3+4*5 is evaluated like 3+(4*5) and not (3+4)*5. This project's precendence is mainly taken from this cppreference page.

Precedence Symbols Names Associativity
1 ++ -- PostfixIncrement, PostfixDecrement left-to-right
2 ++ --
+ -
! ~
(bool) (int) (real) (string)
PrefixIncrement, PrefixDecrement
UnaryPlus, UnaryMinus
LogicalNot, BitwiseNot
TypecastBool, TypecastInt, TypecastReal, TypecastString
right-to-left
3 * / % Multiply, Divide, Modulo left-to-right
4 + - Add, Subtract left-to-right
5 << >> BitshiftLeft, BitshiftRight left-to-right
6 < <=
> >=
LessThan, LessThanOrEqual
GreaterThan, GreaterThanOrEqual
left-to-right
7 == != Equal, NotEqual left-to-right
8 & BitwiseAnd left-to-right
9 ^ BitwiseXor left-to-right
10 ` ` BitwiseOr
11 && LogicalAnd left-to-right
12 ` `
13 =
+= -=
*= /= %=
<<= >>=
&= ^= `
=` Assignment
AddAssignment, SubtractAssignment
MultiplyAssignment, DivideAssignment, ModuloAssignment
BitshiftLeftAssignment, BitshiftRightAssignment
BitwiseAndAssignment, BitwiseXorAssignment, BitwiseOrAssignment

scratch-code-ast's People

Contributors

sigalor avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.