
zom's Introduction

Zom

Zom is a fast and secure programming language

zom's People

Contributors

larsouille25


zom's Issues

LLVM backend

The Zom compiler currently does code generation and compilation through LLVM-related Rust crates. We could replace them with the LLVM C++ library, because that is the official way to use LLVM, and several issues in this project come from LLVM not working: #24, #20, #23, #21. With a C++ package we could simply follow the docs instead of wasting time searching for the corresponding function in Inkwell, etc.

Pros

  • We can follow the LLVM docs without having to hunt for the corresponding function in Inkwell.
  • Some issues (#24, #20, #23, #21) could be solved.

Cons

  • Safety: C++ is safer than C, but it is not as safe as Rust.
  • Some layer between Rust and C++ is needed; possible solutions are listed below.

Solution N°1

Write a serializer that serializes the result of type checking (or of whichever stage ends the part of the compilation pipeline that doesn't need LLVM). The C++ program is then executed in a child process with the serialized input, and we deserialize its output and return what the compiler needs.

Solution N°2

Write a C++ library that uses LLVM and emits the output the compiler needs; its functions are called by the Zom compiler through the Rust FFI to C++.

Originally posted by @Larsouille25 in #25 (comment)

Add Colors to errors.

After the transition to a self-compiling Zom compiler, add colors to the errors. They can be disabled using a flag or in the configuration.

`var` and `const`

This is a tracking issue for the RFC "var-const" (zom-lang/rfcs#2).

About tracking issues

Tracking issues are used to record the overall progress of implementation.
They are also used as hubs connecting to other relevant issues, e.g., bugs or
open design questions. A tracking issue is however not meant for large scale
discussion, questions, or bug reports about a feature. Instead, open a dedicated
issue for the specific matter and add the relevant feature gate label.

Steps

  • Implement the RFC
    • parse var & const with name in AST of Symbol
    • In statement LocalSymbol
    • And in item : GlobalSymbol
  • Adjust documentation.

Unresolved Questions

N/A

Implementation history

Fixing lexer issues

All of these issues are in the same issue because it's easier to follow and because there are only small errors.

  • When a non-alphanumeric char is in an identifier (e.g. a@), the error is reported one char too late:
Err: Lexer, in file `<stdin>` at line 1 :
 ... |
  1  | a@
 ... |   ^
         Illegal Character
  • Remove the magic if (while fixing the error) in the lexer, line 85
  • When an underscore is in an identifier, an error occurs, but not the correct one:
~> func foo_bar(a, b)
 > .eof
cannot parse integer from empty string
  • Make it possible to use _ in identifiers, at the beginning and in the middle.
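
As a rough illustration of the last two points, here is a minimal sketch of identifier lexing that accepts `_` at the beginning and in the middle of a name. The function names (`is_ident_start`, `is_ident_continue`, `lex_ident`) are hypothetical and not taken from the Zom lexer; the sketch also assumes ASCII-only identifiers.

```rust
/// Returns true if `c` may start an identifier (an ASCII letter or `_`).
fn is_ident_start(c: char) -> bool {
    c.is_ascii_alphabetic() || c == '_'
}

/// Returns true if `c` may appear after the first character of an identifier.
fn is_ident_continue(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Reads an identifier starting at byte offset `start`.
/// Returns the identifier and the offset just past it,
/// or `None` if no identifier starts at that offset.
fn lex_ident(src: &str, start: usize) -> Option<(&str, usize)> {
    let rest = src.get(start..)?;
    let mut chars = rest.char_indices();
    let (_, first) = chars.next()?;
    if !is_ident_start(first) {
        return None;
    }
    // The identifier ends at the first character that cannot continue it.
    let len = chars
        .find(|&(_, c)| !is_ident_continue(c))
        .map(|(i, _)| i)
        .unwrap_or(rest.len());
    Some((&rest[..len], start + len))
}
```

The returned offset is the position of the first character that does not belong to the identifier, so for `a@` an illegal-character error can be reported at offset 1, directly under the `@`.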

Tracking Issue for primitives

This is a tracking issue for the RFC "primitives" (zom-lang/rfcs#3).

About tracking issues

Tracking issues are used to record the overall progress of implementation.
They are also used as hubs connecting to other relevant issues, e.g., bugs or
open design questions. A tracking issue is however not meant for large scale
discussion, questions, or bug reports about a feature. Instead, open a dedicated
issue for the specific matter and add the relevant feature gate label.

Steps

  • Implement the RFC
    • Literals ->
      • Int
      • Float
      • int: binary, octal, hexadecimal literals -> not yet, create another issue for this.
      • char
      • str
      • escape sequences
        • \xNN
        • \n
        • \r
        • \t
        • \\
        • \0
        • \'
        • \"
    • usize and isize
    • true and false keywords
    • boolean expression
    • undefined keyword
    • an undefined expression
  • Adjust documentation. -> #22
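
The escape sequences listed above could be decoded along these lines. This is only a sketch: the function name `unescape` and the error messages are made up for illustration, and mapping `\xNN` to the code point `U+00NN` is an assumption, since the RFC text is not quoted here.

```rust
/// Decodes the escape sequences of a string or char literal body.
/// Returns `Err` with a message on an unknown or malformed escape.
fn unescape(body: &str) -> Result<String, String> {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('r') => out.push('\r'),
            Some('t') => out.push('\t'),
            Some('\\') => out.push('\\'),
            Some('0') => out.push('\0'),
            Some('\'') => out.push('\''),
            Some('"') => out.push('"'),
            Some('x') => {
                // \xNN: exactly two hexadecimal digits.
                let hi = chars.next().and_then(|d| d.to_digit(16));
                let lo = chars.next().and_then(|d| d.to_digit(16));
                match (hi, lo) {
                    // Assumption: NN is interpreted as the code point U+00NN.
                    (Some(h), Some(l)) => out.push((h * 16 + l) as u8 as char),
                    _ => return Err("expected two hex digits after \\x".to_string()),
                }
            }
            Some(other) => return Err(format!("unknown escape sequence \\{}", other)),
            None => return Err("unterminated escape sequence".to_string()),
        }
    }
    Ok(out)
}
```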

Unresolved Questions

N/A

Implementation history

Implement the base

Here is the list of language features that need to be implemented for the v0.1.0 milestone, each with its own RFC -> #22

  • #40
    • return statement
  • #41
  • operators + unary expression -> #47
  • control flow
    • if / else if / else
  • loops
    • for loop -> not sure what it will look like; an issue will be created when I know
    • while loop
    • loop loop -> useless, it just adds a new keyword; use while (true) { .. } instead.
    • continue statement -> expression
    • break statement -> expression
  • comments -> #14
  • primitive types -> #10
  • remake blocks: no more 'automatically' returned expr, and the semicolon becomes necessary: move semicolon parsing into stmts and make semicolons optional for some expressions instead of forbidden, and make list expr { expr, expr, expr, ... }
  • defer statements -> will be done in another version. There are more important things for now.
  • remake call expression -> from $ident(...) to $expr(...).
  • .* deref operator and right unary operation
  • member access expr a.b
  • string expr literal
  • char expr literal

Improve the error handling in the compiler

Here is a list of things we can change to improve error handling in the Zom compiler:

  • create the crate zom_errors
  • create a struct LogStream it would contain a vector of BuiltLog with methods ->
    • add; push a BuiltLog into the vector.
    • add_built ??; build a BuiltLog from a passed Log and push into the vector
    • failed; return true if there is at least one error in the vector of logs, else false
    • render: writer, colored; render the vector of logs to the writer, with colors if colored is true
  • create an enum / struct Log that would contain all possible parsing errors, with a dummy one, Custom, to create custom errors where there is no log kind and creating one would be overkill. Logs will share two common fields: a reference-counted pointer (RC) to the file content and an RC to the file path. In general a log kind will take a location in the code.
    • build; converts Log into BuiltLog
  • create a struct BuiltLog containing a kind (warning / error / error with an error code), a description, an optional help message and a vector of notes (strings). In general a BuiltLog is fairly dumb: it contains the line the log cursor points to and the cursor (just a range). A BuiltLog is a simple intermediate representation of a log.
    • render: writer, colored; render the BuiltLog to the writer, with colors if colored is true
  • replace the enum Log with a trait and make ExpectedToken and UnclosedDelimiter structs that implement the Log trait, then use it in the builder for the log
  • create a prelude
  • allow errors on multiple lines
  • replace the old error system with the new one.
    • lexer
    • parser
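
The LogStream / BuiltLog split described above might look roughly like the following. It is a simplified sketch, not the zom_errors API: the `LogLevel` enum, the field names, and the ANSI color codes are assumptions, and the source line, cursor range, and error codes are left out to keep it short.

```rust
use std::io::{self, Write};

/// Severity of a log (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LogLevel {
    Warning,
    Error,
}

/// Simple intermediate representation of a log, ready to be rendered.
#[derive(Debug, Clone)]
struct BuiltLog {
    level: LogLevel,
    description: String,
    help: Option<String>,
    notes: Vec<String>,
}

impl BuiltLog {
    /// Renders the log to `w`, with ANSI colors if `colored` is true.
    fn render<W: Write>(&self, w: &mut W, colored: bool) -> io::Result<()> {
        let (label, color) = match self.level {
            LogLevel::Warning => ("warning", "\x1b[33m"),
            LogLevel::Error => ("error", "\x1b[31m"),
        };
        if colored {
            writeln!(w, "{}{}\x1b[0m: {}", color, label, self.description)?;
        } else {
            writeln!(w, "{}: {}", label, self.description)?;
        }
        if let Some(help) = &self.help {
            writeln!(w, "  help: {}", help)?;
        }
        for note in &self.notes {
            writeln!(w, "  note: {}", note)?;
        }
        Ok(())
    }
}

/// Collects logs during a compilation stage.
#[derive(Debug, Default)]
struct LogStream {
    logs: Vec<BuiltLog>,
}

impl LogStream {
    /// Pushes a BuiltLog into the vector.
    fn add(&mut self, log: BuiltLog) {
        self.logs.push(log);
    }

    /// Returns true if at least one error was logged.
    fn failed(&self) -> bool {
        self.logs.iter().any(|log| log.level == LogLevel::Error)
    }

    /// Renders every log, in insertion order, to `w`.
    fn render<W: Write>(&self, w: &mut W, colored: bool) -> io::Result<()> {
        for log in &self.logs {
            log.render(&mut *w, colored)?;
        }
        Ok(())
    }
}
```

Taking the writer as a parameter lets the same code print to stderr in the compiler and to an in-memory buffer in tests.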

Tracking Issue for Operators

This is a tracking issue for the RFC "operators" (zom-lang/rfcs#4).

About tracking issues

Tracking issues are used to record the overall progress of implementation.
They are also used as hubs connecting to other relevant issues, e.g., bugs or
open design questions. A tracking issue is however not meant for large scale
discussion, questions, or bug reports about a feature. Instead, open a dedicated
issue for the specific matter and add the relevant feature gate label.

Steps

  • Implement the RFC zom-lang/rfcs#4
    • Remake the code that lexes the operators
      • create string constants for the operators
      • create a new, better function to detect whether a char starts an operator
      • make an enum of operators, instead of storing a String
      • create enum for BinaryOperators
      • create enum for Right and Left Unary operators
      • and use them when parsing instead of the Operator enum
    • Implement the table of precedence in ParserSettings::default()
  • Adjust documentation.
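
To make the steps above more concrete, here is a small sketch of an operator enum with string constants, a start-of-operator check, and a precedence lookup. The operator set, the names (`BinaryOperator`, `from_lexeme`, `is_operator_start`) and the precedence values are illustrative assumptions; the real table is the one defined by RFC zom-lang/rfcs#4.

```rust
// String constants for a few operators (illustrative subset).
const OP_MUL: &str = "*";
const OP_DIV: &str = "/";
const OP_ADD: &str = "+";
const OP_SUB: &str = "-";
const OP_LESS: &str = "<";
const OP_EQUAL: &str = "==";

/// Binary operators stored as an enum instead of a `String`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BinaryOperator {
    Mul,
    Div,
    Add,
    Sub,
    Less,
    Equal,
}

impl BinaryOperator {
    /// Converts a lexed operator string into its enum variant.
    fn from_lexeme(lexeme: &str) -> Option<Self> {
        match lexeme {
            OP_MUL => Some(Self::Mul),
            OP_DIV => Some(Self::Div),
            OP_ADD => Some(Self::Add),
            OP_SUB => Some(Self::Sub),
            OP_LESS => Some(Self::Less),
            OP_EQUAL => Some(Self::Equal),
            _ => None,
        }
    }

    /// Binding power; a higher value binds tighter.
    /// These numbers are placeholders, not the RFC's table.
    fn precedence(self) -> u8 {
        match self {
            Self::Mul | Self::Div => 60,
            Self::Add | Self::Sub => 50,
            Self::Less => 40,
            Self::Equal => 30,
        }
    }
}

/// Returns true if `c` can be the first character of an operator.
fn is_operator_start(c: char) -> bool {
    matches!(c, '*' | '/' | '+' | '-' | '<' | '=')
}
```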

Unresolved Questions

N/A

Implementation history

Add issue templates

Add issue templates for:

  • RFC tracking issue
  • Internal Compiler Error
  • Add a link in config.yml in ISSUE_TEMPLATE that points to the pre-rfc stream of the Zulip server.

Fixing builder

  • Remove the box around LLVMContext and try to replace it with a ContextRef.
  • Rename the Context struct to something else because the current name is confusing

Explain file structure

Is your feature request related to a problem? Please describe.
Add a README file in src/ and explain all the files / sub directories

Type checker and type inference

Create a type analyzer that will infer the types of expressions and create a typed AST, in the crate zom_typeizer

Todo:

  • Reproduce the AST of zom_parser but with types annotated on expressions.
  • make functions that convert from non-typed to typed expressions
  • make functions to infer the type of an expression

Questions ??

  • Do we create comptime_int and comptime_float to make the type analyzer easier?

Improve REPL

Is your feature request related to a problem? Please describe.
No, actually not, but it's cool.

Describe the solution you'd like
When you're in the REPL, you can enter multiple lines by pressing Enter, and when you're finished you press Ctrl + D or something like that.

Additional context
Like that ->

Mona v0.1.0-alpha, to exit enter `.quit`
~> func foo(bar) {
 .     var a = bar
 .     a = bar + 2
 .     // etc ..
 . }
${ctrl + d here}
~> 

where ${ctrl + d here} isn't displayed; it marks the point where the user presses Ctrl + D

Parsing Error need to point to the code

Is your feature request related to a problem? Please describe.
Add errors that clearly point to the code.

Describe the solution you'd like

  • Rename the current Token enum to TokenType.
  • Create a struct Token that stores a TokenType and a range, named the span.
  • The lexer will still produce the token type, and will add a span (the start and the end) to every token.

After that the token stream has the location in the code.

  • Create a trait CodeLocation that will have a function span() that returns the span of the AST node.
  • All AST nodes will store their own span (and so implement CodeLocation).
  • When an error occurs, you can use the span of the most relevant AST node and emit an error. -> #30
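
The Token / span / CodeLocation design above can be sketched as follows. Only `Token`, `TokenType`, `CodeLocation` and `span()` come from the issue; the token variants, the `CallExpr` node, the field names and the `underline` helper are hypothetical, and the helper assumes a single-line, ASCII-only source line.

```rust
use std::ops::Range;

/// What kind of token was lexed (illustrative subset).
#[derive(Debug, Clone, PartialEq)]
enum TokenType {
    Ident(String),
    Int(i64),
    OpenParen,
    CloseParen,
}

/// A token type plus the byte range it covers in the source.
#[derive(Debug, Clone, PartialEq)]
struct Token {
    tt: TokenType,
    span: Range<usize>,
}

/// Implemented by every AST node so errors can point at the code.
trait CodeLocation {
    fn span(&self) -> Range<usize>;
}

/// Hypothetical AST node for a call such as `foo(1, 2)`.
struct CallExpr {
    callee: Token,
    close_paren: Token,
}

impl CodeLocation for CallExpr {
    /// The node spans from the start of the callee to the closing parenthesis.
    fn span(&self) -> Range<usize> {
        self.callee.span.start..self.close_paren.span.end
    }
}

/// Renders `line` followed by a row of carets under `span`.
fn underline(line: &str, span: &Range<usize>) -> String {
    // Draw at least one caret, even for an empty span.
    let width = span.end.saturating_sub(span.start).max(1);
    format!("{}\n{}{}", line, " ".repeat(span.start), "^".repeat(width))
}
```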

Reverse lexer

In the reverse lexer, skip comments and empty lines.

Error format lines panic

If, in the future, an error occurs on a line number greater than 99,999, the formatter will panic.

This can be reproduced by hardcoding the line number in the error message like this (in GeneralError, fmt implementation at line 140, at commit a190c95) and making the REPL raise a GeneralError (e.g. with a float parsing error):

// ...
num_str_fix_len(100_000, 5)
// ...

You get the following error:

thread 'main' panicked at 'attempt to subtract with overflow', src/error.rs:57:20
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
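
The body of num_str_fix_len is not shown in this issue, so the following is only a guess at a panic-free version. It assumes the function left-pads the number with spaces to a fixed width and that the overflow comes from computing `len - digits.len()` on unsigned integers; `saturating_sub` avoids the panic by clamping the padding to zero.

```rust
/// Formats `num` right-aligned in a field of `len` characters.
/// If the number has more than `len` digits, it is returned unpadded
/// instead of panicking on the subtraction.
fn num_str_fix_len(num: usize, len: usize) -> String {
    let digits = num.to_string();
    // saturating_sub yields 0 instead of overflowing when digits.len() > len.
    let padding = len.saturating_sub(digits.len());
    format!("{}{}", " ".repeat(padding), digits)
}
```

With this version a six-digit line number makes the gutter one column wider instead of crashing the formatter.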

Packages, modules, import system in Zom

Clarify packages and modules, and create an import system in Zom:

  • Import item: in Zom you can import symbols into the scope of the current file, like this:
import std.debug.println;

to bring the println function of the module into scope. You can re-export symbols to another scope with public imports:

pub import std.debug.println;
pub import std.test.assert;
pub import std.test.expect;
// ...
  • Write rules for Zom name mangling and put them into the book, then implement it (probably create its own issue)
  • Item visibility, visibility identifier
    • private: the item can only be used from the current module; it's the default when pub isn't specified.
    • public: the item is usable in the whole package and in packages that use it; put the pub keyword in front of the item.
  • export: when export is used in front of an item, name mangling is disabled and the symbol is exported. It's meant to facilitate interop with C. Because names aren't mangled, an error is emitted if two exported symbols have the same name.
  • Modules can be declared in another module, like this:
mod childmod {
    // items ...
}

A module declaration is an item, so it can have a visibility identifier. Modules can also be declared from another file, like this:

mod modfile;
  • The global path of an item is the path of its container plus its name. The path of the primary module is the name of the package. The primary module is the module defined by the first source file of a package, e.g.:
// main.zom and package named 'example':

fn main() void {} // => example.main

pub mod somecode {
    fn hello() void { ... } // => example.somecode.hello
}
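
The global path rule from the last bullet can be expressed in a few lines. The function name `global_path` and the representation of a container as a slice of path segments are assumptions made for this sketch.

```rust
/// Builds the global path of an item from the path segments of its
/// container and the item's own name, joined with `.`.
fn global_path(container: &[&str], name: &str) -> String {
    let mut segments = container.to_vec();
    segments.push(name);
    segments.join(".")
}
```

For the example package above, `global_path(&["example"], "main")` gives `example.main` and `global_path(&["example", "somecode"], "hello")` gives `example.somecode.hello`.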

Convert `zom` to `zomc`

The idea is to make the Zom compiler like gcc or rustc: a single command with many options. If anyone wants to make an alternative package manager, they can (for now there is no official package manager, but one will come).

Todo

  • rename zom to zomc
  • remove subcommands dev, gettarget, version
  • add mandatory argument SOURCE_FILE that is the main file, e.g. for a bin it's main.zom, for a lib it's lib.zom
  • add options
    • --verbose, print to stdout more details.
    • --version --verbose shows the version + the commit hash + build date + target triple of the compiler.
    • --target TARGET, where TARGET is the target triple used when compiling with LLVM.
    • --emit [asm|llvm-ir|llvm-bc|obj], comma separated list of types of output for the compiler to emit.
    • --pkg-type [bin|zlib|lib|dylib|staticlib], what type of output the compiler will emit. zomlib will be covered in an RFC; for now it won't be implemented, but the option will be there.
    • --pkg-name NAME, specify the name of the package, used when the output filename isn't specified to determine a filename.
    • --link-dir -L PATH, add a directory to the library search path.
    • --link -l NAME[:RENAME], link the generated package to the specified native library NAME. The RENAME is used if in the Zom source of the package you use another name for the library NAME.
    • --error -E [human|json], how stdout will look. human prints nicely formatted, human-readable error messages. json prints the errors, if there are any, as JSON.
    • --opt LEVEL, optimization level, used by LLVM optimizer.
    • --output -O FILENAME, write output to FILENAME
    • --output-dir DIR, write the output in the DIR with the filename based on the package name.
  • remake a panic hook like before.
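
Two of the option formats above, the comma-separated `--emit` list and the `NAME[:RENAME]` value of `--link`, can be parsed with the standard library alone. This is a sketch: `EmitKind`, `parse_emit` and `parse_link` are hypothetical names, and the real zomc may well use an argument-parsing crate instead.

```rust
/// Output types accepted by `--emit`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EmitKind {
    Asm,
    LlvmIr,
    LlvmBc,
    Obj,
}

/// Parses the value of `--emit`, e.g. `asm,obj`.
/// Fails on the first unknown output type.
fn parse_emit(arg: &str) -> Result<Vec<EmitKind>, String> {
    arg.split(',')
        .map(|kind| match kind.trim() {
            "asm" => Ok(EmitKind::Asm),
            "llvm-ir" => Ok(EmitKind::LlvmIr),
            "llvm-bc" => Ok(EmitKind::LlvmBc),
            "obj" => Ok(EmitKind::Obj),
            other => Err(format!("unknown emit type `{}`", other)),
        })
        .collect()
}

/// Parses the value of `--link`, `NAME[:RENAME]`,
/// into the library name and the optional rename.
fn parse_link(arg: &str) -> (&str, Option<&str>) {
    match arg.split_once(':') {
        Some((name, rename)) => (name, Some(rename)),
        None => (arg, None),
    }
}
```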

Change name

At first, Mona was chosen because I had the idea of making the syntax in French, and, I don't know why, the idea of Mona Lisa (la Joconde in French) popped into my head. But after almost a month, I don't like the name Mona any more. The project has changed a lot: the first idea was an interpreted language like Python, but now it's a safe and still light alternative to C.

Todo after the new name is found

  • Create a branch
  • Rename all crates with the new name <name>_fe, <name>_codegen etc
  • Replace all Mona in the source code
  • Replace all Mona in the docs.
  • Merge the branch, with a PR, referencing this issue #32
  • In the open issues, replace Mona
  • Change the GitHub description with the new name
  • Rename and move to its own organization
  • Replace all links to this repo to the new one (with the new name)
  • Re-add projects to all issues / PRs

Link files with LLD

  • Create an lld library to interact with from Rust and later from Zom.
  • Link these crate types: bin, staticlib, lib, dylib.
  • Add an argument, stripped, to strip a binary or a lib.
  • Create the zlib file format. It will contain all metadata, such as function names, full function paths, enum entries, struct fields and types, everything. It will be used to compile Zom libraries and will be the default format for Zom libs. That will facilitate the compiler's job, in particular for the @Import(..) builtin function.

Questions
I don't know exactly whether the metadata facilities provided by LLVM could do the job, or whether a .zlib will be an archive that contains the compiled code + metadata.

Improve multiple errors output in the Parser

It's said in e769064 that the parser can partially return multiple errors. That's expected: it can only return multiple errors when an unexpected token is in the while let loop of the function parse(), because past that point the smaller functions need to return something, and for the moment I don't know how to do it.

But it's planned that, when an error occurs while parsing a function definition or declaration, the parser skips the tokens related to this AST node, pushes the error into the list, and continues parsing the other function definitions / declarations, etc.

Originally posted by @Larsouille25 in #30 (comment)

Some improvements :

  • (MAYBE) in the expect_token! macro, when the token we got isn't the one we wanted, push the error with context.push_err and try to call the macro again, but I don't know if it's a good idea.
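
The planned recovery, skipping the tokens of a broken function and resuming at the next one, is often called synchronization. Here is a minimal sketch of that step under simplifying assumptions: the `Tok` enum and the `synchronize` function are hypothetical, and a `Func` token is taken to be the only place where a new item can start.

```rust
/// Simplified token type for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Func,
    Ident(String),
    Other(char),
}

/// Called after an error at index `pos`: returns the index of the next
/// `Func` token after `pos`, or `tokens.len()` if there is none.
fn synchronize(tokens: &[Tok], pos: usize) -> usize {
    // Always skip the offending token so the parser makes progress.
    let mut next = pos + 1;
    while next < tokens.len() && tokens[next] != Tok::Func {
        next += 1;
    }
    next.min(tokens.len())
}
```

The parse loop would push the error into its list, call `synchronize`, and continue from the returned index, so one broken function does not hide errors in the following ones.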

CI bug

Describe the bug
The CI doesn't have LLVM, so when working code is pushed, there is a compile error.

Expected behavior
If the code works, the CI should report that it works.

Adding Tests

To verify and test Mona, it should have unit tests.

Add unit / integration tests for every important function / struct like Lexer, Parser, ASTNode, etc.

  • zomc
  • zom_fe
  • zom_common
  • zom_codegen
  • zom_compiler

Replace anyhow error with custom errors

In the binary zom, replace all anyhow errors with custom made errors in zom_common.

  • implement #4 with a "reverse lexer"
  • Create the ZomError struct,
  • implement the Error trait.
  • Make parameter location optional.
  • Replace current structs like IllegalCharError with a function (or macro) that creates a new ZomError.
  • Add EOF token
  • In crate zom, replace anyhow errors with ZomError.
  • Remove anyhow = { version = "1.0.71", default-features = false }, in zom/Cargo.toml
  • Create a panic hook to create and print the internal error.

To facilitate debugging of the code and understanding why there are errors,

  • Transform the return type of the lexer to a Vec<Result<..,..>> instead of Result<Vec<..>,..>
  • The same for the parser: from Result<(Vec<ASTNode>, Vec<Token>), ZomError> to Result<Vec<ASTNode>, Vec<ZomError>> -> #30 (comment)
  • And try for the compiler and the codegen too if it's possible -> #44
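
The return types above fit together as follows: a stage that yields one Result per element can be folded into "all values, or all errors". The sketch below assumes a bare-bones `ZomError` with only a message; the helper name `collect_all` is made up for illustration.

```rust
/// Minimal stand-in for the real ZomError.
#[derive(Debug, PartialEq)]
struct ZomError {
    message: String,
}

/// Turns per-element results into either every value or every error,
/// so all errors can be reported at once instead of only the first.
fn collect_all<T>(results: Vec<Result<T, ZomError>>) -> Result<Vec<T>, Vec<ZomError>> {
    let mut values = Vec::new();
    let mut errors = Vec::new();
    for result in results {
        match result {
            Ok(value) => values.push(value),
            Err(error) => errors.push(error),
        }
    }
    if errors.is_empty() {
        Ok(values)
    } else {
        Err(errors)
    }
}
```

Unlike collecting into `Result<Vec<T>, ZomError>`, which stops at the first failure, this keeps going and returns every error it found.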

In another branch,

Transform documentation to RFC and document the Compiler

To clarify and standardize how Zom works, I've created the repository zom-lang/rfcs that contains all the documents, but we still need to write all the RFCs.

RFCs needed:

  • Functions, RFC-0001, tracking issue #40
  • var & const, RFC-0002
  • Primitive types & values, RFC-0003
  • Operator, RFC-0004
  • Control Flow : if, else if, else; RFC-0005
  • for loop & while loop & loop, RFC-0006
  • comments, RFC-0007, tracking issue #14
  • at the end, delete /docs/lang/ and its content.

Mark as done when the PR is merged.

=> see zom-lang/rfcs#9 so, before writing the docs

  • create a branch named feat/docs-compiler
  • delete all the content of docs/
  • make the next steps inside this branch
  • after make a pr etc..

After, document how the compiler works, with markdown document and graphs if needed, use MdBook:

  • Keywords : strict, reserved, weak
  • Syntax explanation
  • Project architecture explanation
  • Lexer
  • Parser
  • Intermediate Zom Representations ??
  • Zom Name Mangling
  • Typecheck
  • Code generation
  • Compilation
  • Linkage
  • zlib -> make an RFC for zlibs

Other docs to remake (in main branch, not in feat/docs-compiler)

Licensing

Change the Apache-2.0 + MIT dual licensing to an Apache-2.0 with LLVM Exception?

Fix the CI

Is your feature request related to a problem? Please describe.
The CI says LLVM isn't installed, but there is an action to install it. After an issue opened here, the CI needs to install llvm-config with the matching version of LLVM. (For the moment, as long as this issue is not closed, the CI will be deactivated.)

Describe the solution you'd like
Install, or rather compile, LLVM with https://github.com/llvmenv/llvmenv

Describe alternatives you've considered
Create a GitHub action that compiles LLVM and installs it, but that takes too long; maybe I'll do it later, but I don't have time now.

Make an AOT

I'm not very happy with the current project. I realize that Mona is more of an ahead-of-time (AOT) compiled programming language than a just-in-time (JIT) compiled one. So here are the steps to turn Mona into an AOT compiled one:

  • Remove all things related to JIT, so the REPL, in src/driver.rs.

  • Move src/main.rs to src/bin/main.rs because later, subcommands will have their own module in the binary.

  • #28

    • mona_fe will be the lexer, parser, token.
    • mona_common will be all common things like errors.
    • mona_codegen will be a crate that transforms an AST to "LLVM IR".
    • mona_compiler (previously mona_build) will be the thing that transforms the "LLVM IR" to object files.
    • mona will be the binary, the thing that uses all of the other ones to build a project.
      Later on, the garbage collector will be in mona_mem or mona_gc.
  • Make a CI for mona_fe

    • and mona_common
  • Add a subcommand bobj that transforms a .mn file into an object file.

    • With parameters, input file and a flag that take where the result goes.
  • #29 -> This is for 0.2.0
