kcreate / charly-vm

Fully parallel dynamically typed programming language

C++ 98.15% Shell 0.32% CMake 0.62% Python 0.91%
bytecode-interpreter compiler coroutines programming-language fiber parallel pointer-tagging

charly-vm's Introduction

Charly Programming Language Logo, Image credit: DALL-E

Charly Programming Language

Note: This is the rewrite branch of charly-vm. Lots of stuff isn't working yet. The main branch contains the previous fully functional version of charly-vm.

This launches a REPL which supports some cool stuff, but still not a lot.

./debug.sh [path/to/file.ch]

Dependencies

  • sudo apt-get install libboost-all-dev

Installation

Follow the steps below to install the charly executable on your system.

  1. git clone https://github.com/KCreate/charly-vm charly-vm
  2. cd charly-vm
  3. git checkout rewrite
  4. git submodule init
  5. git submodule update
  6. Set the CHARLYVMDIR environment variable to the project's root folder
    • e.g. export CHARLYVMDIR=/home/user/github/KCreate/charly-vm
  7. ./install.sh

The last step might request sudo permissions in order to access the relevant system directories.

Running the unit tests

$ ./tests.sh
[ 31%] Built target libcharly
[ 87%] Built target Catch2
[ 89%] Built target Catch2WithMain
[100%] Built target tests
===============================================================================
All tests passed (1422 assertions in 10 test cases)

charly-vm's People

Contributors

kcreate, tekknolagi

charly-vm's Issues

Outdated context_catchtable inside generators

Generators are not updating the context_catchtable field when there's a new table outside of the generator.

Example:

func create_gen {
  throw 25
  yield 1
}

let gen = create_gen()

try {
  gen()
} catch(e) {
  // this is never called, instead the vm halts and
  // complains that it can't find a catchtable
  print(e)
}

Store local variables inside the MemoryCell of a Frame.

This could reduce the memory footprint by a little bit, depending on how many slots there are in the Frame.

Use a similar approach as is currently implemented for strings:

  • charly_frame_read_local
  • charly_frame_write_local

Implement asynchronous file system tasks

Similar to issue #45, the asynchronous counterparts of the file system methods need to be implemented. Also, charly-side abstractions for things like streams or file descriptors need to be developed. No direct interaction with file descriptors or the like should be necessary to use these methods.

// Async File System Operations
fs_access,
fs_append_file,
fs_chmod,
fs_chown,
fs_close,
fs_copy_file,
fs_exists,
fs_fchmod,
fs_fchown,
fs_fdatasync,
fs_fstat,
fs_fsync,
fs_ftruncate,
fs_futimes,
fs_lchmod,
fs_lchown,
fs_link,
fs_lstat,
fs_mkdir,
fs_mkdtemp,
fs_open,
fs_opendir,
fs_open_sync,
fs_read,
fs_readdir,
fs_readfile,
fs_readlink,
fs_realpath,
fs_rename,
fs_rmdir,
fs_stat,
fs_symlink,
fs_truncate,
fs_unlink,
fs_unwatch_file,
fs_utimes,
fs_watch,
fs_watch_file,
fs_write,
fs_write_file,
fs_writev,

// File Descriptor Events
fd_ondata,
fd_onclose,
fd_onend,
fd_onerror,
fd_onreadable,

// Readline events
rl_onclose,
rl_online,
rl_onpause,
rl_onresume,
rl_sigcont,
rl_sigint,
rl_sigstp,

// Readline operations
rl_clearscreendown,
rl_cursorto,
rl_movecursor

Optimize short arrays

Similar to how short strings are optimized, we can optimize short arrays to store their contents in the memory cell as well.

C++14 compatibility

Remove usage of the following things:

  • std::optional
  • structured bindings
  • template folding expressions
  • try_emplace
  • emplace returning a reference to the created object
  • std::swap

Dynamic lookup of symbols?

Looks up a local variable by its string representation.

Every frame would need a new field which maps from a symbol to an offset into the environment table. The frame's local variable container is not turned into a map, so that random access to each element is still possible.

Implementing a REPL would become trivial with these instructions.

ReadDynamic

Bytecode arguments:

  • symbol
  • symbol

SetDynamic

Stack arguments:

  • value

Bytecode arguments:

  • symbol
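
The symbol-to-offset mapping described above could look roughly like the following sketch. The `Frame`, `Symbol` and `Value` types and the `declare`, `read_dynamic` and `set_dynamic` names are placeholders made up for this illustration, not the VM's actual types or instruction handlers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using Symbol = std::string;  // stand-in for the VM's symbol type (assumption)
using Value = double;        // stand-in for the VM's VALUE type (assumption)

struct Frame {
  std::vector<Value> locals;                        // still indexable by offset
  std::unordered_map<Symbol, std::size_t> offsets;  // symbol -> offset into locals

  // Adds a named local variable and returns its offset.
  std::size_t declare(const Symbol& name, Value initial) {
    assert(offsets.count(name) == 0);
    locals.push_back(initial);
    offsets[name] = locals.size() - 1;
    return locals.size() - 1;
  }

  // What a ReadDynamic handler might do: nullptr means "unknown symbol".
  const Value* read_dynamic(const Symbol& name) const {
    auto it = offsets.find(name);
    return it == offsets.end() ? nullptr : &locals[it->second];
  }

  // What a SetDynamic handler might do: false means "unknown symbol".
  bool set_dynamic(const Symbol& name, Value value) {
    auto it = offsets.find(name);
    if (it == offsets.end()) return false;
    locals[it->second] = value;
    return true;
  }
};
```

Statically resolved reads and writes can keep using the offset directly, so only code that goes through the dynamic path pays for the hash lookup.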

Add GrowEnvironmentSize instruction

Adds a variable amount of local variable slots to the current frame.

This is needed to support a REPL

GrowEnvironmentSize

Bytecode arguments:

  • count

Fix assignment to some special variables

The following code samples all fail to compile

Assignment to argument index:

$0 = foo // undefined variable $0

Assignment to class variable:

class Foo {
  property bar

  func method {
    bar = 0 // undefined variable bar
  }
}

The reason is that inside the visit_assignment method of the LVarRewriter we don't perform the same kind of checks as we do inside the visit_identifier method.

Nice interface to the compiler

Come up with a nice interface to the compiler.
Currently error handling is tedious and manual work, requiring tons of comparisons and custom error handling methods.

The compiler should eventually be invokable with just a string, spitting out an instruction block in the process. Any error should be handled automatically and printed to some output stream.

Source mappings

  • Map ranges of instructions to row / column pairs in the source file.
  • Exceptions thrown can now show where the error occurred.
  • Allows showing a short highlighted snippet of code from where the error occurred.
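
One possible shape for such a mapping is a sorted list of instruction ranges that is searched with a binary search. The `SourceMap` and `SourceMapEntry` names and the half-open `[begin, end)` range convention are assumptions made for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One mapped range of instruction offsets, [begin, end), and its source position.
struct SourceMapEntry {
  uint32_t begin;
  uint32_t end;
  uint32_t row;
  uint32_t column;
};

class SourceMap {
 public:
  // Entries must be added in ascending order and must not overlap.
  void add(uint32_t begin, uint32_t end, uint32_t row, uint32_t column) {
    assert(begin < end);
    assert(entries_.empty() || entries_.back().end <= begin);
    entries_.push_back({begin, end, row, column});
  }

  // Returns the entry covering the instruction offset, or nullptr if unmapped.
  const SourceMapEntry* lookup(uint32_t offset) const {
    // Find the first entry that starts after the offset, then step back by one.
    auto it = std::upper_bound(
        entries_.begin(), entries_.end(), offset,
        [](uint32_t off, const SourceMapEntry& e) { return off < e.begin; });
    if (it == entries_.begin()) return nullptr;
    --it;
    return offset < it->end ? &*it : nullptr;
  }

 private:
  std::vector<SourceMapEntry> entries_;
};
```

When an exception is thrown, the VM could pass the offset of the faulting instruction to `lookup` and use the returned row and column to print the highlighted snippet.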

Peephole Optimizations

The compiler should rewrite small inefficiencies like the following:

Example:

; old
setlocal 1, 0
readlocal 1, 0

; new
setlocalpush 1, 0
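
A minimal sketch of this rewrite as a single pass over an instruction list is shown below. The `Instruction` layout and the meaning of the two operands (named `index` and `level` here) are assumptions, as is the `fuse_set_read` name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Opcode { SetLocal, ReadLocal, SetLocalPush };

// Hypothetical instruction layout: an opcode and two operands.
struct Instruction {
  Opcode op;
  uint32_t index;
  uint32_t level;
};

// Replaces every "setlocal a, b; readlocal a, b" pair with "setlocalpush a, b".
std::vector<Instruction> fuse_set_read(const std::vector<Instruction>& in) {
  std::vector<Instruction> out;
  out.reserve(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    const bool fusable = in[i].op == Opcode::SetLocal && i + 1 < in.size() &&
                         in[i + 1].op == Opcode::ReadLocal &&
                         in[i + 1].index == in[i].index &&
                         in[i + 1].level == in[i].level;
    if (fusable) {
      out.push_back({Opcode::SetLocalPush, in[i].index, in[i].level});
      ++i;  // skip the readlocal that was folded into the new instruction
    } else {
      out.push_back(in[i]);
    }
  }
  return out;
}
```

This sketch ignores control flow. A real pass would also have to make sure that the readlocal is not a jump target, and that jump offsets are adjusted after instructions are removed.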

Keep track of module and function addresses

Store the addresses of compiled methods and modules somewhere in an index. This is useful to resolve paths inside the file import logic and also makes it very easy to display a nice stack-trace.

The idea is to keep two indexes somewhere:

  • The function index
  • The module index

The function index stores the addresses and names of functions that are generated during the compilation process. Contains an index into the module index, pointing to the module this function is contained inside.

The module index stores the addresses and filenames of files that were compiled.
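
As a rough illustration, the two indexes could be plain vectors, with each function entry holding an index into the module vector. All type and method names below (`AddressIndex`, `add_module`, `add_function`, `describe`) are invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct ModuleEntry {
  uint64_t address;      // address of the compiled module
  std::string filename;  // file the module was compiled from
};

struct FunctionEntry {
  uint64_t address;    // address of the compiled function
  std::string name;    // function name
  std::size_t module;  // index into the module index
};

struct AddressIndex {
  std::vector<ModuleEntry> modules;
  std::vector<FunctionEntry> functions;

  // Registers a module and returns its position in the module index.
  std::size_t add_module(uint64_t address, const std::string& filename) {
    modules.push_back({address, filename});
    return modules.size() - 1;
  }

  void add_function(uint64_t address, const std::string& name, std::size_t module) {
    assert(module < modules.size());
    functions.push_back({address, name, module});
  }

  // Builds one stack-trace line for the function at the given address.
  std::string describe(uint64_t address) const {
    for (const FunctionEntry& fn : functions) {
      if (fn.address == address) {
        return fn.name + " (" + modules[fn.module].filename + ")";
      }
    }
    return "<unknown>";
  }
};
```

The linear search keeps the sketch short. An index that is queried often would more likely be sorted by address or backed by a hash map.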

Omit the arguments field if not needed

The arguments array should only be inserted into functions which need it.
The usage of the arguments symbol can be detected at compile-time and would then set a flag in the AST node of the function.

The VM would then only need to create the arguments array for functions which have the variadic attribute.

Exception safety in the compiler

Allocated AST nodes have to be deallocated if an exception is thrown.
Currently we leak tons of memory because of this.

shared_ptr might be able to solve this issue.

Also solvable without shared_ptr using some template shenanigans to automatically register each allocated node in a vector the parser later cleans up on an error.
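
The second idea, registering every allocated node in a container owned by the parser, could be sketched as follows. `NodeArena`, `NumberNode` and the failing `parse_with_error` function are hypothetical and only exist to demonstrate the cleanup.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

struct Node {
  virtual ~Node() = default;
};

// Example node type that counts live instances so leaks become visible.
struct NumberNode : Node {
  explicit NumberNode(double v) : value(v) { ++live; }
  ~NumberNode() override { --live; }
  double value;
  static int live;
};
int NumberNode::live = 0;

// Owns every node it creates; destroying the arena frees all of them.
class NodeArena {
 public:
  template <typename T, typename... Args>
  T* make(Args&&... args) {
    auto node = std::make_unique<T>(std::forward<Args>(args)...);
    T* raw = node.get();
    nodes_.push_back(std::move(node));
    return raw;
  }

 private:
  std::vector<std::unique_ptr<Node>> nodes_;
};

// Hypothetical parse step that allocates two nodes and then fails.
void parse_with_error(NodeArena& arena) {
  arena.make<NumberNode>(1.0);
  arena.make<NumberNode>(2.0);
  throw std::runtime_error("unexpected token");
}

// Returns true if no node survived the failed parse.
bool parse_and_recover() {
  try {
    NodeArena arena;
    parse_with_error(arena);
  } catch (const std::runtime_error&) {
    // The arena was destroyed during stack unwinding, before this handler ran.
    return NumberNode::live == 0;
  }
  return false;
}
```

In this sketch the arena owns the nodes for the whole parse, so a successful parse would have to keep the arena alive next to the AST or transfer ownership out of it.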

Embeddability

Should the VM be embeddable into other programs or should it only work as a standalone program?

AND assignment for bitwise operators

Implement the following AND assignment operators:

&=   // AND assignment
|=   // OR assignment
^=   // XOR assignment
<<=  // left-shift assignment
>>=  // right-shift assignment

Import statement

Description

import statements are used to include libraries and other charly source files.

Import

  • following the import keyword with an identifier will import that identifier
// source
import foo

// desugared
const foo = __import("foo", "/tmp/1234")

Import from expression

  • if the library name is anything other than an identifier it will be treated as an expression
    • this can be used to dynamically import files
  • the expression may be wrapped in parens
    • note: parens cannot be used to form a tuple-like syntax
// source
import "myfile.ch"
import (foo) as foolib
import getlibname() as lib

// desugared
__import("myfile.ch", "/tmp/1234") // no declarations generated
const foolib = __import(foo, "/tmp/1234")
const lib = __import(getlibname(), "/tmp/1234")

Renamed import

  • the as keyword can be used to assign the imported library to a different name
// source
import foo as myfoo
import "bar" as mybar

// desugared
const foo = __import("foo", "/tmp/1234")
const myfoo = foo
const mybar = __import("bar", "/tmp/1234")

Import specific fields

  • curly braces can be used to extract specific fields from a library
  • imported fields can be renamed via an as keyword
    • both declarations should be accessible
  • the library name needs to be passed after a from keyword
  • this syntax can be combined with the as syntax
// source
import { open, read, close } from fs
import { open, read, close } from fs as thefslib
import { foo as f, bar as b } from foobarlib

// desugared
const fs = __import("fs", "/tmp/1234")
const { open, read, close } = fs

const fs = __import("fs", "/tmp/1234")
const thefslib = fs
const { open, read, close } = fs

const foobarlib = __import("foobarlib", "/tmp/1234")
const { foo, bar } = foobarlib
const f = foo
const b = bar

Module resolution

  • Runtime has table of known builtin modules
    • Can return those directly without any filesystem lookups
  • Absolute paths (paths starting with a /)
    • Path is looked up directly
    • No further lookups are done
  • All other paths are looked up in the folder hierarchy of the file where the import occurred

The import statement import "somelib" in the file /foo/bar/baz/index.ch would result in the following filesystem lookups:

  • /foo/bar/baz/somelib
  • /foo/bar/baz/somelib.ch
  • /foo/bar/somelib
  • /foo/bar/somelib.ch
  • /foo/somelib
  • /foo/somelib.ch
  • /somelib
  • /somelib.ch
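
The lookup order listed above can be reproduced with a small helper that walks up the directory hierarchy. The `lookup_candidates` name is made up for this sketch, and the table of builtin modules is left out. The sketch also assumes that the importing file is given as an absolute path.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the filesystem paths to try, in order, for an import of `name`
// that occurs in the file `importing_file`.
std::vector<std::string> lookup_candidates(const std::string& name,
                                           const std::string& importing_file) {
  assert(!name.empty());
  assert(!importing_file.empty() && importing_file[0] == '/');

  // Absolute paths are looked up directly, with no further lookups.
  if (name[0] == '/') return {name};

  std::vector<std::string> candidates;
  // Directory of the importing file; the empty string stands for the root.
  std::string dir = importing_file.substr(0, importing_file.rfind('/'));
  while (true) {
    candidates.push_back(dir + "/" + name);
    candidates.push_back(dir + "/" + name + ".ch");
    if (dir.empty()) break;               // the root directory was just handled
    dir = dir.substr(0, dir.rfind('/'));  // move up to the parent directory
  }
  return candidates;
}
```

Calling `lookup_candidates("somelib", "/foo/bar/baz/index.ch")` yields the eight paths from the list above in the same order.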

Module caching

  • When a module is executed, a reference to it is stored in a cache somewhere
  • If in the future the same module gets included again, the cached version is returned
    • The cache stores the mtime of the source file
    • If the file was modified after it was put into the cache, the cache entry is cleared and the module is executed again
  • If two fibers import the same module, only a single instance of the module should be created
    • Only a single module should be executed
    • Fibers need to acquire some kind of lock on the import path
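
A very reduced sketch of such a cache is shown below. It uses one global `std::mutex` instead of a lock per import path, takes the mtime as a parameter instead of reading it from the filesystem, and uses an `int` as a stand-in for a module. All names (`ModuleCache`, `Loader`, `import`) are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>

using Module = int;  // stand-in for a reference to an executed module

class ModuleCache {
 public:
  using Loader = std::function<Module(const std::string&)>;

  // Returns the cached module for `path`, executing it via `load` if it is
  // not cached yet or if the file was modified after it was cached.
  Module import(const std::string& path, int64_t mtime, const Loader& load) {
    std::lock_guard<std::mutex> guard(mutex_);  // one importer at a time
    auto it = entries_.find(path);
    if (it != entries_.end() && it->second.mtime >= mtime) {
      return it->second.module;
    }
    Module fresh = load(path);
    entries_[path] = Entry{mtime, fresh};
    return fresh;
  }

 private:
  struct Entry {
    int64_t mtime;
    Module module;
  };
  std::mutex mutex_;
  std::unordered_map<std::string, Entry> entries_;
};
```

Because the single lock is held while the loader runs, a module that imports another module from inside the loader would deadlock in this sketch. Locking per import path, as the list above suggests, avoids that and lets unrelated imports run in parallel. A fiber-based runtime would probably also want a lock that suspends the fiber rather than blocking the OS thread.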

String optimizations

Currently a string is allocated every time there is a string operation.

  • Short string optimization, store strings in the MemoryCell itself.
  • Copy on write if strings are copied / moved around.
  • Store 6 bytes of string in the VALUE itself (will require NAN-boxing)

VM Primitive classes mapping

The VM has to keep track of the primitive classes.
This is achievable by storing them in a separate field in the VM class.

The Object, Array variables accessible from user programs are merely additional pointers to the classes and are not relevant to the execution at all. If the user overwrites these variables with something else it should have no impact on the program.

Implement file system library

Implement the main synchronous file system calls:

  • fs_open
  • fs_read
  • fs_close
  • fs_stat
  • fs_lstat
  • fs_fstat
  • fs_gets
  • fs_exists
  • fs_print
  • fs_flush
  • fs_read_bytes
  • fs_read_char
  • fs_write_byte
  • fs_expand_path
  • fs_fd_path
  • fs_unlink
  • fs_readdir
  • fs_mkdir
  • fs_rmdir
  • fs_chmod
  • fs_chown
  • fs_link
  • fs_symlink
  • fs_readlink
  • fs_rename
  • fs_utime
  • fs_writable
  • fs_readable
  • fs_truncate
  • fs_raw

Fiber deadlock detection

  • Fibers awaiting themselves
  • Two fibers awaiting each other
  • A ring of fibers awaiting each other, forming a circle
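
All three cases are cycles in a "waits on" graph. Assuming that a fiber awaits at most one other fiber at a time, a check before suspending could follow the chain of awaited fibers, as in this sketch. The `FiberId` type and the `would_deadlock` function are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

using FiberId = uint64_t;

// waits_on[f] is the fiber that f is currently awaiting.
// Returns true if letting `waiter` await `target` could never complete.
bool would_deadlock(const std::unordered_map<FiberId, FiberId>& waits_on,
                    FiberId waiter, FiberId target) {
  FiberId current = target;
  // A chain without a cycle visits at most waits_on.size() + 1 fibers.
  for (std::size_t steps = 0; steps <= waits_on.size(); ++steps) {
    if (current == waiter) return true;      // the chain leads back to the waiter
    auto it = waits_on.find(current);
    if (it == waits_on.end()) return false;  // the chain ends at a runnable fiber
    current = it->second;
  }
  // The chain never ended: the target is stuck in a cycle that already exists.
  return true;
}
```

With an empty map, `would_deadlock(waits, 1, 1)` covers the first case (a fiber awaiting itself). Longer chains cover the two-fiber and ring cases.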

Stack allocate some variables

Allow the local variable allocator to store some variables on the stack rather than in a frame. This would help with the implementation of a match statement.

The variable allocator should know what is on the stack at every moment.

This would also require three new instructions (ReadStack, SetStack, SetStackPush)

Special syntax for variadic functions

The following syntax should be implemented:

func foo(a, b, c...) {
  typeof a // number
  typeof b // number
  typeof c // array       [3, 4, 5, 6]
}

foo(1, 2, 3, 4, 5, 6)

Document the whole codebase

This will help to get an overview of the whole project and maybe to learn some techniques for generating documentation.

Primitive classes

Primitive classes are the classes which contain methods available on the language's primitive values.

The list of primitive classes is:

Object
Class
Array
String
Number
Function
Boolean
Null

If a symbol isn't found inside a primitive class, the Object primitive class is searched as well.

Make the GC and the vm's object-creation interface thread-safe

Methods that are implemented directly in C and are currently running in a separate thread (via a worker thread, for example) need to be able to temporarily lock the garbage collector for themselves so that they can allocate their values and mark them as temporaries.

VM bootstrapping

Things which need to be performed on VM startup:

  • Initializing the basic global objects
  • Loading basic libraries
    • io
    • array
    • string
    • all other primitive classes
  • Setup aliases for some library methods
    • print for io.stdout.print
    • write for io.stdout.write
    • etc.

Void* wrappers for C++ modules

Modules for Charly written in C++ can return a new datatype to the VM: CPointer.

struct CPointer {
  Basic basic;
  uintptr_t data;
  uintptr_t destructor;
};

The uintptr_t destructor field holds a function pointer. The signature of the destructor is as follows: void destructor(uintptr_t data).

The garbage-collector calls the destructor method once the object is being freed.
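
The sketch below shows how a module might fill in such a value and what the garbage collector could do when it frees it. The `Basic` header is left out to keep the example self-contained, and `finalize_cpointer`, `delete_int` and `wrap_int` are names invented for this illustration. Storing a function pointer in a `uintptr_t` relies on the platform allowing that conversion, which mainstream platforms do.

```cpp
#include <cassert>
#include <cstdint>

// Simplified version of the struct above, without the Basic header.
struct CPointer {
  uintptr_t data;
  uintptr_t destructor;
};

using CPointerDestructor = void (*)(uintptr_t data);

// What the garbage collector might do when it frees a CPointer.
inline void finalize_cpointer(CPointer& ptr) {
  if (ptr.destructor != 0) {
    auto fn = reinterpret_cast<CPointerDestructor>(ptr.destructor);
    fn(ptr.data);
  }
  // Clear both fields so a second call becomes a no-op.
  ptr.data = 0;
  ptr.destructor = 0;
}

// Module side: a destructor matching void destructor(uintptr_t data).
inline void delete_int(uintptr_t data) {
  delete reinterpret_cast<int*>(data);
}

// Module side: wraps a heap-allocated int in a CPointer.
inline CPointer wrap_int(int value) {
  return CPointer{reinterpret_cast<uintptr_t>(new int(value)),
                  reinterpret_cast<uintptr_t>(&delete_int)};
}
```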

Testing

Probably something similar to the way the old Interpreter written in Crystal is tested.

super inside class instance methods

  • If self is an object, check the klass value
  • Check each parent if they have an instance method with the same name as the original function

The super functionality could also be implemented on top of other existing bytecodes, but making it its own instruction gives us a bit more wiggle room.

PutSuper

Bytecode arguments:

  • symbol

Garbage Collector optimizations

  • Reorder the freelist every once in a while to make sure cells which are allocated sequentially are also beneath each other in physical memory. This should (hopefully) improve the cache locality of the data.

  • Treat persistent nodes as root nodes instead of comparing every cell to every persistent node on every single mark phase.

Add DeleteMemberSymbol & DeleteMemberValue instructions

Allows the user to remove a symbol from an object.

// DeleteMemberSymbol
const obj = { foo: 25 }
delete obj.foo
obj // {}

// DeleteMemberValue
const obj = { foo: 25 }
const key = "foo"
delete obj[key]
obj // {}

DeleteMemberSymbol

Stack Arguments:

  • target

Bytecode Arguments:

  • symbol

DeleteMemberValue

Stack Arguments:

  • target
  • value

NAN-box floats

NAN-boxing allows us to encode full 64 bit floating point numbers in the VALUE type.
This way we can store pointers, integers, booleans and null in a single 64-bit number.

This allows the removal of the PutFloat instruction.

Because floats are now just as memory-efficient as integers, remove the use of integers as numeric values. This also removes a lot of type-checking as there is only one numeric type left.
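
To make the idea concrete, here is one possible encoding, not necessarily the one the VM would use. Doubles are stored as their raw bits, and every other value lives in the negative quiet-NaN space with a 3-bit tag and a 48-bit payload. The masks, the tag numbers and the assumption that pointers fit into 48 bits are all choices made for this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

using VALUE = uint64_t;

constexpr VALUE kBoxMask      = 0xfff8000000000000ULL;  // sign, exponent, quiet bit
constexpr VALUE kTagMask      = 0x0007000000000000ULL;  // 3 tag bits
constexpr VALUE kPayloadMask  = 0x0000ffffffffffffULL;  // 48 payload bits
constexpr VALUE kCanonicalNaN = 0x7ff8000000000000ULL;  // the only NaN kept as a double

enum Tag : uint64_t { kPointer = 1, kBoolean = 2, kNull = 3 };

inline VALUE box(Tag tag, uint64_t payload) {
  return kBoxMask | (static_cast<uint64_t>(tag) << 48) | (payload & kPayloadMask);
}

// Everything outside the negative quiet-NaN space is a plain double.
inline bool is_double(VALUE v) { return (v & kBoxMask) != kBoxMask; }

inline bool has_tag(VALUE v, Tag tag) {
  return !is_double(v) && ((v & kTagMask) >> 48) == tag;
}

inline VALUE encode_double(double d) {
  // Canonicalize NaNs so that no real NaN collides with a boxed value.
  if (std::isnan(d)) return kCanonicalNaN;
  VALUE v;
  std::memcpy(&v, &d, sizeof v);
  return v;
}

inline double decode_double(VALUE v) {
  double d;
  std::memcpy(&d, &v, sizeof d);
  return d;
}

inline VALUE encode_pointer(void* p) {
  const auto bits = static_cast<uint64_t>(reinterpret_cast<uintptr_t>(p));
  assert((bits & ~kPayloadMask) == 0);  // assumes 48-bit user-space pointers
  return box(kPointer, bits);
}

inline void* decode_pointer(VALUE v) {
  return reinterpret_cast<void*>(static_cast<uintptr_t>(v & kPayloadMask));
}

inline VALUE encode_bool(bool b) { return box(kBoolean, b ? 1 : 0); }
inline bool decode_bool(VALUE v) { return (v & kPayloadMask) != 0; }
inline VALUE encode_null() { return box(kNull, 0); }
```

Checking whether a value is a number is a single mask and compare in this layout, which fits the goal of having only one numeric type left.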
