hirevo / som-rs

An alternative implementation of the Simple Object Machine, written in Rust

Home Page: https://hirevo.github.io/som-rs/som_interpreter_bc/index.html

License: Apache License 2.0

Rust 100.00%
rust interpreter parser lexer vm language som smalltalk

som-rs's Introduction

The Simple Object Machine

This is an implementation of the Simple Object Machine, written in Rust.

This project includes two SOM interpreters:

  • som-interpreter-ast: An abstract-syntax-tree-based interpreter.
  • som-interpreter-bc: A bytecode-based interpreter.

This repository is organized as a Cargo workspace, containing multiple crates (libraries).
Here is a rundown of these crates (as of now; the layout may change in the future):

Name                  Description
som-core              Core SOM types and abstractions, shared across the workspace.
som-interpreter-ast   The AST-based SOM interpreter library and binary.
som-interpreter-bc    The bytecode-based SOM interpreter library and binary.
som-lexer             The SOM lexical analyzer.
som-parser-core       The common foundational types for building parsers.
som-parser-text       A SOM parser that works directly with text (without a lexer).
som-parser-symbols    A SOM parser that works with som-lexer's output.

How to build and run

The interpreter is already usable: you can use it to evaluate a file, or start a Read-Eval-Print Loop (REPL) to play around with the language.

To compile the program, you'll need to have Rust installed on your machine.
You can find the installation instructions on the official Rust language website.
We recommend using whatever is the latest stable toolchain (which was 1.44 at the time of this writing).

Once you have Rust installed, simply run:

# the '--release' flag indicates to build with optimizations enabled.
# you can remove this flag if you wish to have more debug information in the emitted binary.
cargo build --release

This will compile the project and take care of fetching and building dependencies for you.
Once the build is finished, you should have a target/ folder created in the current directory.
You'll find the interpreter's binaries at ./target/release/som-interpreter-{ast,bc}.

To start the REPL, you can run:

# the '-c' flag instructs the interpreter where to find the SOM standard library.
./target/release/som-interpreter-bc -c core-lib/Smalltalk

# you can pass multiple paths to '-c' by just keeping on adding arguments.
./target/release/som-interpreter-bc -c core-lib/Smalltalk core-lib/Examples

You'll get a prompt in which you can type SOM expressions and see what they get evaluated to.
The REPL makes the latest value successfully evaluated available via the it variable, so you can keep poking at the result of a previous expression, like this:

(0) SOM Shell | #(4 5 6)
returned: #(4 5 6) (Array([Integer(4), Integer(5), Integer(6)]))
(1) SOM Shell | it at: 2
returned: 5 (Integer(5))
(2) SOM Shell | it timesRepeat: [ 'hello from SOM !' println ]
hello from SOM !
hello from SOM !
hello from SOM !
hello from SOM !
hello from SOM !
returned: 5 (Integer(5))

To evaluate from a file, simply pass the file as another argument to the interpreter.
But, since '-c' accepts multiple paths, you might need to add the '--' argument before that file, like so:

./target/release/som-interpreter-bc -c core-lib/Smalltalk -- core-lib/Examples/Hello.som

When using the bytecode interpreter, you have the option to disassemble a given class's methods using -d (or --disassemble), like so:

./target/release/som-interpreter-bc -c core-lib/Smalltalk -d core-lib/Examples/Hello.som
# OR:
./target/release/som-interpreter-bc -c core-lib/Smalltalk core-lib/Examples -d Hello
# OR (for disassembling only specific methods):
./target/release/som-interpreter-bc -c core-lib/Smalltalk core-lib/Examples -d Hello first:method: second:method:

For other purposes, you can use '-h' (or '--help') to print the complete help message:

./target/release/som-interpreter-bc -h

License

Unless otherwise noted (below and/or in individual files), this project is licensed under either of

at your option.

The SOM core library (in the core-lib/ folder) is licensed under SOM's own terms:

https://github.com/SOM-st/SOM

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

som-rs's People

Contributors

hirevo, octavelarose

som-rs's Issues

`Object>>#hashcode` recurses infinitely on objects with cyclic references

While working with graphs to test various pathfinding algorithms in SOM, I discovered that the current implementation of the Object>>#hashcode primitive recurses infinitely if the object contains a reference cycle anywhere in itself and/or in its locals.

Minimal example code:

CyclicDataTest = (
    run: args = (
        | first second |

        first := Array new: 1.
        second := Array new: 1.

        first at: 1 put: second.
        second at: 1 put: first.

        " this call to `Object>>#hashcode` recurses infinitely until the stack overflows "
        first hashcode println.
    )
)

Running this code results in:

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
Aborted
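
One possible way to avoid the unbounded recursion is to remember which arrays are currently being visited and to stop when one of them is reached again. The following is only a minimal sketch of that idea, not the interpreter's actual code: the Value enum, the ArrayRef alias and the hash_value/hashcode functions are simplified, hypothetical stand-ins.

```rust
use std::cell::RefCell;
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

/// Hypothetical shared, mutable array storage (not som-rs's real type).
pub type ArrayRef = Rc<RefCell<Vec<Value>>>;

/// Hypothetical, simplified stand-in for the interpreter's value type.
#[derive(Clone)]
pub enum Value {
    Integer(i64),
    Array(ArrayRef),
}

/// Feeds `value` into `hasher`. `visiting` holds the arrays on the current
/// traversal path, so a back-reference is hashed as a marker, not followed.
fn hash_value(
    value: &Value,
    hasher: &mut DefaultHasher,
    visiting: &mut HashSet<*const RefCell<Vec<Value>>>,
) {
    match value {
        Value::Integer(n) => {
            0u8.hash(hasher);
            n.hash(hasher);
        }
        Value::Array(items) => {
            1u8.hash(hasher);
            let ptr = Rc::as_ptr(items);
            if !visiting.insert(ptr) {
                // This array is already being hashed further up the stack:
                // emit a marker and stop instead of recursing forever.
                2u8.hash(hasher);
                return;
            }
            {
                let items = items.borrow();
                items.len().hash(hasher);
                for item in items.iter() {
                    hash_value(item, hasher, visiting);
                }
            }
            visiting.remove(&ptr);
        }
    }
}

/// Computes a non-negative hash that terminates even on cyclic data.
pub fn hashcode(value: &Value) -> i64 {
    let mut hasher = DefaultHasher::new();
    let mut visiting = HashSet::new();
    hash_value(value, &mut hasher, &mut visiting);
    // Dropping the lowest bit keeps the result in the non-negative i64 range.
    (hasher.finish() >> 1) as i64
}
```

With two arrays that reference each other, as in the SOM example above, hashcode returns instead of overflowing the stack. Hashing objects by identity rather than by content would be another way to sidestep the cycle entirely.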

Improper parsing of symbols

The current state of how symbols are parsed in both interpreters in som-rs is somewhat non-standard, compared to other SOMs.

This issue tracks the cases where som-rs behaves differently from other SOMs, in order to get them all fixed.

Here are the problematic cases that I am currently aware of:

  • Spaces between # and identifier (ex: # foo, accepted by most SOMs, rejected by som-rs)
  • Spaces between # and operator (ex: # +, accepted by most SOMs, rejected by som-rs)
  • Spaces between # and string literal (ex: # 'foo', accepted by most SOMs, rejected by som-rs)
  • Non-leading successive colons in selector (ex: #foo::, rejected by most SOMs, accepted by som-rs)
  • Leading digits after colons (ex: #foo:2:, rejected by most SOMs, accepted by som-rs)

Somewhat related to this issue is the situation with array literals, which suffer from a similar problem due to also using the # in the syntax:

  • Spaces between # and ( (ex: # (1 2 3), accepted by most SOMs, rejected by som-rs)

Most of these issues are due to the fact that the lexer is currently tokenizing the whole symbol at once (as: Token::Symbol(String)) instead of simply outputting its fragments (something like: [Token::Pound, Token::Selector(String)]).
Delegating the construction of the symbol to the parser would likely be the way forward to address these problems.
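
To illustrate the direction suggested above, here is a rough sketch of a parser assembling a symbol from separate tokens. Everything in it is an assumption made for illustration: the Token variants (Pound, Identifier, Keyword, Operator, LitString, Colon) and the parse_symbol function are hypothetical and do not mirror som-lexer's actual types.

```rust
/// Hypothetical token set: the lexer emits `#` on its own
/// instead of tokenizing the whole symbol at once.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Pound,
    Identifier(String),
    /// A keyword part including its trailing colon, e.g. "at:".
    Keyword(String),
    Operator(String),
    LitString(String),
    Colon,
}

/// Tries to parse a symbol literal at the start of `tokens`.
/// On success, returns the symbol's name and the remaining tokens.
pub fn parse_symbol(tokens: &[Token]) -> Option<(String, &[Token])> {
    let (first, rest) = tokens.split_first()?;
    if *first != Token::Pound {
        return None;
    }
    match rest.split_first()? {
        // `#foo`, `#+` and `#'foo'` are each a single fragment after the `#`.
        (Token::Identifier(name), rest)
        | (Token::Operator(name), rest)
        | (Token::LitString(name), rest) => Some((name.clone(), rest)),
        // `#at:put:` is a run of one or more keyword parts.
        (Token::Keyword(_), _) => {
            let mut name = String::new();
            let mut rest = rest;
            while let Some((Token::Keyword(part), tail)) = rest.split_first() {
                name.push_str(part);
                rest = tail;
            }
            Some((name, rest))
        }
        // Anything else after `#` (e.g. a bare colon) is not a symbol.
        _ => None,
    }
}
```

Because whitespace is dropped between tokens in this scheme, `# foo` and `#foo` would reach the parser as the same token sequence. A stray colon after a keyword part is simply left in the remaining tokens for the caller to reject.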

`Object>>#hashcode` primitive is broken

The Object>>#hashcode primitive is broken.
With the current implementation, the hash of the same value can change when it is recomputed.
This is caused by the following code (the current implementation):

let mut hasher = DefaultHasher::new();

// Should be fine, since we do not mutate anything ??
let raw_bytes: &[u8] = unsafe {
    std::slice::from_raw_parts(
        (&value as *const Value) as *const u8,
        std::mem::size_of_val(&value),
    )
};
hasher.write(raw_bytes);

let hash = (hasher.finish() as i64).abs();

Return::Local(Value::Integer(hash))

The unsafe bit is the broken bit (predictably):
It takes an instance of Value, which is an enum, and hashes all of its bytes.
The reason it is broken has to do with the fact that Rust implements enums as tagged unions.
Within a union, not every variant uses all of the bytes, meaning that there can be some uninitialized bytes within an enum (which is normally not a problem because Rust doesn't expose these bytes to safe code in any way).
These uninitialized bytes are why a hash can change, even when the value is actually the same.
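
A safe alternative is to hash only the data each variant actually carries, rather than the enum's raw bytes. The sketch below shows the idea under stated assumptions: the Value enum here is a hypothetical, cut-down stand-in (the real one has more variants), and hashcode is an illustrative free function rather than the primitive's real signature.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified stand-in for the interpreter's `Value` enum.
pub enum Value {
    Nil,
    Boolean(bool),
    Integer(i64),
    Double(f64),
    Symbol(String),
}

/// Hashes the variant tag plus the variant's payload, never padding bytes.
pub fn hashcode(value: &Value) -> i64 {
    let mut hasher = DefaultHasher::new();
    // The discriminant distinguishes e.g. `Integer(1)` from `Boolean(true)`.
    std::mem::discriminant(value).hash(&mut hasher);
    match value {
        Value::Nil => {}
        Value::Boolean(b) => b.hash(&mut hasher),
        Value::Integer(n) => n.hash(&mut hasher),
        // `f64` does not implement `Hash`, so hash its bit pattern instead.
        Value::Double(d) => d.to_bits().hash(&mut hasher),
        Value::Symbol(s) => s.hash(&mut hasher),
    }
    // Dropping the lowest bit keeps the result non-negative; unlike `.abs()`,
    // this cannot overflow when the hash maps to `i64::MIN`.
    (hasher.finish() >> 1) as i64
}
```

Since only initialized data reaches the hasher, hashing the same value twice gives the same result, and no unsafe code is needed.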
