partcl's Introduction

Partcl - a minimal Tcl interpreter

Features

  • ~600 lines of "pedantic" C99 code
  • No external dependencies
  • Good test coverage
  • Can be extended with custom Tcl commands
  • Runs well on bare metal embedded MCUs (~10k of flash is required)

Built-in commands:

  • subst arg
  • set var ?val?
  • while cond loop
  • if cond branch ?cond? ?branch? ?other?
  • proc name args body
  • return
  • break
  • continue
  • arithmetic operations: +, -, *, /, <, >, <=, >=, ==, !=

Usage

struct tcl tcl;
const char *s = "set x 4; puts [+ [* $x 10] 2]";

tcl_init(&tcl);
if (tcl_eval(&tcl, s, strlen(s)) != FERROR) {
    printf("%.*s\n", tcl_length(tcl.result), tcl_string(tcl.result));
}
tcl_destroy(&tcl);

Language syntax

A Tcl script is made up of commands separated by semicolons or newline symbols. Commands, in turn, are made up of words separated by whitespace. To make whitespace a part of a word, one may use double quotes or braces.

An important part of the language is command substitution, where the result of a command inside square brackets is returned as a part of the outer command, e.g. puts [+ 1 2].

The only data type of the language is a string. Although this may complicate mathematical operations, it makes it easy to build your own DSLs on top of the language.
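To illustrate what "everything is a string" means for arithmetic, here is a minimal sketch of adding two numbers that arrive as strings and leave as a string. The helper name str_add is hypothetical and is not part of Partcl; it only shows the parse, compute, format round trip.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not a Partcl function): adds two decimal integers
 * given as strings and writes the decimal result into out.
 * Returns 0 on success, -1 if the result does not fit into n bytes. */
int str_add(const char *a, const char *b, char *out, size_t n) {
  assert(a != NULL && b != NULL && out != NULL);
  long x = strtol(a, NULL, 10); /* string -> number */
  long y = strtol(b, NULL, 10);
  int written = snprintf(out, n, "%ld", x + y); /* number -> string */
  return (written < 0 || (size_t)written >= n) ? -1 : 0;
}
```

Every math command has to pay for this conversion on each call, which is the cost the paragraph above refers to.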

Lexer

Any symbol can be part of the word, except for the following special symbols:

  • whitespace, tab - used to delimit words
  • \r, \n, semicolon or EOF - used to delimit commands
  • Braces, square brackets, dollar sign - used for substitution and grouping

Partcl has special helper functions for these char classes:

static int tcl_is_space(char c);
static int tcl_is_end(char c);
static int tcl_is_special(char c, int q);

tcl_is_special behaves differently depending on the quoting mode (the q parameter). Inside a quoted string, braces, semicolons and end-of-line symbols lose their special meaning and become regular printable characters.
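The character classes described above could be sketched roughly as follows. This is an illustration written from the description in this section, not a copy of the Partcl source; the names is_space, is_end and is_special are placeholders, and '\0' is assumed to stand in for EOF.

```c
#include <assert.h>

/* Whitespace that separates words. */
int is_space(char c) { return c == ' ' || c == '\t'; }

/* Characters that terminate a command; '\0' stands in for EOF here. */
int is_end(char c) {
  return c == '\n' || c == '\r' || c == ';' || c == '\0';
}

/* Special characters. q is non-zero inside a double-quoted string, where
 * braces, semicolons and end-of-line symbols become regular characters. */
int is_special(char c, int q) {
  if (c == '$' || c == '[' || c == ']' || c == '"' || c == '\0') return 1;
  if (!q && (c == '{' || c == '}' || is_end(c))) return 1;
  return 0;
}
```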

Partcl lexer is implemented in one function:

int tcl_next(const char *s, size_t n, const char **from, const char **to, int *q);

The tcl_next function finds the next token in the string s. from and to are set to point to the token start and end; q denotes the quoting mode and is toggled when " is met.

A special macro tcl_each(s, len, skip_error) can be used to iterate over all the tokens in the string. If skip_error is false, the loop ends when the string ends; otherwise the loop can end earlier if a syntax error is found. This makes it possible to "validate" an input string without evaluating it and to detect when a full command has been read.
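The from/to cursor style can be shown with a deliberately simplified tokenizer. The sketch below is an assumption-laden toy: it knows only words and command separators, ignores quoting, braces and substitution, and uses hypothetical names (next_token, count_words, enum tok) rather than the real tcl_next and tcl_each.

```c
#include <assert.h>
#include <stddef.h>

enum tok { TOK_WORD, TOK_CMD, TOK_END };

static int is_blank(char c) { return c == ' ' || c == '\t'; }
static int is_cmd_end(char c) { return c == ';' || c == '\n' || c == '\r'; }

/* Finds the next token in s[0..n) and sets *from and *to to its bounds. */
enum tok next_token(const char *s, size_t n, const char **from,
                    const char **to) {
  assert(s != NULL && from != NULL && to != NULL);
  size_t i = 0;
  while (i < n && is_blank(s[i])) i++; /* skip leading blanks */
  *from = s + i;
  if (i == n || s[i] == '\0') { /* nothing left */
    *to = s + i;
    return TOK_END;
  }
  if (is_cmd_end(s[i])) { /* one-character command separator */
    *to = s + i + 1;
    return TOK_CMD;
  }
  while (i < n && s[i] != '\0' && !is_blank(s[i]) && !is_cmd_end(s[i])) i++;
  *to = s + i;
  return TOK_WORD;
}

/* Walks all tokens, advancing the cursor past each one, and counts words. */
size_t count_words(const char *s, size_t n) {
  const char *from, *to;
  size_t words = 0;
  for (;;) {
    enum tok t = next_token(s, n, &from, &to);
    if (t == TOK_END) break;
    if (t == TOK_WORD) words++;
    n -= (size_t)(to - s);
    s = to;
  }
  return words;
}
```

The loop in count_words plays the role that an iteration macro would play: each step resumes scanning at the previous token's end.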

Data types

Tcl uses strings as its primary data type. When a Tcl script is evaluated, many strings are created, disposed of or modified. In embedded systems memory management can be complex, so all operations on Tcl values are moved into isolated functions that can be easily rewritten to optimize certain parts (e.g. to use a pool of strings or a custom memory allocator, or to cache numerical or list values to increase performance).

/* Raw string values */
tcl_value_t *tcl_alloc(const char *s, size_t len);
tcl_value_t *tcl_dup(tcl_value_t *v);
tcl_value_t *tcl_append(tcl_value_t *v, tcl_value_t *tail);
int tcl_length(tcl_value_t *v);
void tcl_free(tcl_value_t *v);

/* Helpers to access raw string or numeric value */
int tcl_int(tcl_value_t *v);
const char *tcl_string(tcl_value_t *v);

/* List values */
tcl_value_t *tcl_list_alloc();
tcl_value_t *tcl_list_append(tcl_value_t *v, tcl_value_t *tail);
tcl_value_t *tcl_list_at(tcl_value_t *v, int index);
int tcl_list_length(tcl_value_t *v);
void tcl_list_free(tcl_value_t *v);

Keep in mind that the ..._append() functions must free the tail argument. Also, the string returned by tcl_string() is not meant to be mutated or cached.
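The ownership rule for append can be sketched like this. The type value_t and the functions value_alloc and value_append are hypothetical stand-ins that assume a value is a plain heap-allocated, NUL-terminated string; a replacement implementation with pools or a custom allocator would keep the same contract.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumption for this sketch: a value is a heap-allocated C string. */
typedef char value_t;

/* Copies len bytes of s into a new value. Returns NULL on failure. */
value_t *value_alloc(const char *s, size_t len) {
  assert(s != NULL);
  value_t *v = malloc(len + 1);
  if (v == NULL) return NULL;
  memcpy(v, s, len);
  v[len] = '\0';
  return v;
}

/* Appends tail to v and takes ownership of tail: tail is always freed.
 * Returns the (possibly moved) v, or NULL on allocation failure, in which
 * case v is freed as well. */
value_t *value_append(value_t *v, value_t *tail) {
  assert(v != NULL && tail != NULL);
  size_t vlen = strlen(v), tlen = strlen(tail);
  value_t *grown = realloc(v, vlen + tlen + 1);
  if (grown != NULL) {
    memcpy(grown + vlen, tail, tlen + 1); /* copies the terminator too */
  } else {
    free(v);
  }
  free(tail);
  return grown;
}
```

Because the callee frees tail, a call such as value_append(v, value_alloc("bar", 3)) does not leak the temporary.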

In the default implementation lists are implemented as raw strings that add some escaping (braces) around each item. It's a simple solution that also reduces the amount of code, but in some exotic cases the escaping can go wrong and invalid results will be returned.
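A list stored as a brace-escaped string might be built as in the sketch below. The function list_append is hypothetical, and the rule it uses (wrap an item in braces only when it is empty or contains whitespace) is an assumption made for illustration; the default implementation may escape differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Appends item to a space-separated list string, wrapping the item in
 * braces when it is empty or contains whitespace. list may be NULL for an
 * empty list. Returns the new list, or NULL on allocation failure (the old
 * list is freed in that case). */
char *list_append(char *list, const char *item) {
  assert(item != NULL);
  size_t llen = list ? strlen(list) : 0;
  size_t ilen = strlen(item);
  int quote = (ilen == 0 || strpbrk(item, " \t\r\n") != NULL);
  /* separator + optional braces + item + terminator */
  size_t need = llen + (llen ? 1 : 0) + ilen + (quote ? 2 : 0) + 1;
  char *out = realloc(list, need);
  if (out == NULL) {
    free(list);
    return NULL;
  }
  char *p = out + llen;
  if (llen) *p++ = ' ';
  if (quote) *p++ = '{';
  memcpy(p, item, ilen);
  p += ilen;
  if (quote) *p++ = '}';
  *p = '\0';
  return out;
}
```

An item that itself contains an unbalanced brace would produce a list that cannot be split back correctly, which is the kind of exotic case mentioned above.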

Environments

A special type, struct tcl_env, is used to keep the evaluation environment (a set of variables). The interpreter creates a new environment for each user-defined procedure; there is also one global environment per interpreter.

There are only three functions related to the environment: one creates a new environment, another looks up a variable (or creates a new one), and the last one destroys the environment and all its variables.

These functions use malloc/free, but can easily be rewritten to use memory pools instead.

static struct tcl_env *tcl_env_alloc(struct tcl_env *parent);
static struct tcl_var *tcl_env_var(struct tcl_env *env, tcl_value_t *name);
static struct tcl_env *tcl_env_free(struct tcl_env *env);

Variables are implemented as a singly linked list; each variable is a pair of values (name + value) and a pointer to the next variable.
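A find-or-create lookup over such a list could look like the sketch below. The struct layouts and the names env_var and env_free are assumptions for illustration; in particular, this env_free leaves the environment struct itself to the caller and returns the parent, which may differ from what tcl_env_free does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct var {
  char *name;
  char *value; /* NULL until assigned */
  struct var *next;
};

struct env {
  struct var *vars;
  struct env *parent;
};

/* Returns the variable called name, creating it if it does not exist. */
struct var *env_var(struct env *env, const char *name) {
  assert(env != NULL && name != NULL);
  for (struct var *v = env->vars; v != NULL; v = v->next) {
    if (strcmp(v->name, name) == 0) return v;
  }
  struct var *v = malloc(sizeof *v);
  if (v == NULL) return NULL;
  size_t len = strlen(name) + 1;
  v->name = malloc(len);
  if (v->name == NULL) {
    free(v);
    return NULL;
  }
  memcpy(v->name, name, len);
  v->value = NULL;
  v->next = env->vars; /* push to the front of the list */
  env->vars = v;
  return v;
}

/* Frees all variables of env and returns its parent environment. */
struct env *env_free(struct env *env) {
  assert(env != NULL);
  struct env *parent = env->parent;
  while (env->vars != NULL) {
    struct var *v = env->vars;
    env->vars = v->next;
    free(v->name);
    free(v->value);
    free(v);
  }
  return parent;
}
```

Lookup is linear in the number of variables, which is usually acceptable for the small scripts this kind of interpreter targets.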

Interpreter

The Partcl interpreter is a simple structure, struct tcl, which keeps the current environment, an array of available commands and the last result value.

Interpreter logic is wrapped around two functions - evaluation and substitution.

Substitution:

  • If argument starts with $ - create a temporary command [set name] and evaluate it. In Tcl $foo is just a shortcut to [set foo], which returns the value of "foo" variable in the current environment.
  • If argument starts with [ - evaluate what's inside the square brackets and return the result.
  • If argument is a quoted string (e.g. {foo bar}) - return it as is, just without braces.
  • Otherwise return the argument as is.
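The four substitution cases above amount to a dispatch on the first character of a word. A minimal sketch of that dispatch, with hypothetical names (classify_word, strip_braces, enum subst_kind) and no actual evaluation, might look like this:

```c
#include <assert.h>
#include <stddef.h>

enum subst_kind { SUBST_VAR, SUBST_CMD, SUBST_BRACED, SUBST_PLAIN };

/* Decides which substitution rule applies to the word s[0..len). */
enum subst_kind classify_word(const char *s, size_t len) {
  assert(s != NULL);
  if (len == 0) return SUBST_PLAIN;
  switch (s[0]) {
    case '$': return SUBST_VAR;    /* shortcut for [set name] */
    case '[': return SUBST_CMD;    /* evaluate the inner command */
    case '{': return SUBST_BRACED; /* return as is, without braces */
    default:  return SUBST_PLAIN;  /* return as is */
  }
}

/* For a braced word, points *from at the text between the outer braces and
 * stores its length in *n; other words are returned unchanged. */
void strip_braces(const char *s, size_t len, const char **from, size_t *n) {
  assert(s != NULL && from != NULL && n != NULL);
  *from = s;
  *n = len;
  if (len >= 2 && s[0] == '{' && s[len - 1] == '}') {
    *from = s + 1;
    *n = len - 2;
  }
}
```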

Evaluation:

  • Iterates over each token in a list
  • Appends words into a list
  • If the command end is met (semicolon, newline, or end-of-file - our lexer has a special token type TCMD for them), find a suitable command (the first word in the list) and call it.

Where are the commands taken from? Initially, a Partcl interpreter starts with no commands, but one may add commands by calling tcl_register().

Each command has a name, an arity (how many arguments it takes; the interpreter checks it before calling the command, and zero arity means varargs) and a C function pointer that actually implements the command.
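The name/arity/function-pointer triple can be sketched as a small command table. Everything here is hypothetical (struct interp, interp_register, interp_call, the fixed-size table, and the convention that arity counts the command name itself); it shows the arity check before dispatch, not the actual signature of tcl_register().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CMDS 16

typedef int (*cmd_fn)(int argc, const char **argv);

struct cmd {
  const char *name;
  int arity; /* expected argc including the command name; 0 = varargs */
  cmd_fn fn;
};

struct interp {
  struct cmd cmds[MAX_CMDS];
  size_t ncmds;
};

/* Adds a command to the table. Returns 0 on success, -1 if it is full. */
int interp_register(struct interp *in, const char *name, int arity,
                    cmd_fn fn) {
  assert(in != NULL && name != NULL && fn != NULL);
  if (in->ncmds == MAX_CMDS) return -1;
  in->cmds[in->ncmds++] = (struct cmd){name, arity, fn};
  return 0;
}

/* Looks up argv[0], checks the arity, then calls the command.
 * Returns the command's result, or -1 if no command matches. */
int interp_call(struct interp *in, int argc, const char **argv) {
  assert(in != NULL && argc > 0 && argv != NULL);
  for (size_t i = 0; i < in->ncmds; i++) {
    const struct cmd *c = &in->cmds[i];
    if (strcmp(c->name, argv[0]) != 0) continue;
    if (c->arity != 0 && c->arity != argc) continue; /* arity mismatch */
    return c->fn(argc, argv);
  }
  return -1;
}

/* Sample command for demonstration: returns the number of arguments. */
int cmd_count(int argc, const char **argv) {
  (void)argv;
  return argc - 1;
}
```

Checking the arity in one place keeps each command implementation free of argument-count boilerplate.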

Builtin commands

"set" - tcl_cmd_set, assigns a value to the variable (if a value is given) and returns the current variable value.

"subst" - tcl_cmd_subst, does command substitution in the argument string.

"puts" - tcl_cmd_puts, prints the argument to stdout, followed by a newline. This command can be disabled using #define TCL_DISABLE_PUTS, which is handy for embedded systems that don't have "stdout".

"proc" - tcl_cmd_proc, creates a new command and appends it to the list of current interpreter commands. That's how user-defined commands are built.

"if" - tcl_cmd_if, does a simple if {cond} {then} {cond2} {then2} {else}.

"while" - tcl_cmd_while, runs a while loop: while {cond} {body}. One may use "break", "continue" or "return" inside the loop to control the flow.

Various math operations are implemented as tcl_cmd_math, but they can be disabled too if your script doesn't need them (for example, if you want to use Partcl as a command shell rather than as a programming language).

Building and testing

All sources are in one file, tcl.c. It can be used as a standalone interpreter, or included as a single-file library (you may want to rename it to tcl.h then).

Tests are run with clang and coverage is calculated. Just run "make test" and you're done.

Code is formatted using clang-format to keep a clean and readable coding style. Please run it for pull requests, too.

License

Code is distributed under the MIT license; feel free to use it in your proprietary projects as well.

partcl's People

Contributors

skrasser, zserge

partcl's Issues

Segmentation Faults 2017-06-06

Hello, I was using American Fuzzy Lop (afl-fuzz) to fuzz input to the tcl program on Linux. Is fixing the crashes from these input files something you're interested in? The input files can be found here: https://github.com/rwhitworth/partcl-fuzz/tree/master/2017-06-06

The files can be executed as ./tcl id_filename to cause the issues. This was tested against git commit 2f03722

Let me know if I can provide any more information to help narrow down this issue.

gdb backtraces:

id:000000,sig:11,src:000000,op:havoc,rep:32

[New LWP 9217]
Core was generated by `/root/partcl/tcl'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  tcl_next (s=<optimized out>, n=<optimized out>, from=0x7fff9e051940, to=0x7fff9e051948, q=0x7fff9e051960) at tcl.c:39
39        for (; !*q && n > 0 && tcl_is_space(*s); s++, n--)
#0  tcl_next (s=<optimized out>, n=<optimized out>, from=0x7fff9e051940, to=0x7fff9e051948, q=0x7fff9e051960) at tcl.c:39
#1  0x0000000000405cdf in main () at tcl.c:622

id:000033,sig:08,src:000226,op:havoc,rep:2

[New LWP 20091]
Core was generated by `/root/partcl/tcl'.
Program terminated with signal SIGFPE, Arithmetic exception.
#0  0x00000000004056a8 in tcl_cmd_math (tcl=0x7ffd0ce5f648, args=<optimized out>, arg=<optimized out>) at tcl.c:532
532         c = a / b;
#0  0x00000000004056a8 in tcl_cmd_math (tcl=0x7ffd0ce5f648, args=<optimized out>, arg=<optimized out>) at tcl.c:532
#1  0x0000000000403ca4 in tcl_eval (tcl=0x7ffd0ce5f648, s=<optimized out>, len=<optimized out>) at tcl.c:350
#2  0x0000000000405e14 in main () at tcl.c:627

id:000036,sig:06,src:000248,op:havoc,rep:8

Core was generated by `/root/partcl/tcl'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f0383e62067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
#0  0x00007f0383e62067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f0383e63448 in __GI_abort () at abort.c:89
#2  0x00007f0383ea01b4 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f0383f92cb3 "*** %s ***: %s terminated\n") at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007f0383f25aa7 in __GI___fortify_fail (msg=msg@entry=0x7f0383f92c4a "buffer overflow detected") at fortify_fail.c:31
#4  0x00007f0383f23cc0 in __GI___chk_fail () at chk_fail.c:28
#5  0x00007f0383f230fc in __strncat_chk (s1=<optimized out>, s2=<optimized out>, n=<optimized out>, s1len=<optimized out>) at strncat_chk.c:37
#6  0x0000000000402f90 in strncat (__dest=0x7ffe95166110 "set ", '%' <repeats 191 times>, "\027%%%%"..., __src=0x6d0b89 '%' <repeats 191 times>, "\027%%%%%%%%"..., __len=1317) at /usr/include/x86_64-linux-gnu/bits/string3.h:150
#7  tcl_subst (tcl=0x7ffe95166398, s=<optimized out>, len=<optimized out>) at tcl.c:298
#8  0x00000000004032ab in tcl_eval (tcl=0x7ffe95166398, s=<optimized out>, len=<optimized out>) at tcl.c:324
#9  0x0000000000405e14 in main () at tcl.c:627

Issues I found during fuzzing

I ran the AFL fuzzer on your program and found plenty of crashes. Most of them happened because a NULL pointer was passed to a function such as strcmp; most of the crashes happened at line 272, where strcmp is called. I fixed the problem by checking whether the arguments passed to tcl_var() are NULL. For example, tcl_cmd_set() calls tcl_var(); you can solve the problem by checking whether the var pointer is NULL and returning FERROR if it is.

Division by Zero

After a second round of fuzzing, I realized that you do not check for division by zero in the tcl_cmd_math() function. You can simply add an if statement to solve the problem.
Additionally, I suggest that you check the arguments of a math operation before passing them to tcl_int().
tcl_int() uses atoi(), which converts the digits from the beginning of the string until it reaches a non-digit character. That means you can pass "22sdfe" (which gives 22) or even "sdfdsf" (which gives 0) and it will work. It would be better to show an error (the "?!" string in your app) to let the user know.
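One way the two checks suggested here could be written is sketched below. The helper names parse_int_strict and safe_div are hypothetical and not part of Partcl; the sketch swaps atoi() for strtol() so that trailing garbage can be detected. Note that strtol() still accepts leading whitespace and a sign.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parses the whole string as a decimal int. Returns 0 on success, -1 if
 * the string is empty, has trailing non-digit characters, or is out of
 * range for int. */
int parse_int_strict(const char *s, int *out) {
  assert(s != NULL && out != NULL);
  char *end;
  errno = 0;
  long v = strtol(s, &end, 10);
  if (end == s || *end != '\0') return -1; /* no digits or garbage */
  if (errno == ERANGE || v < INT_MIN || v > INT_MAX) return -1;
  *out = (int)v;
  return 0;
}

/* Divides a by b. Returns 0 on success, -1 on division by zero or on the
 * one overflowing case INT_MIN / -1. */
int safe_div(int a, int b, int *out) {
  assert(out != NULL);
  if (b == 0) return -1;
  if (a == INT_MIN && b == -1) return -1;
  *out = a / b;
  return 0;
}
```

A math command built on these helpers could return an error to the script instead of raising SIGFPE or silently treating "sdfdsf" as 0.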

Possible memory leaks

Hello,
first thanks for this library. I plan to use it as scripting language inside my STM32 project.

As MCUs are short on memory and generally have problems with alloc/free, I'm using a memory manager inside a statically allocated array. This allows me to see how the memory is used or freed after tcl_eval() and tcl_destroy(). I found one problem that I do not yet understand. When I try to run this Tcl:
= y";
I receive a lexer error, of course, but I see two unfreed fragments in memory:

=
y

I tried to add debug messages to all functions that use tcl_malloc() or tcl_realloc(), but I did not find any match for the pointer addresses.

I guess there could be some addition to tcl_next(), somewhere after:

  } else if (*s == '"') {
    *q = !*q;
    *from = *to = s + 1;
    if (*q) {

to check that the quote is closed. But that would only tell the user what the problem is.

Since I'm not sure where it gets allocated, I have no idea how to free it.
Thanks for any advice.

Tcl on steroids (i.e. merge between Tcl and REBOL)

I like Tcl a lot and would want to have it more widespread. The only thing which I find rather unfortunate is that Tcl treats everything as string by default.

This property has one side-effect - the amount of generic optimization (i.e. not by implementing some commands in native code instead of Tcl itself) has a limit which is unfortunately quite low considering current HW (even "embedded" one).

After I read your blog post https://zserge.com/posts/tcl-interpreter/ where you @zserge encourage readers to come over here with contributions I decided to express an idea I have in my head for quite some time already.

I envision a Tcl-like language which doesn't build upon strings, but directly AST (abstract syntax tree). The ideal is merging Tcl with REBOL. Why not REBOL directly? Because REBOL (and its current successor Red) tries to get it to an extreme leading to some very very opinionated (in contrast to practical & fair-minded) decisions.

There is also Spry trying to get away from those opinionated decisions of REBOL/Red, but it lacks development force for many years 😢 (and thus suffers from some long-standing issues as well as the "missing delimiter issue" (where all functions/methods/commands/... have always fixed number of arguments which one has to remember because there is nothing like a delimiter in the language at all)).

I actually think that one could make "something like Spry", but even smaller, by merging in some cool ideas from Tcl languages. Using some clever mutable tree structure which is cache oblivious could make such a language really fast (currently the state of the art seems to be MinWEP; naive pointer-based trees or the good old B-trees are by far insufficient nowadays and here is why).

Thoughts?

Program received signal SIGSEGV, Segmentation fault

I found a code snippet which results in a segfault of the interpreter

#define TEST
#include "tcl.c"

#define TCL_BENCH "set i 0; while {< $i 1} {if {== [- $i] -1} {} {set i [+ $i 1]}}"

int main() {
    struct tcl tcl;
    tcl_init(&tcl);

    tcl_eval(&tcl, TCL_BENCH, sizeof(TCL_BENCH));

    tcl_destroy(&tcl);

    return 0;
}

Typo in README

The second sentence in the Language syntax paragraph has a typo : )

Implement TIP #440

Hey there! Could you implement tcl_platform(engine) from TIP #440 to make it possible for Tcl code to tell if it's running in ParTcl?

P.S.: It's always cool to see another tiny Tcl implementation. I've added yours to the list at https://tcl.wiki/Small%20Tcl.

Idea- Better implementation of list vars?

Taking inspiration from the definition of the object in uLisp you could use a C union for tcl_var to be able to represent linked lists as more tcl_vars:

 struct tcl_var {
   tcl_value_t *name;
+  union {
     tcl_value_t *value;
+    struct tcl_var *values; // Pointer to nested list or hashmap
+  };
+  char type; // Flag to determine which pointer in the union to take
   struct tcl_var *next;
 };

The name would hold a string of the index (possibly in hexadecimal, to speed array lookup).

Now that a var can point to another var, this could also lead to hashmaps (name being the key string etc) and nested data structures.

Does this sound like a possibility?

UTF-8

Can I use UTF-8 in variables, strings and comments?
