udem-dlteam / pnut

A Self-Compiling C Transpiler Targeting Human-Readable POSIX Shell

Home Page: https://pnut.sh

License: BSD 2-Clause "Simplified" License

Shell 39.84% C 49.76% Scheme 9.23% Makefile 0.50% Python 0.66%

pnut's Introduction

🥜 Pnut: A Self-Compiling C Transpiler Targeting Human-Readable POSIX Shell

Pnut compiles a reasonably large subset of C99 to human-readable POSIX shell scripts. It can be used to generate portable shell scripts without having to write shell.

Its main uses are:

  • As a transpiler to write portable shell scripts in C.
  • As a way to bootstrap a compiler written in C with an executable version that is still human-readable (see Reproducible builds).

Main features:

  • No new language to learn -- C code in, shell code out.
  • The human-readable shell script is easy to read and understand.
  • A runtime library including file I/O and dynamic memory allocations.
  • A preprocessor (#include, #ifdef, #define MACRO ..., #define MACRO_F(x) ...).
  • Integrates easily with existing shell scripts.

The examples directory contains many examples. We invite you to take a look!

Besides compiling itself, Pnut can also compile the Ribbit Virtual Machine, which can run an R4RS Scheme read-eval-print loop (REPL) directly in shell. See repl.sh for the generated shell script.

Install

Pnut can be distributed as the pnut.sh shell script, or compiled to executable code using a C compiler.

To install pnut:

> git clone https://github.com/udem-dlteam/pnut.git
> cd pnut
> sudo make install

This installs both pnut.sh and pnut in /usr/local/bin.

Compilation options

Certain compilation options can be used to change the generated shell script:

  • -DRT_COMPACT reduces the size of the runtime library at the cost of reduced I/O performance.
  • -DSH_SAVE_VARS_WITH_SET reduces the overhead of local variables at the cost of readability. This can reduce the execution time of certain programs by more than 50%.
  • -DSH_INCLUDE_C_CODE includes the original C code in the generated shell script.

They can be set using make install BUILD_OPT="...".

How to use

The pnut compiler takes a C file path as input, and outputs to stdout the POSIX shell code.

Here's an example of how to compile a C file using Pnut:

> pnut.sh examples/fib.c > fib.sh # Compile fib.c to a shell script
> chmod +x fib.sh                 # Make the shell script executable
> ./fib.sh                        # Run the shell script

Mixing C and shell code

The #include_shell "{file.sh}" directive can be used to include shell code in the generated shell script. This makes it possible to call system utilities from C code, or to use shell scripts generated by Pnut as a library. See select-file.c and posix-utils.sh for how to use this feature.

Which shell to use

Because Pnut generates purely POSIX shell code, the generated shell scripts can be run on any POSIX compliant shell. However, certain shells are faster than others. For faster scripts, we recommend the use of ksh, dash or bash. zsh is also supported but tends to be slower on large programs.
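
As a rough sketch of how a wrapper script might prefer one of the recommended shells, the snippet below picks the first candidate that is installed. The pick_shell helper, the candidate order, and the fib.sh file name are assumptions made for illustration and are not part of Pnut.

```sh
#!/bin/sh
# pick_shell: print the first command from the argument list that is installed.
# Returns non-zero if none of the candidates is found.
pick_shell() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Assumed preference order; fall back to plain sh if nothing else is found.
runner=$(pick_shell ksh dash bash sh) || runner=sh
echo "would run: $runner ./fib.sh"
```

Measuring the candidates on your own program is still the most reliable way to choose, since relative speed can vary from one script to another.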

Reproducible builds

Because Pnut can be distributed as a human-readable shell script (pnut.sh), it can serve as the basis for a reproducible build system. With a POSIX compliant shell, pnut.sh is sufficiently powerful to compile itself and, in the future, to bootstrap TCC. Because TCC can be used to compile GCC, this will make it possible to bootstrap a fully featured build toolchain from only human-readable source files and a POSIX shell.

Because pnut.sh cannot support certain C features used by TCC, Pnut features a native code backend that supports a larger subset of C. We call this compiler pnut-exe, and it can be compiled using pnut.sh. The work to make pnut-exe compatible with TCC is ongoing.

Once pnut-exe supports a large enough subset of C99 to compile TCC, the following steps will be taken to bootstrap TCC from pnut.sh:

  1. Compile pnut-exe.c to pnut-exe.sh using pnut.sh. pnut-exe.sh is a shell script that turns C code into machine code.
  2. Compile pnut-exe.c to pnut-exe using pnut-exe.sh. This version of pnut-exe is an executable and is much faster.
  3. Compile TCC using pnut-exe.

Limitations

Unfortunately, certain C constructs don't map nicely to POSIX shell, which means:

  • No support for floating point numbers and unsigned integers.
  • goto and switch fallthrough are not supported.
  • The address-of (&) operator on local variables is not supported.

Known issues

  • The preprocessor is not perfect and may fail on some edge cases. #if and #elif are not supported. #include <...> directives are ignored.
  • All local variable declarations must be at the beginning of a function.
  • Aggregate types (arrays and structures) cannot be stack-allocated, passed by value or nested in a structure.
  • do { ... } while(...) is not supported at the moment.

Contributing

Pnut is a research project and contributions are welcome. Please open an issue to report any bugs or to discuss new features.

To check that your changes don't break the compiler, it is good practice to bootstrap pnut.c using pnut.sh. This can be done with the ./bootstrap-pnut.sh command. Using ksh, this should take around 30s.

pnut's People

Contributors

feeley, kraft-cheese, laurenthuberdeau, leo-ard

pnut's Issues

Documentation for supported shell versions

I've been playing around with pnut on various versions of bash/ksh. I have found that with very old versions (from the 90s), pnut.sh errors out. Fairly old versions of bash/ksh (early 2000s, like bash 2.05) often work at first, but stop compiling pnut.c partway through. I'm assuming I'm hitting some internal limits within the shell. The earliest version of ksh I have had success with is ksh93u-20120801. I managed to build that version of ksh on an old slackware-8.1 vm.

Improvement ideas

Some ideas on how to improve the project:

New features

  • Support do {...} while (...); statements.
  • Support variadic functions.
  • Support & on local and global variables in Shell backend.
  • Generate executables compatible with macOS and Windows. Bonus points if they are αcτµαlly pδrταblε εxεcµταblε.
  • ARM/RISC-V backends.

Code quality / Tech debt

  • Unify the way the environment is tracked in the shell and exe backends.
  • Add a CI/CD pipeline that tests Pnut on all supported platforms and makes releases for them.
  • Improve errors produced by Pnut.

Other

  • Improve README.md.
  • Make a landing page for Pnut on pnut.sh.
  • Add more examples.

Building tcc with pnut

In the readme file it indicates that pnut.exe is capable of building tcc. I can't seem to figure out how to actually do that though. Can you please advise whether this is possible? Simply passing --cc=pnut.exe to the tcc configure script won't work:

configure: error: 'pnut.exe' failed to compile conftest.c

If I try this (which works with gcc after the configure script has been run):

pnut.exe tcc.c

I get the following error:

codegen_glo_decl: unexpected declaration

I'm using a stock tcc 0.9.27 release from 2017 (rather than a random version of their mob branch). Same issue when I use the i386 backend.

Printf "-=" is recognized as an option and not a format string

Bug

➜  ksh -E 'printf "-="' 
ksh: printf: -=: unknown option
Usage: printf [ options ] format [string ...]

Escaping the leading - makes it print -= as expected:

➜  ksh -E 'printf "\-="'
-=%

But not on dash, which prints the \:

➜  dash -c 'printf "-="' 
dash: 1: printf: Illegal option -=
➜ dash -c 'printf "\-="'
\-=%
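
One portable way around this, sketched below under the assumption that the string should be printed verbatim, is to pass it as an argument to a fixed '%s' format instead of using it as the format string itself. emit_literal is a hypothetical helper name, and this is not necessarily the fix adopted in pnut.

```sh
#!/bin/sh
# emit_literal: print a string exactly as given.
# The format string is always '%s', so a leading '-' in the data is never
# parsed as an option and a '%' in the data is never parsed as a conversion.
emit_literal() {
  printf '%s' "$1"
}

emit_literal "-="
```

Note that with '%s' backslash escapes in the argument are not interpreted, so this only fits strings that need no escape processing.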

Top level characters are not supported

Context

The following program crashes when compiled by pnut-sh. This is because the corresponding character ident is not yet defined. There are a few options to solve this:

  1. Generate all globals at the end, after the runtime library and character literal constants. This changes the structure of the code and globals are often declared beside functions using them so we probably don't want that.
  2. Generate the character literal constant before defining the global variable. This means character constants can be defined in many places and doesn't look nice.
  3. Use the numeric value instead of the character literal constant. Maybe add a comment beside it indicating which character the ASCII code corresponds to.

char chr = 'a';

void main() {
  return;
}
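
Option 3 could look roughly like the following in the generated shell. The variable name chr and the char_code helper are hypothetical and do not reflect pnut's actual output format; the helper only shows how the numeric code can be checked from a shell.

```sh
#!/bin/sh
# Hypothetical output for `char chr = 'a';` under option 3:
# the global holds the numeric code and a comment records the character.
chr=97 # 'a'

# char_code: print the numeric code of a single character.
# POSIX printf treats a numeric argument that starts with a quote as
# "the code of the character that follows the quote".
char_code() {
  printf '%d' "'$1"
}

char_code a
```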

Deduplicating strings passed to defstr makes pnut slower

Context

In the shell backend, the variables passed to defstr are allocated sequentially, even when the same string is used multiple times.

The laurent/deduplicate_defstr_strings branch implements sharing of string variables for identical strings. This requires interning strings like we do for identifiers (using the same table), which slows down tokenizing of identifiers and strings because there are more conflicting entries in the hash table, and each conflict triggers linear probing. The result is a slower bootstrap for a minor benefit in code quality (and even then it's debatable, since it can make it harder to associate strings with their variables and moves pnut away from being single-pass).

There seem to be a few options:

  • Using a larger hash table seems to help (even with a modest increase with HASH_PARAM = 1026, HASH_PRIME = 1009), but it's unclear what size we should use. For larger programs, we'd probably want a larger table.
  • Use a different hashing algorithm.

This is low priority, so creating a ticket to dump the progress on this problem.

Ksh allows variables to contain negative 0 (-0) which breaks equality

Because we use =/!= instead of -eq/-ne to compare numbers in conditions, having variables containing -0 breaks this optimization and causes conditions to misfire. The bug doesn't happen when the variable containing -0 is used in an arithmetic expansion, since -0 is reduced to 0 there, so it only happens in conditions.

See #73 for instructions on how to reproduce the bug.
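
The difference between the two kinds of comparison can be sketched in a few lines of shell. The helper names str_eq, num_eq and normalize are made up for this illustration, and the value -0 is assigned directly here rather than produced by ksh arithmetic.

```sh
#!/bin/sh
# str_eq: compare two values as strings, as `=` does in the test utility.
str_eq() { [ "$1" = "$2" ]; }

# num_eq: compare two values as integers with -eq.
num_eq() { [ "$1" -eq "$2" ]; }

# normalize: pass a value through arithmetic expansion.
normalize() { printf '%s' "$(( $1 + 0 ))"; }

value="-0"
if str_eq "$value" 0; then echo "string compare: equal"; else echo "string compare: different"; fi
if num_eq "$value" 0; then echo "numeric compare: equal"; else echo "numeric compare: different"; fi
```

The string comparison sees two different strings, "-0" and "0", while -eq and arithmetic expansion both treat the value as the integer zero.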
