meithecatte / miniforth

A bootsector FORTH

License: MIT License

Assembly 8.89% Python 10.09% Nix 0.09% Forth 80.69% Shell 0.23%

forth bootsector bootstrappable

miniforth's Introduction

miniforth

miniforth is a real mode FORTH that fits in an MBR boot sector. The following standard words are available:

- ! @ c! c@ dup swap u. >r r> : ; load

Additionally, there are two non-standard words.

| switches between interpreting and compilation, performing the roles of both [ and ].
s: ( buf -- buf+len ) will copy the rest of the current input buffer to buf, and terminate it with a null byte. The address of said null byte will be pushed onto the stack. This is designed for saving the code being run so that it can later be put in a disk block, when no block editor is available yet.
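
As a rough illustration of the stack effect of s:, here is a small Python model. It is only a sketch: the flat 64 KiB bytearray, the function name s_colon, and the address 0x1000 are assumptions made for the example, not details taken from miniforth.

```python
def s_colon(memory: bytearray, buf: int, rest_of_input: bytes) -> int:
    """Hypothetical model of `s: ( buf -- buf+len )`."""
    end = buf + len(rest_of_input)
    memory[buf:end] = rest_of_input  # copy the unparsed remainder of the input
    memory[end] = 0                  # terminate the copy with a null byte
    return end                       # address of the null byte (left on the stack)


memory = bytearray(0x10000)          # assumed: one 64 KiB segment
line = b": double dup + ;"
top = s_colon(memory, 0x1000, line)
```

The returned address is exactly where a following s: would start writing, so the terminator of one saved line gets overwritten by the start of the next.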

The dictionary is case-sensitive. If a word is not found, it is converted into a number with no error checking. For example, g results in the decimal 16, extending the 0123456789abcdef of hexadecimal. On boot, the number base is set to hexadecimal.
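
To make the "no error checking" behavior concrete, here is a minimal Python sketch of a digit loop that produces the result described above. The function parse_number is hypothetical, and the handling of characters outside 0-9 and a-z, as well as the 16-bit wraparound, are assumptions rather than a transcription of the bootsector code.

```python
def parse_number(word: str, base: int = 16) -> int:
    """Convert a word to a number without validating its digits."""
    value = 0
    for ch in word:
        digit = ord(ch) - ord("0")
        if digit > 9:
            # Letters simply continue counting after 9: a=10, ..., f=15, g=16, ...
            digit = ord(ch) - ord("a") + 10
        # Deliberately no check that digit < base.
        value = (value * base + digit) & 0xFFFF  # assumed 16-bit cell
    return value


print(parse_number("ff"))  # 255
print(parse_number("g"))   # 16
```

Under this model g and 10 both evaluate to decimal 16 in base 16.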

Backspace works, but not how you're used to: the erased input will still be visible on screen until you write something else.

Various aspects of this project's internals are described in detail on my blog.

Trying it out

You can either build a disk image yourself (see below), or download one from the releases page.

When Miniforth boots, no prompt will be shown on the screen. However, if what you're typing is being shown on the screen, it is working. You can:

do some arithmetic:

7 5 - u.
: negate  0 swap - ;
: +  negate - ;
69 42 + u.

load additional functionality from disk: 1 load (see Onwards from miniforth below).
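
The arithmetic example above builds + out of - alone. The following Python sketch mirrors the two definitions, assuming 16-bit cells with wraparound (the helper names are made up for the illustration):

```python
MASK = 0xFFFF  # assumed: 16-bit cells


def minus(a: int, b: int) -> int:
    return (a - b) & MASK


def negate(n: int) -> int:
    # : negate  0 swap - ;
    return minus(0, n)


def plus(a: int, b: int) -> int:
    # : +  negate - ;     computes a - (0 - b)
    return minus(a, negate(b))


print(minus(7, 5))                   # 2
print(format(plus(0x69, 0x42), "x")) # ab
```

Since the number base is hexadecimal on boot, 69 42 + adds 0x69 and 0x42, giving 0xab.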

Building a disk image

You will need yasm and python3, which you can obtain with nix-shell or your package manager of choice. Then run ./build.sh.

This will create the following artifacts:

build/boot.bin - the built bootsector.
build/uefix.bin - the chainloader (see below).
miniforth.img - a disk image with the contents of block*.fth installed into the blocks.
build/boot.lst - a listing with the raw bytes of each instruction. Note that the dd 0xdeadbeef are removed by scripts/compress.py.

The build will print the number of used bytes, as well as the number of block files found. You can run the resulting disk image in QEMU with ./run.sh, or pass ./run.sh build/boot.bin if you do not want to include the code from *.fth in your disk. QEMU will run in curses mode; to exit, press Alt + 2, then type q and press Enter.

Blocks

load ( blk -- ) loads a 1K block of FORTH source code from disk and executes it. All other block operations are deferred to user code. Thus, after appropriate setup, one can get an arbitrarily feature-rich system by simply typing 1 load — see Onwards from miniforth below.

Each pair of sectors on disk forms a single block. Block number 0 is partially used by the MBR, and is thus reserved.
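
The block-to-sector mapping implied here can be written down in a few lines. This Python sketch assumes 512-byte sectors (which follows from two sectors forming a 1K block); the function names are illustrative only.

```python
SECTOR_SIZE = 512        # bytes; two sectors make one 1K block
SECTORS_PER_BLOCK = 2


def block_to_lba(blk: int) -> tuple[int, int]:
    """Return the two LBA sector numbers that hold block `blk`."""
    first = blk * SECTORS_PER_BLOCK
    return first, first + 1


def block_offset(blk: int) -> int:
    """Byte offset of block `blk` inside the disk image."""
    return blk * SECTORS_PER_BLOCK * SECTOR_SIZE


print(block_to_lba(0))   # (0, 1)
print(block_to_lba(1))   # (2, 3)
print(block_offset(1))   # 1024
```

Block 0 covers LBA 0 and 1, which is why it overlaps the MBR and is reserved.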

System variables

Due to space constraints, variables such as STATE or BASE couldn't be exposed by creating separate words. Depending on the variable, the address is either hardcoded or pushed onto the stack on boot:

>IN is a word at 0x7d00. It stores the pointer to the first unparsed character of the null-terminated input buffer.
The stack on boot is LATEST STATE BASE HERE #DISK (with #DISK on top).
STATE has a non-standard format - it is a byte, where 0 means compiling, and 1 means interpreting.
#DISK is not a variable, but the saved disk number of the boot media.

`-DAUTOLOAD`

For some use cases, it might be desirable for the bootsector to load the first block of Forth code automatically. You can achieve this by building boot.bin with -DAUTOLOAD. Unfortunately, this requires 7 more bytes of code, making miniforth go over the threshold of 446 bytes, which makes it no longer possible to put an MBR partition table in the boot sector.

The partition table is required for:

the filesystem code in blocks/filesystem.fth
booting in the BIOS compatibility mode of most UEFI implementations, due to a common bug/misfeature.

To work around this, scripts/mkdisk.py will use the small chainloader in uefix.s when it detects that miniforth is larger than 446 bytes. Instead of the default disk layout, which looks like this:

LBA 0   - boot.bin
LBA 1   - unused
LBA 2-3 - Forth block 1
...       ...

...the disk image will look as follows:

LBA 0   - uefix.bin
LBA 1   - boot.bin
LBA 2-3 - Forth block 1
...       ...
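
The choice between the two layouts can be summarized in a short Python sketch. This is not the code of scripts/mkdisk.py; the function disk_layout and its dictionary return value are invented for illustration. The 446-byte limit itself is standard MBR arithmetic: a 512-byte sector minus the 2-byte signature and four 16-byte partition entries.

```python
SECTOR_SIZE = 512
SIGNATURE_SIZE = 2                 # the 55 AA bytes at the end
PARTITION_TABLE_SIZE = 4 * 16      # four 16-byte MBR partition entries
MBR_CODE_LIMIT = SECTOR_SIZE - SIGNATURE_SIZE - PARTITION_TABLE_SIZE  # 446


def disk_layout(used_bytes: int) -> dict[int, str]:
    """Hypothetical helper: what goes into LBA 0 and LBA 1."""
    if used_bytes > MBR_CODE_LIMIT:
        # No room for a partition table: chainload from LBA 1.
        return {0: "uefix.bin", 1: "boot.bin"}
    return {0: "boot.bin", 1: "unused"}


print(MBR_CODE_LIMIT)          # 446
print(disk_layout(445))        # {0: 'boot.bin', 1: 'unused'}
print(disk_layout(445 + 7))    # {0: 'uefix.bin', 1: 'boot.bin'}
```

In both cases Forth block 1 still starts at LBA 2, so the block numbering does not change.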

Onwards from miniforth

The main goal of the project is bootstrapping a full system on top of Miniforth as a seed. Thus the repository also contains various Forth code that may run on top of Miniforth and extend its capabilities.

In blocks/bootstrap.fth (1 load):
- A simple assembler is implemented, and then used to implement additional primitives, which wouldn't fit in Miniforth itself. This includes control flow words like IF/THEN and BEGIN/UNTIL, as well as calls to the BIOS disk interrupt to allow manipulating the code on disk.
  
  For the syntax of the assembler, see No branches? No problem — a Forth assembler.
- Exception handling is implemented, with semantics a little different from standard Forth. See Contextful exceptions with Forth metaprogramming.
- A separate, more featureful outer interpreter overrides the one built into Miniforth, to correct the ugly backspace behavior and handle things such as uncaught exceptions and vocabularies.
In blocks/grep.fth (2f load), a way of searching for occurrences of a particular string in the code stored in the blocks is provided:
- 10 20 grep create searches blocks $10 through $20 inclusive for occurrences of create
- If your search term includes spaces, use grep" — the syntax is similar to s" string literals: 10 20 grep" : >number"
In editor.fth (30 load), a vi-like block editor is implemented. It can be started with e.g. 10 edit to edit block 10.
- Non-standard keybindings:
  - Q to quit back to the Forth REPL.
  - [ to look at the previous block.
  - ] to look at the next block.
- After first use, you can use the shorthand ed to reopen the last-edited block.
- Use run to execute the last-edited block. This sets a flag to prevent a chain of --> from loading all the subsequent blocks.
- Changes are saved to disk whenever you use run or open a different block with edit or the [/] keybinds. You can also trigger this manually with save.
In filesystem.fth (50 load), there's support for a simple filesystem, which is currently hardcoded to be in the first partition listed in the MBR. Some limits are lower than you might expect, but for the purposes I'm interested in, they shouldn't become a problem:
- Partition size: up to 128 MiB
- File size: 8184 KiB
One file can be open at a time. Directories are supported, but there isn't any path parsing. For user-level file manipulation:
- ls ( -- ) will print the list of files in the current directory.
- chdir ( name len -- ) will enter a subdirectory.
- .. ( -- ) will go back to the parent directory.
- mkdir ( name len -- ) will create a directory.
- exec ( name len -- ) will execute the contents of a file as Forth.
- rm ( name len -- ) will delete a file.
- rmdir ( name len -- ) will delete an empty directory. Recursive delete is not implemented yet.
For writing programs involving files:
- fopen ( name len -- ) will open an existing file, or throw an exception if it doesn't exist.
- fopen? ( name len -- t|f ) will instead return a boolean indicating whether the file could be found.
- fcreate ( name len -- ) will create a new file, or if it already exists, truncate it to 0 bytes. The new file is opened.
- fread ( buf len -- ) will read data starting at the current position

All this code was originally developed within Miniforth itself, which meant it was stored within a disk image — a format that's not very friendly to tooling like Git or GitHub's web interface. This disparity is handled by two Python scripts:

scripts/mkdisk.py takes the files and merges them into a bootable disk image;
scripts/splitdisk.py extracts the code from a disk image's blocks and splits it into files.

Free bytes

At this moment, not counting the 55 AA signature at the end, 445 bytes are used, leaving 1 byte for any potential improvements.

Byte saving leaderboard:

Ilya Kurdyukov saved 24 bytes. Thanks!
Peter Ferrie saved 5 bytes. Thanks!
An article by Sean Conner allowed me to save 2 bytes. Thanks!

If a feature is strongly desirable, potential tradeoffs include:

7 bytes: Don't push the addresses of variables kept by self-modifying code. This essentially changes the API with each edit (NOTE: it's 7 bytes because this makes it beneficial to keep >IN in the literal field of an instruction).
?? bytes: Instead of storing the names of the primitives, let the user pick their own names on boot. This would take very little new code — the decompressor would simply have to borrow some code from :. However, reboots would become somewhat bothersome.
?? bytes: Instead of providing ; in the kernel, give a dictionary entry to EXIT and terminate definitions with \ | until immediate and ; can be defined.


miniforth's Issues

Image releases?

Have you thought about using GitHub releases to publish an assembled disk image? I wanted to try it out, but setting up a build environment on Windows is kinda difficult. I will try setting up a Linux VM, but having it prebuilt would be nice.

P.S. I've really enjoyed reading your articles. They motivate me to work on my (tangentially related) project :)

can you use rv32i as the target ISA for demo?

this project is super cool, but x86 assembly is huge and complex.
is it possible to rewrite this in RISC-V assembly, so that people who don't already know assembly language could understand it (after a crash course) and extend it easily?

How does UDOT handle space?

In your blogpost you mention:

The space is printed by pushing a fake "digit" that will get converted into a space.

but if I read correctly https://github.com/NieDzejkob/miniforth/blob/2842b9a303c07d8d8ace8dc890c531d55866348e/boot.s#L326-L345

You go:

from " " - "0" = 32 - 48 = -16 https://github.com/NieDzejkob/miniforth/blob/2842b9a303c07d8d8ace8dc890c531d55866348e/boot.s#L328
to -16 + "0" = 32 https://github.com/NieDzejkob/miniforth/blob/master/boot.s#L337
to 32 + "A" - "0" - 10 = 32 + 65 - 48 - 10 = 39 ≠ 32 https://github.com/NieDzejkob/miniforth/blob/2842b9a303c07d8d8ace8dc890c531d55866348e/boot.s#L340

Am I missing something or should "A" - "0" - 10 be "A" - "0" - 17 ?
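
The arithmetic in the question can be checked mechanically. The Python snippet below only reproduces the three steps as listed. Whether the letter adjustment in the last step is actually applied to the fake digit depends on the linked code, which is not reproduced here.

```python
fake_digit = ord(" ") - ord("0")             # 32 - 48 = -16
after_add = fake_digit + ord("0")            # -16 + 48 = 32
letter_adjust = ord("A") - ord("0") - 10     # 65 - 48 - 10 = 7
proposed_adjust = ord("A") - ord("0") - 17   # 65 - 48 - 17 = 0

print(after_add + letter_adjust)    # 39
print(after_add + proposed_adjust)  # 32
```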

qemu seems to hang on `Booting from Hard Disk...`

Hi.

Neat project, thanks!

Let's say I started here...

sebboh@debian:~/prj/miniforth$ git status && rm -v boot.bin boot.lst disk.img raw.bin && \
yasm -f bin boot.s -o raw.bin -l boot.lst && python3 compress.py && python3 mkdisk.py
On branch master
Your branch is up to date with 'origin/master'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        boot.bin
        boot.lst
        disk.img
        raw.bin

nothing added to commit but untracked files present (use "git add" to track)
removed 'boot.bin'
removed 'boot.lst'
removed 'disk.img'
removed 'raw.bin'
504 bytes used
Found 6 block files
sebboh@debian:~/prj/miniforth$

Then, I invoke run.sh.

I get this screen and put some input in but I do not get output when I hit enter:

SeaBIOS (version 1.14.0-2)


iPXE (http://ipxe.org) 00:03.0 CA00 PCI2.10 PnP PMM+xxxxxxxx+xxxxxxxx xxxx



Booting from Hard Disk...
1 1 +

So, I attempt to gather some information about what it is doing. I hit alt+2 and type a command:

compat_monitor0 console
QEMU 5.2.0 monitor - type 'help' for more information
(qemu) xp /20xi 0
0x00000000:  53                       pushw    %bx
0x00000001:  ff 00                    incw     (%bx, %si)
0x00000003:  f0                       .byte    0xf0
0x00000004:  53                       pushw    %bx
0x00000005:  ff 00                    incw     (%bx, %si)
0x00000007:  f0                       .byte    0xf0
0x00000008:  c3                       retw
0x00000009:  e2 00                    loop     0xb
0x0000000b:  f0                       .byte    0xf0
0x0000000c:  53                       pushw    %bx
0x0000000d:  ff 00                    incw     (%bx, %si)
0x0000000f:  f0                       .byte    0xf0
0x00000010:  53                       pushw    %bx
0x00000011:  ff 00                    incw     (%bx, %si)
0x00000013:  f0                       .byte    0xf0
0x00000014:  54                       pushw    %sp
0x00000015:  ff 00                    incw     (%bx, %si)
0x00000017:  f0                       .byte    0xf0
0x00000018:  53                       pushw    %bx
0x00000019:  ff 00                    incw     (%bx, %si)
(qemu)

Seems repetitive! Hm, I don't know many qemu monitor commands nor how to interpret the output of this one. @NieDzejkob or anyone, I'd be happy to gather more information and paste it here if you tell me how. Or if enough information is present already, then let's call this a bug report? :)

Cheers, thank you,
--sebboh

Word-wrap long lists.

Some other Forth implementations (OF, for instance) will determine if a word in a wide list print function (e.g., words) is going to run off the end of the screen or terminal, and if it's shorter than the width of the screen, insert a cr before printing the next one.

Running words on a freshly-loaded miniforth prints the words directly in one line, wrapping and breaking the words when it encounters the end of the screen. It's readable, but not very pretty.
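
The requested behavior can be sketched in Python as follows. This is only an illustration of the wrapping rule, not code from miniforth or Open Firmware; the function wrap_words and the default width of 80 columns are assumptions.

```python
def wrap_words(words: list[str], width: int = 80) -> list[str]:
    """Break a word list into lines without splitting any word."""
    lines: list[str] = []
    line = ""
    for word in words:
        candidate = word if not line else line + " " + word
        if line and len(candidate) > width:
            lines.append(line)   # the word would run off the edge: emit a cr first
            line = word
        else:
            line = candidate
    if line:
        lines.append(line)
    return lines


print(wrap_words(["dup", "swap", "drop"], width=8))  # ['dup swap', 'drop']
```

A word longer than the whole width still ends up on a line of its own and overflows, since there is nowhere better to put it.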

Feature request: automatic `load` on start.

Another tricky one, but the premise is simple - load and execute one Forth block without any user intervention on boot.

There's quite a bit that could be done with this, including making the environment feel more like an operating system proper.

If the sector it loads is empty, it does nothing, and you're back to the interpreter, as usual.
If this behavior is not desirable, the code could possibly be commented out/removed prior to build.

I noticed that 80 (the current disk number) is on the stack when the interpreter is presented - meaning a load command with no arguments will do 80 load.
So, I use this in a bit of a hack, where I renamed the bootsector load word to forth; at block 80, is only 80 1 forth; and in block 1, I had changed the load word copy, so that it copies forth over to load.
Now, you only type forth to start the rest of the system. :)

Feature request/suggestion: `load` floppy/8088 BIOS compatibility (CHS support/LBA translation)

okay, so I get that this is possibly a big one here. I'm mostly posting this here to track it for myself, because I might implement this independently (perhaps in a branch or fork).

Essentially, I'd like to use CHS-type addressing to load a block from the floppy disk in the bootloader, rather than the LBA model; using the int13h AH=02 function rather than AH=42, and disk ID DL=00 (FDD 0) instead of DL=80 (HDD 0).

I'd be entirely ignoring/zero-setting C/H until the initial few blocks of forth code are loaded in, to save ASM space. Most floppies have 8, 9 or 18 sectors per track, but I'm sure LBA calculation could even be done in the first 1 or 2 sectors, in forth, along with 'extended' disk access; and load can be rewritten for such a thing on-the-fly. It might also save a tiny bit of space, since a 'packet' for the extended disk protocol is no longer necessary in the bootloader.

One reason for this is to enable this forth environment on vintage and hobbyist kit computers. I couldn't get it done in time for the VCF East event (https://vcfed.org), but I wanted to create a booter floppy which has a text adventure game in it, with a minimal kernel of code for strings/formatting, saving game progress, and reading/evaluating user commands to advance the plot. Forth, and especially a minimal one like this, is best suited for such a thing.
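
For reference, the LBA-to-CHS translation this proposal would eventually need is the standard one. The Python sketch below assumes the geometry of a 1.44 MB floppy (2 heads, 18 sectors per track); the function name and defaults are illustrative and not part of miniforth.

```python
def lba_to_chs(lba: int, heads: int = 2, sectors_per_track: int = 18) -> tuple[int, int, int]:
    """Translate an LBA sector number to (cylinder, head, sector)."""
    cylinder, rem = divmod(lba, heads * sectors_per_track)
    head, sector0 = divmod(rem, sectors_per_track)
    return cylinder, head, sector0 + 1   # CHS sector numbers start at 1


print(lba_to_chs(0))    # (0, 0, 1)
print(lba_to_chs(2))    # (0, 0, 3)
print(lba_to_chs(18))   # (0, 1, 1)
print(lba_to_chs(36))   # (1, 0, 1)
```

With this geometry, LBA 0 through 17 all sit on cylinder 0, head 0, so leaving C and H at zero would still reach blocks 0 through 8 (two sectors each).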
