[Assembler] Improved ergonomics for 65816 (and other) subtargets (llvm-mos/llvm-mos)

Comments (14)

JohnMBrooks commented on September 25, 2024

I am not a seasoned 65816 developer, so I'd like feedback from people who are so that they can let us know what features they benefit from, what features they don't consider necessary, and what else could be added that I might have missed.

I was a professional assembly programmer for Apple IIe and IIGS software in the 1980s and 1990s, and currently maintain the Apple II ProDOS operating system (all assembly).

The bulk of legacy assembly source code for the IIGS is in Merlin format, as Merlin was the dominant assembler for Apple IIe developers who upgraded to the GS. The Orca/APW assembler and MPW IIGS cross-assembler were also popular with professional 65816 programmers, and a good deal of 65816 assembly source written for them is still in use in the current dev community.

Your draft of proposed changes looks sound and should cover the main 65816 addressing mode issues.

Q:
Can you clarify what assembler ergonomic issues you would like feedback on?
Are you hoping to attract current 65816 assembly developers?
Is the goal to allow compiler-generated 65816 assembly source to be used as a starting point for a full-assembly application?
Is the goal to allow existing 65816 code to be converted to the llvm-mos assembler for integration with llvm-compiled code?

Here are some thoughts as an old-school 65816 assembly programmer (random order):

I prefer Merlin's "mx %00" syntax to the multi-line 'longa', 'longi' approach.
Code relocation: A big problem with many 65xx assemblers is when code is ORG'd at multiple locations but assembled at another address (due to runtime relocation or code patching). This often confuses an assembler's addressing mode decisions, and is one reason why Merlin was more popular than other assemblers.
Local labels: Merlin allows named local labels in functions and macros, which is often a weakness of other assemblers.
Macros: 65816 assembly programmers (as with 6502 asm) typically rely on a small library of helper macro routines. Macros are a minefield of nonstandard implementations from one assembler to another, but are a key ergonomic feature.
Support for ProDOS API calls and GS toolbox API calls (usually via macros).
Support for OMF relocatable file format (likely via translation from elf?) as there are several tools and loaders for 65816 OMF as used in GS/OS. Key features include loading into specific banks (usually bank 0, 1, $E0, or $E1), and alignment.
For hand-coded assembly, Merlin's LUP pseudo-op is frequently used to unroll code and generate data tables.
A way to map a sequence of labels to memory by size (DUM pseudo-op), akin to a C struct or enum
CYC ON to display the 65816 cycle count of each assembled instruction and total cycles per loop or subroutine
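The DUM bookkeeping in the list above is essentially a running sum of field sizes, the same way a C compiler assigns struct member offsets (minus any padding, which the 65816 does not need). A minimal Python sketch, assuming a hypothetical list of (label, size) pairs; it models the idea only and is not Merlin's implementation:

```python
def dum_layout(fields, base=0):
    """Give each label the address of the first byte of its field.

    fields: list of (label, size_in_bytes), in declaration order.
    base:   starting address, like the operand of DUM.
    Returns (labels, total_size).
    """
    labels = {}
    addr = base
    for name, size in fields:
        labels[name] = addr  # label points at the start of this field
        addr += size         # next field starts right after it
    return labels, addr - base


# Hypothetical sprite record: 16-bit X, 16-bit Y, 8-bit tile, 8-bit flags.
layout, total = dum_layout([("xpos", 2), ("ypos", 2), ("tile", 1), ("flags", 1)])
print(layout)  # {'xpos': 0, 'ypos': 2, 'tile': 4, 'flags': 5}
print(total)   # 6
```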

Hope that helps,
-JB

from llvm-mos.

undisbeliever commented on September 25, 2024

One useful feature of ca65 is the .asize and .isize pseudo variables, which output the corresponding register size.

I have found .asize and .isize useful in two situations.

Creating synthetic instruction macros that work with both .a8 and .a16.

;; Arithmetic shift right
;; IN: A
;; OUT: A
.macro asr
    .if .asize = 8
        cmp     #$80
        ror
    .else
        ; 16 bit A
        cmp     #$8000
        ror
    .endif
.endmacro
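The trick in the macro above is that CMP #$80 (or #$8000 with a 16-bit accumulator) sets the carry exactly when the sign bit of A is set, so the following ROR shifts a copy of the sign bit back into the top bit. A small Python model that checks this against a true signed shift for every 8-bit and 16-bit value; the function names are illustrative only:

```python
def asr_via_cmp_ror(a, width=8):
    """Model CMP #sign / ROR on an accumulator of `width` bits."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)              # $80 for 8-bit A, $8000 for 16-bit A
    a &= mask
    carry = 1 if a >= sign else 0        # CMP sets C when A >= operand (unsigned)
    return (carry << (width - 1)) | (a >> 1)  # ROR: old C -> top bit (new C ignored)


def asr_reference(a, width=8):
    """Reference: treat A as two's complement and shift right by one."""
    sign = 1 << (width - 1)
    value = a - (1 << width) if a & sign else a
    return (value >> 1) & ((1 << width) - 1)  # Python >> on negatives rounds down


for width in (8, 16):
    assert all(asr_via_cmp_ror(a, width) == asr_reference(a, width)
               for a in range(1 << width))

print(hex(asr_via_cmp_ror(0xF0)))        # 0xf8  (-16 >> 1 == -8)
print(hex(asr_via_cmp_ror(0x8000, 16)))  # 0xc000
```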

Adding static asserts to ensure macros are called with the correct .asize and/or .isize.
- Example: An assert that confirms an INC32 macro is invoked with .a16
- Example: An assert that confirms a user supplied VBlank macro exits with 8 bit A

I do not know how easy this would be to implement (or even if llvm-mos's assembler supports pseudo variables), so I would understand if this is infeasible.

I do believe something like .asize and .isize (or standardised macro asize/isize tagging and warnings if pseudo variables are not possible) would be useful for creating a synthetic instruction macro pack.

Expanding on NovaSquirrel's comment about prefixes in ca65. The zeropage, absolute and far prefixes can also be applied when a symbol is imported and exported.

However, this address size applies to the symbol globally, and a warning can occur if there is an address size mismatch between the export and import statements.

a.s

.export zpVar   : zp   := $12
.export absVar  : abs  := $801234
.export farVar  : far  := $801234

b.s

.importzp zpVar
.import   absVar : abs
.import   farVar : far


    lda     zpVar
    lda     absVar
    lda     farVar

Would assemble to:

8084B3 A5 12         lda $12 [$000012] = $FF         A:8000 X:005F Y:0000 S:0100 D:0000 DB:7E P:nvMxdIZc V:224 H:316 Fr:1160
8084B5 AD 34 12      lda $1234 [$7E1234] = $00       A:80FF X:005F Y:0000 S:0100 D:0000 DB:7E P:NvMxdIzc V:224 H:322 Fr:1160
8084B8 AF 34 12 80   lda $801234 = $00               A:8000 X:005F Y:0000 S:0100 D:0000 DB:7E P:nvMxdIZc V:224 H:329 Fr:1160
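The three trace lines differ only in operand size: A5 (direct page, 1-byte operand), AD (absolute, 2 bytes, bank taken from DB) and AF (long, 3 bytes). A Python sketch of that encoding and of the effective-address calculation, assuming native mode; it reproduces the bytes and bracketed addresses in the trace (DB=$7E, D=$0000), but the helper names are made up for illustration:

```python
# LDA opcodes for the three address sizes in the trace (65816 native mode).
LDA_FORMS = {
    "zp":  (0xA5, 1),  # lda dp
    "abs": (0xAD, 2),  # lda addr
    "far": (0xAF, 3),  # lda long
}


def encode_lda(address, size):
    """Return the instruction bytes; the operand is stored little-endian."""
    opcode, nbytes = LDA_FORMS[size]
    return [opcode] + [(address >> (8 * i)) & 0xFF for i in range(nbytes)]


def effective_address(address, size, db=0x7E, d=0x0000):
    """24-bit address the CPU actually reads for each form."""
    if size == "zp":
        return (d + (address & 0xFF)) & 0xFFFF   # direct page lives in bank 0
    if size == "abs":
        return (db << 16) | (address & 0xFFFF)   # bank comes from DB
    return address & 0xFFFFFF                    # long: bank is in the operand


for size, addr in (("zp", 0x12), ("abs", 0x801234), ("far", 0x801234)):
    print(size, " ".join(f"{b:02X}" for b in encode_lda(addr, size)),
          f"${effective_address(addr, size):06X}")
# zp A5 12 $000012
# abs AD 34 12 $7E1234
# far AF 34 12 80 $801234
```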

I'm wondering if llvm-mos could track the .directpageat, .databankat, asize and isize values at the subroutine entry, return position (RTS/RTL) and caller (JSR/JSL)? Possibly have llvm-mos emit a warning if there is a mismatch?

Again, I will completely understand if this is infeasible.

I only ask because it sounds like you need to track asize and isize for the llvm-objdump disassembly and there might be some llvm feature (that I do not know about) that could make this easy to implement.

In my experience with SNES 65816 assembly, I've encountered a lot of crashes that were caused by a memory/index register size mismatch between the assembler/code and CPU (usually caused by modifying code that involves branches or misremembered optimised code). They are mostly caught in testing or by proof-reading, but occasionally one slips through and makes it into the committed code.

Same goes for Data Bank and Direct Page register mismatches.
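One way to picture the requested check: follow REP/SEP through a routine, then compare the register sizes at RTS/RTL with what the routine declares. A deliberately simplified Python sketch (straight-line code only, no branches, native mode assumed); the instruction tuples and the `check_exit` helper are hypothetical and do not describe any existing llvm-mos feature:

```python
M_BIT, X_BIT = 0x20, 0x10  # P register bits: M = accumulator size, X = index size


def track_sizes(instructions, a_bits=8, i_bits=8):
    """Follow REP/SEP through straight-line code; return (a_bits, i_bits) at the end."""
    for mnemonic, operand in instructions:
        if mnemonic == "rep":            # REP clears bits -> 16-bit registers
            if operand & M_BIT:
                a_bits = 16
            if operand & X_BIT:
                i_bits = 16
        elif mnemonic == "sep":          # SEP sets bits -> 8-bit registers
            if operand & M_BIT:
                a_bits = 8
            if operand & X_BIT:
                i_bits = 8
    return a_bits, i_bits


def check_exit(name, instructions, entry, expected_exit):
    """Return a warning string if the sizes at the return point don't match."""
    actual = track_sizes(instructions, *entry)
    if actual != expected_exit:
        return f"warning: {name} returns with A/I = {actual}, expected {expected_exit}"
    return None


# Hypothetical VBlank handler that is supposed to exit with 8-bit A, 16-bit X/Y.
body = [("rep", 0x30), ("lda", None), ("sep", 0x20), ("rtl", None)]
print(check_exit("vblank", body, entry=(8, 8), expected_exit=(8, 16)))      # None
print(check_exit("vblank", body[:2], entry=(8, 8), expected_exit=(8, 16)))  # warning
```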

NovaSquirrel commented on September 25, 2024

I think it's important to consider what different syntax styles mean for macros. The 6502 family is pretty minimal when it comes to what operations it supports, and I find it really helpful to be able to use macros that fill in the "gaps" in the instruction set. ca65 seems to agree with that, coming with built-in macro packs you can turn on that add fake instructions for things like conditional jumps and add-without-carry. See also: https://www.nesdev.org/wiki/Synthetic_instructions

And with these kinds of macros taken into account, I think it's important to communicate sizes with directives or with operand prefixes (or similar syntax) instead of 68000-style mnemonic suffixes. ca65's add macro just looks like this:

.macro  add     Arg1, Arg2
        clc
        .if .paramcount = 2
                adc     Arg1, Arg2
        .else
                adc     Arg1
        .endif
.endmacro

and because of ca65's design decisions, this simple macro can cleanly handle both accumulator sizes as well as addresses of any size.

Dizzy611 commented on September 25, 2024

I'm not sure I quite understand why prefixes are better than suffixes when it comes to things like the macro provided above? It would seem to me like either would make sense and lead to equal ability to cleanly handle a macro like that, but I guess I'm probably not understanding.

Overall I'd say I really like the idea of the explicit mos24() cast, and I'm also kind of in the boat that my syntax preferences for less verbose casting syntax come straight down to what I'm used to. I learned 65816 assembly (assembly in general, really) using tutorials that exposed me to WLA-DX, so that's what I know and what feels natural to me.

I like the .directpageat and .databankat directive idea and I definitely like the idea of having synonyms that specifically address the quirks of the SPC700.

I do kind of think I'm probably not the ideal audience for this question, as almost all of my 65816 experience has been strictly in 8-bit mode and not doing a lot of casting etc, but regardless those are my thoughts.

NovaSquirrel commented on September 25, 2024

I'm not sure I quite understand why prefixes are better than suffixes when it comes to things like the macro provided above? It would seem to me like either would make sense and lead to equal ability to cleanly handle a macro like that, but I guess I'm probably not understanding.

I can do stuff like add foo, add f:foo, add #$1234, etc. and the adc inside will adapt as necessary. I guess you could have multiple macros named things like add.w and add.l but having to make multiple copies of a macro already starts to feel a bit less clean, and it gets worse if the macro has to take multiple parameters that each may be different sizes.
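The point about prefixes can be modeled in a few lines of Python: because the size travels inside the operand text (`f:foo`, `#$1234`), the macro only has to forward its operands untouched, whereas with mnemonic suffixes the macro itself has to be told the size. This is a toy model of macro expansion, and the `adc.l`-style suffix syntax is hypothetical; it is not how ca65 or the llvm-mos assembler process macros:

```python
def add(*operands):
    """Toy expansion of the ca65 `add` macro: CLC, then ADC with the operands forwarded."""
    return ["clc", "adc " + ", ".join(operands)]


def add_suffixed(size, *operands):
    """With mnemonic suffixes, the macro must know the size to pick adc.b/.w/.l."""
    return ["clc", f"adc.{size} " + ", ".join(operands)]


print(add("foo"))                # ['clc', 'adc foo']
print(add("f:foo"))              # ['clc', 'adc f:foo']
print(add("#$1234"))             # ['clc', 'adc #$1234']
print(add("foo", "x"))           # ['clc', 'adc foo, x']
print(add_suffixed("l", "foo"))  # ['clc', 'adc.l foo']
```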

Dizzy611 commented on September 25, 2024

Fair point

johnwbyrd commented on September 25, 2024

https://www.nesdev.org/wiki/Synthetic_instructions
.macro  add     Arg1, Arg2
...

That is one good reason, of several good reasons, why we should consider starting a set of convenience macros, in GNU assembler format obviously, as part of the llvm-mos-sdk. Macros are a nice configurable way to add sugar to more complicated expressions, not just on the 65816 but for all 65xx variants, and right now there's not a lot of experience in the llvm-mos community with writing them. For inspiration, consider https://github.com/davidgiven/cpm65/blob/master/include/zif.inc , as well as the rest of @davidgiven's work in that directory.

asiekierka commented on September 25, 2024

@JohnMBrooks Thank you for your feedback! To answer your questions (I'll edit some of that into the original post above):

Can you clarify what assembler ergonomic issues you would like feedback on?

The main focus is the issues described in the original post, as I consider them essential for effective 65816 development. However, any feedback relevant to 6502/65816 development is welcome and may be considered as separate/additional features going forward. While LLVM-MOS's focus is to provide an excellent compiler backend, I've always believed in the potential of mixed C/ASM codebases for retro hobbyist development, and as such I'd like LLVM-MOS to be, at minimum, a good assembler as well.

Is the goal to allow compiler-generated 65816 assembly source to be used as a starting point for a full-assembly application?

It is possible to use Clang to emit compiler-generated assembly source, but I would not say this is a specific goal here; the expectation is to use the LLVM linker to link a project consisting of only assembly, or a mix of C and assembly sources.

Is the goal to allow existing 65816 code to be converted to the llvm-mos assembler for integration with llvm-compiled code?

This should be possible regardless, if the set of LLVM-MOS's features is sufficient to allow expressing typical 65816 code.

Code relocation [...] Local labels [...] Macros [...] LUP pseudo-op [repeat] [...]

LLVM-MOS's assembler follows the feature set of the portable GNU assembler; as such, we should already have these features, just with different syntax. However, if we are to ever target the Apple II community more directly, it might be a good idea to create a wiki page describing how things are done in Merlin vs. how they are achieved in LLVM-MOS.

Out of those which pertain to the assembler, we don't have the following features:

Support for ProDOS API calls and GS toolbox API calls. This is a concern of llvm-mos-sdk, which provides platform-specific functions, macros and linker scripts. There has been some preliminary work done on Apple II support, but I am unfamiliar with that platform, so I cannot be of much help.
Support for the OMF relocatable file format. Translation from ELF would match what we do for other targets already (like cpm65), and as such would also belong in llvm-mos-sdk.

We already have an issue for Apple II support, so I've added these notes there.

We don't have an equivalent for CYC ON, and I can't intuitively tell how much effort it would take to add such a feature as part of disassembly; however, I agree that assembly developers would appreciate it.

NovaSquirrel commented on September 25, 2024

For SNES development, the best debugger available (Mesen) provides a profiler that shows the real best/worst/average cases for each subroutine, so I don't think I would use a cycle counter provided by an assembler. The SNES also has a quirk where some cycles are 2.68MHz and some are 3.58MHz so a cycle counter doesn't provide the full picture, either. But I definitely see the value in a tool that will inherently work for any 6502 family device regardless of what debuggers are available.

I like the idea of being able to specify both the M and X bits on the same line, but if you're required to give both then that sounds like it would complicate things when a macro only wants to change one. Maybe the syntax could include some sort of way to specify "don't change" for one of the two bits.
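A minimal sketch of such a "don't change" option, assuming a Merlin-style `mx %MX` operand where each digit is 1 (8-bit) or 0 (16-bit), plus a `-` digit that keeps the current assumption. The `-` digit is invented here purely to illustrate the suggestion:

```python
def apply_mx(operand, m, x):
    """Return the new (m, x) flag assumptions after a hypothetical `mx` directive.

    m, x:    current assumptions, 1 = 8-bit, 0 = 16-bit (as in Merlin's MX %11 / %00).
    operand: '%' followed by two digits, each '0', '1' or '-' (leave unchanged).
    """
    if len(operand) != 3 or operand[0] != "%":
        raise ValueError(f"expected %MX, got {operand!r}")
    state = [m, x]
    for i, digit in enumerate(operand[1:]):
        if digit in "01":
            state[i] = int(digit)
        elif digit != "-":
            raise ValueError(f"bad mx digit {digit!r}")
    return tuple(state)


print(apply_mx("%00", 1, 1))  # (0, 0): both registers 16-bit
print(apply_mx("%1-", 0, 0))  # (1, 0): a macro switches only A to 8-bit
```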

johnwbyrd commented on September 25, 2024

I was a professional assembly programmer for Apple IIe and IIGS software in the 1980s and 1990s, and currently maintain the Apple II ProDOS operating system (all assembly).

Hi @JohnMBrooks , I suspect you don't remember me, but I interviewed for you a few years after you had started Blue Shift. I recall that we bonded over programming 6502-based machines. Seems that you have ended up at Visual Concepts, which I worked with when it was doing first-party Sega titles. I am very glad to have you along on the llvm-mos project and welcome your involvement and perspective.

oziphantom commented on September 25, 2024

Sorry for the novel, but I feel I need to establish some context for this thread.

I feel we might be seeing the battle but missing the war. Do we actually want to be able to tell the assembler what the access size is? And if we are telling the assembler what size things are how does it handle it?

We are not here to make a new 65816 assembler; that is pointless. 64tass and KickAss cover the 65816 and 4510 completely and will be far more feature-complete than this ever will.

Rather we are allowing you to inline asm into a C/C++ project, and that is a very different problem. With variables and code made by C I have no idea where things are, or how far away from me they will be, that’s the point.

The problem is the linker. WLA-DX has the same issue, oddly, as it has a linker. On WLA-DX you basically have to always specify the access size because the assembler doesn't know where anything is (that is the linker's job), and the linker can't change any code (that is the assembler's job). But changing LDA ABS to LDA DP means changing the opcode and the number of bytes, which means shifting down all the bytes after it, which changes every entry point and any branch target that crosses that point, which the linker can't do.
Then if one object says access via abs and another via dp, you have a potential problem: the same variable might not be accessible both ways, so the linker can't give it one address, and you get a cryptic error message like “can’t access MyVar with 16bits in myfunction line 27”, which makes no sense to a C dev. Thus, you kind of need C to treat the whole thing as a single blob that it looks at on every single compile. Or you impose very strict ABI and layout rules and get really unoptimized code.

If you put a function in the same bank then JSR is fine, but if it has to go into another bank then you have to use JSL, and then you have to update every other call to it. At that point you need to reflow the code, as the number of bytes changes in every unit that calls it, which changes all the labels and branch offsets, etc. If that then pushes something too far, a branch won't fit, so you have to make the branch skip over an inserted jump, which adds more bytes; then a function doesn't fit any more, so you move it to another bank and repeat. But then another function can now fit in the bank where the ousted function was placed, allowing it to drop back down to JSR/RTS, which shaves bytes off, and then the original could fit again... and so on.

This is why 65816 C compilers end up with an "if you have more than 64K of code, every call is a long call" type of arrangement: the linker can't rewrite the entire code set; it's a linker. If you have more than 64K of data... well, you don't. You have 64K, and then some “long accessed data you place in my config to access sparsely”.

But then you have variables, and optimized layout of variables, which can cause them to move from DP to ABS to Long. This is not such an issue on the SNES, as we are in ROM and all variables must be in one of the 2 banks 7E/7F, with fast-access windows onto 8K of it in some banks. However, on an Apple IIgs/Foenix F256/C64 SuperCPU/C128 SuperCPU/That Atari thing/The Amstrad executive phone, and, among non-65816 targets, the Mega65, we have RAM. So if I'm making class instances statically, i.e. a nice array of them, then I would want all the variables and data for said classes to be in the same bank as the code for said classes, so that all the code can access them from the current bank quickly.

For me to say in asm code "load this variable with 16 bits", every variable must be statically allocated by the user, and the compiler's optimizer cannot move or touch them; otherwise the linker will get the wrong data and the code will crash. Nobody wants this, though; we "demand" that the compiler optimize layout for optimal code generation, right? So then how, in my assembly code, do I know how to access a variable? How do I know that it is in the DP, which could be anywhere?

Which leaves the assembler as either "all bets are off: anything you place here must not be touched or declared on the C side, and you are responsible for it" or "you tell me what variables and what functions you want, and I will sort and optimize them with placement and address-optimized opcodes" to the best of my ability, given I can only see my code unit.

For the first case I might as well fire up 64tass, make my code as a binary blob that adheres to the calling conventions of this compiler, carve out an asm variables section in the config for me to use, and just link it in.

CA65 avoids these linking issues because, from its perspective, ZP can't move and the stack can't move either. So the logic of DP/ABS/Long becomes straightforward. It also doesn't really understand long, so you have to tell it “This will be in another bank” manually.

64Tass solves this by having an “Internal Linker”, allowing it to assemble virtually, place things, then link the code into place as needed, adding as many passes as it needs to resolve everything. This allows 64Tass to handle full DP relocation and automatic address mode selection in all cases.

So this comes down to:
Does this compiler work with “this is how things will be done: DP is here, Stack is here, All calls are long, data bank is X, and if you want other than X you use long”, at which point we will need a solid @b,@W,@l system,
Or
is it “I’m an optimizing compiler that will look at everything and crunch everything for optimal code; you don’t know where anything is”, at which point we don’t have a @b,@W,@l system, because the user cannot specify where something is? That will be done by the code generation unit's optimizer, so the linker can then just spit out a single file from the single object that is made.

mysterymath commented on September 25, 2024

The problem is the linker. WLA-DX has the same issue, oddly, as it has a linker. On WLA-DX you basically have to always specify the access size because the assembler doesn't know where anything is (that is the linker's job), and the linker can't change any code (that is the assembler's job). But changing LDA ABS to LDA DP means changing the opcode and the number of bytes, which means shifting down all the bytes after it, which changes every entry point and any branch target that crosses that point, which the linker can't do.

I haven't read all this yet, but LLD (and any modern linker in general) actually can and does perform size-changing alterations to code. On some platforms like ARM it inserts nearby thunk sections and redirects jumps to those thunks. In RISC-V the compiler or assembler inserts the most general variant and the linker "relaxes" it to a smaller, faster variant if applicable. In both cases layout is done iteratively until a fixed point is reached, with a constant cap on the number of passes attempted. The operations are crafted somewhat carefully so that they progress monotonically toward the fixed point, even if that isn't perfectly optimal.
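The iterate-until-fixed-point idea can be illustrated with classic branch relaxation: start every conditional branch in its short 2-byte form, widen any that end up out of range, and re-run layout until nothing changes. Because branches only ever grow, the loop converges. A Python sketch, assuming a 6502-style widening to an inverted branch over a JMP (2 + 3 bytes); the item format is hypothetical and this is not LLD's implementation:

```python
SHORT, LONG = 2, 5  # Bcc rel8, or inverted Bcc (2 bytes) + JMP abs (3 bytes)


def relax(items, max_passes=16):
    """Return the indices of branches that must use the long form.

    items: list of ("label", name), ("code", nbytes) or ("branch", target_label).
    """
    widened = set()
    for _ in range(max_passes):
        # 1. Lay everything out using the current branch sizes.
        addr, labels, addrs = 0, {}, []
        for i, item in enumerate(items):
            addrs.append(addr)
            if item[0] == "label":
                labels[item[1]] = addr
            elif item[0] == "code":
                addr += item[1]
            else:
                addr += LONG if i in widened else SHORT
        # 2. Widen short branches whose target is outside -128..+127.
        changed = False
        for i, item in enumerate(items):
            if item[0] == "branch" and i not in widened:
                offset = labels[item[1]] - (addrs[i] + SHORT)  # relative to next instruction
                if not -128 <= offset <= 127:
                    widened.add(i)
                    changed = True
        # 3. Sizes only grow, so this terminates; stop at the fixed point.
        if not changed:
            return widened
    raise RuntimeError("branch relaxation did not converge")


program = [
    ("branch", "end"), ("code", 5), ("branch", "far"),
    ("code", 118), ("label", "end"), ("code", 200), ("label", "far"),
]
print(sorted(relax(program)))  # [0, 2]: widening branch 2 pushes branch 0 out of range
```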

JohnMBrooks commented on September 25, 2024

I've always believed in the potential of mixed C/ASM codebases for retro hobbyist development, and as such I'd like LLVM-MOS to be, at minimum, a good assembler as well.

Same here. My first commercial C project was porting a large C codebase for "The Hunt for Red October" to the Apple IIGS in 1989. I took the asm generated by the APW 65816 C compiler and optimized/rewrote it, enlarging the scope of the assembly subsystems until it ran fast enough.

The strength and flexibility of the assembler really come into play for mixed C/asm coding, particularly for the 6502 and 65816 where register assignment, code alignment and addressing modes make such a big difference in code size and performance.

Is the goal to allow existing 65816 code to be converted to the llvm-mos assembler for integration with llvm-compiled code?

This should be possible regardless, if the set of LLVM-MOS's features is sufficient to allow expressing typical 65816 code.

If you want to stress test the mos assembler (or want a speed-of-light benchmark for llvm-mos C), try assembling my 2023 Sieve benchmark (Merlin src for 6502 and 65816 versions attached). I wrote these versions of Sieve to find the peak 6502 and 65816 speed for calculating primes up to 16384, as discussed in the Byte magazine 1981 and 1983 articles which used the Sieve to benchmark CPUs and C/Pascal compilers back in the day.

https://archive.org/details/byte-magazine-1983-01/page/n291

JB2023.Sieve65.s.txt
JB2023.Sieve816.s.txt

The 65816 sieve runs in 111K cycles, or about 45ms on a 2.8MHz GS. For comparison, in 1983 the 80MHz Cray-1 with a Fortran compiler ran the Byte sieve in 11ms per iteration. The fastest C compiler tested in 1983 was Unix Berkeley C on a VAX-11/780 which took 142ms per iteration.
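For anyone checking a C or assembly port against a known result, here is a Python sketch of the classic BYTE sieve formulation: 8191 flags standing for the odd numbers 3 to 16383. It is only a reference for the expected prime count, not a transcription of JB's Merlin sources:

```python
SIZE = 8190  # flags[i] stands for the odd number 2*i + 3, i.e. 3 .. 16383


def byte_sieve():
    """Count odd primes up to 16383 using the BYTE-style flag array."""
    flags = [True] * (SIZE + 1)
    count = 0
    for i in range(SIZE + 1):
        if flags[i]:
            prime = i + i + 3
            # Index i + prime is the value 3*prime; stepping by prime in
            # index space steps by 2*prime in value space (odd multiples only).
            for k in range(i + prime, SIZE + 1, prime):
                flags[k] = False
            count += 1
    return count


print(byte_sieve())  # 1899
```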

LLVM-MOS's assembler follows the feature set of the portable GNU assembler; as such, we should already have these features, just with different syntax.

Sadly the gnu assembler is not very feature-rich or user-friendly for serious assembly programming, especially for the 6502/65816 architecture.

asiekierka commented on September 25, 2024

Back home, it's time to respond to all the new points.

We are not here to make a new 65816 assembler, that is pointless, 64tass and KickAss cover the 65816 and 4510 completely and will be far more feature complete than this ever will.

We already have a 6502/HuC6280/SPC700/65816/... assembler. It's an inevitability of implementing an LLVM backend that you either end up developing an assembler and matching disassembler, or (less recommended) use a port of a GNU-compatible toolchain like binutils.

In addition, we have the only (AFAIK) 6502/65816 assembler that works with the ELF format that our linker expects. The linker used has many advanced features - LTO integration, section garbage collection, DWARF debug information handling - and, as such, you can't just swap in something like 64tass. Therefore, it is expected that native LLVM-MOS projects will want to use LLVM-MOS's built-in assembler not just for inline assembly, but also for external assembly files. At minimum, the standard library of LLVM-MOS-SDK makes use of such external assembly files. (However, we do also provide ca65/ld65 integration of a sort nowadays, though that is primarily geared towards using existing libraries and code from the cc65 ecosystem in LLVM-MOS projects, and it's arguably a little hacky.)

I don't think we need to have a great assembler. Assembler-only developers will probably not choose LLVM-MOS to make their project with. However, between mixed C/ASM projects and our own needs, I think we need to at least have a good assembler.

On WLA-DX you basically have to always specify the access size because the assembler doesn't know where anything is (that is the linker's job), and the linker can't change any code (that is the assembler's job).

That's not entirely true anymore. Linker relaxation is a technique most popular in RISC-V linker implementations which allows the linker to change code as an optimization step. However, it's quite non-trivial to get this to work for changing the instructions' size, which led us to give up on the idea for now.

Does this compiler work with “this is how things will be done: DP is here, Stack is here, All calls are long, data bank is X, and if you want other than X you use long”, at which point we will need a solid @b,@W,@l system, or “I’m an optimizing compiler that will look at everything and crunch everything for optimal code; you don’t know where anything is”, at which point we don’t have a @b,@W,@l system, because the user cannot specify where something is? That will be done by the code generation unit's optimizer, so the linker can then just spit out a single file from the single object that is made.

That's a false dichotomy, in my opinion. The best place to look for prior art here, I think, is 16-bit "real mode" 8086 C compilers, which had to grapple with a similar problem (segmentation dividing pointers, code calls and data accesses into "near" and "far" ones). They solved this by:

defining six memory models which provided defaults for how code, data, and stack are reached, as well as how big the typical pointer is;
allowing the user to override these assumptions with modifiers in C like __near and __far.
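The memory-model approach boils down to a table of defaults plus a per-symbol override. A Python sketch with the six classic real-mode models and a hypothetical mapping of near/far onto 65816 instructions (JSR vs JSL for code, absolute vs long addressing for data); the mapping is an illustration, not a decided llvm-mos design:

```python
# Classic real-mode 8086 memory models: (default code distance, default data distance).
MEMORY_MODELS = {
    "tiny":    ("near", "near"),  # code, data and stack in one 64K segment
    "small":   ("near", "near"),
    "medium":  ("far",  "near"),
    "compact": ("near", "far"),
    "large":   ("far",  "far"),
    "huge":    ("far",  "far"),   # like large, but single objects may exceed 64K
}

# Hypothetical 65816 mapping, for illustration only.
CALL_INSN = {"near": "jsr", "far": "jsl"}   # JSR/RTS vs JSL/RTL
DATA_MODE = {"near": "abs", "far": "long"}  # bank from DB vs 24-bit operand


def call_insn(model, override=None):
    """`override` plays the role of a __near/__far qualifier on the callee."""
    return CALL_INSN[override or MEMORY_MODELS[model][0]]


def data_mode(model, override=None):
    """`override` plays the role of a __near/__far qualifier on the object."""
    return DATA_MODE[override or MEMORY_MODELS[model][1]]


print(call_insn("small"), call_insn("medium"), call_insn("large", override="near"))
# jsr jsl jsr
print(data_mode("compact"), data_mode("medium"))
# long abs
```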

However, this relates to C. For the assembler, you typically assume the user knows if they're receiving a near or far pointer, if they need to do a near or far access, et cetera (one safe assumption is that everything in the same code section is accessible "near", as opposed to "far").

This is the direction my initial implementation is going in. For a final implementation, one can utilize link-time optimization (also known as whole-program optimization) to scan the entirety of C code to check for assumptions such as "is function A only called by other 'near'-reachable functions?" and "crunch everything for optimal code" - data and code defined by assembly can't be touched here, of course.
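The "is function A only called by near-reachable functions?" question is a simple whole-program query over the call graph. A Python sketch, assuming hypothetical inputs: a bank assignment per function, a list of call edges visible to the compiler, and a set of functions referenced from hand-written assembly, which the optimizer must leave alone:

```python
def near_callable(banks, calls, external=frozenset()):
    """Decide which functions may use JSR/RTS instead of JSL/RTL.

    banks:    {function: bank it is placed in}
    calls:    [(caller, callee), ...] for every call the compiler can see
    external: functions referenced from hand-written assembly; their calling
              convention is fixed, so they are never converted.
    """
    result = {}
    for fn, bank in banks.items():
        callers = [caller for caller, callee in calls if callee == fn]
        result[fn] = fn not in external and all(banks[c] == bank for c in callers)
    return result


banks = {"main": 0, "draw": 0, "sound": 1, "mix": 1}
calls = [("main", "draw"), ("main", "sound"), ("sound", "mix")]
print(near_callable(banks, calls, external={"main"}))
# {'main': False, 'draw': True, 'sound': False, 'mix': True}
```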

Sadly the gnu assembler is not very feature-rich or user-friendly for serious assembly programming, especially for the 6502/65816 architecture.

That's true. However, improving the situation here can only be done in steps - and I believe allowing friendly use of the 65816's total instruction set is a good first step for 65816 development. Worst case, we can fall back on our integration with ca65 for projects which need more advanced functionality.

---

[notes about adding .asize/.isize]

I'm wondering if llvm-mos could track the .directpageat, .databankat, asize and isize values at the subroutine entry, return position (RTS/RTL) and caller (JSR/JSL)? Possibly have llvm-mos emit a warning if there is a mismatch?

Those are good questions, but I'd need to do further research on those. Maybe for a second development stage.
