
Comments (7)

mysterymath commented on May 26, 2024

For reference, the inimitable 64doc:
http://www.zimmers.net/anonftp/pub/cbm/documents/chipdata/64doc

Necessary info is under the excellently named "6510 features"


johnwbyrd commented on May 26, 2024

I'd like to confirm that I understand this issue.

When implementing volatile support, it's important to make sure that 1) the underlying memory is actually read and written whenever the corresponding volatile is; and, optionally, 2) the underlying memory is read and written no more often than the corresponding volatile is.

Is this a fair problem statement?


mysterymath commented on May 26, 2024

Yes, that's a fair assessment. To clarify, the overall ordering of volatile accesses needs to agree as well.

There are exactly two things the C standard defines that an implementation is actually required to do. Were it not for these two clauses, the compiler could just emit RTS for every program.

They are:

  • At each sequence point, all previous volatile reads and writes are complete, and no later volatile reads or writes have begun.
  • I/O operations are produced in agreement with the abstract machine semantics.

The first item roughly corresponds to (1); the standard says nothing about extra accesses, only those given in the abstract machine model. But, implementations can and do define tighter interpretations. We can also be fairly choosy with what we want to support.
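
As a quick illustration of the first item before getting into the harder cases, a sketch (the register addresses are made up, not a real target's):

volatile unsigned char *const STATUS = (volatile unsigned char *)0xd010;
volatile unsigned char *const DATA = (volatile unsigned char *)0xd011;

unsigned char read_when_ready(void) {
  while ((*STATUS & 0x01) == 0) { }  /* STATUS must actually be re-read on every iteration */
  return *DATA;                      /* and this read must not be hoisted above the loop */
}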

For example, say you have a volatile I/O register that triggers a hardware side effect on every read, but you define it via a struct with a bit-field:

struct S {
  unsigned int dummy : 7;
  unsigned int flag : 1;
};
volatile struct S *IO = (volatile struct S *)0x1234;

If you change one of the fields, it'll be nearly impossible for the compiler to emit something that doesn't involve a read-modify-write, since there's no direct "set one bit" operation in memory on the 6502. You have to read a full byte, modify it, then write it back.

IO->flag = 1;
LDA IO    ; read the whole register byte
ORA #1    ; set the flag bit (the exact mask depends on bit-field layout)
STA IO    ; write the whole byte back

So there's not really a way to 100% agree with the abstract C semantics here, even without any CPU bugs. The standard says one access, but we can't do it in fewer than two. This specific issue caused contention among Linux kernel developers, since GCC was happily doing 64-bit read-modify-writes for 32-bit accesses, and Linus thought it shouldn't (those accesses were actually partially outside the struct, causing memory access violations, IIRC; not all that dissimilar from our case here. I can't remember what the GCC team actually decided).

Ideally, we'd have volatile accesses agree "as much as possible," but there's obviously a sliding scale here. There's also an easy out for this work: we can take the C-standard hard line and require accesses to these sorts of IO registers to be done with inline assembly.
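
For contrast, the one case where a user can already stay within a single access without compiler help is when the full register value is known, so the bit-field can be dropped and the whole byte written at once. A sketch, reusing the address from the example above; FLAG_MASK is a made-up name for wherever the bit-field layout puts flag:

#define IO_REG    (*(volatile unsigned char *)0x1234)
#define FLAG_MASK 0x80  /* assumed bit position; depends on the bit-field layout */

void set_flag(void) {
  IO_REG = FLAG_MASK;  /* a single store; the register is never read */
}

That only helps when the other bits' values are known or irrelevant, which is exactly why the bit-field case above is the awkward one.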


johnwbyrd commented on May 26, 2024

First, let's go to the standards documents:

"A static volatile object is an appropriate model for a memory-mapped I/O register. Implementors of C translators should take into account relevant hardware details on the target systems when implementing accesses to volatile objects. For instance, the hardware logic of a system may require that a two-byte memory-mapped register not be accessed with byte operations; and a compiler for such a system would have to assure that no such instructions were generated, even if the source code only accesses one byte of the register. Whether read-modify-write instructions can be used on such device registers must also be considered.
Whatever decisions are adopted on such issues must be documented, as volatile access is implementation-defined. A volatile object is also an appropriate model for a variable shared among multiple processes. A static const volatile object appropriately models a memory-mapped input port, such as a real-time clock. Similarly, a const volatile object models a variable which can be
altered by another process but not by this one." Reference

There are two kinds of spurious accesses that we need to be aware of in llvm-mos:

A: When an indexed address calculation carries into the MSB of the address (i.e. the access crosses a page boundary), the CPU performs a spurious fetch at the uncorrected address, one page below the real target.

B: The read-modify-write instructions INC, DEC, ASL, LSR, ROL, and ROR perform a spurious write to the target address as part of the modify cycle (on the NMOS 6502 the unmodified value is written back before the result).

Now A will only be a problem for volatiles if the address of said volatile is calculated as a <256-byte offset from another address, that offset happens to cross a page boundary, and the addressing is calculated at run time as an X or Y offset. I can't think of a situation where this would occur if the effective address is a constant. In other words, I think that problem A could only occur for a volatile whose effective address is not const and not static. Is this correct?

Now B will only be a problem if you do the shift or increment on the volatile memory itself, as opposed to reading the value into an imaginary register, doing the operation on that imaginary register, and then writing it back. But I am not aware of anything in your codegen that performs increments or shifts on anything except imaginary registers, which can't be mapped onto any hardware device. Anything I'm missing?
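
A concrete instance of the pattern in question, as a sketch (the register address is hypothetical):

volatile unsigned char *const COUNT = (volatile unsigned char *)0x1234;

void bump(void) {
  (*COUNT)++;  /* fine as long as this lowers to a load, a modify in an imaginary
                  register, and a store; it only becomes problem B if it were ever
                  folded into a memory INC, whose modify cycle performs the extra write */
}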

If the above assumptions are true, then we may be able to squeak by, simply by telling the user that reads of indexed volatiles that are neither static nor const are not guaranteed to touch only the effective address, i.e. they may generate spurious reads that can trigger hardware side effects.

If my assumptions are wrong, then I can think of the following ways of mitigating.

  1. Within codegen, mark certain instructions and addressing modes as incompatible with volatiles: indexed addressing that carries into the MSB of an address, and the RMW instructions INC, DEC, ASL, LSR, ROL, and ROR. You'll have a better idea of how to do this than I do.
  2. Implement an emulator that marks certain instructions as possibly spurious during volatile access, and write test cases in lit to verify that codegen never generates them.
  3. Implement an emulator that models all instruction side effects, and verify that spurious reads never occur during any volatile access. This is not easy, even assuming a cycle-accurate emulator exists, because spurious reads are usually fine; it's only during volatile access that we get worried about them. We'd need some kind of thunk to tell the emulator to start or stop trapping on spurious reads.

So I propose the following approach.

  1. Push the information above into the llvm-mos documentation;
  2. Get an instruction-level emulator base class working well enough to run hello world;
  3. Subclass the emulator to abort on shift/increment instructions that target anything other than an imaginary register, or on page-boundary crossings during reads (not easy even with cycle accuracy, as per 3 above); a rough sketch of the hook this implies follows.
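
A minimal sketch of that hook, assuming a host-side emulator with a bus-read callback; every name here is hypothetical, not an existing llvm-mos component:

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Addresses the test harness has registered as live I/O registers. */
static bool watched[0x10000];

/* Called by the emulator core for every bus read; 'architectural' is false for
   dummy cycles such as the uncorrected fetch of a page-crossing indexed access. */
void on_bus_read(uint16_t addr, bool architectural) {
  if (!architectural && watched[addr])
    abort();  /* a spurious read hit a watched I/O register: fail the test */
}

The thunk mentioned in option 3 above would then just toggle entries of watched (or a global enable) around the volatile accesses under test.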

References:

http://www.textfiles.com/apple/6502.bugs.txt
http://visual6502.org/wiki/index.php?title=6502_Timing_States
http://www.visual6502.org/wiki/index.php?title=6502_State_Machine
https://docs.mamedev.org/techspecs/m6502.html
https://github.com/mamedev/mame/tree/master/src/devices/cpu/m6502
Appendix A of http://archive.6502.org/books/mcs6500_family_hardware_manual.pdf, which documents these reads as "discarded"


mysterymath commented on May 26, 2024

> Now B will only be a problem if you do the shift or increment on the volatile memory itself, as opposed to reading the value into an imaginary register, doing the operation on that imaginary register, and then writing it back. But I am not aware of anything in your codegen that performs increments or shifts on anything except imaginary registers, which can't be mapped onto any hardware device. Anything I'm missing?

This is a TODO item for the code generator; the instruction selector should eventually be able to detect an entire G_STORE G_SHL G_LOAD sequence and convert it into a single memory ASL. But it's pretty trivial to make sure that folding doesn't happen when the volatile bit is set on the G_LOAD and G_STORE (and it's really just one bit there, passed all the way down from Clang). Not too worried about this one; just something to keep in mind when we get around to it.
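
At the C level, the two cases to keep apart look something like this (whether the first ever gets folded is up to the selector; the second must never be):

char buf[16];
void shl_plain(unsigned char i) { buf[i] <<= 1; }  /* could one day fold to a memory ASL buf,X */

volatile char reg;  /* stand-in for an I/O register */
void shl_vol(void) { reg <<= 1; }  /* must stay read, shift in a register, write:
                                      exactly one load and one store of reg */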

> Now A will only be a problem for volatiles if the address of said volatile is calculated as a <256-byte offset from another address, that offset happens to cross a page boundary, and the addressing is calculated at run time as an X or Y offset. I can't think of a situation where this would occur if the effective address is a constant. In other words, I think that problem A could only occur for a volatile whose effective address is not const and not static. Is this correct?

We could end up emitting spurious accesses to volatile objects when accessing non-volatile objects as well.
For example, say we have a volatile object assigned by linker script to 0x2001, and a regular object, which the linker ends up assigning to 0x20ff:

volatile const char vol;
char nonvol[3];
...
assert(&vol == (volatile const char *)0x2001);
assert(nonvol == (char *)0x20ff);

If we were to assign to the last byte of nonvol (nonvol[2]), the effective address would be $2101, causing a spurious read of $2001, the volatile object:

extern char y;  // 2 at runtime
nonvol[y] = 1;
LDY y
LDA #1
STA nonvol,Y ; Effective address $2101, issues spurious read to $2001
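
(The arithmetic: adding Y=2 to the low byte $FF wraps to $01 with a carry out, so the CPU's first bus access uses the not-yet-incremented high byte $20, i.e. $2001; only then does it perform the store at the corrected address $2101.)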

I'm not entirely sure what the best mitigation is for this. For the compiler to be usable for I/O, we'd need to allow users to ensure that it didn't emit page-crossing indexed operations exactly one page above an IO register. The easiest way I can see to allow users to annotate that is to require all such accesses to be to volatile objects, and to completely disallow indexed addressing for volatiles. This can be done by generating the final address into an imaginary pointer, then using the indirect-indexed mode with Y=0, which is guaranteed not to cross a page.
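
Spelled out in the same style as the examples above (io_ptr stands for whichever zero-page pointer pair ends up holding the computed address; the names are placeholders, not actual llvm-mos output):

value = *io_ptr;  // volatile load through a computed pointer
LDY #0            ; Y = 0, so the indirect-indexed access can never cross a page
LDA (io_ptr),Y    ; exactly one read, at exactly the computed address
STA value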

We'd have to ensure that our linker scripts don't place any of the default sections immediately one page above IO registers that trigger on read/write, but that shouldn't be too onerous. We may also be able to relax the restrictions somewhat for volatiles that are placed in those sections, since they're effectively guaranteed never to contain IO ports. It's somewhat doubtful that avoiding indexing and RMW on volatiles will ever cause that big of a performance hit, though, given the expected rarity of doing either on IO registers.


johnwbyrd commented on May 26, 2024

I think your solution of disallowing indexed addressing for volatiles is more than fair.

To show off a bit, you could even provide a pragma or target-specific compile flag to permit the compiler to use indexed addressing on volatiles, with the presumption that the user will page-align the base addresses of indexed volatiles. That way, you could still generate optimal I/O code if you know how to turn off the safety feature that saves you from spurious reads (e.g. -mmos-unsafe-volatile-reads / -mno-mos-unsafe-volatile-reads).

Another possibility would be to require all volatile data structures of n bytes to be 2^ceil(log2(n))-byte aligned by the linker. The linker can set aside some sections for this, but it would be up to the compiler to annotate the variable as aligned, and this seems like overly strong medicine in general, especially on platforms with limited memory.
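
For either variant, the user-side contract might look something like this sketch (the struct and symbol names are made up; the aligned attribute is the standard GNU/Clang one):

struct io_block {
  volatile unsigned char regs[64];
};
/* Page-aligning the block means base,Y indexing with Y < 64 can never cross a page. */
extern struct io_block io __attribute__((aligned(256)));

unsigned char read_reg(unsigned char i) { return io.regs[i]; }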

Don't worry too much about the linker scripts. I can only think of a few special cases where RAM and hardware devices share the same 6502 page. Generally, most 6502 systems put I/O hardware in high, page-aligned memory, probably because it's easier from a hardware perspective to patch that hardware onto the address bus there. This is also probably why this problem tends to be relatively rare in practice: your index base will usually start on a page boundary, and will rarely exceed a page in size.

As an example of a practical memory address on a 6502 target that auto-increments upon read, consider the PPUDATA register from https://wiki.nesdev.com/w/index.php/PPU_registers.
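
Concretely (PPUDATA's address is documented on that page; the declaration is just an illustration):

volatile unsigned char *const PPUDATA = (volatile unsigned char *)0x2007;  /* every read advances the PPU's VRAM address, so a spurious read silently skips data */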


mysterymath commented on May 26, 2024

I've forced the index to zero for volatile loads and stores; this should prevent page crossing from occurring for any such accesses. We don't have any RMW operations at the moment, so this issue is resolved, at least for now. I'll just have to make sure to consider this when I finally get around to adding RMW logic; this shouldn't be too difficult, because this bug has indelibly stained RMW operations in my mind.

