usernamefodder / pmdsky-debug Goto Github PK

View Code? Open in Web Editor NEW

37.0 37.0 20.0 5.19 MB

Debug info for reverse engineering PMD: Explorers of Sky

License: GNU General Public License v3.0

Rust 28.22% Makefile 0.17% C 62.41% Python 9.18% Shell 0.03%

pmdsky-debug's People

Contributors

Stargazers

Watchers

Forkers

marius851000 skytemple eleventhnov metalcape thecapypara end45 sudzoof adex-8x jawshoeuh planetmdx thedigitizedsunk anonymousrandomperson laioxy epicyoshimaster adakite1 electricgeorge ronikirla tech-ticks

pmdsky-debug's Issues

Allow mixed case conventions with resymgen checks

Is your feature request related to a problem? Please describe.
When programs use libraries, the case convention used in the application code might differ from that used in the library code. An obvious example is right here in pmdsky-debug. We use PascalCase for the majority of EoS code, but we've also found what appear to be C standard library functions, for which snake_case would really be a more natural choice.

Describe the solution you'd like
Right now, the --function-names and --data-names options for resymgen check only allow a single choice of naming convention (right now snake_case, SCREAMING_SNAKE_CASE, camelCase, or PascalCase). It should be enhanced to support more than one naming convention at once. For example, you should be able to do something like

resymgen check --function-names=PascalCase --function-names=snake_case

to enforce that all symbols follow either PascalCase or snake_case. The exact API here is TBD.

Additional context
Various people (both from the EoS community and from other communities) have expressed some distaste with C standard library functions being renamed to use PascalCase. Supporting this feature would allow us to properly name C standard library functions using their conventional snake_case names.

Investigate correlations between DS Download Play binary and release binaries

Existing information

(First pointed out by Discord user jacknet) Sky's NitroFS contains a runnable DS Download Play ROM at /SYSTEM/main.srl, which can run dungeons and has working debug menus, debug flags, and some other assertions and program trace logs that are either not included or stubbed out in the main game.

As is, it runs into crashes due to allocation failures, but (credit to irdkwia) this can be prevented with a simple patch that extends the memory arena used by the allocator from 0x180000 to 0x1A0000:

; Hook at 02000DBC with NO$ to change the instructions
:02000DBC mov  r3,#0x1A0000
:020010AC rsb  r3,r0,#0x1A0000
:020013B4 mov  r1,#0x1A0000

; Doesn't seem to be used, but may be related (set breakpoint in case it reaches this line; it didn't in my test)
:020018A8 rsb  r2,r8,#0x1A0000

When unpacked, main.srl contains only ARM9 and ARM7 binaries and no overlays. When decompressed (python3 -m ndspy.codeCompression decompress <path/to/arm9.bin>), it can be analyzed just like the release binaries. Looking at this binary might give some easy insights, given that it likely has debug code such as assertions and file names.

I wrote a script called arm5correlate.py that tries to map regions in main.srl's arm9.bin to release binaries, which could be useful for mapping known symbols in pmdsky-debug to the debug binary, and for mapping anything learned from the debug binary back to the release binaries.

Relevant addresses

These addresses are all for the NA version.

Stuff from the ROM header

ARM9 RAM load address: 0x2000000
ARM9 entrypoint: 0x2000850
ARM7 RAM load address: 0x2380000
ARM7 entrypoint: 0x2380000

Stuff from `arm5correlate.py`

The correlation script seems to show that the debug ROM shares large chunks of code with the release binaries, in particular ARM9, overlay 29, overlay 10, overlay 31, and maybe overlay 26. It seems likely that all the aforementioned overlays are concatenated in the debug binary almost in full. Here's a "correlation map" of the debug ROM (generated with chunksize=8, tolerance=512):

Here's a finer-grained version (chunksize=8, tolerance=16):

Here's the quantitative offset maps corresponding to the two correlation map diagrams above:
correlation-maps.zip. Each text dump comes from arm5correlate.py in doubly verbose mode and has 3 sections:

Raw chunk correlations between raw 8-instruction chunks
Per-file larger-chunk correlations generated by merging the raw chunk correlations
(Most useful) An overall map of the best-attempt inferred correlations from the debug binary to the relevant release binaries. The diagrams above are generated from the correlations in this section.

File names

These likely come from the __FILE__ C macro. Found by a simple regex byte search for \b\w*\.[ch]\0:
ds-download-arm9-files.txt

Other

Identified by @End45 (probably the same in NA vs. EU?):

0x200AA58: GetDebugFlag1
0x200AA68: SetDebugFlag1
0x200ABF8: DebugPrint
0x200AC94: FatalError

Also noticed by both End and me: 0x200AC10 appears to be an "assert" function, called in 825 different places with various developer assertions.

Support per-version C headers

Is your feature request related to a problem? Please describe.
@Adex-8x found that struct monster in the JP version doesn't have all the fields that exist in NA/EU. JP is missing:

in_action
field_0x64
field_0x65
field_0x66

With version-specific enums like enum script_level_id* and enum script_object_id*, the headers just define version-specific enum tags, e.g., enum script_level_id_eu. This is already somewhat unideal, and with struct monster, this approach will be cumbersome since struct monster is itself embedded in other structs like struct dungeon, so version differences would be "contagious".

We should figure out a good way to represent version differences that doesn't make the headers harder to use (e.g., importing in Ghidra).

Describe the solution you'd like
It would probably work to use #define VERSION in tandem with #if or something, and deal with conditional struct fields at the preprocessor level. This would require #define'ing the version before importing the headers. We could ship wrappers around the main pmdsky.h that define the variable for different game versions, and have people pick the one they want when importing. That way, you could still just import a single header into Ghidra.

By default, pmdsky.h should set the version to NA or EU if not defined, for compatibility.

Right now the headers shipped aren't version-specific, so in the release package, we would probably make separate copies of pmdsky.h and place them next to the already version-specific symbols.

Investigate HookReadTSC and HookReadKeypad from PMD Fragments asm patch

Existing information
The asm patches for PMD Fragments do some interesting things that can likely be translated into documentation. Do some digging into the things being done and the code being hooked into, and see if anything useful can be gleaned from it.

Relevant addresses
See https://github.com/irdkwia/fragments-official-repo/blob/main/rsc/patch.txt. Of particular interest is the stuff that hooks into system utilities starting with HookReadTSC and HookReadKeypad.

Create some subregions for large symbol files

arm9.yml and overlay29.yml are getting quite large. Now that resymgen (#52) and pmdsky-debug (#57) can handle subregions, splitting some off for the large symbol files could make them less unwieldy. We should decide on a logical way to split these files up. This will also require splitting the function prototype files in headers/functions/ to match the subregion structure in symbols/.

Support subregions in pmdsky-debug

Subregion support was added to resymgen with #37. The rest of pmdsky-debug should also be updated to support subregions, in preparation for #53 and other future subregions.

Update symbols/README.md
- The introduction and the "Files" section should have some general explanation for subregion files
- resymgen example commands should use -r option when applicable
Update GitHub Actions workflows:
- check-symbols.yml: call resymgen with the --recursive option everywhere
- release.yml: call resymgen gen with the --sort option
Update headers/symbol_check.py to handle subregions, assuming the function prototype files in headers/functions/ matches the subregion structure in symbols/.
Update tools/symbols_vfill.py to handle subregions

script_variable_meanings.txt: Followup script engine implementation research

See the follow-up ideas suggested in #143 (comment). It would probably be enlightening to dig into how the script engine is actually implemented in the game's code. Further understanding on this front would probably complement the community's current high-level understanding of scripting, and could be mutually beneficial for both sides.

Existing information

arm9.yml already has a block of functions obviously related to the script engine in the [NA]0x204B*** and [NA]0x204C*** area.
overlay11.yml has some particularly important-sounding functions like ScriptCommandParsing and ScriptSpecialProcessCall.
ram.yml has a couple data symbols that seem to back certain script variables.
enum script_var_id, and enum special_process_id, have useful notes.

Relevant addresses
These NA addresses are mentioned in the comments of enum script_var_id and could use symbol-ification:

Include global variables in headers

Is your feature request related to a problem? Please describe.
The type of global variables is not always clear (see e.g. FRACTIONAL_TURN_SEQUENCE in overlay29, which is an array of 16-bit integers according to Ghidra, but the description doesn't mention that).

Describe the solution you'd like
Adding global variables to the headers would solve two problems/inconveniences:

The type of all globals would be clearly defined
Global variables wouldn't have to be added manually to C injection projects with extern struct ... and show up in auto-complete

Describe alternatives you've considered
Adding types to the globals in YAML files in a structured way would also be helpful (and a header file could be generated from them), but I don't really see a benefit over directly adding the fields to the headers.

Auto-generated Changelog for Symbol Updates

Is your feature request related to a problem? Please describe.
At the moment when updating libraries that use pmdsky-debug such as SkyTemple Files and (to a lesser extend) c-of-time/Rust of Darkness, it's a bit of a "guessing game" / going through the history to see what symbols have changed and what might break. With SkyTemple it's especially problematic due to the lack of options for static analysis (though that's probably also that could be improved over there). SkyTemple does have some tests that check for some symbols, but they aren't really 100% reliable for this...

Describe the solution you'd like
It would be great if there was a changelog with each release that outlined the renamed / removed / added / updated symbols. Ideally this would not (just) be in the Github releases, since they are very inconvenient to get an overview of changes over multiple releases.

Describe alternatives you've considered

Doing nothing; just relying on external libs to do checks themselves
Doing this manually (not practical)

Additional context
/

Resymgen merge does not skip symbols out of range unless the version is NA

When merging symbols from a CSV file, if I use the option -v EU then resymgen adds all symbols to the file, even if they are outside the address range of the block. This does not happen with -v NA. I noticed that if I manually edit the versions field of the YAML file and put EU at the top, then it filters out symbols correctly. I also tried it with -v JP and it also does not work correctly.

To reproduce
This is the exact command I used (working directory is pmdsky-debug/symbols):

resymgen merge -x -f csv -v EU -i ../../symbol_table_eu.csv overlay29.yml

I've also tested it on arm9.yml and overlay10.yml, with the same results.

Expected behavior
Symbols outside the address range of the block should be filtered out automatically.

Desktop environment

Windows 10
Tested both in the WSL terminal and Powershell, and it behaves the same

Additional context
These are the CSV files I was trying to merge:

Require symbols for JP ROMs for new Pull Requests

Is your feature request related to a problem? Please describe.
At the moment a vast amount of symbols (and patches, but that's outside the scope of this repo) are only available for the US/NA and EU ROMs.

Describe the solution you'd like
I think with the recent contribution through #175 we should aim to update more symbols to have JP addresses and require future contributions to have JP symbols.

Describe alternatives you've considered

We could do nothing, but this would probably eventually lead to issues of the JP ROMs being barely supported by projects using pmdsky-debug.
We could incentivize contributors by adding it to the PR template (which doesn't exist yet) as a soft requirement

Additional recommendations

A Pull Request description template should be created
A document describing contribution should be added (unless there is already one and I'm just blind)

Sound Engine

A substantial amount of the process occurs in the ARM7, which itself works over two domains: The ARM7 binary and the shared WRAM. Since there's not a lot of info on those two aspects, I'll include some relevant stuff of theirs aswell. All of this information is for North America ROM

I'll try to summarize the process as far as I've been researching it. Due to the large complexity of the system, I'll try to not add a lot all at once. If some detail in particular is needed before it's added, please ask for it!

This is my idea of how a request to play music goes from start to finish, in function calls. I'm not fully certain what some steps are meant to achieve in the big picture yet so I apologize for leaving them blank

0x20198b8 void BGM_Play(int bgmId, int unk0, int unk1)
0x206d9bc
0x206db7c
0x2073504
0x2073edc In this function, the two arguments passed in are somehow used to buffer writes to an array of structs in 0x22b7a6c (size of each: 0x15c, and there's one for every sound channel in the DS)
Since the writes are buffered, this is the end of this avenue. To further follow the process, we need to start elsewhere
0x2071014
0x2074a58 For each of the structs in the previously mentioned array, some of their fields are used to create the sound_ops that are passed via IPC
0x207ca6c void PushSoundOp_Type0xE(uint channel, uint format, uint source, int repeat_mode, uint loop_start, uint word_amt, uint volume, uint volume_div, int timerVal, int panning)
This is the most complete sound op I could find and it overwrites a huge amount of the sound registers in 4000400. Basically all of the names are from GBATEK
0x207cbcc void PushSoundOp(uint type, uint arg0, uint arg1, uint arg2, uint arg3)
0x207cedc void PushSoundOpToLinkedList(sound_op* op)
The linked list in question has pointers to the start and end at 0x22b9a28 and 0x22b9a2c respectively. Anyway, having buffered these operations, they're finally all sent through IPC over the dedicated sound channel (n. 7), see #20

STRUCTS:
Sound Op: Stores the operation type, four ints (16 bytes) used to store any kind of arguments in any bit layout, and a pointer to the next sound op (making it a linked list node, which is a VERY common data structure observed in the sound engine overall)
0x0: ptr to next sound op
0x4: type (u32)
0x8: arguments (u32[4])
Total size: 24 bytes

FUNCTIONS ARM7:
0x3801f1c: Processes operations sent from the ARM9 over IPC fifo, most of them are values to be written to the sound registers (0x40004XX)

0x2380118: Copies the WRAM binary stored in ARM7 (0x23801e8) to 0x37f8000. It also copies stuff to 0x27e0000 but I'm not sure what for

Support file imports in resymgen

Context

Currently, the resymgen YAML specification requires all the symbols within a given block to be in a single file. The specification allows some flexibility by supporting multiple independent blocks, which can either share the same file or be split across files. This fits naturally with the use case of researching multiple binaries, which may be related but are largely independent.

However, with a single large binary, putting all symbols in a single file can quickly become unwieldy, and the YAML file can become quite large. Currently, the best workaround would be to split the symbol table across multiple files that all share the same block name, and using the merging capabilities of resymgen to combine the files back into a single symbol table when generating artifacts. Unfortunately, this comes with a few downsides:

It becomes harder to work with the overall symbol table as a single entity, since resymgen commands will need to be run separately on each subtable file.
Certain validity checks, such as non-overlapping symbols, cannot be enforced jointly across subtables.
The need to be able to merge subtables back together also means each of the files must define a block with the same name, address bounds, and description (or leave the description empty in one or more subtables). On top of repetition, this also introduces the possibility of different subtables covering interleaved address spaces (which may or may not be desirable, but isn't very natural when dealing with a raw binary file).

Proposed solution

`subregions` field on blocks

In the style of the Rust module system, support an optional subregions field on blocks, like so:

block_name:
  versions:
    - v1
  address: 0x0
  length: 0x1000
  description: foo
  subregions:
    - "sub1.yml"
    - "sub2.yml"
  functions: []
  data: []

Within a file foo.yml, items in the subregions list should be names of files within the sibling foo/ directory. Files should be valid resymgen YAML files, which should define one or more subregions (as arbitrarily named blocks) contained within the address range of the parent block, with accompanying lists of symbols. This will allow large address spaces to be subdivided cleanly into collections of smaller (but still internally contiguous) regions, possibly spread over many different files, while still allowing a top-level file for metadata and symbols that don't belong in a specialized subregion.

Since the subregion directory name is implied by the file name, this means that a file with multiple blocks must put subregions for different blocks in the same directory. But if blocks are grouped into a single file, it makes sense that their subregions should go in the same directory. If further splitting is desired, the (separate) subregion files can themselves define additional subregions.

Considerations

Arbitrary layers of nesting should be supported.
The gen command should resolve subregions automatically, and include imported symbols as if they were part of the parent symbol table.
For speed and flexibility, the fmt command should not resolve subregions by default, and should only sort the subregion list alphabetically (since order doesn't have meaning). It should omit the subregions field if the list is empty.
- If the -r, --recursive option is provided, the formatter should run on all files in the import tree.
For speed and flexibility, the check command should check the main file's contents but ignore the subregions field. The -u, --unique-symbols check should additionally ensure that none of the file names in the subregion list are repeated.
- If the -r, --recursive option is provided, resymgen should:
  1. Recurse into subregion files and run the same checks on them as on the main file.
  2. Run some additional relationship checks between files if certain check flags were specified:
  - -V, --complete-version-list: Ensure the main file's version list contains all versions specified in the subregion files' version lists.
  - -b, --in-bounds-symbols: Ensure the bounds of subregions fall within the main file's address bounds.
  - -o, --no-overlap (note the name change from --no-function-overlap): Ensure that none of the subregion bounds overlap, and that none of the main file's symbols (functions or data) overlap with any of the subregion bounds.
  - -u, --unique-symbols: Ensure no symbol name is repeated between the main file and subregions, or across subregions.
The merge command should recurse into subregions, and try to place incoming symbols in the appropriate subregion if one is present. Otherwise, it should fall back to merging into the main symbol table.

Additional repository changes

Update resymgen docs (resymgen.md and docstrings).
Update pmdsky-debug symbols docs.
Update GitHub Actions workflows as needed. This should only require updating any checks to use the --recursive flag; release package generation should be unaffected with the proposed change to the gen command.
Update function headers and symbol_check.py to deal with symbol table import trees.
Update symbols_vfill.py to deal with symbol table import trees.

Alternative solutions

C/C++ style `includes`

This would be like the main proposal, but with support for arbitrary file paths, which would look like this:

block_name:
  versions:
    - v1
  address: 0x0
  length: 0x1000
  description: foo
  includes:
    - "path/to/sub1.yml"
    - "path/to/sub2.yml"
  functions: []
  data: []

This seems like more flexibility than would be useful, especially since it's never expected that one would need to reuse the same subregion in multiple different parents. Arbitrary file path inclusion also introduces the risk of circular dependencies.

Directories as aggregate entities

This option would support running resymgen commands on a whole directory, automatically merging all contained files and treating the contents as one unified table. This would provide a simple way to split up a large file into multiple, and is probably less work to implement than adding a new block field. However, this approach has some disadvantages:

It lacks an elegant way to define a parent-child relationship between a main file and subregions, which is particularly desirable for documentation purposes, since the existence of a main file provides a natural and obvious place to document top-level address bounds and overall notes about the binary. This also causes issues for the merge command, which benefits from having a main file as a fallback destination for symbols.
It fits less cleanly with the current resymgen YAML spec, particularly when files contain multiple blocks. Splitting up a multi-block file would require either more nesting, or a single flat directory containing subregions of multiple different blocks at once (the main proposal has the same property, but is less confusing because a clear parent file exists).
It relies on the merge functionality of resymgen, which was experimental and idiosyncratic to begin with.

New file format

Introducing a new file format would be similar to the imports proposal, except the imported files would be some simpler format rather than standalone resymgen YAML files. This could mean slightly less boilerplate around the main goal of splitting up long functions and data lists. But adding a whole new format is probably even more complex than using the existing one. It also loses some of the nice properties of subtables being standalone, such as being able to run checks and formatting on individual subtables, and trivial arbitrary nesting support.

YAML inclusion with custom tag handles

While standard YAML doesn't have any kind of "include" statement, some YAML loaders like PyYAML's support user-defined handlers for tag handles, which enables the use of constructs such as !include <filename> directly within a YAML file. Unfortunately, yaml-rust does not currently support this, and even if it did, such a construct wouldn't be reliably portable (which would make the symbol tables harder to use). Furthermore, direct textual includes wouldn't fit very well with resymgen because of the separation between the function and data lists; separating out a subregion containing both functions and data would require the use of two separate !includes in different places. It would also hinder error reporting, since inclusion would happen at the YAML loader layer, which would hide it from the resymgen layer.

Document script engine: Transfer Resarch from SkyTemple SSB Debugger & PsyCommando's notes

In early 2020 I made a lot of research based on Psy's research on the game's script engine. A lot of this made it into SkyTemple's script engine debugger and also pmdsky-debug.

Some of the most important undocumented things are related to the script engine thread state structs and other entities (actors, events, map, objects, performers, global script):

Additionally I took notes from Psy and made my own, mostly connecting the dots between his research and traces I did with the emulator integration.

I uploaded all my notes I have here: https://github.com/theCapypara/pmdsky-ground-state-notes

Effects of abilities, IQ skills and held items

Unlike moves, passive effects like abilities and held items do not seem to have a specific piece of code that implements them. Instead, they are just flags in an entity that the code checks when performing certain actions. Abilities like Klutz are all over the place, because whenever the effect of a held item needs to be applied the code also checks for Klutz. On the other hand, others are checked only in a specific instance, for example those which can only activate at the end of a turn (like Speed Boost or Shed Skin). There are also some which do have their own functions, both very simple (like Truant which just tries to apply the pause status) and very complex (like Forecast which has to manage multiple form changes).

Given these inconsistencies I feel like the best course of action would be to search for calls to AbilityIsActive, IqSkillEnabled and CanUseHeldItem (this is a name I decided for the function at 0x022E3CBC which checks for Klutz and then calls HasHeldItem) to see where the code checks for skills, abilities, and items, and then start reversing the functions containing those calls. This is what I'm currently trying to do, at least for abilities.

Relevant addresses
All of these have been found in the EU version of the game.

0x022E3CBC bool CanUseHeldItem(entity*, item_id): Checks if the entity holds an item and does not have Klutz. Also at 0x022EECC8, 0x022F6350 and 0x02311A94
0x022E9FA4 bool EntityIsNotNull(entity*): Returns true if the entity is not ENTITY_NOTHING. Used in many different places before checking parameters of an entity. Also at 0x022F7D1C, ~~0x022FA12C~~, ~~0x0230248C~~, 0x0235FB8, 0x02308924, and 0x02311A70
0x022EBB28 TrySwitchPlace(entity* user, entity* target): Attemps to switch places between user and target, checks for Suction Cups and Mold Breaker
0x022ED08C: Probably the function that applies the effect of Plus and Minus
0x022F9C48: Function that triggers Slow Start, takes no parameters and returns nothing. Probably called at the start of the floor and checks all pokémon
0x022F9CE4: Function that triggers abilities that change the weather. Also returns nothing and takes no parameters
0x022FA0D8: Returns the result of AbilityIsActive on an entity if another entity does not have Mold Breaker, otherwise returns false
0x022FA1FC TryAbilityTruant(entity*): Tries to inflict the pause status if the entity has truant
0x022FEEDC: Long function that checks for Mobile Scarf and the Absolute Mover skill, might be related to checking the type of tile an entity is moving into
0x02300988: A function that checks for Unburden, Chlorophyll and Swift Swim and returns 1 or 2, which I assume is the number of times a move has to be used
0x02302844 bool IsLevitating(entity*): Checks if an entity is levitating (has Levitate and there is no gravity)
0x02310604: A function that checks for the No-Slip Cap and probably tries to make a random item sticky
0x02310698: Another long function. This one seems to check for many items and abilities that would trigger at the end of a turn, such as the Warp Scarf, things related to belly drain, abilities triggered by weather like Sand Veil, abilities connected to status effects like Poison Heal and Magic Guard, and also Speed Boost. I think it also does many other things unrelated to abilities/skills/items
0x02311A38: A function that checks regen-related effects like the Heal Ribbon and the Quick Healer skill. Probably modifies the amount of HP regenerated per turn.
0x02335BB0: Probably the function that applies the effect of Forecast and changes Castform's forms

For now the information I've found is very incomplete, so I decided to open this issue before making any contributions. I'm new to reversing this game and I don't know if there is a better way to research this, let me know if I'm missing something.

Symbol "MonsterGroundIdleAnim" from SkyTemple missing (OV11)

Describe the bug
The symbol "MonsterGroundIdleAnim" from SkyTemple was never ported to pmdsky-debug:
https://github.com/SkyTemple/skytemple-files/blob/463dd6f5f9714eaa1b2458d483b02718477da8fe/skytemple_files/_resources/ppmdu_config/pmd2data.xml#L161

To Reproduce
n/a

Expected behavior
The symbol should be defined.

Screenshots
n/a

Desktop environment
n/a

Additional context
n/a

Support symbol name aliases

Is your feature request related to a problem? Please describe.
Various projects rely on pmdsky-debug symbols, including https://github.com/SkyTemple/pmdsky-debug-py and https://github.com/SkyTemple/c-of-time. When symbols are renamed, it can cause compatibility issues downstream.

Describe the solution you'd like
Add support for an optional aliases field in the YAML files, like this:

- name: SomeFunction
  aliases:
    - SomeFunctionAlias1
    - SomeFunctionAlias2

This will make things symbols searchable by more names (including old ones), while making it possible for tooling to decide whether or not to use the extra alias names.

resymgen gen should include aliases in the .ghidra (as extra symbols) and .json (in a separate "aliases" property) output formats, since Ghidra can support alternate symbol names. The .sym output format should only use the primary name, since No$GBA only supports one symbol name per address.
resymgen fmt should order alias names alphabetically.
resymgen check:
- Alias names should be subject to the same naming convention checks (--function-names and --data-names) as primary symbol names.
- Alias names should be included in the --unique-symbols check so they don't clash with any other symbol or alias names.
resymgen merge should properly deduplicate aliases when merging alias lists.
The resymgen docs and unit tests should be updated accordingly.

Aliases should not be included in the header files, since this would lead to a lot of duplication. Instead, they should be auto-generated for projects that need them (like c-of-time). The headers-with-docstrings artifact should include these generated aliases.

The following tooling should also support aliases:

If possible, the GitHub Actions PR checks should compare the symbol tables in the base commit with the new changes, and fail if a symbol has been renamed without adding the old name as an alias. This is probably doable by seeing if there are any symbol names in the base commit that no longer exist in the new version (if an old symbol is being moved, there's nothing that can be done with aliases).
symdiff.py should detect updates to alias lists.
import_symbols_json.py and import_symbols_ntr_ghidra.py should create additional labels for aliases.
add_header_docstrings.py should support adding aliases alongside header docstrings.

Describe alternatives you've considered

Never rename symbols. Doesn't seem practical.
Start using some sort of review procedure around renames, and scrutinize renames more carefully. We should probably do this anyway, but it would probably be more effective when coupled with aliases. Just scrutinizing things would still require downstream repos to make changes to ensure compatibility, which is still cumbersome.
Duplicate symbols instead of renaming them. This is the same idea as aliases, but worse, leading to more duplication and potential issues with code that assumes function addresses are unique.

Additional context
This has been a long-running shortcoming of pmdsky-debug. It's probably about time to try to do something about it.

@theCapypara and @End45 raised concerns around #236, see discussion on that PR.
tech-ticks raised concerns around #230 and the effects of renaming compiler builtins on c-of-time.

symdiff doesn't work for recent revisions

Describe the bug
tools/symdiff.py errors on recent diffs.

To Reproduce
Steps to reproduce the behavior:

in tools run python3 symdiff.py bf40a32..HEAD

Expected behavior
I expect it to work.

Python environment
Python 3.10.8

aiodns==3.0.0
aiohttp==3.8.1
aiosignal==1.2.0
antlr4-python3-runtime==4.8
appdirs==1.4.4
argcomplete==2.0.0
astroid==2.11.5
async-timeout==4.0.2
attrs==21.4.0
black==22.3.0
Brotli==1.0.9
bsdiff4==1.2.2
cairocffi==1.3.0
CairoSVG==2.5.2
cchardet==2.1.7
certifi==2021.10.8
cffi==1.15.0
charset-normalizer==2.0.12
click==8.0.3
colorama==0.4.4
crcmod==1.7
cssselect2==0.4.1
defusedxml==0.7.1
dill==0.3.5
-e git+ssh://[email protected]/SkyTemple/dungeon-eos.git@069bffe4290b6ff746e46b112059bd5c713f36ec#egg=dungeon_eos
execnet==1.9.0
-e git+ssh://[email protected]/SkyTemple/ExplorerScript.git@ca2502370eab78aba95e243225e17380b6413c9e#egg=explorerscript
frozenlist==1.3.0
gbulb==0.6.2
gitdb==4.0.9
gitdb2==4.0.2
GitPython==3.0.8
gql==3.3.0
graphql-core==3.2.1
idna==3.3
igraph==0.9.9
importlib-metadata==4.11.3
iniconfig==1.1.1
isort==5.10.1
Jinja2==3.1.2
jsonschema==4.1.2
keystone-engine==0.9.2
lazy-object-proxy==1.7.1
libcst==0.4.5
lru-dict==1.1.8
lxml==4.9.0
MarkupSafe==2.1.1
mccabe==0.7.0
mido==1.2.10
multidict==6.0.2
mypy==0.931
mypy-extensions==0.4.3
natsort==8.1.0
ndspy==3.0.0
nest-asyncio==1.5.4
ninfs==1.6.1
ordered-set==4.1.0
packaging==21.3
parameterized==0.8.1
pathspec==0.9.0
pep517==0.6.0
Pillow==9.1.1
platformdirs==2.5.2
pluggy==1.0.0
pmdsky-debug-py @ file:///home/marco/dev/skytemple/skytemple/pmdsky_debug_py/src
-e git+ssh://[email protected]/SkyTemple/pmdsky-debug-py.git@df846f7bd3aa4876a3e36d13d7b2be624baf83e4#egg=pmdsky_debug_py_generator&subdirectory=generator
psutil==5.9.0
py==1.11.0
# Editable install with no version control (py-desmume==0.0.5)
-e /home/marco/.virtualenvs/skytemple/lib/python3.10/site-packages
pycairo==1.20.1
pycares==4.1.2
pycln==1.3.5
pycparser==2.21
pycryptodomex==3.14.1
pyenchant==3.2.2
pygal==3.0.0
PyGObject==3.42.2
pygtkspellcheck==5.0.1
pylint==2.13.9
pyparsing==3.0.7
pypresence==4.2.1
pyrsistent==0.18.1
pytest==7.0.1
pytest-dotenv==0.5.2
pytest-forked==1.4.0
pytest-xdist==2.5.0
python-dateutil==2.8.2
python-dotenv==0.19.2
-e git+ssh://[email protected]/Parakoopa/python-for-android.git@42be0f74e17035f27c0c5f54232d382d73401a13#egg=python_for_android
pytoml==0.1.21
PyYAML==6.0
range-typed-integers==1.0.1
requirements-parser==0.5.0
sentry-sdk==1.5.6
sf2utils==0.9.0
sh==1.14.2
six==1.16.0
-e git+ssh://[email protected]/SkyTemple/skytemple.git@6ff7193334b5d4461707f5f6691eeef3268d5e4c#egg=skytemple
skytemple-3rdparty-typestubs==1.0.0.post2
-e git+ssh://[email protected]/SkyTemple/skytemple-dtef.git@282559b882cabd3def6e695a85af84337cae9d35#egg=skytemple_dtef
-e git+ssh://[email protected]/SkyTemple/skytemple-files.git@8615defeb2afd095364f87d942b0a7a136f1109a#egg=skytemple_files
-e git+ssh://[email protected]/SkyTemple/skytemple-icons.git@352f12fb83abf54bf5c7a4af163b50bc4931aa07#egg=skytemple_icons
-e git+ssh://[email protected]/SkyTemple/skytemple-randomizer.git@abcbcaa8a53520367106f2bab48a86638b264ca2#egg=skytemple_randomizer
skytemple-rust==1.4.0
-e git+ssh://[email protected]/SkyTemple/skytemple-ssb-debugger.git@ef9346c6fc53ffd58687df34d85ff9c066462c74#egg=skytemple_ssb_debugger
smmap==5.0.0
sortedcollections==2.1.0
sortedcontainers==2.4.0
strictyaml==1.6.1
texttable==1.6.4
# Editable install with no version control (tilequant==0.4.1)
-e /home/marco/.virtualenvs/skytemple/lib/python3.10/site-packages
tinycss2==1.1.1
toml==0.10.2
tomli==2.0.1
tornado==6.1
typer==0.4.1
types-Pillow==8.3.8
types-PyYAML==6.0.4
types-setuptools==57.4.9
types-toml==0.10.7
typing-inspect==0.7.1
typing_extensions==4.1.1
urllib3==1.26.8
webencodings==0.5.1
wrapt==1.14.1
xmldiff==2.4
xmltodict==0.13.0
yarl==1.7.2
yq==2.14.0
zipp==3.8.0

Use reverse chronological order for symbol versions

Explorers of Sky released, from latest to earliest, for EU, NA, and JP. It would make sense to use the same order for the symbol table versions so that "newer" versions take precedence for sorting, especially since it's likely the EU binary added more new symbols than it removed, as @End45 pointed out in #47.

Almost all of the symbols already have EU addresses. The only missing ones to fill in are C_ROUTINES and OBJECTS in overlay 11.

Undocumented struct monster_summary

Existing information
The function 0x22F89CC (NA) appears to use an undocumented struct that is essentially a summary or snapshot of the monster's current condition. From what I can tell it is used when you open up the summary page and exist in the dungeon struct at +0x2CA7C when getting out of a dungeon. Because the struct is undocumented, I wanted to double check with others that my observations were correct before pushing to the repo. Also want to ensure that the struct and its contents are reasonably named and I didn't miss anything obvious during my analysis.

The function begins by copying basic information (hp, exp, species, etc) into the struct and then making a list of enum_status_id's (up to 0x1D, the 0x1Eth status is set to 0 as a kind of terminator).

// Please ignore minor mistakes, they will be corrected before pushing to the repo.
struct monster_summary{
    struct monster_id_16 id; // 0x0
    char  monster_name[10]; // 0x2: Exact size is a guess
    undefined field_0xD;
    undefined field_0xE;
    undefined field_0xF;
    undefined field_0x10;
    undefined field_0x11;
    undefined field_0x12;
    undefined field_0x13;
    undefined field_0x14;
    undefined field_0x15;
    struct type_id_8 types[2]; // 0x16
    struct ability_id_8 abilities[2]; // 0x18
    struct dungeon_id_8 joined_at; // 0x1A
    uint8_t joined_at_floor; // 0x1B
    struct item held_item; // 0x1C
    int32_t hp; // 0x24:
    int32_t max_hp; // 0x28: Actual max HP
    uint32_t level; // 0x2C
    int exp; // 0x30
    uint8_t offensive_stats[2]; // 0x34: {atk, sp_atk}
    uint8_t defensive_stats[2]; // 0x36: {def, sp_def}
    bool is_team_leader; // 0x38
    uint8_t attack_boost; // 0x39: from things like Power Band, Munch Belt
    uint8_t special_attack_boost; // 0x3A
    uint8_t defense_boost; // 0x3B
    uint8_t special_defense_boost; // 0x3C
    undefined field_0x3D;
    int16_t iq; // 0x3E
    undefined field_0x40;
    undefined field_0x41;
    undefined field_0x42; // Set to 0.
    undefined field_0x43; // Set to 0.
    // 0x44: Appears to be some kind of evolution status related number?
    // If DUNGEON_PTR + 0x75A (unknown) is 0, it gets set to 3.
    // Otherwise gets set to output from GetMonsterEvoStatus.
    uint8_t some_evo_status;
    bool inflicted_with_gastro_acid; // 0x45
    undefined field_0x46;
    undefined field_0x47;
    uint32_t iq_skill_menu_flags[3]; // 0x48
    struct tactic_id_8 tactic; // 0x54
    undefined field_0x55;
    undefined field_0x56;
    undefined field_0x57;
    // 0x58: Appears to be a list of all the currently inflicted statues in their enum form. The last entry (30th)
    // appears to always be STATUS_NONE to serve as a terminator for the list.
    enum status_id active_statuses[30];
};
ASSERT_SIZE(struct monster_summary, 0x76);

A few notes:

It looks like the function that lists all the stat boosts doesn't do anything with the extra hp (probably because it's already in max_hp_boost for the monster).
The max hp is calculated twice, once for storing into the struct and once for the low health status.
The values for the boosts for stats (0x39 -> 0x3C) is initialized to 0 and function 0x205A40 adds the boosts to 0 instead of setting the value.

IPC System

Existing information
The game implements an interface over the IPC (Inter Process Communication) system the hardware features, which simplifies sending data between both processors for different purposes, over a total of 32 channels. Currently I've only possibly identified one of those channels being used for the purpose of setting the Sound Registers, which only the ARM7 can manipulate, although the system seems to be used with plenty of other channels; who knows on what we could be missing!

The way the messages are sent is very simple: 5 bits for the channel (giving the total of 32 channels), 1 bit for an error flag, and the remaining 26 bits for the payload. Obviously that's not a lot of space, so what the game does is usually using those 26 bits for passing a pointer to the data that wants to be sent

Relevant functions, symbols and addresses

enum processor {
ARM9 = 0,
ARM7 = 1
};

enum ipc_send_result {
FIFO_FULL = -2,
ERROR = -1
OK = 0
};

enum ipc_channel {
SOUND = 7
};

struct ipc_message {
uint8_t channel : 5;
bool error : 1;
uint32_t payload : 26;
};

typedef void ChannelReceiveHandler(enum ipc_channel channel, uint32_t payload, bool error);

// -- ARM9
void __veneer_InitIPC(void); //@ 0x207d9a4
void InitIPC(void); //@ 0x207d9b0
void OpenChannel(enum ipc_channel channel, ChannelReceiveHandler* handler); //@ 0x207dab0
bool IsChannelOpen(enum ipc_channel, enum processor processor); //@ 0x207dafc
ipc_send_result SendChannel(enum ipc_channel, uint32_t payload, bool is_error); //@ 0x207db20
void ReceiveIPC(void); //@ 0x207dba8
// This function doesn't seem to be called manually under normal circumstances, instead it's called via IRQ

uint16_t InitializedIPC; //@ 0x22bb564
ChannelReceiveHandler* OpenChannelMessageHandlers[32]; //@ 0x22bb568
// A mapping of functions that each open channel may call to process incoming messages.
// Closed channel functions are NULL

// -- ARM7
// Most functions are similar or identical to the ARM9, so their notes apply here
void __veneer_InitIPC(void); //@ 0x37fe2b4
void InitIPC(void); //@ 0x37fe2c0
void OpenChannel(enum ipc_channel channel, ChannelReceiveHandler* handler); //@ 0x37fe39c
bool IsChannelOpen(enum ipc_channel, enum processor processor); //@ 0x37fe3ec
ipc_send_result SendChannel(enum ipc_channel, uint32_t payload, bool is_error); //@ 0x37fe410
ipc_send_result SendMessage(struct ipc_message message); //@ 0x37fe448
void ReceiveIPC(void); //@ 0x37fe4a8

uint16_t InitializedIPC; //@ 0x380785c
ChannelReceiveHandler* OpenChannelMessageHandlers[32]; //@ 0x3807860

// -- Shared
uint32_t OpenChannels[2]; //@ 0x27fff88
// Represents all IPC channels that have a handler assigned to them (aka, open), one bit per channel
// Sending a message to a closed channel will bounce the message with an error
// This is an array; index 0 represents ARM9 channels, index 1 are ARM7 channels

Duplicate addresses for GUMMI_IQ_STRING_IDS and GUMMI_LIKE_STRING_IDS

Describe the bug
In overlay 29, GUMMI_IQ_STRING_IDS and GUMMI_LIKE_STRING_IDS both point to NA 0x23532D0, creating a conflict.

To Reproduce
Open symbols/overlay29.yml and find GUMMI_IQ_STRING_IDS.

Expected behavior
GUMMI_IQ_STRING_IDS should have the correct address and not a duplicate.

Screenshots
N/A

Desktop environment
N/A

Additional context
The actual value within the ROM has a size of 8, which matches GUMMI_LIKE_STRING_IDS. GUMMI_IQ_STRING_IDS has a size of 0xA, so this one likely has a mislabeled address.

Hidden stairs

Just dumping my findings from this Discord thread before I forget about everything.

Existing information
Various symbols related to hidden stairs, especially hidden_stairs_pos in dungeon_generation_info and hidden_stairs_type

Relevant addresses (US)
The following was found by reverse engineering ExecuteMonsterAction switch case 0x26 (ACTION_USE_STAIRS):
0x02338708: bool IsHiddenStairsPosition(struct position* pos)
0x02338850: enum hidden_stairs_type GetHiddenStairsTypeCached()(retrieved from dungeon_generation_info)
0x02338898: void SetDungeonGenerationInfo0xc(enum hidden_stairs_type ty)
So dungeon_generation_info unknown 0xc seems to be the hidden stairs type that is used to load the next floor. Forcing this value to be 1 or 2 via memory editing would probably spawn the player in the Hidden Room or Hidden Bazaar.

Port symbols from Irdkwia's ASM.ods

The irdkwia_doc archive has a huge ASM.ods spreadsheet with tons of symbols that could be ported to pmdsky-debug. It'd probably be too much effort to stringently verify all of them, but it would still be useful to semi-blindly port them over as is, accompanied by notes in the description fields about their unverified status.

Sheets

Update ghidra-setup.md when Ghidra 10.2's ARM disassembler is fixed

Ghidra 10.2's ARM disassembler is currently broken, and a warning was added to ghidra-setup.md in #85. When Ghidra fixes the issue, the doc should be updated again to state which Ghidra version included the fix.

usernamefodder / pmdsky-debug Goto Github PK

pmdsky-debug's People

Contributors

Stargazers

Watchers

Forkers

pmdsky-debug's Issues

Existing information

Relevant addresses

Stuff from the ROM header

Stuff from arm5correlate.py

File names

Other

Context

Proposed solution

subregions field on blocks

Considerations

Additional repository changes

Alternative solutions

C/C++ style includes

Directories as aggregate entities

New file format

YAML inclusion with custom tag handles

Sheets

Recommend Projects

Recommend Topics

Recommend Org

Stuff from `arm5correlate.py`

`subregions` field on blocks

C/C++ style `includes`