Possibility of a sane linker script language? (rui314/mold)

Comments (19)

rui314 commented on July 18, 2024

I have to think very carefully about how to satisfy the users' needs that are currently (more or less) met by linker scripts. The current GNU-style linker script language is too complicated, and I have no plan to support it as-is. That being said, I still have to support a subset of it, as (at least on Linux) /usr/lib/x86_64-linux-gnu/libc.so is not a shared object file but actually an ASCII text file containing a linker script.

I have a few random ideas:

Since mold is pretty small (just about 5000 lines of code), it is easy to modify its source code to output whatever you want. This is a crazy idea but may not be a bad one, because it is (in some sense) the most straightforward way to do whatever you want. You can just use C++ instead of an underdocumented, domain-specific language.
I have to investigate how users are using linker scripts. If I can identify common patterns, I can implement them instead of implementing the whole language.
Embedding a language, such as Lua as you wrote, or even Javascript, is an option. I'm not happy to do that, but if it turns out to be the best option, I wouldn't turn it down.
We might want to run a "script" as an external process. I can add a feature to mold to dump input section names, input section sizes, alignments, etc., in a text file format so that an external program can compute a layout. The external program then dumps the exact layout in a text file format, which mold consumes to proceed.
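
The external-process idea in the last item could look roughly like the sketch below. The text format is purely hypothetical (one "name size alignment" triple per line going in, one "name address" pair per line coming out); mold does not define such a format, and the base address and function names are assumptions made for illustration.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (alignment >= 1)."""
    return (value + alignment - 1) // alignment * alignment


def compute_layout(dump: str, base: int = 0x1000) -> str:
    """Assign addresses to sections in input order.

    `dump` is a hypothetical text dump with one "name size alignment"
    triple per line; numbers may use any base that int(x, 0) accepts.
    """
    addr = base
    lines = []
    for line in dump.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, size, alignment = line.split()
        addr = align_up(addr, int(alignment, 0))
        lines.append(f"{name} {addr:#x}")
        addr += int(size, 0)
    return "\n".join(lines)


example_dump = """\
.text 0x1c5f8 16
.rodata 0x4c35 32
.data 0x3b8 32
"""
print(compute_layout(example_dump))
```

The point of the design is that the layout policy lives entirely in the external program, so the linker only needs a dump step and a consume step.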

from mold.

rui314 commented on July 18, 2024

I implemented an experimental feature as a replacement for the linker script language. It's not documented, but you can see how to use it in this test file: https://github.com/rui314/mold/blob/main/test/elf/section-order.sh

The --section-order option allows you to specify section addresses and section layout order, as well as defining symbols at specific locations.

The problem here is that I don't know if the feature is a step in the right direction, so I cannot make it an official feature of the linker yet.

deliciouslytyped commented on July 18, 2024

Personally I would like

more discoverability
declarative, deterministic input/output
reproducible output
inspectability
reasonable escape hatches, caveat emptor, if you really really need to do something (and then collecting best practices :P)

marler8997 commented on July 18, 2024

@rui314 I saw this thread a few months ago and I have a use case for you and would be interested in your opinion on this.

The linker script in question is here: https://github.com/wendigojaeger/ZigGBA/blob/291f66ff0fc70dc9e5ed4a33b8425e798e14357d/GBA/gba.ld

This linker script is able to take object files with sections for the header and code and link them to create a valid Game Boy Advance ROM file. I thought it was pretty clever when I saw it. However, I can't help but ask whether it would be simpler to implement this using a custom tool that just merges the files together. As far as I can tell the script isn't doing any relocation, but then again I don't quite understand everything the script is actually doing. It does appear to be doing some things with different sections, but again, it's hard to tell without spending enough time learning how GNU's linker script language works.

If I understood your comment above, it sounds like you would recommend axing this linker script and just writing a custom script or tool to combine the sections together. Is that right?

P.S. this script doesn't fail if the .gbaheader or .gbamain sections are missing, so if you don't have them you don't get an error till you try to use the ROM. Do you know if that can be fixed in the linker script? If not then that's a big reason not to use the script.

ketsuban commented on July 18, 2024

P.S. this script doesn't fail if the .gbaheader or .gbamain sections are missing, so if you don't have them you don't get an error till you try to use the ROM. Do you know if that can be fixed in the linker script?

It's not possible with the script as written since the .gbaheader and .gbamain input sections get clumped into the .text output section, but if you include the ROM header/initialisation code in their own output section you can do something like ASSERT(SIZEOF(.init), "missing ROM header").

(I do GBA stuff in Rust and wrote my own linker script which I uploaded as a gist with some explanatory comments. It provides some flexibility as to where code and data get put since the GBA has multiple blocks of work RAM with different properties. This might be useful to you as another data point for "interesting things you can do with linker scripts".)

Riteo commented on July 18, 2024

Is there any update on this situation? Is a linker script language possible/planned?

medhefgo commented on July 18, 2024

The problem here is that I don't know if the feature is a step in the right direction, so I cannot make it an official feature of the linker yet.

@rui314 I just played around with this to see if we can use this for systemd-boot. For various reasons we have to build EFI binaries as ELF and then convert to PE (cross-building is a pain, and not all arches can be built natively to PE). It would basically replace https://github.com/systemd/systemd/blob/main/tools/elf2efi.lds.

From what I can tell, what we want can mostly be done with:
--section-order='=0x0 !__ImageBase TEXT %0x200 RODATA %0x200 BSS %0x200 DATA %0x200 .sdmagic %0x200 .osrel %0x200 .sbat %0x200'

But applying the section alignment is rather repetitive. Would be nice if we could use wildcards for section names --section-align=.*=0x200 or --section-order=%%0x200 TEXT DATA which would apply alignment to all following sections. Also, currently, section alignment doesn't seem to (always) get applied to TEXT/DATA/RODATA/BSS.
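
Until something like a wildcard exists, the repetition can be hidden in the build script. The following is a minimal sketch that only reproduces the token pattern used in the command above; the helper name and its parameters are hypothetical, and the --section-order syntax itself is experimental and may change.

```python
def section_order_arg(sections, alignment=0x200, base=0x0, base_symbol="__ImageBase"):
    """Build a --section-order argument in the shape shown above.

    Each section name is followed by the same "%<alignment>" token, so the
    alignment only has to be written once in the build script.
    """
    tokens = [f"={base:#x}", f"!{base_symbol}"]
    for name in sections:
        tokens += [name, f"%{alignment:#x}"]
    return "--section-order=" + " ".join(tokens)


arg = section_order_arg(["TEXT", "RODATA", "BSS", "DATA", ".sdmagic", ".osrel", ".sbat"])
print(arg)
```

When the argument is passed as one element of an argument list (for example to subprocess.run), no shell quoting is needed.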

But we still need a means to merge sections so that we can put BSS into DATA. Both gnu-efi and edk2 state that there are firmwares that don't handle BSS correctly. Though, they also merge got etc. into it too, which we also do. Maybe --section-merge=SECTION=OTHER1,OTHER2 or --section-order SECTION <OTHER,OTHER2?

Also, since there are several sections that we are going to discard anyways, it would be nice if we could move them to the end of our custom order so that the final vma address layout is more compact. Would be easy and generic if section names could be defined with wildcards, which you could put at the end of section-order. This way one could even do more specific layouts like .debug*.

rui314 commented on July 18, 2024

@medhefgo Thank you for trying out that feature!

As to the repeated %0x200: do you want to make sure that each segment starts at a 0x200 boundary and that no two segments share a 0x200 block? If so, -z max-page-size=0x200 -z separate-loadable-segments might work for you.

As to turning bss into data, we could simply add a new command line option to tell the linker to turn all bss sections into data.

As to /DISCARD/, debug sections don't consume VMA; they only consume disk space. You can trim them off with strip. Did you have any memory-mapped-at-runtime section that you want to discard?

rui314 commented on July 18, 2024

Placing .bss into the data segment is as easy as this: https://gist.github.com/rui314/1922c5f8b177b459be82aab601ffde9d

medhefgo commented on July 18, 2024

If so, -z max-page-size=0x200 -z separate-loadable-segments might work for you.

We already set max and common page size to 4K, and the man page says separate-loadable-segments is the default. But that doesn't help anyway, as PE doesn't have the concept of loadable segments. PE sections are the equivalent of ELF segments and must be aligned to page boundaries. So I also wouldn't mind something like sections-as-segments, which would make one ELF segment per ELF section (and those should then already be aligned as we need, due to the page size setting).

As to turning bss into data, we could simply add a new command line option to tell the linker to turn all bss sections into data.

I suppose that would work. But I would prefer to merge sections, as there is no way of telling if some borked up firmware can handle multiple rw sections. As it is now, EFI binaries built with edk2/gnu-efi only have one rw data section. And Microsoft does the same thing with their bootloader.

As to /DISCARD/, debug sections don't consume VMA; they only consume disk space. You can trim them off with strip. Did you have any memory-mapped-at-runtime section that you want to discard?

While converting the ELF file, we already skip any sections we don't care for. Moving any sections we don't care for is mostly my OCD wanting to have a compact VMA space. Though, it would reduce runtime cost as the PE memory is allocated in one big allocation by EFI loaders and afaik, there is no virtual address space at this point (not that these few KB make a real difference, though).

rui314 commented on July 18, 2024

@medhefgo In mold, we do not provide a way to control the layout of segments; you can only control the layout of sections. mold automatically creates the minimum number of segments that make sense and cover all the given sections. I think I like this design because it is very straightforward. On the other hand, GNU ld allows users to specify both segment and section layout using a linker script, and the two can conflict with each other (i.e. you can easily specify an impossible layout with the script).

I guess when edk2/gnu-efi copies file contents from an ELF file to a PE file, it works based on sections? I think it should work based on segments instead, because that's how the Unix linker works. I.e. instead of copying .text or some other sections from ELF to PE, copy the RX segment from ELF to PE. That way, it doesn't matter whether .bss is in .data or not; as long as they are next to each other and in a RW segment, they are copied as a single piece of data.

medhefgo commented on July 18, 2024

In mold, we do not provide a way to control the layout of segments

That's fine by me, and I would agree that linker scripts are an abomination. sections-as-segments was merely a suggestion, as you mentioned separate-loadable-segments for alignment. But the ELF segments are already properly aligned; the problem is that the sections inside them aren't. And I would prefer setting the alignment once instead of having to do it for every section explicitly (hence my original idea of allowing wildcards when using section-align/section-order).

I guess when edk2/gnu-efi copies file contents from an ELF file to a PE file, it works based on sections? I think it should work based on segments instead, because that's how the Unix linker works. I.e. instead of copying .text or some other sections from ELF to PE, copy the RX segment from ELF to PE. That way, it doesn't matter whether .bss is in .data or not; as long as they are next to each other and in a RW segment, they are copied as a single piece of data.

Well, it has to be a section-based copy, as PE does not have a notion of segments. Copying like this would have to cram all sections that are part of a segment together into one PE section, no? But those sections need to be accessible in their own right. In particular .sbat, or shim will refuse to load it.

Right now bfd produces this with our linker script:

There are 24 section headers, starting at offset 0x5f650:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000001000 001000 01c5f8 00  AX  0   0  1
  [ 2] .rodata           PROGBITS        000000000001e000 01e000 004c35 00   A  0   0 32
  [ 3] .data             PROGBITS        0000000000023000 023000 0003b8 00  WA  0   0 32
  [ 4] .sdmagic          PROGBITS        0000000000024000 024000 000034 00   A  0   0 32
  [ 5] .osrel            PROGBITS        0000000000025000 025000 000051 00   A  0   0 32
  [ 6] .sbat             PROGBITS        0000000000026000 026000 0000e2 00   A  0   0 32
  [ 7] .dynsym           DYNSYM          00000000000260e8 0260e8 000018 18   A  8   1  8
  [ 8] .dynstr           STRTAB          0000000000026100 026100 000001 00   A  0   0  1
  [ 9] .dynamic          DYNAMIC         0000000000026108 026108 000100 10  WA  8   0  8
  [10] .gnu.hash         GNU_HASH        0000000000026820 026820 00001c 00   A  7   0  8
  [11] .note.gnu.build-id NOTE            000000000002683c 02683c 000024 00   A  0   0  4
  [12] .debug_info       PROGBITS        0000000000000000 026860 01e6bc 00      0   0  1
  [13] .debug_abbrev     PROGBITS        0000000000000000 044f1c 003954 00      0   0  1
  [14] .debug_aranges    PROGBITS        0000000000000000 048870 000450 00      0   0  1
  [15] .debug_rnglists   PROGBITS        0000000000000000 048cc0 000122 00      0   0  1
  [16] .debug_line       PROGBITS        0000000000000000 048de2 00939b 00      0   0  1
  [17] .debug_str        PROGBITS        0000000000000000 05217d 004cc9 01  MS  0   0  1
  [18] .debug_line_str   PROGBITS        0000000000000000 056e46 000525 01  MS  0   0  1
  [19] .debug_frame      PROGBITS        0000000000000000 057370 0038c0 00      0   0  8
  [20] .rela.dyn         RELA            0000000000026208 026208 000618 18   A  7   0  8
  [21] .symtab           SYMTAB          0000000000000000 05ac30 0034b0 18     22 538  8
  [22] .strtab           STRTAB          0000000000000000 05e0e0 001479 00      0   0  1
  [23] .shstrtab         STRTAB          0000000000000000 05f559 0000f2 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  D (mbind), l (large), p (processor specific)

Elf file type is EXEC (Executable file)
Entry point 0xc023
There are 6 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x001000 0x0000000000001000 0x0000000000001000 0x01c5f8 0x01c5f8 R E 0x1000
  LOAD           0x01e000 0x000000000001e000 0x000000000001e000 0x004c35 0x004c35 R   0x1000
  LOAD           0x023000 0x0000000000023000 0x0000000000023000 0x003860 0x003860 RW  0x1000
  DYNAMIC        0x026108 0x0000000000026108 0x0000000000026108 0x000100 0x000100 RW  0x8
  NOTE           0x02683c 0x000000000002683c 0x000000000002683c 0x000024 0x000024 R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10

 Section to Segment mapping:
  Segment Sections...
   00     .text
   01     .rodata
   02     .data .sdmagic .osrel .sbat .dynsym .dynstr .dynamic .gnu.hash .note.gnu.build-id .rela.dyn
   03     .dynamic
   04     .note.gnu.build-id
   05

Which gets converted to this PE layout:

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         0001c5f8  0000000101301000  0000000101301000  00000400  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata       00004c35  000000010131e000  000000010131e000  0001ca00  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .data         000003b8  0000000101323000  0000000101323000  00021800  2**4
                  CONTENTS, ALLOC, LOAD, DATA
  3 .sdmagic      00000034  0000000101324000  0000000101324000  00021c00  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .osrel        00000051  0000000101325000  0000000101325000  00021e00  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .sbat         000000e2  0000000101326000  0000000101326000  00022000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .reloc        0000008c  0000000101327000  0000000101327000  00022200  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA

So if we copy by segment, we'd use segment 0 as .text, 1 as .rodata, 2 as .data. But then other EFI consumers would lose the ability to inspect the EFI binary and access .sdmagic, .osrel and .sbat as they would be crammed into one .data section.
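
To make the section-to-segment relationship concrete, here is a small sketch that recomputes which of the named sections fall inside each LOAD segment, using only addresses and sizes copied from the readelf output above. The containment rule (a section belongs to a segment when its address range lies within the segment's) is a simplification chosen for illustration.

```python
# (name, address, size) for the sections that end up as PE sections,
# copied from the section headers above.
SECTIONS = [
    (".text", 0x1000, 0x1C5F8),
    (".rodata", 0x1E000, 0x4C35),
    (".data", 0x23000, 0x3B8),
    (".sdmagic", 0x24000, 0x34),
    (".osrel", 0x25000, 0x51),
    (".sbat", 0x26000, 0xE2),
]

# (flags, virtual address, memory size) for the three LOAD segments.
LOAD_SEGMENTS = [
    ("R E", 0x1000, 0x1C5F8),
    ("R", 0x1E000, 0x4C35),
    ("RW", 0x23000, 0x3860),
]


def sections_in_segment(seg_addr, seg_size):
    """Return the names of sections whose address range lies inside the segment."""
    return [
        name
        for name, addr, size in SECTIONS
        if seg_addr <= addr and addr + size <= seg_addr + seg_size
    ]


mapping = {flags: sections_in_segment(addr, size) for flags, addr, size in LOAD_SEGMENTS}
for flags, names in mapping.items():
    print(f"{flags:3} -> {' '.join(names)}")
```

The RW segment swallows .data, .sdmagic, .osrel and .sbat together, which is exactly why a purely segment-based copy would lose them as separately addressable PE sections.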

medhefgo commented on July 18, 2024

So I looked into this a little more and sure, one can use section-order+allocate-bss to get a segment list that is similar to the PE sections layout we want. But that still requires that special sections like .sbat etc are split to their own segments or we cannot make them addressable as specific PE sections. As I see it, that would still require a means to force certain sections to get their separate segments.

Ultimately, this would force the conversion script to understand both sections and segments so that we only copy the segments/sections we care for and not things like .dynamic/.dynstr, which are meaningless once we have converted to PE. This all seems far more complicated, and more error-prone, than just walking through the section table itself.

medhefgo commented on July 18, 2024

Wrt section merging: How about a means to rename sections instead? So if one were to pass --section-rename=.bss=.data (along with allocate-bss), the similar section merge process ought to kick in, no? This would then result in only one .data section. Similar renames could be done for got/plt etc, and if you add wildcard support to all --section parameters, they would be less cumbersome to use.

medhefgo commented on July 18, 2024

@rui314 You might be qualified to answer this.

I'm considering dropping the linker script altogether and doing the things it does as part of the post-processing tool. The binaries are static-pies and only contain relative relocations (there are no DSOs).

Now given that, is it legal to apply extra offsets to the output sections to satisfy the alignment requirements, possibly even negative offsets to reorder sections (which would then also allow merging sections)? This all assumes we then apply the appropriate offsets when converting the relocations.

Would this break some assumption that the linker/compiler makes? (Also, if this can only be properly done by linking with -r and dealing with all the gazillion relocation types across different arches, I'd rather not bother.)

rui314 commented on July 18, 2024

@medhefgo What do you mean by negative offsets? Section/segment offsets are addresses, so they are naturally non-negative. Are you asking if you can overlay sections at the same address?

medhefgo commented on July 18, 2024

@medhefgo What do you mean by negative offsets? Section/segment offsets are addresses, so they are naturally non-negative. Are you asking if you can overlay sections at the same address?

Any offsets would only be done so that no sections would overlap and any original section alignment would still be satisfied:

.a @ 0x1000-0x1500
.b @ 0x1500-0x3000

To re-order these, for example, you'd apply +0x1b00 to .a and -0x500 to .b and their relocations respectively:

.b @ 0x1000-0x2b00
.a @ 0x2b00-0x3000

(This is talking about the output VMAs.)
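
The per-section offsets follow directly from the section sizes (0x500 for .a and 0x1b00 for .b in the first layout). A minimal sketch of that arithmetic, with a hypothetical helper name and ignoring alignment constraints for brevity:

```python
def reorder(sections, new_order):
    """Pack sections back to back in new_order, starting at the lowest old address.

    sections maps a name to its (start, end) range. Returns the new layout and
    the delta that would have to be applied to each section and its relocations.
    Alignment requirements are ignored in this sketch.
    """
    addr = min(start for start, _ in sections.values())
    layout, deltas = {}, {}
    for name in new_order:
        start, end = sections[name]
        size = end - start
        layout[name] = (addr, addr + size)
        deltas[name] = addr - start
        addr += size
    return layout, deltas


layout, deltas = reorder({".a": (0x1000, 0x1500), ".b": (0x1500, 0x3000)}, [".b", ".a"])
for name, (start, end) in layout.items():
    print(f"{name} @ {start:#x}-{end:#x} (delta {deltas[name]:+#x})")
```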

medhefgo commented on July 18, 2024

Now that I think of it, re-arranging things during post-processing would break debugging (even if it's never used in practice), as we have to use the ELF binary for symbolization and therefore it must have an identical memory layout...

medhefgo commented on July 18, 2024

For anyone interested: I went with something similar to the suggested copying of ELF segments. I simply copy+concat any desired ELF section while adding padding to observe page and section alignments and splitting according to page permissions. It gives me the same result as copying by segments while making it easy to strip unwanted sections such as ELF dynamic linking stuff which has no use after conversion.

Since I cannot guarantee that special info sections like .sbat are not followed by other sections on the same page, we simply emit them as no-alloc and copy them over explicitly.

Now we can easily support any linker that has static pie and -z separate-code support without any linker scripts (yay) or linker specific args.
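
As a rough illustration of the copy-and-concatenate approach described above (not the actual conversion tool), the sketch below groups consecutive sections with the same permissions, pads each member to its own alignment, and rounds every group up to a page. The tuple layout, the permission strings, and the .bss entry in the example are assumptions made for illustration.

```python
PAGE = 0x1000


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def group_by_permissions(sections, page=PAGE):
    """Concatenate consecutive sections that share permissions.

    sections is a list of (name, perms, size, alignment) in output order.
    Returns a list of (perms, [(name, offset_in_group), ...], padded_size).
    """
    groups = []
    for name, perms, size, alignment in sections:
        if not groups or groups[-1]["perms"] != perms:
            groups.append({"perms": perms, "members": [], "size": 0})
        group = groups[-1]
        offset = align_up(group["size"], alignment)  # pad to the section's alignment
        group["members"].append((name, offset))
        group["size"] = offset + size
    return [(g["perms"], g["members"], align_up(g["size"], page)) for g in groups]


result = group_by_permissions([
    (".text", "rx", 0x1C5F8, 16),
    (".rodata", "r", 0x4C35, 32),
    (".data", "rw", 0x3B8, 32),
    (".bss", "rw", 0x100, 32),  # hypothetical section, not in the dump above
])
for perms, members, size in result:
    print(perms, members, hex(size))
```

Unwanted sections are dropped simply by not passing them in, which is what makes this easier than copying whole segments.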
