stephenrkell / liballocs

Meta-level run-time services for Unix processes... a.k.a. dragging Unix into the 1980s

Home Page: http://humprog.org/~stephen/research/liballocs

License: Other

Makefile 3.59% C 61.09% Shell 6.59% Assembly 0.01% Pascal 0.14% M4 0.82% C++ 14.46% Python 5.35% OCaml 7.05% Awk 0.54% Dockerfile 0.30% HTML 0.01% NASL 0.06%

liballocs's Issues

bit-fields test failure

On at least Ubuntu 18.04.2, the bit-fields test fails with:

/usr/include/inttypes.h:351:102: warning: '__gnu_inline__' attribute ignored [-Wattributes]
   return __wcstol_internal (nptr, endptr, base, 0);
                                                                                                      ^  
/usr/include/inttypes.h:363:103: warning: '__gnu_inline__' attribute ignored [-Wattributes]
     __gwchar_t **__restrict endptr, int base))
                                                                                                       ^  
xed could not decode instruction at 0x00007f0dbd68a10d
xed could not decode instruction at 0x00007f0dbfe12b46
bit-fields: can't mmap the section headers for instruction at 0x7ffe39edc000, filename [vdso], load addr 0x7ffe39edc000
bit-fields: can't mmap the section headers for instruction before 0x7ffe39ede000, filename [vdso], load addr 0x7ffe39edc000
bit-fields: Warning: mapping of (null) could not extend preceding bigalloc
bit-fields: Warning: mapping of (null) could not extend preceding bigalloc
bit-fields: Warning: mapping of (null) could not extend preceding bigalloc
bit-fields: Warning: mapping of (null) could not extend preceding bigalloc
bit-fields: Warning: mapping of (null) could not extend preceding bigalloc
bit-fields: Warning: mapping of (null) could not extend preceding bigalloc
Assertion failed: u

and the stack trace is:

xed could not decode instruction at 0x00007ffff586e10d

Program received signal SIGILL, Illegal instruction.
0x00007ffff63ed8d2 in close () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ffff63ed8d2 in close () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff67e258a in vaddr_to_nearest_instruction (
    search_addr=search_addr@entry=0x7ffff7dd5000 "\177ELF\002\001\001", 
    fname=fname@entry=0x7fffffffa1c4 "/lib/x86_64-linux-gnu/ld-2.27.so", 
    backwards=backwards@entry=false, out_base_addr=out_base_addr@entry=0x7ffffff79118) at elfutil.c:166
#2  0x00007ffff67e1ac3 in trap_one_executable_region (begin=0x7ffff7dd5000 "\177ELF\002\001\001", 
    end=0x7ffff7dfc000 <error: Cannot access memory at address 0x7ffff7dfc000>, 
    filename=0x7fffffffa1c4 "/lib/x86_64-linux-gnu/ld-2.27.so", is_writable=<optimized out>, 
    is_readable=<optimized out>) at trap.c:262
#3  0x00007ffff67dfe4d in maybe_trap_map_cb (interpreter_fname_as_void=<optimized out>, 
    linebuf=0x7fffffffb1d0 "7ffff7dd5000-7ffff7dfc000 r-xp 00000000 08:01 138238", ' ' <repeats 21 times>, "/lib/x86_64-linux-gnu/ld-2.27.so\n7ffff7fcc000-7ffff7fd7000 rw-p 00000000 00:00 0 \n7ffff7fdd000-7ffff7fed000 rw-p 00000000 00:00"..., ent=0x7fffffffa1a0)
    at /home/jryans/Projects/liballocs/src/systrap.c:259
#4  process_one_maps_entry (cb=0x7ffff67cf530 <maybe_trap_map_cb>, arg=<optimized out>, 
    entry_buf=0x7fffffffa1a0, 
    linebuf=0x7fffffffb1d0 "7ffff7dd5000-7ffff7dfc000 r-xp 00000000 08:01 138238", ' ' <repeats 21 times>, "/lib/x86_64-linux-gnu/ld-2.27.so\n7ffff7fcc000-7ffff7fd7000 rw-p 00000000 00:00 0 \n7ffff7fdd000-7ffff7fed000 rw-p 00000000 00:00"...) at ../include/maps.h:257
#5  for_each_maps_entry (get_a_line=<optimized out>, bufsz=8192, 
    cb=0x7ffff67cf530 <maybe_trap_map_cb>, arg=<optimized out>, entry_buf=0x7fffffffa1a0, 
    linebuf=0x7fffffffb1d0 "7ffff7dd5000-7ffff7dfc000 r-xp 00000000 08:01 138238", ' ' <repeats 21 times>, "/lib/x86_64-linux-gnu/ld-2.27.so\n7ffff7fcc000-7ffff7fd7000 rw-p 00000000 00:00 0 \n7ffff7fdd000-7ffff7fed000 rw-p 00000000 00:00"..., handle=<synthetic pointer>) at ../include/maps.h:269
#6  __liballocs_systrap_init () at /home/jryans/Projects/liballocs/src/systrap.c:363
#7  0x00007ffff67cbaa3 in __mmap_allocator_init ()
    at /home/jryans/Projects/liballocs/src/allocators/mmap.c:980
#8  0x00007ffff67cc5b9 in global constructors keyed to 65535_1_systrap.o.9247 ()
    at /home/jryans/Projects/liballocs/src/allocators/stack.c:55
#9  0x00007ffff7de5733 in ?? () from /lib64/ld-linux-x86-64.so.2
#10 0x00007ffff7dd60ca in ?? () from /lib64/ld-linux-x86-64.so.2
#11 0x0000000000000002 in ?? ()
#12 0x00007fffffffd7e0 in ?? ()
#13 0x00007fffffffd81c in ?? ()
#14 0x0000000000000000 in ?? ()

Both allocators and wrappers should have run-time identity

Currently there is some confusion about what an "allocator" is, owing to the presence of struct allocator instances that actually cover more than one allocator, such as __generic_malloc_allocator. All these struct allocator objects are statically defined -- we never generate them. If we could generate these and related structures, we could maintain a richer run-time model of allocators, allocator wrappers and allocation call sites. This would be useful for automating some meta-level policy, such as deleting an allocation when it's no longer needed. Currently the per-allocator free call is too stupid to allow this, as it can't identify which is the right free function to call when there are many alternatives (in cases where freeing and finalisation are baked into the same operation, for example).

Currently, link-time code generation (using the macros in tools/stubgen.h) is used to wrap each allocator wrapper (yes, two levels of wrapping) and also to wrap any linked-in definitions of malloc (all in allocscompilerwrapper.py). All this is a big mess that needs rationalising.

Each allocator instance should get its own struct allocator instance, which should be generated and linked into the object defining its "first" entry point (f.s.v.o. "first"... what I'm envisaging is rather like C++, where the translation unit defining the first virtual function is the one that gets the vtable).
Consequently, we no longer have a __generic_malloc_allocator -- rather, each instance of malloc should have its own struct allocator. For example, if we link an executable that defines malloc, we generate a fresh struct allocator including implementations of its operations (which can call into the indexing code currently used by __generic_malloc_allocator, but acting on a separate index instance, which will have to be declared and statically constructed, also in the generated code).
At the same time as generating the struct allocator instance, we also link in any wrappers that are necessary to observe the allocator. For example, if we have a malloc definition, we link in wrappers that do the indexing. (See the mention of 'callee' wrappers in tools/stubgen.h and tools/allocscompilerwrapper.py.) This already happens, and probably needs to be maintained and generalised to other allocators besides malloc.
The libc is handled slightly specially, since it's preload-interposable and we don't expect the libc to be liballocs-compiled. So we have a struct allocator instance called __libc_malloc_allocator which is static, rather like the current __generic_malloc_allocator. And the libc malloc does not have 'callee wrappers' -- we use the preload mechanism instead. Since malloc and libc are special, there is probably not much to change here.
The proposed change assumes we have some way to identify when a link job contains an allocator definition. The simplest way is to match symbols by name -- e.g. we consider an allocator to be defined by a set of symbols such as {malloc, free, calloc, realloc}, and pick the first of these (malloc) as the one that triggers generation of the struct allocator instance. (The others may still require callee-side wrapping, though.) Optional symbols that map closely to one of the liballocs meta-operations, like malloc_usable_size, probably need to be recognised too. Currently, we effectively have hard-coded one such set of symbols.
As well as allocators themselves, each declared allocator wrapper should also (somehow) have a run-time identity. We could perhaps define a simple structure, and generate these at the same time as we generate the struct allocator instances (at link time). Currently we don't know which allocator is wrapped by a given wrapper; we might want to infer this dynamically by observation, and fill it in.
Similarly, allocator call sites should have a run-time identity. The new static metadata handling (when it's ready) will have a notion of call sites encompassing allocator calls, system calls and perhaps others. We may want to remember call chains; each node in a call chain, if it is a call from a particular site, should be able to link to the structure describing that site (e.g. the allocator wrapper call site's allocation type record) rather than just to the raw address.
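
The name-matching idea above can be sketched as follows. This is only an illustration under stated assumptions: the real tooling lives in tools/allocscompilerwrapper.py, and every name beginning with sketch_ is hypothetical. It takes the symbol names defined by a link job and reports whether the trigger symbol (malloc) is among them, and how many members of the family are present.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The one symbol set described above; entry 0 is the trigger symbol. */
static const char *const sketch_malloc_family[] = {
    "malloc", "free", "calloc", "realloc"
};
#define SKETCH_FAMILY_SIZE \
    (sizeof sketch_malloc_family / sizeof sketch_malloc_family[0])

/* Nonzero if the link job's defined symbols include the trigger symbol,
 * i.e. this is where a struct allocator instance would be generated. */
static int sketch_defines_allocator(const char *const *defined, size_t n)
{
    assert(defined != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        if (strcmp(defined[i], sketch_malloc_family[0]) == 0) return 1;
    return 0;
}

/* How many members of the family are defined here; each one may still
 * need callee-side wrapping even if it is not the trigger. */
static size_t sketch_count_family_members(const char *const *defined, size_t n)
{
    size_t count = 0;
    assert(defined != NULL || n == 0);
    for (size_t f = 0; f < SKETCH_FAMILY_SIZE; ++f)
        for (size_t i = 0; i < n; ++i)
            if (strcmp(defined[i], sketch_malloc_family[f]) == 0)
            {
                ++count;
                break;
            }
    return count;
}
```

A link job defining only free and realloc would not trigger generation under this scheme, which is one reason matching on a single "first" symbol is only the simplest option.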

Investigate porting to additional platforms

It would be good to check just how much work is involved in porting to platforms beyond Linux, such as:

FreeBSD
macOS

Incorrect calculation of bit index in bitmap_delete?

In generic_malloc.c:

bitmap_clear_l(bitmap, ((uintptr_t) userptr - (uintptr_t) info->bitmap_base_addr) / (MALLOC_ALIGN * BITMAP_WORD_NBITS))

...wouldn't dividing by BITMAP_WORD_NBITS give us the index of the containing word within bitmap_word_t *bitmap, rather than the intended bit index? In fact, we divide by BITMAP_WORD_NBITS again in librunt/bitmap.h, presumably to get the word index there.

Proposing instead:
bitmap_clear_l(bitmap, ((uintptr_t) userptr - (uintptr_t) info->bitmap_base_addr) / MALLOC_ALIGN)
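
To make the arithmetic concrete, here is a minimal sketch. It assumes one bit per MALLOC_ALIGN-sized unit and a clear routine that takes a bit index and does the word/bit split itself, as the issue says librunt/bitmap.h does. The sketch_ names and the values 16 and 64 are assumptions for illustration, not the real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define SKETCH_MALLOC_ALIGN 16u
typedef uint64_t sketch_word_t;
#define SKETCH_WORD_NBITS (8 * sizeof (sketch_word_t))

/* Hypothetical stand-ins for the bitmap routines: they take a *bit* index
 * and split it into a word index and a bit offset internally. */
static void sketch_bitmap_set(sketch_word_t *bitmap, size_t bit_idx)
{
    bitmap[bit_idx / SKETCH_WORD_NBITS]
        |= (sketch_word_t) 1 << (bit_idx % SKETCH_WORD_NBITS);
}
static void sketch_bitmap_clear(sketch_word_t *bitmap, size_t bit_idx)
{
    bitmap[bit_idx / SKETCH_WORD_NBITS]
        &= ~((sketch_word_t) 1 << (bit_idx % SKETCH_WORD_NBITS));
}
static int sketch_bitmap_get(const sketch_word_t *bitmap, size_t bit_idx)
{
    return (int) ((bitmap[bit_idx / SKETCH_WORD_NBITS]
        >> (bit_idx % SKETCH_WORD_NBITS)) & 1);
}

/* Index as computed in the quoted line: divides by the word size too early. */
static size_t sketch_index_current(uintptr_t userptr, uintptr_t base)
{
    assert(userptr >= base);
    return (userptr - base) / (SKETCH_MALLOC_ALIGN * SKETCH_WORD_NBITS);
}

/* Index as proposed: one bit per aligned unit. */
static size_t sketch_index_proposed(uintptr_t userptr, uintptr_t base)
{
    assert(userptr >= base);
    return (userptr - base) / SKETCH_MALLOC_ALIGN;
}
```

With these assumed values, a chunk 70 units past the base has bit index 70 under the proposed formula but index 1 under the current one, so the current formula would clear an unrelated bit and leave bit 70 set.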

Uniqtype relationships don't capture 'which alias?', but might

This is a bit head-twisting.

When one uniqtype is related to another, we record this using a struct uniqtype_rel_info. These contain pointers to the related uniqtype.

I long ago took the decision that aliased types, say created by typedef, have the same identity, so the same address in memory and indeed the same structure. That still seems right. But they have different ELF symbols. In our program, unless it is statically linked and stripped (unlikely), those symbols exist, and sometimes it's useful to talk about them. For example, if I want to pretty-print structures, and say I'm printing an instance of struct stat, it's nice to know that the field st_mode is a mode_t not merely an unsigned int, and so on.

So maybe our related stuff should additionally reference the ELF symbols, somehow. I think there is room. Each related structure is two words in size. Probably a post-pass on the meta-DSO could fill in the dynsym offsets. Since a uniqtype pointer needs only 44 bits even on a 64-bit machine, we have room for this already.
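
As a sketch of how the spare bits might be used: the layout below is purely an assumption, not the actual struct uniqtype_rel_info encoding. One way to arrive at 44 bits is a 47-bit user-space address with 8-byte alignment, which leaves 20 bits of a 64-bit word for a symbol index. Whether 20 bits is enough for a dynsym index is also an open assumption, and all sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PTR_BITS 44
#define SKETCH_PTR_MASK ((UINT64_C(1) << SKETCH_PTR_BITS) - 1)

/* Pack a uniqtype address and a symbol index into one 64-bit word. */
static uint64_t sketch_pack(const void *uniqtype, uint32_t symidx)
{
    uintptr_t addr = (uintptr_t) uniqtype;
    assert((addr & 7) == 0);          /* assumed 8-byte alignment */
    assert((addr >> 47) == 0);        /* assumed 47-bit user address */
    assert(symidx < (1u << 20));      /* must fit in the 20 spare bits */
    return ((uint64_t) symidx << SKETCH_PTR_BITS) | ((uint64_t) addr >> 3);
}

/* Recover the uniqtype address from the low 44 bits. */
static const void *sketch_unpack_ptr(uint64_t word)
{
    return (const void *) (uintptr_t) ((word & SKETCH_PTR_MASK) << 3);
}

/* Recover the symbol index from the high 20 bits. */
static uint32_t sketch_unpack_symidx(uint64_t word)
{
    return (uint32_t) (word >> SKETCH_PTR_BITS);
}
```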

One problem is that there are too many symbols, and we may make that worse. In a program with many libraries and many meta-DSOs, we already have many copies of recurring uniqtypes and their sometimes-many aliases. All this makes for large dynsyms and potentially a lot of wasted space that we might be wanting to optimise away, rather than further rely on. Need to revisit this once we get serious about memory usage and uniqtype compactness (#12). It may also affect alias semantics (#18).

realloc-multi-union test failure

On at least Ubuntu 18.04.2, the test realloc-multi-union fails with:

/usr/include/inttypes.h:351:102: warning: '__gnu_inline__' attribute ignored [-Wattributes]
   return __wcstol_internal (nptr, endptr, base, 0);
                                                                                                      ^  
/usr/include/inttypes.h:363:103: warning: '__gnu_inline__' attribute ignored [-Wattributes]
     __gwchar_t **__restrict endptr, int base))
                                                                                                       ^  
xed could not decode instruction at 0x00007f78203f010d
xed could not decode instruction at 0x00007f7822b78b46
realloc-multi-union: can't mmap the section headers for instruction at 0x7fff751fa000, filename [vdso], load addr 0x7fff751fa000
realloc-multi-union: can't mmap the section headers for instruction before 0x7fff751fc000, filename [vdso], load addr 0x7fff751fa000
realloc-multi-union: Warning: mapping of (null) could not extend preceding bigalloc
realloc-multi-union: Warning: mapping of (null) could not extend preceding bigalloc
realloc-multi-union: Warning: mapping of (null) could not extend preceding bigalloc
realloc-multi-union: Warning: mapping of (null) could not extend preceding bigalloc
realloc-multi-union: Warning: mapping of (null) could not extend preceding bigalloc
Segmentation fault (core dumped)

and stack trace at failure is:

xed could not decode instruction at 0x00007ffff586e10d

Program received signal SIGILL, Illegal instruction.
0x00007ffff63ed8d2 in close () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ffff63ed8d2 in close () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff67e258a in vaddr_to_nearest_instruction (
    search_addr=search_addr@entry=0x7ffff7dd5000 "\177ELF\002\001\001", 
    fname=fname@entry=0x7fffffffa184 "/lib/x86_64-linux-gnu/ld-2.27.so", 
    backwards=backwards@entry=false, out_base_addr=out_base_addr@entry=0x7ffffff790d8) at elfutil.c:166
#2  0x00007ffff67e1ac3 in trap_one_executable_region (begin=0x7ffff7dd5000 "\177ELF\002\001\001", 
    end=0x7ffff7dfc000 <error: Cannot access memory at address 0x7ffff7dfc000>, 
    filename=0x7fffffffa184 "/lib/x86_64-linux-gnu/ld-2.27.so", is_writable=<optimized out>, 
    is_readable=<optimized out>) at trap.c:262
#3  0x00007ffff67dfe4d in maybe_trap_map_cb (interpreter_fname_as_void=<optimized out>, 
    linebuf=0x7fffffffb190 "7ffff7dd5000-7ffff7dfc000 r-xp 00000000 08:01 138238", ' ' <repeats 21 times>, "/lib/x86_64-linux-gnu/ld-2.27.so\n7ffff7fcc000-7ffff7fd7000 rw-p 00000000 00:00 0 \n7ffff7fdd000-7ffff7fed000 rw-p 00000000 00:00"..., ent=0x7fffffffa160)
    at /home/jryans/Projects/liballocs/src/systrap.c:259
#4  process_one_maps_entry (cb=0x7ffff67cf530 <maybe_trap_map_cb>, arg=<optimized out>, 
    entry_buf=0x7fffffffa160, 
    linebuf=0x7fffffffb190 "7ffff7dd5000-7ffff7dfc000 r-xp 00000000 08:01 138238", ' ' <repeats 21 times>, "/lib/x86_64-linux-gnu/ld-2.27.so\n7ffff7fcc000-7ffff7fd7000 rw-p 00000000 00:00 0 \n7ffff7fdd000-7ffff7fed000 rw-p 00000000 00:00"...) at ../include/maps.h:257
#5  for_each_maps_entry (get_a_line=<optimized out>, bufsz=8192, 
    cb=0x7ffff67cf530 <maybe_trap_map_cb>, arg=<optimized out>, entry_buf=0x7fffffffa160, 
    linebuf=0x7fffffffb190 "7ffff7dd5000-7ffff7dfc000 r-xp 00000000 08:01 138238", ' ' <repeats 21 times>, "/lib/x86_64-linux-gnu/ld-2.27.so\n7ffff7fcc000-7ffff7fd7000 rw-p 00000000 00:00 0 \n7ffff7fdd000-7ffff7fed000 rw-p 00000000 00:00"..., handle=<synthetic pointer>) at ../include/maps.h:269
#6  __liballocs_systrap_init () at /home/jryans/Projects/liballocs/src/systrap.c:363
#7  0x00007ffff67cbaa3 in __mmap_allocator_init ()
    at /home/jryans/Projects/liballocs/src/allocators/mmap.c:980
#8  0x00007ffff67cc5b9 in global constructors keyed to 65535_1_systrap.o.9247 ()
    at /home/jryans/Projects/liballocs/src/allocators/stack.c:55
#9  0x00007ffff7de5733 in ?? () from /lib64/ld-linux-x86-64.so.2
#10 0x00007ffff7dd60ca in ?? () from /lib64/ld-linux-x86-64.so.2
#11 0x0000000000000002 in ?? ()
#12 0x00007fffffffd7aa in ?? ()
#13 0x00007fffffffd7f8 in ?? ()
#14 0x0000000000000000 in ?? ()

Make fails on Ubuntu 18.04

I tried to build liballocs to use it with libcrunch, but the make step failed. To make the issue reproducible, I created a Dockerfile:

FROM ubuntu:18.04

RUN apt-get update && apt-get install -y bison flex texinfo git build-essential
RUN git clone https://github.com/stephenrkell/binutils-gdb.git
RUN cd binutils-gdb && \
   chmod +x configure && \
   CFLAGS="-fPIC -g -O2" ./configure --prefix=/usr/local \
     --enable-gold --enable-plugins --enable-install-libiberty && \
   make -j4 && make install
RUN apt-get update && \
   apt-get install -y libelf-dev \
       autoconf automake libtool pkg-config autoconf-archive \
       g++ ocaml ocaml-findlib \
       default-jre-headless \
       make git gawk gdb \
       libunwind-dev libc6-dev-i386 zlib1g-dev libc6-dbg \
       libboost-iostreams-dev libboost-regex-dev libboost-serialization-dev libboost-filesystem-dev
RUN git clone https://github.com/stephenrkell/liballocs.git
RUN apt-get update && apt-get install -y python
RUN cd liballocs && \
   git submodule init && \
   git submodule update && \
   make -C contrib -j1

The last step fails with the following error message:

libtool: compile:  g++ -DPACKAGE_NAME=\"libdwarfpp\" -DPACKAGE_TARNAME=\"libdwarfpp\" -DPACKAGE_VERSION=\"0.1\" "-DPACKAGE_STRING=\"libdwarfpp 0.1\"" -DPACKAGE_BUGREPORT=\"[email protected]\" -DPACKAGE_URL=\"\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DPACKAGE=\"libdwarfpp\" -DVERSION=\"0.1\" -DHAVE_CXX14=1 -DHAVE_PTRDIFF_T=1 -DHAVE_ALGORITHM=1 -DHAVE_CASSERT=1 -DHAVE_CSTDLIB=1 -DHAVE_ELF_H=1 -DHAVE_FUNCTIONAL=1 -DHAVE_IOSTREAM=1 -DHAVE_ITERATOR=1 -DHAVE_LIBELF_H=1 -DHAVE_LIMITS=1 -DHAVE_MAP=1 -DHAVE_MEMORY=1 -DHAVE_QUEUE=1 -DHAVE_SET=1 -DHAVE_STACK=1 -DHAVE_STRING=1 -DHAVE_STRINGS_H=1 -DHAVE_TYPE_TRAITS=1 -DHAVE_UNISTD_H=1 -DHAVE_UNORDERED_MAP=1 -DHAVE_UNORDERED_SET=1 -DHAVE_UTILITY=1 -DHAVE_VECTOR=1 -DHAVE_ELF_H=1 -DHAVE_LIBELF_H=1 -DHAVE__LIBALLOCS_CONTRIB_LIBDWARFPP_CONTRIB_LIBDWARF_PREFIX_INCLUDE_DWARF_H=1 -DHAVE__LIBALLOCS_CONTRIB_LIBDWARFPP_CONTRIB_LIBDWARF_PREFIX_INCLUDE_LIBDWARF_H=1 "-DHAVE_BOOST=/**/" -DHAVE_LIBBOOST_IOSTREAMS=1 -DHAVE_LIBBOOST_REGEX=1 -DHAVE_LIBBOOST_SERIALIZATION=1 -DHAVE_LIBBOOST_SYSTEM=1 -I. -fno-omit-frame-pointer -std=c++14 -ggdb3 -fvar-tracking-assignments -O2 -fkeep-inline-functions -Wall -Wno-deprecated-declarations -Iinclude -Iinclude/dwarfpp -I/liballocs/contrib/libdwarfpp/contrib/libsrk31c++/include -I/liballocs/contrib/libdwarfpp/contrib/libc++fileno/include -I/liballocs/contrib/libdwarfpp/contrib/libdwarf/prefix/include -std=c++14 -I/liballocs/contrib/usr/include -MT src/dies.lo -MD -MP -MF src/.deps/dies.Tpo -c src/dies.cpp  -fPIC -DPIC -o src/.libs/dies.o

(deleted warnings)

src/dies.cpp: In member function 'virtual dwarf::spec::opt<dwarf::core::type_scc_t> dwarf::core::type_die::get_scc() const':
src/dies.cpp:2560:62: error: no matching function for call to 'dwarf::spec::opt<std::shared_ptr<dwarf::core::type_scc_t> >::opt(std::shared_ptr<dwarf::core::type_scc_t>)'
       = opt<shared_ptr<type_scc_t> >(shared_ptr<type_scc_t>());
                                                              ^
(deleted warnings)

Makefile:695: recipe for target 'src/dies.lo' failed
make[1]: Leaving directory '/liballocs/contrib/libdwarfpp'
make[1]: *** [src/dies.lo] Error 1
make: *** [build-libdwarfpp] Error 2
Makefile:85: recipe for target 'build-libdwarfpp' failed
make: Leaving directory '/liballocs/contrib'
The command '/bin/sh -c cd liballocs &&    git submodule init &&    git submodule update &&    make -C contrib -j1' returned a non-zero code: 2
Makefile:13: recipe for target 'build' failed
make: *** [build] Error 2

Do you have an idea what goes wrong? The version of GCC is gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0.

uniqtype-make-precise test failure

On Ubuntu 18.04.2, uniqtype-make-precise fails with the stack trace:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00005555555560de in main () at uniqtype-make-precise.c:27

It looks like make_precise is null.

Generated uniqtype names for anonymous structs/unions are not unique enough

This is a problem with DWARF which we inherit.

When a type (typically a struct or union) is anonymous in the DWARF, we generate a name for it from its source file name and path.

However, since a given source file has many pathname aliases, and since DWARF records pathnames not file identities, we can get many copies of the same anonymous type.

Is this a problem? The same file included in a different preprocessor context might generate a different definition. However, that is for the typecode to disambiguate, not the type name. So yes, it is a problem we would like to avoid.

For example, on my glibc build, I get

__uniqtype_10222f7e__build_glibc_S9d2JN_glibc_2_27_shadow____sysdeps_nptl_internaltypes_h_174
__uniqtype_10222f7e__build_glibc_S9d2JN_glibc_2_27_signal____sysdeps_nptl_internaltypes_h_174
__uniqtype_10222f7e__build_glibc_S9d2JN_glibc_2_27_socket____sysdeps_nptl_internaltypes_h_174
__uniqtype_10222f7e__build_glibc_S9d2JN_glibc_2_27_stdlib____sysdeps_nptl_internaltypes_h_174

... and many others. All are clearly the very same definition, but aren't uniquely identified by their path. It looks like this is because of '..' elements in the pathname, which get translated to underscores. It could also be caused by symlinking in the source tree.

Some possible ways to deal with this:

forget the pathname and just pay attention to the filename, relying on the typecode to disambiguate aliases from coincidentally namesake include files
process out '..' elements from pathnames
strongly canonicalise source pathnames, if we are running at build time (cf. the glibc case above, where we got the debug info from a binary package)
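
The first two options can be sketched as below. This is a purely lexical illustration with hypothetical names: it does not consult the filesystem, so it cannot help with the symlink case, and it handles absolute paths only. The paths used to exercise it are guesses at what the mangled names above were generated from.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Option 1: keep only the filename. */
static const char *sketch_basename(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Option 2: lexically collapse "." and ".." in an absolute path.
 * Returns 0 on success, -1 if the input is not absolute or the
 * output buffer is too small. */
static int sketch_collapse_dotdot(const char *in, char *out, size_t outsz)
{
    size_t len = 0; /* current length of out, excluding the terminator */
    assert(in != NULL && out != NULL);
    if (in[0] != '/' || outsz < 2) return -1;
    while (*in)
    {
        while (*in == '/') ++in;              /* skip separators */
        size_t n = strcspn(in, "/");          /* length of next component */
        if (n == 0) break;
        if (n == 1 && in[0] == '.')
        {
            /* "." contributes nothing */
        }
        else if (n == 2 && in[0] == '.' && in[1] == '.')
        {
            /* drop the last component already written, if any */
            while (len > 0 && out[--len] != '/') {}
        }
        else
        {
            if (len + 1 + n + 1 > outsz) return -1;
            out[len++] = '/';
            memcpy(out + len, in, n);
            len += n;
        }
        in += n;
    }
    if (len == 0) out[len++] = '/';
    out[len] = '\0';
    return 0;
}
```

Under either option, two paths that differ only in the directory preceding a '..' element map to the same string, so they would produce the same generated name.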

I lean towards the first option. One drawback is that accidental namesakes are not totally implausible, and could merge the identities of types that should be nominally distinct. However, this would still be very rare indeed.

Derivation of runtime type information from ctf

Recent changes in e.g. binutils add initial support for CTF, the Compact Type Format.
I'm not sure of its current status across gcc/ld, in tree or in releases.

Anyhow, given that liballocs derives its run-time information from DWARF, if it were possible to derive this information from the Compact Type Format instead, or as well, that might be nice given its more compact format. The intent seems to be that, thanks to their minimal size, these can be more widely available in normal release builds.

Edit: I think it's probably not possible, it seems like CTF is limited to type information about symbols, while liballocs would seem to need type information at call-sites which perform allocations?

By all means close if it's too much work invested in dwarf/not possible/whatever. Mostly just thought of liballocs while reading about this new/upcoming feature.

Definite-length array uniqtypes should not exist

It's not a good use of space to create a new uniqtype for every existing size of array. Better is to make clients work in terms of 'memory ranges' not memory addresses, in the cases where they care about array lengths. Then the size of the memory range determines the array length, and we don't need a separate type -- a single __uniqtype____ARR_T type will do for all lengths of array of T. Since we already know allocation sizes, we have the length information already on a per-allocation basis; moving it out of per-type information is just an interface change. For us that means:

refactor struct uniqtype's representation of composite types, so that subobjects (contained objects) specify not only an offset but also a length;
update clients (libcrunch, and Guillaume's CPython extension) accordingly. In some cases where currently a struct uniqtype * together with a void* will do, now also a length will be needed.
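
A rough sketch of the interface change, using hypothetical structures rather than the real struct uniqtype layout: a subobject record carries a length as well as an offset, and the array length is computed from that range instead of being baked into a per-length type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element type descriptor: one per T, shared by arrays of
 * every length (the role a single __uniqtype____ARR_T would play). */
struct sketch_type
{
    const char *name;
    size_t size_in_bytes;
};

/* Hypothetical subobject record: an offset *and* a length within the
 * containing object. */
struct sketch_subobj
{
    size_t offset;
    size_t length_in_bytes;
    const struct sketch_type *elem_type;
};

/* The array length is derived from the memory range, not from the type. */
static size_t sketch_array_nelems(const struct sketch_subobj *s)
{
    assert(s != NULL && s->elem_type != NULL);
    assert(s->elem_type->size_in_bytes != 0);
    return s->length_in_bytes / s->elem_type->size_in_bytes;
}
```

Two arrays of the same element type but different lengths then share one type descriptor and differ only in their length_in_bytes.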

Building on Linux/License Issue

Hello,

I'm unable to get this to build on my Ubuntu x64 box. In particular, your dwarfpp project won't run autoconf without failing.

I'm also wondering whether you have a license for these, as I was thinking about using them to build a scheme/guile/racket library, presuming I can get it to build.

Thanks

Internal documentation is lacking

The whole codebase really needs me to do a brain dump in each .c file, explaining what goes on in each. A lot of non-obvious things 'obvious to me' are not stated anywhere.

For the 'overall' picture, the Onward! 2015 paper in theory covers this, but it doesn't do a great job and some of its details are outdated. It may be time to write a follow-up paper focusing on experience. Some things I can think of are the following.

initialization / bootstrapping being non-obviously hard
uniqtypes are allocators
memtables not being a good idea (yet) in practice
the generalised notion of reference
array uniqtypes going away? the story of that
stability in extensions of system interfaces: why trailers are better than headers
richer versions of system interfaces (libunwind, dladdr, maybe libdlbind counts here?)
the story of the custom dynamic linker
some of the content from my [forthcoming] "how to hook malloc, really, really, really" blog post
something about rejigging how allocation sites are classified, if I get on to that
something about my 'working underneath libc' and the contrast with sanitizer runtimes

stack-walk test failure

On at least Ubuntu 18.04.2, the test stack-walk fails at build time with:

$ make gdbrun-stack-walk
make -C "stack-walk" `if test -e /home/jryans/Projects/liballocs/tests//stack-walk/mk.inc; then /bin/echo -f mk.inc; else true; fi` "stack-walk" && ( cd "stack-walk" && make `if test -e /home/jryans/Projects/liballocs/tests//stack-walk/mk.inc; then /bin/echo -f mk.inc; else true; fi` -f ../Makefile _onlygdbrun-stack-walk )
make[1]: Entering directory '/home/jryans/Projects/liballocs/tests/stack-walk'
/home/jryans/Projects/liballocs/tools/lang/c/bin/allocscc -L\/home/jryans/Projects/liballocs/contrib/libantlr3c-3.4/.libs -Wl,-rpath,\/home/jryans/Projects/liballocs/contrib/libantlr3c-3.4/.libs -L/home/jryans/Projects/liballocs/tests/../lib -L/home/jryans/Projects/liballocs/tests/../src  stack-walk.o   -lallocs -o stack-walk
/home/jryans/Projects/binutils-gdb/build/bin/ld: stack-walk.linked.o: in function `__liballocs_walk_stack':
/home/jryans/Projects/liballocs/tests/../include/liballocs.h:771: undefined reference to `_Ux86_64_getcontext'
/home/jryans/Projects/binutils-gdb/build/bin/ld: /home/jryans/Projects/liballocs/tests/../include/liballocs.h:772: undefined reference to `_Ux86_64_init_local'
/home/jryans/Projects/binutils-gdb/build/bin/ld: /home/jryans/Projects/liballocs/tests/../include/liballocs.h:774: undefined reference to `_Ux86_64_get_reg'
/home/jryans/Projects/binutils-gdb/build/bin/ld: /home/jryans/Projects/liballocs/tests/../include/liballocs.h:778: undefined reference to `_Ux86_64_get_reg'
/home/jryans/Projects/binutils-gdb/build/bin/ld: /home/jryans/Projects/liballocs/tests/../include/liballocs.h:788: undefined reference to `_Ux86_64_get_reg'
/home/jryans/Projects/binutils-gdb/build/bin/ld: /home/jryans/Projects/liballocs/tests/../include/liballocs.h:789: undefined reference to `_Ux86_64_get_reg'
/home/jryans/Projects/binutils-gdb/build/bin/ld: /home/jryans/Projects/liballocs/tests/../include/liballocs.h:791: undefined reference to `_Ux86_64_get_reg'
/home/jryans/Projects/binutils-gdb/build/bin/ld: /home/jryans/Projects/liballocs/tests/../include/liballocs.h:797: undefined reference to `_Ux86_64_step'
/home/jryans/Projects/binutils-gdb/build/bin/ld: /home/jryans/Projects/liballocs/tests/../include/liballocs.h:800: undefined reference to `_Ux86_64_get_reg'
/home/jryans/Projects/binutils-gdb/build/bin/ld: /home/jryans/Projects/liballocs/tests/../include/liballocs.h:801: undefined reference to `_Ux86_64_get_reg'
/home/jryans/Projects/binutils-gdb/build/bin/ld: /home/jryans/Projects/liballocs/tests/../include/liballocs.h:803: undefined reference to `_Ux86_64_get_reg'
collect2: error: ld returned 1 exit status
<builtin>: recipe for target 'stack-walk' failed

Meta objects should be bundle-able into the output binary

Keeping metadata in a /usr/lib/meta hierarchy has its pros and cons. As an alternative, it should be possible to bundle the metadata with the output binary. I have prototyped some hacks which provide a basis for this, in my elftin repo (https://github.com/stephenrkell/elftin). More specifically, the "embed-loadable" example shows how to bundle an ELF file opaquely into a containing ELF file.

This embedded ELF file can itself be a loadable .so file... this is directly loadable so long as the ld.so supports dlopening from a file descriptor. Sadly glibc's ld.so doesn't expose this, but internally it more-or-less implements it; elftin also has some non-robust hacks that make this work on a particular build of the now-outdated glibc 2.19 (in the ldso-helper.c file). The way to make this more robust, over multiple versions of glibc, is to use reflection on the ld.so itself. We already generate metadata for the ld.so; it is probably possible to use that to obtain the entry points we need, instead of the hacky offset-based code in ldso-helper.c.

Built executables hang in stack/heap queries

(reported by @clearyf -- with thanks!)

Note that the resulting executables still don't work fully, e.g. the sample test.c in README.md is broken and produces:

$ LD_PRELOAD=/usr/local/src/liballocs/lib/liballocs_preload.so ./test
xed could not decode instruction at 0x0000557aa6486a2a
xed could not decode instruction at 0x00007f3aeb0da64a
xed could not decode instruction at 0x00007f3aeb122c36
xed could not decode instruction at 0x00007f3aeb54ae7e
xed could not decode instruction at 0x00007f3aeb557a4e
test: Warning: mapping of (null) could not extend preceding bigalloc
At 0x557aa64868c5 is a static-allocated object of size 0, type __FUN_FROM___ARG0_int$32__ARG1___PTR___PTR_signed_char$8__FUN_TO_int$32

and just hangs there with 100% CPU usage and requires SIGKILL to kill it.

Querying static storage is fine (functions & static/global variables); stack & heap allocations hang.

Lifetime policy support is temporarily broken

In e5438f2 I deliberately broke the lifetime policies support (used by pycallocs), by removing support for extended inserts in heap chunks. Instead we'll use some spare bits in the main insert, when they become available (when we move from a memtable to a bitmap).

Run project test suite on CI system

As this is a complex project with various interlocking components and sub-repos, I think it would be quite helpful to have tests running in some kind of CI system for each PR.

Since we already have the buildtest Docker files around, we would need to experiment with running that on some CI system and hooking it up to run for each PR. It's possible we may run up against resource limits on free CI tools if it takes a while to build and run.

Dummyweaks library should go away

The no-op liballocs_dummyweaks.so library is no longer necessary, for at least the following reasons.

The BFD linker provides -z dynamic-undefined-weak. So we can just make the symbols weak, if we don't mind testing for liballocs' presence at every use. We might mind, though.
Thanks to allocsld, we can always preload liballocs transparently. Or, if we cared, we could preload a stripped-down no-op version. That would avoid the 'test at every use' problem, but wouldn't fully eliminate the library as a built artifact.
A hybrid might be possible -- test for some symbols only.
We could link dummy versions of the symbols into allocsld -- although not yet, because it is only a chain loader, so the real ld.so doesn't see it yet.
We could always load the real liballocs but make it disable-able. This would mean stubbing out its own entry points, maybe by dynamic patching or perhaps by some IFUNC trickery. Indeed making liballocs's entry points IFUNCs might be the cleanest way to allow run-time disabling.
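
For the weak-symbol option, the 'test at every use' cost looks roughly like the sketch below. The entry point name is hypothetical, not a real liballocs symbol, and the weak attribute is the GCC/Clang spelling. In a shared library the reference would additionally need -z dynamic-undefined-weak, as noted above, to be resolved at load time.

```c
#include <stddef.h>

/* Hypothetical entry point, declared weak so that linking succeeds even
 * when no definition is present; its address is then null. */
extern int sketch_liballocs_query(const void *obj) __attribute__((weak));

/* Every call site must test for presence first. This is the per-use
 * cost that a preloaded no-op library would avoid. */
static int sketch_query_or_default(const void *obj, int dflt)
{
    if (sketch_liballocs_query) return sketch_liballocs_query(obj);
    return dflt;
}
```

When nothing defines sketch_liballocs_query, sketch_query_or_default simply returns the default it was given.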

What clients care about this? Anyone that links -lallocs presumably doesn't, though that's not entirely clear. It may be only libcrunch_stubs.so that cares.

Instead of LD_PRELOAD, use a custom dynamic linker

To insert liballocs into the process, we use LD_PRELOAD. But even better would be a custom dynamic linker, since we would get control at the very beginning. This could simplify our initialization logic, e.g. avoiding our need to special-case "internal" malloc calls occurring during dlopen or lazy symbol resolution.

I have been working on making it easy to build custom dynamic linkers, in my libgerald project. So the idea is to use and extend that, to create an allocsld.so, which would then be the natural choice of interpreter for binaries generated using our compiler wrapper, linker plugin etc. It should be possible to invoke this linker as a command (allocsld.so /path/to/binary) for binaries not built with our toolchain extensions.

sizeofness analysis isn't sufficiently general

Currently, we do an intraprocedural analysis of the flow of 'sizeofness', so that e.g. in the following example we can infer that p points to a struct Foo.

size_t s = sizeof (struct Foo);
unsigned t = 2 * s;
void *p = malloc(t);

However, sizeofness is not tracked interprocedurally. A consequence is that it's necessary to declare a wrapper such as in this variation:

void *p = mymalloc(t);

and also a 'helper' such as in this variation:

unsigned t = get_size_of_struct_foo();

and some seen-in-the-wild cases are just not supported, such as (the 'perlbench case'):

struct blah {
  ...
  size_t size_of_a_struct_foo;
  ...
} the_blah = { ..., sizeof (struct Foo), ...};

The 'helper' case is handled by a hack, where helpers can be declared much like a malloc wrapper, and we grossly assume that the preceding helper call executed in a given thread is the one that a given otherwise-unclassified malloc call should be associated with. This allows a simple implementation using a thread-local variable, but is easily foiled e.g. by computing two sizes but doing the allocations in the other order.
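To make the failure mode concrete, here is a minimal sketch of the thread-local approach, not the actual liballocs implementation. The names last_sizeof_type and guessing_malloc, and the use of a type-name string as the record, are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

struct Foo { int a; double b; };
struct Bar { char c[3]; };

/* Hypothetical per-thread record of the most recent helper call. */
static __thread const char *last_sizeof_type;

/* Two 'helpers': each returns a size and records which type it was for. */
static size_t get_size_of_struct_foo(void)
{
    last_sizeof_type = "struct Foo";
    return sizeof (struct Foo);
}
static size_t get_size_of_struct_bar(void)
{
    last_sizeof_type = "struct Bar";
    return sizeof (struct Bar);
}

/* Allocates, and reports the type the heuristic would guess: simply
 * whatever the preceding helper call in this thread recorded. */
static void *guessing_malloc(size_t sz, const char **guessed_type)
{
    *guessed_type = last_sizeof_type;
    return malloc(sz);
}
```

If a caller computes both sizes first and only then allocates, both allocations are attributed to struct Bar, because the second helper call overwrote the record left by the first.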

One idea for doing better: we can easily make 'sizeofness-returningness' an annotation whose completeness we check by a simple static analysis, i.e. warn if you return an expression having sizeofness but are not marked as a helper. Then instead of the thread-local hack, generate sizeofness at calls to helpers, just as we generate it at 'sizeof' itself... that's if helpers always generate same-type sizeofness.

To handle the perlbench case, use a link-time map for the case of static-constant values with sizeofness.

Uniqtype name-mangling function is not injective

Since we mangle names just by clobbering inconvenient characters to underscores, we can induce unintended aliases.

For an injective mangling, we could:

use hex '$NN' sequences (nice because '$' is C-friendly, at least on gcc et al)
use asm-friendly but not C-friendly characters such as '.'

The problem with the latter is that we'd need to invent a separate C encoding of the names (and map using asm("...")). Or we could generate uniqtypes in assembly.
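The '$NN' option could look roughly like the sketch below. This is an assumption about one possible encoding, not the mangling liballocs actually uses; the function name mangle_injective is hypothetical. Letters, digits and '_' pass through, and every other byte (including '$' itself) becomes '$' followed by two hex digits. That keeps the mapping injective, since a '$' in the output always introduces an escape.

```c
#include <stddef.h>
#include <stdio.h>

/* True for bytes that are copied through unescaped (ASCII only). */
static int is_plain(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9') || c == '_';
}

/* Writes the mangled form of `name` into `out` (capacity `outsz`).
 * Returns 0 on success, -1 if `out` is too small. */
static int mangle_injective(const char *name, char *out, size_t outsz)
{
    size_t o = 0;
    for (const unsigned char *p = (const unsigned char *) name; *p; ++p) {
        if (is_plain(*p)) {
            if (o + 1 >= outsz) return -1;   /* need room for char + NUL */
            out[o++] = (char) *p;
        } else {
            if (o + 3 >= outsz) return -1;   /* need room for "$xx" + NUL */
            snprintf(out + o, 4, "$%02x", (unsigned) *p);
            o += 3;
        }
    }
    if (o >= outsz) return -1;
    out[o] = '\0';
    return 0;
}
```

Under this scheme "foo bar" mangles to "foo$20bar" while "foo_bar" stays "foo_bar". The underscore-clobbering approach would alias the two.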

Stacks in pthreads or sigaltstacks

Currently our stack and stackframe allocators only know about the initial stack. They should also know about stacks registered with sigaltstack and with clone (used by pthread_create).

TLS allocator is not indexed/queryable

We are missing support for thread-local storage.

This allocator is implemented inside the dynamic linker and is much like a static allocator, but each thread gets its own segment, for each library defining TLS-storage symbols. We need to create bigallocs for these areas as threads are created, and index them more-or-less as we handle static segments. (There is no point starting this until the deep-static-allocs branch is merged.)

liballocs could usefully track reachability

Reachability among bigallocs would be very useful to track, e.g. for implementing garbage collectors while tracking what is escaping from their view. It should not be expensive if done carefully.

It may be necessary to track an overapproximation ('may reach') and underapproximation ('must reach') separately from each other, rather than tracking exact truth.

It may be useful to track immediate reachability and transitive reachability separately.

One relevant parameter is whether a given text segment (or other text-containing bigalloc) is write-barriered by its liballocs-aware allocator, hence sending liballocs pointer-write callbacks when new 'interesting' reachability edges of various kinds are created. If it isn't, then instead, new overapproximate reachability edges must be induced. E.g. for any bigalloc reachable by that text segment, any writably reachable bigalloc may also now reach that bigalloc (i.e. the code may write a pointer into it). See the Bertholon & Kell VMIL '19 paper for more of this flavour.

For static inter-DSO reachability, ld.so auditing may be useful (try man rtld-audit).

Split private-malloc into O(nbigallocs) and O(mem) cases

As of commit ab4d7b0, we have a new __private_malloc() implementation which never does mmap(), thanks to a 'large-enough' (1GB) up-front MAP_NORESERVE area created at the same time as the pageindex.

However, a 1GB area is not really sensible. It is simultaneously too large and not large enough. For a really large workload, we might exhaust it (e.g. a 128GB heap with 16-byte alignment would need 1GB of bitmap). Conversely, we only really need the no-mmap guarantee some of the time, to prevent reentrancies that we can't (or refuse to) deal with.
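The arithmetic behind the 128GB figure can be checked with a small sketch, assuming the bitmap holds one bit per alignment unit of heap (the function name bitmap_bytes is made up for illustration):

```c
#include <stdint.h>

/* Bytes of bitmap needed to hold one bit per `align`-byte unit of a heap
 * of `heap_bytes` bytes, rounding up to a whole byte. */
static uint64_t bitmap_bytes(uint64_t heap_bytes, uint64_t align)
{
    uint64_t bits = heap_bytes / align;
    return (bits + 7) / 8;
}
```

For a 128GB heap (2^37 bytes) at 16-byte granularity this gives 2^33 bits, i.e. 2^30 bytes, which is exactly the 1GB area.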

Ideally we would split private mallocs into two cases: those that are O(nbigallocs) i.e. roughly bounded by the number of big allocations, and those that are O(mem) i.e. bounded instead by the amount of memory in use. Bitmaps are in the latter category, metavectors in the former. The MAP_NORESERVE approach is fine for O(nbigallocs) and we could bound the amount of memory needed to much less than 1GB (since we have at most 32k bigallocs). One risk is that maybe this would bring back bad reentrancies, e.g. would a malloc hook ever need to create a bitmap?

gold plugin is incomplete

Currently, the tools/allocscompilerwrapper.py contains a bunch of stuff that we do when producing a final linked binary. This includes using stubgen.h to create allocator wrappers, linking --wrap to interpose them, unbinding/globalizing the allocator functions if necessary so that interposition can work, creating aliases that we use in some cases (e.g. when linking a malloc() into an executable -- this needs special care), and actually generating the meta-object.

The medium-term goal is to move away from compiler wrappers. Instead, all of liballocs's toolchain interventions should be invokable simply by compiler or linker flags. One of these flags should enable a linker plugin, to do whatever we need to do at link time. That includes, at least, generating the meta-object.

A basic version of this overall approach, relying on compiler flags rather than wrappers, is already proof-of-concepted in libcrunch's cilpp front-end. Some helper scripts provide compiler options, and these include specifying a "wrapper" program on the gcc command line, which is used to replace the vanilla C preprocessing step with a cpp-then-CIL source-to-source pass. https://github.com/stephenrkell/libcrunch/blob/master/frontend/cilpp/bin/cilpp-cflags

We already also have a gold plugin that works for some simple cases. This needs to be finished. But we should not do this before tackling #11, because switching to binary instrumentation at run time will simplify the task considerably, by eliminating a lot of the link-time interventions.

compilation failing

I'm trying to build liballocs, using:

arch linux with all dependencies on their latest version (converted dependencies from ubuntu as accurately as I could)
gcc 8.2
arch linux standard binutils-gdb since your patched version would not compile either but it's still on version 2.30 so it should work(?)

the build fails with make -C contrib with the error:

src/dies.cpp: In member function ‘virtual dwarf::spec::opt<dwarf::core::type_scc_t> dwarf::core::type_die::get_scc() const’:
src/dies.cpp:2560:62: error: no matching function for call to ‘dwarf::spec::opt<std::shared_ptr<dwarf::core::type_scc_t> >::opt(std::shared_ptr<dwarf::core::type_scc_t>)’
      = opt<shared_ptr<type_scc_t> >(shared_ptr<type_scc_t>());

more logs can be supplied if necessary.
any idea?

Makefile.meta should summarise status to console

At the moment running Makefile.meta may spew many millions of lines to the console as various subsidiary tools are invoked, leaving it a bit hard to follow what's actually happening.

It would be nice to default to a summarised view that prints just the current phase of work, and hides the detailed log unless you add -v or similar.

Uniqtypes should integrate with libdivide

A common operation on uniqtypes when answering any query is to divide by their size. There is potential for speedup here by using libdivide.

Probably this means storing a 'magic number' in the uniqtype. We need to be careful about overall size (see #12). Another approach might be to cache a code snippet that does the division... these could be pooled across all like-sized uniqtypes, although an additional 8-byte code pointer per uniqtype is a hefty price.
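For a sense of what a stored 'magic number' buys, here is a minimal sketch of division by a precomputed multiplier. It does not use libdivide's API; it is the well-known ceil(2^64 / d) multiply-and-shift technique for 32-bit operands, and the function names are made up. It relies on the GCC/Clang __uint128_t extension and assumes d is nonzero.

```c
#include <stdint.h>

/* Precompute ceil(2^64 / d) for a divisor d > 1.
 * (For d == 1 this wraps to 0, so fast_div special-cases it.) */
static uint64_t compute_magic(uint32_t d)
{
    return UINT64_MAX / d + 1;
}

/* Computes n / d using the magic number: the high 64 bits of magic * n. */
static uint32_t fast_div(uint32_t n, uint32_t d, uint64_t magic)
{
    if (d == 1) return n;
    return (uint32_t) (((__uint128_t) magic * n) >> 64);
}
```

A uniqtype would then carry the 8-byte magic alongside its size, trading space for a multiply in place of a divide on each query.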

debian-stretch build broken (by asm header hackery in libsystrap)

(Reported by @clearyf -- with thanks!)

Building the debian-stretch Dockerfile on buster systems is broken as struct ucontext & ucontext_t have changed between the two releases.

liballocs should be meta-complete

At present, liballocs has blind spots about its own abstractions -- specifically, about its own allocations and about the "uniqtype of uniqtypes".

It would be good to eliminate these. There is a rhetorical motivation for doing so -- showing we can describe our own stuff is a good design-level sanity check, making the design more persuasive. But also, being able to call on some of our own functions when managing allocations might make our code a bit simpler and more uniform, such as in how we manage and free bigallocs' metadata allocations.

This would mean (1) liballocs's private malloc being reified as an allocator, (2) ditto for any other allocations we do, e.g. memtable mappings, and (3) a working/precise uniqtype of uniqtypes, including a make_precise function. There may be other aspects to meta-completeness that I'm overlooking right now.

allocscompilerwrapper.py puts defsyms too late when wrapping in-exe malloc

The malloc-in-exe test case hacks around this with its own LDFLAGS in mk.inc. But without these, the __wrap___real_malloc and friends are being defined (in liballocs_nonshared.a) too early on the command line to be picked up by the defsym options which want to bind to them.

Minimise the up-front requirement to identify allocator wrappers and (simple) suballocators

With the ongoing overhaul of static metadata, there's an opportunity to design towards a more dynamic heuristic for identifying allocator functions. Done carefully, this should not impact performance in any measurable way, and should increase usability.

For example, we already classify all indirect call sites as "possibly calling allocators", if their signatures satisfy a certain property. What if we included any call site that receives one or more arguments having a sizeofness? Our allocator call site table will become bigger, but perhaps not unmanageably so.

One use of the current instrumentation is to set __current_allocsite to some value. We already want to generalise this so that it is merely __outermost_allocsite, and so auxiliary allocations made during a prima facie allocation are still classified/typed separately. In this form, the only value __outermost_allocsite brings is telling us we can stop walking up the stack. How often do we make use of this?

This is connected with the desire in #11 to eliminate source-level instrumentation. An entirely online approach would be better even than binary instrumentation. Can a bootstrapping approach work? Whenever mmap() is called, any of its callers is considered a possible allocation function. However, this won't catch all allocation events, because most of the time they won't call down to mmap. This is the main difference between suballocators and wrappers.

I think a fully automatic online treatment of malloc wrappers is feasible, provided that malloc itself is dynamically interposable. Other allocators probably can't be handled automatically with much generality... ultimately, an implementation of the struct allocator functions needs to come from somewhere. We could perhaps still do more to make a sensible guess, e.g. by considering the subsequence of the call chain between the mmap entry and the nearest enclosing call having a classified call site. Probably one idea is to issue a warning on the console when this happens, and see whether the output looks sensible.

With libcrunch in the mix, we have yet another mechanism for avoiding up-front classification, which is letting the first cast decide the type. We can't directly use this within liballocs... it's for the instrumentation to call set_type or whatever, if it wants to. But this might make some of our run-time efforts redundant, so perhaps they should be disable-able somehow.

Build error

Hello!

While attempting to build this project, I've got following error:

lt-usedtypes: src/root.cpp:411: bool dwarf::core::root_die::advance_cu_context(): Assertion `retval == DW_DLV_OK || retval == DW_DLV_NO_ENTRY' failed.
/home/indutny/liballocs/tools/lang/c/bin/link-used-types: line 55: 30297 Aborted                 (core dumped) ${USEDTYPES} "$objfile" > "$usedtypes_src"

The actual DWARF error message (that I found upon patching this library) is:

DW_DLE_RELOC_INVALID (241)

Did I do something wrong? Is there any way to fix it?

Thank you,
Fedor.

Cache syscall trap sites using static metadata

It's not necessary or desirable to suffer start-up delay scanning a whole libc binary for syscalls. We used to hack around this by only scanning certain parts, but that is not robust.

Instead, we should do the scan offline and use static metadata. Just like our type metadata, it needs to be rebuilt when the binary is rebuilt, but that is OK and we can do a sanity check using timestamps.

Also it will eventually be necessary to worry about CoW and the fact that if we modify text in a shared library, it is no longer shareable. Most likely a pre-rewritten cached binary needs to be created, stored at a well-known place in the filesystem, and loaded by our modified dynamic linker in place of whatever the executable actually asked to load. That would restore text-sharing, at the cost of a hacky parallel world of library binaries in the filesystem. (Of course when every process in our whole system uses liballocs, that will go away!)

The rudiments of this static metadata should be pulled up into libsystrap or librunt.

Uniqtype aliases don't preserve uniqueness in all cases

The current handling of typedefs, and other cases of aliased uniqtypes (base types, bitfield types) is not quite right.

Consider a DSO in which there is a type foo_t, being a typedef of int, and another DSO in which there is bar_t, also a typedef of int. In the eventual run-time metadata, all three should be aliases of a single unique int type.

But instead, as long as foo_t and bar_t don't appear in the other DSOs, one of the two int definitions will 'lose' and be overridden, while its alias (foo_t or bar_t) will not be overridden, so will remain, and will be non-equal to the 'winning' int type.

This is basically a result of aliases being bound too early, before the global symbol overriding is resolved. We can blame the inflexible semantics of ELF dynamic linking.

One way to fix it up might be to use our dlbind library, if we can preserve its place near the head of the link order. If so, we can intervene whenever we load new metadata, and insert ABS symbols pointing at the 'real' target of any aliased uniqtype. For example, when loading the DSO defining 'foo_t' and 'int' uniqtypes, supposing we already have an aliased 'bar_t' and 'int', we create a '__uniqtype__foo_t' ABS symbol whose value is the 'bar_t'/'int' uniqtype address. This is outwith both the meta-DSO being loaded and the dlbind DSO; and we rely on our dlbind overriding that meta-DSO's definitions.

Do binary instrumentation of allocation functions

Rather than doing a lot of hairy link-time stuff (see tools/allocscompilerwrapper.py) to interpose on allocation functions, it would be better to do it at run time. This should be less fragile, and may benefit from access to run-time type information. It will also avoid the need to relink the target binary if we want to change our list of allocation functions, making the system more convenient to use. That also opens the possibility of inferring a "good guess" about the allocation functions themselves, from looking at the dynamic call tree, so taking away some of the developer effort that is needed.

Allocation functions that are accessed via call/return are the easy case, so I propose to investigate a solution for those first. (The harder cases are alloca and inlined allocation functions.)

The idea is to trampoline-rewrite the entry and exit paths of allocation functions, to call out into stubs. These stubs should simply do the same things that our current ones do (as generated by stubgen.h). Since we have xed in our dependencies anyway, it may be possible to hand-roll a solution. Most allocation functions have at least five bytes' worth of prologue, into which there are no inward jumps. In such a case, all we really need is to identify a 5-bytes-or-more "launch pad" at the start of the prologue, displace those instructions elsewhere (re-relocating them as necessary), replace them with a jump, and append to them a return instruction. To handle the return path, online rewriting of the on-stack return address should be sufficient.
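The five bytes come from the size of an x86-64 near jump: opcode 0xE9 followed by a 32-bit displacement relative to the end of the instruction. A minimal sketch of encoding such a jump for the launch pad follows. The function name is hypothetical, and a real rewriter would also need to make the page writable and handle displaced instructions.

```c
#include <stdint.h>

/* Encodes `jmp rel32` into buf, as if the instruction were located at
 * site_addr and should transfer control to target_addr.
 * Returns 0 on success, or -1 if the target is out of rel32 range. */
static int encode_jmp_rel32(unsigned char buf[5],
                            uint64_t site_addr, uint64_t target_addr)
{
    /* Displacement is measured from the end of the 5-byte instruction. */
    int64_t disp = (int64_t) (target_addr - (site_addr + 5));
    if (disp < INT32_MIN || disp > INT32_MAX) return -1;
    uint32_t rel = (uint32_t) (int32_t) disp;
    buf[0] = 0xE9;                           /* near jmp opcode */
    buf[1] = (unsigned char) (rel & 0xff);   /* little-endian rel32 */
    buf[2] = (unsigned char) ((rel >> 8) & 0xff);
    buf[3] = (unsigned char) ((rel >> 16) & 0xff);
    buf[4] = (unsigned char) ((rel >> 24) & 0xff);
    return 0;
}
```

The range check matters in practice: the stub must be placed within 2GB of the patched function, or a longer sequence (and hence a longer launch pad) is needed.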

If it gets hairier (e.g. we have to get involved with diverting branches back into the 5-byte displaced chunk), DynInst (https://dyninst.org/) will look more worthwhile. Importing DynInst as a dependency has pros and cons. There is quite a lot of overlap with what we do. My preference, if possible, would be to build it under contrib/ in the form of an archive, from which we can pick only a few functions and hopefully not pull in too much code.

generic_malloc can't always resolve interior pointers

The idea of a bounded backward search is really good for speed. But it creates a tricky contract: get_info() is only supposed to work for interior pointers no more than a certain distance from the start of the object. (What distance? The size of the biggest unpromoted allocation. You can find the other allocations as bigallocs.)

Maybe we need 'fast' and 'complete' query functions? Or maybe our 'hit search bound' path needs to try the bigallocs? Normally, a client would have tried bigallocs 'on the way down', so this path would be redundant. It probably doesn't hurt, though.

Remember that as of recent changes, we index 'big' promoted allocations in the usual way, using inserts (trailers), rather than a field in the bigalloc record.

Binary instrumentation of inline allocation functions and alloca

This follows on from #11.

To handle alloca() in unmodified binaries, we need to be able to identify it where it appears in the output instruction stream. I believe this is doable, by a mixture of source- and binary-level analysis similar to what we currently do. A heuristic suffices for identifying the instruction that is actually doing the alloca (this is a sub of $rsp, on x86-64), cross-checked against the source code using debug info. Instead of inserting a wrapper, and indexing the alloca'd chunks as if they were malloc'd, we need to use a trampoline approach (like with #11) and keep a custom index (perhaps a bitmap hanging off the stack frame's bigalloc).

Handling inlineable allocation functions in general is hard. But it's rare for allocators to be inlined. We probably want to forbid this and warn if we see it happening, again by inspecting the DWARF. (A longer-term project is to support retroactive un-inlining....)

autoconf in root folder has errors

I've read issue #5 and think it describes something else. This is what I get with autoconf 2.69.

/liballocs$ autoconf
configure.ac:15: error: possibly undefined macro: AM_INIT_AUTOMAKE
      If this token and others are legitimate, please use m4_pattern_allow.
      See the Autoconf documentation.
configure.ac:16: error: possibly undefined macro: AM_MAINTAINER_MODE
configure.ac:65: error: possibly undefined macro: AM_CONDITIONAL

Nix build support could be very useful

A really good idea raised by @clearyf -- a Nix set-up allowing user-level build and test could be useful in several ways.

to reliably build liballocs much as Docker currently (in theory) does, but perhaps more easily
to help students or casual-interest users get up to speed more quickly than Docker does, sidestepping problems around debugging/ptracing and the sometimes-inconvenient separateness of containers
to build other packages using its instrumentation, that of libcrunch, and so on, by including liballocs's tools in the Nix package-build toolchain

Dynamically created uniqtypes should be named using the canonical names of type(s) they are derived from

Uniqtype names often embed the name of another uniqtype. When dynamically creating uniqtypes (arrays, maybe pointers, sometimes unions), ensuring that we embed the canonical name is currently not possible, because we don't know which of the many aliases is canonical.

Tentative solution: higher-order-macroise the function that computes canonical names (currently in tools/helpers.cpp) and provide a raw C/uniqtypes instantiation as well as the current C++/dwarfpp instantiation.

make_precise should instead be allocator-level operations

Creating a 'precise' struct that ends with a flexible array member is a ballache. You have to create the array type, then create the struct type. The struct type's make_precise function should in theory do this. But if the flexible array member is many layers down, the number of created types and make_precise functions starts to add up. Currently we don't generate these functions, only the ones for the array type.

(Once no array type has a definite length (#34), structs that contain arrays will still have definite length, except for flexible array members, so that does not change things here.)

Given that we are rethinking arrays, we have a chance to rethink make_precise in general. We want it for:

stack frames, but we could instead rely on the stackframe allocator to mediate the uniqtypes (there is no reason why frames need to have an overall struct type)
unions? read/write is a mess at present... for read-validity we have not_simultaneous and may_be_invalid. But maybe we want read-validity as a separate concept orthogonal to uniqtype? i.e. at the allocator level. If so it could cover struct padding, inter-allocation dead space, genuinely read-trapping regions, sparseness that should not be disturbed (?), and so on.
'temporally discriminated unions', e.g. inregs/outregs, hence why it takes an mcontext_t. But for these maybe we need to do the write-tracking thing, trap writes and keep last-written shadow state
explicitly discriminated unions (like uniqtype itself), variant records etc... the idea is that we can compile a hypothetical DWARF expression, describing the discriminator semantics, into a make_precise function. It seems reasonable that a variant record or union would have such a piece of code (again logically/hypothetically speaking). So if we get rid of make_precise, where should it live? Perhaps as a special 'related' entry... composites already have N+1 relateds, where the extra one is the member names. We can decree that not_simultaneous composites have a function that returns read-validity... and write-validity? Or maybe a "do write to member N" memcpy-like function that respects invariants (N must be the index of a may_be_invalid member).
Can we also capture "tagvecs" a.k.a. _DYNAMIC or auxv-style arrays this way? We can view them as composites where a given member may be present, absent or even present >1 time. It would be better to view these as an allocator. This may require us to relax our view of uniqtypes as being only at the 'terminal' layer of allocation, i.e. here we can access the arena as an array of Elf64_Dyn entries or as an allocator managing a packed pool of discrete / singleton Elf64_Dyn entries.

We seem to want

struct composite_member_rw_funcs
{
    // we want this to return a bitmask in the common case
    // it is either ('address-boxing')
    //      - if vas.h tells us is not a valid user address,
    //           the caller should then mask it by ((1ull<<(nmembers))-1u)
    //           and the result is a bitmask conveying definedness of the
    //           first nmembers members
    //      - if vas.h tells us it is a valid user address,
    //           it points to a bit vector (le? be?) of nmemb entries,
    //               read-valid in whole words (i.e. rounded up to the word size)
    //           resource management? use TLS? yes I think TLS is best.
    //           the function that writes it can do  static __thread intptr_t mask;
    //           and return &mask (after writing the bits to it);
    intptr_t (*get_read_validity_mask)(struct uniqtype *, void *base, mcontext_t *ctxt);
    void     (*write_member)          (struct uniqtype *, void *base, mcontext_t *ctxt,
                                      unsigned memb_idx, const void *src);
};

ld.so with non-contiguous segments breaks startup

If the ld.so has a hole in it, it will be mapped by the kernel with that hole existing. This leads the mmap allocator to build not a single mapping sequence for it but two. This in turn foils the static file allocator which expects a single mapping sequence for the whole file.

My first approach was to plug the hole by mapping it PROT_NONE. However, it may be too late -- sometimes a TLS area, or other random stuff, has been placed in the hole.

Currently I'm seeing this on the CircleCI machine using Ubuntu 18.04.

We may need to plug the gap super-early, or else detect the problematic ld.so build and request the user to patch it (recalling that in the long run, we want to be the ld.so).

Note that non-ld.so binaries are fine because they are mapped by the ld.so, which does not create holes... it uses PROT_NONE.

allocscc gobbles useful warnings

Owing to CIL's default rewritings, many useful compiler warnings are not generated when using allocscc. Ironically perhaps, these include bad implicit conversions of pointers, which CIL makes explicit and so forestalls any warning.

Probably I need to fix this in CIL. Making implicit conversions explicit is one of the things it does during lifting, in order to present to its client code an AST that is free of such implicit things. That's valid, but it should emit a warning when it explicitifies an implicit conversion. See castTo in cabs2cil.ml... it looks like it would be easy to plumb in a warning.

Uniqtypes should be more compact

Currently, each binary generates quite a lot of uniqtype data. This could easily be made more compact in various ways.

remove the cache word
don't generate anonymous subrange types for arrays
relax the invariant that related[0] always exists
for degree-1 pointer types, avoid storing the ultimate pointee and immediate pointee separately
strip the -meta.so so it no longer has .symtab and .strtab
check that the ".size" calculation is accurate (I think it might be doing extra padding)
"short-allocate" some cases of the union, if they don't need the full amount of space
only include make_precise if some flag (somewhere) says it's present
do more clever string compression on subobject names, perhaps, or otherwise tweak our naming conventions to better exploit assemble- and link-time string merging

lib-test test failure

On at least Ubuntu 18.04.2, the lib-test test segfaults, and gdb gives the following stack:

#0  0x00007ffff781ee97 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff7820801 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff65dbb05 in add_all_loaded_segments (info=<optimized out>, size=<optimized out>, 
    maybe_lment=0x0) at /home/jryans/Projects/liballocs/src/allocators/static.c:172
#3  0x00007ffff7945f21 in dl_iterate_phdr () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007ffff65d1ce1 in __static_allocator_init ()
    at /home/jryans/Projects/liballocs/src/allocators/static.c:35
#5  0x00007ffff65d1ece in global constructors keyed to 65535_1_liballocs_pic.a_0x872a.8401 ()
    at /home/jryans/Projects/liballocs/src/allocators/stack.c:55
#6  0x00007ffff7de5733 in ?? () from /lib64/ld-linux-x86-64.so.2
#7  0x00007ffff7dea1ff in ?? () from /lib64/ld-linux-x86-64.so.2
#8  0x00007ffff79472df in _dl_catch_exception () from /lib/x86_64-linux-gnu/libc.so.6
#9  0x00007ffff7de97ca in ?? () from /lib64/ld-linux-x86-64.so.2
#10 0x00007ffff7bd1f96 in ?? () from /lib/x86_64-linux-gnu/libdl.so.2
#11 0x00007ffff79472df in _dl_catch_exception () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x00007ffff794736f in _dl_catch_error () from /lib/x86_64-linux-gnu/libc.so.6
#13 0x00007ffff7bd2735 in ?? () from /lib/x86_64-linux-gnu/libdl.so.2
#14 0x00007ffff7bd2051 in dlopen () from /lib/x86_64-linux-gnu/libdl.so.2
#15 0x00005555555550b5 in main () at lib-test.c:11

Subsection allocator might be useful

Sometimes, especially for instrumentation, it's useful to be able to carve out little bits of space within a binary's segments. This could be used to hold trampolines, or static data they refer to.

Some such space is available in inter-section padding, so is already visible to liballocs. However, in larger objects there is a lot more space in linker-inserted gaps between sections. These are not currently visible, but would be visible if we could process the link map. Much as the liballocs toolchain forwards relocations (-Wl,-q), it could also forward the link map (-Wl,-Map,filename) for postprocessing into the -meta.so, after which the linker artifact can be deleted.

Probably the foundations for this would be best going in librunt, but it would be tied together here.

Patched binutils should be unnecessary

For a long time, liballocs has required a patched objcopy to perform "symbol unbinding". This is necessary when a wrapped function (e.g., an allocator wrapper) occurs within a compilation unit that also calls the function, since in that case, ordinary --wrap is ineffective.

I seem to have found another way of achieving the same effect, using --defsym and -z muldefs. With a bit more testing, this should be just as good. I'll report progress here. One of the main challenges is ensuring the right semantics for self-reference within a wrapped function. Another is to avoid breaking debug info that wants to relocate against function internals.
