
Comments (5)

stephenrkell commented on August 19, 2024

Of course, generating trampolines at run time compromises debuggability, or more generally meta-completeness, unless we can generate debug info / metadata for our trampolines. See #16. Ideally we would create the trampolines in a libdlbind-allocated object, and write into it some DWARF info for them. Generating this DWARF may prove a bit tedious, but we could probably hard-code a template and generate from that. Or arguably, our trampolines are just like PLT entries and don't really need their own DWARF.

from liballocs.

stephenrkell commented on August 19, 2024

Perhaps an appealing way to do the hooking would be by return-address hooking. But then we need to worry about breaking stack walkers, both from debugging and from our own code. In our own code we can cope with it (make introspective clients see the "illusion" of the real stack). But it would be annoying to break the debugger. I wonder if DynInst has a solution to this.
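The "illusion" for our own introspective clients could be backed by a side table mapping frame to displaced return address, which a cooperating walker consults. A minimal sketch, with all names invented here:

```c
#include <assert.h>
#include <stddef.h>

/* Side table: frame base -> the return address our hook displaced. */
struct displaced { const void *frame_base; const void *real_retaddr; };

#define MAX_DISPLACED 64
static struct displaced displaced_tbl[MAX_DISPLACED];
static size_t ndisplaced;

/* Record, at hook time, where the displaced return address really went. */
void remember_displaced(const void *frame, const void *real_ret)
{
    assert(ndisplaced < MAX_DISPLACED);
    displaced_tbl[ndisplaced].frame_base = frame;
    displaced_tbl[ndisplaced].real_retaddr = real_ret;
    ndisplaced++;
}

/* A cooperating walker calls this on each return address it decodes:
   if it sees our hook's address, it substitutes the saved real one,
   so introspective clients see the unhooked stack. */
const void *walker_fixup(const void *frame, const void *seen_ret,
                         const void *hook_addr)
{
    if (seen_ret != hook_addr) return seen_ret;
    for (size_t i = 0; i < ndisplaced; i++)
        if (displaced_tbl[i].frame_base == frame)
            return displaced_tbl[i].real_retaddr;
    return seen_ret; /* unknown frame: leave it alone */
}
```

This helps only walkers we control; an unmodified debugger still sees the hooked addresses, which is the unresolved part of the problem.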


stephenrkell commented on August 19, 2024

I am leaning towards thinking that we should get rid of caller-side instrumentation altogether. That might work as follows.

  • only do callee-side instrumentation

  • insert type info by walking up the stack and finding a call to a classified alloc func, i.e. classified according to both source- (dumpallocs) and binary-level (objdumpallocs) analyses. That supposes we are still requiring users to identify the allocation functions; getting away without that is more for #20 although some kind of bootstrapping based on instrumented built-in allocators (brk, mmap, stack) might just about work.

  • In general we want the outermost hit, to catch wrapper cases. Or do we? In a wrapper, we expect the in-wrapper site not to be classified. And if there might be allocations during allocation, the innermost classified call seems to be the right one. E.g. SPEC CPU2006's sphinx3 has

```c
void **__ckd_calloc_2d__ (int32 d1, int32 d2, int32 elemsize,
                          const char *caller_file, int32 caller_line)
{
    char **ref, *mem;
    int32 i, offset;

    mem = (char *) __ckd_calloc__(d1*d2, elemsize, caller_file, caller_line);
    ref = (char **) __ckd_malloc__(d1 * sizeof(void *), caller_file, caller_line);

    for (i = 0, offset = 0; i < d1; i++, offset += d2*elemsize)
        ref[i] = mem + offset;

    return ((void **) ref);
}
```

But if we do this, caller-side instrumentation may need to come back for performance reasons. It basically short-circuits the stack walk: by maintaining __current_allocsite we don't need to walk the stack at all, and can zoom straight to the data we want. Maybe some clever caching can get us something almost as good.

Or maybe we can heuristically insert the caller-side instrumentation, at the binary level, after the first time we discover an allocation function by stack walking. We could patch the PLT, or perhaps the call instruction itself (for not-via-PLT cases). Indirect calls would still need to go through the PLT; indirect and not-via-PLT is a problem, and might have to fall back to always walking the stack. Or we could instead do our binary instrumentation at the called entry point. This is less clean because we will CoW the text pages every time.
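The short-circuit can be sketched as follows. This is a hedged illustration only (GNU C; the hook names are invented, and a unique static per macro expansion stands in for the real call-site address):

```c
#include <stdlib.h>

/* Caller-side wrapper publishes its call site here, so the callee-side
   hook reads it instead of walking the stack. */
static __thread const void *__current_allocsite;

const void *last_seen_site; /* what the callee-side hook observed */

void *hooked_malloc(size_t sz)
{
    last_seen_site = __current_allocsite; /* no stack walk needed */
    return malloc(sz);
}

/* Caller-side instrumentation: stash a per-site token, call, clear. */
#define SITE_MALLOC(sz) ({                      \
    static const char __site_id;                \
    __current_allocsite = &__site_id;           \
    void *__p = hooked_malloc(sz);              \
    __current_allocsite = NULL;                 \
    __p; })
```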


stephenrkell commented on August 19, 2024

Perhaps we should first rename __current_allocsite to __outermost_allocsite and use it to terminate the stack walk...

... then always choose the innermost (first) classified call site that we find on the walk?

This means we are sticking with caller-side instrumentation, which we had wanted to eliminate. I guess the first step is to do the above (fixes sphinx3) and then measure the performance drop of skipping the __outermost_allocsite early termination.
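The proposed walk policy, over return addresses innermost-first, might look like this (a sketch; the names and the classification predicate are hypothetical, standing in for the dumpallocs/objdumpallocs tables):

```c
#include <stdbool.h>
#include <stddef.h>

/* pcs[] holds return addresses innermost-first, as a stack walk yields
   them. Take the first (innermost) classified site; stop early once we
   reach __outermost_allocsite. */
const void *pick_allocsite(const void *const *pcs, size_t n,
                           const void *outermost_allocsite,
                           bool (*is_classified)(const void *))
{
    for (size_t i = 0; i < n; i++) {
        if (is_classified(pcs[i])) return pcs[i];
        if (pcs[i] == outermost_allocsite) break; /* terminate the walk */
    }
    return NULL;
}

/* Demo predicate: pretend these two addresses are classified sites. */
bool demo_classified(const void *pc)
{
    return pc == (const void *)0x2000 || pc == (const void *)0x3000;
}
```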


stephenrkell commented on August 19, 2024

... then, if the performance hit is a problem, replace caller-side wrapping with binary instrumentation of such callers, performed lazily (on the first stack walk that traverses a frame of the named allocation function). Otherwise just scrap it.

That's still assuming we know the names of allocation functions, or at least of 'functions receiving a value with sizeofness'. If we do a more extensive sizeofness analysis, maybe at link time, we can perhaps eliminate that. It's a pretty standard call graph analysis, but has the usual problem of imprecision around indirect calls (for which we'd probably want to assume 'calls any type-compatible callee').

For the call graph analysis, we'd want a local summary of sizeofness sourced, sizeofness sunk, but also 'passed through', i.e. for any integer argument (or integer value read!), supposing it has sizeofness, which callee arg (or value written!) would receive that sizeofness?
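One possible shape for that local summary, sketched with argument bitmasks (all names hypothetical; a real analysis would also need to cover values read and written, not just arguments):

```c
/* Per-function summary: which args source or sink sizeofness locally,
   and which callee arg each arg is passed through to. */
#define MAXARGS 8
struct sizeofness_summary {
    unsigned sources;       /* bit i: arg i acquires sizeofness locally */
    unsigned sinks;         /* bit i: arg i's sizeofness is consumed here */
    int passthru[MAXARGS];  /* arg i flows to callee arg passthru[i], or -1 */
};

/* For one call edge: given the mask of caller args known to carry
   sizeofness, compute the mask of callee args that receive it. */
unsigned propagate_edge(unsigned caller_mask,
                        const struct sizeofness_summary *caller)
{
    unsigned have = caller_mask | caller->sources;
    unsigned callee_mask = 0;
    for (int i = 0; i < MAXARGS; i++)
        if ((have & (1u << i)) && caller->passthru[i] >= 0)
            callee_mask |= 1u << (unsigned)caller->passthru[i];
    return callee_mask;
}
```

A fixed-point iteration of propagate_edge over the call graph would then flow sizeofness from sources to sinks, with indirect calls conservatively fanning out to all type-compatible callees as suggested above.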

A tricky case would be a cross-DSO call into a malloc wrapper. Though at link time we would detect that a given UND symbol receives some sizeofness. Maybe that is enough. It seems to need a further run-time step, accounting for the link map....

