trailofbits / binrec-tob

BinRec: Dynamic Binary Lifting and Recompilation

License: Other

Shell 0.38% CMake 1.33% Python 27.56% HTML 1.89% C++ 57.32% Makefile 0.29% C 11.22% Brainfuck 0.01%

binrec-tob's Issues

Keep failing while cloning qemu/capstone

Hello, I'm having problems with "just install-binrec".
The steps keep failing while cloning qemu/capstone.
I tried cloning qemu manually before building the project, but the same error keeps occurring,
and the build finally stops with a Makefile error.

Is there any solution I can try?
Thank you.

coreutils: seq fails during symbolic execution

When tracing the coreutils seq benchmark with a symbolic argument, tracing fails with the following error message:

[FunctionLog] Saving Trace Info... 
[FunctionLog] Restoring tracing vars for state: 0
qemu-system-i386: /home/michaeldbrown/binrec-prerelease/s2e/source/s2e/libs2eplugins/src/s2e/Plugins/binrec_plugins/FunctionLog.cpp:235: void s2e::plugins::FunctionLog::slotStateSwitch(s2e::S2EExecutionState *, s2e::S2EExecutionState *): Assertion `m_tracesByState.find(newStateID) != m_tracesByState.end() && " Could not restore traceinfo state!"' failed.
INFO: [run] S2E terminated with code -6
INFO: [run] Terminating S2E
WARNING: [run] Sending SIGTERM to S2E process group
INFO: [s2e_env.server.stats] Terminating stats collection thread
INFO: [s2e_env.server] Waiting for unfinished threads
INFO: [s2e_env.server] Waiting for thread "RequestHandlingThread-0"
ERROR: [run] S2E terminated with error -6

Conflict between symbolic tracing code and export interval in BinRec plugins

Work trying to recover zip in BinRec revealed an issue between the export interval and symbolic tracing code. The OnSlotStateSwitch event is triggered when the export interval is reached, causing multiple LLVM IR files to be generated per trace when not using symbolic arguments. This condition does not result in a corresponding traceinfo file being created, which causes a failure during merging.

We should explore eliminating the export interval entirely, as binrec-uci appears to have increased this limit to avoid problems. @ameily confirmed that increasing the interval to 10K solves the problem.

Another option would be to always use the state number in the plugins (and later in merging), and this would allow a high export interval to remain.
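
The state-number option can be sketched as follows. This is a minimal illustration only: the file names `captured_<state>_<export>.bc` and `traceInfo_<state>.json` are hypothetical and are not taken from the BinRec plugins. The point is that once every exported file carries its state number, merging can pair bitcode with trace info per state and report a missing traceinfo file up front instead of failing later.

```python
import re
from collections import defaultdict

# Hypothetical naming scheme (an assumption, not BinRec's actual layout):
#   captured_<state>_<export>.bc  and  traceInfo_<state>.json
BITCODE_RE = re.compile(r"captured_(\d+)_(\d+)\.bc$")
TRACEINFO_RE = re.compile(r"traceInfo_(\d+)\.json$")


def group_by_state(filenames):
    """Group exported files by S2E state number."""
    groups = defaultdict(lambda: {"bitcode": [], "traceinfo": None})
    for name in filenames:
        if m := BITCODE_RE.search(name):
            groups[int(m.group(1))]["bitcode"].append(name)
        elif m := TRACEINFO_RE.search(name):
            groups[int(m.group(1))]["traceinfo"] = name
    return dict(groups)


def states_missing_traceinfo(groups):
    """Return state numbers that have bitcode but no trace info."""
    return sorted(
        state
        for state, group in groups.items()
        if group["bitcode"] and group["traceinfo"] is None
    )


files = ["captured_0_0.bc", "captured_0_1.bc", "traceInfo_0.json", "captured_1_0.bc"]
groups = group_by_state(files)
missing = states_missing_traceinfo(groups)  # states that would break merging
```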

Inconsistent fpregs address in binrec_lift and qemu code

While working on fixing trailofbits/binrec-prerelease#87, I found that the fpregs global variable is not at the same address when viewed from binrec_lift, custom-helpers.cpp, and qemu (op_helper.c/softfloat.c).

binrec_lift has a pass that makes each member of the env global variable a separate global variable, globalize_env.cpp. This is working for the fpstt variable, which contains the fpu stack top index. However, looking at disassembly, the fpregs address is not consistent for binrec_lift and qemu. This is verified by calling into a helper method such as helper_fldl_ST0, which should set fpregs[fpstt].d = value. The updated value is not reflected on the binrec_lift side.

For now, this may not be an issue as long as binrec_lift leverages qemu for every interaction with the emulated fpu stack. This may not be ideal in the future for performance reasons, since calling into qemu is much slower than actually running the corresponding single fpu instruction. Also, calling into qemu leads to redundant and unnecessary calls. For example, returning a float takes 3 calls into qemu, which could be replaced by a single fpu instruction:

// pseudo code of what happens when a function returns a double

double value; // the return value
helper_fldl_ST0(value);  // load the value into the emulated fpu stack
value = helper_fstl_ST0();  // get the double from the emulated fpu stack
helper_fpop(); // pop the value from the stack
return value;

/////

// all of this can be replaced with a single fpu instruction:
double value;
asm("fstpl %0" : "=m"(value));
return value;
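
To make the three-call round trip concrete, here is a toy Python model of an x87-style register stack. It is a sketch under stated assumptions, not QEMU's implementation: only the names `fpregs` and `fpstt` come from the issue, while `ToyFpu`, `fld`, `fst`, `fpop`, and `return_double` are illustrative stand-ins for the helpers in the pseudocode above.

```python
class ToyFpu:
    """Toy model of an x87-style register stack (not QEMU's actual code)."""

    def __init__(self):
        self.fpregs = [0.0] * 8  # eight stack slots
        self.fpstt = 0           # index of the current stack top (ST0)
        self.calls = 0           # number of helper calls made

    def fld(self, value):
        # Push: move the top down by one slot, then store the value there.
        self.calls += 1
        self.fpstt = (self.fpstt - 1) & 7
        self.fpregs[self.fpstt] = value

    def fst(self):
        # Read ST0 without popping.
        self.calls += 1
        return self.fpregs[self.fpstt]

    def fpop(self):
        # Pop: move the top back up by one slot.
        self.calls += 1
        self.fpstt = (self.fpstt + 1) & 7


def return_double(fpu, value):
    """Mirror the three-call sequence from the pseudocode above."""
    fpu.fld(value)
    result = fpu.fst()
    fpu.fpop()
    return result


fpu = ToyFpu()
result = return_double(fpu, 1.5)
```

In this model the value comes back unchanged and the stack top ends where it started, so the three helper calls amount to a push immediately undone by a pop. The model also hints at why an inconsistent `fpregs` address matters: if the lifted code and the helpers each used their own copy of `fpregs`, a store made through one would not be visible through the other.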

Analysis Timeout

I ran into an issue today where I accidentally ran head within the VM with no arguments, so the program froze trying to read from stdin. I think I waited a minute or two before killing the trace, which made me think that the S2E analysis isn't timing out like I thought it would.

We may want to look into this to see if S2E does in fact enforce a maximum analysis runtime. If it doesn't, we may consider adding it as a new configuration parameter or command line argument.
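
As a rough sketch of what a configurable maximum runtime could look like on the Python side, the snippet below wraps a child process with a timeout using the standard `subprocess.run` API. The function name `run_with_timeout` and the stand-in command are assumptions for illustration; this is not how BinRec currently launches S2E.

```python
import subprocess
import sys


def run_with_timeout(cmd, max_seconds):
    """Run cmd; return its exit code, or None if it exceeded max_seconds."""
    try:
        completed = subprocess.run(cmd, timeout=max_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising this exception.
        return None
    return completed.returncode


# Stand-in for a trace that blocks forever waiting on stdin.
hung = [sys.executable, "-c", "import time; time.sleep(30)"]
outcome = run_with_timeout(hung, max_seconds=0.5)
```

Note that `subprocess.run` only kills the direct child. The logs elsewhere in this tracker show S2E being terminated as a process group, so a real implementation would likely need to signal the whole group rather than a single process.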

`mkdir` test case fails at a high rate on LLVM-14 version of BinRec

We need to explore this and determine to what degree it is different from other transient issues we have encountered with S2E.

I can't install binrec, it seems there is a HTTP connection error when executing "pipenv lock --dev"

`test -f Pipfile.lock || pipenv lock --dev
Locking [dev-packages] dependencies…
ts/packages/urllib3/connectionpool.py", line 592, in urlopen
httplib_response = self._make_request(conn, method, url,
File "/usr/lib/python3/dist-packages/pipenv/vendor/requests/packages/urllib3/connectionpool.py", line 355, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
File "/usr/lib/python3/dist-packages/pipenv/vendor/requests/packages/urllib3/connectionpool.py", line 315, in _raise_timeout
raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
requests.packages.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=10)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/pipenv/resolver.py", line 82, in <module>
main()
File "/usr/lib/python3/dist-packages/pipenv/resolver.py", line 66, in main
results = resolve(
File "/usr/lib/python3/dist-packages/pipenv/resolver.py", line 56, in resolve
return pipenv.utils.resolve_deps(
File "/usr/lib/python3/dist-packages/pipenv/utils.py", line 469, in resolve_deps
r = requests.get(
File "/usr/lib/python3/dist-packages/pipenv/vendor/requests/sessions.py", line 488, in get
return self.request('GET', url, **kwargs)
File "/usr/lib/python3/dist-packages/pipenv/vendor/requests/sessions.py", line 475, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3/dist-packages/pipenv/vendor/requests/sessions.py", line 596, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python3/dist-packages/pipenv/vendor/requests/adapters.py", line 499, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=10)

error: Recipe _binrec-init failed on line 55 with exit code 1
`

Support tail calls to library functions

binrec does not support indirect function calls and jumps. Specifically, two samples are failing to lift: longjmp and siglongjmp, and it appears to be related to the insert_calls.cpp pass.

coreutils: cksum produces incomplete trace

It looks like recovered cksum is not complete and always segfaults because there are several conditional branches that are not hit during the initial capture. In the recovered bitcode, after a read operation has completed, via fread(), there are a bunch of nop instructions and then an instruction that will always segfault (esi will always be 0 because of the xor):

   # call to __fread_unlocked 
   0x0904958f <+991>:	call   0x9049150 <helper_stub_trampoline>
   # instructions, no branches
   0x090495c0 <+1040>:	xor    esi,esi
   0x090495c2 <+1042>:	nop
   # More nop's
   0x090495ce <+1054>:	nop
   0x090495cf <+1055>:	nop
=> 0x090495d0 <+1056>:	movzx  esi,BYTE PTR [esi]

Based on the assembly, I think this is referring to the read and sum loop in cksum (see cksum.c).

My hunch is that binrec is operating correctly, based on the captured bitcode, and the actual problem is that the trace is incomplete. I’ve tried running more traces on additional files without any luck. So, I don't think this problem is specific to cksum, and we will see more samples affected by this.

Add support for making bytes within file inputs symbolic

After PR trailofbits/binrec-prerelease#186 gets merged, we should look into making bytes within the files symbolic. This is documented by S2E here:

http://s2e.systems/docs/Tutorials/BasicLinuxSymbex/s2e.so.html

coreutils: who recovered binary segfault on realloc

I'm seeing that the who coreutils sample is lifting but the recovered binary is segfaulting because realloc is failing:

realloc(): invalid pointer
Aborted (core dumped)

Recovered binaries init_array is incorrect

In the Henrik/s2esubmodule branch, when recovering binaries having entries in the init_array, e.g. usage of std::cout, these entries are lost somewhere in the translation.

The function to be called seems to be present in recovered.ll but the address to it is not added to the init_array section.

This issue is the primary reason why most C++ samples fail, including simple samples.

Regression: `eq`, `args` fails during `lift-trace`

@ameily and I have both seen this issue with two binaries so far, args and eq. The details of the failure output are:

pipenv run python -m binrec.lift -vv "argsproj"
Loading .env environment variables…
10:41:32 DEBUG binrec.lift: extracting symbols from binary: s2e-out
10:41:32 DEBUG binrec.audit: subprocess.Popen: ('make', ['make', '-f', '/home/michaeldbrown/binrec-prerelease/scripts/s2eout_makefile', 'symbols'], '/home/michaeldbrown/binrec-prerelease/s2e/projects/argsproj/s2e-out', None)
make: 'symbols' is up to date.
10:41:32 DEBUG binrec.lift: cleaning captured bitcode: s2e-out
10:41:32 DEBUG binrec.lift: applying fixups to captured bitcode: s2e-out
10:41:32 DEBUG binrec.audit: subprocess.Popen: ('llvm-link-12', ['llvm-link-12', '-o', 'linked.bc', 'cleaned.bc', '/home/michaeldbrown/binrec-prerelease/runlib/custom-helpers.bc'], '/home/michaeldbrown/binrec-prerelease/s2e/projects/argsproj/s2e-out', None)
10:41:32 DEBUG binrec.lift: performing initially lifting of captured LLVM bitcode: s2e-out
[INFO] pruned 0 trivially dead references from successor lists
[INFO] pruned 0 trivially dead references from successor lists
[INFO] pruned 0 trivially dead references from successor lists
[INFO] pruned 0 trivially dead references from successor lists
[INFO] pruned 0 trivially dead references from successor lists
Traceback (most recent call last):
  File "/home/michaeldbrown/binrec-prerelease/binrec/lift.py", line 203, in _lift_bitcode
    binrec_lift.lift(
RuntimeError: block BB_8049160 stores PC 2148087152 but does not have BB_80093570 in its successor list. Did you remember to disable multithreading in qemu (-smp 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/michaeldbrown/binrec-prerelease/binrec/lift.py", line 411, in <module>
    main()
  File "/home/michaeldbrown/binrec-prerelease/binrec/lift.py", line 406, in main
    lift_trace(args.project_name)
  File "/home/michaeldbrown/binrec-prerelease/binrec/lift.py", line 360, in lift_trace
    _lift_bitcode(merged_trace_dir)
  File "/home/michaeldbrown/binrec-prerelease/binrec/lift.py", line 210, in _lift_bitcode
    raise BinRecError(
binrec.errors.BinRecError: failed to perform initial lifting of LLVM bitcode: s2e-out: block BB_8049160 stores PC 2148087152 but does not have BB_80093570 in its successor list. Did you remember to disable multithreading in qemu (-smp 1)
error: Recipe `lift-trace` failed on line 251 with exit code 1
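
When reading this error, note that the PC is printed in decimal while block names use hexadecimal. A quick check in Python confirms that both refer to the same address; the helper name `block_address` is made up for this example.

```python
pc_from_error = 2148087152      # decimal PC from the RuntimeError
block_name = "BB_80093570"      # successor named in the same message
source_block = "BB_8049160"     # block that stored the PC


def block_address(name):
    """Parse the hexadecimal address out of a BB_<hex> block name."""
    return int(name.split("_", 1)[1], 16)


same_address = block_address(block_name) == pc_from_error
pc_hex = hex(pc_from_error)
```

So the stored PC is `0x80093570`, which is far away from the source block at `0x8049160`.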

Latest version of S2E throws errors when tracing our integration tests

Can be reproduced by installing BinRec without the freeze recipe enabled.

obstack support

Multiple coreutils samples use the obstack functions for memory management. obstack accepts two function pointers for the allocator and free functions within obstack_init (a macro that just calls _obstack_begin). Typically malloc and free are passed in as these functions.

The problem is that external callbacks are not working within binrec, so the "callbacks" (malloc and free) are not lifted and the recovered binary segfaults in the call to _obstack_begin. Example of a recovered binary:

   0x090499f7 <+615>:	push   0x8048cd0 ; original free (not lifted)
   0x090499fc <+620>:	push   0x8048e40 ; original malloc (not lifted)
   0x09049a01 <+625>:	push   0x0
   0x09049a03 <+627>:	push   0x0
   0x09049a05 <+629>:	push   0x80521e0 ; struct obstack*
   0x09049a0a <+634>:	call   0x9049060 <_obstack_begin@plt>

We could add support for this specific use case of obstack within a new binrec lift pass:

Identify calls to _obstack_begin
Replace the arguments for the callbacks with addresses of free and malloc.

This pass would need to be run after library functions have been identified and externalized so that they can be resolved to an address (most likely via ptrtoint).
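
The two steps of the proposed pass can be sketched on a toy representation. This is not LLVM code: `Call`, `fix_obstack_callbacks`, and the resolved addresses are hypothetical, and the sketch assumes the allocator and free callbacks are the last two arguments, which matches the push order in the disassembly above.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    """Toy stand-in for a call instruction: callee name plus arguments."""
    callee: str
    args: list = field(default_factory=list)


def fix_obstack_callbacks(calls, resolved):
    """Rewrite the allocator/free arguments of _obstack_begin calls.

    `resolved` maps library function names to their resolved addresses.
    Assumes the last two arguments are (allocator, free).
    """
    patched = 0
    for call in calls:
        if call.callee != "_obstack_begin" or len(call.args) < 2:
            continue
        call.args[-2] = resolved["malloc"]
        call.args[-1] = resolved["free"]
        patched += 1
    return patched


calls = [
    # Arguments in call order: obstack*, 0, 0, original malloc, original free
    Call("_obstack_begin", [0x80521E0, 0, 0, 0x8048E40, 0x8048CD0]),
    Call("puts", [0x804A000]),
]
resolved = {"malloc": 0xF7E00010, "free": 0xF7E00020}  # made-up addresses
patched = fix_obstack_callbacks(calls, resolved)
```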

I've confirmed that three coreutils samples use obstack:

ls
tac
dircolors

Make use of S2E parallel execution mode

Instead of running the cmd-debian-mt.sh script, can we use S2E parallel execution mode directly?

Regression: Unable to locate main function during lift

Several samples (eq, args, atoftest, floattest) occasionally fail to lift properly because of an error where the entry point could not be located:

Failed to locate main-function

This appears to occur at random, and re-running the lift will eventually work. This is similar to #26, although most likely a different underlying issue. The heuristics to identify main() changed significantly between the previous version of s2e and the updated version of s2e, which most likely introduced this regression.

coreutils: date multiple output formatting issues

The coreutils date sample lifts successfully but the recovered binary has multiple issues with the outputted data and format. I've verified that the date sample is running correctly within the analysis VM and outputs data correctly.

Issues I'm seeing include:

The default date format is incorrect and does not show the current time
```
# original (correct)
Mon 28 Mar 2022 01:18:06 PM EDT
# recovered (incorrect)
Mon 18 Mar 2022 18 EDT
```
- The day of the month (second field) is always the minute value (%m vs %M)
- The recovered binary doesn't show the current time

Specifying a date/time to format does not include 0 padding (-Iseconds)

# original (correct)
2022-03-28T13:22:08-0400
# recovered (incorrect)
2022-3-28T13:22:8-0400

Attempting to lift multiple runs with different arguments results in hitting the indirect function call limitation and the lift fails (#119). I tried to exercise date to trigger the two above behaviors by merging 3 separate runs:
```
date -d 2020-01-01T01:01:01
date -d 2020-01-01T01:01:01 -Iseconds
date
```

This looks to me like either a limitation of how binrec captures bitcode (needs enough inputs to fully exercise each necessary branch) or an inconsistent behavior specific to date.
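
For the `-Iseconds` example above, the recovered output is exactly what the same fields look like when width and zero padding are dropped. The snippet below only reproduces the symptom in Python; it makes no claim about where the padding is lost in the recovered binary.

```python
from datetime import datetime

moment = datetime(2022, 3, 28, 13, 22, 8)
fields = (
    moment.year, moment.month, moment.day,
    moment.hour, moment.minute, moment.second,
)

# Zero-padded fields, as the original binary prints them.
padded = "{:04d}-{:02d}-{:02d}T{:02d}:{:02d}:{:02d}-0400".format(*fields)

# The same fields with no width or padding applied.
unpadded = "{}-{}-{}T{}:{}:{}-0400".format(*fields)
```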

`env` and `printenv` tests fail on Debian 11.3 version of BinRec

For the env and printenv benchmarks, the recovered binary produces different output than the original. I haven't looked into these much.

coreutils: sleep recovered segfaults on XMM instruction

The coreutils sleep sample lifts but the recovered binary segfaults after the sleep operation has been performed. It looks like sleep uses a double floating point for the sleep argument (see sleep.c). All of the qemu/binrec code I've seen uses the legacy FPU registers / instructions, and this is the first time I've seen an XMM register being used, which I'm guessing is unsupported.

Program received signal SIGSEGV, Segmentation fault.
0xf7e6079a in ?? () from /lib/i386-linux-gnu/libc.so.6
2: x/i $pc
=> 0xf7e6079a:	movdqu xmm0,XMMWORD PTR [edi]
(gdb) bt
#0  0xf7e6079a in ?? () from /lib/i386-linux-gnu/libc.so.6
#1  0x0904c011 in Func_main ()
#2  0x0904bf7b in Func_dtotimespec ()
#3  0x0904c47d in Func_main ()
#4  0x0904bdfb in main ()

The segfault occurs right before the call to exit:

   # the call to dtotimespec is segfaulting
   0x0904c478 <+1272>:	call   0x904be00 <Func_dtotimespec>
   # remainder of the main function
   0x904c47d <Func_main+1277>:	add    esp,0x10
   0x904c480 <Func_main+1280>:	mov    DWORD PTR [eax],0x0
   0x904c486 <Func_main+1286>:	mov    DWORD PTR [eax-0x4],0x804907b
   0x904c48d <Func_main+1293>:	sub    esp,0xc
   0x904c490 <Func_main+1296>:	push   0x0
   0x904c492 <Func_main+1298>:	call   0x9049070 <exit@plt>

coreutils: cannot locate main function

I'm seeing that the cp coreutils sample, and potentially others, fail to locate the main function. This heuristic was changed when initial support for coreutils samples was added, trailofbits/binrec-prerelease#127.

failed to perform initial lifting of LLVM bitcode: s2e-out: Expected a single successor for potential main block, BB_804A030: got 43 successors

I think this is different than the initial regression found that occurs sometimes against binrec samples, #33.

coreutils: recovered stty segfault on vasprintf (potential limitation of variadic arguments)

The stty coreutils sample is lifting but the recovered binary is segfaulting on a call to vasprintf:

(gdb) bt
#0  0xf7e4aa0b in strchrnul () from /lib/i386-linux-gnu/libc.so.6
#1  0xf7e222b1 in ?? () from /lib/i386-linux-gnu/libc.so.6
#2  0xf7e36369 in ?? () from /lib/i386-linux-gnu/libc.so.6
#3  0xf7ed3c45 in __snprintf_chk () from /lib/i386-linux-gnu/libc.so.6
#4  0x0904976b in Func_wrapf ()
#5  0x09049d94 in Func_main ()
#6  0xf7ddbee5 in __libc_start_main () from /lib/i386-linux-gnu/libc.so.6
#7  0x09049fd6 in _start ()
(gdb) f 0
#0  0xf7e4aa0b in strchrnul () from /lib/i386-linux-gnu/libc.so.6
(gdb) ds
=> 0xf7e4aa0b <strchrnul+27>:	mov    cl,BYTE PTR [eax]

The source code for stty.c shows that the wrapf function accepts variadic arguments and then calls vasprintf. I'm not sure at this point if we have tested binrec against a lifted function that accepts variadic arguments, so this may be a limitation of binrec.

Meta: Code maintainability improvements

There is much room for improvement to the code quality of the C++ components in BinRec and the plugins. Stopping short of a complete refactor / rewrite, there is a lot we can do to make the code more readable and decipherable for new engineers coming on to the project.

This is a catchall issue we commit / merge PRs against to capture the various little improvements we make along the way. Ideally, when exploring parts of the code base as parts of other issues, we can commit /merge localized cleanup and maintainability changes towards this issue to keep our other PRs simple to review.

Commits and PRs against this issue should have little to no impact on program behavior to allow for rapid review and merging. This includes changes like adding comments, cleaning up comments, single line code clarity changes that do not change program behavior, code formatting, etc.

Actionable tasks can be added as comments for cleanup.

coreutils: uniq recovered binary segfaults

The uniq coreutils sample lifts but the recovered binary segfaults. I'm not seeing anything that stands out in the recovered binary as to why the sample is segfaulting, although it may be related to #19.

Program received signal SIGSEGV, Segmentation fault.
0x090498ee in Func_main ()
2: x/i $pc
=> 0x90498ee <Func_main+1790>:	mov    DWORD PTR [eax+0x8],ecx

Verify or Add Support for long double Types

Issue trailofbits/binrec-prerelease#87 added support for 32-bit (float) and 64-bit (double) types. C has an 80-bit floating point type, long double, that is analogous to the 80-bit FPU registers. We should either verify that binrec_lift supports the long double type or add support for it, which will most likely be very similar to how float/double support was added in trailofbits/binrec-prerelease#93.

coreutils: base64 reads beyond size of file

The base64 coreutils sample routinely reads past the end of a file, which results in a segfault. Looking at the recovered bitcode, I'm not seeing a reference to feof, which the base64 sample calls (see base64.c). The original binary doesn't import feof either, so this may not be related.

In testing, I found that the segfault occurs when trying to write beyond the output buffer. I am seeing that base64 is filling the buffer with A characters, which is what you'd expect trying to encode the null byte repeatedly.
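
The link between null bytes and `A` characters is easy to confirm with Python's standard `base64` module: a zero 6-bit group maps to the first character of the base64 alphabet, which is `A`. The input bytes below are arbitrary and are not taken from the test file.

```python
import base64

# Three null bytes encode to four 'A' characters.
encoded_nulls = base64.b64encode(b"\x00" * 3)

# Real data followed by null bytes: the tail of the output is all 'A's.
encoded_tail = base64.b64encode(b"hi\n" + b"\x00" * 6)
```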

In this example, the file being read is encoded as a base64 array with the size 0xea3. The next character (0xea4) is the start of encoding null bytes. Here, in the recovered binary:

$edx is the address of the output array
$esi is the current output index into the array.
$al is the base64 encoded character for the current input byte

Program received signal SIGSEGV, Segmentation fault.
=> 0x904952f <Func_main+943>:	mov    BYTE PTR [edx+esi*1],al
(gdb) p/x $esi
$1 = 0x2d014

(gdb) x/32c $edx+0xea0
0xa042e8c <stack+16739868>:	97 'a'	87 'W'	52 '4'	75 'K'	65 'A'

I've verified that the last 10+ bytes of the array are correct and match the output of the original base64 program. So, it appears that base64 is reading beyond the end of the file, which returns arrays of 0.

I haven't seen this behavior on similar samples, such as cat, so I'm not sure if this is isolated to base64 or more widespread.

Inconsistent Trace Info on Some Samples with Concrete Inputs

With PR trailofbits/binrec-prerelease#169, I'm seeing that several binrec samples sometimes lift to a recovered binary that behaves inconsistently and incorrectly. These samples will lift but will fail the verification step in the integration tests. I've seen it happen on multiple samples:

breakFallback
eq
consecutive_calls

For eq, I'm seeing that the broken recovered binary has differences in traceInfo.json:

comparing s2e/projects/eq-working/s2e-out/traceInfo.json (-) against
          s2e/projects/eq-broken/s2e-out/traceInfo.json (+)

found 5 differences:
- functionLog.entryToReturn ('0x8049196', '0x80491f3')
- functionLog.entryToTbs ('0x8049070', ('0x8049030', '0x8049050', '0x8049070'))
- functionLog.entryToTbs ('0x8049196', ('0x8049196', '0x80491b2', '0x80491d4', '0x80491e1', '0x80491e6', '0x80491f3', '0x80491f6'))
+ functionLog.entryToTbs ('0x8049070', ('0x8049030', '0x8049050', '0x8049070', '0x8049196', '0x80491b2', '0x80491e6', '0x80491f3', '0x8049210', '0x804921a', '0x8049231', '0x8049244', '0x8049263', '0x804926d'))
+ functionLog.entryToTbs ('0x8049196', ('0x8049196', '0x80491b2', '0x80491d4', '0x80491e1', '0x80491f6'))
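
A comparison like the one above can be sketched as a set difference over flattened entries. The JSON layout assumed here (a section such as `functionLog` containing tables whose entries are lists) is a guess based on the diff output, and the function names are illustrative rather than taken from the BinRec code base.

```python
def to_hashable(value):
    """Convert nested lists to tuples so entries can be stored in a set."""
    if isinstance(value, list):
        return tuple(to_hashable(v) for v in value)
    return value


def flatten(trace_info):
    """Turn nested trace info into a set of (path, entry) pairs."""
    items = set()
    for section, tables in trace_info.items():
        for table, entries in tables.items():
            for entry in entries:
                items.add((f"{section}.{table}", to_hashable(entry)))
    return items


def diff_trace_info(working, broken):
    """Return (only_in_working, only_in_broken), each in a stable order."""
    a, b = flatten(working), flatten(broken)
    return sorted(a - b, key=repr), sorted(b - a, key=repr)


# Small made-up inputs in the assumed layout.
working = {"functionLog": {
    "entryToReturn": [["0x8049196", "0x80491f3"]],
    "entryToTbs": [["0x8049196", ["0x8049196", "0x80491b2"]]],
}}
broken = {"functionLog": {
    "entryToReturn": [],
    "entryToTbs": [["0x8049196", ["0x8049196"]]],
}}
removed, added = diff_trace_info(working, broken)
```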

Here are zip files of the s2e-out directory for both a broken and working recovered binary:

Cannot lift recompiled example binaries

I'm trying to use binrec. I was able to compile and run the examples correctly. However, when I try to recompile the example binaries from sources (or just use my own hello world binaries), binrec always crashes.

To compile my binaries, I connect to the S2E VM image (s2e/images/debian-9.2.1-i386/image.raw.s2e), compile the binaries there and then I let the binrec analyze those exact binaries. (Ofc, I'm using a copy of the image not to corrupt it). AFAIK, there should be no problem with missing dependencies or anything since I'm using the exact same VM as binrec.

Error

15:54:35 INFO binrec.merge: linking prepared bitcode: /tmp/binrec-tob/s2e/projects/myhello/s2e-out-2/captured-link-ready.bc
pipenv run python -m binrec.lift  "myhello" 
Loading .env environment variables…
15:54:36 INFO binrec.lift: lifting project myhello
Traceback (most recent call last):
  File "/tmp/binrec-tob/binrec/lift.py", line 380, in _lift_bitcode
    binrec_lift.lift(
_binrec_lift.LiftError: [recover_functions] Failed to located main via entrypoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/tmp/binrec-tob/binrec/lift.py", line 663, in <module>
    main()
  File "/tmp/binrec-tob/binrec/lift.py", line 658, in main
    lift_trace(args.project_name, opt_level, args.harden)
  File "/tmp/binrec-tob/binrec/lift.py", line 596, in lift_trace
    _lift_bitcode(merged_trace_dir)
  File "/tmp/binrec-tob/binrec/lift.py", line 387, in _lift_bitcode
    raise convert_lib_error(
binrec.errors.BinRecLiftingError: failed to perform initial lifting of LLVM bitcode: myhello: [recover_functions] Failed to located main via entrypoint
error: Recipe `lift-trace` failed on line 239 with exit code 1
error: Recipe `recover` failed on line 244 with exit code 1

Questions

Do you know what could be the problem and how to fix it?
What compiler+flags and environment were used to produce the test binaries?
Do you potentially have time to fix it, or do you no longer work on binrec? (I don't see any recent activity here.)
Orthogonal question: You mention BinRec has very limited support for C++ programs. What exactly does that mean? What is not implemented?

Potential issue - bug?

The code related to the error contains the following comment:

    // NOTE (hbrodin):
    // Previous implementation/versions of BinRec and S2E didn't instrument starting at the
    // entrypoint in the updated S2E we do. Because of this the previous method for identifying
    // "main" doesn't any more. Previously they could rely on multiple "entry-points", first being
    // __libc_csu_init and then, the second time, main. Our updated approach looks like this: First
    // block: _start, it has one call to get the eip, on return from that we enter a block with a
    // call to __libc_start_main, wich later will call main the address of main is passed in
    // register eax. Unfortunately, we don't have any symbols at this point. The ieda is instead to
    // walk the successors to find the third block (having the call to __libc_start_main)
    // and locate the last store to eax, this "should" be the address of main.
    uint32_t locate_main_addr(Function *entrypoint, Module &m)
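
The heuristic described in that comment can be illustrated on a toy control-flow graph. This is a sketch in Python under stated assumptions, not the real implementation: the actual function works on LLVM `Function` and `Module` objects, while `Block` and its fields are hypothetical and the Python `locate_main_addr` only borrows the name.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Toy basic block: stores are (register, value) pairs in program order."""
    name: str
    stores: list = field(default_factory=list)
    successors: list = field(default_factory=list)


def locate_main_addr(entry, depth=3):
    """Follow single successors from `entry` to the block at position `depth`
    (1-based), then return the last value stored to eax in that block."""
    block = entry
    for _ in range(depth - 1):
        if len(block.successors) != 1:
            raise ValueError(
                f"expected a single successor for {block.name}, "
                f"got {len(block.successors)}"
            )
        block = block.successors[0]
    eax_stores = [value for reg, value in block.stores if reg == "eax"]
    if not eax_stores:
        raise ValueError(f"no store to eax in {block.name}")
    return eax_stores[-1]


# _start -> block after the get-eip call -> block calling __libc_start_main
call_libc = Block(
    "call_libc_start_main",
    stores=[("eax", 0x1), ("ebx", 0x2), ("eax", 0x80491F6)],
)
after_get_eip = Block("after_get_eip", successors=[call_libc])
start = Block("_start", successors=[after_get_eip])
main_addr = locate_main_addr(start)
```

The sketch shows how fragile the approach is: any startup code whose first blocks have a different shape (an extra block, several successors, or no store to eax in the third block) makes the lookup fail, which is one possible reason a differently compiled binary would not lift.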

Potential issue - Binary difference?

I noticed that there is still some difference between the binaries, e.g., when I list the dynamic symbols with objdump -T. Could that be the problem (the different GLIBC)?

; my binary
DYNAMIC SYMBOL TABLE:
00000000  w   D  *UND*  00000000              _ITM_deregisterTMCloneTable
00000000  w   DF *UND*  00000000  GLIBC_2.1.3 __cxa_finalize
00000000      DF *UND*  00000000  GLIBC_2.0   puts
00000000  w   D  *UND*  00000000              __gmon_start__
00000000      DF *UND*  00000000  GLIBC_2.0   __libc_start_main
00000000  w   D  *UND*  00000000              _Jv_RegisterClasses
00000000  w   D  *UND*  00000000              _ITM_registerTMCloneTable
000005ed g    DF .text  0000003c  Base        main
000006ac g    DO .rodata        00000004  Base        _IO_stdin_used

; test binary
DYNAMIC SYMBOL TABLE:
00000000      DF *UND*  00000000  GLIBC_2.0   puts
00000000  w   D  *UND*  00000000              __gmon_start__
00000000      DF *UND*  00000000  GLIBC_2.0   __libc_start_main
0804a004 g    DO .rodata        00000004  Base        _IO_stdin_used

Thanks a lot for your help!

coreutils: stat sample writes corrupt output and segfaults

The coreutils stat sample lifts properly but the recovered binary writes corrupt output and then segfaults:

  File: ‘/etc/passwd’
  Size: 2811      	Blocks: 8          IO Block: 4096   regular file
Device: 805h/2053d	Inode: 524409      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-6-14 8:14:27.391947486 -0400
Modify: 2021-10-20 15:4:52.414639624 -0400
Change: 2021-10-20 15:4:52.422639624 -0400
 Birth: -
###### Begin Corrupt Output ######
8859-2��;
                nisles4_minimal [NOTFOUND=return] dnshis file. try:e: 524409      Links: 1     Device type: 2811,root
###### ... corrupt output continues ... ######
Segmentation fault (core dumped)

Interestingly, this problem does not occur when stdout is redirected to a file and stat behaves normally.

It appears that the initial output is correct, then a bunch of corrupt data is written to stdout before segfaulting.

Initial Support for Callbacks

Add some initial support for lifting callback functions, which will entail:

Lift the actual callback function and make sure it is not removed by DCE

Update function calls that register the callback, for example:

// original code
atexit(&my_callback);
// lifted code
atexit(&lifted_my_callback);

This initial support will target atexit with the goal of having it extensible for other callbacks.

coreutils: shuf recovered buffer overflow

The coreutils shuf sample lifts but the recovered binary segfaults due to a buffer overflow being detected. I've run the recovered shuf on small files (a couple lines) and /etc/passwd with 47 lines, both produce the same crash:

./recovered /etc/passwd
*** buffer overflow detected ***: terminated
Aborted (core dumped)

I believe this check is performed by libc and not part of the recovered binary. I verified that the shuf sample is executing properly within the analysis VM.