
build-bom's Introduction

I enjoy building secure and correct systems. I have built a number of static analysis and automated verification tools, with a recent focus on analysis of binaries.

Repositories

Data Structures

Data structures that may be generally useful

  • haggle [39 ⭐ 📖]: An efficient graph library for Haskell
  • persistent-vector [27 ⭐ 📖]: Persistent vectors for Haskell based on array mapped tries
  • robbed [4 ⭐]: A pure Haskell implementation of Reduced Ordered Binary Decision Diagrams (BDDs)

Program Analysis

  • ql-grep [3 ⭐]: A code search tool that implements CodeQL on the tree-sitter infrastructure
  • build-bom [45 ⭐]: Dynamically discover the commands used to create a piece of software
  • whole-program-llvm [682 ⭐]: A wrapper script to build whole-program LLVM bitcode files; note that I consider this to be obsoleted by build-bom, which takes a more robust approach to the same problem
  • itanium-abi [12 ⭐ 📖]: An implementation of C++ name mangling for the Itanium ABI
  • what4-serialize [0 ⭐]: Serialization/deserialization for What4 expressions

Binary Analysis

  • crepitans [2 ⭐]: A tool for scriptable exploration of binaries
  • dismantle [25 ⭐]: A library of assemblers and disassemblers derived from LLVM TableGen data
  • portable-executable [2 ⭐]: Tools for working with the Windows Portable Executable (PE) file format
  • semmc [35 ⭐]: Stratified synthesis for learning machine code instruction semantics
  • macaw [201 ⭐]: Open source binary analysis tools.
  • macaw-loader [5 ⭐]: Uniform interface to load a binary executable and get Macaw Memory and a list of entry points.
  • renovate [47 ⭐]: A library for binary analysis and rewriting
  • language-sleigh [5 ⭐]: A parser for the Sleigh language, which is used to represent ISA semantics in Ghidra
  • mctrace [5 ⭐]: An implementation of DTrace for machine code

Debugging Tools

  • ddmin [3 ⭐]: An implementation of delta debugging (ddmin) in Haskell
  • surveyor [18 ⭐]: A symbolic debugger for C/C++ (via LLVM), machine code, and JVM programs
  • binary-walkr [2 ⭐]: A tool for examining ELF binaries

Solvers

Note that these are interesting and informative, but definitely not efficient enough to use in production

  • satisfaction [2 ⭐]: A DPLL SAT solver written in Haskell
  • datalog [102 ⭐]: A pure Haskell implementation of Datalog
  • ifscs [4 ⭐ 📖]: An inductive form set constraint solver in Haskell
  • satir [1 ⭐]: An implementation of a SAT solver in Rust

Emacs Packages

Others

  • taffybar [691 ⭐ 📖]: A GTK-based status bar for tiling window managers such as XMonad; now maintained by Ivan Malison
  • travitch [1 ⭐]: The code for my GitHub profile page, which generates this page
  • blog [3 ⭐]: The code for my blog (ravit.ch)
  • dotfiles [0 ⭐]: A collection of dotfiles managed by Chezmoi

build-bom's People

Contributors

kquick, langston-barrett, travitch


build-bom's Issues

Bitcode extraction from ar files

An archive file (e.g. static library) requires special handling. Currently build-bom can be used to extract from an archive file (via objcopy) but the result is only the bitcode for the last member of the archive. It's not clear if objcopy is only extracting this last member, or if it's extracting each member in turn and overwriting the output file each time. Regardless, build-bom should be updated to specifically support extraction from archive files.
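Below is a minimal Rust sketch of the first half of archive support: it enumerates the members of an `ar` archive so that each one can be handled on its own, rather than handing the whole archive to a single objcopy invocation. It assumes the common System V/GNU layout. `ar_members` and `ArMember` are hypothetical names, not part of build-bom.

```rust
/// One member of a Unix `ar` archive (e.g. an object file inside a static library).
#[derive(Debug, PartialEq)]
pub struct ArMember {
    pub name: String,
    pub data: Vec<u8>,
}

const AR_MAGIC: &[u8] = b"!<arch>\n";
const HEADER_LEN: usize = 60;

/// Split an `ar` archive into its members. Special GNU entries ("/" symbol
/// table, "//" long-name table, "/123" long-name references) and BSD "#1/"
/// names are returned with their raw names rather than resolved.
pub fn ar_members(bytes: &[u8]) -> Result<Vec<ArMember>, String> {
    if !bytes.starts_with(AR_MAGIC) {
        return Err("missing !<arch> magic".to_string());
    }
    let mut members = Vec::new();
    let mut pos = AR_MAGIC.len();
    while pos < bytes.len() {
        // Fixed 60-byte header: name[16] mtime[12] uid[6] gid[6] mode[8] size[10] "`\n".
        let header = bytes
            .get(pos..pos + HEADER_LEN)
            .ok_or("truncated member header")?;
        if &header[58..60] != b"`\n" {
            return Err(format!("bad header terminator at offset {}", pos));
        }
        let raw_name = String::from_utf8_lossy(&header[0..16]).trim_end().to_string();
        // GNU ar terminates ordinary names with '/'; leave special entries alone.
        let name = if raw_name.starts_with('/') {
            raw_name
        } else {
            raw_name.trim_end_matches('/').to_string()
        };
        // The size field is space-padded decimal ASCII.
        let size: usize = String::from_utf8_lossy(&header[48..58])
            .trim()
            .parse()
            .map_err(|e| format!("bad size field at offset {}: {}", pos, e))?;
        let start = pos + HEADER_LEN;
        let data = start
            .checked_add(size)
            .and_then(|end| bytes.get(start..end))
            .ok_or("truncated member data")?;
        members.push(ArMember { name, data: data.to_vec() });
        // Member data is padded with '\n' to an even offset.
        pos = start + size + (size % 2);
    }
    Ok(members)
}
```

Each member's `data` could then be written to its own temporary file and run through the existing per-object extraction path. That avoids depending on whatever objcopy does with multi-member inputs.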

Better error reporting when llvm-link is not available

Currently, the error reported when llvm-link is not available is not very descriptive and could be misinterpreted as some other file being missing. We should check that llvm-link is executable before running the extract command.
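One way to do the pre-check is sketched below. It assumes that `llvm-link` is looked up on `PATH`, and it is Unix-only because it inspects permission bits. `find_executable` and `require_tool` are hypothetical helpers, and the error text is only illustrative.

```rust
use std::env;
use std::ffi::OsStr;
use std::os::unix::fs::PermissionsExt;
use std::path::PathBuf;

/// Search the directories in `path_var` (colon-separated, like `$PATH`) for a
/// regular file named `tool` with at least one execute bit set.
pub fn find_executable(tool: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(tool))
        .find(|candidate| match candidate.metadata() {
            // `metadata` follows symlinks, so a symlinked llvm-link is accepted.
            Ok(meta) => meta.is_file() && meta.permissions().mode() & 0o111 != 0,
            Err(_) => false,
        })
}

/// Fail early with an actionable message instead of a confusing
/// missing-file error later in the extraction pipeline.
pub fn require_tool(tool: &str) -> Result<PathBuf, String> {
    let path_var = env::var_os("PATH").unwrap_or_default();
    find_executable(tool, &path_var).ok_or_else(|| {
        format!(
            "`{}` was not found as an executable on PATH; install LLVM or add its bin directory to PATH",
            tool
        )
    })
}
```

Calling `require_tool("llvm-link")` at the start of `extract-bitcode` would surface the real problem before objcopy or tar run.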

Panic when running `build-bom generate-bitcode` (with no arguments)

Revision:

$ git rev-parse --short HEAD                       
8056803

Reproduce:

$ ./target/debug/build-bom  generate-bitcode
thread 'main' panicked at 'assertion failed: mid <= self.len()', /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/slice/mod.rs:1537:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Backtraces
root@37b1efb3efde:/x# /build-bom/target/x86_64-unknown-linux-musl/release/build-bom  generate-bitcode
thread 'main' panicked at 'assertion failed: mid <= self.len()', /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/slice/mod.rs:1537:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
root@37b1efb3efde:/x# RUST_BACKTRACE=1 /build-bom/target/x86_64-unknown-linux-musl/release/build-bom  generate-bitcode
thread 'main' panicked at 'assertion failed: mid <= self.len()', /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/slice/mod.rs:1537:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:517:5
   1: core::panicking::panic_fmt
             at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/panicking.rs:100:14
   2: core::panicking::panic
             at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/panicking.rs:50:5
   3: bom::bom::bitcode::bitcode_entrypoint
   4: bom::run_bom
   5: build_bom::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
root@37b1efb3efde:/x# RUST_BACKTRACE=full /build-bom/target/x86_64-unknown-linux-musl/release/build-bom  generate-bitcode
thread 'main' panicked at 'assertion failed: mid <= self.len()', /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/slice/mod.rs:1537:9
stack backtrace:
   0:     0x7f731c605e3c - std::backtrace_rs::backtrace::libunwind::trace::hc1bc96ddb4426aa4
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/../../backtrace/src/backtrace/libunwind.rs:90:5
   1:     0x7f731c605e3c - std::backtrace_rs::backtrace::trace_unsynchronized::h923980a653d66493
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x7f731c605e3c - std::sys_common::backtrace::_print_fmt::h9c757c85a437b931
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/sys_common/backtrace.rs:67:5
   3:     0x7f731c605e3c - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hd4daee6a3bf7c86e
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/sys_common/backtrace.rs:46:22
   4:     0x7f731c6401cc - core::fmt::write::hb92fcd00ba9c1ad2
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/fmt/mod.rs:1163:17
   5:     0x7f731c601775 - std::io::Write::write_fmt::he1040163a0175759
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/io/mod.rs:1696:15
   6:     0x7f731c607de0 - std::sys_common::backtrace::_print::h41aed1f85e85fe81
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/sys_common/backtrace.rs:49:5
   7:     0x7f731c607de0 - std::sys_common::backtrace::print::h80502ae1de52b70b
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/sys_common/backtrace.rs:36:9
   8:     0x7f731c607de0 - std::panicking::default_hook::{{closure}}::ha8bcafa5b9176f3f
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:210:50
   9:     0x7f731c607995 - std::panicking::default_hook::hfaee58ed0a065bec
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:227:9
  10:     0x7f731c608494 - std::panicking::rust_panic_with_hook::h8ce3328d937db5aa
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:624:17
  11:     0x7f731c607f42 - std::panicking::begin_panic_handler::{{closure}}::h1f2295b855ba5030
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:519:13
  12:     0x7f731c6062e4 - std::sys_common::backtrace::__rust_end_short_backtrace::h17092a58b60b0566
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/sys_common/backtrace.rs:139:18
  13:     0x7f731c607ed9 - rust_begin_unwind
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:517:5
  14:     0x7f731c499431 - core::panicking::panic_fmt::hcf6bd03e382adeab
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/panicking.rs:100:14
  15:     0x7f731c49937d - core::panicking::panic::he7efc04572bf92e9
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/panicking.rs:50:5
  16:     0x7f731c4fef2f - bom::bom::bitcode::bitcode_entrypoint::h5935250f50302372
  17:     0x7f731c49f905 - bom::run_bom::h10a6bcd1fede10ab
  18:     0x7f731c499eb8 - build_bom::main::hb1f847b24e043138
  19:     0x7f731c49a253 - std::sys_common::backtrace::__rust_begin_short_backtrace::hddd118dbbd843840
  20:     0x7f731c49a10d - std::rt::lang_start::{{closure}}::h8b6ab5d1237deb63
  21:     0x7f731c605b11 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h37d528e2b7386a19
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/ops/function.rs:259:13
  22:     0x7f731c605b11 - std::panicking::try::do_call::h21f3d980e271aebe
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:403:40
  23:     0x7f731c605b11 - std::panicking::try::h6366c75894a5ee3f
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:367:19
  24:     0x7f731c605b11 - std::panic::catch_unwind::hbab33c6a69c714f4
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panic.rs:133:14
  25:     0x7f731c605b11 - std::rt::lang_start_internal::{{closure}}::h4a2c188522fb7f4a
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/rt.rs:128:48
  26:     0x7f731c605b11 - std::panicking::try::do_call::h9b4b672a4b3537ad
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:403:40
  27:     0x7f731c605b11 - std::panicking::try::h9c95acfa69428cd5
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:367:19
  28:     0x7f731c605b11 - std::panic::catch_unwind::h137c802160173f20
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panic.rs:133:14
  29:     0x7f731c605b11 - std::rt::lang_start_internal::h89221b25a17002da
                               at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/rt.rs:128:20
  30:     0x7f731c499f12 - main
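The panic message is consistent with the bounds assertion in the standard library's `slice::split_at`. That assertion fires when the split index lies past the end of the slice, for example `split_at(1)` on an empty argument vector. This is an inference from the backtrace, not something confirmed in build-bom's source. Assuming that is the cause, one panic-free alternative is to use `split_first` and turn the empty case into an ordinary error. `split_command` is a hypothetical name.

```rust
/// Split the user's build command into the program and its arguments.
/// `split_first` returns `None` for an empty slice instead of panicking
/// the way `args.split_at(1)` would.
pub fn split_command(args: &[String]) -> Result<(&String, &[String]), String> {
    args.split_first().ok_or_else(|| {
        "generate-bitcode needs a build command to run, e.g. `build-bom generate-bitcode -- make`"
            .to_string()
    })
}
```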

Cool project!

First of all, thanks for the blight shoutout :-)

Feel free to close this; I just wanted to say that this looks really cool! I'm looking forward to seeing how the ptrace approach stacks up compared to LD_PRELOAD and blight's (relatively dumb) wrapping.

Response files provided by pipes break build-bom

Clang supports response files: special arguments prefixed by @ that name a file containing the command line arguments clang should be invoked with. This facility is generally used to support long command lines that exceed system limits (especially on Windows). In unusual cases, the file containing the arguments could actually be a pipe. In that case, build-bom generate-bitcode runs the original compilation command, which drains the pipe. When build-bom re-runs the command to generate bitcode, the pipe is empty and clang receives no arguments, so no bitcode is generated.

This has been observed by @kquick on Nix, which has a wrapper script around clang that uses this pattern.

We can fix this by detecting cases where clang is invoked with a response file that points to a pipe. When we see that, build-bom should drain the pipe itself before invoking the original command and save the contents of the response file. It can then persist the contents to a temporary file and rewrite the execve arguments of the original command to point to that temporary file. The bitcode generation command can simply reuse the same temporary file. There is a technical challenge here: the new filename must exist in the traced process's address space. We will likely need to both allocate storage for the filename and poke the path into the process word-by-word via ptrace (or via /proc/<pid>/mem). This isn't conceptually difficult, but will take effort to implement.
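The sketch below covers only the build-bom side of this fix: it detects that an `@file` argument names a FIFO, drains the FIFO once, and persists the contents to a regular file that both the original command and the bitcode command can read. It does not cover rewriting the traced process's argv. `materialize_pipe_response_file` is a hypothetical name. The path is resolved in build-bom's own process, so a process-relative path such as `/dev/fd/N` would first need to be translated for the traced process (for example to `/proc/<pid>/fd/N`).

```rust
use std::fs;
use std::io::{self, Read};
use std::os::unix::fs::FileTypeExt;
use std::path::Path;

/// If `arg` is a response-file argument (`@path`) whose path is a FIFO, drain
/// the FIFO once and persist its contents to `dest`, returning the replacement
/// `@dest` argument. Returns `Ok(None)` when no rewrite is needed.
pub fn materialize_pipe_response_file(arg: &str, dest: &Path) -> io::Result<Option<String>> {
    let path = match arg.strip_prefix('@') {
        Some(p) => Path::new(p),
        None => return Ok(None), // not a response-file argument
    };
    let meta = match fs::metadata(path) {
        Ok(m) => m,
        Err(_) => return Ok(None), // not a readable path; leave the argument untouched
    };
    if !meta.file_type().is_fifo() {
        return Ok(None); // a regular file can be read twice, so nothing to do
    }
    // Reading to EOF blocks until the writer side of the pipe closes.
    let mut contents = Vec::new();
    fs::File::open(path)?.read_to_end(&mut contents)?;
    fs::write(dest, &contents)?;
    Ok(Some(format!("@{}", dest.display())))
}
```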

Build systems that delete directories can discard generated bitcode

When build systems place build outputs in temporary directories, build-bom generate-bitcode will place bitcode side-by-side with those outputs. When the build system cleans up temporary directories, the bitcode can be lost.

To fix this, we should add an extra (optional) command line flag to specify a prefix to store bitcode in. The directory structure in the prefix should mirror the real filesystem, for simplicity.
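Here is a sketch of the path mapping for the proposed prefix. It assumes that output paths have already been made absolute and normalized. `mirror_under_prefix` is a hypothetical helper, and the command line flag that would supply `prefix` has not been named yet.

```rust
use std::path::{Component, Path, PathBuf};

/// Map an absolute output path onto a mirror of the filesystem rooted at
/// `prefix`, e.g. `/tmp/build/foo.o` under `/var/bom` -> `/var/bom/tmp/build/foo.o`.
/// Returns `None` for relative paths or paths containing `..`, which the
/// caller should make absolute and normalize first.
pub fn mirror_under_prefix(prefix: &Path, path: &Path) -> Option<PathBuf> {
    if !path.is_absolute() {
        return None;
    }
    let mut out = prefix.to_path_buf();
    for component in path.components() {
        match component {
            Component::Normal(part) => out.push(part),
            // Drop the root so the path is appended under `prefix` rather than replacing it.
            Component::RootDir | Component::Prefix(_) | Component::CurDir => {}
            Component::ParentDir => return None,
        }
    }
    Some(out)
}
```

Mirroring the full path keeps distinct build directories from colliding inside the prefix, at the cost of deep directory trees.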

Generating bitcode fails when compiling directly to an executable

$ cat Makefile
CC := clang
CFLAGS :=  -Wall -Werror -g

hello-world: hello-world.c
	$(CC) $(CFLAGS) -o "$@" "$<"

.DEFAULT: build
.PHONY: build
build: hello-world

.PHONY: clean
clean:
	rm -f hello-world

$ cat hello-world.c
#include <stdio.h>

int main() {
  printf("Hello, world!\n");
}

$ make clean && env NIX_CC_USE_RESPONSE_FILE=-1 build-bom generate-bitcode --verbose -- make

rm -f hello-world
clang -Wall -Werror -g -o "hello-world" "hello-world.c"
Bitcode Generation Summary
 0 build steps skipped due to having a pipe as an input or output
 0 build steps skipped due to using a response file (@file)
 0 unresolved outputs with multiple inputs
 0 original build commands failed, causing us to skip bitcode generation
 0 inputs skipped due to being only assembled (-S)
 0 bitcode compilation errors
 0 errors attaching bitcode to object files
 0 attempts at generating bitcode
 0 successful bitcode captures
 last bitcode capture: "<none>"

$ ./hello-world
Hello, world!

$ build-bom extract-bitcode --output=hello-world.bc hello-world

objcopy: hello-world: can't dump section '.llvm_bitcode' - it does not exist: bad value
tar: /run/user/1000/.tmpRs1m09/bitcode.tar: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
llvm-link: Not enough positional command line arguments specified!
Must specify at least 1 positional argument: See: llvm-link -help
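The summary reports 0 attempts at generating bitcode. That suggests the one-step `clang ... -o hello-world hello-world.c` command was not recognized as a compilation at all, presumably because it has no `-c` and produces an executable rather than an object file. The simplified sketch below classifies such invocations so that a compile-and-link command still yields its source inputs. `Mode`, `classify`, and `is_source` are hypothetical, and only a handful of flags that take a separate value are recognized.

```rust
/// What a single compiler invocation asks for (simplified; most flags ignored).
#[derive(Debug, PartialEq)]
pub enum Mode {
    PreprocessOnly, // -E
    AssemblyOnly,   // -S
    CompileOnly,    // -c
    CompileAndLink, // none of the above: compile sources and link an executable
}

/// Classify a clang-style argument vector (argv[0] excluded) and collect the
/// C/C++ source inputs. The earliest stopping phase wins: -E over -S over -c.
pub fn classify(args: &[&str]) -> (Mode, Vec<String>) {
    let mut mode = Mode::CompileAndLink;
    let mut sources = Vec::new();
    let mut iter = args.iter();
    while let Some(&arg) = iter.next() {
        match arg {
            "-E" => mode = Mode::PreprocessOnly,
            "-S" if mode != Mode::PreprocessOnly => mode = Mode::AssemblyOnly,
            "-c" if mode == Mode::CompileAndLink => mode = Mode::CompileOnly,
            // These flags consume the following argument, which is not an input.
            "-o" | "-I" | "-D" | "-include" | "-x" => {
                iter.next();
            }
            _ if !arg.starts_with('-') && is_source(arg) => sources.push(arg.to_string()),
            _ => {}
        }
    }
    (mode, sources)
}

fn is_source(arg: &str) -> bool {
    [".c", ".cc", ".cpp", ".cxx", ".C"].iter().any(|ext| arg.ends_with(ext))
}
```

For `Mode::CompileAndLink`, a tool could then generate bitcode for each collected source separately. Where to attach that bitcode, given that no intermediate object file exists, remains a separate design question.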

Support response files

Clang supports response files, but build-bom currently is not able to analyze their contents to identify inputs and outputs.
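Supporting response files starts with splitting their contents into arguments. The sketch below uses GNU-style quoting rules, which approximate what clang does on Unix hosts. Windows-style tokenization differs, and nested `@file` references are not expanded here. `tokenize_response_file` is a hypothetical name.

```rust
/// Split the contents of a response file into arguments using GNU-style rules:
/// whitespace separates arguments, single quotes are literal, double quotes
/// group but allow backslash escapes, and a backslash outside quotes escapes
/// the next character.
pub fn tokenize_response_file(contents: &str) -> Vec<String> {
    let mut args = Vec::new();
    let mut current = String::new();
    let mut in_token = false; // lets '' produce an empty argument
    let mut chars = contents.chars();
    while let Some(c) = chars.next() {
        match c {
            c if c.is_whitespace() => {
                if in_token {
                    args.push(std::mem::take(&mut current));
                    in_token = false;
                }
            }
            '\\' => {
                in_token = true;
                if let Some(next) = chars.next() {
                    current.push(next);
                }
            }
            '\'' => {
                in_token = true;
                for q in chars.by_ref() {
                    if q == '\'' {
                        break;
                    }
                    current.push(q);
                }
            }
            '"' => {
                in_token = true;
                while let Some(q) = chars.next() {
                    match q {
                        '"' => break,
                        '\\' => {
                            if let Some(e) = chars.next() {
                                current.push(e);
                            }
                        }
                        _ => current.push(q),
                    }
                }
            }
            _ => {
                in_token = true;
                current.push(c);
            }
        }
    }
    if in_token {
        args.push(current);
    }
    args
}
```

The resulting arguments can be spliced in place of the `@file` argument before the usual input/output analysis runs.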
