quarkslab / qbindiff



Quarkslab Bindiffer but not only !

Home Page: https://diffing.quarkslab.com

License: Apache License 2.0

Python 92.25% Cython 5.59% Roff 0.87% Meson 1.29%

binary-diffing network-alignment program-analysis reverse-engineering vulnerability-research

qbindiff's Issues

Update to sphinx >= 7.2

This is just a reminder to update to sphinx >= 7.2, as it has better support for PEP 585 built-in generics (see sphinx-doc/sphinx#11570).

Currently there are some blockers to update to sphinx >= 7.2:

sphinx_rtd_theme
enum_tools

Use a better feature for the function names

Right now the only feature that leverages the function names is FuncName, and it works as a yes/no: it gives a similarity of 1 if two functions have the exact same name, 0 otherwise.

It might be nice to have a finer-grained similarity score, for example using the Levenshtein string distance or some LSH function.
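To make the Levenshtein idea concrete, here is a minimal sketch of a graded name similarity. It is not qbindiff code: the helper names `levenshtein` and `name_similarity` are hypothetical, and normalizing by the longer name's length is just one possible choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Hypothetical score in [0, 1]: 1 for identical names, 0 for fully different ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty names are considered identical
    return 1.0 - levenshtein(a, b) / longest
```

With this score, two names that differ by a single character still get a high similarity instead of 0, which is the behavior the yes/no feature cannot provide.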

Add tests for new passes

New prepass and postpass will be introduced with #38 and #39. Some tests should be implemented.

Redesign the `FeatureExtractor` and `FeatureCollector` interaction

Ideally a FeatureExtractor should return the feature instead of adding it to the FeatureCollector passed as a parameter.
The visitor should then add the returned feature to the collector.
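The proposed interaction could look roughly like the sketch below. Only the names FeatureExtractor and FeatureCollector come from the issue; the method names (`extract`, `add`, `visit`), the `InstructionCount` extractor and the dictionary-based function objects are assumptions made for illustration and do not reflect the current qbindiff API.

```python
from collections import defaultdict


class FeatureCollector:
    """Stores feature values per visited item (hypothetical, simplified)."""

    def __init__(self):
        self.features = defaultdict(dict)

    def add(self, item_id, name, value):
        self.features[item_id][name] = value


class InstructionCount:
    """Hypothetical extractor: it returns a value and never sees the collector."""

    name = "ninstr"

    def extract(self, function):
        return len(function["instructions"])


class Visitor:
    """The visitor is the only component that writes into the collector."""

    def __init__(self, extractors):
        self.extractors = extractors

    def visit(self, functions):
        collector = FeatureCollector()
        for func_id, function in functions.items():
            for extractor in self.extractors:
                collector.add(func_id, extractor.name, extractor.extract(function))
        return collector
```

Because extractors become pure functions of the visited item, they can be tested in isolation and, in principle, run independently of each other.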

Parallelize the feature extractions

We could write Cython (C++) code to make use of multiple cores and parallelize some CPU-intensive features, like the Weisfeiler-Lehman Graph Kernel.

Clarify the difference between FunctionType.extern and FunctionType.imported

It seems that there shouldn't be any difference between an imported and an extern function, however there are currently two different types.
This difference originates from the Quokka Python bindings, where the external type is introduced.

If I guessed correctly, the problem arises because IDA sometimes behaves very differently with imported functions in a PE binary (treating them like data instead of code).
More testing is needed.

Belief propagation raised a RuntimeWarning

Hello,

Each time I use qbindiff, this warning is raised. It appears with the pip package and also when I build qbindiff from source.

As I do not know whether this warning is important or not, I cannot propose a patch. But I suppose it should be handled by qbindiff.

[HOME]/.virtualenvs/qbindiff-dev/lib/python3.11/site-packages/qbindiff/matcher/belief_propagation.py:273: RuntimeWarning: overflow encountered in power
  x / (1 + x) for x in np.clip(np.power(math.e, curr_marginals.data), 0, 1e6)

Thanks!
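For context, the warning comes from the order of operations in the quoted line: `np.power(math.e, ...)` is evaluated on the raw marginals and only afterwards clipped to [0, 1e6], so large marginals overflow before the clip applies. Since the exponential is monotonic, capping the exponent at ln(1e6) first yields the same values without the overflow. The sketch below illustrates this; the function name `clipped_odds_ratio` is hypothetical and this is not necessarily how the project handles it.

```python
import math

import numpy as np


def clipped_odds_ratio(marginals: np.ndarray) -> np.ndarray:
    """Compute x / (1 + x) with x = min(e**m, 1e6), without overflowing."""
    # Cap the exponent instead of the result: e**min(m, ln(1e6)) == min(e**m, 1e6).
    capped = np.minimum(marginals, math.log(1e6))
    x = np.exp(capped)
    return x / (1 + x)


values = clipped_odds_ratio(np.array([-1000.0, 0.0, 1000.0]))
```

The lower bound of the original clip (0) is never reached by an exponential, so it needs no special handling here.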

Separate the data types between simple and composite

Right now the data types are divided into "simple" (word, dword, ascii, etc.) and "structure" (structure, union, enum).
IMHO this is not a good choice of words, as the term "structure" could mean either a proper "structure" (like a C struct) or the collection of data types (struct, union and enum).

I propose to refactor the data types to add enum as a "simple" type (after all it's not a composite type, its fields' types are homogeneous) and rename the "structure" type to "composite". A "composite" data type would be either a struct or a union, making it in fact a heterogeneous type.

More tests need to be done to prove that an enum can effectively be moved into the "simple" types.

Problem with type of instruction (related to capstone)

Hi, I'm getting the following error when using differ.compute_matching():

Traceback (most recent call last):
  File "test.py", line 70, in <module>
    main()
  File "test.py", line 64, in main
    matches = differ.compute_matching()
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/qbindiff/differ.py", line 313, in compute_matching
    for _ in tqdm.tqdm(self._matching_iterator(), total=self.maxiter, disable=not is_debug()):
  File "/lib/python3.11/site-packages/tqdm/std.py", line 1170, in __iter__
    for obj in iterable:
  File "/lib/python3.11/site-packages/qbindiff/differ.py", line 325, in _matching_iterator
    self.process()
  File "/lib/python3.11/site-packages/qbindiff/differ.py", line 303, in process
    self.run_passes()  # User registered passes
    ^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/qbindiff/differ.py", line 274, in run_passes
    self.p_features, self.s_features = pass_func(
                                       ^^^^^^^^^^
  File "/lib/python3.11/site-packages/qbindiff/passes/base.py", line 224, in __call__
    primary_features = self._visitor.visit(primary, key_fun=key_fun)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/qbindiff/visitor.py", line 79, in visit
    self.visit_item(graph, node, collector)
  File "/lib/python3.11/site-packages/qbindiff/visitor.py", line 150, in visit_item
    self.visit_function(program, item, collector)
  File "/lib/python3.11/site-packages/qbindiff/visitor.py", line 226, in visit_function
    self.visit_basic_block(program, bb, collector)
  File "/lib/python3.11/site-packages/qbindiff/visitor.py", line 245, in visit_basic_block
    self.visit_instruction(program, inst, collector)
  File "/lib/python3.11/site-packages/qbindiff/visitor.py", line 263, in visit_instruction
    self.visit_operand(program, op, collector)
  File "/lib/python3.11/site-packages/qbindiff/visitor.py", line 277, in visit_operand
    callback(program, operand, collector)
  File "/lib/python3.11/site-packages/qbindiff/features/graph.py", line 225, in visit_operand
    if operand.type in (OperandType.memory, OperandType.displacement, OperandType.phrase):
       ^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/qbindiff/loader/operand.py", line 46, in type
    return self._backend.type
           ^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/qbindiff/loader/backend/binexport.py", line 144, in type
    raise NotImplementedError(f"Unrecognized capstone type {cs_op_type}")
NotImplementedError: Unrecognized capstone type 3

The code in which the exception occurs is the following:

@property
def type(self) -> OperandType:
    """Returns the capstone operand type"""
    op = self.cs_operand
    typ = OperandType.unknown
    cs_op_type = self.cs_operand.type

    if cs_op_type == capstone.CS_OP_REG:
        return OperandType.register
    elif cs_op_type == capstone.CS_OP_IMM:
        return OperandType.immediate
    elif cs_op_type == capstone.CS_OP_MEM:
        # A displacement is represented as [reg+hex] (example : [rdi+0x1234])
        # Then, base (reg) and disp (hex) should be different of 0
        if op.mem.base != 0 and op.mem.disp != 0:
            typ = OperandType.displacement
        # A phrase is represented as [reg1 + reg2] (example : [rdi + eax])
        # Then, base (reg1) and index (reg2) should be different of 0
        if op.mem.base != 0 and op.mem.index != 0:
            typ = OperandType.phrase
        if op.mem.disp != 0:
            typ = OperandType.displacement
    else:
        raise NotImplementedError(f"Unrecognized capstone type {cs_op_type}")
    return typ

However, type 3 corresponds to capstone.CS_OP_MEM, as you can see in the interpreter:

>>> import capstone
>>> capstone.CS_OP
{0: 'CS_OP_INVALID', 1: 'CS_OP_REG', 2: 'CS_OP_IMM', 3: 'CS_OP_MEM', 4: 'CS_OP_FP'}
>>> capstone.CS_OP_MEM
3

So there must be something tricky going on...

The Linear Assignment Problem doesn't always make good matches

The LAP (Linear Assignment Problem) can sometimes make very poor choices because the marginals have already been flattened (because of the epsilon relaxation) and it's impossible to extract valuable information from them. We need either a better algorithm or a better source of information than the flattened marginals.

It might be possible to use the score matrix directly, or to run the NAP again only on a subset of the nodes.
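As an illustration of the first idea, the sketch below solves the assignment directly on a score matrix with SciPy's `linear_sum_assignment`. The matrix values, the helper name `assign_from_scores` and the `min_score` cut-off are assumptions for the example; this is not how qbindiff currently extracts its matches.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def assign_from_scores(scores: np.ndarray, min_score: float = 0.0):
    """Maximum-score one-to-one assignment, dropping pairs scoring at or below min_score."""
    rows, cols = linear_sum_assignment(scores, maximize=True)
    return [(int(r), int(c)) for r, c in zip(rows, cols) if scores[r, c] > min_score]


# Toy score matrix: rows are primary functions, columns are secondary functions.
scores = np.array([
    [0.90, 0.10, 0.00],
    [0.20, 0.80, 0.10],
    [0.00, 0.30, 0.05],
])
matches = assign_from_scores(scores, min_score=0.1)
```

The cut-off matters because the LAP always produces a complete assignment, so weakly supported pairs have to be filtered out afterwards.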

Add a CFG/CG normalization pass

Possible ideas:

Better handling of thunk functions (they can safely be removed in the normalization pass)
Sometimes there might be some code duplication. When function A jumps to function B, the whole code of function B is sometimes copied into function A instead of the jump being treated like a function call
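The thunk idea above can be sketched on a toy call graph. The representation (a dict mapping each function to the set of its callees, plus a set of known thunks) and the name `remove_thunks` are assumptions for illustration, not the qbindiff data model.

```python
def remove_thunks(callgraph, thunks):
    """Return a call graph where simple forwarding thunks are bypassed and removed."""

    def resolve(node):
        # Follow chains of thunks until reaching a real function.
        seen = set()
        while node in thunks and node not in seen:
            seen.add(node)
            targets = callgraph.get(node, set())
            if len(targets) != 1:
                break  # not a simple forwarding thunk: keep it as-is
            (node,) = targets
        return node

    normalized = {}
    for caller, callees in callgraph.items():
        if resolve(caller) != caller:
            continue  # the caller itself is a removable thunk
        normalized[caller] = {resolve(callee) for callee in callees}
    return normalized


callgraph = {
    "main": {"j_memcpy", "parse"},
    "parse": {"j_memcpy"},
    "j_memcpy": {"memcpy"},  # thunk that only forwards to memcpy
    "memcpy": set(),
}
normalized = remove_thunks(callgraph, thunks={"j_memcpy"})
```

Thunks with zero or several targets, and cycles made only of thunks, are left untouched so that the pass never drops information it cannot safely redirect.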

Capstone version compatibility

After merging #36 we will break compatibility with older versions of capstone. We should identify which version is the first one that is compatible with qbindiff and enforce it in pyproject.toml

Move from setuptools to a different build system

NumPy is incompatible with recent versions of setuptools, starting from version 65.5.2, and the two will be impossible to use together at all starting from Python 3.12 because of the removal of the deprecated distutils module. See this for reference.
It is recommended to use setuptools < 60.0 for Python 3.12 and later, even though this won't be enough to guarantee it will work forever.
On top of that, setuptools is not very stable in its configuration API: it already relies on 3 config files, none of which can replace all the others: pyproject.toml, setup.cfg, setup.py.

It seems we cannot expect a fix for this to ever happen, as neither the setuptools nor the NumPy developers intend to maintain compatibility with each other.

The best solution would be to drop setuptools entirely and transition to meson-python, which is also the solution used by NumPy.

BinExport files result in many keyerror issues

Following the documentation I've tried secondary = Program(LoaderType.binexport, "./objcopy.BinExport")
and p1 = Program(Path("objcopy.BinExport")) both of which crash in unexpected ways.

...
     91     dst = cg.vertex[edge.target_vertex_index].address
     92     self.callgraph.add_edge(src, dst)
---> 93     self[src].children.add(self[dst])
     94     self[dst].parents.add(self[src])
     96 # Create a map of function names for quick lookup later on
...
  self.be_prog = binexport.ProgramBinExport(file)
  File ".../python3.10/site-packages/binexport/program.py", line 93, in __init__
    self[src].children.add(self[dst])
KeyError: 4751856

When using the cli via qbindiff -l 'binexport' file1.BinExport file2.BinExport I get the same sort of key errors as well despite these files working fine with BinDiff as-is.

Note: if I do not explicitly ask for binexport, I get this issue:

     59     self._backend = ProgramBackendQuokka(*args, **kwargs)
     61 else:
---> 62     raise NotImplementedError("Loader: %s not implemented" % loader)
     64 self._filter = lambda x: True
     65 self._load_functions()

Any idea what could be causing the issue? I am using BinDiff 8 and Ghidra 10.3

Add feature options from command line

Right now there is no way to specify a feature option (like max_passes for the WLGK) from the command line.
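One possible shape for this is a `name:key=value,key=value` syntax attached to each feature argument. The sketch below parses such a string; the syntax itself, the function name `parse_feature_option` and the feature key `wlgk` are hypothetical, and only `max_passes` is taken from the issue.

```python
def parse_feature_option(spec: str):
    """Parse 'name' or 'name:key=value,key=value' into (name, options)."""
    name, _, raw_options = spec.partition(":")
    options = {}
    for pair in filter(None, raw_options.split(",")):
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"Malformed feature option: {pair!r}")
        # Convert to int or float when possible, otherwise keep the raw string.
        for cast in (int, float):
            try:
                options[key] = cast(value)
                break
            except ValueError:
                continue
        else:
            options[key] = value
    return name, options
```

For example, `parse_feature_option("wlgk:max_passes=3")` yields the feature key together with a dictionary that could be forwarded as keyword arguments when the feature is registered.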

Set default parameters

Set the best values for the hyperparameters

Add a configurable parameter threshold for the sparsity matrix

Instead of using just a sparsity ratio as a filter, we can also add a threshold parameter, i.e. keep all the elements > threshold.
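The difference between the two filters can be sketched with NumPy on a dense similarity matrix. The function name `sparsify` and the interpretation of the ratio as "fraction of entries to drop" are assumptions for the example and may not match qbindiff's actual semantics.

```python
import numpy as np


def sparsify(sim: np.ndarray, ratio=None, threshold=None) -> np.ndarray:
    """Zero out weak entries, by ratio (fraction to drop) and/or absolute threshold."""
    out = sim.copy()
    if ratio is not None:
        k = int(round(ratio * out.size))  # number of entries to drop
        if k > 0:
            # Value of the k-th smallest entry; ties at the cutoff are dropped too.
            cutoff = np.partition(out.ravel(), k - 1)[k - 1]
            out[out <= cutoff] = 0
    if threshold is not None:
        out[out <= threshold] = 0  # keep only the elements > threshold
    return out


sim = np.array([[0.9, 0.2], [0.05, 0.6]])
by_ratio = sparsify(sim, ratio=0.5)
by_threshold = sparsify(sim, threshold=0.5)
```

A ratio always removes a fixed share of the entries regardless of their values, while a threshold removes however many entries happen to fall below it, so the two can be combined.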

Optionally add the raw binary path with binexport

BinExport doesn't export the raw binary path in the protobuf message so either we have to guess it or we can accept it as a parameter in qbindiff. The latter would be better of course but then we lose the option to diff just two .BinExport files, without the raw binaries.

In any case without the raw binary path we cannot recreate the .BinDiff sqlite database.

Qbindiff can't guess the arch

While trying to use qbindiff on an ARM32 Thumb program, I got the following exception:

Cannot guess the instruction set of the instruction at 0x....

I fixed the issue by hard-coding the mode and arch inside the file qbindiff/loader/backend/binexport.py, but it would be nice to let users specify the arch and mode when they know them, something like:

differ = qbindiff.QBinDiff(
    p, q,
    distance=Distance.canberra,
    ...,
    arch="ARM-32",
    mode="THUMB"
)
