quarkslab / qbindiff
Quarkslab Bindiffer but not only!
Home Page: https://diffing.quarkslab.com
License: Apache License 2.0
This is just a reminder to update to sphinx >= 7.2, as it has better support for PEP 585 built-in generics (see sphinx-doc/sphinx#11570).
Currently there are some blockers to updating to sphinx >= 7.2:
Right now the only feature that leverages function names is FuncName, and it works as a yes/no: it gives a similarity of 1 if two functions have exactly the same name, and 0 otherwise.
It might be nice to have a finer-grained similarity score, for example using the Levenshtein string distance or some LSH function.
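As a rough illustration of the Levenshtein option, a normalized edit distance turns the yes/no answer into a score in [0, 1]. This is a minimal plain-Python sketch; `name_similarity` is a hypothetical helper, not part of qbindiff's API, and normalizing by the longer name is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute all cost 1)."""
    # prev[j] holds the distance between the first i-1 chars of a and the first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Hypothetical FuncName-style score: 1.0 for identical names, 0.0 for nothing in common."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty names are considered identical
    return 1.0 - levenshtein(a, b) / longest


print(name_similarity("memcpy", "memcpy_s"))  # 0.75
```

Edit distance is quadratic in name length, which is fine for symbol names; an LSH scheme would instead avoid comparing every pair of functions.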
A CSV exporter is badly needed.
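A CSV exporter could be as small as the following sketch. It assumes each match can be flattened into a primary address, a secondary address, a similarity and a confidence; `MatchRow` and `export_csv` are hypothetical names, not qbindiff's mapping API.

```python
import csv
import io
from typing import Iterable, NamedTuple, TextIO


class MatchRow(NamedTuple):
    """Hypothetical flat view of one match; field names are assumptions."""
    primary_addr: int
    secondary_addr: int
    similarity: float
    confidence: float


def export_csv(matches: Iterable[MatchRow], stream: TextIO) -> None:
    """Write a header row, then one CSV row per match."""
    writer = csv.writer(stream)
    writer.writerow(MatchRow._fields)
    for m in matches:
        writer.writerow([
            hex(m.primary_addr),
            hex(m.secondary_addr),
            f"{m.similarity:.6f}",
            f"{m.confidence:.6f}",
        ])


buf = io.StringIO()
export_csv([MatchRow(0x1000, 0x2000, 0.5, 0.25)], buf)
print(buf.getvalue())
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends, so the writer controls line endings.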
Ideally a FeatureExtractor should return the feature instead of adding it to the FeatureCollector passed as a parameter. The visitor should then add the returned feature to the collector.
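A minimal sketch of the proposed split, where extractors are pure functions of the item and only the visitor touches the collector. `SimpleCollector`, `ReturningExtractor`, `NbBasicBlocks` and the dict-based function representation are all hypothetical stand-ins, not qbindiff classes.

```python
from typing import Any, Protocol


class SimpleCollector:
    """Hypothetical stand-in for a feature collector: key -> value."""

    def __init__(self) -> None:
        self.features: dict[str, Any] = {}

    def add(self, key: str, value: Any) -> None:
        self.features[key] = value


class ReturningExtractor(Protocol):
    """Proposed shape: an extractor computes and returns its feature."""
    key: str

    def extract(self, function: dict) -> Any: ...


class NbBasicBlocks:
    key = "nb_bb"

    def extract(self, function: dict) -> Any:
        # Pure computation: no collector is passed in or mutated here
        return len(function["basic_blocks"])


def visit(functions: dict[int, dict], extractors: list[ReturningExtractor]) -> dict[int, SimpleCollector]:
    """The visitor, not the extractor, owns the collector and stores each result."""
    result = {}
    for addr, function in functions.items():
        collector = SimpleCollector()
        for extractor in extractors:
            collector.add(extractor.key, extractor.extract(function))
        result[addr] = collector
    return result


collectors = visit({0x401000: {"basic_blocks": [0, 1, 2]}}, [NbBasicBlocks()])
print(collectors[0x401000].features)  # {'nb_bb': 3}
```

Returning values also makes extractors easier to test in isolation and to run in parallel, since they no longer share mutable state.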
We could write Cython/C++ code that uses multiple cores to parallelize some CPU-intensive features, like the Weisfeiler-Lehman graph kernel.
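For reference, the core of a Weisfeiler-Lehman subtree kernel is iterated label refinement followed by a dot product of label histograms. The single-threaded sketch below is not qbindiff's implementation; the adjacency-dict graph format and the truncated SHA-1 label compression are assumptions. Each graph's histogram is computed independently, which is what makes this step a natural candidate for spreading across cores.

```python
import hashlib
from collections import Counter


def wl_histogram(adjacency: dict[int, list[int]], labels: dict[int, str], iterations: int) -> Counter:
    """WL subtree features: count every label seen across all refinement rounds."""
    current = dict(labels)
    histogram = Counter(current.values())
    for _ in range(iterations):
        refined = {}
        for node, neighbours in adjacency.items():
            # New label = own label + sorted multiset of neighbour labels, compressed by hashing
            signature = current[node] + "|" + ",".join(sorted(current[n] for n in neighbours))
            refined[node] = hashlib.sha1(signature.encode()).hexdigest()[:16]
        current = refined
        histogram.update(current.values())
    return histogram


def wl_kernel(h1: Counter, h2: Counter) -> int:
    """Kernel value = dot product of the two label-count histograms."""
    return sum(count * h2[label] for label, count in h1.items())


path2 = {0: [1], 1: [0]}
path3 = {0: [1], 1: [0, 2], 2: [1]}
h2 = wl_histogram(path2, {0: "a", 1: "a"}, iterations=1)
h3 = wl_histogram(path3, {0: "a", 1: "a", 2: "a"}, iterations=1)
print(wl_kernel(h2, h3))  # 10
```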
It seems that there shouldn't be any difference between an imported and an extern function; however, there are currently two different types.
This difference originates from the Quokka Python bindings, which introduce the external type.
If I guessed correctly, the problem arises because IDA sometimes behaves very differently with imported functions in a PE binary (treating them like data instead of code).
More testing is needed.
Hello,
Every time I use qbindiff, the following warning is raised. It appears both with the pip package and when I build qbindiff from source.
As I do not know whether this warning is important or not, I cannot propose a patch, but I suppose it should be handled by qbindiff.
[HOME]/.virtualenvs/qbindiff-dev/lib/python3.11/site-packages/qbindiff/matcher/belief_propagation.py:273: RuntimeWarning: overflow encountered in power
x / (1 + x) for x in np.clip(np.power(math.e, curr_marginals.data), 0, 1e6)
Thanks!
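The warning comes from computing e**m before clipping, so large marginals overflow to inf first. One way around it, sketched below under the assumption that the 1e6 cap should be kept, is to clip the exponent instead of the result; `bounded_ratio` is a hypothetical name, not a patch to belief_propagation.py.

```python
import math

import numpy as np

# The original code caps e**m at 1e6 after exponentiating; this is the same cap in log space
LOG_CAP = math.log(1e6)


def bounded_ratio(marginals: np.ndarray) -> np.ndarray:
    """Compute x / (1 + x) with x = min(e**m, 1e6) without overflowing in exp."""
    # exp is monotonic, so exp(min(m, ln 1e6)) equals min(exp(m), 1e6) mathematically,
    # but the clip now happens before exp can overflow to inf
    x = np.exp(np.minimum(marginals, LOG_CAP))
    return x / (1.0 + x)


with np.errstate(over="raise"):  # would raise FloatingPointError on overflow
    values = bounded_ratio(np.array([-1000.0, 0.0, 1000.0]))
print(values)
```

Note that x / (1 + x) with x = e**m is the logistic sigmoid of m, so an uncapped variant could simply use a numerically stable sigmoid instead.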
Right now the data types are divided into "simple" (word, dword, ascii, etc.) and "structure" (structure, union, enum).
IMHO this is not a good choice of words, as the term "structure" could mean either a proper structure (like a C struct) or the collection of data types (struct, union and enum).
I propose to refactor the data types to make enum a "simple" type (after all, it's not a composite type: its fields' types are homogeneous) and to rename the "structure" type to "composite". A "composite" data type would be either a struct or a union, making it in fact a heterogeneous type.
More tests need to be done to prove that an enum can effectively be moved into the "simple" types.
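A minimal sketch of the proposed two-way split, assuming type kinds are identified by name; `DataTypeKind` and `kind_of` are hypothetical and do not reflect qbindiff's current type hierarchy. Whether enum really belongs on the simple side is the open question above.

```python
from enum import Enum


class DataTypeKind(Enum):
    """Hypothetical classification following the proposed refactoring."""
    SIMPLE = "simple"        # homogeneous: word, dword, ascii, ... and enum
    COMPOSITE = "composite"  # heterogeneous fields: struct or union


_COMPOSITE_TYPES = {"struct", "union"}


def kind_of(type_name: str) -> DataTypeKind:
    """Classify a type name; anything that is not a struct or union is simple."""
    if type_name.lower() in _COMPOSITE_TYPES:
        return DataTypeKind.COMPOSITE
    return DataTypeKind.SIMPLE


print(kind_of("enum"), kind_of("union"))
```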
Hi, I'm getting the following error when using differ.compute_matching():
Traceback (most recent call last):
File "test.py", line 70, in <module>
main()
File "test.py", line 64, in main
matches = differ.compute_matching()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.11/site-packages/qbindiff/differ.py", line 313, in compute_matching
for _ in tqdm.tqdm(self._matching_iterator(), total=self.maxiter, disable=not is_debug()):
File "/lib/python3.11/site-packages/tqdm/std.py", line 1170, in __iter__
for obj in iterable:
File "/lib/python3.11/site-packages/qbindiff/differ.py", line 325, in _matching_iterator
self.process()
File "/lib/python3.11/site-packages/qbindiff/differ.py", line 303, in process
self.run_passes() # User registered passes
^^^^^^^^^^^^^^^^^
File "/lib/python3.11/site-packages/qbindiff/differ.py", line 274, in run_passes
self.p_features, self.s_features = pass_func(
^^^^^^^^^^
File "/lib/python3.11/site-packages/qbindiff/passes/base.py", line 224, in __call__
primary_features = self._visitor.visit(primary, key_fun=key_fun)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.11/site-packages/qbindiff/visitor.py", line 79, in visit
self.visit_item(graph, node, collector)
File "/lib/python3.11/site-packages/qbindiff/visitor.py", line 150, in visit_item
self.visit_function(program, item, collector)
File "/lib/python3.11/site-packages/qbindiff/visitor.py", line 226, in visit_function
self.visit_basic_block(program, bb, collector)
File "/lib/python3.11/site-packages/qbindiff/visitor.py", line 245, in visit_basic_block
self.visit_instruction(program, inst, collector)
File "/lib/python3.11/site-packages/qbindiff/visitor.py", line 263, in visit_instruction
self.visit_operand(program, op, collector)
File "/lib/python3.11/site-packages/qbindiff/visitor.py", line 277, in visit_operand
callback(program, operand, collector)
File "/lib/python3.11/site-packages/qbindiff/features/graph.py", line 225, in visit_operand
if operand.type in (OperandType.memory, OperandType.displacement, OperandType.phrase):
^^^^^^^^^^^^
File "/lib/python3.11/site-packages/qbindiff/loader/operand.py", line 46, in type
return self._backend.type
^^^^^^^^^^^^^^^^^^
File "/lib/python3.11/site-packages/qbindiff/loader/backend/binexport.py", line 144, in type
raise NotImplementedError(f"Unrecognized capstone type {cs_op_type}")
NotImplementedError: Unrecognized capstone type 3
The code in which the exception occurs is the following:
@property
def type(self) -> OperandType:
    """Returns the capstone operand type"""
    op = self.cs_operand
    typ = OperandType.unknown
    cs_op_type = self.cs_operand.type
    if cs_op_type == capstone.CS_OP_REG:
        return OperandType.register
    elif cs_op_type == capstone.CS_OP_IMM:
        return OperandType.immediate
    elif cs_op_type == capstone.CS_OP_MEM:
        # A displacement is represented as [reg+hex] (example : [rdi+0x1234])
        # Then, base (reg) and disp (hex) should be different of 0
        if op.mem.base != 0 and op.mem.disp != 0:
            typ = OperandType.displacement
        # A phrase is represented as [reg1 + reg2] (example : [rdi + eax])
        # Then, base (reg1) and index (reg2) should be different of 0
        if op.mem.base != 0 and op.mem.index != 0:
            typ = OperandType.phrase
            if op.mem.disp != 0:
                typ = OperandType.displacement
    else:
        raise NotImplementedError(f"Unrecognized capstone type {cs_op_type}")
    return typ
However, type 3 corresponds to capstone.CS_OP_MEM, as you can see in the interpreter:
>>> import capstone
>>> capstone.CS_OP
{0: 'CS_OP_INVALID', 1: 'CS_OP_REG', 2: 'CS_OP_IMM', 3: 'CS_OP_MEM', 4: 'CS_OP_FP'}
>>> capstone.CS_OP_MEM
3
So there must be something tricky going on...
The LAP (Linear Assignment Problem) can sometimes make very poor choices because the marginals have already been flattened (because of the epsilon relaxation), so it's impossible to extract valuable information from them. We need either a better algorithm or a better source of information than the flattened marginals.
It might be possible to use the score matrix directly, or to run the NAP problem again only on a subset of the nodes.
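To make the first idea concrete, the sketch below solves a maximum-weight assignment directly on a score matrix with SciPy's `linear_sum_assignment`. Whether the raw scores carry enough signal is exactly the open question; the toy matrix and the `min_score` cut-off are illustrative assumptions, not qbindiff behavior.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def lap_on_scores(scores: np.ndarray, min_score: float = 0.0) -> list[tuple[int, int]]:
    """Solve a maximum-weight assignment directly on a score matrix."""
    rows, cols = linear_sum_assignment(scores, maximize=True)
    # The LAP always assigns min(n, m) pairs; drop the ones with uninformative scores
    return [(int(r), int(c)) for r, c in zip(rows, cols) if scores[r, c] > min_score]


scores = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.3],
    [0.0, 0.4, 0.7],
])
print(lap_on_scores(scores))  # [(0, 0), (1, 1), (2, 2)]
```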
Possible ideas:
When function A jumps to function B, sometimes the whole code of function B is copied within function A instead of the jump being treated like a function call.
After merging #36 we will break compatibility with older versions of capstone. We should identify the first version that is compatible with qbindiff and enforce it in pyproject.toml.
Numpy is incompatible with recent versions of setuptools, starting from version 65.5.2, and it will be impossible to use it at all starting from Python 3.12 because of the removal of the deprecated distutils module. See this for reference.
It is recommended to use setuptools < 60.0 for Python 3.12 and later, even though this won't be enough to guarantee it will keep working forever.
On top of that, setuptools is not very stable in its configuration API, already relying on 3 config files, none of which can replace all the others: pyproject.toml, setup.cfg and setup.py.
It seems we cannot expect a fix for this ever to happen, as neither the setuptools nor the numpy developers intend to keep their projects compatible with each other.
The best solution would be to drop setuptools entirely and transition to meson-python, which is also the solution used by numpy.
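As described in the meson-python documentation, switching backends is declared in the [build-system] table of pyproject.toml. The fragment below shows only that table; a meson.build file describing the sources and any compiled extensions is also required and is not shown here.

```toml
[build-system]
build-backend = "mesonpy"
requires = ["meson-python"]
```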
Following the documentation, I've tried secondary = Program(LoaderType.binexport, "./objcopy.BinExport") and p1 = Program(Path("objcopy.BinExport")), both of which crash in unexpected ways.
...
91 dst = cg.vertex[edge.target_vertex_index].address
92 self.callgraph.add_edge(src, dst)
---> 93 self[src].children.add(self[dst])
94 self[dst].parents.add(self[src])
96 # Create a map of function names for quick lookup later on
...
self.be_prog = binexport.ProgramBinExport(file)
File ".../python3.10/site-packages/binexport/program.py", line 93, in __init__
self[src].children.add(self[dst])
KeyError: 4751856
When using the CLI via qbindiff -l 'binexport' file1.BinExport file2.BinExport, I get the same sort of KeyError, even though these files work fine with BinDiff as-is.
Note: if I do not explicitly ask for binexport, I get this issue:
59 self._backend = ProgramBackendQuokka(*args, **kwargs)
61 else:
---> 62 raise NotImplementedError("Loader: %s not implemented" % loader)
64 self._filter = lambda x: True
65 self._load_functions()
Any idea what could be causing the issue? I am using BinDiff 8 and Ghidra 10.3
Right now there is no way to specify a feature option from the command line (like max_passes for the WLGK).
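One possible shape for such an option, sketched with `argparse`: a repeatable flag taking feature:key=value pairs. Both the --feature-option flag and the "name:key=value" syntax are hypothetical, not existing qbindiff CLI options, and "wlgk" is only an illustrative feature name.

```python
import argparse


def _coerce(raw: str) -> object:
    """Turn '3' into 3 and '0.5' into 0.5; leave anything else as a string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_feature_option(spec: str) -> tuple[str, dict[str, object]]:
    """Parse 'feature:key=value[,key=value...]' (hypothetical syntax)."""
    name, _, opts = spec.partition(":")
    options: dict[str, object] = {}
    if opts:
        for item in opts.split(","):
            key, sep, raw = item.partition("=")
            if not sep or not key:
                raise argparse.ArgumentTypeError(f"expected key=value, got {item!r}")
            options[key] = _coerce(raw)
    return name, options


parser = argparse.ArgumentParser()
parser.add_argument(
    "--feature-option",  # hypothetical flag, not an existing qbindiff option
    action="append",
    default=[],
    type=parse_feature_option,
)
args = parser.parse_args(["--feature-option", "wlgk:max_passes=3"])
print(args.feature_option)  # [('wlgk', {'max_passes': 3})]
```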
Set the best values for the hyperparameters
Instead of using just a sparsity ratio as a filter, we could also add a threshold parameter, i.e., keep all the elements > threshold.
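A dense NumPy sketch of combining both filters. It assumes the sparsity ratio means "fraction of entries zeroed in each row" and applies the threshold on top of it; `sparsify` is a hypothetical helper, not qbindiff's implementation.

```python
import math
from typing import Optional

import numpy as np


def sparsify(sim: np.ndarray, sparsity_ratio: float, threshold: Optional[float] = None) -> np.ndarray:
    """Keep the top (1 - sparsity_ratio) share of each row, then optionally drop entries <= threshold."""
    n_cols = sim.shape[1]
    keep = max(1, math.ceil((1.0 - sparsity_ratio) * n_cols))
    # argpartition moves the `keep` largest entries of each row into the last `keep` columns
    top = np.argpartition(sim, n_cols - keep, axis=1)[:, n_cols - keep:]
    mask = np.zeros(sim.shape, dtype=bool)
    np.put_along_axis(mask, top, True, axis=1)
    if threshold is not None:
        mask &= sim > threshold
    return np.where(mask, sim, 0.0)


sim = np.array([
    [0.9, 0.5, 0.1, 0.0],
    [0.2, 0.3, 0.8, 0.6],
])
print(sparsify(sim, 0.5, threshold=0.55))
```

With the threshold, a row whose best candidates are all weak can end up with no surviving entries, which the ratio alone never allows.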
BinExport doesn't export the raw binary path in the protobuf message, so either we have to guess it or we can accept it as a parameter in qbindiff. The latter would be better, of course, but then we lose the option to diff just two .BinExport files without the raw binaries.
In any case without the raw binary path we cannot recreate the .BinDiff sqlite database.
While trying to use qbindiff on an ARM32 Thumb program, I got the following exception:
Cannot guess the instruction set of the instruction at 0x....
I fixed the issue by hard-coding the mode and arch in the file qbindiff/loader/backend/binexport.py, but it would be nice to let users define the arch and mode when they know them, something like:
differ = qbindiff.QBinDiff(
p, q,
distance=Distance.canberra,
...,
arch="ARM-32",
mode="THUMB"
)