trailofbits / protofuzz
Google Protocol Buffers message generator
Home Page: https://blog.trailofbits.com/2016/05/18/protofuzz-a-protobuf-fuzzer/
License: MIT License
This appears to be related to the usage of path.split(':') rather than path.split(os.pathsep) (Windows splits paths on ;, not :).
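A minimal sketch of the portable split described above. The helper name split_search_path is hypothetical and is not part of protofuzz; only os.pathsep comes from the standard library.

```python
import os
from typing import List


def split_search_path(path: str) -> List[str]:
    """Split a PATH-style string using the platform's own separator."""
    # os.pathsep is ';' on Windows and ':' on POSIX systems, so a drive
    # letter such as 'C:' is not mistaken for a separator on Windows.
    return [entry for entry in path.split(os.pathsep) if entry]


example = os.pathsep.join(["first_dir", "second_dir"])
print(split_search_path(example))
```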
This may be a misunderstanding of how protobufs work.
If I have a compiled protobuf module, I can try to import the Python file as a module, but I get an error:
message_fuzzers = protofuzz.from_file("/Users/Spellchaser/PycharmProjects/test/module_pb2.py")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/Spellchaser/PycharmProjects/test/venv/lib/python3.7/site-packages/protofuzz/protofuzz.py", line 251, in from_file
module = pbimport.from_file(protobuf_file)
File "/Users/Spellchaser/PycharmProjects/test/venv/lib/python3.7/site-packages/protofuzz/pbimport.py", line 114, in from_file
raise BadProtobuf()
protofuzz.pbimport.BadProtobuf
While creating a dictionary of generators from a file by calling protofuzz.from_file(file.proto), the file is compiled using _compile_proto(full_path, dest) in pbimport.py.
Assuming file.proto:
syntax = "proto2";
import "file2.proto";
message Test {
  required Comment c = 1;
}
and file2.proto:
syntax = "proto2";
message Comment {
  required string s = 1;
}
Issue: only file.proto is compiled to file_pb2.py, while file2.proto is ignored and the module can't be loaded due to a missing import.
(Hacky) Workaround:
I compiled file.proto and file2.proto manually,
protoc --python_out=. file.proto
protoc --python_out=. file2.proto
keeping the outputs file_pb2.py and file2_pb2.py in the same directory, and then got the module and generator as follows:
module = pbimport._load_module(os.path.abspath('file_pb2.py'))
generator = protofuzz._module_to_generators(module)
Basically, the only thing missing is the compilation of all imported proto files. Another potential issue could be if the imported file is situated in a different folder (e.g. import dir/file2.proto), which would require the creation of a similar folder structure when compiling the files if I'm not mistaken, but I didn't look into it too much.
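One way to find every file that needs compiling is to follow the import statements recursively. The sketch below is an illustration, not protofuzz code: collect_proto_files is a hypothetical helper, and it assumes imports resolve relative to the importing file's directory, which is a simplification of protoc's --proto_path lookup.

```python
import re
from pathlib import Path
from typing import List, Optional, Set

# Matches: import "x.proto";  import public "x.proto";  import weak "x.proto";
_IMPORT_RE = re.compile(
    r'^\s*import\s+(?:public\s+|weak\s+)?"([^"]+)"\s*;', re.MULTILINE
)


def collect_proto_files(entry: Path, seen: Optional[Set[Path]] = None) -> List[Path]:
    """Return entry plus every .proto file it imports, dependencies first."""
    seen = set() if seen is None else seen
    entry = entry.resolve()
    if entry in seen:  # already visited; also guards against import cycles
        return []
    seen.add(entry)
    ordered: List[Path] = []
    for name in _IMPORT_RE.findall(entry.read_text()):
        dependency = entry.parent / name
        # Assumption: imports that are not found next to the importing
        # file (for example well-known types) are silently skipped.
        if dependency.exists():
            ordered.extend(collect_proto_files(dependency, seen))
    ordered.append(entry)
    return ordered
```

Each returned path could then be handed to protoc in the same way as in the manual workaround.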
We should do this in the CI.
Hello there!
I have a message in proto file with map parameter:
message BlahRequest {
  enum BlaDataType {
    BLA = 0;
    BLA1 = 1;
    BLA2 = 2;
  }
  BlaDataType Type = 1;
  string Identity = 2;
  string UserSid = 3;
  string ParametersKey = 4;
  map<string, string> Parameters = 5;
  bool BlaDataTables = 6;
  bool BlaIsBla = 7;
}
I'm trying to generate payloads from this proto file like this:
message_fuzzers = protofuzz.from_file(proto_file)
for obj in message_fuzzers[self.method_name].permute():
    do_some()
but I got an error:
Exception has occurred: RuntimeError
Unsupported type: <class 'google.protobuf.pyext._message.ScalarMapContainer'>
Am I doing something wrong?
There is an obvious performance bottleneck: both _fuzzdb_get_strings and _fuzzdb_integers read the entire fuzzdb files from disk on every call.
for subdir in os.listdir(_get_fuzzdb_path()):
    if subdir in ignored:
        continue
    subdir_abs_path = _get_fuzzdb_path() / Path(subdir)
    try:
        listing = os.listdir(subdir_abs_path)
    except NotADirectoryError:
        continue
    for filename in listing:
        if not filename.endswith(".txt"):
            continue
        subdir_abs_path_filename = subdir_abs_path / Path(filename)
        with open(subdir_abs_path_filename, "rb") as source:
            for line in source:
                string = line.decode("utf-8").strip()
                if not string or string.startswith("#"):
                    continue
                if max_len != 0 and len(line) > max_len:
                    continue
                yield string
Because the files are re-read on every call, a much better strategy is to load all of the values into a list once and then serve them from RAM instead of reading them from disk each time.
strings = []
def _fuzzdb_get_strings(max_len: int = 0) -> Generator:
    """Return strings from fuzzdb."""
    if strings == []:
        ignored = ["integer-overflow"]
        for subdir in os.listdir(_get_fuzzdb_path()):
            if subdir in ignored:
                continue
            subdir_abs_path = _get_fuzzdb_path() / Path(subdir)
            try:
                listing = os.listdir(subdir_abs_path)
            except NotADirectoryError:
                continue
            for filename in listing:
                if not filename.endswith(".txt"):
                    continue
                subdir_abs_path_filename = subdir_abs_path / Path(filename)
                with open(subdir_abs_path_filename, "rb") as source:
                    for line in source:
                        string = line.decode("utf-8").strip()
                        if not string or string.startswith("#"):
                            continue
                        if max_len != 0 and len(line) > max_len:
                            continue
                        strings.append(string)
                        yield string
    else:
        for string in strings:
            yield string
We can also obviously do the same for the integers:
integers = []
def _fuzzdb_integers(limit: int = 0) -> Generator:
    """Return integers from fuzzdb."""
    if integers == []:
        path = _get_fuzzdb_path() / Path("integer-overflow/integer-overflows.txt")
        with open(path, "rb") as stream:
            for line in _limit_helper(stream, limit):
                integer = int(line.decode("utf-8"), 0)
                integers.append(integer)
                yield integer
    else:
        for integer in integers:
            yield integer
When parsing around four thousand protobuf messages and then mutating them, the cProfile output was originally this:
['60786110', '50.667', '0.000', '100.909', '0.000', 'values.py:90(_fuzzdb_get_strings)']
['60973618/60973382', '15.971', '0.000', '116.882', '0.000', 'values.py:72(_limit_helper)']
['62370202', '10.454', '0.000', '10.454', '0.000', '{method', "'decode'", 'of', "'bytes'", 'objects}']
['60981956', '8.501', '0.000', '8.501', '0.000', '{method', "'startswith'", 'of', "'str'", 'objects}']
The second column is the total time spent in the function itself.
After the optimization it is now this:
['60790067/60790036', '12.243', '0.000', '17.458', '0.000', 'values.py:69(_limit_helper)']
['60786110', '5.203', '0.000', '5.215', '0.000', 'values.py:97(_fuzzdb_get_strings)']
['4630', '4.368', '0.001', '20.797', '0.004', 'protofuzz.py:48(_string_generator)']
['265', '0.984', '0.004', '2.028', '0.008', 'protofuzz.py:55(<listcomp>)']
['371565/34403', '0.318', '0.000', '0.875', '0.000', 'gen.py:188(step_generator)']
['48027', '0.286', '0.000', '0.482', '0.000', 'python_message.py:469(init)']
So just by making this simple optimization I have cut the processing time by at least 45 seconds.
I have attached a .zip file containing a .diff file that implements this. Apply it to commit a4fd093 and you are done.
stuff.zip
What a great tool! I was glad to see that the protobuf descriptor API is close enough for Protobuf v3 that I was able to use protofuzz on the definitions from my Protobuf v3 project. In v3 all nested messages are optional, and we're making a lot of use of this to communicate "no value", but it seems that protofuzz wants to populate all fields. I messed around with it for a while and this implementation was the only way I was able to make it work, which is maybe less than ideal. In particular I'd guess that this behavior should be optional. I also haven't tested it against protobuf v2, where I think you would probably need a little more code to distinguish between required and optional messages.
protofuzz.values currently uses the pkg_resources module from Setuptools to load package resources (i.e., fuzzdb assets). These APIs apparently don't make closing their underlying file handles easy, so we should switch to importlib for resource handling.
Example of the warning spew caused by the current pkg_resources usage:
/Users/william/devel/protofuzz/protofuzz/values.py:56: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/william/devel/protofuzz/protofuzz/fuzzdb/attack/http-protocol/user-agents.txt'>
source = _open_fuzzdb_file(path)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/Users/william/devel/protofuzz/protofuzz/values.py:56: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/william/devel/protofuzz/protofuzz/fuzzdb/attack/http-protocol/http-protocol-methods.txt'>
source = _open_fuzzdb_file(path)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/Users/william/devel/protofuzz/protofuzz/values.py:56: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/william/devel/protofuzz/protofuzz/fuzzdb/attack/http-protocol/crlf-injection.txt'>
source = _open_fuzzdb_file(path)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/Users/william/devel/protofuzz/protofuzz/values.py:56: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/william/devel/protofuzz/protofuzz/fuzzdb/attack/http-protocol/hpp.txt'>
source = _open_fuzzdb_file(path)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/Users/william/devel/protofuzz/protofuzz/values.py:56: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/william/devel/protofuzz/protofuzz/fuzzdb/attack/business-logic/DebugParams.Json.fuzz.txt'>
source = _open_fuzzdb_file(path)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/Users/william/devel/protofuzz/protofuzz/values.py:56: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/william/devel/protofuzz/protofuzz/fuzzdb/attack/business-logic/CommonDebugParamNames.txt'>
source = _open_fuzzdb_file(path)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/Users/william/devel/protofuzz/protofuzz/values.py:22: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/william/devel/protofuzz/protofuzz/fuzzdb/attack/business-logic/CommonMethodNames.txt'>
for value in stream:
ResourceWarning: Enable tracemalloc to get the object allocation traceback
....../Users/william/devel/protofuzz/protofuzz/values.py:90: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/william/devel/protofuzz/protofuzz/fuzzdb/attack/integer-overflow/integer-overflows.txt'>
for num in _fuzzdb_integers(limit):
ResourceWarning: Enable tracemalloc to get the object allocation traceback
......./Users/william/devel/protofuzz/protofuzz/tests/test_values.py:16: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/william/devel/protofuzz/protofuzz/fuzzdb/attack/rfi/rfi.txt'>
vals = list(values.get_strings(limit=10))
ResourceWarning: Enable tracemalloc to get the object allocation traceback
.
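A rough sketch of what resource loading with importlib could look like. It assumes Python 3.9 or newer for importlib.resources.files; read_resource_lines and the throwaway demo_assets package are hypothetical and exist only to make the example runnable.

```python
import sys
import tempfile
from importlib import resources
from pathlib import Path
from typing import List


def read_resource_lines(package: str, resource: str) -> List[bytes]:
    """Read a packaged file; the with-block closes the handle on exit."""
    target = resources.files(package).joinpath(resource)
    with target.open("rb") as stream:
        return [line.rstrip(b"\n") for line in stream]


# Build a temporary package on disk so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    package_dir = Path(tmp) / "demo_assets"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")
    (package_dir / "sample.txt").write_bytes(b"alpha\nbeta\n")
    sys.path.insert(0, tmp)
    try:
        lines = read_resource_lines("demo_assets", "sample.txt")
    finally:
        sys.path.remove(tmp)

print(lines)
```

Since the file object is closed when the with-block exits, this pattern should not leave unclosed handles behind for ResourceWarning to report.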
Steps to reproduce:
from protofuzz import protofuzz
Expected:
No error
Actual:
AttributeError: '_NamespacePath' object has no attribute 'sort'
Notes:
IPython with Python 3.5.2
py3-protobuffers==3.0.0a4
Python 3.6.7
protoc --version
libprotoc 3.5.1
protobuf==3.7.1
protofuzz==0.1
py3-protobuffers==3.0.0a4
>>> from protofuzz import protofuzz
>>> message_fuzzers = protofuzz.from_description_string("""
... message Address {
... required int32 house = 1;
... required string street = 2;
... }
... """)
KeyError: "Couldn't find field google.protobuf.FileOptions.javanano_use_deprecated_package"
When cloning the latest fuzzdb, the loading mechanism fails due to README.md in the attack dir:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/Spellchaser/PycharmProjects/test/venv/lib/python3.7/site-packages/protofuzz/protofuzz.py", line 173, in _iteration_helper
generator = descriptor_to_generator(self._descriptor, iter_class)
File "/Users/Spellchaser/PycharmProjects/test/venv/lib/python3.7/site-packages/protofuzz/protofuzz.py", line 113, in descriptor_to_generator
generator = _prototype_to_generator(descriptor, cls)
File "/Users/Spellchaser/PycharmProjects/test/venv/lib/python3.7/site-packages/protofuzz/protofuzz.py", line 92, in _prototype_to_generator
generator = _string_generator(descriptor)
File "/Users/Spellchaser/PycharmProjects/test/venv/lib/python3.7/site-packages/protofuzz/protofuzz.py", line 49, in _string_generator
vals = list(values.get_strings(max_length, limit))
File "/Users/Spellchaser/PycharmProjects/test/venv/lib/python3.7/site-packages/protofuzz/values.py", line 23, in _limit_helper
for value in stream:
File "/Users/Spellchaser/PycharmProjects/test/venv/lib/python3.7/site-packages/protofuzz/values.py", line 49, in _fuzzdb_get_strings
listing = pkg_resources.resource_listdir('protofuzz', path)
File "/Users/Spellchaser/PycharmProjects/test/venv/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1163, in resource_listdir
resource_name
File "/Users/Spellchaser/PycharmProjects/test/venv/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1441, in resource_listdir
return self._listdir(self._fn(self.module_path, resource_name))
File "/Users/Spellchaser/PycharmProjects/test/venv/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1610, in _listdir
return os.listdir(path)
NotADirectoryError: [Errno 20] Not a directory: '/Users/Spellchaser/PycharmProjects/test/venv/lib/python3.7/site-packages/protofuzz/fuzzdb/attack/README.md'
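The failure comes from treating every entry under the attack directory as a subdirectory. A minimal sketch of a listing that skips stray files such as README.md, using a hypothetical helper iter_attack_files and plain pathlib rather than pkg_resources:

```python
from pathlib import Path
from typing import Iterator


def iter_attack_files(attack_root: Path) -> Iterator[Path]:
    """Yield .txt files one level below attack_root, skipping stray files."""
    for entry in sorted(attack_root.iterdir()):
        if not entry.is_dir():
            # For example a README.md sitting next to the category folders.
            continue
        for candidate in sorted(entry.iterdir()):
            if candidate.is_file() and candidate.suffix == ".txt":
                yield candidate
```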
The program just reads fuzzdb and assigns values to protobuf fields randomly, right? That means protofuzz can't be used for real fuzz testing; it simply scans the target program.
Getting a trace dump from trying to call permute() for a class that contains a repeated field:
Traceback (most recent call last):
File "test_protofuzz.py", line 27, in <module>
gen_message_from_class(Test)
File "test_protofuzz.py", line 22, in gen_message_from_class
for obj in generator.permute():
File "/Users/goyaogo/venvs/test_msg/lib/python3.6/site-packages/protofuzz-0.1-py3.6.egg/protofuzz/protofuzz.py", line 183, in _iteration_helper
yield _fields_to_object(self._descriptor, fields)
File "/Users/goyaogo/venvs/test_msg/lib/python3.6/site-packages/protofuzz-0.1-py3.6.egg/protofuzz/protofuzz.py", line 149, in _fields_to_object
value = _fields_to_object(subtype, value)
File "/Users/goyaogo/venvs/test_msg/lib/python3.6/site-packages/protofuzz-0.1-py3.6.egg/protofuzz/protofuzz.py", line 150, in _fields_to_object
_assign_to_field(obj, name, value)
File "/Users/goyaogo/venvs/test_msg/lib/python3.6/site-packages/protofuzz-0.1-py3.6.egg/protofuzz/protofuzz.py", line 138, in _assign_to_field
raise RuntimeError("Unsupported type: {}".format(type(target)))
RuntimeError: Unsupported type: <class 'google.protobuf.pyext._message.RepeatedScalarContainer'>
Repro:
from protofuzz import protofuzz
fuzzers = protofuzz.from_description_string("""
message A {
  optional B b = 1;
}
message B {
  optional A a = 1;
}
""")
for obj in fuzzers['A'].permute():
    print(obj)
Result:
...lots of stuff...
egg/protofuzz/protofuzz.py", line 113, in descriptor_to_generator
generator = _prototype_to_generator(descriptor, cls)
File "/Users/ezyang/Dev/onnx-env/lib/python3.6/site-packages/protofuzz-0.1-py3.6.egg/protofuzz/protofuzz.py", line 100, in _prototype_to_generator
generator = descriptor_to_generator(descriptor.message_type, cls)
File "/Users/ezyang/Dev/onnx-env/lib/python3.6/site-packages/protofuzz-0.1-py3.6.egg/protofuzz/protofuzz.py", line 113, in descriptor_to_generator
generator = _prototype_to_generator(descriptor, cls)
File "/Users/ezyang/Dev/onnx-env/lib/python3.6/site-packages/protofuzz-0.1-py3.6.egg/protofuzz/protofuzz.py", line 100, in _prototype_to_generator
generator = descriptor_to_generator(descriptor.message_type, cls)
File "/Users/ezyang/Dev/onnx-env/lib/python3.6/site-packages/protofuzz-0.1-py3.6.egg/protofuzz/protofuzz.py", line 113, in descriptor_to_generator
generator = _prototype_to_generator(descriptor, cls)
File "/Users/ezyang/Dev/onnx-env/lib/python3.6/site-packages/protofuzz-0.1-py3.6.egg/protofuzz/protofuzz.py", line 83, in _prototype_to_generator
if descriptor.type in ints32+ints64:
RecursionError: maximum recursion depth exceeded in comparison
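The traceback shows descriptor_to_generator and _prototype_to_generator calling each other without end, because A contains B and B contains A. One possible guard is a depth limit on nested messages. The sketch below illustrates the idea on a plain dictionary instead of real protobuf descriptors; Schema, expand, and max_depth are all hypothetical names, not protofuzz APIs.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for descriptors: message name -> [(field, type)]
Schema = Dict[str, List[Tuple[str, str]]]


def expand(message: str, schema: Schema, max_depth: int, depth: int = 0) -> Optional[dict]:
    """Expand a message into nested fields, cutting recursion at max_depth."""
    if depth >= max_depth:
        return None  # leave the optional sub-message unset
    result = {}
    for name, type_name in schema[message]:
        if type_name in schema:
            # Nested message: recurse one level deeper.
            result[name] = expand(type_name, schema, max_depth, depth + 1)
        else:
            # Scalar field: keep the type name as a placeholder value.
            result[name] = type_name
    return result


schema = {"A": [("b", "B")], "B": [("a", "A")]}
print(expand("A", schema, max_depth=3))
```

A depth limit only works when the recursive fields are optional, as they are in the repro above, since a required field could not be left unset.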