trailofbits / protofuzz

Google Protocol Buffers message generator

Home Page: https://blog.trailofbits.com/2016/05/18/protofuzz-a-protobuf-fuzzer/

License: MIT License

Python 100.00%
fuzzer protocol-buffers protobuf

protofuzz's Introduction

ProtoFuzz

ProtoFuzz is a generic fuzzer for Google’s Protocol Buffers format. Instead of defining a new fuzzer generator for custom binary formats, protofuzz automatically creates a fuzzer based on the same format definition that programs use. ProtoFuzz is implemented as a stand-alone Python3 program.

Installation

Make sure the protobuf package is installed, that protoc is accessible from $PATH, and that protoc can generate Python 3-compatible code.

$ git clone --recursive git@github.com:trailofbits/protofuzz.git
$ cd protofuzz
$ python3 setup.py install

Usage

>>> from protofuzz import protofuzz
>>> message_fuzzers = protofuzz.from_description_string("""
...     message Address {
...      required int32 house = 1;
...      required string street = 2;
...     }
... """)
>>> for obj in message_fuzzers['Address'].permute():
...     print("Generated object: {}".format(obj))
...
Generated object: house: -1
street: "!"

Generated object: house: 0
street: "!"

Generated object: house: 256
street: "!"
...

You can also create dependencies between arbitrary fields that are resolved with any callable object:

>>> message_fuzzers = protofuzz.from_description_string("""
...     message Address {
...      required int32 house = 1;
...      required string street = 2;
...     }
...     message Other {
...       required Address addr = 1;
...       required uint32 foo = 2;
...     }
... """)
>>> fuzzer = message_fuzzers['Other']
>>> # The following creates a dependency that ensures Other.foo is always set
>>> # to 1 greater than Other.addr.house
>>> fuzzer.add_dependency('foo', 'addr.house', lambda x: x+1)
>>> for obj in fuzzer.permute():
...     print("Generated object: {}".format(obj))

Note, however, that the values your callable returns must conform to the destination field's type.
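As an illustration of that constraint: `Other.foo` is a `uint32`, while `Other.addr.house` is an `int32` and may be negative, so a plain `x + 1` can fall outside the destination range. The sketch below clamps the result into the `uint32` range. The helper name `house_plus_one` and the choice to clamp (rather than wrap or skip) are assumptions made for this example, not part of protofuzz.

```python
UINT32_MAX = 2**32 - 1


def house_plus_one(house: int) -> int:
    """Return house + 1, clamped to the valid uint32 range [0, 2**32 - 1].

    Hypothetical helper for illustration; clamping is one possible policy.
    """
    return min(max(house + 1, 0), UINT32_MAX)
```

A callable like this could be passed in place of the lambda, for example `fuzzer.add_dependency('foo', 'addr.house', house_plus_one)`.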

Caveats

Currently, we use fuzzdb for values. This might not be complete or appropriate for your use. Consider swapping it for your own values.

If you have your own separate instance of fuzzdb, you can export FUZZDB_DIR in your environment with the absolute path to your instance.

export FUZZDB_DIR=/path/to/your/fuzzdb

protofuzz's People

Contributors

aergonus, bartek1912, binaryflesh, dependabot[bot], dguido, esultanik, jscanlannyc, woodruffw, xulaus, yan

protofuzz's Issues

Cannot import protofuzz library

Steps to reproduce:

from protofuzz import protofuzz

Expected:
No error
Actual:

AttributeError: '_NamespacePath' object has no attribute 'sort'

Notes:
IPython with Python 3.5.2
py3-protobuffers==3.0.0a4

Compiling proto-file and loading module which contains other proto-file imports

While creating a dictionary of generators from file by calling protofuzz.from_file(file.proto), the file is compiled using _compile_proto(full_path, dest) in pbimport.py.

Assuming file.proto:

syntax = "proto2";
import "file2.proto";

message Test {
    required Comment c = 1; 
}

and file2.proto:

syntax = "proto2";

message Comment {
    required string s = 1; 
}

Issue: only file.proto is compiled to file_pb2.py while file2.proto is ignored and the module can't be loaded due to a missing import.

(Hacky) Workaround:
I compiled file.proto and file2.proto manually,

protoc --python_out=. file.proto
protoc --python_out=. file2.proto

keeping the outputs file_pb2.py and file2_pb2.py in the same directory and then got the module and generator as follows:

    module = pbimport._load_module(os.path.abspath('file_pb2.py'))
    generator = protofuzz._module_to_generators(module)

Basically, the only thing missing is the compilation of all imported proto-files. Another potential issue could be if the import file is situated in a different folder (e.g. import dir/file2.proto), which would require the creation of a similar folder structure when compiling the files if I'm not mistaken, but I didn't look into it too much.
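One way to approach the missing step is to collect the transitive imports of a .proto file before compiling. The following is a minimal sketch under stated assumptions: the function name `collect_proto_files`, the regular expression, and the single `include_dir` search root are all hypothetical and are not taken from protofuzz. It does not parse comments or handle multiple include paths.

```python
import re
from pathlib import Path
from typing import List, Optional, Set

# Matches lines such as: import "file2.proto";  or  import public "dir/x.proto";
IMPORT_RE = re.compile(
    r'^\s*import\s+(?:public\s+|weak\s+)?"([^"]+)"\s*;', re.MULTILINE
)


def collect_proto_files(
    entry: Path, include_dir: Path, seen: Optional[Set[Path]] = None
) -> List[Path]:
    """Return entry plus every .proto it imports transitively, dependencies first."""
    seen = set() if seen is None else seen
    entry = entry.resolve()
    if entry in seen:
        return []  # already visited; also guards against import cycles
    seen.add(entry)

    ordered: List[Path] = []
    for name in IMPORT_RE.findall(entry.read_text()):
        dependency = include_dir / name
        # Imports that are not found under include_dir are skipped here.
        if dependency.exists():
            ordered.extend(collect_proto_files(dependency, include_dir, seen))
    ordered.append(entry)
    return ordered
```

Each returned path could then be compiled in the same way as the manual workaround, with `--proto_path` pointing at `include_dir` so that imports in subdirectories resolve.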

Unsupported type: <class 'google.protobuf.pyext._message.ScalarMapContainer'>

Hello there!

I have a message in a proto file with a map field:

message BlahRequest {
	enum BlaDataType {
		BLA = 0;
		BLA1 = 1;
		BLA2 = 2;
	}
	BlaDataType Type = 1;
	string Identity = 2;
	string UserSid = 3;
	string ParametersKey = 4;
	map<string, string> Parameters = 5;
	bool BlaDataTables = 6;
	bool BlaIsBla= 7;
}

I'm trying to generate payloads from this proto file like this:

message_fuzzers = protofuzz.from_file(proto_file)
for obj in message_fuzzers[self.method_name].permute():
    do_some()

but I got an error:

Exception has occurred: RuntimeError
Unsupported type: <class 'google.protobuf.pyext._message.ScalarMapContainer'>

Am I doing something wrong?

Allow generators which don't populate optional fields

What a great tool! I was glad to see that the protobuf descriptor API is close enough for Protobuf v3 that I was able to use protofuzz on the definitions from my Protobuf v3 project. In v3 all nested messages are optional, and we're making a lot of use of this to communicate "no value", but it seems that protofuzz wants to populate all fields. I messed around with it for a while and this implementation was the only way I was able to make it work, which is maybe less than ideal. In particular I'd guess that this behavior should be optional. I also haven't tested it against protobuf v2, where I think you would probably need a little more code to distinguish between required and optional messages.

Fix resource leaks caused by `pkg_resources`

protofuzz.values currently uses the pkg_resources module from Setuptools to load package resources (i.e., fuzzdb assets). These APIs apparently don't make closing their underlying file handles easy, so we should switch to importlib for resource handling.

Example of the warning spew caused by the current pkg_resources usage:

/Users/william/devel/protofuzz/protofuzz/values.py:56: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/william/devel/protofuzz/protofuzz/fuzzdb/attack/http-protocol/user-agents.txt'>
  source = _open_fuzzdb_file(path)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/Users/william/devel/protofuzz/protofuzz/values.py:56: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/william/devel/protofuzz/protofuzz/fuzzdb/attack/http-protocol/http-protocol-methods.txt'>
  source = _open_fuzzdb_file(path)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/Users/william/devel/protofuzz/protofuzz/values.py:56: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/william/devel/protofuzz/protofuzz/fuzzdb/attack/http-protocol/crlf-injection.txt'>
  source = _open_fuzzdb_file(path)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/Users/william/devel/protofuzz/protofuzz/values.py:56: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/william/devel/protofuzz/protofuzz/fuzzdb/attack/http-protocol/hpp.txt'>
  source = _open_fuzzdb_file(path)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/Users/william/devel/protofuzz/protofuzz/values.py:56: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/william/devel/protofuzz/protofuzz/fuzzdb/attack/business-logic/DebugParams.Json.fuzz.txt'>
  source = _open_fuzzdb_file(path)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/Users/william/devel/protofuzz/protofuzz/values.py:56: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/william/devel/protofuzz/protofuzz/fuzzdb/attack/business-logic/CommonDebugParamNames.txt'>
  source = _open_fuzzdb_file(path)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/Users/william/devel/protofuzz/protofuzz/values.py:22: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/william/devel/protofuzz/protofuzz/fuzzdb/attack/business-logic/CommonMethodNames.txt'>
  for value in stream:
ResourceWarning: Enable tracemalloc to get the object allocation traceback
....../Users/william/devel/protofuzz/protofuzz/values.py:90: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/william/devel/protofuzz/protofuzz/fuzzdb/attack/integer-overflow/integer-overflows.txt'>
  for num in _fuzzdb_integers(limit):
ResourceWarning: Enable tracemalloc to get the object allocation traceback
......./Users/william/devel/protofuzz/protofuzz/tests/test_values.py:16: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/william/devel/protofuzz/protofuzz/fuzzdb/attack/rfi/rfi.txt'>
  vals = list(values.get_strings(limit=10))
ResourceWarning: Enable tracemalloc to get the object allocation traceback
.
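As a rough sketch of the proposed direction, a packaged file can be read through `importlib.resources` inside a `with` block so that the handle is closed deterministically. This assumes Python 3.9 or later (for `importlib.resources.files`); the helper name `read_resource_lines` is hypothetical, and the example reads a file from the standard-library `json` package only so that it runs without protofuzz installed.

```python
from importlib import resources
from typing import List


def read_resource_lines(package: str, resource: str) -> List[bytes]:
    """Read a packaged file eagerly and return its lines.

    The with-block closes the handle before returning, so no open
    file outlives the call.
    """
    with resources.files(package).joinpath(resource).open("rb") as stream:
        return stream.read().splitlines()


# Demonstration against a stdlib package; a real caller would name its
# own package and resource path instead.
lines = read_resource_lines("json", "decoder.py")
```

Reading eagerly trades some memory for simplicity: a lazy generator would keep the file open for as long as the caller holds the iterator.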

Improve performance to limit file reads when generating fuzzed values.

There is an obvious performance bottleneck: both _fuzzdb_get_strings and _fuzzdb_integers re-read the fuzzdb files from disk on every call.

        for subdir in os.listdir(_get_fuzzdb_path()):
            if subdir in ignored:
                continue
            subdir_abs_path = _get_fuzzdb_path() / Path(subdir)
            try:
                listing = os.listdir(subdir_abs_path)
            except NotADirectoryError:
                continue
            for filename in listing:
                if not filename.endswith(".txt"):
                    continue
                subdir_abs_path_filename = subdir_abs_path / Path(filename)
                with open(subdir_abs_path_filename, "rb") as source:
                    for line in source:
                        string = line.decode("utf-8").strip()
                        if not string or string.startswith("#"):
                            continue
                        if max_len != 0 and len(line) > max_len:
                            continue
                        yield string

Because the files are read on every call, a much better strategy is to load all of the values into a list once and then serve later calls from that list in RAM instead of reading from disk each time.

strings = []

def _fuzzdb_get_strings(max_len: int = 0) -> Generator:
    """Return strings from fuzzdb."""

    if strings == []:
        ignored = ["integer-overflow"]
        for subdir in os.listdir(_get_fuzzdb_path()):
            if subdir in ignored:
                continue
            subdir_abs_path = _get_fuzzdb_path() / Path(subdir)
            try:
                listing = os.listdir(subdir_abs_path)
            except NotADirectoryError:
                continue
            for filename in listing:
                if not filename.endswith(".txt"):
                    continue
                subdir_abs_path_filename = subdir_abs_path / Path(filename)
                with open(subdir_abs_path_filename, "rb") as source:
                    for line in source:
                        string = line.decode("utf-8").strip()
                        if not string or string.startswith("#"):
                            continue
                        if max_len != 0 and len(line) > max_len:
                            continue
                        strings.append(string)
                        yield string
    else:
        for string in strings:
            yield string

We can also obviously do the same for the integers:


integers = []

def _fuzzdb_integers(limit: int = 0) -> Generator:
    """Return integers from fuzzdb."""
    if integers == []:

        path = _get_fuzzdb_path() / Path("integer-overflow/integer-overflows.txt")
        with open(path, "rb") as stream:
            for line in _limit_helper(stream, limit):
                integer = int(line.decode("utf-8"), 0)
                integers.append(integer)
                yield integer
    else:
        for integer in integers:
            yield integer

When parsing around four thousand protobuf messages and then mutating them, the original cProfile output was:

['60786110', '50.667', '0.000', '100.909', '0.000', 'values.py:90(_fuzzdb_get_strings)']
['60973618/60973382', '15.971', '0.000', '116.882', '0.000', 'values.py:72(_limit_helper)']
['62370202', '10.454', '0.000', '10.454', '0.000', '{method', "'decode'", 'of', "'bytes'", 'objects}']
['60981956', '8.501', '0.000', '8.501', '0.000', '{method', "'startswith'", 'of', "'str'", 'objects}']

The second column is the total time spent in the function itself (cProfile's tottime, which excludes sub-calls).

After the optimization, the profile is:

['60790067/60790036', '12.243', '0.000', '17.458', '0.000', 'values.py:69(_limit_helper)']
['60786110', '5.203', '0.000', '5.215', '0.000', 'values.py:97(_fuzzdb_get_strings)']
['4630', '4.368', '0.001', '20.797', '0.004', 'protofuzz.py:48(_string_generator)']
['265', '0.984', '0.004', '2.028', '0.008', 'protofuzz.py:55(<listcomp>)']
['371565/34403', '0.318', '0.000', '0.875', '0.000', 'gen.py:188(step_generator)']
['48027', '0.286', '0.000', '0.482', '0.000', 'python_message.py:469(init)']

This simple optimization cuts the time spent in _fuzzdb_get_strings by about 45 seconds (from 50.667 s to 5.203 s).

I have attached a .zip file containing a .diff file that implements the change. It applies to commit a4fd093.
stuff.zip
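One caveat with the module-level list approach above is that the cache is filled while the first call's max_len filter is active, and a generator that is only partially consumed leaves the cache incomplete. A possible variation, sketched here under assumptions, loads everything once and applies max_len on each call. The names `_load_strings` and `cached_strings` are hypothetical, the directory walk is simplified to a recursive search for .txt files, and the integer-overflow exclusion from the original is omitted.

```python
from functools import lru_cache
from pathlib import Path
from typing import Iterator, Tuple


@lru_cache(maxsize=None)
def _load_strings(root: str) -> Tuple[str, ...]:
    """Read every non-empty, non-comment line from .txt files under root.

    Runs once per distinct root; later calls return the cached tuple.
    """
    found = []
    for path in sorted(Path(root).rglob("*.txt")):
        with open(path, "rb") as source:
            for line in source:
                text = line.decode("utf-8", errors="replace").strip()
                if text and not text.startswith("#"):
                    found.append(text)
    return tuple(found)


def cached_strings(root: str, max_len: int = 0) -> Iterator[str]:
    """Yield cached strings, applying the max_len filter on every call."""
    for text in _load_strings(root):
        if max_len and len(text) > max_len:
            continue
        yield text
```

Because the full list is loaded before anything is yielded, calls with different max_len values see consistent results.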

No Windows support

This appears to be related to the usage of path.split(':'), rather than path.split(os.pathsep) (Windows path splits on ;, not :).
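A small sketch of the portable form: `os.pathsep` is `:` on POSIX systems and `;` on Windows, so splitting on it handles both. The helper name `split_search_path` is hypothetical and only illustrates the idea.

```python
import os
from typing import List


def split_search_path(value: str) -> List[str]:
    """Split a PATH-style string using the platform's separator, dropping empty entries."""
    return [entry for entry in value.split(os.pathsep) if entry]


# Example: split the current PATH (an empty list if PATH is unset).
entries = split_search_path(os.environ.get("PATH", ""))
```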

Cannot handle message that contains repeated fields

Calling permute() on a message class that contains a repeated field produces the following traceback:

Traceback (most recent call last):
  File "test_protofuzz.py", line 27, in <module>
    gen_message_from_class(Test)
  File "test_protofuzz.py", line 22, in gen_message_from_class
    for obj in generator.permute():
  File "/Users/goyaogo/venvs/test_msg/lib/python3.6/site-packages/protofuzz-0.1-py3.6.egg/protofuzz/protofuzz.py", line 183, in _iteration_helper
    yield _fields_to_object(self._descriptor, fields)
  File "/Users/goyaogo/venvs/test_msg/lib/python3.6/site-packages/protofuzz-0.1-py3.6.egg/protofuzz/protofuzz.py", line 149, in _fields_to_object
    value = _fields_to_object(subtype, value)
  File "/Users/goyaogo/venvs/test_msg/lib/python3.6/site-packages/protofuzz-0.1-py3.6.egg/protofuzz/protofuzz.py", line 150, in _fields_to_object
    _assign_to_field(obj, name, value)
  File "/Users/goyaogo/venvs/test_msg/lib/python3.6/site-packages/protofuzz-0.1-py3.6.egg/protofuzz/protofuzz.py", line 138, in _assign_to_field
    raise RuntimeError("Unsupported type: {}".format(type(target)))
RuntimeError: Unsupported type: <class 'google.protobuf.pyext._message.RepeatedScalarContainer'>

Cannot handle recursive protobufs

Repro:

from protofuzz import protofuzz

fuzzers = protofuzz.from_description_string("""
    message A {
     optional B b = 1;
    }
    message B {
     optional A a = 1;
    }
""")
for obj in fuzzers['A'].permute():
    print(obj)

Result:

...lots of stuff...
egg/protofuzz/protofuzz.py", line 113, in descriptor_to_generator
    generator = _prototype_to_generator(descriptor, cls)
  File "/Users/ezyang/Dev/onnx-env/lib/python3.6/site-packages/protofuzz-0.1-py3.6.egg/protofuzz/protofuzz.py", line 100, in _prototype_to_generator
    generator = descriptor_to_generator(descriptor.message_type, cls)
  File "/Users/ezyang/Dev/onnx-env/lib/python3.6/site-packages/protofuzz-0.1-py3.6.egg/protofuzz/protofuzz.py", line 113, in descriptor_to_generator
    generator = _prototype_to_generator(descriptor, cls)
  File "/Users/ezyang/Dev/onnx-env/lib/python3.6/site-packages/protofuzz-0.1-py3.6.egg/protofuzz/protofuzz.py", line 100, in _prototype_to_generator
    generator = descriptor_to_generator(descriptor.message_type, cls)
  File "/Users/ezyang/Dev/onnx-env/lib/python3.6/site-packages/protofuzz-0.1-py3.6.egg/protofuzz/protofuzz.py", line 113, in descriptor_to_generator
    generator = _prototype_to_generator(descriptor, cls)
  File "/Users/ezyang/Dev/onnx-env/lib/python3.6/site-packages/protofuzz-0.1-py3.6.egg/protofuzz/protofuzz.py", line 83, in _prototype_to_generator
    if descriptor.type in ints32+ints64:
RecursionError: maximum recursion depth exceeded in comparison
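The traceback shows descriptor_to_generator and _prototype_to_generator calling each other without a bound. One common way to handle mutually recursive optional messages is a depth limit, past which nested message fields are left unset. The sketch below demonstrates that idea on a toy schema dictionary; it does not use the protobuf descriptor API, and the `expand` function, the schema layout, and `max_depth` are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical schema: message name -> list of (field name, field type).
Schema = Dict[str, List[Tuple[str, str]]]

SCHEMA: Schema = {
    "A": [("b", "B")],
    "B": [("a", "A")],
}


def expand(schema: Schema, message: str, max_depth: int) -> Optional[dict]:
    """Build a nested skeleton for message.

    Message-typed fields are expanded recursively until max_depth is
    exhausted; past that point they are left unset (None).
    """
    if max_depth < 0:
        return None
    skeleton = {}
    for name, field_type in schema[message]:
        if field_type in schema:
            skeleton[name] = expand(schema, field_type, max_depth - 1)
        else:
            skeleton[name] = field_type  # scalar: record the type name
    return skeleton
```

This only terminates cleanly when the recursive fields are optional, as they are in the repro above.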

Can't the input of fuzz_db be mutated?

The program just reads fuzz_db and assigns values to protobuf fields randomly, right? If so, protofuzz cannot be used for mutation-based fuzz testing; it simply scans the target program with fixed values.

Handle `README.md` files in fuzzdb

After cloning the latest fuzzdb, the loading mechanism fails because of the README.md file in the attack directory.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/Spellchaser/PycharmProjects/test/venv/lib/python3.7/site-packages/protofuzz/protofuzz.py", line 173, in _iteration_helper
    generator = descriptor_to_generator(self._descriptor, iter_class)
  File "/Users/Spellchaser/PycharmProjects/test/venv/lib/python3.7/site-packages/protofuzz/protofuzz.py", line 113, in descriptor_to_generator
    generator = _prototype_to_generator(descriptor, cls)
  File "/Users/Spellchaser/PycharmProjects/test/venv/lib/python3.7/site-packages/protofuzz/protofuzz.py", line 92, in _prototype_to_generator
    generator = _string_generator(descriptor)
  File "/Users/Spellchaser/PycharmProjects/test/venv/lib/python3.7/site-packages/protofuzz/protofuzz.py", line 49, in _string_generator
    vals = list(values.get_strings(max_length, limit))
  File "/Users/Spellchaser/PycharmProjects/test/venv/lib/python3.7/site-packages/protofuzz/values.py", line 23, in _limit_helper
    for value in stream:
  File "/Users/Spellchaser/PycharmProjects/test/venv/lib/python3.7/site-packages/protofuzz/values.py", line 49, in _fuzzdb_get_strings
    listing = pkg_resources.resource_listdir('protofuzz', path)
  File "/Users/Spellchaser/PycharmProjects/test/venv/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1163, in resource_listdir
    resource_name
  File "/Users/Spellchaser/PycharmProjects/test/venv/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1441, in resource_listdir
    return self._listdir(self._fn(self.module_path, resource_name))
  File "/Users/Spellchaser/PycharmProjects/test/venv/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1610, in _listdir
    return os.listdir(path)
NotADirectoryError: [Errno 20] Not a directory: '/Users/Spellchaser/PycharmProjects/test/venv/lib/python3.7/site-packages/protofuzz/fuzzdb/attack/README.md'
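The failure comes from treating every entry in the attack directory as a subdirectory. A minimal sketch of a listing that tolerates stray files such as README.md is shown below; the function name `iter_wordlists` is hypothetical and the code uses plain pathlib rather than pkg_resources, so it illustrates the check, not the library's actual loader.

```python
from pathlib import Path
from typing import Iterator


def iter_wordlists(root: Path) -> Iterator[Path]:
    """Yield .txt files found one level below root, skipping non-directory entries."""
    for entry in sorted(root.iterdir()):
        if not entry.is_dir():
            continue  # e.g. README.md sitting next to the category folders
        for candidate in sorted(entry.iterdir()):
            if candidate.is_file() and candidate.suffix == ".txt":
                yield candidate
```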

Allow import of generated `_pb2.py`

This may be a misunderstanding on my part of how protobufs work.

If I already have a compiled protobuf module, I would like to import the generated Python file as a module directly.

Passing it to from_file currently fails, as expected:

message_fuzzers = protofuzz.from_file("/Users/Spellchaser/PycharmProjects/test/module_pb2.py")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/Spellchaser/PycharmProjects/test/venv/lib/python3.7/site-packages/protofuzz/protofuzz.py", line 251, in from_file
    module = pbimport.from_file(protobuf_file)
  File "/Users/Spellchaser/PycharmProjects/test/venv/lib/python3.7/site-packages/protofuzz/pbimport.py", line 114, in from_file
    raise BadProtobuf()
protofuzz.pbimport.BadProtobuf
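For reference, loading an arbitrary Python file as a module can be done with the standard library's `importlib.util`. The sketch below is not protofuzz's implementation; the helper name `load_module_from_path` is hypothetical. A generated _pb2.py that imports sibling _pb2 modules would additionally need its directory on `sys.path`.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType


def load_module_from_path(path: str) -> ModuleType:
    """Load the Python file at path as a module named after the file stem."""
    file_path = Path(path).resolve()
    spec = importlib.util.spec_from_file_location(file_path.stem, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load module from {file_path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so imports of this name resolve to it.
    sys.modules[spec.name] = module
    spec.loader.exec_module(module)
    return module
```

The resulting module object might then be handed to `protofuzz._module_to_generators(module)`, as in the workaround from the earlier issue, though that is a private function and its behavior is not guaranteed.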

Example code not working

Python 3.6.7

protoc --version
libprotoc 3.5.1
protobuf==3.7.1
protofuzz==0.1
py3-protobuffers==3.0.0a4
>>> from protofuzz import protofuzz
>>> message_fuzzers = protofuzz.from_description_string("""
...     message Address {
...      required int32 house = 1;
...      required string street = 2;
...     }
... """)

KeyError: "Couldn't find field google.protobuf.FileOptions.javanano_use_deprecated_package"
