
geekscrapy / bingraph


Simple tool to graph files for quick analysis

License: GNU Affero General Public License v3.0

Python 100.00%
matplotlib graph malware visualization entropy histogram json

bingraph's Introduction

binGraph.py

A tool to graph files for quick visual analysis of binary files

Feel free to use this project (in its entirety) in other tools, and please provide attribution back to the project.

Creates matplotlib graphs to represent different aspects of a file (usually malware), focusing on entropy.

Given one or more files (with --file), individual graph types can be generated (e.g. ent, hist), or all can be used to generate every available graph.

Below are the --help options:

$ python binGraph.py --help
usage: binGraph.py [-h] -f malware.exe [malware.exe ...] [-r] [-] [--prefix]
                   [--out /data/graphs/] [--json] [--graphtitle "file.exe"]
                   [--showplt] [--format png] [--figsize # #] [--dpi 100]
                   [--blob] [-v]
                   {all,hist,ent} ...

positional arguments:
  {all,hist,ent}        Graph type to generate. Graphs can also be
                        individually generated by running the in isolation:
                        python graphs/ent/graph.py -f file.bin

optional arguments:
  -h, --help            show this help message and exit
  -f malware.exe [malware.exe ...], --file malware.exe [malware.exe ...]
                        Give me a graph of this file. Provide a list of files
                        with the "@files.txt" syntax (for example from a
                        `find` command). See - if this is the only argument
                        specified.
  -r, --recurse         If --file is a directory, add files recursively
  -                     *** Required if --file or -f is the only argument
                        given before a graph type is provided (it's greedy!).
                        E.g. "binGraph.py --file mal.exe - bin_ent"
  --prefix              Add this prefix to the saved filenames
  --out /data/graphs/   Where to save the graph files
  --json                Ouput graphs as json with graph images encoded as
                        Base64
  --graphtitle "file.exe"
                        Given title for graphs
  --showplt             Show plot interactively (disables saving to file)
  --format png          Graph output format. All matplotlib outputs are
                        supported: e.g. png, pdf, ps, eps, svg
  --figsize # #         Figure width and height in inches
  --dpi 100             Figure dpi
  --blob                Do not intelligently parse certain file types. Treat
                        all files as a binary blob. E.g. don't add PE entry
                        point or section splitter to the graph
  -v, --verbose         Print debug information to stderr
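
The "@files.txt" syntax mentioned under --file matches argparse's standard fromfile_prefix_chars feature. The sketch below shows how that feature behaves in general. It assumes binGraph relies on it, which the help text suggests but does not confirm, and the parser and file contents here are illustrative only.

```python
import argparse
import os
import tempfile

# fromfile_prefix_chars makes argparse expand "@path" into the arguments
# listed in that file, one argument per line.
parser = argparse.ArgumentParser(fromfile_prefix_chars="@")
parser.add_argument("-f", "--file", nargs="+", required=True)

# Hypothetical list file, as might be produced by redirecting `find` output.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("samples/a.exe\nsamples/b.exe\n")
    list_path = handle.name

# The "@" prefix tells argparse to read the file names from list_path.
args = parser.parse_args(["--file", "@" + list_path])
os.unlink(list_path)
print(args.file)
```

Each line of the list file becomes one value of --file, so a file list generated elsewhere can be passed without hitting shell argument-length limits.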

Binary Entropy - ent

Shows the entropy of fixed-size samples (chunks) taken across the binary file. The sample size is scaled to the --chunks option (defaults to 750). More chunks give more detail, but can get messy!

The --ibytes option highlights certain bytes and their occurrence within each sample. This often directly explains why entropy goes up or down: lots of 0's? The entropy line goes down and the 0's line goes up! --ibytes must be a list of JSON dictionaries. Each dictionary must contain "name" and "bytes" values. "bytes" is an array of integers which are interpreted as hex bytes. The optional "colour" value can be a matplotlib colour (e.g. r, b, or hex with or without alpha); if it is not defined, a seeded value is used.
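
The per-chunk entropy and byte-occurrence idea described above can be sketched as follows. This is a minimal illustration, not binGraph's actual code: the function names (shannon_entropy, chunked_entropy, byte_share) and the way the sample size is derived from the chunk count are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def chunked_entropy(data: bytes, chunks: int = 750) -> list:
    """Split data into roughly `chunks` samples and score each one."""
    # Assumption: sample size is the data length divided by the chunk count.
    size = max(1, math.ceil(len(data) / chunks))
    return [shannon_entropy(data[i:i + size]) for i in range(0, len(data), size)]


def byte_share(sample: bytes, wanted: set) -> float:
    """Percentage of bytes in the sample that belong to `wanted`."""
    if not sample:
        return 0.0
    return 100.0 * sum(1 for b in sample if b in wanted) / len(sample)
```

A sample made entirely of 0x00 bytes scores an entropy of 0 while its share of zero bytes is 100%, which is the inverse relationship the two lines on the graph make visible.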

Binary entropy graph !MALWARE! Sample from: https://cape.contextis.com/analysis/20194/

$ python binGraph.py ent --help
usage: binGraph.py ent [-h] [-c 750] [--ibytes [{ "name":"0s", "bytes":[0] },
                       { "name":"Exploit", "bytes":[44, 144], "colour":"r" }]]
                       [--entcolour #cf3da2ff]

optional arguments:
  -h, --help            show this help message and exit
  -c 750, --chunks 750  Defines how many chunks the binary is split into (and
                        therefore the amount of bytes submitted for shannon
                        sampling per time). Higher number gives more detail
  --ibytes [ { "name":"0s", "bytes":[0] }, { "name":"Exploit", "bytes":[44, 144], "colour":"r" } ]
                        Bytes occurances to add to the graph - used to add
                        extra visability into the type of bytes included in
                        the binary. To disable this option, set the flag
                        without an argument. The "name" value is the name of
                        the bytes for the legend, the "bytes" value is the
                        bytes to count the percentage of per section, the
                        "colour" value maybe a matplotlib colour ( r,g,b
                        etc.), a hex with or without an alpha value, or not
                        defined (a seeded colour is chosen). The easiest way
                        to construct these values is to create a dictionary
                        and convert it using 'print(json.loads(dict))'
  --entcolour #cf3da2ff
                        Colour of the Entropy line
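
The help text mentions json.loads, but in Python's standard library json.loads parses JSON text, while json.dumps is the function that serialises a list of dictionaries into a string. A minimal sketch of building an --ibytes value this way follows. The command layout is an assumption based on the help output above, and mal.exe is a placeholder file name.

```python
import json
import shlex

# Each entry needs "name" and "bytes"; "colour" is optional.
ibytes = [
    {"name": "0s", "bytes": [0]},
    {"name": "Exploit", "bytes": [44, 144], "colour": "r"},
]

# json.dumps serialises the list to JSON text suitable for --ibytes.
ibytes_arg = json.dumps(ibytes)

# shlex.quote makes the value safe to paste into a POSIX shell command line.
# Assumed invocation layout; adjust to match how you call binGraph.py.
command = "python binGraph.py --file mal.exe - ent --ibytes " + shlex.quote(ibytes_arg)
print(command)
```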

Binary Histogram - hist

Provides an insight into the occurrence of all bytes in the file. Two graphs are overlaid: the red graph shows bytes 0x00 to 0xFF in order, and the blue graph shows the same bytes ordered by count, which shows the overall distribution.
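
The two overlaid series can be sketched as below. This is an illustration under assumptions rather than the tool's implementation: the function name byte_histogram is hypothetical, and sorting the ordered series from most to least frequent is a guess at what "ordered by count" means.

```python
from collections import Counter


def byte_histogram(data: bytes, no_zero: bool = False):
    """Return (by_value, by_count) occurrence lists for a byte string."""
    counts = Counter(data)
    # Counts in byte-value order, 0x00 through 0xFF (absent bytes count as 0).
    by_value = [counts.get(b, 0) for b in range(256)]
    if no_zero:
        # Mirrors the idea of --no_zero: drop 0x00 so it cannot dwarf the rest.
        by_value = by_value[1:]
    # The same counts sorted from most to least frequent (assumed direction).
    by_count = sorted(by_value, reverse=True)
    return by_value, by_count
```

Plotting by_value gives the per-byte view, while by_count gives the overall distribution curve regardless of which byte produced each count.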

Binary byte histogram !MALWARE! Sample from: https://cape.contextis.com/analysis/20194/

$ python binGraph.py hist --help
usage: binGraph.py hist [-h] [--no_zero] [--width 1] [--no_log] [--no_order]
                        [--colours #ff01d5 #ff01d5]

optional arguments:
  -h, --help            show this help message and exit
  --no_zero             Remove 0x00 from the graph, sometimes this blows other
                        results due to there being numerous amounts - also see
                        --no_log
  --width 1             Sample width
  --no_log              Do _not_ apply a log scale to occurance axis
  --no_order            Remove the ordered histogram - It shows overall
                        distribution when on
  --colours #ff01d5 #ff01d5
                        Colours for the graph. First value is the ordered graph

To do:

  • Read from stdin for use with other tools such as Didier Stevens's zipdump.py - Kaitai allows binary array input
    • ent graph (and others) to use Kaitai as the parser instead of third party libs - bit more extensible
  • Add extra graph types - Hilbert curve

bingraph's People

Contributors

0ki, doomedraven, kevoreilly


bingraph's Issues

I have a file 340s.exe but when I run the file it gives me this

python binGraph.py -f 340s.exe --out "/home/marshall/Downloads/clg project2/toGraph" --json --graphtitle "340s.exe" --showplt --format png --figsize 8 8 --dpi 100 --blob -v 'ent'
agg matplotlib backend in use. This graph generation was tested with "TkAgg", bugs may lie ahead...
Parsing file as blob (as requested)
/home/marshall/Downloads/clg project2/binGraph-master/binGraph/binGraph.py:230: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
  plt.show()
What to do now?

Originally posted by @Concept606 in #4 (comment)

rename to binPlot

The word "graph" creates the incorrect impression that the tool generates graphs with vertices and edges.

Return graph data for further analysis

Return an object containing all the points that make up the graph. This would allow other tools to use the output of this tool as data, rather than just as an image.

percentages are returned in args_dict['ibytes']

Once generate_graphs() is called on graph ent, 'percentages' values are appended to the ibytes list.

To replicate:

  • Import and submit graph to generate_graphs(args_dict)
  • Print args_dict: it should not contain the 'percentages' list, but it does

I believe this is due to not copying the dictionary, but using the deepcopy.
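
The general way nested values can leak back into a caller's dictionary, and how copy.deepcopy avoids it, can be sketched as follows. The generate_graphs function here is a hypothetical stand-in that only imitates the reported behaviour; the real function's signature and internals are not shown in this report.

```python
import copy


def generate_graphs(args_dict: dict) -> dict:
    """Hypothetical stand-in that mutates nested values, like the reported bug."""
    for entry in args_dict["ibytes"]:
        entry["percentages"] = [0.0, 12.5]  # placeholder values
    return args_dict


original = {"ibytes": [{"name": "0s", "bytes": [0]}]}

# A shallow copy shares the nested list and dicts, so the caller's data changes.
shallow = dict(original)
generate_graphs(shallow)
leaked = "percentages" in original["ibytes"][0]

# A deep copy duplicates nested objects, so the caller's data stays untouched.
original = {"ibytes": [{"name": "0s", "bytes": [0]}]}
generate_graphs(copy.deepcopy(original))
isolated = "percentages" not in original["ibytes"][0]

print(leaked, isolated)
```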

Refactor/tidy __main__ to feed properly into generate_graphs()

There is some code overlap in the main section that should be included in generate_graphs() - most functionality should be in generate_graphs() anyway.

This will be a fairly mixed commit, so could contributors please hold off on pull requests that touch __main__ and generate_graphs() until I have refactored the code.

Thanks!

LICENSE of binGraph

binGraph is really nifty and I would like to integrate it into misp-modules. I cannot find the license file. What is the open source license of binGraph? Thank you very much.
