Coder Social home page Coder Social logo

reducebin's Introduction

reducebin

Remove junk bytes from a large binary malware.

Reducing the size of the malware sample makes it easier to analyze or submit to online sandboxes.

This script is a first test version, it was written quickly to reduce the size of some malware samples I needed to analyze.

How it works

  • Convert binary file to Hex string
  • Check for blocks of Hex that are 512 characters long
  • They are usually hexadecimal with CC or 00 values
  • Calculate the occurrences and choose the largest one
  • Remove all occurrences to reduce file size

Example

A malware sample whose size is approximately 650MB.

$ ls -lh malware.exe 
-rw-rw-r-- 1 guelfoweb guelfoweb 647M lug 13 10:33 malware.exe

In case null bytes 00 are added at the end of the malware we can lighten the file by removing the null bytes with the sed command:

sed '$ s/\x00*$//' file.exe > file.exe.bin

However, in this case there are no null bytes but a series of hexadecimal CC, they are not arranged consecutively in a single block but randomly interrupted. We can't use the sed command like above because we don't know the exact points of the breaks.

$ xxd malware.exe | tail
286bb850: cccc cccc cccc cccc cccc cccc cccc cccc  ................
286bb860: cccc cccc cccc cccc cccc cccc cccc cccc  ................
286bb870: cccc cccc cccc cccc cccc cccc cccc cccc  ................
286bb880: cccc cccc cccc cccc cccc cccc cccc cccc  ................
286bb890: cccc cccc cccc cccc cccc cccc cccc cccc  ................
286bb8a0: cccc cccc cccc cccc cccc cccc cccc cccc  ................
286bb8b0: cccc cccc cccc cccc cccc cccc cccc cccc  ................
286bb8c0: cccc cccc cccc cccc cccc cccc cccc cccc  ................
286bb8d0: cccc cccc cccc 986b c606 cccc cccc cccc  .......k........
286bb8e0: cccc 608a b206 cccc cccc cccc cccc c802  ..`.............

The entropy is quite low (0.03), a lower entropy value indicates low randomness in the data.

$ python3 reducebin.py malware.exe --entropy
The entropy of the file is: 0.03

So it is still possible to detect a large enough sequence of hexadecimal code (for convenience we start from a length of 512, but we can calibrate the length using the --len parameter) and rely on the number of occurrences (1321814 were detected).

$ python3 reducebin.py malware.exe
INPUT      : malware.exe
Size       : 646.73 MB
Hash MD5   : AECA52204028884A7EC8DF154F83ACAA
String HEX : CCCCCCCC...CCCCCCCC (length = 512)
Count      : 1321814 (occurrences)

OUTPUT     : malware.exe.reduced
Size       : 1.13 MB
Hash MD5   : 753B5FBABAC18F1A2656FF18CE678C60

Reduction  : 99.83 %
Time       : 00:00:11

The resulting file (rhadamanthys) weighs 1.13 MB, a reduction of 99.83% was achieved.

reducebin's People

Contributors

guelfoweb avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

reducebin's Issues

More efficient code to compute binary entropy

For the computation of the binary entropy I suggest to use this more efficient code (python >3.8):

l = float(len(content))
H = -sum(map(lambda a: (a / l) * math.log2(a / l), Counter(content).values()))

Even if I think the entropy computation is not a key point of the application.

Thx for the project

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.