reducebin

Remove junk bytes from a large malware binary.

Reducing the size of the malware sample makes it easier to analyze or submit to online sandboxes.

This script is a first test version, written quickly to reduce the size of some malware samples I needed to analyze.

How it works

  • Convert the binary file to a hex string
  • Look for blocks of hex that are 512 characters long
  • These blocks usually consist of repeated CC or 00 values
  • Count the occurrences of each block and choose the most frequent one
  • Remove all occurrences of that block to reduce the file size
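
The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual reducebin.py code: the function names find_filler_block and reduce_bytes are hypothetical, and the sketch assumes that a junk block is a single byte value repeated for the whole block and that candidates are scanned at fixed offsets. Removal is done on the raw bytes rather than on the hex string, so that matches always stay byte-aligned.

```python
from collections import Counter


def find_filler_block(data: bytes, hex_len: int = 512):
    """Return (hex_block, count) for the most frequent uniform block, or None.

    hex_len is the block length in hex characters and must be even
    (512 hex characters correspond to 256 bytes).
    """
    hex_str = data.hex()
    counts = Counter()
    # Walk the hex string in fixed-size chunks (assumption: aligned scan).
    for i in range(0, len(hex_str) - hex_len + 1, hex_len):
        block = hex_str[i:i + hex_len]
        # Keep only blocks made of one repeated byte, e.g. "cccc...cc".
        if block == block[:2] * (hex_len // 2):
            counts[block] += 1
    if not counts:
        return None
    return counts.most_common(1)[0]


def reduce_bytes(data: bytes, hex_len: int = 512) -> bytes:
    """Remove every occurrence of the most frequent uniform block."""
    found = find_filler_block(data, hex_len)
    if found is None:
        return data
    block, _count = found
    return data.replace(bytes.fromhex(block), b"")


if __name__ == "__main__":
    sample = b"MZ" + b"\xcc" * 1024 + b"PAYLOAD"
    print(len(sample), "->", len(reduce_bytes(sample)))
```

Note that this sketch holds the whole file and its hex representation in memory, which is costly for samples of several hundred megabytes.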

Example

The example is a malware sample of approximately 650 MB.

$ ls -lh malware.exe 
-rw-rw-r-- 1 guelfoweb guelfoweb 647M lug 13 10:33 malware.exe

If null bytes (00) have been appended to the end of the malware, we can shrink the file by stripping them with the sed command:

sed '$ s/\x00*$//' file.exe > file.exe.bin
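
For reference, the same trailing-null trimming can be sketched in Python. The function name strip_trailing_nulls is hypothetical and this is not part of reducebin.py. It only illustrates the idea of dropping null bytes from the end of the data.

```python
def strip_trailing_nulls(data: bytes) -> bytes:
    """Return data without the run of 0x00 bytes at its end."""
    return data.rstrip(b"\x00")


if __name__ == "__main__":
    padded = b"MZ\x00\x90" + b"\x00" * 16
    # Null bytes inside the data are kept; only the trailing run is removed.
    print(strip_trailing_nulls(padded))
```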

However, in this case the padding is not null bytes but a series of hexadecimal CC values. They are not arranged consecutively in a single block but are interrupted at random points, so we can't use the sed command above because we don't know exactly where the breaks are.

$ xxd malware.exe | tail
286bb850: cccc cccc cccc cccc cccc cccc cccc cccc  ................
286bb860: cccc cccc cccc cccc cccc cccc cccc cccc  ................
286bb870: cccc cccc cccc cccc cccc cccc cccc cccc  ................
286bb880: cccc cccc cccc cccc cccc cccc cccc cccc  ................
286bb890: cccc cccc cccc cccc cccc cccc cccc cccc  ................
286bb8a0: cccc cccc cccc cccc cccc cccc cccc cccc  ................
286bb8b0: cccc cccc cccc cccc cccc cccc cccc cccc  ................
286bb8c0: cccc cccc cccc cccc cccc cccc cccc cccc  ................
286bb8d0: cccc cccc cccc 986b c606 cccc cccc cccc  .......k........
286bb8e0: cccc 608a b206 cccc cccc cccc cccc c802  ..`.............

The entropy is very low (0.03). A low entropy value indicates low randomness in the data, which is what we expect from a file made up mostly of one repeated byte.

$ python3 reducebin.py malware.exe --entropy
The entropy of the file is: 0.03
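
As an illustration of what such a value measures, here is a minimal sketch of Shannon entropy over the byte histogram of a buffer. It assumes the result is expressed in bits per byte (0 to 8). How the --entropy option actually computes or scales its value is not stated here, and shannon_entropy is a hypothetical name.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # A buffer dominated by one byte value has entropy close to zero.
    mostly_cc = b"\xcc" * 100_000 + bytes(range(256))
    print(f"{shannon_entropy(mostly_cc):.2f}")
```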

So it is still possible to detect a sufficiently long sequence of repeated hexadecimal values and rely on its number of occurrences (1321814 were detected). For convenience we start from a length of 512, but the length can be tuned with the --len parameter.

$ python3 reducebin.py malware.exe
INPUT      : malware.exe
Size       : 646.73 MB
Hash MD5   : AECA52204028884A7EC8DF154F83ACAA
String HEX : CCCCCCCC...CCCCCCCC (length = 512)
Count      : 1321814 (occurrences)

OUTPUT     : malware.exe.reduced
Size       : 1.13 MB
Hash MD5   : 753B5FBABAC18F1A2656FF18CE678C60

Reduction  : 99.83 %
Time       : 00:00:11

The resulting file (Rhadamanthys) is 1.13 MB, a reduction of 99.83%.
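
The reduction figure follows directly from the two sizes reported above. A small sketch of the arithmetic, using a hypothetical helper named reduction_percent:

```python
def reduction_percent(original_size: float, reduced_size: float) -> float:
    """Percentage of the original size that was removed."""
    return (1 - reduced_size / original_size) * 100


if __name__ == "__main__":
    # Sizes in MB, taken from the tool output above.
    print(f"{reduction_percent(646.73, 1.13):.2f} %")  # prints "99.83 %"
```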
