Looking beyond Dalvik Bytecode

This repository contains the code for Android malware detection based on image representations of APKs.

For more technical details, please refer to our A-Mobile '21 paper:

"Android Malware Detection: Looking beyond Dalvik Bytecode"

Data Availability

  • Due to the large size of the APKs and images, we share them upon request.

  • The hash list of all original APKs is in the ApkHashList directory; the APKs themselves can be downloaded from AndroZoo.

  • The images can be generated with the script apk2images.py.
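As a minimal sketch of the download step, the snippet below reads a hash list and builds one AndroZoo download URL per hash. It assumes the hash files contain one SHA-256 per line (the exact format of the files in ApkHashList is not described here) and that you have your own AndroZoo API key; `read_hashes` and `build_download_url` are hypothetical helper names, not part of this repository.

```python
from urllib.parse import urlencode

# AndroZoo's download endpoint; it requires a personal API key.
ANDROZOO_DOWNLOAD_URL = "https://androzoo.uni.lu/api/download"


def read_hashes(path):
    """Return the non-empty lines of a hash list (assumed: one SHA-256 per line)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def build_download_url(sha256, api_key):
    """Build the download URL for a single APK identified by its SHA-256."""
    query = urlencode({"apikey": api_key, "sha256": sha256})
    return f"{ANDROZOO_DOWNLOAD_URL}?{query}"
```

The resulting URLs can then be fetched with any HTTP client, for example `curl -O --remote-header-name -G <url>`.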

To generate images, use the apk2images.py script:

This script generates 3 gray-scale images and 1 color-scale image from a given APK.

INPUT is:

- The path of an APK to convert into images.

OUTPUTs are:

- 3 gray-scale images (from the .dex, .so and .xml files) and 1 color-scale image (combining the 3 types of files).

Example

python3 apk2images.py APK_PATH
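To illustrate the idea behind the conversion, here is a minimal sketch that is not the code of apk2images.py. It relies on the fact that an APK is a ZIP archive, groups the bytes of its .dex, .so and .xml entries, and lays each byte out as one gray-scale pixel. The row width, the zero-padding, and the mapping of the three file types to the R, G and B channels are assumptions made for the example; all function names are hypothetical.

```python
import zipfile

# File types considered, keyed by the label used in this README.
SUFFIXES = {"dex": ".dex", "so": ".so", "xml": ".xml"}


def collect_bytes(apk_path):
    """Concatenate the bytes of the .dex, .so and .xml entries of an APK."""
    streams = {kind: bytearray() for kind in SUFFIXES}
    with zipfile.ZipFile(apk_path) as apk:
        for name in sorted(apk.namelist()):
            for kind, suffix in SUFFIXES.items():
                if name.lower().endswith(suffix):
                    streams[kind] += apk.read(name)
    return {kind: bytes(data) for kind, data in streams.items()}


def to_gray_rows(data, width=128):
    """Lay bytes out as rows of `width` pixels (0-255), zero-padding the last row."""
    padded = data + bytes(-len(data) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def to_color_pixels(dex, so, xml):
    """Zero-pad three byte streams to equal length and pair them up as (R, G, B)."""
    length = max(len(dex), len(so), len(xml))
    channels = [stream.ljust(length, b"\x00") for stream in (dex, so, xml)]
    return list(zip(*channels))
```

The pixel rows can be written to disk with an imaging library of your choice; the actual script may resize or encode the images differently.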

Models Training and Testing

Notes:

  • The evaluation is repeated 10 times using the holdout technique.
  • The training, validation and test hashes are provided in data_splits directory.
  • To run the scripts below, you need to:
    • Extract the gray-scale and color-scale images for the goodware and malware applications listed in goodware_hashes.txt and malware_hashes.txt using the apk2images.py script.
    • Then organize the directory structure as shown in dataset.example.
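The repository ships its own splits in data_splits, so there is no need to create new ones. Purely to illustrate what a repeated holdout evaluation means, the sketch below draws ten independent train/validation/test partitions from a list of hashes. The 80/10/10 ratio and the use of the run index as random seed are assumptions for the example, and `holdout_split` is a hypothetical helper.

```python
import random


def holdout_split(hashes, seed, train=0.8, valid=0.1):
    """Shuffle the hashes with a fixed seed and cut them into three disjoint parts."""
    shuffled = sorted(hashes)  # sort first so the result depends only on the seed
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_valid = int(len(shuffled) * valid)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],
    )


# Ten repetitions, one per seed, over a toy list of 20 placeholder hashes.
toy_hashes = [f"hash{i:02d}" for i in range(20)]
splits = [holdout_split(toy_hashes, seed=run) for run in range(10)]
```

Each repetition trains on its own training part, tunes on the validation part, and reports scores on the test part only.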

Model based on Gray-scale Image

To train and test a model based on gray-scale images, use the ModelGray.py script:

This script trains the neural network on the gray-scale training images and evaluates it on the gray-scale test set.

INPUTs are:

- The path to the directory that contains malware and goodware image folders.
- The name of the directory where to save the model.
- The type of the image source files, which can only be one of 'dex', 'so' or 'xml'.

OUTPUTs are:

- A file containing the accuracy, precision, recall, and F1-score of each of the ten trained models,
  together with their average scores.
- The ten trained models.
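For reference, the four reported scores follow the standard definitions from the confusion matrix, with malware as the positive class. The sketch below computes them for one run and averages them over several runs; it is an illustration with hypothetical function names, not the code used by the scripts.

```python
from statistics import mean


def scores(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def average_scores(runs):
    """Average each metric over a non-empty list of per-run score dictionaries."""
    return {name: mean(run[name] for run in runs) for name in runs[0]}
```

With ten runs, `average_scores` would be called on the list of ten dictionaries returned by `scores`.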

Example:

python3 ModelGray.py -p "dataset_images" -d "results_dir" -t "dex"

Model based on Color-scale Image

To train and test a model based on color-scale images, use the ModelColor.py or ModelEnsemble.py script:

Each of these two scripts trains its neural network on the color-scale training images and evaluates it on the color-scale test set.

INPUTs are:

- The path to the directory that contains malware and goodware image folders.
- The name of the directory where to save the model.

OUTPUTs are:

- A file containing the accuracy, precision, recall, and F1-score of each of the ten trained models,
  together with their average scores.
- The ten trained models.

Example:

python3 ModelColor.py -p "dataset_images" -d "results_dir"
# or
python3 ModelEnsemble.py -p "dataset_images" -d "results_dir"
