android-malware-classification

1. Get datasets

  • Pull the dataset of benign apk files and store them to <this repo>/benign_apk/

http://205.174.165.80/CICDataset/CICMalAnal2017/Dataset/APKs/Benign-APKs-2017.zip

  • Pull the datasets of malicious apk files and store them to <this repo>/malicious_apk/

http://205.174.165.80/CICDataset/MalDroid-2020/Dataset/APKs/ (exclude Benign.tar.gz)

  • Rename the extracted files so that each one has the .apk extension.
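
The renaming step can be scripted. The following is a minimal sketch, assuming each folder contains only the extracted sample files (no leftover archives) and that samples lacking the extension should simply have ".apk" appended. The helper name add_apk_suffix is hypothetical and is not part of this repo.

```python
from pathlib import Path


def add_apk_suffix(directory):
    """Append ".apk" to every regular file in `directory` that lacks it.

    Hypothetical helper: assumes the folder holds only extracted samples.
    Returns the new names of the files that were renamed.
    """
    renamed = []
    # sorted() materializes the listing before any file is renamed.
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.suffix.lower() == ".apk":
            continue
        target = path.with_name(path.name + ".apk")
        if target.exists():
            # Never overwrite an existing sample.
            continue
        path.rename(target)
        renamed.append(target.name)
    return renamed


# Example (run from the repo root):
#   add_apk_suffix("benign_apk")
#   add_apk_suffix("malicious_apk")
```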

2. Extract data

Run extract_apks_parallel.sh, which unpacks the .apk files into folders and processes some of the data therein. You can monitor its progress from another shell by running watch "wc -l benign_apk/valid_apks.txt; wc -l malicious_apk/valid_apks.txt"
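
If watch is not available, the same progress check can be approximated in Python. This is a rough sketch, assuming valid_apks.txt holds one entry per line (which the wc -l command above suggests). The function name count_valid_apks is hypothetical.

```python
from pathlib import Path


def count_valid_apks(repo_root="."):
    """Count lines in each valid_apks.txt, similar to `wc -l`.

    Assumption: one processed APK per line. A missing file counts as 0.
    """
    counts = {}
    for folder in ("benign_apk", "malicious_apk"):
        listing = Path(repo_root) / folder / "valid_apks.txt"
        if listing.exists():
            with listing.open() as handle:
                counts[folder] = sum(1 for _ in handle)
        else:
            counts[folder] = 0
    return counts


# Example (run from the repo root while extraction is in progress):
#   print(count_valid_apks())
```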

3. Generate feature vectors

Run one of the following scripts to generate feature vectors:

  • parse_xml.py for permissions. Generates "app_permission_vectors.json".
  • parse_maline_output.py for syscalls. Generates "app_syscall_vectors.json". You must run maline first for this to work.
  • parse_disassembled.py for API calls. Generates "app_method_vectors.json".
  • parse_ssdeep.py for fuzzy hashes. Generates "app_hash_vectors.json". You must run ssdeep first for this to work.
  • combine_features.py for a combination of the top weighted features. Generates "app_feature_vectors.json". This only works if you have previously trained a network on the specified features and the feature weights files are named appropriately.
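
To illustrate the general idea behind a permission feature vector, here is a minimal sketch. It is not the logic of parse_xml.py, whose input and output formats are not described here. It assumes the manifest has already been decoded to plain-text XML (the AndroidManifest.xml inside an APK is stored in a binary format), and the function names and the example vocabulary are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace that Android manifests use for "android:" attributes.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def permissions_from_manifest(xml_text):
    """Return the sorted, de-duplicated permissions requested in a manifest."""
    root = ET.fromstring(xml_text)
    names = {
        elem.get(ANDROID_NS + "name")
        for elem in root.iter("uses-permission")
        if elem.get(ANDROID_NS + "name")
    }
    return sorted(names)


def to_binary_vector(permissions, vocabulary):
    """Encode permissions as 0/1 flags, one per vocabulary entry.

    `vocabulary` is an assumed, fixed ordering of permission names
    shared by every app in the dataset.
    """
    present = set(permissions)
    return [1 if name in present else 0 for name in vocabulary]
```

Every app must be encoded against the same vocabulary so that a given position in the vector always refers to the same permission.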

4. Trials

Run run_trials.sh app_feature_vectors.json (or whichever JSON file you want). It runs the tensorflow_learn.py script (where the ML happens) a number of times and puts the results into a folder. It also runs plot_data.py and match_features.py to create a plot and a list of top weighted features, respectively.
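
The repeated-trials pattern can be sketched in Python as follows. This is only an illustration of running a command several times and collecting each run's output in a folder, not a replacement for run_trials.sh. The function name, the log file naming, and the example command line (including the arguments that tensorflow_learn.py accepts) are assumptions.

```python
import subprocess
from pathlib import Path


def run_trials(command, num_trials, results_dir):
    """Run `command` num_trials times, saving each run's output to a log.

    Hypothetical helper: logs are written as trial_01.log, trial_02.log, ...
    inside results_dir. A failing run raises CalledProcessError.
    """
    out_dir = Path(results_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    logs = []
    for trial in range(1, num_trials + 1):
        log_path = out_dir / f"trial_{trial:02d}.log"
        with log_path.open("w") as log:
            # Capture stdout and stderr together in the per-trial log.
            subprocess.run(command, stdout=log, stderr=subprocess.STDOUT, check=True)
        logs.append(log_path)
    return logs


# Example (the argument list is an assumption about the script's interface):
#   run_trials(["python", "tensorflow_learn.py", "app_feature_vectors.json"], 5, "results")
```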

5. Tuning

Change the parameters or input data and repeat step 4. Each run should be non-destructive, so you can compare the results of different runs.

Note: If you want to use an SVM instead of a neural network, use sklearn_svm.py in place of tensorflow_learn.py. You can also use sklearn_tree.py to use a decision tree.

Contributors

mwleeds, daivc96, thesecmaven, doanhnhq-uit
