
5l1v3r1 / android-malware-analysis


This project forked from mwleeds/android-malware-analysis


This project seeks to apply machine learning algorithms to Android malware classification.

License: GNU General Public License v3.0

Shell 9.39% Python 90.61%

android-malware-analysis's Introduction

Getting an API Key

AndroTotal has simplified the process for getting an API key. Log in or create an account at http://andrototal.org/, then open your profile settings. The API tab there contains your key.
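One way to keep the key out of the scripts themselves is to read it from an INI file at run time. The following is a minimal sketch of that idea only: the section name `AndroTotal`, the option name `api_key`, and the helper `load_api_key` are hypothetical and are not taken from this repository's config-template.ini.

```python
import configparser

# Hypothetical INI layout; the real config-template.ini may use other names.
EXAMPLE_INI = """
[AndroTotal]
api_key = replace-with-your-key
"""


def load_api_key(ini_text, section="AndroTotal", option="api_key"):
    """Return the API key from INI text, or raise if it is missing or empty."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # fallback="" covers both a missing section and a missing option.
    key = parser.get(section, option, fallback="").strip()
    if not key:
        raise ValueError(f"no {option!r} set in section [{section}]")
    return key


if __name__ == "__main__":
    print(load_api_key(EXAMPLE_INI))
```

Failing early on an empty key avoids starting a long sorting run that would only produce authentication errors.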

This repository contains a set of scripts to automate the process of gathering data from malware samples, training a machine learning model on that data, and plotting its classification accuracy.

  1. Make a copy of config-template.ini called config.ini and edit it.

  2. Ensure that the "tools" subdirectory has been initialized ("$ git submodule update --init tools")

  3. Either use get_samples.py to download samples or copy them into "all_apks" from another source. If you're using get_samples.py, you can monitor it in another shell by running watch "ls -l *.apk | wc -l"

  4. sort_malicious.py uses andrototal.org to sort the samples into "malicious_apk" and "benign_apk" folders. You can monitor it in another shell by running watch "ls -l benign_apk/*.apk | wc -l && ls -l malicious_apk/*.apk | wc -l"

  5. extract_apks_parallel.sh unpacks the .apk files into folders and processes some of the data therein. You can monitor it in another shell by running watch "wc -l benign_apk/valid_apks.txt; wc -l malicious_apk/valid_apks.txt"

  6. Run one of the following scripts to generate feature vectors:

    • parse_xml.py for permissions. "app_permission_vectors.json" is generated
    • parse_maline_output.py for syscalls. "app_syscall_vectors.json" is generated. You will have to run maline first for this to work.
    • parse_disassembled.py for API calls. "app_method_vectors.json" is generated
    • parse_ssdeep.py for fuzzy hashes. "app_hash_vectors.json" is generated. You will have to run ssdeep first for this to work.
    • combine_features.py for a combination of the top weighted features. "app_feature_vectors.json" is generated. This only works if you've previously trained a network on the specified features, and the feature weights files are named appropriately.
  7. Run $ run_trials.sh app_feature_vectors.json (or whichever JSON file you want). It runs the tensorflow_learn.py script (where the ML happens) a number of times and puts the results into a folder. It also runs plot_data.py and match_features.py to create a plot and a list of top weighted features, respectively.

  8. Change the parameters or input data and repeat steps 6 and 7. Each run should be non-destructive, so you can compare the results of different runs.
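To make the idea of a permission feature vector from step 6 concrete, here is a minimal sketch, assuming the manifest has already been decoded to text XML during extraction. It is not the code in parse_xml.py, and the JSON layout it produces is an assumption rather than the actual format of "app_permission_vectors.json". The names `permissions_from_manifest` and `build_vectors` are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Namespace Android uses for attributes such as android:name.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

MANIFEST_A = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.a">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
</manifest>"""

MANIFEST_B = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.b">
  <uses-permission android:name="android.permission.INTERNET"/>
</manifest>"""


def permissions_from_manifest(manifest_xml):
    """Return the set of permission names requested in a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    return {
        elem.get(ANDROID_NS + "name")
        for elem in root.iter("uses-permission")
        if elem.get(ANDROID_NS + "name")
    }


def build_vectors(apps):
    """Map each app to a 0/1 vector over the sorted union of all permissions.

    `apps` maps an app name to a (manifest_xml, is_malicious) pair.
    """
    perms = {name: permissions_from_manifest(m) for name, (m, _) in apps.items()}
    vocabulary = sorted(set().union(*perms.values()))
    vectors = {
        name: {
            "features": [1 if p in perms[name] else 0 for p in vocabulary],
            "malicious": int(apps[name][1]),
        }
        for name in apps
    }
    # Assumed layout: the feature names once, then one labeled vector per app.
    return {"features": vocabulary, "apps": vectors}


if __name__ == "__main__":
    result = build_vectors({"a": (MANIFEST_A, True), "b": (MANIFEST_B, False)})
    print(json.dumps(result, indent=2))
```

Storing the feature names alongside the vectors keeps each column interpretable, which matters later when the top weighted features are matched back to names.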

Note: If you want to use an SVM instead of a neural network, use sklearn_svm.py in place of tensorflow_learn.py. You can also use sklearn_tree.py to use a decision tree.
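The reason the classifier can be swapped is that the surrounding trial loop does not depend on which model is trained. The sketch below illustrates that separation using only the standard library. It is not what run_trials.sh or the learning scripts actually do: the function names, the random train/test split, and the majority-class baseline standing in for a real model are all assumptions made for illustration.

```python
import random
import statistics


def majority_class(train):
    """Baseline 'model': always predict the most common training label."""
    labels = [label for _, label in train]
    guess = max(set(labels), key=labels.count)
    return lambda features: guess


def run_trials(samples, fit, trials=10, test_fraction=0.25, seed=0):
    """Return the test accuracy of `fit` for each of several random splits.

    `samples` is a list of (features, label) pairs. `fit` takes the training
    pairs and returns a function that predicts a label from features.
    """
    rng = random.Random(seed)
    n_test = max(1, int(len(samples) * test_fraction))
    accuracies = []
    for _ in range(trials):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        predict = fit(train)
        correct = sum(predict(x) == y for x, y in test)
        accuracies.append(correct / len(test))
    return accuracies


if __name__ == "__main__":
    # Toy data: the label equals the second feature.
    data = [([1, 1], 1), ([1, 0], 0), ([0, 1], 1), ([0, 0], 0)] * 5
    scores = run_trials(data, majority_class, trials=5)
    print(f"mean accuracy: {statistics.mean(scores):.2f}")
```

Any other model can be dropped in by passing a different `fit` function, and averaging over several splits gives a steadier estimate than a single run.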

Contributors: mwleeds, thesecmaven
