This repository contains a set of scripts to automate the process of gathering data from malware samples, training a machine learning model on that data, and plotting its classification accuracy.
- Make a copy of config-template.ini called config.ini and edit it.
- Ensure that the "tools" subdirectory has been initialized (`$ git submodule update --init tools`).
- Either use `get_samples.py` to download samples or copy them into "all_apks" from another source.
- `sort_malicious.py` uses andrototal.org to sort them into "malicious_apk" and "benign_apk" folders.
- `extract_apks.sh` unpacks the .apk files into folders and checks the AndroidManifest.xml files for validity.
- `parse_xml.py` reads the AndroidManifest.xml files and puts the permissions requested by each app into "app_permission_vectors.json".
- `run_trials.sh` runs the `tensorflow_learn.py` script (where the ML happens) a number of times and writes the results to "results.csv".
- `plot_data.py` plots the data produced by the previous step using matplotlib.
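
To make the permission-extraction step more concrete, here is a minimal sketch of how permissions could be read from a decoded AndroidManifest.xml and turned into 0/1 vectors. It is not the code in parse_xml.py: the function names, the input shape, the sample manifest, and the layout of the resulting JSON are all assumptions made for illustration, and the real "app_permission_vectors.json" may be structured differently.

```python
import json
import xml.etree.ElementTree as ET

# Namespace bound to the "android:" prefix in decoded manifests.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def permissions_from_manifest(manifest_xml):
    """Return a sorted list of permission names requested in a manifest string."""
    root = ET.fromstring(manifest_xml)
    name_attr = "{%s}name" % ANDROID_NS
    names = {
        elem.get(name_attr)
        for elem in root.iter("uses-permission")
        if elem.get(name_attr)
    }
    return sorted(names)


def build_vectors(manifests):
    """Map each app to a 0/1 vector over every permission seen in any app.

    `manifests` is a dict of app name -> manifest XML text
    (a hypothetical input shape, chosen for this sketch).
    """
    per_app = {app: permissions_from_manifest(xml) for app, xml in manifests.items()}
    vocabulary = sorted({p for perms in per_app.values() for p in perms})
    vectors = {
        app: [1 if p in perms else 0 for p in vocabulary]
        for app, perms in per_app.items()
    }
    # Assumed output layout: the permission order plus one vector per app.
    return {"permissions": vocabulary, "vectors": vectors}


# Made-up manifest used only to exercise the functions above.
SAMPLE = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.app">
  <uses-permission android:name="android.permission.INTERNET" />
  <uses-permission android:name="android.permission.SEND_SMS" />
</manifest>"""

if __name__ == "__main__":
    print(json.dumps(build_vectors({"example": SAMPLE}), indent=2))
```

Keeping the permission vocabulary next to the vectors means each vector position can be mapped back to a permission name, which helps when inspecting what a trained model relies on.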