1. Get datasets

Download the benign APK dataset and store its contents in <this repo>/benign_apk/:

http://205.174.165.80/CICDataset/CICMalAnal2017/Dataset/APKs/Benign-APKs-2017.zip

Download the malicious APK datasets and store their contents in <this repo>/malicious_apk/:

http://205.174.165.80/CICDataset/MalDroid-2020/Dataset/APKs/ (exclude Benign.tar.gz)

Rename the sample files in both folders so that each one ends in .apk.
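The renaming can be done by hand or with a small script. The following is a minimal sketch, not part of this repo: the function name rename_to_apk and the list of suffixes to skip are assumptions made for illustration. It appends .apk to every regular file in a folder that does not already carry that extension, and leaves archives and text files alone.

```python
from pathlib import Path

# Hypothetical skip list: files that are clearly not samples.
SKIP_SUFFIXES = (".apk", ".zip", ".gz", ".tar", ".txt")


def rename_to_apk(directory):
    """Append .apk to each regular file in `directory` that lacks it.

    Returns the number of files that were renamed.
    """
    renamed = 0
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue  # leave sub-folders untouched
        if path.name.lower().endswith(SKIP_SUFFIXES):
            continue  # already an .apk, or not a sample at all
        target = path.with_name(path.name + ".apk")
        if target.exists():
            continue  # never overwrite an existing file
        path.rename(target)
        renamed += 1
    return renamed
```

Calling rename_to_apk("benign_apk") and rename_to_apk("malicious_apk") from the repo root, before step 2, would cover both datasets.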

2. Extract data

Run extract_apks_parallel.sh, which unpacks the .apk files into folders and processes some of the data therein. You can monitor its progress from another shell by running watch "wc -l benign_apk/valid_apks.txt; wc -l malicious_apk/valid_apks.txt"
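If watch is not available, the same progress check can be sketched in Python. This is an illustrative helper, not a script from this repo; the name count_valid is an assumption. It counts the lines in a folder's valid_apks.txt and reports 0 while the file does not exist yet. Unlike wc -l, it also counts a final line that lacks a trailing newline.

```python
from pathlib import Path


def count_valid(apk_dir):
    """Return the number of lines in <apk_dir>/valid_apks.txt (0 if missing)."""
    listing = Path(apk_dir) / "valid_apks.txt"
    if not listing.exists():
        return 0  # extraction has not written anything yet
    with listing.open() as fh:
        return sum(1 for _ in fh)
```

Printing count_valid("benign_apk") and count_valid("malicious_apk") in a loop with a short sleep gives roughly the same view as the watch command.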

3. Generate feature vectors

Run one of the following scripts to generate feature vectors:

parse_xml.py for permissions. Generates "app_permission_vectors.json".
parse_maline_output.py for syscalls. Generates "app_syscall_vectors.json". You must run maline first for this to work.
parse_disassembled.py for API calls. Generates "app_method_vectors.json".
parse_ssdeep.py for fuzzy hashes. Generates "app_hash_vectors.json". You must run ssdeep first for this to work.
combine_features.py for a combination of the top-weighted features. Generates "app_feature_vectors.json". This only works if you have already trained a network on each of the specified features and the feature weight files are named appropriately.
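To make the idea of a feature vector concrete, here is a minimal sketch of how a permission vector could be derived from a decoded AndroidManifest.xml. It is not the code of parse_xml.py, and the actual layout of app_permission_vectors.json may differ; the function names and the fixed vocabulary are assumptions for illustration. Each app becomes a list of 0/1 values, one per permission in the vocabulary.

```python
import xml.etree.ElementTree as ET

# Namespace used for android:* attributes in Android manifests.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def permissions_from_manifest(xml_text):
    """Return the set of permission names requested via <uses-permission>."""
    root = ET.fromstring(xml_text)
    name_attr = "{%s}name" % ANDROID_NS
    return {
        el.get(name_attr)
        for el in root.iter("uses-permission")
        if el.get(name_attr)
    }


def to_vector(app_permissions, vocabulary):
    """Encode a permission set as a 0/1 list ordered like `vocabulary`."""
    return [1 if perm in app_permissions else 0 for perm in vocabulary]
```

With a vocabulary of CAMERA, INTERNET and SEND_SMS, an app that requests only the last two would be encoded as [0, 1, 1].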

4. Trials

Run run_trials.sh app_feature_vectors.json (or whichever JSON file you want). It runs the tensorflow_learn.py script (where the ML happens) a number of times and puts the results into a folder. It also runs plot_data.py and match_features.py to create a plot and a list of the top-weighted features, respectively.
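The repeated-trials pattern can be sketched in Python as follows. This is not a port of run_trials.sh: the function names, the results folder layout and the log file names are all assumptions, and the way tensorflow_learn.py takes its arguments is not documented here. The sketch runs an arbitrary command a fixed number of times and writes each run's output into a fresh timestamped folder, so earlier results are never overwritten.

```python
import subprocess
import sys
import time
from pathlib import Path


def python_command(script, *args):
    """Build a command that runs `script` with the current interpreter."""
    return [sys.executable, script, *args]


def run_trials(command, trials, results_root="results"):
    """Run `command` `trials` times; return the folder holding the logs."""
    # One folder per invocation; mkdir fails rather than reuse an old one
    # (for example, if called twice within the same second).
    run_dir = Path(results_root) / time.strftime("run_%Y%m%d_%H%M%S")
    run_dir.mkdir(parents=True, exist_ok=False)
    for i in range(trials):
        log_path = run_dir / ("trial_%02d.log" % i)
        with log_path.open("w") as log:
            # check=True stops the series as soon as one trial fails.
            subprocess.run(command, stdout=log,
                           stderr=subprocess.STDOUT, check=True)
    return run_dir
```

A call such as run_trials(python_command("tensorflow_learn.py", "app_feature_vectors.json"), 10) would be one way to use it, assuming the script accepts the JSON path as its first argument.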

5. Tuning

Change the parameters or input data and repeat step 4. Each run should be non-destructive, so you can compare the results of different runs.

Note: If you want to use an SVM instead of a neural network, use sklearn_svm.py in place of tensorflow_learn.py. To use a decision tree, use sklearn_tree.py instead.
