annamalai-nr / drebin

Drebin - NDSS 2014 Re-implementation

Makefile 0.04% Python 99.86% Shell 0.03% Java 0.06%
malware-detection malware-analysis malware-research machine-learning androguard drebin android-malware-detection android-malware

drebin's Introduction

What does this repository contain?

This repo contains a Python implementation of Arp, Daniel, et al. "DREBIN: Effective and Explainable Detection of Android Malware in Your Pocket." NDSS. 2014.

What package/platform dependencies do I need to have to run the code?

The code was developed and tested with Python 2.7 on an Ubuntu 16.04 PC.
The following packages are needed to run the code:
1. sklearn (==0.18.1)
2. pebble
3. glob (part of the Python standard library, so no separate installation is needed)
4. joblib (==0.11)

How do I use it?

Clone the repo and follow these instructions:

1. Move to the "src" folder.

2. Run 'python Main.py --help' for the input arguments
Drebin can be run in 2 modes: (1) random split classification, (2) holdout classification. In random split mode, the apps in the given dataset are split into training and test sets, which are used to train and evaluate the malware detection model, respectively. In holdout classification mode, the training and test sets are kept separate from the start, taken either from the default directories or from directories given by the user.

The arguments of Drebin and their default values are:

--holdout       0 (default; split the dataset into a training set and a test set, used to train and evaluate the model, respectively)
                1 (the training set and the test set are given as separate inputs)
--maldir        '../data/small_proto_apks/malware' (malware samples used to train the model)
--gooddir       '../data/small_proto_apks/goodware' (goodware samples used to train the model)
--testmaldir    '../data/apks/malware' (malware samples used to test the model. ONLY APPLICABLE IF --holdout IS NOT 0; --holdout must be an integer.)
--testgooddir   '../data/apks/goodware' (goodware samples used to test the model. ONLY APPLICABLE IF --holdout IS NOT 0; --holdout must be an integer.)
--testsize      0.3 (30% of the samples will be used for testing and the remaining 70% will be used to train the model. ONLY APPLICABLE IF --holdout IS 0.)
--ncpucores     maximum number of CPU cores to be used for multiprocessing (only during the feature extraction phase)
--model         name of the file, specified by the user, under which the trained classifier model is saved as a .pkl file
--numfeatforexp 30 (number of top features to be shown for each test sample)

3. Run 'python Main.py --holdout 0 --maldir <folder containing malware apks> --gooddir <folder containing goodware apks>' to build and test a Drebin malware detection model. By default, 70% and 30% of the samples will be used for training and testing the model, respectively.

4. Run 'python Main.py --holdout 1 --maldir <folder containing training set malware apks> --gooddir <folder containing training set goodware apks> --testmaldir <folder containing test set malware apks> --testgooddir <folder containing test set goodware apks>' to train the model on the training set and evaluate it on a separate test set.
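
As a rough illustration of how the arguments listed above fit together, here is a minimal argparse sketch that mirrors the documented defaults. It is not the repository's Main.py: the function names `build_parser` and `select_mode` are hypothetical, and the defaults chosen for --ncpucores and --model are assumptions, since the README does not give concrete values for them.

```python
import argparse
import multiprocessing


def build_parser():
    """Build a parser that mirrors the defaults documented in the README."""
    parser = argparse.ArgumentParser(description="Drebin argument sketch")
    parser.add_argument("--holdout", type=int, default=0,
                        help="0: random split, any other integer: holdout")
    parser.add_argument("--maldir", default="../data/small_proto_apks/malware")
    parser.add_argument("--gooddir", default="../data/small_proto_apks/goodware")
    parser.add_argument("--testmaldir", default="../data/apks/malware")
    parser.add_argument("--testgooddir", default="../data/apks/goodware")
    parser.add_argument("--testsize", type=float, default=0.3)
    # Assumption: "maximum number of CPU cores" is read as all available cores.
    parser.add_argument("--ncpucores", type=int,
                        default=multiprocessing.cpu_count())
    # Assumption: no model file name unless the user supplies one.
    parser.add_argument("--model", default=None)
    parser.add_argument("--numfeatforexp", type=int, default=30)
    return parser


def select_mode(args):
    """Return the classification mode implied by --holdout."""
    return "holdout" if args.holdout != 0 else "random_split"
```

Calling `build_parser().parse_args([])` yields the random split mode with a test size of 0.3, while passing `--holdout 1` switches to the holdout mode, where --testmaldir and --testgooddir become relevant.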

Functionalities:

The user needs to specify which mode of classification to run with the --holdout option:

Random split classification:

**--holdout 0 (default)** runs a random split classification on the given malware dataset and benign/goodware dataset.
The --maldir and --gooddir arguments should be the directories containing the malware APKs and the benign APKs. The data files will be
generated automatically before the program does the random split classification.
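
The random split described above can be sketched with the standard library alone. This is only an illustration of the idea, not the repository's code (which may rely on sklearn for the split); `random_split`, the toy file names, and the label convention (1 for malware, -1 for goodware) are assumptions made for the example.

```python
import random


def random_split(samples, labels, test_size=0.3, seed=0):
    """Shuffle (sample, label) pairs and split them into train and test parts."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    pairs = list(zip(samples, labels))
    random.Random(seed).shuffle(pairs)          # reproducible shuffle
    n_test = int(round(len(pairs) * test_size))  # number of held-out samples
    return pairs[n_test:], pairs[:n_test]        # (train, test)


# Toy dataset: 4 malware and 6 goodware file names (hypothetical).
malware = ["mal_%d.apk" % i for i in range(4)]
goodware = ["good_%d.apk" % i for i in range(6)]
samples = malware + goodware
labels = [1] * len(malware) + [-1] * len(goodware)  # assumed label convention

train, test = random_split(samples, labels, test_size=0.3)
```

With the default --testsize of 0.3, 10 samples end up as 7 training pairs and 3 test pairs. A plain shuffle like this does not preserve the malware/goodware ratio in each part.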

Hold-out classification:

**--holdout 1** allows you to specify the test set, so you can run a hold-out classification on the given training set and test set.
Besides setting the training set arguments as for --holdout 0, you need to specify the test set arguments on the command line, i.e. --testmaldir
and --testgooddir. The txt files will be generated automatically before the program does the hold-out classification.
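
To make the hold-out setup concrete, the sketch below collects APK paths from the four directories and pairs them with labels. It is a simplified assumption of how such inputs could be gathered, not the repository's implementation; `list_apks`, `build_holdout_sets`, and the label convention (1 for malware, -1 for goodware) are hypothetical.

```python
import glob
import os


def list_apks(directory):
    """Return the sorted paths of all .apk files directly inside `directory`."""
    return sorted(glob.glob(os.path.join(directory, "*.apk")))


def build_holdout_sets(maldir, gooddir, testmaldir, testgooddir):
    """Build (paths, labels) for the training set and for the test set."""
    train_mal, train_good = list_apks(maldir), list_apks(gooddir)
    test_mal, test_good = list_apks(testmaldir), list_apks(testgooddir)
    # Assumption: label 1 marks malware and -1 marks goodware.
    train = (train_mal + train_good,
             [1] * len(train_mal) + [-1] * len(train_good))
    test = (test_mal + test_good,
            [1] * len(test_mal) + [-1] * len(test_good))
    return train, test
```

Unlike the random split mode, nothing is shuffled between the two sets here: the model only ever sees the --maldir and --gooddir samples during training.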

Who do I talk to?

In case of issues/difficulties in running the code, please contact me at [email protected]

You may also contact Arief Kresnadi Ignatius Kasim at [email protected] or Loo Jia Yi at [email protected]

drebin's People

Contributors

annamalai-nr, ariefkresnadi, duylp, jyl313, shantanuj

drebin's Issues

datasets

Are the datasets referenced in the README available for download somewhere?

Got problem when testing large-size dataset

The program runs well when I test a small dataset. But when I try to test a large dataset, it doesn't show any result. The process finishes and there is no error.
So I want to know: is there any chance the program has a limitation on data size?

Restricted API Calls collected in the code have different definition compared to the one in original Drebin paper?

The restricted API calls collected by the 'GetPermissionsAndApis(ApiList, PMap, RequestedPermissionList)' function only include the API calls that are restricted by a permission but have no corresponding permission requested in the Manifest file. However, in the original paper, the restricted APIs in an app refer to all the API calls in the decompiled code that are restricted by permissions; the ones without a corresponding permission requested in the manifest are just a special case of them.
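
The difference between the two definitions can be shown on toy data. The sketch below is not the repository's 'GetPermissionsAndApis' function; the helper names, the API names, and the one-permission-per-API mapping are simplifying assumptions made for illustration.

```python
def restricted_apis(api_list, pmap):
    """All API calls found in the code that are guarded by some permission."""
    return set(api for api in api_list if api in pmap)


def restricted_apis_without_permission(api_list, pmap, requested_permissions):
    """Only the guarded API calls whose permission is missing from the manifest."""
    requested = set(requested_permissions)
    return set(api for api in api_list
               if api in pmap and pmap[api] not in requested)


# Toy data: hypothetical mapping from API call to the permission guarding it.
pmap = {
    "sendTextMessage": "android.permission.SEND_SMS",
    "getDeviceId": "android.permission.READ_PHONE_STATE",
}
api_list = ["sendTextMessage", "getDeviceId", "toString"]
requested = ["android.permission.SEND_SMS"]

all_restricted = restricted_apis(api_list, pmap)
unrequested = restricted_apis_without_permission(api_list, pmap, requested)
```

Here `all_restricted` contains both guarded calls, while `unrequested` contains only `getDeviceId`, so the second set is always a subset of the first.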

Error loading Model

When I create a model for the first time, there is no problem.
After the creation, I use the tool with the --model option (also with the malware and benign APK parameters), but after: SVMModels = load(Model)
it gives me an error in: BestModel= SVMModels.best_estimator

OK, I modified it to: BestModel= SVMModels.best_estimator_ and it works great, but the next predict doesn't work: y_pred = SVMModels.predict(x_test)

Should I retrain the model after loading, before predict? If yes, why didn't you do it...

Building one-hot encoder problem

Hello,
I read the paper "DREBIN: Effective and Explainable Detection of Android Malware in Your Pocket."
The paper uses a one-hot encoder in its scheme,
so I want to know: do you use a one-hot encoder in this GitHub project? Can you show me?
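
For context, the paper embeds each app as a binary vector with one dimension per feature, set to 1 when the app contains that feature. A minimal sketch of that kind of encoding follows; it does not show how this repository builds its feature vectors, and the function names and feature strings are hypothetical.

```python
def build_vocabulary(feature_sets):
    """Map every distinct feature string to a fixed column index."""
    features = sorted(set(f for fs in feature_sets for f in fs))
    return dict((f, i) for i, f in enumerate(features))


def embed(feature_set, vocabulary):
    """Return a 0/1 vector with a 1 in each column whose feature is present."""
    vector = [0] * len(vocabulary)
    for feature in feature_set:
        index = vocabulary.get(feature)
        if index is not None:  # features outside the vocabulary are ignored
            vector[index] = 1
    return vector


# Two toy apps described by sets of (illustrative) feature strings.
apps = [
    {"permission::SEND_SMS", "api_call::getDeviceId"},
    {"permission::INTERNET"},
]
vocab = build_vocabulary(apps)
vectors = [embed(app, vocab) for app in apps]
```

Each app becomes a vector whose length equals the vocabulary size, which is the form a linear SVM expects as input.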
