
malware_detect2's Introduction

Malware Classification using classical Machine Learning and Deep Learning

This repository is the official implementation of the research described in the chapter "An Empirical Analysis of Image-Based Learning Techniques for Malware Classification" of the book "Malware Analysis Using Artificial Intelligence and Deep Learning".

The book or chapters can be purchased from: https://link.springer.com/chapter/10.1007/978-3-030-62582-5_16

The arXiv eprint is at: https://arxiv.org/abs/2103.13827


Abstract

In this chapter, we consider malware classification using deep learning techniques and image-based features. We employ a wide variety of deep learning techniques, including multilayer perceptrons (MLP), convolutional neural networks (CNN), long short-term memory (LSTM), and gated recurrent units (GRU). Among our CNN experiments, transfer learning plays a prominent role—specifically, we test the VGG-19 and ResNet152 models. As compared to previous work, the results presented in this chapter are based on a larger and more diverse malware dataset, we consider a wider array of features, and we experiment with a much greater variety of learning techniques. Consequently, our results are the most comprehensive and complete that have yet been published.

Quick Notes:

  • Classical ML-based approaches tried: k-NN, Random Forest, and XGBoost
  • Deep learning-based approaches tried: ANN, CNN, LSTM, and GRU
  • The implementation uses sklearn, numpy, pandas, and PyTorch.
  • MS Windows executable binary files are used as data.
  • Features:
      * Classical ML-based approaches: PE file features are extracted and used.
      * Deep learning-based approaches: (1) opcodes and (2) executables converted into grayscale images.
  • This project is an extension of https://github.com/pratikpv/malware_classification
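
The conversion of an executable into a grayscale image, mentioned in the feature list above, can be sketched as follows. This is a minimal illustration of the general technique (one byte per pixel, rows wrapped at a fixed width), not necessarily what data_preprocess.py does; the function name bytes_to_grayscale, the default width of 256, and the zero padding are assumptions made for this example.

```python
import math

import numpy as np


def bytes_to_grayscale(data: bytes, width: int = 256) -> np.ndarray:
    """Map each byte to one pixel (0-255) and wrap rows at a fixed width.

    The width and the zero padding of the last row are illustrative
    assumptions, not values taken from this repository.
    """
    if not data:
        raise ValueError("cannot build an image from empty input")
    pixels = np.frombuffer(data, dtype=np.uint8)
    height = math.ceil(len(pixels) / width)
    # Pad with zeros so the byte stream fills a full rectangle.
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[: len(pixels)] = pixels
    return padded.reshape(height, width)


if __name__ == "__main__":
    # Synthetic bytes stand in for a real executable here.
    sample = bytes(range(256)) * 3 + b"\x4d\x5a"
    image = bytes_to_grayscale(sample)
    print(image.shape, image.dtype)
```

The resulting 2-D uint8 array can be saved with an imaging library or fed to a CNN after resizing; how the images are resized for VGG-19 or ResNet152 is not covered by this sketch.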

Steps to reproduce

Package requirements

  • Install the pefile Python package, e.g. conda install pefile
  • Install PyTorch and other libraries, e.g. conda install -c pytorch torchtext. All other common dependencies should be covered by the Anaconda distribution.
  • objdump, as available on Ubuntu. (This code was developed and tested in an Ubuntu-based development environment.)
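
Before running anything, it may help to confirm that the requirements above are present. The following is a small sketch using only the Python standard library; the helper name check_prerequisites and the exact list of modules checked are assumptions based on the libraries named in this README, not part of the repository.

```python
import importlib.util
import shutil
from typing import Dict

# Module and tool names inferred from this README (an assumption, not a
# requirements file shipped with the project).
PYTHON_MODULES = ("pefile", "torch", "sklearn", "numpy", "pandas")
EXTERNAL_TOOLS = ("objdump",)


def check_prerequisites() -> Dict[str, bool]:
    """Return a mapping of requirement name to whether it was found."""
    status = {}
    for module in PYTHON_MODULES:
        # find_spec returns None when a top-level module is not installed.
        status[module] = importlib.util.find_spec(module) is not None
    for tool in EXTERNAL_TOOLS:
        # shutil.which returns None when the tool is not on PATH.
        status[tool] = shutil.which(tool) is not None
    return status


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name:10s} {'found' if found else 'MISSING'}")
```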

Malware samples

 * Copy the malware samples to <project_dir>/data/exec_files/exec_files. You can reach out to me for the samples used in this research. The overall directory structure should look like this:

├── config.py
├── data
│   ├── exec_files
│   │   └── exec_files
│   │       ├── adload
│   │       ├── agent
│   │       ├── alureon
│   │       ├── bho
│   │       ├── ceeinject
│   │       ├── cycbot
│   │       ├── delfinject
│   │       └── fakerean
├── data_preprocess.py
├── data_utils
.
.
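
Once the samples are copied, a quick sanity check of the layout above could look like the sketch below. It assumes that each family directory (adload, agent, and so on) directly contains the sample files; the helper name count_samples is hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import Dict


def count_samples(root: Path) -> Dict[str, int]:
    """Count the files in each family subdirectory of `root`."""
    if not root.is_dir():
        raise FileNotFoundError(f"sample directory not found: {root}")
    counts = {}
    for family_dir in sorted(root.iterdir()):
        if family_dir.is_dir():
            # Only regular files directly inside the family directory count.
            counts[family_dir.name] = sum(
                1 for entry in family_dir.iterdir() if entry.is_file()
            )
    return counts


if __name__ == "__main__":
    # Path taken from the README; run this from <project_dir>.
    samples_root = Path("data/exec_files/exec_files")
    if samples_root.is_dir():
        for family, count in count_samples(samples_root).items():
            print(f"{family:12s} {count}")
    else:
        print(f"{samples_root} does not exist yet")
```

An empty or missing family directory shows up immediately in the output, which is easier to catch here than after a long preprocessing run.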

Data preprocessing

Run data_preprocess.py with the options below to preprocess the data.

python data_preprocess.py --extract_pe_features

python data_preprocess.py --bin_to_img

python data_preprocess.py --extract_opcodes

python data_preprocess.py --split_opcodes
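
To illustrate what an opcode-extraction step might involve, here is a minimal sketch that pulls instruction mnemonics out of objdump -d style text. This README does not state which objdump options the project uses or how the output is parsed, so the input format, the function name extract_opcodes, and the sample listing are assumptions made for this example.

```python
from typing import List

# Hand-written sample in the tab-separated layout that `objdump -d` prints:
# address, raw bytes, then the instruction text.
SAMPLE_DISASSEMBLY = (
    "\n"
    "sample.exe:     file format pei-i386\n"
    "\n"
    "Disassembly of section .text:\n"
    "\n"
    "00401000 <.text>:\n"
    "  401000:\t55                   \tpush   %ebp\n"
    "  401001:\t8b ec                \tmov    %esp,%ebp\n"
    "  401003:\tc7 05 00 30 40 00 01 \tmovl   $0x1,0x403000\n"
    "  40100a:\t00 00 00 \n"
    "  40100d:\tc3                   \tret    \n"
)


def extract_opcodes(disassembly: str) -> List[str]:
    """Return the mnemonic of every instruction line, in order."""
    opcodes = []
    for line in disassembly.splitlines():
        fields = line.split("\t")
        # Instruction lines have three fields; headers, labels, and
        # byte-continuation lines have fewer and are skipped.
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue
        instruction = fields[2].strip()
        if instruction:
            # The first token is the mnemonic; operands are discarded.
            opcodes.append(instruction.split()[0])
    return opcodes


if __name__ == "__main__":
    print(extract_opcodes(SAMPLE_DISASSEMBLY))
```

Note that this simple tokenization treats instruction prefixes (for example rep) as the mnemonic; a real pipeline may want to handle those separately.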

Train and test models

Run detect_malware.py with the appropriate command-line arguments for the models to train/test, e.g.

python detect_malware.py --deep_feedforward

python detect_malware.py --deep_rnn

python detect_malware.py --shallow_ml

python detect_malware.py --transfer_conv_ml

If you like our work and it is useful for your research, please cite this chapter/paper as:

Prajapati P., Stamp M. (2021) An Empirical Analysis of Image-Based Learning Techniques for Malware Classification. In: Stamp M., Alazab M., Shalaginov A. (eds) Malware Analysis Using Artificial Intelligence and Deep Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-62582-5_16

or

@Inbook{
    Prajapati2021,
    author={Prajapati, Pratikkumar and Stamp, Mark},
    editor={Stamp, Mark and Alazab, Mamoun  and Shalaginov, Andrii},
    title={An Empirical Analysis of Image-Based Learning Techniques for Malware Classification},
    bookTitle={Malware Analysis Using Artificial Intelligence and Deep Learning},
    year={2021},
    publisher={Springer International Publishing},
    address={Cham},
    pages={411-435},
    abstract={In this chapter, we consider malware classification using deep learning techniques and image-based features. We employ a wide variety of deep learning techniques, including multilayer perceptrons (MLP), convolutional neural networks (CNN), long short-term memory (LSTM), and gated recurrent units (GRU). Among our CNN experiments, transfer learning plays a prominent role---specifically, we test the VGG-19 and ResNet152 models. As compared to previous work, the results presented in this chapter are based on a larger and more diverse malware dataset, we consider a wider array of features, and we experiment with a much greater variety of learning techniques. Consequently, our results are the most comprehensive and complete that have yet been published.},
    isbn={978-3-030-62582-5},
    doi={10.1007/978-3-030-62582-5_16},
    url={https://doi.org/10.1007/978-3-030-62582-5_16}
}
