Coder Social home page Coder Social logo

csvl / sema-toolchain Goto Github PK

View Code? Open in Web Editor NEW
47.0 3.0 14.0 1.45 GB

ToolChain using Symbolic Execution for Malware Analysis.

Home Page: https://csvl.github.io/SEMA-ToolChain/

License: BSD 2-Clause "Simplified" License

Python 91.40% Shell 0.09% C 0.39% Jupyter Notebook 1.69% Makefile 0.02% CSS 2.57% DTrace 0.02% YARA 2.96% Dockerfile 0.01% SCSS 0.65% Jinja 0.01% Mako 0.01% Handlebars 0.03% CMake 0.01% C++ 0.15%
malware-analysis malware symbolic classification symbolic-execution detection angr ctf ctf-tool concolic-execution linux sema static-analysis windows

sema-toolchain's Introduction

☠️ SEMA ☠️

ToolChain using Symbolic Execution for Malware Analysis.

  ██████ ▓█████  ███▄ ▄███▓ ▄▄▄
▒██    ▒ ▓█   ▀ ▓██▒▀█▀ ██▒▒████▄
░ ▓██▄   ▒███   ▓██    ▓██░▒██  ▀█▄
  ▒   ██▒▒▓█  ▄ ▒██    ▒██ ░██▄▄▄▄██
▒██████▒▒░▒████▒▒██▒   ░██▒ ▓█   ▓██▒
▒ ▒▓▒ ▒ ░░░ ▒░ ░░ ▒░   ░  ░ ▒▒   ▓▒█░
░ ░▒  ░ ░ ░ ░  ░░  ░      ░  ▒   ▒▒ ░
░  ░  ░     ░   ░      ░     ░   ▒
      ░     ░  ░       ░         ░  ░

Documentation Built by gendocs .github/workflows/.pre-commit-config.yaml CodeQL Documentation Generation pages-build-deployment Python application

Python Docker Debian

Toolchain architecture

Our toolchain is represented in the following figure and works as follows:

  • A collection of labelled binaries from different malware families is collected and used as the input of the toolchain.
  • Angr, a framework for symbolic execution, is used to execute binaries symbolically and extract execution traces. For this purpose, different heuristics have been developed to optimize symbolic execution.
  • Several execution traces (i.e., API calls used and their arguments) corresponding to one binary are extracted with Angr and gathered together using several graph heuristics to construct a SCDG.
  • These resulting SCDGs are then used as input to graph mining to extract common graphs between SCDGs of the same family and create a signature.
  • Finally, when a new sample has to be classified, its SCDG is built and compared with SCDGs of known families using a simple similarity metric.

Toolchain Illustration

This repository contains a first version of a SCDG extractor. During the symbolic analysis of a binary, all system calls and their arguments found are recorded. After some stop conditions for symbolic analysis, a graph is built as follows: Nodes are system calls recorded, edges show that some arguments are shared between calls.

When a new sample has to be evaluated, its SCDG is first built as described previously. Then, gspan is applied to extract the biggest common subgraph and a similarity score is evaluated to decide if the graph is considered as part of the family or not. The similarity score S between graph G' and G'' is computed as follows: Since G'' is a subgraph of G', this is calculating how much G' appears in G''. Another classifier we use is the Support Vector Machine (SVM) with INRIA graph kernel or the Weisfeiler-Lehman extension graph kernel.

A web application is available and is called SemaWebApp. It allows to manage the launch of experiments on SemaSCDG and/or SemaClassifier.

Pre-commit

This repository uses pre-commit to ensure that the code is formatted correctly and that the code is clean. To install pre-commit, run the following command:

python3 -m pip install pre-commit
pre-commit install

Documentation

  • Complete README of the entire toolchain : Sema README

  • SCDG README : SCDG README

  • Classifier README : Classifier README

  • Web app README : Web app README

  • A Makefile is provided to ease the usage of the toolchain, run make help for more information about the available commands

Credentials

Main authors of the projects:

  • Charles-Henry Bertrand Van Ouytsel (UCLouvain)

  • Christophe Crochet (UCLouvain)

  • Khanh Huu The Dam (UCLouvain)

  • Oreins Manon (UCLouvain)

Under the supervision and with the support of Fabrizio Biondi (Avast)

Under the supervision and with the support of our professor Axel Legay (UCLouvain) (:heart:)

Linked papers

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.