qujixiang / detect-malicious-npm-package-with-machine-learning

This project forked from willterner/detect-malicious-npm-package-with-machine-learning

detect-malicious-npm-package-with-machine-learning

Predict whether npm packages are malicious, using models trained with machine learning methods.

environment

  • Ubuntu 22.04
  • Node.js 18.16.0
  • Python 3.10.6

install

  1. Clone the repository.
  2. Install the project.
./install.sh

usage

For convenience, use cli.py to extract features and run predictions.

# Show the general help.
python3 cli.py -h

# Show the help for feature extraction.
python3 cli.py extract -h

# Extract features from the malicious dataset "npm-malicious-20230512".
python3 cli.py extract -d npm-malicious-20230512

# Show the help for prediction.
python3 cli.py predict -h

# Run prediction on the malicious dataset "npm-malicious-20230512".
python3 cli.py predict -d npm-malicious-20230512 -o RF

Note:

npm-malicious-20230512 is a malicious dataset used in our experiments. It is not included in this repository. Put your own dataset in its place, in the format expected by conf/settings.py.

In short, put malicious datasets in the datasets/preprocessed-datasets/malicious directory, and put any other datasets you want to predict in the datasets/preprocessed-datasets/unknown directory.

To use a different directory for malicious datasets, change the variable MALICIOUS_DATASETS_PATH in conf/settings.py to point to it. The same applies to the benign and unknown datasets.
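As a rough illustration of the directory settings described above, the sketch below shows one way such paths could be declared. Only MALICIOUS_DATASETS_PATH is named in this README. BENIGN_DATASETS_PATH, UNKNOWN_DATASETS_PATH, PROJECT_ROOT, PREPROCESSED and dataset_dir are assumed names for illustration, and the actual conf/settings.py may be organized differently.

```python
from pathlib import Path

# Assumption: the project root is the current working directory.
PROJECT_ROOT = Path.cwd()
PREPROCESSED = PROJECT_ROOT / "datasets" / "preprocessed-datasets"

# Named in the README; point it elsewhere to use another directory.
MALICIOUS_DATASETS_PATH = PREPROCESSED / "malicious"
# Assumed names, mirroring the malicious one.
BENIGN_DATASETS_PATH = PREPROCESSED / "benign"
UNKNOWN_DATASETS_PATH = PREPROCESSED / "unknown"


def dataset_dir(kind: str, name: str) -> Path:
    """Return the directory of dataset `name` (hypothetical helper)."""
    roots = {
        "malicious": MALICIOUS_DATASETS_PATH,
        "benign": BENIGN_DATASETS_PATH,
        "unknown": UNKNOWN_DATASETS_PATH,
    }
    if kind not in roots:
        raise ValueError(f"unknown dataset kind: {kind!r}")
    return roots[kind] / name
```

With this layout, dataset_dir("malicious", "npm-malicious-20230512") resolves to the directory shown in the architecture tree.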

architecture

├── README.md
├── cli.py
├── conf
│   └── settings.py: settings about datasets and models, etc
├── datasets
│   └── preprocessed-datasets
│       ├── benign: default benign npm package datasets
│       ├── malicious: default malicious npm package datasets, you should put your malicious dataset here
│       │   └── npm-malicious-20230512: a malicious npm package dataset
│       └── unknown: default unknown npm package datasets, you should put your unknown dataset here
├── feature-extract: extract features from package
│   ├── README.md
│   ├── jest.config.ts
│   ├── log
│   ├── material
│   ├── node_modules
│   ├── dist
│   ├── package-lock.json
│   ├── package.json
│   ├── src
│   └── tsconfig.json
├── features: extracted features
│   └── npm-malicious-20230512
├── models
│   ├── MLP.pkl
│   ├── MLP_scaler.pkl
│   ├── NB.pkl
│   ├── NB_scaler.pkl
│   ├── RF.pkl
│   ├── RF_scaler.pkl
│   ├── SVM.pkl
│   └── SVM_scaler.pkl
├── reports
│   └── npm-malicious-20230512-RF-report.csv
└── training
    ├── __init__.py
    ├── requirements.txt
    ├── results
    └── src

documentation

feature-extract directory

This program analyzes whether a package is malicious. The feature value files are stored in the output_feature directory.

This program was originally written to extract feature values from npm packages. It scans every file in a package and uses Babel and regular expressions to statically analyze the package source code.
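To make the regular-expression half of this static analysis concrete, here is a minimal Python sketch that counts a few suspicious patterns in JavaScript source text. It is not the project's extractor: the real one is written in TypeScript and also uses Babel, and this README does not list its features. The pattern names and the extract_features function are assumptions chosen for illustration.

```python
import re

# Hypothetical feature patterns; the project's real feature set is not
# documented in this README.
PATTERNS = {
    "eval_calls": re.compile(r"\beval\s*\("),
    "child_process_requires": re.compile(
        r"require\(\s*['\"]child_process['\"]\s*\)"
    ),
    "ip_literals": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "long_base64_strings": re.compile(r"['\"][A-Za-z0-9+/]{40,}={0,2}['\"]"),
}


def extract_features(source: str) -> dict[str, int]:
    """Count how often each pattern occurs in the given source text."""
    return {name: len(p.findall(source)) for name, p in PATTERNS.items()}


sample = """
const cp = require('child_process');
eval(payload);
cp.exec('curl http://10.0.0.1/x | sh');
"""
features = extract_features(sample)
```

A pure regex pass like this is cheap but easy to evade with obfuscation, which is one reason to pair it with a parser such as Babel.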

training directory

This project is used to train classifier models and evaluate their performance. Currently, MLP, RF, NB, and kernel SVM are used as classifiers.
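The README does not say which metrics are used for evaluation. As a sketch under that caveat, the following computes the usual binary-classification metrics from true and predicted labels, assuming 1 means malicious and 0 means benign. The function name classification_metrics is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = malicious)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For malicious-package detection, recall on the malicious class is often the number to watch, since a missed malicious package usually costs more than a false alarm.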

The training set is material/training_set. Malicious-dedupli subdirectory contains malicious package feature vectors. Normal subdirectory contains benign package feature vectors.

The test set is material/test_set. Malicious-dedupli subdirectory contains malicious package feature vectors. Normal subdirectory contains benign package feature vectors.

Note:

  • You can download benign npm packages from the npm registry.
  • The malicious packages in the training set are from ohm.
  • The malicious packages in the test set are from Duan.
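The two-subdirectory layout of the training and test sets can be read into labeled data along the following lines. This is a sketch under stated assumptions: the on-disk format of a feature vector is not described in this README, so the code assumes one CSV file per package with a single row of numeric values. The function load_labeled_vectors is hypothetical, and the subdirectory names are passed in rather than hard-coded.

```python
import csv
from pathlib import Path


def load_labeled_vectors(set_dir, malicious_subdir, benign_subdir):
    """Read one feature vector per CSV file; label malicious 1 and benign 0.

    Assumption: each *.csv file holds a single row of numeric values.
    """
    vectors, labels = [], []
    for subdir, label in ((malicious_subdir, 1), (benign_subdir, 0)):
        # Sort for a deterministic order across runs.
        for path in sorted((Path(set_dir) / subdir).glob("*.csv")):
            with path.open(newline="") as fh:
                row = next(csv.reader(fh))
            vectors.append([float(value) for value in row])
            labels.append(label)
    return vectors, labels
```

The returned lists can be passed directly to most classifier libraries as the feature matrix and the label vector.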
