machine-learning-based-malware-detection's Introduction

Machine-Learning-based-Malware-detection

This project was my solution for the SPIT hackathon, which I attended during my final year of engineering.

It is a solution for detecting malware using machine learning on features extracted from Windows PE files.

The problem statement has been uploaded.

The focus has to be on data pre-processing and normalization, especially text processing in the columns “ImportedDlls” and “ImportedSymbols”. My approach is to split each entry into filenames and words on the appropriate delimiter, then apply a one-hot encoder that turns each filename into its own column (1414 columns). Dimensionality reduction techniques are then applied to reduce the number of columns, and finally the classifier is trained on the result.
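The splitting and encoding steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample values, the `;` delimiter, and the helper names `tokenize` and `multi_hot_encode` are assumptions. Because one PE file imports several DLLs, each row can contain more than one 1, so the result is strictly a multi-hot encoding.

```python
import re

def tokenize(cell, delimiters=r"[;,\s]+"):
    """Split one raw cell into lowercase tokens, dropping empty strings."""
    return [tok.lower() for tok in re.split(delimiters, cell.strip()) if tok]

def multi_hot_encode(cells):
    """Return (vocabulary, matrix); matrix[i][j] is 1 if row i contains vocabulary[j]."""
    token_rows = [set(tokenize(cell)) for cell in cells]
    vocabulary = sorted(set().union(*token_rows))
    matrix = [[1 if name in row else 0 for name in vocabulary] for row in token_rows]
    return vocabulary, matrix

# Hypothetical sample values; the real column format may differ.
imported_dlls = [
    "KERNEL32.dll;USER32.dll",
    "kernel32.dll;ADVAPI32.dll;WS2_32.dll",
    "USER32.dll",
]

vocabulary, matrix = multi_hot_encode(imported_dlls)
for name, column in zip(vocabulary, zip(*matrix)):
    print(name, column)
```

Lowercasing the tokens is a design choice made here so that `KERNEL32.dll` and `kernel32.dll` map to the same column. The resulting matrix is what a dimensionality reduction step would then take as input.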

I am still learning and may be wrong. Any other suggestions or approaches are welcome!

No processing was done on the numeric columns. Any suggestions? Would normalizing help?
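For reference, two common ways to normalize a numeric column are sketched below using only the standard library. The column values and the function names `standardize` and `min_max_scale` are hypothetical and not taken from the dataset.

```python
from statistics import mean, pstdev

def standardize(values):
    """Scale values to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def min_max_scale(values):
    """Scale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical numeric feature, e.g. a per-file count.
section_counts = [3, 5, 4, 8]

z_scores = standardize(section_counts)
scaled = min_max_scale(section_counts)
print(z_scores)
print(scaled)
```

Scaling mostly matters for distance-based or gradient-based models and for variance-based dimensionality reduction. Tree-based classifiers split on thresholds, so they are largely unaffected by this kind of rescaling.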

Feature selection was performed using SelectKBest.
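A minimal sketch of how scikit-learn's `SelectKBest` can be applied to binary encoded features is shown below. The toy data, the chi-squared score function, and `k=2` are assumptions chosen for illustration; the score function and `k` used in this project are not stated here.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical toy data: 6 samples, 4 binary features (e.g. "imports DLL j").
X = np.array([
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 1],
])
# Hypothetical labels: 1 = malware, 0 = benign.
y = np.array([1, 1, 1, 0, 0, 0])

# chi2 requires non-negative features, which holds for 0/1 encoded columns.
selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)

# Indices of the columns that were kept.
kept = selector.get_support(indices=True)
print(kept, X_selected.shape)
```

In this toy data the first feature matches the label exactly, so it receives the highest score and is always among the kept columns.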

Classifiers used: Decision Tree, Random Forest, XGBoost, Gradient Boosting, etc.

Much more data processing can be done to improve the accuracy of the model, and many more techniques can be applied. I am still working on it. Suggestions welcome!

The recommendation engine mentioned in the problem statement is still pending. I will try using confidence scores for classification.
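One possible way to obtain confidence scores is the `predict_proba` method that scikit-learn classifiers such as `RandomForestClassifier` provide. The sketch below uses synthetic data and arbitrary hyperparameters, all of which are assumptions; it is not the recommendation engine from the problem statement.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: 200 samples, 5 numeric features.
rng = np.random.default_rng(0)
X = rng.random((200, 5))
# Hypothetical labelling rule, used only to make the data learnable.
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X[:150], y[:150])

# One row per test sample, one column per class; each row sums to 1.
proba = clf.predict_proba(X[150:])

# Confidence is taken here as the probability of the most likely class.
confidence = proba.max(axis=1)
predicted = clf.classes_[proba.argmax(axis=1)]
print(predicted[:5], confidence[:5])
```

For a random forest these values are averaged over the trees and are not necessarily well calibrated, so they are better read as a relative ranking of certainty than as exact probabilities.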

I am currently working on EDA and text processing for the two text columns (ImportedDlls and ImportedSymbols).

sid3345 / machine-learning-based-malware-detection