
samujjwaal / spam-email-classifier


A machine learning model that classifies emails as spam or non-spam and identifies the specific words that contribute most to that classification.

spam-email-classifier spam-email-recognition classification-algorithms clustering-algorithm data-science jupyter-notebooks python uci-machine-learning pandas numpy scikit-learn sklearn statistics naive-bayes-classification svm kmeans-clustering matplotlib-pyplot


Spam Email Classifier

This project was completed as the final course project for CS418: Introduction to Data Science at the University of Illinois at Chicago during the Fall 2019 term, together with teammates Yushenli1996 and nathanhe789.


Our goal was to classify emails as spam or not spam by training on the Spambase Dataset from UCI’s Machine Learning Repository.
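As a rough illustration of this classification task, here is a minimal plain-Python sketch of a Gaussian Naive Bayes classifier (the repository's topic tags mention Naive Bayes). It is not the notebook's code: the `TRAIN` rows are hypothetical toy values standing in for word-frequency features, and the `fit`, `predict`, and `smoothing` names are assumptions made for this example.

```python
import math
from statistics import mean, pvariance

# Hypothetical toy data: two word-frequency features per email,
# label 1 = spam, 0 = not spam. These are NOT rows from Spambase.
TRAIN = [
    ((2.1, 0.0), 1),
    ((1.8, 0.1), 1),
    ((2.5, 0.0), 1),
    ((0.0, 1.2), 0),
    ((0.1, 0.8), 0),
    ((0.0, 1.0), 0),
]

def fit(samples, smoothing=1e-3):
    """Estimate per-class prior, feature means, and feature variances."""
    model = {}
    for label in {y for _, y in samples}:
        rows = [x for x, y in samples if y == label]
        columns = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(samples),
            "means": [mean(c) for c in columns],
            # Smoothing keeps variances away from zero on tiny samples.
            "vars": [pvariance(c) + smoothing for c in columns],
        }
    return model

def log_gaussian(x, mu, var):
    """Log density of a normal distribution with mean mu and variance var."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def predict(model, x):
    """Return the label with the highest log posterior (up to a constant)."""
    def score(label):
        p = model[label]
        return math.log(p["prior"]) + sum(
            log_gaussian(xi, mu, var)
            for xi, mu, var in zip(x, p["means"], p["vars"])
        )
    return max(model, key=score)

model = fit(TRAIN)
print(predict(model, (2.0, 0.1)))  # email heavy on the first word
print(predict(model, (0.0, 1.1)))  # email heavy on the second word
```

On real data, a library implementation such as scikit-learn's `GaussianNB`, evaluated on a held-out test split, would be the more practical choice.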

Beyond that, we wanted to find out which specific words in an email contribute most to determining whether it is spam.
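One simple way to approach this question is to compare how often each word appears in spam versus non-spam emails. The sketch below is an assumption-laden illustration, not the analysis from the notebook: the `WORDS` list and `ROWS` values are hypothetical, loosely modeled on word-frequency percentages, and `rank_words` is a helper invented for this example.

```python
from statistics import mean

# Hypothetical tracked words and toy rows. Each row holds one value per
# word (share of the email's words matching it) followed by the label
# (1 = spam, 0 = not spam). These are NOT rows from Spambase.
WORDS = ["free", "money", "meeting", "project"]
ROWS = [
    (2.1, 1.4, 0.0, 0.0, 1),
    (1.8, 0.9, 0.1, 0.0, 1),
    (2.5, 2.0, 0.0, 0.2, 1),
    (0.0, 0.1, 1.2, 0.9, 0),
    (0.1, 0.0, 0.8, 1.5, 0),
    (0.0, 0.0, 1.0, 1.1, 0),
]

def rank_words(rows, words):
    """Rank words by mean frequency in spam minus mean frequency in non-spam."""
    spam = [r for r in rows if r[-1] == 1]
    ham = [r for r in rows if r[-1] == 0]
    scores = {}
    for i, word in enumerate(words):
        scores[word] = mean(r[i] for r in spam) - mean(r[i] for r in ham)
    # Largest absolute gap first: these words separate the classes most.
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

ranking = rank_words(ROWS, WORDS)
for word, gap in ranking:
    # Positive gap: more typical of spam. Negative gap: more typical of non-spam.
    print(f"{word:8s} {gap:+.2f}")
```

A difference of means is only a first-pass signal. It ignores variance and interactions between words, so a model-based measure of feature importance may rank words differently.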

We used classification algorithms to label the emails in the dataset as spam or not spam, and clustering algorithms for text analysis of the same emails.
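To make the clustering side concrete, here is a minimal plain-Python sketch of k-means (Lloyd's algorithm), which the repository's topic tags mention. The `POINTS` values, the fixed starting centroids, and the `kmeans` helper are all assumptions for illustration and are not taken from the notebook.

```python
import math

# Hypothetical toy points: two word-frequency features per email.
POINTS = [
    (2.1, 0.0), (1.8, 0.1), (2.5, 0.0),   # emails heavy on word A
    (0.0, 1.2), (0.1, 0.8), (0.0, 1.0),   # emails heavy on word B
]

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm with caller-supplied starting centroids."""
    centroids = list(centroids)
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its points.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment

# Starting centroids are fixed here so the result is deterministic.
centroids, labels = kmeans(POINTS, centroids=[POINTS[0], POINTS[3]])
print(labels)
```

Unlike the classifier, k-means never sees the spam labels. Comparing its clusters against the known labels afterwards is one way to check whether the word frequencies alone separate the two kinds of email.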

Check out the Jupyter Notebook or the project report to see the data science flow implemented.


Project Background

This project centers on the persistent problem of spam email and on the study of spam identification as a countermeasure. Fraudulent companies, scammers, and robocallers often find ways to obtain email addresses and exploit them, flooding inboxes with irrelevant messages. To counter this, large corporations have built their own filtering systems that detect suspicious emails, whether they carry a harmful computer virus or are simply spam, and separate them from the emails that are actually useful to the recipient. Google, for example, builds spam recognition and filtering into Gmail to reduce the chance that people fall for suspicious emails. This project looks more closely at how such a spam recognition and filtering system is designed. To do so, it examines filtering classification algorithms on a dataset provided by the University of California at Irvine and Hewlett-Packard. The dataset was used for a Hewlett-Packard internal-only technical report; therefore, it contains only the sensitive keywords that Hewlett-Packard required.
