This project contains the complete code (spam_classification.ipynb), along with a detailed description, for classifying text messages as spam or non-spam using a machine learning algorithm. The software was developed in Python using the lingspam dataset. The following files are included in the project:
- spam_dataset (dir): Contains the lingspam dataset used for classification.
- pdf files: Contain the reference documents used for the different stages of classification, such as feature extraction and feature selection.
- spam_classification.ipynb: Contains the main source code (along with documentation) for the classification algorithm.
- xlsx files: Contain the features (such as word-document frequency) calculated for each word of the dataset.
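
As a rough illustration of the word-document frequency feature mentioned above, the sketch below counts, for each word, how many documents contain it. The function name `document_frequency`, the whitespace tokenization, and the toy messages are assumptions made for this example; they are not taken from the notebook or the lingspam dataset, and the notebook may compute the feature differently.

```python
from collections import Counter


def document_frequency(documents):
    """Return a Counter mapping each word to the number of documents containing it."""
    df = Counter()
    for text in documents:
        # set() ensures a word is counted at most once per document
        df.update(set(text.lower().split()))
    return df


# Toy messages standing in for the dataset (hypothetical, not from lingspam)
messages = [
    "win a free prize now",
    "meeting notes for the free seminar",
    "claim your prize",
]

df = document_frequency(messages)
print(df["free"], df["prize"], df["meeting"])
```

Words that appear in many documents of one class but few of the other are typical candidates for feature selection, which is why a per-word count like this is a common starting point.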