A machine learning project using Weka and Python Jupyter Notebooks
The project investigates whether news stories from a website can be classified into one of ten subject categories, using stories whose category is already known as training data.
Stages:
- extract the data from the website as html
- clean the data, to get just a list of words
- prepare the data for the machine learning tool
- train and test the classification, changing a few variables, e.g.:
  - language processing (stop words)
  - ML model (ZeroR, OneR, Naive Bayes, J48 Tree)
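The cleaning and preparation stages above could look roughly like the sketch below. It is not the code from the project's notebooks: the function names (`html_to_words`, `to_arff`), the tiny stop-word list, and the relation and attribute names are all assumptions made for illustration. It strips HTML down to lowercase words, optionally drops stop words, and writes the result in Weka's ARFF format with one string attribute and one nominal class attribute.

```python
import re
from html.parser import HTMLParser

# Hypothetical stop-word list for illustration; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on", "for"}


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_words(html, remove_stop_words=True):
    """Return the lowercase alphabetic words found in an HTML string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    words = re.findall(r"[a-z]+", text)
    if remove_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    return words


def to_arff(stories, categories, relation="news_stories"):
    """Build ARFF text from (words, category) pairs.

    Assumes category names are simple tokens (no spaces or commas) and
    that words contain only a-z, so no quote escaping is needed.
    """
    lines = [
        f"@relation {relation}",
        "",
        "@attribute text string",
        "@attribute category {" + ",".join(categories) + "}",
        "",
        "@data",
    ]
    for words, category in stories:
        text = " ".join(words)
        lines.append(f"'{text}',{category}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    page = "<html><body><h1>The Budget</h1><p>Taxes rise in the UK.</p></body></html>"
    words = html_to_words(page)
    print(to_arff([(words, "politics")], ["politics", "sport"]))
```

Toggling `remove_stop_words` gives the two language-processing variants to compare. In Weka, a string attribute like this typically needs converting to word features (for example with the StringToWordVector filter) before classifiers such as Naive Bayes or J48 can use it.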
See Text classification - news stories.pptx for a presentation describing the project.