This repository contains the code for the 545 final project, "Multi-Label Learning for Text Categorization".
All three datasets are available in the 'Datasets' folder.
'data_simulation.R' generates the simulated data. 'toxic_preprocessing.py' performs the feature engineering. Algorithm files whose names end in '_svm' use SVM as the base classifier. The remaining algorithm files use Logistic Regression as the base classifier. The one exception is 'mlknn', which stands on its own and uses a modified kNN as its base classifier.
All algorithms are called from 'main2.py', which also imports the data.
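
The base-classifier convention described above (SVM for the '_svm' files, Logistic Regression otherwise) can be illustrated with a minimal sketch. This is not the code in this repository: it assumes scikit-learn is installed, uses a binary relevance (one-vs-rest) wrapper as a stand-in for the multi-label algorithms, and runs on synthetic data. The function name `build_binary_relevance` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC


def build_binary_relevance(base="logreg"):
    """Hypothetical helper: one binary classifier per label, with a swappable base."""
    if base == "svm":
        estimator = LinearSVC()  # linear SVM base classifier
    elif base == "logreg":
        estimator = LogisticRegression(max_iter=1000)  # Logistic Regression base classifier
    else:
        raise ValueError(f"unknown base classifier: {base!r}")
    # OneVsRestClassifier fits an independent copy of the estimator for each label column.
    return OneVsRestClassifier(estimator)


# Synthetic multi-label data (an assumption; not one of the datasets in 'Datasets').
X, Y = make_multilabel_classification(
    n_samples=200, n_features=20, n_classes=4, random_state=0
)

predictions = {}
for base in ("logreg", "svm"):
    model = build_binary_relevance(base).fit(X, Y)
    # Each prediction is a 0/1 indicator matrix with one column per label.
    predictions[base] = model.predict(X)
```

Only the estimator passed to the wrapper changes between the two runs, which mirrors how the '_svm' files are described as differing from their Logistic Regression counterparts.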