artisingh0913 / predicting-toxicity-over-diverse-online-conversation
This project implements a model to detect toxicity in online conversations, addressing several significant challenges in the field. We implemented the model in three phases: preprocessing of the data; creation of feature vectors such as TF-IDF, Word2Vec, and Doc2Vec; and application of learning algorithms with evaluation metrics. We then optimized our results by experimenting with various features and models, including SVM, Logistic Regression, Naive Bayes, and neural networks. We created a baseline SVM model for the binary classification task to serve as a standard against which to compare later models. Comparing these predictive models on macro-averaged precision, recall, and F1 score, the CNN model over bigrams achieved the highest F1 score. Our word-level CNN out-performed all other models, including Logistic Regression with TF-IDF, Gensim Doc2Vec, and the other neural network models.
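A minimal sketch of the kind of TF-IDF + SVM baseline described above, using scikit-learn; the example texts, labels, and pipeline settings are illustrative assumptions, not taken from the project itself:

```python
# Hypothetical TF-IDF + linear SVM baseline for binary toxicity
# classification; the tiny dataset below is made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.metrics import f1_score

texts = [
    "you are awesome",
    "you are an idiot",
    "great point, thanks for sharing",
    "shut up, nobody asked you",
]
labels = [0, 1, 0, 1]  # 1 = toxic, 0 = non-toxic

# TF-IDF over unigrams and bigrams feeding a linear SVM
baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LinearSVC(),
)
baseline.fit(texts, labels)

preds = baseline.predict(texts)
print(f1_score(labels, preds, average="macro"))
```

A pipeline like this is a natural comparison point: later models (Doc2Vec features, CNNs) can be evaluated with the same macro-averaged precision, recall, and F1 metrics against this baseline.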