Implemented in simple_text_recognition.py.
A simple K-means algorithm is used to cluster a text data set.
Once fitted, the model can also be used to predict the cluster of new, unseen data.
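
The idea above can be sketched roughly as follows. This is not the content of simple_text_recognition.py; it is a minimal illustration that assumes a TF-IDF representation, a toy corpus, and two clusters, all of which are choices made here for the example.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (hypothetical data, for illustration only).
texts = [
    "the cat and the dog play in the garden",
    "my dog chased the cat around the house",
    "a cat and a dog can be friends",
    "stocks fell as the market closed lower",
    "the market rallied and stocks rose",
    "investors watch the market and buy stocks",
]

# Turn the raw texts into a sparse TF-IDF matrix.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(texts)

# Fit K-means; n_clusters=2 is an assumption made for this toy corpus.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)

# New data must go through the same fitted vectorizer before predicting.
new_texts = ["the dog barked at the cat"]
new_labels = model.predict(vectorizer.transform(new_texts))

print(labels, new_labels)
```

Note that the cluster ids are arbitrary: K-means groups similar texts, but it does not name the groups, so mapping a cluster to a class label is a separate step.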
Implemented in stemming.py.
Sklearn's vectorizer classes do not implement stemming.
We can use nltk modules to add this functionality, and also build a custom tokenizer that applies our own rules.
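
One possible way to wire this up is shown below. It is a sketch and not necessarily what stemming.py does: the function name `stemming_tokenizer`, the choice of `PorterStemmer`, and the "alphabetic tokens of at least two letters" rule are all assumptions made for this example.

```python
import re

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer

stemmer = PorterStemmer()


def stemming_tokenizer(text):
    """Split text into tokens with a custom rule, then stem each token."""
    # Custom rule (an assumption for this sketch): keep only lowercase
    # alphabetic tokens that are at least two letters long.
    tokens = re.findall(r"[a-z]{2,}", text.lower())
    return [stemmer.stem(token) for token in tokens]


# Passing a callable as `tokenizer` replaces the default tokenization step;
# token_pattern=None makes it explicit that the default pattern is unused.
vectorizer = CountVectorizer(tokenizer=stemming_tokenizer, token_pattern=None)
X = vectorizer.fit_transform(["Cats running", "cat runs"])

# Both documents now share the same stemmed vocabulary.
print(sorted(vectorizer.vocabulary_))  # ['cat', 'run']
```

The same `tokenizer` argument is accepted by `TfidfVectorizer`, so the stemmed tokens can feed the clustering step as well.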