This function can split the entire text of Huckleberry Finn into sentences in about 0.1 seconds, and it handles many of the painful edge cases that make sentence splitting non-trivial. The code is not mine; it was taken from https://stackoverflow.com/a/31505798/8233015, published by D Greenberg (https://stackoverflow.com/users/5133085/d-greenberg). Thanks man :)
acerock6 / efficient-sentence-tokenizer
A really efficient sentence tokenizer that works better than NLTK or spaCy.
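The referenced Stack Overflow answer handles abbreviation edge cases by masking periods that do not end a sentence before splitting on terminal punctuation. Below is a minimal sketch of that masking idea, not the actual code from this repo or the answer; the `ABBREVIATIONS` list and `split_into_sentences` name are illustrative assumptions:

```python
import re

# Illustrative (incomplete) list of abbreviations whose periods
# should not be treated as sentence boundaries.
ABBREVIATIONS = ("Mr.", "Mrs.", "Ms.", "Dr.", "Prof.", "St.", "e.g.", "i.e.", "etc.")

def split_into_sentences(text):
    """Split text into sentences while protecting common abbreviations.

    Periods inside known abbreviations are replaced with a placeholder
    before splitting, then restored in each resulting sentence.
    """
    placeholder = "<prd>"
    for abbr in ABBREVIATIONS:
        text = text.replace(abbr, abbr.replace(".", placeholder))
    # Split on sentence-ending punctuation followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.replace(placeholder, ".").strip() for p in parts if p.strip()]

print(split_into_sentences("Dr. Smith went home. He was tired! Was he?"))
```

Because the work is plain string replacement plus one compiled regex split, the approach stays fast even on book-length input, which matches the Huckleberry Finn timing claim above.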