elliott5 / textextract
This project is forked from emiruz/textextract.
textextract is a tiny library that identifies where the article content is in an HTML page (as opposed to navigation, headers, footers, ads, etc.), extracts it, and returns it as a string. It optionally adds missing full stops. It is similar to Boilerpipe, but written in Go for Go programs.
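To make the idea concrete, here is a minimal sketch of one common way to separate article text from page chrome: split the page into blocks, drop blocks that are short or mostly link text, and append a full stop to blocks that lack closing punctuation. This is not textextract's actual API or algorithm. The function name `extractMainText`, its parameters, the thresholds, and the sample page are all hypothetical, and the regex-based tag handling stands in for a real HTML parser.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

var (
	// blockTag matches tags that usually start or end a block of text.
	blockTag = regexp.MustCompile(`(?i)</?(p|div|li|ul|ol|nav|header|footer|h[1-6]|td|tr|br)\b[^>]*>`)
	// anyTag matches any remaining tag so it can be stripped.
	anyTag = regexp.MustCompile(`<[^>]+>`)
	// linkTag captures the contents of anchor elements.
	linkTag = regexp.MustCompile(`(?is)<a\b[^>]*>(.*?)</a>`)
)

// sample is a hypothetical page: a link-heavy nav bar, two article
// paragraphs (the first without a full stop), and a short footer.
const sample = `<nav><a href="/">Home</a> <a href="/about">About us</a> <a href="/contact">Contact the team</a></nav>
<div><p>Go is a compiled language with a small, readable syntax</p>
<p>It ships with a large standard library.</p></div>
<footer>Copyright 2020</footer>`

// endsSentence reports whether s already ends with sentence punctuation.
func endsSentence(s string) bool {
	return strings.HasSuffix(s, ".") || strings.HasSuffix(s, "!") || strings.HasSuffix(s, "?")
}

// extractMainText (hypothetical) keeps blocks with at least minWords words
// whose share of words inside links is at most maxLinkRatio. If addStops is
// true, a full stop is appended to kept blocks that lack one.
func extractMainText(page string, minWords int, maxLinkRatio float64, addStops bool) string {
	var kept []string
	for _, chunk := range blockTag.Split(page, -1) {
		// Count the words that appear inside <a>...</a> in this block.
		linkWords := 0
		for _, m := range linkTag.FindAllStringSubmatch(chunk, -1) {
			linkWords += len(strings.Fields(anyTag.ReplaceAllString(m[1], " ")))
		}
		// Strip tags, decode entities, and collapse whitespace.
		fields := strings.Fields(html.UnescapeString(anyTag.ReplaceAllString(chunk, " ")))
		words := len(fields)
		if words == 0 || words < minWords {
			continue // empty or too short to be article text
		}
		if float64(linkWords)/float64(words) > maxLinkRatio {
			continue // mostly links: likely navigation
		}
		text := strings.Join(fields, " ")
		if addStops && !endsSentence(text) {
			text += "."
		}
		kept = append(kept, text)
	}
	return strings.Join(kept, " ")
}

func main() {
	fmt.Println(extractMainText(sample, 5, 0.5, true))
}
```

On the sample page, the navigation block is rejected for its link density and the footer for its length, while the two paragraphs are kept. For real pages, a tokenizing parser such as `golang.org/x/net/html` is a more robust choice than regular expressions.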