general-news-extractor
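A general-purpose tool for extracting the main text of news web pages, along with the title, author, and publication date.
This project is inspired by kingname/GeneralNewsExtractor. It was ported from Python to Node.js, with some changes to improve extraction accuracy.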
Online DEMO
https://general-news-extractor-demo.stayin.cn/
Installation
Using npm:
npm i general-news-extractor
Usage
const GeneralNewsExtractor = require('general-news-extractor')
const htmlString = `` // HTML for a news page
const gne = new GeneralNewsExtractor()
// gne.extract( html: string, { titleSelector = '', authorSelector = '', dateTimeSelector = '', noiseNodeList = [] } = {})
const result = gne.extract(htmlString, {})
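The options listed in the signature comment above can be collected per site before calling `extract`. The following is a minimal sketch, not part of the package: `defaultOptions`, `buildExtractOptions`, and `extractNews` are hypothetical helper names, and the defaults are copied from the signature comment. The helper only merges overrides onto those defaults and rejects option names the signature does not list. It makes no assumption about the shape of the returned result.

```javascript
// Defaults copied from the signature comment in Usage.
// A function is used so every call gets a fresh noiseNodeList array.
const defaultOptions = () => ({
  titleSelector: '',
  authorSelector: '',
  dateTimeSelector: '',
  noiseNodeList: [],
})

// Hypothetical helper (not part of general-news-extractor):
// merge per-site overrides onto the defaults and reject unknown option names.
function buildExtractOptions(overrides = {}) {
  const defaults = defaultOptions()
  for (const key of Object.keys(overrides)) {
    if (!(key in defaults)) {
      throw new Error(`Unknown extract option: ${key}`)
    }
  }
  return { ...defaults, ...overrides }
}

// Hypothetical wrapper: `extractor` is any object with an
// extract(html, options) method, such as a GeneralNewsExtractor instance.
function extractNews(extractor, html, overrides = {}) {
  return extractor.extract(html, buildExtractOptions(overrides))
}
```

With the package installed, a call might look like `extractNews(new GeneralNewsExtractor(), htmlString, { titleSelector: 'h1.headline', noiseNodeList: ['.ad'] })`. The selector strings here are illustrative only and assume CSS selector syntax. Check the package source for the syntax it actually expects.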
TODO
- Run in browser
Thanks
License
MIT © zenghongtu