zhaoyin / html2article

This project forked from stanzhai/html2article

Main-content (article body) extraction from HTML web pages

License: Other

html2article's Introduction

Html2Article

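An efficient tool for the .NET platform that extracts the main content (article body) from HTML.
Extraction uses a text-density-based algorithm and also works on compressed HTML documents. The average extraction time is about 10 ms per page, with an accuracy above 95%.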

Adding HTML content extraction to your project

Copy Html2Article.cs from the sample project into your project.
Import the Html2Article namespace.
Add the following code:

// html is the HTML text you want to extract content from
string html = "<html>....</html>";
// The Article object has three properties: Title, PublishDate, and Content (the body text)
Article article = Html2Article.GetArticle(html);

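The Html2Article class

Html2Article is the core class that performs content extraction.
Html2Article configuration
AppendMode: whether to use body append mode. Defaults to false; when set to true, more text that meets the criteria is appended to the body.
Depth: the analysis depth. Defaults to 5; increase it for pages with large gaps between lines.
LimitCount: the character threshold. Once the amount of analyzed text reaches this threshold, the analysis is considered to have entered the body content. Defaults to 180 characters.
GetArticle(string html): extracts an Article from the HTML text.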
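
To make the roles of Depth, LimitCount, and AppendMode more concrete, here is a minimal sketch of one way a line-based text-density scan could work. It is not the code in Html2Article.cs. The class name DensitySketch, the method ExtractBody, and the reading of append mode as "join every qualifying block instead of keeping only the longest one" are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical illustration; NOT the Html2Article class from this project.
public static class DensitySketch
{
    public static string ExtractBody(string html, int depth = 5,
                                     int limitCount = 180, bool appendMode = false)
    {
        // Drop script/style blocks and comments, which never hold article text.
        string text = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1\s*>", "",
                                    RegexOptions.Singleline | RegexOptions.IgnoreCase);
        text = Regex.Replace(text, @"<!--.*?-->", "", RegexOptions.Singleline);
        // Turn every remaining tag into a line break, so HTML without any
        // line breaks still yields one "line" per text fragment.
        text = Regex.Replace(text, @"<[^>]+>", "\n");
        string[] lines = text.Split('\n').Select(l => l.Trim()).ToArray();

        var blocks = new List<string>();   // finished candidate body blocks
        var current = new List<string>();  // lines of the block being built
        bool inBody = false;

        for (int i = 0; i < lines.Length; i++)
        {
            // Density of the window: characters in the next `depth` lines.
            int window = 0;
            for (int j = i; j < Math.Min(i + depth, lines.Length); j++)
                window += lines[j].Length;

            if (!inBody && window >= limitCount)
            {
                inBody = true;             // enough text ahead: body starts
            }
            else if (inBody && window == 0)
            {
                // A gap of `depth` empty lines ends the current block.
                blocks.Add(string.Join("\n", current));
                current.Clear();
                inBody = false;
            }

            if (inBody && lines[i].Length > 0)
                current.Add(lines[i]);
        }
        if (current.Count > 0)
            blocks.Add(string.Join("\n", current));
        if (blocks.Count == 0)
            return "";

        // Assumed meaning of append mode: keep every dense block,
        // otherwise return only the longest one.
        return appendMode
            ? string.Join("\n", blocks)
            : blocks.OrderByDescending(b => b.Length).First();
    }
}
```

In this sketch, a larger depth lets the scan bridge wider runs of empty lines, which matches the advice to raise Depth for pages with large line gaps. A larger limitCount makes the scan stricter about what counts as the start of the body. Short text that falls inside the same window as the start of the body (for example, a nearby navigation link) can still be picked up, so treat the sketch as a starting point only.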
