This project forked from buriy/python-readability

A fast Python port of arc90's Readability tool, updated to match the latest readability.js.

Home Page: https://github.com/buriy/python-readability

python-readability

Given an HTML document, it pulls out the main body text and cleans it up.

This is a Python port of a Ruby port of arc90's Readability project.

Installation

Install it with pip:

$ pip install readability-lxml

Usage

>>> import requests
>>> from readability import Document
>>>
>>> response = requests.get('http://example.com')
>>> doc = Document(response.text)
>>> doc.title()
'Example Domain'
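
To give a feel for what "pulling out the main body text" can mean, here is a minimal standard-library sketch of one common heuristic: score each block element by how much non-link text it holds and keep the best one. This is not the algorithm used by python-readability, which works on an lxml tree with more elaborate scoring. The names `BlockScorer`, `extract_main_text`, `BLOCK_TAGS` and `SAMPLE` are hypothetical and exist only for this illustration, and the sketch assumes reasonably well-formed HTML.

```python
from html.parser import HTMLParser

# Illustration only: these names are NOT part of python-readability.
BLOCK_TAGS = {"div", "article", "section", "td"}
SKIP_TAGS = {"script", "style"}


class BlockScorer(HTMLParser):
    """Collect text per block element and count how much of it is link text."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open block elements, innermost last
        self.blocks = []     # closed blocks, each a dict of text and link_chars
        self.link_depth = 0  # > 0 while inside an <a> element
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.stack.append({"text": [], "link_chars": 0})
        elif tag == "a":
            self.link_depth += 1
        elif tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and self.stack:
            self.blocks.append(self.stack.pop())
        elif tag == "a":
            self.link_depth = max(0, self.link_depth - 1)
        elif tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)

    def handle_data(self, data):
        if self.skip_depth or not self.stack:
            return
        text = " ".join(data.split())  # collapse runs of whitespace
        if not text:
            return
        block = self.stack[-1]  # attribute text to the innermost open block
        block["text"].append(text)
        if self.link_depth:
            block["link_chars"] += len(text)


def score(block):
    """Number of characters in the block that are not inside links."""
    total = sum(len(piece) for piece in block["text"])
    return total - block["link_chars"]


def extract_main_text(html):
    """Return the text of the block with the most non-link characters."""
    parser = BlockScorer()
    parser.feed(html)
    parser.close()
    if not parser.blocks:
        return ""
    best = max(parser.blocks, key=score)
    return " ".join(best["text"])


SAMPLE = """
<html><body>
<div id="nav"><a href="/">Home</a> <a href="/about">About</a></div>
<div id="content"><p>Readability-style tools look for the block that holds
most of the prose.</p><p>Navigation and footers are mostly links.</p></div>
<div id="footer"><a href="/terms">Terms</a></div>
</body></html>
"""

print(extract_main_text(SAMPLE))
```

In the sample, the navigation and footer blocks consist entirely of link text, so they score zero and the content block is selected. For real pages, use `Document` as shown above rather than this sketch.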

Change Log

  • 0.3 Added Document.encoding, positive_keywords and negative_keywords
  • 0.4 Added video loading and allowed more images per paragraph
  • 0.5 Preparing a release to support Python versions 2.6, 2.7, 3.3 and 3.4
  • 0.6 Finally a release which supports Python versions 2.6, 2.7, 3.3 and 3.4

Licensing

This code is licensed under the Apache License 2.0.

Thanks to

  • Latest readability.js
  • Ruby port by starrhorne and iterationlabs
  • Python port by gfxmonk
  • Decruft effort to move to lxml
  • "BR to P" fix from readability.js which improves quality for smaller texts
  • Contributions from GitHub users

Contributors

buriy, mitechie, martinth, jcharum, decentral1se, timbertson, alphapapa, hush-hush, facundo, zacharydenton, seanbrant, nathanathan, horva, markperdomo, avalanchy, lsemel, psycojoker, evasdk, digitaldavenyc, andreypopp
