Comments (3)
File "/home/rharding/src/bookie/local/lib/python2.7/site-packages/breadability/readable.py", line 431, in _readable
if self.candidates:
File "/home/rharding/src/bookie/local/lib/python2.7/site-packages/breadability/utils.py", line 55, in __get__
value = self.fget(inst)
File "/home/rharding/src/bookie/local/lib/python2.7/site-packages/breadability/readable.py", line 419, in candidates
doc = self.doc
File "/home/rharding/src/bookie/local/lib/python2.7/site-packages/breadability/utils.py", line 55, in __get__
value = self.fget(inst)
File "/home/rharding/src/bookie/local/lib/python2.7/site-packages/breadability/readable.py", line 409, in doc
doc = self.orig.html
File "/home/rharding/src/bookie/local/lib/python2.7/site-packages/breadability/utils.py", line 55, in __get__
value = self.fget(inst)
File "/home/rharding/src/bookie/local/lib/python2.7/site-packages/breadability/document.py", line 93, in html
return self._parse(self.orig_html)
File "/home/rharding/src/bookie/local/lib/python2.7/site-packages/breadability/document.py", line 80, in _parse
doc = build_doc(html)
File "/home/rharding/src/bookie/local/lib/python2.7/site-packages/breadability/document.py", line 57, in build_doc
parser=utf8_parser)
File "/home/rharding/src/bookie/local/lib/python2.7/site-packages/lxml/html/__init__.py", line 532, in document_fromstring
value = etree.fromstring(html, parser, **kw)
File "lxml.etree.pyx", line 2743, in lxml.etree.fromstring (src/lxml/lxml.etree.c:52665)
File "parser.pxi", line 1573, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:79932)
File "parser.pxi", line 1452, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:78774)
File "parser.pxi", line 960, in lxml.etree._BaseParser._parseDoc (src/lxml/lxml.etree.c:75389)
File "parser.pxi", line 564, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:71739)
File "parser.pxi", line 645, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:72614)
File "parser.pxi", line 594, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:72087)
XMLSyntaxError: line 563: Tag figure invalid
from breadability.
[D 120826 11:32:46 existing:67] Q0 getting content for 4c3edf3a8229cd http://www.dafont.com/
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
File "/usr/lib/python2.6/threading.py", line 484, in run
self.__target(*self.__args, **self.__kwargs)
File "scripts/readability/existing.py", line 68, in fetch_content
read = ReadUrl.parse(url)
File "/home/bmark.us/0.5/bookie/lib/readable.py", line 176, in parse
if not document.readable:
File "/home/bmark.us/0.5/lib/python2.6/site-packages/breadability/utils.py", line 55, in __get__
value = self.fget(inst)
File "/home/bmark.us/0.5/lib/python2.6/site-packages/breadability/readable.py", line 426, in readable
return tounicode(self._readable)
File "/home/bmark.us/0.5/lib/python2.6/site-packages/breadability/utils.py", line 55, in __get__
value = self.fget(inst)
File "/home/bmark.us/0.5/lib/python2.6/site-packages/breadability/readable.py", line 431, in _readable
if self.candidates:
File "/home/bmark.us/0.5/lib/python2.6/site-packages/breadability/utils.py", line 55, in __get__
value = self.fget(inst)
File "/home/bmark.us/0.5/lib/python2.6/site-packages/breadability/readable.py", line 419, in candidates
doc = self.doc
File "/home/bmark.us/0.5/lib/python2.6/site-packages/breadability/utils.py", line 55, in __get__
value = self.fget(inst)
File "/home/bmark.us/0.5/lib/python2.6/site-packages/breadability/readable.py", line 409, in doc
doc = self.orig.html
File "/home/bmark.us/0.5/lib/python2.6/site-packages/breadability/utils.py", line 55, in __get__
value = self.fget(inst)
File "/home/bmark.us/0.5/lib/python2.6/site-packages/breadability/document.py", line 95, in html
return self._parse(self.orig_html)
File "/home/bmark.us/0.5/lib/python2.6/site-packages/breadability/document.py", line 82, in _parse
doc = build_doc(html)
File "/home/bmark.us/0.5/lib/python2.6/site-packages/breadability/document.py", line 59, in build_doc
parser=utf8_parser)
File "/home/bmark.us/0.5/lib/python2.6/site-packages/lxml/html/__init__.py", line 534, in document_fromstring
value = etree.fromstring(html, parser, **kw)
File "lxml.etree.pyx", line 2743, in lxml.etree.fromstring (src/lxml/lxml.etree.c:52665)
File "parser.pxi", line 1573, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:79932)
File "parser.pxi", line 1452, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:78774)
File "parser.pxi", line 960, in lxml.etree._BaseParser._parseDoc (src/lxml/lxml.etree.c:75389)
File "parser.pxi", line 564, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:71739)
File "parser.pxi", line 645, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:72614)
File "parser.pxi", line 596, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:72123)
XMLSyntaxError: None
[D 120827 11:13:41 existing:67] Q0 getting content for 4c3edf3a8229cd http://www.dafont.com/
[D 120827 11:13:41 existing:67] Q1 getting content for 9feafedb1e468b http://www.redbullmusicacademy.com/
[D 120827 11:13:41 existing:67] Q2 getting content for 1ca8c1b6cb8e08 http://cameratoss.blogspot.com/
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
File "/usr/lib/python2.6/threading.py", line 484, in run
self.__target(*self.__args, **self.__kwargs)
File "scripts/readability/existing.py", line 68, in fetch_content
read = ReadUrl.parse(url)
File "/home/bmark.us/0.5/bookie/lib/readable.py", line 176, in parse
if not document.readable:
File "/home/bmark.us/0.5/lib/python2.6/site-packages/breadability/utils.py", line 55, in __get__
value = self.fget(inst)
File "/home/bmark.us/0.5/lib/python2.6/site-packages/breadability/readable.py", line 426, in readable
return tounicode(self._readable)
File "/home/bmark.us/0.5/lib/python2.6/site-packages/breadability/utils.py", line 55, in __get__
value = self.fget(inst)
File "/home/bmark.us/0.5/lib/python2.6/site-packages/breadability/readable.py", line 431, in _readable
if self.candidates:
File "/home/bmark.us/0.5/lib/python2.6/site-packages/breadability/utils.py", line 55, in __get__
value = self.fget(inst)
File "/home/bmark.us/0.5/lib/python2.6/site-packages/breadability/readable.py", line 419, in candidates
doc = self.doc
File "/home/bmark.us/0.5/lib/python2.6/site-packages/breadability/utils.py", line 55, in __get__
value = self.fget(inst)
File "/home/bmark.us/0.5/lib/python2.6/site-packages/breadability/readable.py", line 409, in doc
doc = self.orig.html
File "/home/bmark.us/0.5/lib/python2.6/site-packages/breadability/utils.py", line 55, in __get__
value = self.fget(inst)
File "/home/bmark.us/0.5/lib/python2.6/site-packages/breadability/document.py", line 95, in html
return self._parse(self.orig_html)
File "/home/bmark.us/0.5/lib/python2.6/site-packages/breadability/document.py", line 82, in _parse
doc = build_doc(html)
File "/home/bmark.us/0.5/lib/python2.6/site-packages/breadability/document.py", line 59, in build_doc
parser=utf8_parser)
File "/home/bmark.us/0.5/lib/python2.6/site-packages/lxml/html/__init__.py", line 534, in document_fromstring
value = etree.fromstring(html, parser, **kw)
File "lxml.etree.pyx", line 2743, in lxml.etree.fromstring (src/lxml/lxml.etree.c:52665)
File "parser.pxi", line 1573, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:79932)
File "parser.pxi", line 1452, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:78774)
File "parser.pxi", line 960, in lxml.etree._BaseParser._parseDoc (src/lxml/lxml.etree.c:75389)
File "parser.pxi", line 564, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:71739)
File "parser.pxi", line 645, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:72614)
File "parser.pxi", line 594, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:72087)
XMLSyntaxError: line 1887: htmlParseEntityRef: expecting ';'