wummel / linkchecker
Check links in web documents or full websites.
Home Page: http://wummel.github.io/linkchecker/
License: GNU General Public License v2.0
Converted from SourceForge issue 204875, submitted by calvin
It does not parse arbitrary characters in the ?subject= line.
Converted from SourceForge issue 635596, submitted by int2000
******** LinkChecker internal error, bailing out ********
    self.urlConnection = ftplib.FTP(self.urlTuple[1], _user, _password)
  File "/usr/lib/python2.2/ftplib.py", line 108, in __init__
    self.connect(host)
  File "/usr/lib/python2.2/ftplib.py", line 133, in connect
    self.welcome = self.getresp()
  File "/usr/lib/python2.2/ftplib.py", line 216, in getresp
    if c not in '123':
TypeError: 'in <string>' requires character as left operand
System info:
LinkChecker 1.6.6
Python 2.2.2 (#4, Oct 15 2002, 04:21:28)
[GCC 2.95.4 20011002 (Debian prerelease)] on linux2
******** LinkChecker internal error, bailing out ********
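The failure comes from ftplib's getresp() doing a string membership test on the first character of the response line; in Python before 2.3, testing an empty string with `in` against another string raised exactly this TypeError, so an empty FTP response line crashed the checker. A minimal modern sketch of a defensive version of that check (classify_ftp_response is a hypothetical helper, not LinkChecker code):

```python
def classify_ftp_response(resp):
    # Mirrors the `if c not in '123'` check in old ftplib.getresp().
    # Guard against an empty response line before the membership test,
    # which is what blew up under Python < 2.3.
    c = resp[:1]
    if not c:
        raise ValueError("empty FTP response line")
    if c not in "123":
        raise ValueError("FTP error response: " + resp)
    return resp

classify_ftp_response("220 ready")
```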
Converted from SourceForge issue 676017, submitted by nobody
"python setup.py install" fails with following compilation
error:
"C:\Program Files\Microsoft Visual Studio\VC98
\BIN\cl.exe" /c /nologo /Ox /MD /W3 /GX -
DYY_NO_UNISTD_H -Ilinkcheck/parser -IC:\Python22
\include /Tclinkcheck/parser/htmllex.c /Fobuild\temp.win
32-2.2\Release\htmllex.obj
I manually compiled the file adding
-DYY_NO_UNISTD_H
to list of compiler flags and all seemed well (apart from
a couple of warnings).
This is on WinXP with Python 2.2.1
Robert
[email protected]
Converted from SourceForge issue 573605, submitted by nobody
exceptions.NameError: global name 'StringUtil' is not defined
Traceback (most recent call last):
  File "/home/intranet/tools/lib/python2.2/site-packages/linkcheck/UrlData.py",
    self._check()
  File "/home/intranet/tools/lib/python2.2/site-packages/linkc
    self.logMe()
  File "/home/intranet/tools/lib/python2.2/site-packages/linkch
    self.config.log_newUrl(self)
  File "/home/intranet/tools/lib/python2.2/site-packages/linkcheck/Config.py", l
  File "/home/intranet/tools/lib/python2.2/site-packages/linkcheck/log/ColoredLo
    self.fd.write("| "+linkcheck._("Info")+Spaces["info"]+
NameError: global name 'StringUtil' is not defined
System info:
LinkChecker 1.5.4
Python 2.2.1 (#1, Jun 4 2002, 09:57:34)
[GCC 2.95.2 19991024 (release)] on linux2
******** LinkChecker internal error, bailing out ********
The page being checked is a PHP-generated page on the company-internal
network, which is not reachable from outside.
Converted from SourceForge issue 741131, submitted by nobody
hi,
take a deep breath, and bear with me...
if i have the following (valid and weblinted) html:
---8<---
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>named anchor id test</title>
</head>
<body>
<p><a href="#indexa">A</a></p>
<ol>
<li id="indexa">one</li>
<li>two</li>
<li>three</li>
</ol>
</body>
</html>
--->8---
linkchecker 1.8.15 complains: warning anchor #indexa not found
(it can't find the anchor named "indexa").
the html4 spec has this[1] to say about anchor names
(<a name="">):
---8<---
This attribute names the current anchor so that it may be the
destination of another link. The value of this attribute must be a
unique anchor name. The scope of this name is the current document.
*** Note that this attribute shares the same name space as the id
attribute. ***
--->8---
(emphasis mine.)
so i think the above should not cause a warning.
hope that helps,
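The reporter is right per HTML4: name and id share one namespace, so an anchor collector has to record both. A minimal sketch of such a collector using Python's html.parser (AnchorCollector is a hypothetical name, not LinkChecker's actual parser):

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    # Hypothetical collector: record fragment targets from any id=...
    # attribute as well as <a name=...>, since HTML4 puts name and id
    # in the same namespace.
    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if value and (key == "id" or (tag == "a" and key == "name")):
                self.anchors.add(value)

page = '<p><a href="#indexa">A</a></p><ol><li id="indexa">one</li></ol>'
collector = AnchorCollector()
collector.feed(page)
# "indexa" is now a known target, so "#indexa" should not warn
```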
Converted from SourceForge issue 204050, submitted by nobody
At line 254 in the linkchecker file it says
congif["authentication"](and so on)
this should obviously be:
config["authentication"](and so on)
(This is Version 1.1.4).
Converted from SourceForge issue 203509, submitted by wavy
I did:
./linkchecker -o colored -r 99 -i ".whitbread.co" www.corporate.whitbread.co.uk
and linkchecker trolled through the site, but after a while I got loads of :
Exception in thread Thread-89076:
Traceback (innermost last):
File "/var/tmp/python-root/usr/lib/python1.5/threading.py", line 376, in __bootstrap
self.run()
File "/var/tmp/python-root/usr/lib/python1.5/threading.py", line 364, in run
apply(self.__target, self.__args, self.__kwargs)
File "./linkcheck/UrlData.py", line 107, in check
except LinkCheckerException, msg:
NameError: LinkCheckerException
for different threads, finally finishing with the message:
Fatal Python error: PyThreadState_Delete: invalid tstate
Aborted
I then tried to capture the entire output to a file:
./linkchecker -o colored -r 99 -i ".whitbread.co" www.corporate.whitbread.co.uk > lc.log 2>&1
and linkchecker trolled through the site, but after a while I got:
Segmentation fault
I guess this is not your doing though :) - there were no errors in the log
i have:
Python 1.5.2 (#1, Sep 17 1999, 20:15:36) [GCC egcs-2.91.66 19990314/Linux (egcs- on linux-i386
Converted from SourceForge issue 674391, submitted by mkalastro
Hi Bastian,
Thanks for the great product! I've used previous
versions and this is the first error I've seen. You
can reproduce the error by using lconline from:
<http://www2.soe.ucsc.edu/linkchecker/>, which uses the
default configuration.
exceptions.AttributeError: addinfourl instance has no attribute 'readlines'
Traceback (most recent call last):
  File "/home/local/python-2.2.2/lib/python2.2/site-packages/linkcheck/UrlData.py", line 202, in check
    self._check()
  File "/home/local/python-2.2.2/lib/python2.2/site-packages/linkcheck/HttpsUrlData.py", line 36, in _check
    HttpUrlData._check(self)
  File "/home/local/python-2.2.2/lib/python2.2/site-packages/linkcheck/UrlData.py", line 258, in _check
    self.checkConnection()
  File "/home/local/python-2.2.2/lib/python2.2/site-packages/linkcheck/HttpUrlData.py", line 91, in checkConnection
    if self.config["robotstxt"] and not self.robotsTxtAllowsUrl():
  File "/home/local/python-2.2.2/lib/python2.2/site-packages/linkcheck/HttpUrlData.py", line 289, in robotsTxtAllowsUrl
    rp.read()
  File "/var/tmp/python/lib/python2.2/robotparser.py", line 44, in read
    lines = f.readlines()
AttributeError: addinfourl instance has no attribute 'readlines'
System info:
LinkChecker 1.8.5
Python 2.2.2 (#1, Jan 23 2003, 14:28:02)
[GCC 2.95.2 19991024 (release)] on sunos5
LC_MESSAGES = 'C'
Converted from SourceForge issue 784977, submitted by nobody
A simple robots.txt file like
User-agent: *
Disallow:
is not understood by linkchecker. I get an error:
Warning: Access denied by robots.txt, checked only syntax
and linkchecker aborts checking. For information about robots.txt
syntax, see http://www.robotstxt.org/wc/exclusion-admin.html
Converted from SourceForge issue 776851, submitted by htrd
I am seeing a problem when checking a site that contains two URLs that
redirect to each other: linkchecker fetches both URLs continuously
until its recursion limit is reached.
debug output attached. I will keep these pages available
for a while, but please dont hammer that server harder
than necessary.
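A sketch of how a checker can break such mutual redirects: track the URLs already seen in the redirect chain and stop on the first repeat instead of recursing to the depth limit. The redirects mapping here is a hypothetical stand-in for live HTTP 302 responses:

```python
def follow_redirects(url, redirects, limit=10):
    # `redirects` maps a URL to its Location header; a URL absent from
    # the mapping is a final (non-redirecting) resource.
    seen = set()
    while url in redirects:
        if url in seen or len(seen) >= limit:
            raise RuntimeError("redirect loop detected at " + url)
        seen.add(url)  # remember every hop in this chain
        url = redirects[url]
    return url

follow_redirects("http://a.example/", {"http://a.example/": "http://b.example/"})
```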
Converted from SourceForge issue 612030, submitted by nobody
Hi
Maybe this is not a bug, but anyway. I downloaded LinkChecker 1.6.2 and
followed the instructions on how to install on a Linux system. The
command "python setup.py install" bailed out with a note about a
missing module "distutils.core". I then inspected the Python modules
and found that it needs several modules named distutils.*. Then I
searched the system for any module with a leading name of "distutils".
Nothing was found. Is a module missing from the distribution?
Best regards
Bent Vangli
Converted from SourceForge issue 784372, submitted by sbrauer
There are some urls on a site I'm checking that
redirect to some https urls. When linkchecker tries to
follow these redirections, it tries to connect to port
80 instead of 443.
The error I get is:
Error: Attempted connect to ('152.2.46.28', 80)
timed out.
Notice how the port in the error message is 80.
To duplicate this problem, you could create a simple
cgi script like this:
echo "Location: https://www-s3.ais.unc.edu/campus_dir/"
echo
and link to it from a static html page. Then run
linkchecker on the html page.
The strange thing is that some other links that
redirect to https urls (on other hosts) don't exhibit
this problem.
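The symptom suggests the port of the original connection is reused after the redirect. A sketch of deriving host and port from the redirect target itself, so an https Location lands on 443 (connect_target is a hypothetical helper, not LinkChecker's API):

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def connect_target(redirect_location):
    # Re-derive the port from the *redirected* URL's scheme instead of
    # inheriting it from the original http connection.
    parts = urlsplit(redirect_location)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port

connect_target("https://www-s3.ais.unc.edu/campus_dir/")  # port 443
```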
Converted from SourceForge issue 728315, submitted by doggy8088
http://www.nestle-baby.com.tw/member/register.htm
linkchecker -osql -r20 --no-anchor-caching --intern='!^mailto:' -C http://www.nestle-baby.com.tw/member/register.htm | mysql -u root test
exceptions.TypeError: can only concatenate tuple (not "list") to tuple
Traceback (most recent call last):
  File "/usr/lib/python2.2/site-packages/linkcheck/UrlData.py", line 202, in check
    self._check()
  File "/usr/lib/python2.2/site-packages/linkcheck/UrlData.py", line 291, in _check
    self.putInCache()
  File "/usr/lib/python2.2/site-packages/linkcheck/UrlData.py", line 304, in putInCache
    cacheKey = self.getCacheKey()
  File "/usr/lib/python2.2/site-packages/linkcheck/UrlData.py", line 315, in getCacheKey
    return urlparse.urlunsplit(self.urlparts[:4]+[''])
TypeError: can only concatenate tuple (not "list") to tuple
System info:
LinkChecker 1.8.11
Python 2.2.2 (#1, Feb 24 2003, 19:13:11)
[GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-4)] on linux2
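The traceback pinpoints the bug: urlparse returns a tuple, and self.urlparts[:4]+[''] concatenates a list to it. A sketch of the fix using a one-element tuple instead (shown with Python 3's urllib.parse; cache_key is a hypothetical stand-in for getCacheKey):

```python
from urllib.parse import urlsplit, urlunsplit

def cache_key(url):
    # urlsplit() returns a tuple; concatenating a list to it raises
    # TypeError, so strip the fragment with a tuple instead.
    parts = urlsplit(url)
    return urlunsplit(parts[:4] + ("",))

cache_key("http://example.com/page.htm#top")
```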
Converted from SourceForge issue 783662, submitted by hallcp
I get this error when trying to run lc.cgi:
./lc.cgi
Traceback (innermost last):
File "./lc.cgi", line 28, in ?
import linkcheck.lc_cgi
File "./linkcheck/init.py", line 45, in ?
import UrlData
File "./linkcheck/UrlData.py", line 35
print >>sys.stderr, i18n._("""\n********** Oops, I
did it again. *************
^
SyntaxError: invalid syntax
I don't know Python, so this may be something obvious,
but I don't see it. I'm using Python 2.2.1.
linkchecker itself seems to run fine from the command line.
Thanks for any help,
Charles Hall
Raleigh, NC
USA
Converted from SourceForge issue 728709, submitted by doggy8088
Initial Comment:
http://www2.nestle-baby.com.tw
linkchecker -C --no-anchor-caching -r5 -ohtml http://www2.nestle-baby.com.tw
URL         http://www.nestle-baby.com.tw/about/about.htm#0
Parent URL  http://www2.nestle-baby.com.tw/mapabout/aboutus.asp, line 238, col 37
Real URL    http://www.nestle-baby.com.tw/about/about.htm#0
Check Time  0.420 seconds
Result      Error: 404 Not Found
I think an "internal page anchor" (such as #top) should not be reported
as an error when it does not exist, because by default the browser
jumps to the top of the page if the named anchor does not exist.
Converted from SourceForge issue 864383, submitted by astern
hi,
C:\Python23\Scripts>c:\Python23\python.exe -O c:\Python23\Scripts\linkchecker
Traceback (most recent call last):
  File "c:\Python23\Scripts\linkchecker", line 34, in ?
    import getopt, re, os, pprint, socket, linkcheck
  File "c:\Python23\lib\site-packages\linkcheck\__init__.py", line 46, in ?
    import UrlData
  File "c:\Python23\lib\site-packages\linkcheck\UrlData.py", line 27, in ?
    DNS.DiscoverNameServers()
  File "c:\Python23\lib\site-packages\linkcheck\DNS\Base.py", line 35, in DiscoverNameServers
    init_dns_resolver_nt()
  File "c:\Python23\lib\site-packages\linkcheck\DNS\Base.py", line 95, in init_dns_resolver_nt
    count, counttype = subkey['DNSServerAddressCount']
TypeError: unpack non-sequence
C:\Python23\Scripts>
regards A.
Converted from SourceForge issue 765016, submitted by saadiq
linkchecker -e '^?' -i zipsell.com -o html -r 4 -W 500 http://www.zipsell.com/ > public_html/index.html
********** Oops, I did it again. *************
exceptions.AttributeError: SMTP instance has no attribute 'read'
Traceback (most recent call last):
  File "/usr/lib/python2.2/site-packages/linkcheck/UrlData.py", line 203, in check
    self._check()
  File "/usr/lib/python2.2/site-packages/linkcheck/UrlData.py", line 271, in _check
    try: self.checkContent(warningregex)
  File "/usr/lib/python2.2/site-packages/linkcheck/UrlData.py", line 398, in checkContent
    match = warningregex.search(self.getContent())
  File "/usr/lib/python2.2/site-packages/linkcheck/UrlData.py", line 389, in getContent
    self.data = self.urlConnection.read()
AttributeError: SMTP instance has no attribute 'read'
System info:
LinkChecker 1.8.18
Python 2.2.3 (#1, Jun 4 2003, 02:54:59)
[GCC 3.3 (Debian)] on linux2
******** LinkChecker internal error, bailing out ********
Converted from SourceForge issue 479829, submitted by dvarner
I downloaded linkchecker then ran:
python setup.py config
python setup.py build
python setup.py install
I used:
python linkchecker
on my NT box ...
Traceback (most recent call last):
  File "linkchecker", line 24, in ?
    from linkcheck import timeoutsocket
  File "linkcheck\__init__.py", line 48, in ?
    import Config,UrlData,sys,lc_cgi
  File "linkcheck\UrlData.py", line 472, in ?
    from MailtoUrlData import MailtoUrlData
  File "linkcheck\MailtoUrlData.py", line 32, in ?
    DNS.init_dns_resolver()
  File "DNS\Base.py", line 31, in init_dns_resolver
    init_dns_resolver_nt()
  File "DNS\Base.py", line 76, in init_dns_resolver_nt
    defaults['nameserver'].append(server)
KeyError: nameserver
I changed line 76 in DNS\Base.py to:
    defaults['server'].append(server)
I think this should be patched.
Re-ran it and I got:
C:\linkchecker-1.3.6>linkchecker
Traceback (most recent call last):
  File "linkchecker", line 24, in ?
    from linkcheck import timeoutsocket
  File "linkcheck\__init__.py", line 48, in ?
    import Config,UrlData,sys,lc_cgi
  File "linkcheck\UrlData.py", line 472, in ?
    from MailtoUrlData import MailtoUrlData
  File "linkcheck\MailtoUrlData.py", line 32, in ?
    DNS.init_dns_resolver()
  File "DNS\Base.py", line 31, in init_dns_resolver
    init_dns_resolver_nt()
  File "DNS\Base.py", line 84, in init_dns_resolver_nt
    key = winreg.handle_key(winreg.HKEY_LOCAL_MACHINE,
AttributeError: 'DNS.winreg' module has no attribute 'handle_key'
In DNS\winreg.py, it looks like this should be
key_handle? Is this right?
I replaced all occurrences of handle_key with
key_handle in DNS\winreg.py.
handle_key is found on lines 65, 69, 84 and 96 of
DNS\winreg.py
The program will execute once these are changed. Can we patch this?
The DNS/Base.py distributed with linkchecker appears to differ from
the latest version of PyDNS in CVS at SourceForge.
Thanks,
Drew
Converted from SourceForge issue 768661, submitted by nobody
The 'intern' / 'extern' matching appears to operate on
the 'URL' rather than 'Real URL' (after base has been
applied). This makes it impossible to properly check a
site that makes use of BASE HREF without also spidering
the rest of the internet.
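A sketch of matching intern/extern patterns against the real URL, i.e. after resolving the link against <base href> with urljoin (is_internal is a hypothetical helper, not LinkChecker's API):

```python
import re
from urllib.parse import urljoin

def is_internal(base_href, page_url, link, intern_pattern):
    # Resolve the raw href against <base href> (falling back to the
    # page URL), then match the pattern on the *real* URL rather than
    # the raw attribute value.
    real_url = urljoin(base_href or page_url, link)
    return bool(re.search(intern_pattern, real_url)), real_url

is_internal("http://cdn.example.net/", "http://example.com/page.html",
            "img/a.png", r"example\.net")
```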
Converted from SourceForge issue 663804, submitted by nobody
The URL or file you are testing
http://pcsiwa12.rett.polimi.it/~phdweb/eng/index.htm
Your commandline arguments and/or configuration.
linkchecker -a -r 3
http://pcsiwa12.rett.polimi.it/~phdweb/eng/index.htm
httplib.BadStatusLine
Exception in thread Thread-421:
Traceback (most recent call last):
  File "/usr/lib/python2.2/threading.py", line 408, in __bootstrap
    self.run()
  File "/usr/lib/python2.2/threading.py", line 396, in run
    apply(self.__target, self.__args, self.__kwargs)
  File "/usr/lib/python2.2/site-packages/linkcheck/UrlData.py", line 202, in check
    internal_error()
  File "/usr/lib/python2.2/site-packages/linkcheck/UrlData.py", line 49, in internal_error
    print >> sys.stderr, type, value
AttributeError: BadStatusLine instance has no attribute 'args'
Converted from SourceForge issue 729007, submitted by doggy8088
http://www2.nestle-baby.com.tw
linkchecker -C --no-anchor-caching -r5 -ohtml http://www2.nestle-baby.com.tw
URL         /member/member5.asp
Parent URL  http://www2.nestle-baby.com.tw/, line 113, col 11
Real URL    http://www2.nestle-baby.com.tw/member/member5.asp
Check Time  7.652 seconds
Result      Error: (-2, 'Name or service not known')
The URL "http://www2.nestle-baby.com.tw/member/member5.asp" is a form's
action URL. It's a valid link, but what does "Name or service not
known" mean? I don't understand. Is it a bug?
Converted from SourceForge issue 204205, submitted by stinnux
When i try to kill a running linkchecker (by using Ctrl-C for example) it doesn't disappear correctly.
It says "Stopped checking..." but stands there for a couple of seconds. It eventually really stops then.
This seems to be related to the number of threads that are running (the more threads the longer you wait).
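One common way to make shutdown time independent of the thread count is a shared stop event that each worker polls between URLs, so a single Ctrl-C handler can stop every thread promptly. A minimal sketch (check_urls is a hypothetical worker, not LinkChecker's code):

```python
import threading

stop_event = threading.Event()

def check_urls(urls, results):
    # Poll the shared stop event between URLs; once it is set, every
    # worker returns immediately instead of finishing its queue.
    for url in urls:
        if stop_event.is_set():
            return
        results.append(url)  # stand-in for actually checking the URL
```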
Converted from SourceForge issue 622618, submitted by calvin
An FTP proxy setting will be ignored.
Converted from SourceForge issue 654249, submitted by nobody
~$ linkchecker -r 2 -i ailab -qF html http://www.ifi.unizh.ch/ailab/
********** Oops, I did it again. *************
You have found an internal error in LinkChecker. Please write a bug report at
http://sourceforge.net/tracker/?func=add&group_id=1913&atid=101913
or send mail to and include the following information:
If you disclose some information because its too private to you thats ok.
I will try to help you nontheless (but you have to give me something
I can work with ;).
exceptions.TypeError: putrequest() got an unexpected keyword argument 'skip_host'
Traceback (most recent call last):
  File "/usr/lib/python2.1/site-packages/linkcheck/UrlData.py", line 260, in check
    self._check()
  File "/usr/lib/python2.1/site-packages/linkcheck/UrlData.py", line 317, in _check
    self.checkConnection()
  File "/usr/lib/python2.1/site-packages/linkcheck/HttpUrlData.py", line 95, in checkConnection
    response = self._getHttpResponse()
  File "/usr/lib/python2.1/site-packages/linkcheck/HttpUrlData.py", line 220, in _getHttpResponse
    self.urlConnection.putrequest(method, path, skip_host=1)
TypeError: putrequest() got an unexpected keyword argument 'skip_host'
System info:
LinkChecker 1.6.6
Python 2.1.3 (#1, Jul 29 2002, 22:34:51)
[GCC 2.95.4 20011002 (Debian prerelease)] on linux2
******** LinkChecker internal error, bailing out ********
----- The following addresses had permanent fatal errors -----
<[email protected]>
(reason: 550-Envelope sender verification failed)
----- Transcript of session follows -----
... while talking to mail.sourceforge.net.:
>>> DATA
<<< 550-Envelope sender verification failed
<<< 550 rejected: there is no valid sender in any header line
(envelope sender is <[email protected]>). Your mail
server returned: response from externalmx.valinux.com
[198.186.202.147] was 550 authentication required.
554 5.0.0 Service unavailable
Converted from SourceForge issue 205658, submitted by mschmitz
Some of the HTML files I checked with linkchecker contain comments
like the following
<!--
foo
bar
-->
Linkchecker does not handle this type of comment correctly, since it
seems to assume that all comments are single-line comments. In the
example above, all messages concerning lines that follow the block
comment contain line numbers that differ by 3 from the correct line
number.
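The fix is for the parser to count the newlines inside each skipped comment instead of assuming one line per comment. A minimal sketch (advance_lineno is a hypothetical helper):

```python
def advance_lineno(lineno, skipped_text):
    # A multi-line comment advances the line counter by the number of
    # newlines it contains; a single-line comment advances it by zero.
    return lineno + skipped_text.count("\n")

advance_lineno(1, "<!--\nfoo\nbar\n-->")  # 4
```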
Converted from SourceForge issue 749543, submitted by finnertyp
If a web page contains a link to a document or web page that contains an ampersand in its name, LinkChecker reports it as a broken link.
Converted from SourceForge issue 769328, submitted by markjugg
It fails like this:
building 'linkcheck.parser.htmlsax' extension
cc -DNDEBUG -O -pipe -D_THREAD_SAFE -fPIC -Ilinkcheck/parser -I/usr/local/include/python2.2 -c linkcheck/parser/htmllex.c -o build/temp.freebsd-4.8-RELEASE-i386-2.2/htmllex.o
htmllex.c:36: stdint.h: No such file or directory
error: command 'cc' failed with exit status 1
Converted from SourceForge issue 212504, submitted by jrmitche
It appears linkchecker doesn't understand the character entity syntax that browsers do. It shows documents that have ampersands in their names as bad links after translating them to &
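In HTML attribute values an ampersand is written as an entity, so a checker must decode entities before requesting the URL. A sketch with Python's html.unescape (href_to_url is a hypothetical helper):

```python
from html import unescape

def href_to_url(raw_href):
    # Attribute values are entity-encoded; "&amp;" in an href must be
    # decoded back to "&" before the URL is fetched and checked.
    return unescape(raw_href)

href_to_url("doc.cgi?a=1&amp;b=2")  # "doc.cgi?a=1&b=2"
```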
Converted from SourceForge issue 863227, submitted by jimwright
I get this error:
    subkey DhcpDomain not found
when running python linkchecker on Win 2000.
Here's the complete error message:
C:\Python23\Scripts>..\python linkchecker
Traceback (most recent call last):
  File "linkchecker", line 34, in ?
    import getopt, re, os, pprint, socket, linkcheck
  File "c:\Python23\Lib\site-packages\linkcheck\__init__.py", line 46, in ?
    import UrlData
  File "c:\Python23\Lib\site-packages\linkcheck\UrlData.py", line 27, in ?
    DNS.DiscoverNameServers()
  File "c:\Python23\Lib\site-packages\linkcheck\DNS\Base.py", line 35, in DiscoverNameServers
    init_dns_resolver_nt()
  File "c:\Python23\Lib\site-packages\linkcheck\DNS\Base.py", line 87, in init_dns_resolver_nt
    for item in winreg.stringdisplay(key["DhcpDomain"]):
  File "c:\Python23\Lib\site-packages\linkcheck\DNS\winreg.py", line 39, in __getitem__
    raise IndexError, "subkey %s not found"%key
IndexError: subkey DhcpDomain not found
Converted from SourceForge issue 205659, submitted by mschmitz
When I checked my intranet homepage today I saw the following lines
in linkcheckers output.
URL http://madeira/~mschmitz/usage/index.html
Parent URL http://madeira/HomeDirs/mschmitz/index.html, line 35
Real URL http://madeira/HomeDirs/mschmitz/usage/index.html
Check Time 0.413 seconds
Warning Effective URL
http://madeira/HomeDirs/mschmitz/usage/index.html
Result Error: 404 Not Found
URL http://madeira/~mschmitz/usage/index.html
Parent URL http://madeira/HomeDirs/mschmitz/, line 35
Real URL http://madeira/HomeDirs/mschmitz/usage/index.html
Check Time 0.153 seconds
Warning Effective URL
http://madeira/HomeDirs/mschmitz/usage/index.html
Result Error: 404 Not Found
This indicates that linkchecker did not cache the URL since "(cached)"
is missing in the output of the second visit of http://madeira/~mschmitz/usage/index.html
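A sketch of caching keyed on the real (redirect-resolved) URL, so the second visit reuses the first result and can be flagged "(cached)" (check_cached is a hypothetical helper, not LinkChecker's cache):

```python
def check_cached(url_cache, real_url, check):
    # Key the cache by the real URL so http://madeira/~mschmitz/... and
    # its resolved form share one entry. Returns (result, was_cached).
    if real_url in url_cache:
        return url_cache[real_url], True
    result = check(real_url)
    url_cache[real_url] = result
    return result, False
```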
Converted from SourceForge issue 440276, submitted by ajmitch
Great app, very useful, but I've found that it doesn't ignore links in
commented-out sections of html. Unfortunately I've taken over the
maintenance of a reasonable-size site where some things were commented
out, and linkchecker reports false positives on broken links :)
Thanks
Andrew
Converted from SourceForge issue 836864, submitted by mike_j_brown
LinkChecker 1.9.5
Python 2.3.2 (#1, Oct 30 2003, 04:49:57)
[GCC 2.95.4 20020320 [FreeBSD]] on freebsd4
One of the links in an HTML doc I was checking is
http://users.compaqnet.be/avalon/
Their server is misconfigured, such that trying to get /robots.txt
results in a series of redirects that eventually become recursive:
A request for /robots.txt results in a 302 with
Location: http://www.compaqnet.be/index.html?l1=search&a=error404&error=http://users.compaqnet.be/robots.txt
A request for /index.html?l1=search&a=error404&error=http://users.compaqnet.be/robots.txt
results in a 302 with
Location: http://www.compaqnet.be/index.html?l1=search&a=error404&error=http://users.compaqnet.be/index.html
A request for /index.html?l1=search&a=error404&error=http://users.compaqnet.be/index.html
results in a 302 with
Location: http://www.compaqnet.be/index.html?l1=search&a=error404&error=http://users.compaqnet.be/index.html
(same as before).
Somehow, this leads to httplib thinking that it is dealing with this
string as a host and port:
'www.belgacom.netindex.html?l1=search&a=error404&error=http:'
I had to put a print into HTTPConnection._set_hostport() to see this.
The traceback that Linkchecker produces reveals the trouble that
httplib has in trying to parse that string.
Traceback (most recent call last):
  File "/usr/local/lib/python2.3/site-packages/linkcheck/UrlData.py", line 285, in check
    self._check()
  File "/usr/local/lib/python2.3/site-packages/linkcheck/UrlData.py", line 346, in _check
    self.checkConnection()
  File "/usr/local/lib/python2.3/site-packages/linkcheck/HttpUrlData.py", line 110, in checkConnection
    if not self.robotsTxtAllowsUrl():
  File "/usr/local/lib/python2.3/site-packages/linkcheck/HttpUrlData.py", line 396, in robotsTxtAllowsUrl
    rp.read()
  File "/usr/local/lib/python2.3/site-packages/linkcheck/robotparser2.py", line 70, in read
    f = _opener.open(req)
  File "/usr/local/lib/python2.3/urllib2.py", line 333, in open
    '_open', req)
  File "/usr/local/lib/python2.3/urllib2.py", line 313, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python2.3/site-packages/linkcheck/robotparser2.py", line 307, in http_open
    return decode(urllib2.HTTPHandler.http_open(self, req))
  File "/usr/local/lib/python2.3/urllib2.py", line 849, in http_open
    return self.do_open(httplib.HTTP, req)
  File "/usr/local/lib/python2.3/urllib2.py", line 843, in do_open
    return self.parent.error('http', req, fp, code, msg, hdrs)
  File "/usr/local/lib/python2.3/urllib2.py", line 353, in error
    result = self._call_chain(*args)
  File "/usr/local/lib/python2.3/urllib2.py", line 313, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python2.3/urllib2.py", line 479, in http_error_302
    return self.parent.open(new)
  File "/usr/local/lib/python2.3/urllib2.py", line 333, in open
    '_open', req)
  File "/usr/local/lib/python2.3/urllib2.py", line 313, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python2.3/site-packages/linkcheck/robotparser2.py", line 307, in http_open
    return decode(urllib2.HTTPHandler.http_open(self, req))
  File "/usr/local/lib/python2.3/urllib2.py", line 849, in http_open
    return self.do_open(httplib.HTTP, req)
  File "/usr/local/lib/python2.3/urllib2.py", line 843, in do_open
    return self.parent.error('http', req, fp, code, msg, hdrs)
  File "/usr/local/lib/python2.3/urllib2.py", line 353, in error
    result = self._call_chain(*args)
  File "/usr/local/lib/python2.3/urllib2.py", line 313, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python2.3/urllib2.py", line 479, in http_error_302
    return self.parent.open(new)
  File "/usr/local/lib/python2.3/urllib2.py", line 333, in open
    '_open', req)
  File "/usr/local/lib/python2.3/urllib2.py", line 313, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python2.3/site-packages/linkcheck/robotparser2.py", line 307, in http_open
    return decode(urllib2.HTTPHandler.http_open(self, req))
  File "/usr/local/lib/python2.3/urllib2.py", line 849, in http_open
    return self.do_open(httplib.HTTP, req)
  File "/usr/local/lib/python2.3/urllib2.py", line 808, in do_open
    h = http_class(host) # will parse host:port
  File "/usr/local/lib/python2.3/httplib.py", line 986, in __init__
    self._setup(self._connection_class(host, port, strict))
  File "/usr/local/lib/python2.3/httplib.py", line 507, in __init__
    self._set_hostport(host, port)
  File "/usr/local/lib/python2.3/httplib.py", line 519, in _set_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
InvalidURL: nonnumeric port: ''
Converted from SourceForge issue 863220, submitted by nobody
linkchecker -r999 -s http://www.agrussell.com/
LinkChecker 1.9.3
System is 2 way SMP Dell, with 1 gig of mem
running FreeBSD 4.8 with security patches
Ran for over an hour, site has about 4800 links.
Thank you for your time.
A.G.
Start checking at 2003-12-19 15:32:35-005
********** Oops, I did it again. *************
exceptions.MemoryError
Traceback (most recent call last):
  File "/usr/local/lib/python2.3/site-packages/linkcheck/UrlData.py", line 281, in check
Traceback (most recent call last):
  File "/usr/local/bin/linkchecker", line 440, in ?
    linkcheck.checkUrls(config)
  File "/usr/local/lib/python2.3/site-packages/linkcheck/__init__.py", line 62, in checkUrls
    config.checkUrl(config.getUrl())
  File "/usr/local/lib/python2.3/site-packages/linkcheck/Config.py", line 339, in checkUrl_Threads
    self.threader.startThread(url.check, ())
  File "/usr/local/lib/python2.3/site-packages/linkcheck/Threader.py", line 56, in startThread
    t.start()
  File "/usr/local/lib/python2.3/threading.py", line 410, in start
    _start_new_thread(self.__bootstrap, ())
thread.error: can't start new thread
Unhandled exception in thread started by <bound method Thread.__bootstrap of <Thread(Thread-1343, stopped)>>
Traceback (most recent call last):
  File "/usr/local/lib/python2.3/threading.py", line 444, in __bootstrap
    _print_exc(file=s)
  File "/usr/local/lib/python2.3/traceback.py", line 210, in print_exc
    print_exception(etype, value, tb, limit, file)
  File "/usr/local/lib/python2.3/traceback.py", line 123, in print_exception
    print_tb(tb, limit, file)
  File "/usr/local/lib/python2.3/traceback.py", line 68, in print_tb
    line = linecache.getline(filename, lineno)
  File "/usr/local/lib/python2.3/linecache.py", line 14, in getline
    lines = getlines(filename)
  File "/usr/local/lib/python2.3/linecache.py", line 40, in getlines
    return updatecache(filename)
  File "/usr/local/lib/python2.3/linecache.py", line 93, in updatecache
    lines = fp.readlines()
MemoryError
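"can't start new thread" means the process hit the OS thread limit on a site with thousands of links. A sketch of bounding concurrency with a semaphore, so starting a checker thread blocks until a slot is free instead of failing (BoundedThreader is a hypothetical replacement for LinkChecker's Threader, not its actual API):

```python
import threading

class BoundedThreader:
    # Cap the number of live checker threads; start() blocks when all
    # slots are busy, so the OS thread limit is never reached.
    def __init__(self, max_threads=10):
        self._slots = threading.Semaphore(max_threads)
        self.threads = []

    def start(self, func, *args):
        self._slots.acquire()          # wait for a free slot
        def run():
            try:
                func(*args)
            finally:
                self._slots.release()  # free the slot on exit
        t = threading.Thread(target=run)
        t.start()
        self.threads.append(t)
        return t

    def join(self):
        for t in self.threads:
            t.join()
```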
Converted from SourceForge issue 204259, submitted by mherbene
Appears to dislike A tags that go to anchors elsewhere in the page.
Sample html file:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>LinkChecker len error test</title>
</head>
<body>
<a name="my_target"></a>
Here is the target anchor.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
Here is some text.<BR>
<a href="#my_target">This</a> link goes back to the target.
</body>
</html>
Results file:
<html><head><title>LinkChecker</title></head><body bgcolor="#fff7e5" link="#191c83" vlink="#191c83" alink="#191c83"><center><h2><font face="Lucida,Verdana,Arial,sans-serif,Helvetica">LinkChecker</font></center></h2><br><blockquote>LinkChecker comes with ABSOLUTELY NO WARRANTY!
This is free software, and you are welcome to redistribute it
under certain conditions. Look at the file `LICENSE' whithin this
distribution.<br><br>Start checking at 12.04.2000 19:03:49<br><br><table align=left border="0" cellspacing="0" cellpadding="1" bgcolor="#000000"><tr><td><table align=left border="0" cellspacing="0" cellpadding="3" bgcolor="#fff7e5"><tr><td bgcolor="#dcd5cf"><font face="Lucida,Verdana,Arial,sans-serif,Helvetica">URL</font></td><td bgcolor="#dcd5cf"><font face="Lucida,Verdana,Arial,sans-serif,Helvetica">#my_target</font></td></tr>
<tr><td><font face="Lucida,Verdana,Arial,sans-serif,Helvetica">Parent URL</font></td><td><font face="Lucida,Verdana,Arial,sans-serif,Helvetica"><a href="http://kdewebdev.kde.state.ky.us/martin/linkchecker_test.html">http://kdewebdev.kde.state.ky.us/martin/linkchecker_test.html</a> line 37</font></td></tr>
<tr><td><font face="Lucida,Verdana,Arial,sans-serif,Helvetica">Real URL</font></td><td><font face="Lucida,Verdana,Arial,sans-serif,Helvetica"><a href="http://kdewebdev.kde.state.ky.us/martin/linkchecker_test.html#my_target">http://kdewebdev.kde.state.ky.us/martin/linkchecker_test.html#my_target</a></font></td></tr>
<tr><td><font face="Lucida,Verdana,Arial,sans-serif,Helvetica">D/L Time</font></td><td><font face="Lucida,Verdana,Arial,sans-serif,Helvetica">0.197 seconds</font></td></tr>
<tr><td><font face="Lucida,Verdana,Arial,sans-serif,Helvetica">Check Time</font></td><td><font face="Lucida,Verdana,Arial,sans-serif,Helvetica">0.424 seconds</font></td></tr>
<tr><td bgcolor="db4930"><font face="Lucida,Verdana,Arial,sans-serif,Helvetica">Result</font></td><td bgcolor="db4930"><font face="Lucida,Verdana,Arial,sans-serif,Helvetica">Error: len() of unsized object</font></td></tr>
</table></td></tr></table><br clear=all><br><font face="Lucida,Verdana,Arial,sans-serif,Helvetica">Thats it. 0 warnings, 1 error found.<br>Stopped checking at12.04.2000 19:03:50 (1.077 seconds)</font></blockquote><br><hr noshade size=1><small><font face="Lucida,Verdana,Arial,sans-serif,Helvetica">LinkChecker 1.2.1, Copyright © 2000 by Bastian Kleineidam<br>Get the newest version at <a href="http://linkchecker.sourceforge.net/">http://linkchecker.sourceforge.net/</a>.<br>Write comments and bugs to <a href="mailto:[email protected]">[email protected]</a>.</font></small></body></html>
Converted from SourceForge issue 634679, submitted by nobody
linkchecker was called as:
linkchecker -r 3 http://www.math.lsu.edu
and the following error showed up after about an hour
of checking. I can be reached at [email protected] if
there are further questions.
exceptions.AttributeError 'None' object has no
attribute 'read'
Traceback (most recent call last):
File
"/usr/lib/python2.1/site-packages/linkcheck/UrlData.py",
line 260, in check
self._check()
File
"/usr/lib/python2.1/site-packages/linkcheck/UrlData.py",
line 339, in _check
try: self.parseUrl()
File
"/usr/lib/python2.1/site-packages/linkcheck/UrlData.py",
line 476, in parseUrl
bases = self.searchInForTag(BasePattern)
File
"/usr/lib/python2.1/site-packages/linkcheck/UrlData.py",
line 504, in searchInForTag
match =
pattern['pattern'].search(self.getContent(), index)
File
"/usr/lib/python2.1/site-packages/linkcheck/HttpUrlData.py",
line 259, in getContent
self.data = self.urlConnection.read()
AttributeError: 'None' object has no attribute 'read'
System info:
LinkChecker 1.6.3
Python 2.1.3 (#1, Sep 7 2002, 15:29:56)
[GCC 2.95.4 20011002 (Debian prerelease)] on linux2
LC_ALL = 'en_US'
http_proxy = 'http://erdos.math.lsu.edu:3128'
ftp_proxy = 'http://erdos.math.lsu.edu:3128'
Converted from SourceForge issue 864516, submitted by arussell
bifrost:/home/arussell$ cat linkchecker
linkchecker -r999 -s http://www.agrussell.com/
LinkChecker 1.10.1 Copyright © 2000-2003
Bastian Kleineidam
LinkChecker comes with ABSOLUTELY NO WARRANTY!
This is free software, and you are welcome to
redistribute it
under certain conditions. Look at the file `LICENSE'
within this
distribution.
Get the newest version at
http://linkchecker.sourceforge.net/
Write comments and bugs to [email protected]
Start checking at 2003-12-19 17:01:53-005
********** Oops, I did it again. *************
You have found an internal error in LinkChecker. Please
write a bug report
at
http://sourceforge.net/tracker/?func=add&group_id=1913&atid=101913
or send mail to [email protected] and
include the following information:
If you disclose some information because its too
private to you thats ok.
I will try to help you nontheless (but you have to give
me something
I can work with ;).
exceptions.MemoryError
Traceback (most recent call last):
File
"/usr/local/lib/python2.3/site-packages/linkcheck/UrlData.py",
line 287, in check
********** Oops, I did it again. *************
You have found an internal error in LinkChecker. Please
write a bug report
at
http://sourceforge.net/tracker/?func=add&group_id=1913&atid=101913
or send mail to [email protected] and
include the following information:
If you disclose some information because its too
private to you thats ok.
I will try to help you nontheless (but you have to give
me something
I can work with ;).
exceptions.MemoryError
Traceback (most recent call last):
File
"/usr/local/lib/python2.3/site-packages/linkcheck/UrlData.py",
line 287, in check
self._check()
File
"/usr/local/lib/python2.3/site-packages/linkcheck/UrlData.py",
line 372, in _check
try: self.parseUrl()
File
"/usr/local/lib/python2.3/site-packages/linkcheck/UrlData.py",
line 524, in parseUrl
self.parse_html();
File
"/usr/local/lib/python2.3/site-packages/linkcheck/UrlData.py",
line 543, in parse_html
h = LinkParser(self.getContent())
File
"/usr/local/lib/python2.3/site-packages/linkcheck/linkparse.py",
line 72, in init
self.feed(self.content)
File
"/usr/local/lib/python2.3/site-packages/linkcheck/linkparse.py",
line 89, in startElement
name = linkname.href_name(self.content[self.pos():])
MemoryError
System info:
LinkChecker 1.10.1
Python 2.3.2 (#1, Oct 24 2003, 14:36:59)
[GCC 2.95.4 20020320 [FreeBSD]] on freebsd4
******** LinkChecker internal error, bailing out ********
Traceback (most recent call last):
File "/usr/local/bin/linkchecker", line 449, in ?
linkcheck.checkUrls(config)
File
"/usr/local/lib/python2.3/site-packages/linkcheck/init.py",
line 63, in checkUrls
config.checkUrl(config.getUrl())
File
"/usr/local/lib/python2.3/site-packages/linkcheck/Config.py",
line 339, in checkUrl_Threads
self.threader.startThread(url.check, ())
File
"/usr/local/lib/python2.3/site-packages/linkcheck/Threader.py",
line 56, in startThread
t.start()
File "/usr/local/lib/python2.3/threading.py", line
410, in start
_start_new_thread(self.__bootstrap, ())
thread.error: can't start new thread
self._check()
File
"/usr/local/lib/python2.3/site-packages/linkcheck/UrlData.py",
line 372, in _check
try: self.parseUrl()
File
"/usr/local/lib/python2.3/site-packages/linkcheck/UrlData.py",
line 524, in parseUrl
self.parse_html();
File
"/usr/local/lib/python2.3/site-packages/linkcheck/UrlData.py",
line 555, in parse_html
line=line, column=column, name=name))
File
"/usr/local/lib/python2.3/site-packages/linkcheck/UrlData.py",
line 144, in GetUrlDataFrom
line=line, column=column, name=name)
File
"/usr/local/lib/python2.3/site-packages/linkcheck/HttpUrlData.py",
line 47, in init
column=column, name=name)
File
"/usr/local/lib/python2.3/site-packages/linkcheck/UrlData.py",
line 199, in init
self.extern = (1, 0)
MemoryError
System info:
LinkChecker 1.10.1
Python 2.3.2 (#1, Oct 24 2003, 14:36:59)
[GCC 2.95.4 20020320 [FreeBSD]] on freebsd4
******** LinkChecker internal error, bailing out ********
[... the same two MemoryError tracebacks repeat eight more times before the run ends ...]
bifrost:/home/arussell$
Converted from SourceForge issue 204146, submitted by stinnux
setup.py builds a broken ssl.so on my system
(SuSE Linux 6.4, Python 1.5.2, OpenSSL 0.9.4)
It seems that the libssl and libcrypto are not included correctly.
I fixed it by linking the ssl.so manually with the following command:
gcc -shared -L/usr/local/ssl/lib -lssl -lcrypto ssl.o -o build/platlib/ssl.so
In /usr/local/ssl/lib are libssl.a and libcrypto.a, no shared object files.
Trying to link them against the .so files results in undefined symbols from python.
I'm not a Python guru and don't have much knowledge about distutils, so I cannot really help more on this.
Converted from SourceForge issue 652560, submitted by chris01
I installed the new version of linkchecker on Mac OS X 10.2.2 with the Fink packaging system. Running
linkchecker -v -ohtml -r2 -s http://www.eeh.ee.ethz.ch > linkcheck_eeh.html
I got the following errors:
exceptions.AttributeError addinfourl instance has no attribute 'readlines'
Traceback (most recent call last):
File "/sw/lib/python2.2/site-packages/linkcheck/UrlData.py", line 190, in check
self._check()
File "/sw/lib/python2.2/site-packages/linkcheck/HttpsUrlData.py", line 36, in _check
HttpUrlData._check(self)
File "/sw/lib/python2.2/site-packages/linkcheck/UrlData.py", line 246, in _check
self.checkConnection()
File "/sw/lib/python2.2/site-packages/linkcheck/HttpUrlData.py", line 91, in checkConnection
if self.config["robotstxt"] and not self.robotsTxtAllowsUrl():
File "/sw/lib/python2.2/site-packages/linkcheck/HttpUrlData.py", line 290, in robotsTxtAllowsUrl
rp.read()
File "/sw/src/root-python22-2.2.2-1/sw/lib/python2.2/robotparser.py", line 44, in read
lines = f.readlines()
AttributeError: addinfourl instance has no attribute 'readlines'
System info:
LinkChecker 1.8.1
Python 2.2.2 (#1, 11/12/02, 13:19:40)
[GCC Apple cpp-precomp 6.14] on darwin
LinkChecker keeps running but emits this error message again and again. Could anybody help me here?
Thanks, Chris.
Converted from SourceForge issue 202482, submitted by rasputen
If you run linkchecker on a Zope site, it returns without having checked anything. Here's the output for a run on zope.org:
./linkchecker -v http://www.zope.org/
LinkChecker
LinkChecker comes with ABSOLUTELY NO WARRANTY!
This is free software, and you are welcome to redistribute it
under certain conditions. Look at the file `LICENSE' whithin this
distribution.
Get the newest version at http://linkchecker.sourceforge.net/
Write comments and bugs to [email protected]
Start checking at 02.03.2000 15:10:18
URL http://www.zope.org/
Real URL http://www.zope.org/
Result Valid: 200 OK
Thats it. 0 warnings, 0 errors found.
Stopped checking at 02.03.2000 15:10:19
Converted from SourceForge issue 568099, submitted by calvin
If you set http_proxy="http://user:pass@host:port/" then
linkchecker reports an error for every http url.
This is unfortunately a Python bug: see
https://sourceforge.net/tracker/index.php?func=detail&aid=527518&group_id=5470&atid=305470
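Until the underlying Python bug is fixed, one workaround is to strip the credentials out of the proxy URL and send them yourself as a Proxy-Authorization header. A sketch under that assumption; split_proxy_auth is a hypothetical helper, not part of LinkChecker:

```python
import base64
from urllib.parse import urlsplit


def split_proxy_auth(proxy_url):
    """Split credentials out of a proxy URL like http://user:pass@host:port/
    and return (clean_url, auth_header_value_or_None)."""
    parts = urlsplit(proxy_url)
    if parts.username is None:
        return proxy_url, None
    creds = "%s:%s" % (parts.username, parts.password or "")
    auth = "Basic " + base64.b64encode(creds.encode()).decode()
    host = parts.hostname
    if parts.port:
        host = "%s:%d" % (host, parts.port)
    # Rebuild the proxy URL without the user:pass@ part.
    clean = "%s://%s%s" % (parts.scheme, host, parts.path)
    return clean, auth
```

The clean URL then goes to the proxy handler and the header value is attached to each request.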
Converted from SourceForge issue 784331, submitted by hallcp
You need to print "Content-type: text/html" as the
first thing in lc.cgi. As it is now, when there is an
error, the error message is sent before the
Content-type, and the Apache server will not transmit
the page.
It's best to always print the Content-type as early as
possible.
Thanks,
Charles Hall
Converted from SourceForge issue 857748, submitted by brianiac
Windows is listed as a supported OS for this project,
but there are no binaries available.
Windows does not include any compilers, and anyone
using ActivePython is given the error "Python was built
with version 6 of Visual Studio, and extensions need to
be built with the same version of the compiler, but it
isn't installed.".
This seems to indicate that the only way to install
LinkChecker (aside from shelling out >$1K) is to build
Python with MinGW, or find a binary built that way
(suboptimal).
Converted from SourceForge issue 601707, submitted by nobody
linkchecker -v -ohtml -r2 -s http://www.servery.cz >
dd.html
exceptions.AttributeError addinfourl instance has no
attribute 'readlines'
Traceback (most recent call last):
File
"/usr/lib/python2.1/site-packages/linkcheck/UrlData.py",
line 253, in check
self._check()
File
"/usr/lib/python2.1/site-packages/linkcheck/HttpsUrlData.py",
line 35, in _check
HttpUrlData._check(self)
File
"/usr/lib/python2.1/site-packages/linkcheck/UrlData.py",
line 310, in _check
self.checkConnection()
File
"/usr/lib/python2.1/site-packages/linkcheck/HttpUrlData.py",
line 88, in checkConnection
if self.config["robotstxt"] and not
self.robotsTxtAllowsUrl():
File
"/usr/lib/python2.1/site-packages/linkcheck/HttpUrlData.py",
line 280, in robotsTxtAllowsUrl
rp.read()
File "/usr/lib/python2.1/robotparser.py", line 44, in
read
lines = f.readlines()
AttributeError: addinfourl instance has no attribute
'readlines'
System info:
LinkChecker 1.6.0
Python 2.1.3 (#1, Aug 25 2002, 10:07:39)
[GCC 2.95.4 20011002 (Debian prerelease)] on linux2
http_proxy = ''
Converted from SourceForge issue 776416, submitted by htrd
I am seeing errors in version 1.8.19 when a URL contains a space
character. The referring page has it correctly quoted as
%20, but linkchecker sends a literal space character in
its HTTP request.
line 241 of HttpUrlData.py contains a comment:
but the code uses map() in a way that leaves self.urlparts
unchanged (map returns a new list; it does not mutate
its argument).
patch attached
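The bug pattern is easy to reproduce. In Python 2, map() returned a new list that was silently discarded; in Python 3 the call is lazy, so quote() never even runs. A minimal illustration; the quote_parts_* names are made up for this sketch:

```python
import urllib.parse


def quote_parts_broken(urlparts):
    # Bug pattern: map() builds a new sequence; urlparts is unchanged.
    # (In Python 3 the iterator is never even consumed.)
    map(urllib.parse.quote, urlparts)
    return urlparts


def quote_parts_fixed(urlparts):
    # Fix: assign the result back instead of discarding it.
    return [urllib.parse.quote(p, safe="/:?=&") for p in urlparts]
```

The attached patch presumably does the equivalent assignment on self.urlparts.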
Converted from SourceForge issue 680426, submitted by majid
When checking a site with Amazon.com links, LinkChecker
reports timeout errors. Apparently Amazon just hangs when a
HEAD request is submitted for a URL with an affiliate code,
such as
http://www.amazon.com/exec/obidos/ASIN/0393320928/fazalmajidswe-20
I don't know whether this is normal Amazon.com behavior or
just a temporary situation.
A GET request works perfectly.
There is apparently some support for falling back from
HEAD to GET if an error code such as a 405 is
encountered, but not if a timeout is encountered.
There should be an option to always use GET instead of
HEAD (perhaps with an If-Modified-Since header, although
that also makes Amazon hang).
I first encountered this issue when testing the new
1.8.7 release, but it may have been lurking for longer
than that.
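The requested fallback can be sketched as follows. do_request is an injected stand-in for the real HTTP call, and the always_get flag models the option the report asks for; none of these names are LinkChecker's actual API:

```python
import socket


def check_url(url, do_request, always_get=False):
    """Return the HTTP status for ``url``, falling back from HEAD to GET
    on servers that mishandle HEAD -- including timeouts, which is the
    case Amazon triggers above.

    ``do_request(method, url)`` returns an HTTP status code.
    """
    if not always_get:
        try:
            status = do_request("HEAD", url)
            if status != 405:        # 405 Method Not Allowed
                return status
        except socket.timeout:
            pass                     # fall through to GET
    return do_request("GET", url)
```

The key change versus the behavior described above is that a timeout, not just a 405, routes the check through GET.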
Converted from SourceForge issue 852627, submitted by nobody
I'd like an option to not count 401s as broken.
Converted from SourceForge issue 833419, submitted by volkerjaenisch
Checking a site that makes heavy use of the Apache mod_rewrite module, I
found the following:
$ linkchecker -v http://unterhalt.selbmann-bergert.de
LinkChecker 1.9.4 Copyright © 2000-2003 Bastian
Kleineidam
[..]
URL           http://unterhalt.selbmann-bergert.de
Real URL      http://www.selbmann-bergert.de/dynamic/gebiete/Familienrecht_u.Erbrecht/03_gebiete_details_KK.html
Check time    0.908 seconds
Warning       URL path is empty, using '/'
Effective URL http://www.selbmann-bergert.de/dynamic/gebiete/Familienrecht_u.Erbrecht/03_gebiete_details_KK.html
Result        Error: 404 Not Found
That result is definitively wrong, since the effective URL exists.
Looking at the responsible code in HttpUrlData.py, I think I found
a tab-alignment (indentation) error:
The block (starting at line 243)
# check url warnings
effectiveurl = urlparse.urlunsplit(self.urlparts)
if self.url != effectiveurl:
    self.setWarning(i18n._("Effective URL %s")
                    % effectiveurl)
    self.url = effectiveurl
# check response
self.checkResponse(response)
is outside the while loop, so the line
self.url = effectiveurl
has no effect.
Just indenting the block one tab to the right solved the problem.
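Why the indentation matters can be seen in a toy redirect loop: with the effective-URL update inside the loop, each hop is requested against the redirected location; outside the loop the update runs once, after the last request, and is never used. follow_redirects and the redirects mapping are illustrative, not LinkChecker's code:

```python
def follow_redirects(url, redirects, max_hops=5):
    """Toy redirect loop illustrating the indentation fix above.

    ``redirects`` maps a URL to its Location header (hypothetical).
    """
    warnings = []
    for _ in range(max_hops):
        target = redirects.get(url)
        if target is None:
            break
        warnings.append("Effective URL %s" % target)
        url = target          # inside the loop: the next hop uses it
    return url, warnings
```

With the update outside the loop, every hop after the first would be requested against the original URL, which is how the 404 above arises.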
Thank you for your nice program, I like it very much. I hope my hint
is valuable for you - let me know.
Volker
inqbus it-consulting +49 ( 341 ) 5643800
Dr. Volker Jaenisch http://www.inqbus.de
Herloßsohnstr. 12, 04155 Leipzig
Converted from SourceForge issue 481565, submitted by dvarner
I am getting the following errors running Linkchecker
on an NT box...
URL s/2690
Name Personal Email
Parent URL http://www.yahoo.com, line 28
Base http://www.yahoo.com/
Real URL http://www.yahoo.com/s/2690
Check Time 0.060 seconds
Result Error: (10056, 'Socket is already
connected')
socket.error (10056, 'Socket is already connected')
URL r/m5
Name Yahoo! Mail
Parent URL http://www.yahoo.com, line 28
Base http://www.yahoo.com/
Real URL http://www.yahoo.com/r/m5
Check Time 0.050 seconds
Result Error: (10056, 'Socket is already
connected')
...
Have you seen these errors running Linkchecker on a
Windoze platform? I always get a Socket error. Is this
just me? I am on an NT box.
Thanks,
Drew
Converted from SourceForge issue 636802, submitted by majid
I have the following code on one of my pages
(http://www.majid.info/radio/):
<applet code=panoapplet script language="JavaScript"
codebase="http://www.majid.info/images/"
height=266
name=FPViewer1 width=400
archive="panoapplet.jar">
<param name=file
value="http://www.majid.info/images/louvre.ivr">
<param name="autoSpin" value="-50">
</applet>
And linkchecker reports an error because it tries to
load http://www.majid.info/radio/panoapplet.jar instead
of codebase + archive =
http://www.majid.info/images/panoapplet.jar
There seems to be some minimal support for codebase in
UrlData.py, but it stops short of prepending the codebase
to the archive URL the way a <base href> is handled.
URL panoapplet.jar
Parent URL http://www.majid.info/radio/, line 46
Real URL http://www.majid.info/radio/panoapplet.jar
Check Time 0.051 seconds
Result Error: 404 Not Found
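The resolution the report asks for is a two-step join: resolve the codebase attribute against the page URL, then the archive against the codebase. A sketch; applet_archive_url is a hypothetical helper, not LinkChecker's API:

```python
from urllib.parse import urljoin


def applet_archive_url(page_url, codebase, archive):
    """Resolve an <applet> archive the way browsers do: relative to the
    codebase attribute when present, else relative to the page URL."""
    base = urljoin(page_url, codebase) if codebase else page_url
    return urljoin(base, archive)
```

Because urljoin handles both absolute and relative codebase values, the same helper covers codebase="images/" and the fully qualified form used on the page above.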