Author: liuzhijun
WeChat: lzjun567
Official account: Python之禅 (id: VTtalk)
lzjun567 / python_scripts
Some Python demo scripts.
License: Apache License 2.0
I clearly already have click installed, yet it errors as soon as I run it.
Running heart.py, response.json()[0] raises:
报错:
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
The Sina Weibo API returned HTML rather than JSON, which is why the decode fails.
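This failure mode (HTML coming back where JSON is expected, often a login or error page) can be caught before indexing into the response. A minimal sketch; the function name and error text are my own, assuming a requests-style text payload:

```python
import json

def parse_api_payload(text):
    """Try to parse an API response as JSON; report the real content on failure.
    The Weibo API returns an HTML page (e.g. a login redirect) instead of JSON
    when the request is rejected, which is what trips response.json()[0]."""
    try:
        return json.loads(text)
    except ValueError:
        # show the first bytes of what actually came back
        raise ValueError("expected JSON, got something else: %r" % text[:80])
```

Checking the payload this way turns an opaque "No JSON object could be decoded" into a message that shows the HTML you actually received.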
About "BeautifulSoup 3 doesn't support Python 2" — did the author write that backwards? It's Python 3 that it doesn't support, right?
root@raspberrypi:/home/pi/python/crawler_html2pdf/pdf# python3 crawler.py
Traceback (most recent call last):
  File "crawler.py", line 14, in <module>
    import pdfkit
ImportError: No module named 'pdfkit'
Why is that?
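An ImportError under python3 usually means pdfkit was installed for a different interpreter (e.g. pip bound to Python 2 instead of pip3). A quick stdlib check, before debugging further:

```python
import importlib.util
import sys

def module_available(name):
    """True if `name` is importable by the interpreter running this script."""
    return importlib.util.find_spec(name) is not None

# Shows which interpreter you are on and whether it can see pdfkit;
# install with the matching pip, e.g. `python3 -m pip install pdfkit`.
print(sys.executable)
print(module_available("pdfkit"))
```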
Traceback (most recent call last):
  File "crawler.py", line 165, in <module>
    crawler.run()
  File "crawler.py", line 99, in run
    pdfkit.from_file(htmls, self.name + ".pdf", options=options)
  File "/usr/local/lib/python3.5/dist-packages/pdfkit/api.py", line 49, in from_file
    return r.to_pdf(output_path)
  File "/usr/local/lib/python3.5/dist-packages/pdfkit/pdfkit.py", line 156, in to_pdf
    raise IOError('wkhtmltopdf reported an error:\n' + stderr)
OSError: wkhtmltopdf reported an error:
The switch --outline-depth, is not support using unpatched qt, and will be ignored.
Error: This version of wkhtmltopdf is build against an unpatched version of QT, and does not support more then one input document.
Exit with code 1, due to unknown error.
blog/crawler_blog.py
  File "crawler.py", line 163, in <module>
    crawler.run()
  File "crawler.py", line 97, in run
    pdfkit.from_file(htmls, self.name + ".pdf", options=options)
  File "C:\Anaconda3\envs\py3-dj\lib\site-packages\pdfkit\api.py", line 49, in from_file
    return r.to_pdf(output_path)
  File "C:\Anaconda3\envs\py3-dj\lib\site-packages\pdfkit\pdfkit.py", line 156, in to_pdf
    raise IOError('wkhtmltopdf reported an error:\n' + stderr)
OSError: wkhtmltopdf reported an error:
Loading pages (1/6)
Warning: Failed to load http://www.liaoxuefeng.comhttp//service.t.sina.com.cn/widget/qmd/1658384301/078cedea/2.png (ignore)
Counting pages (2/6)
Resolving links (4/6)
Loading headers and footers (5/6)
Printing pages (6/6)
Done
Exit with code 1 due to network error: ProtocolUnknownError
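The warning above shows two URLs fused together (...liaoxuefeng.comhttp//service...): a protocol-relative src like //service.t.sina.com.cn/... was string-concatenated onto the page URL instead of being resolved, and wkhtmltopdf then fails the request with ProtocolUnknownError. urllib.parse.urljoin handles relative, absolute, and protocol-relative srcs correctly; a sketch:

```python
from urllib.parse import urljoin

def absolutize(page_url, src):
    """Resolve an <img src> against the page URL instead of concatenating strings.
    urljoin leaves absolute srcs alone, resolves /path srcs against the host,
    and gives protocol-relative //host/path srcs the page's scheme."""
    return urljoin(page_url, src)

page = "http://www.liaoxuefeng.com/wiki/somepage"
print(absolutize(page, "//service.t.sina.com.cn/widget/qmd/1658384301/078cedea/2.png"))
```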
Hello — Zhihu's captcha is now upside-down Chinese characters; how should that be handled?
I highly recommend using 'weasyprint' as an alternative to 'pypdf' to avoid the font-size issue.
And as far as I know, no webpage-to-PDF module can extract AJAX-loaded images. :)
Traceback (most recent call last):
  File "crawler.py", line 163, in <module>
    crawler.run()
  File "crawler.py", line 90, in run
    for index, url in enumerate(self.parse_menu(self.request(self.start_url))):
  File "crawler.py", line 116, in parse_menu
    menu_tag = soup.find_all(class_="uk-nav uk-nav-side")[1]
OSError: No wkhtmltopdf executable found: "b''"
About that error: I had added D:\Program Files\wkhtmltopdf\bin\ to the PATH environment variable.
I thought the \b was being mis-parsed by Python, so I changed it to D:\\Program Files\\wkhtmltopdf\\bin\\
Still no luck.
I consulted this Stack Overflow thread:
http://stackoverflow.com/questions/27673870/cant-create-pdf-using-python-pdfkit-error-no-wkhtmltopdf-executable-found
and changed the code to
config = pdfkit.configuration(wkhtmltopdf=r"D:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe")
pdfkit.from_file(htmls, file_name, options=options, configuration=config)
and now it runs fine.
Is this really the only way to handle it?
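Passing an explicit configuration is the reliable fix; otherwise pdfkit just searches PATH for wkhtmltopdf, and a PATH edit only takes effect in shells or IDEs started after the change. One way to avoid hard-coding the path is to let Python do the PATH lookup and fall back to a known install location (the Windows default below is only an assumption taken from this thread):

```python
import shutil

def find_wkhtmltopdf(fallback=r"D:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe"):
    """Return wkhtmltopdf found on PATH, or the given fallback path."""
    return shutil.which("wkhtmltopdf") or fallback

# config = pdfkit.configuration(wkhtmltopdf=find_wkhtmltopdf())
# pdfkit.from_file(htmls, file_name, options=options, configuration=config)
```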
The docs mention that crawling Weibo data requires cookies. Does that mean accounts other than your own (whose password you don't have) can't be crawled?
  File "crawler.py", line 35
    """
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xc5 in position 5: invalid continuation byte
Many of the images in the second half fail to download — could wkhtmltopdf's buffer be too small? The failed images are always from the latter half of the document, and the error message gives nothing useful:
Traceback (most recent call last):
  File "crawler.py", line 165, in <module>
    crawler.run()
  File "crawler.py", line 99, in run
    pdfkit.from_file(htmls, self.name + ".pdf", options=options)
  File "D:\Program Files\Python36\lib\site-packages\pdfkit\api.py", line 49, in from_file
    return r.to_pdf(output_path)
  File "D:\Program Files\Python36\lib\site-packages\pdfkit\pdfkit.py", line 156, in to_pdf
    raise IOError('wkhtmltopdf reported an error:\n' + stderr)
OSError: wkhtmltopdf reported an error:
Loading pages (1/6)
Warning: Failed to load file:///static/img/404.png (ignore)
Counting pages (2/6)
Resolving links (4/6)
Loading headers and footers (5/6)
Printing pages (6/6)
Done
Exit with code 1 due to network error: ProtocolUnknownError
I always get this error when running the PDF conversion step. Does anyone know what it is? I searched and found no solution.
def func(m):
    if not m.group(3).startswith("http"):
        rtn = m.group(1) + get_domain(url) + "/" + m.group(2) + m.group(3)
        # rtn = m.group(1) + domain + m.group(2) + m.group(3)
        return rtn
    else:
        return m.group(1) + m.group(2) + m.group(3)

html = re.compile(pattern).sub(func, html)
I found a problem in it, and modified it as above.
You can check the match on https://regex101.com/ — it is m.group(2) that matches the URL,
not m.group(3), which only matches the closing quote (").
I also couldn't follow that regex substitution at first; the official signature is
re.sub(pattern, repl, string, count=0, flags=0)
where repl is either a string or a function.
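Putting that discussion together: with a pattern whose group(2) captures the URL, the repl-as-function form of re.sub can resolve relative srcs in one pass. A self-contained sketch — the three-group pattern here is an illustrative stand-in, not the original script's:

```python
import re
from urllib.parse import urljoin

def absolutize_srcs(html, page_url):
    """Rewrite every src="..." so relative paths resolve against page_url."""
    pattern = r'(src=")([^"]*)(")'

    def func(m):
        # group(2) is the URL itself; group(1)/group(3) are the quoting around it
        return m.group(1) + urljoin(page_url, m.group(2)) + m.group(3)

    return re.sub(pattern, func, html)
```

urljoin leaves already-absolute srcs untouched, so the startswith("http") branch becomes unnecessary.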
# give the h1 tag a centered style
body.find('h1')['style'] = "text-align:center;"
At this point you need to use
body = soup.find(class_="article-intro")
# body = soup.find_all(class_="article-intro")  # with find_all you'd then need html = h[1:-1] to strip the leading [ and trailing ]
if not m.group(3).startswith("http"):
Shouldn't that be group(2)?
ubuntu 16.04
$ wkhtmltopdf --version
wkhtmltopdf 0.12.2.4
On Windows 10, pip needs to install beautifulsoup4 — without the 4 it installs BeautifulSoup 3 by default.
On both Windows and Ubuntu I get character-encoding errors.
windows:
ERROR:root:解析错误 (parse error)
Traceback (most recent call last):
  File "crawler.py", line 56, in parse_url_to_html
    html = html.encode("utf-8")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 134: ordinal not in range(128)
ubuntu:
ERROR:root:解析错误 (parse error)
Traceback (most recent call last):
  File "crawler.py", line 56, in parse_url_to_html
    html = html.encode("utf-8")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 134: ordinal not in range(128)
(the same error repeats for the next page)
    for card in cards:
TypeError: 'NoneType' object is not iterable

Traceback (most recent call last):
  File "crawler.py", line 56, in parse_url_to_html
    f.write(html)
TypeError: a bytes-like object is required, not 'str'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "crawler.py", line 119, in <module>
    main()
  File "crawler.py", line 108, in main
    htmls = [parse_url_to_html(url, str(index) + ".html") for index, url in enumerate(urls)]
  File "crawler.py", line 108, in <listcomp>
    htmls = [parse_url_to_html(url, str(index) + ".html") for index, url in enumerate(urls)]
  File "crawler.py", line 60, in parse_url_to_html
    print(e.message)
AttributeError: 'TypeError' object has no attribute 'message'
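Two separate Python 3 changes are tangled together in that traceback: a stream opened in binary mode only accepts bytes, and BaseException no longer has a .message attribute, so the error handler itself crashed. A small sketch of both points, using io.BytesIO as a stand-in for a file opened with "wb":

```python
import io

def demo():
    buf = io.BytesIO()           # stands in for open(name, "wb")
    try:
        buf.write("<html/>")     # str into a binary stream -> TypeError
    except TypeError as e:
        # Python 3: use str(e); e.message was removed
        return str(e)

print(demo())
```

The fixes are to write bytes to binary files (or open in text mode with an encoding) and to print str(e) in the except block.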
Error message: OSError: wkhtmltopdf exited with non-zero code 1. error:
You need to specify at least one input file, and exactly one output file
How does pdfkit decide what goes into the auto-generated table of contents? After I modified the code, the generated PDF has no TOC.
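As far as I can tell, wkhtmltopdf builds the PDF outline (bookmarks) from the h1–h6 heading tags in the input HTML, so if the modified code strips or restructures the headings, the outline disappears. The relevant knobs, sketched as a pdfkit options dict (these map to the --outline / --outline-depth switches and, per the warnings elsewhere in this thread, need a patched-Qt wkhtmltopdf build):

```python
# assumption: a patched-Qt wkhtmltopdf build
options = {
    "outline": None,      # flag option: build bookmarks from h1-h6 headings
    "outline-depth": 3,   # how many heading levels to include
}
# pdfkit.from_file(htmls, "out.pdf", options=options)
```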
When I crawl the pages I want, every image on the page gives "failed to load".
python3 crawler.py
Traceback (most recent call last):
  File "crawler.py", line 163, in <module>
    crawler.run()
  File "crawler.py", line 90, in run
    for index, url in enumerate(self.parse_menu(self.request(self.start_url))):
  File "crawler.py", line 116, in parse_menu
    menu_tag = soup.find_all(class_="uk-nav uk-nav-side")[1]
IndexError: list index out of range
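find_all(...) returns a plain list, so indexing [1] blows up whenever the page yields fewer than two matches — typically because the site layout changed or the request was served an error or redirect page instead of the expected one. A defensive sketch (plain lists stand in for the bs4 result set; the function name is mine):

```python
def pick_menu(tags):
    """Return the second matched menu element, failing with a readable message."""
    if len(tags) < 2:
        raise RuntimeError(
            "expected >= 2 elements with class 'uk-nav uk-nav-side', got %d "
            "- did the page layout change, or did the request fail?" % len(tags))
    return tags[1]
```

Printing the fetched HTML when this trips usually reveals immediately what the server actually returned.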
Traceback (most recent call last):
  File "crawler.py", line 165, in <module>
    crawler.run()
  File "crawler.py", line 99, in run
    pdfkit.from_file(htmls, self.name + ".pdf", options=options)
  File "/usr/local/lib/python3.4/dist-packages/pdfkit/api.py", line 49, in from_file
    return r.to_pdf(output_path)
  File "/usr/local/lib/python3.4/dist-packages/pdfkit/pdfkit.py", line 159, in to_pdf
    raise IOError("wkhtmltopdf exited with non-zero code {0}. error:\n{1}".format(exit_code, stderr))
OSError: wkhtmltopdf exited with non-zero code -6. error:
The switch --outline-depth, is not support using unpatched qt, and will be ignored.
QXcbConnection: Could not connect to display