lanbing510 / doubanspider
A crawler for Douban Books (豆瓣读书)
Home Page: http://sobook.lanbing510.info
I ported this to Python 3, but the output encoding is always wrong: what gets printed is a string of UTF-8 escape codes. How should I handle this?
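One possible cause, assuming the ported script still calls `.encode('utf-8')` before printing or writing (a common Python 2 habit): in Python 3 that call returns a `bytes` object, and printing it shows an escaped literal rather than the characters. The sketch below only illustrates the difference; the variable names are hypothetical and not taken from the script.

```python
# Assumption: the "string of UTF-8 codes" is the printed form of a bytes object.
title = "豆瓣读书"

encoded = title.encode("utf-8")    # bytes, as a Python 2 style port might produce
as_printed = str(encoded)          # the text that print(encoded) would display

decoded = encoded.decode("utf-8")  # back to a normal str, which prints readably
print(as_printed)
print(decoded)
```

If this is the cause, dropping the `.encode(...)` call (or decoding before output) and opening output files with `encoding="utf-8"` is usually enough.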
wb=Workbook(optimized_write = True)
This line raises an error. Is it because of a version mismatch? Which version are you using?
For example:
https://book.douban.com/tag/%E5%B0%8F%E8%AF%B4?start=1099&type=T
returns "no matching books found".
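A crawler can treat an empty result page like the one above as the end of a tag's listing. The following is a minimal sketch under stated assumptions: `tag_page_url`, `crawl_tag`, the `fetch_books` callable and the page size of 15 are all hypothetical names and values for illustration, not the script's actual code.

```python
from urllib.parse import quote


def tag_page_url(tag, start):
    """Build a tag listing URL in the same shape as the one reported above."""
    return "https://book.douban.com/tag/{}?start={}&type=T".format(quote(tag), start)


def crawl_tag(tag, fetch_books, page_size=15, max_pages=100):
    """Collect books page by page and stop at the first empty page.

    fetch_books is a caller-supplied function (hypothetical) that takes a URL
    and returns a list of parsed books, or an empty list when the page says
    no matching books were found.
    """
    books = []
    for page in range(max_pages):
        batch = fetch_books(tag_page_url(tag, page * page_size))
        if not batch:
            break  # an empty page is treated as the end of the listing
        books.extend(batch)
    return books
```

Passing the fetch function in keeps the loop testable without network access.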
File "doubanspider.py", line 75
except (urllib2.HTTPError, urllib2.URLError), e:
^
SyntaxError: invalid syntax
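This SyntaxError is what Python 3 reports for the Python 2 only form `except X, e:`. In Python 3 the exception is bound with `as`, and `urllib2` is split into `urllib.request` and `urllib.error`. A minimal sketch of an equivalent handler follows; the function name `fetch` and the timeout value are assumptions for illustration.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Return the response body as bytes, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except (urllib.error.HTTPError, urllib.error.URLError) as e:
        # Python 3 syntax: "as e" replaces the Python 2 ", e"
        print("request failed:", e)
        return None
```

The `as` form is also accepted by Python 2.6 and later, so the handler line itself can be changed without breaking Python 2.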
Python beginner
Problem when saving the data after crawling finishes
Traceback (most recent call last):
File "doubansider.py", line 142, in <module>
print_book_lists_excel(book_lists,book_tag_lists)
File "doubansider.py", line 110, in print_book_lists_excel
wb=Workbook(optimized_write=True)
TypeError: __init__() got an unexpected keyword argument 'optimized_write'
Then I removed the optimized_write argument and got a different error:
Traceback (most recent call last):
File "doubansider.py", line 143, in <module>
print_book_lists_excel(book_lists,book_tag_lists)
File "doubansider.py", line 119, in print_book_lists_excel
ws[i].append([count,bl[0],float(bl[1]),int(bl[2]),bl[3],bl[4]])
UnicodeEncodeError: 'decimal' codec can't encode characters in position 0-2: invalid decimal Unicode string
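A likely cause, stated as an assumption: the rating or rating-count field for some book holds non-numeric text (for example a "not enough ratings" message) instead of a number, and Python 2 raises this error when `float()` or `int()` is given such a unicode string. A defensive conversion could look like the sketch below; `to_float` and `to_int` are hypothetical helpers, not part of the script.

```python
import re


def to_float(text, default=0.0):
    """Return the first decimal number found in text, or default if none."""
    match = re.search(r"\d+(?:\.\d+)?", text or "")
    return float(match.group()) if match else default


def to_int(text, default=0):
    """Return the first integer found in text, or default if none."""
    match = re.search(r"\d+", text or "")
    return int(match.group()) if match else default
```

With these, the failing line could call `to_float(bl[1])` and `to_int(bl[2])` instead of `float(bl[1])` and `int(bl[2])`, so books without a numeric rating fall back to the default rather than aborting the export.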
Would it be possible to save each batch of data to Excel as it is crawled? Doesn't keeping everything in memory hurt performance?
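One way to avoid holding every book in memory is to append each crawled batch to a file on disk. The sketch below deliberately swaps Excel for CSV (which Excel can open) because a CSV file can be appended to with the standard library alone; `append_batch` and the row layout are assumptions for illustration, not the script's code.

```python
import csv


def append_batch(path, rows):
    """Append one batch of rows to a CSV file, creating it if needed."""
    # newline="" lets the csv module control line endings itself
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

Calling `append_batch("books.csv", batch)` after each tag or page means only the current batch has to stay in memory.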
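Does not run on Python 3.7
How did you obtain the book tags? On the Douban Books pages I found that many tags are not displayed, yet your code contains many fairly fine-grained tags. (I couldn't find a way to message you privately, so I'm asking in an issue. I hope this isn't a bother.)
Hello, this code cannot be used directly on Python 3. I'm a beginner; looking at the code, modules such as urllib2 appear to be Python 2 only. I haven't found the other causes yet and am still learning. Any guidance would be appreciated, thanks.
Roughly how long did your code take to crawl all of Douban?
Also, could you share the complete Douban book data (3088633 books, 2138386KB, about 2GB), for example via Baidu Cloud (百度云盘)? I'd like to use the data to find more highly rated books to read. Thanks!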
403 Forbidden
nginx
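Running the code under Python 3.7 keeps raising various errors; it is not compatible.
Could you release the code for the interactive interface? Also, when I ran the code it only produced one Excel file. What could be the reason?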
Douban identifies a book by a URL such as https://book.douban.com/subject/2235855/. The Excel output here does not record the Douban ID, which is a bit of a pity.
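If the crawler already sees each book's link, the ID can be pulled out of it with a regular expression. A minimal sketch, assuming links follow the `/subject/<digits>/` pattern shown above; `douban_subject_id` is a hypothetical helper.

```python
import re


def douban_subject_id(url):
    """Return the numeric Douban ID from a book URL, or None if absent."""
    match = re.search(r"/subject/(\d+)", url)
    return match.group(1) if match else None


print(douban_subject_id("https://book.douban.com/subject/2235855/"))  # 2235855
```

The returned ID could then be written as one more column in each row.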
Were all of these books (3088633 books, 2138386KB) crawled via book tags? If so, which tags were crawled?
As the title says: I suggest changing wb=Workbook(optimized_write=True) on line 110
to Workbook(write_only = True)
REF:
[typeerror-init-got-an-unexpected-keyword-argument-optimized-write](https://stackoverflow.com/questions/45073053/typeerror-init-got-an-unexpected-keyword-argument-optimized-write)
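The suggested change can be sketched as follows. This assumes a recent openpyxl release, where the streaming writer is selected with `write_only=True`; the function name `save_rows` and the sheet title are hypothetical. In write-only mode the workbook starts with no sheets, so one has to be created explicitly, and rows can only be appended.

```python
from openpyxl import Workbook


def save_rows(path, rows):
    """Write rows to an .xlsx file using openpyxl's write-only mode."""
    wb = Workbook(write_only=True)   # replaces the old optimized_write=True
    ws = wb.create_sheet("books")    # write-only workbooks have no default sheet
    for row in rows:
        ws.append(row)               # rows are streamed out, not kept as cells
    wb.save(path)
```

Plain `Workbook()` also works, as reported elsewhere in these issues, at the cost of keeping all cells in memory until the file is saved.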
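When execution reaches wb=Workbook(optimized_write=True) on line 110, this error is raised:
TypeError: __init__() got an unexpected keyword argument 'optimized_write'
MacBook, Python 2.7.13
Changing it to wb=Workbook() solved the problem.
What does optimized_write=True do? Does removing it have any effect?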
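Hello, we are a professional IP proxy service provider, 极速HTTP, and would like to discuss a possible commercial promotion partnership with you. If you are interested, you can contact me on WeChat: 13982004324. Thanks. (If you are not interested, sorry for the disturbance.)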
I got this error when running the script on a Mac. How can I fix it? I have been Googling for a long time. Please take a look.