
lofterspider's People

Contributors

ishtartang, lyc8503


lofterspider's Issues

lxml.etree.SerialisationError: IO_ENCODER

While using the general-purpose template I ran into this error: lxml.etree.SerialisationError: IO_ENCODER
After searching around, I fixed it by adding encoding='utf-8' to the etree.tostring(temp) call on line 35 of parse_template.py:

open("test.html", "w", encoding="utf-8").write(etree.tostring(parse,encoding='utf-8').decode("utf-8"))

The change is so small that I won't open a merge request for it.

l9 and l10 both raise this error. Any idea what's going on?

Traceback (most recent call last):
  File "D:\PythonFiles\lofterSpider-master\l10_blogs_txt.py", line 111, in <module>
    archives_info = save_files(blog_urls)
  File "D:\PythonFiles\lofterSpider-master\l10_blogs_txt.py", line 77, in save_files
    time_and_title = get_time_and_title(blog_url, author_page_parse)
  File "D:\PythonFiles\lofterSpider-master\l10_blogs_txt.py", line 24, in get_time_and_title
    author_id = author_page_parse.xpath("//body/iframe[@id='control_frame']/@src")[0].split("blogId=")[1]
IndexError: list index out of range
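
The failing line indexes `[0]` into the result of an XPath query, so an empty result raises `IndexError`. One possible cause, which the traceback alone does not confirm, is that the page no longer contains an `iframe` with `id='control_frame'`. Below is a minimal sketch of a more defensive extraction. `extract_blog_id` is a hypothetical helper, not part of lofterSpider, and unlike the original line it also drops any query parameters that follow the `blogId` value.

```python
def extract_blog_id(iframe_srcs):
    """Return the blogId value from a list of iframe src strings.

    `iframe_srcs` stands in for the list returned by
    author_page_parse.xpath("//body/iframe[@id='control_frame']/@src").
    Hypothetical helper; not part of lofterSpider.
    """
    if not iframe_srcs:
        # The XPath matched nothing, so indexing [0] would raise IndexError.
        raise ValueError("control_frame iframe not found; the page layout may have changed")
    src = iframe_srcs[0]
    marker = "blogId="
    if marker not in src:
        raise ValueError(f"no blogId parameter in iframe src: {src!r}")
    # Keep only the value itself, dropping any following query parameters.
    return src.split(marker, 1)[1].split("&", 1)[0]
```

With a guard like this, a layout change would surface as a descriptive error message instead of a bare `IndexError`.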

Could downloading posts by specific year and month work around the 1200-post limit in tag mode?

From what I can see, tag mode currently uses the web endpoint

https://www.lofter.com/tag/xxx/new

which seems to have a maximum count (1200)?

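However, the mobile app now has a new endpoint, and it looks like the limit could be sidestepped by filtering manually by year and month.
Ideally, iterating over years and months through this endpoint would retrieve everything
(though I don't know whether this endpoint has a limit of its own).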

https://api.lofter.com/newapi/tagPosts.json

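It corresponds to a feature in the app that returns the posts for a tag in a specific year and month, sorted by popularity.

I captured the traffic, but I don't know Python very well, so I can only ask you to look into whether this is feasible TUT
I'm not sure whether data from the app's endpoint can be scraped.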
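
The year-and-month traversal proposed above could be sketched roughly as follows. This only shows the iteration and de-duplication logic. `fetch_month` is a hypothetical callable that would wrap the mobile endpoint: its request parameters are not given in this issue, and the assumption that each post is a dict with an `"id"` key is also illustrative.

```python
from datetime import date


def iter_year_months(start, end):
    """Yield (year, month) pairs from `start` to `end`, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


def collect_tag_posts(tag, start, end, fetch_month):
    """Gather posts month by month, dropping duplicates by post id.

    `fetch_month(tag, year, month)` is a hypothetical callable; it is
    assumed to return a list of dicts that each carry an "id" key.
    """
    seen, posts = set(), []
    for year, month in iter_year_months(start, end):
        for post in fetch_month(tag, year, month):
            if post["id"] not in seen:
                seen.add(post["id"])
                posts.append(post)
    return posts
```

Whether each month's result set is itself capped would still need to be checked against the real endpoint.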

l4 and l9 can no longer download

Traceback (most recent call last):
  File "F:\lofterSpider-master\l9_author_txt.py", line 472, in <module>
    run(author_url, get_comm, additional_break, start_time, end_time, chapter_merge_title, additional_chapter_index)
  File "F:\llofterSpider-master\l9_author_txt.py", line 273, in run
    author_id = author_page_parse.xpath("//body//iframe[@id='control_frame']/@src")[0].split("blogId=")[1]
IndexError: list index out of range

Updating to the latest version didn't help; downloads still fail.

l9_author_txt seems to have stopped working

Link: https://weitayiji.lofter.com/
Error:
Traceback (most recent call last):
  File "E:\py\lofterSpider-master\l9_author_txt.py", line 472, in <module>
    run(author_url, get_comm, additional_break, start_time, end_time, chapter_merge_title, additional_chapter_index)
  File "E:\py\lofterSpider-master\l9_author_txt.py", line 273, in run
    author_id = author_page_parse.xpath("//body/iframe[@id='control_frame']/@src")[0].split("blogId=")[1]
IndexError: list index out of range

Errors when crawling a large number of posts

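I read through your project carefully.
You explain every feature in detail and have clearly put in a lot of hard work, so first of all, thank you.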

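This afternoon I tried the project's Python scripts.
I found that when crawling all posts from a blogger who has a very large number of them,
the script fails to fetch the posts, and no photos are downloaded.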

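I then tried a script by another author
and found that it handles this case.
I'm posting the source here in the hope that it is a useful reference for improving this project: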
https://github.com/Litreily/capturer
https://www.litreily.top/2018/03/17/lofter/

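I think your project offers richer functionality:
the other author's project only handles LOFTER images,
while yours handles both text and images.
Once the problem above is fixed,
yours will certainly be the better project,
which is why I ventured to raise these ideas.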

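The project mentioned above can only save LOFTER images.
If you can fix the bug where posts are missed during crawling, and also cover text, images, and video,
this would become the most complete LOFTER project.
That is my reason and motivation for opening this issue.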

Finally, thank you again for your hard work.

Would you consider saving images and text together, sorted into one folder per post?

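Hello. I may not be explaining this clearly, but some posts contain both images and text. Could the archiving mode be changed so that each post gets its own folder containing its images and a txt file? We are building a resource archive site, so the volume is fairly large, which is why we are using your crawler. Thank you very much. Some authors can no longer provide all of their original posts even after granting permission, so when images and txt files are archived separately, piecing them back together later is quite a hassle. Would you have time to make this update?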

https://msy60048.lofter.com/view
The posts I'm currently having trouble archiving are this author's; please take a look.

By the way, I ship Good Omens too!
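
The one-folder-per-post layout requested in this issue might look something like the sketch below. `save_post` and `safe_name` are hypothetical helpers, not functions from lofterSpider, and the sketch assumes the post's text and image bytes have already been downloaded.

```python
import re
from pathlib import Path


def safe_name(title):
    """Replace characters that are invalid in Windows file names."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", title).strip()
    return cleaned or "untitled"


def save_post(root, title, text, images):
    """Write one post into its own folder: a txt file plus numbered images.

    `images` is a list of (extension, bytes) pairs in page order.
    Hypothetical helper; not part of lofterSpider.
    """
    folder = Path(root) / safe_name(title)
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "post.txt").write_text(text, encoding="utf-8")
    for index, (ext, data) in enumerate(images, start=1):
        # Zero-padded numbering keeps images next to their text, in order.
        (folder / f"{index:03d}.{ext}").write_bytes(data)
    return folder
```

Posts that share a title would land in the same folder here, so a real implementation would probably add the post date or id to the folder name.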

Images downloaded in the wrong order

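Hi, thanks for writing this!
I noticed that after downloading an author's blog images, their order differs from the order in which the author posted them. Is there a way to fix this?
I downloaded all images from https://manon09.lofter.com/ using l4_author_img.py.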
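
The cause of the reordering is not identified in this issue. One possible mitigation, assuming the image URLs are available in page order before downloading, is to encode each image's position in its file name so that a file manager sorts them correctly. `ordered_filenames` is a hypothetical helper, and the `.jpg` fallback extension is an arbitrary assumption.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def ordered_filenames(image_urls):
    """Map image URLs to file names that sort in page order."""
    # dict.fromkeys drops duplicate URLs while preserving first-seen order.
    unique_urls = list(dict.fromkeys(image_urls))
    width = max(3, len(str(len(unique_urls))))
    names = {}
    for index, url in enumerate(unique_urls, start=1):
        # Take the extension from the URL path; fall back to .jpg (assumption).
        suffix = PurePosixPath(urlparse(url).path).suffix or ".jpg"
        names[url] = f"{index:0{width}d}{suffix}"
    return names
```

Because the names are computed up front, the files still sort correctly even if the downloads themselves finish out of order.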
