scrapy-zhihu-github's Introduction

👋 Hi there

Health, love, and mission, in that order. Nothing else matters. ------ Naval

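I'm ChenSoul, a Java software engineer currently based in Wuhan. I previously worked at internet companies including Alibaba and Lashou (拉手网), and I now work at Dmall (多点) on new retail, IoT, and smart security development.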

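I love programming and enjoy running. Reading, working out, investing on a fixed schedule, helping friends, spending time with family: I try to be a long-termist.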

How to find me

Recent blog updates

More at: https://blog.chensoul.cc

Recent Douban activity

Recent GitHub releases

Coding over the past week

Running over the past year

More at: https://run.chensoul.cc

scrapy-zhihu-github's Issues

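How do you handle Zhihu login?

Hello~
I'd like to ask how you handle logging in to Zhihu, since Zhihu recently added a CAPTCHA to its login flow.

I tried this with requests earlier: keep a single session, download the CAPTCHA image, then log in with that same session. That works.

My idea is to log in with requests first inside Scrapy, then hand the session over to Scrapy's cookiejar: requests.session(),
Would that work? Or is there a better approach you could recommend?

Code for logging in to Zhihu with requests
Thanks~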

Hello, a question about scraping

Say I have a piece of HTML like this, with a really messy structure. Here is the raw source:

<div id="fontzoom" class="display_text">
          <p></p>
          <div style="overflow-x: hidden; word-break: break-all" class="TRS_PreAppend">
              <p style="margin-top: 0pt; margin-bottom: 0pt" class="p0"><span style="font-family: '宋体'; font-size: 10.5pt; font-weight: normal"><font style="font-size: 16pt">&nbsp;&nbsp;&nbsp;&nbsp;12<font face="宋体" style="font-size: 16pt">日至</font><font face="Times New Roman" style="font-size: 16pt">14</font><font face="宋体" style="font-size: 16pt">日,由工业和信息化部、教育部、科技部、**科学院、湖北省人民政府、武汉市人民政府主办的“第十一届**·湖北产学研合作项目洽谈会”在武汉国际博览中心召开。</font></font></span></p>
             <p style="margin-top: 0pt; margin-bottom: 0pt" class="p0"><span style="font-family: '宋体'; font-size: 10.5pt; font-weight: normal"><font style="font-size: 16pt">&nbsp;&nbsp;&nbsp;&nbsp;我市四大产业集群<font face="Times New Roman" style="font-size: 16pt">25</font><font face="宋体" style="font-size: 16pt">家企业代表参加了展会。洽谈会上,我市晨信光电、天泰辐照等</font><font face="Times New Roman" style="font-size: 16pt">5</font><font face="宋体" style="font-size: 16pt">家企业与科研院所进行了产学研项目现场签约,取得了良好的辐射带动效果。(今日大冶&nbsp;周雨婷&nbsp;周晓东)&nbsp;&nbsp;</font></font></span>
              </p>
           </div>
           <p></p>
</div>

I want to extract the content in two ways: one that keeps this div's HTML, and one that returns only the text inside the div. How should I write that?

selector.xpath('div[@class="TRS_PreAppend"]/p/text()').extract()

This returns nothing, because the text sits inside the font tags nested under p. Any help appreciated, thanks!
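
A hedged note on why the query comes back empty: `p/text()` only matches text nodes that are direct children of `p`, and the path `div[...]` is relative to the current node rather than searching the whole document. In XPath terms, `//div[@class="TRS_PreAppend"]//text()` walks all descendant text nodes, and calling `.extract()` on `//div[@class="TRS_PreAppend"]` itself returns the div's HTML. To show the "all text under the div" idea without needing Scrapy installed, the sketch below swaps in the standard library's `html.parser`; the class name `DivTextExtractor` is made up for illustration, and the sample is a shortened version of the HTML above.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect every text node nested anywhere inside <div class="...">."""

    def __init__(self, target_class):
        super().__init__(convert_charrefs=True)  # decodes &nbsp; and friends
        self.target_class = target_class
        self.depth = 0    # > 0 while the parser is inside the target div
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1  # nested div inside the target
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)

    def text(self):
        # Join the pieces, then collapse runs of whitespace
        # (non-breaking spaces count as whitespace for str.split).
        return " ".join("".join(self.chunks).split())


if __name__ == "__main__":
    sample = (
        '<div id="fontzoom" class="display_text"><p></p>'
        '<div class="TRS_PreAppend"><p class="p0"><span><font>'
        '&nbsp;&nbsp;12<font face="宋体">日至</font>'
        '<font face="Times New Roman">14</font>'
        '<font face="宋体">日</font></font></span></p></div><p></p></div>'
    )
    parser = DivTextExtractor("TRS_PreAppend")
    parser.feed(sample)
    parser.close()
    print(parser.text())
```

The same depth-counting idea is what the `//text()` descendant axis does for you in Scrapy, so inside a spider the XPath form is usually the shorter choice.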
