scrapy_baidu

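A crawler for Baidu web search. It scrapes the search result list pages and the detail pages they link to, and extracts the main text from each detail page.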


Crawler design

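Main Python packages: requests + BeautifulSoup + jparser + url2io. Both jparser and url2io extract the main body text of a web page. url2io is more accurate but unstable, so when it fails to parse a page the crawler falls back to jparser. Using the two together improves the quality of main-text extraction.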
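The fallback idea described above can be sketched roughly as follows. This is a minimal illustration, not the code in scrapy_baidu.py: `extract_with_fallback`, `primary_extractor`, and `backup_extractor` are hypothetical names, and the two extractors are simple stand-ins that only play the roles of url2io (accurate, but may fail) and jparser (the fallback). The real library calls are not shown because their exact signatures are not given here.

```python
import re
from typing import Callable, Iterable, Optional


def extract_with_fallback(
    html: str,
    extractors: Iterable[Callable[[str], Optional[str]]],
) -> Optional[str]:
    """Try each extractor in order and return the first non-empty result."""
    for extract in extractors:
        try:
            text = extract(html)
        except Exception:
            # A failing extractor should not abort the crawl; try the next one.
            continue
        if text and text.strip():
            return text.strip()
    return None  # every extractor failed or returned nothing


# Hypothetical stand-ins for the two real extractors (assumptions, not real APIs).
def primary_extractor(html: str) -> str:
    """Plays the role of url2io: precise, but raises when it cannot parse."""
    if "<article>" not in html:
        raise ValueError("primary extractor could not parse the page")
    return html.split("<article>", 1)[1].split("</article>", 1)[0]


def backup_extractor(html: str) -> str:
    """Plays the role of jparser: crude tag stripping used as a fallback."""
    return re.sub(r"<[^>]+>", " ", html)


if __name__ == "__main__":
    chain = [primary_extractor, backup_extractor]
    print(extract_with_fallback("<article>main text</article>", chain))
    print(extract_with_fallback("<p>fallback text</p>", chain))
```

Keeping the extractors as an ordered list of callables means the preferred extractor is always tried first, and a failure (an exception or an empty result) moves on to the next one without special-casing either library.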

jparser

url2io

Run

python scrapy_baidu.py

Sample output

record_id,query,title,abstract,link,content
1,消息面看贸易战进入持久战当下只能说是短暂靴子落地美公布关税豁免程序,贸易战后操作思路_烽火通信(600498)聊吧_赢家聊吧【股吧】,"贸易战既然已经开始,时间上持久战的概率较大。而当下也只能说是短暂的靴子落地,随时还有再起烽火可能,美国公布了个关税豁免程序但那只是正常程序,切勿就...",http://www.baidu.com/link?url=fJ5P3mDdJxPJPBMMaMXI66n0_l72H4Q-ouH3LrM6jwrPW3_Eknpa2LfH_fk3zh_29wqXlDRCR9--xoHr1YMLLa, 周五盘面情绪波动过激主要来源于中美贸易……CHgsuVPJQcO56K0ZGpT8Qw短线宜做逆风的墙头草,即再跌不悲观,反弹则不能盲目乐观,操作上忌追涨杀跌。把控仓位、跟随市场热点小打小闹为好。
2,消息面看贸易战进入持久战当下只能说是短暂靴子落地美公布关税豁免程序,"保持淡定,为什么说“贸易战”并不可怕?","2018年3月26日 - 既然靴子还没有落地,就不必过于惊慌,意向在落实...周边市场的需求推动下,我认为长期看是利好消息...美商务部长:惩罚关税不会引发贸易战,最终会以谈判...",http://www.baidu.com/link?url=SJ5MEHBKnkWGAdHs_6HCZgjvx1ou6ZVDxEA64Sp_Le5gLoXcyVzn7eL0GDzn4yMi7x5HxOmiMr0jdl_EOtEet_7TGJFxSYnbTIOjdJA81Y_, 霸越英百家号03-2619:31让我们先来回顾下当下热议的贸易战的由来——媒体错误描述贸易战经过3月8日,美国总统特朗普宣布,以损害国家安全为由……另外2017年出口对于GDP的贡献度只有18.5%,即使600亿美元产品征税导致出口额全部取消,对于**GDP的影响也只有4%*18.5%=0.74%。
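The output above is CSV with the header record_id,query,title,abstract,link,content, where fields containing commas are quoted. A minimal sketch of reading such a file back with Python's standard csv module might look like this. The function name `load_records` and the sample row are made up for illustration, and the name of the file the crawler writes is not stated here, so the example reads from an in-memory string instead.

```python
import csv
import io
from typing import Dict, List

# Column order taken from the header line of the sample output.
FIELDS = ["record_id", "query", "title", "abstract", "link", "content"]


def load_records(stream) -> List[Dict[str, str]]:
    """Read crawler output from a text stream into a list of dicts."""
    reader = csv.DictReader(stream)
    if reader.fieldnames != FIELDS:
        raise ValueError("unexpected header: %r" % (reader.fieldnames,))
    records = []
    for row in reader:
        # Missing fields come back as None; normalise to stripped strings.
        records.append({name: (row.get(name) or "").strip() for name in FIELDS})
    return records


# Made-up sample row (not real crawler output) with a quoted comma in the title.
SAMPLE = (
    "record_id,query,title,abstract,link,content\n"
    '1,example query,"Title, with comma",Short abstract,'
    "http://www.baidu.com/link?url=EXAMPLE, Body text \n"
)

if __name__ == "__main__":
    for record in load_records(io.StringIO(SAMPLE)):
        print(record["record_id"], record["title"], record["content"])
```

When reading from a real file, open it with `newline=""` and an explicit encoding such as UTF-8 so that quoted fields and Chinese text are parsed as written.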

Contributors

neo-luo