scrapy-zok's Introduction

Zok Component Usage Guide

by: [email protected] (for personal use)

Contents

  • repetition: content update handling
  • save: general-purpose persistent storage component
  • random_UA: random User-Agent
  • proxies: Abuyun proxy component

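MySQL storage

  1. Configure the username and password of the database to persist to in zok_config (required).
  2. In the crawler project's pipelines file, import and use the component: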
from zok.save.to_mysql import SaveToMysqlBase


class CityLandmarkListPipeline(SaveToMysqlBase):
    @staticmethod
    def get_sql(item):
        # Build the INSERT statement for one scraped item.
        # Values are interpolated as-is, so they must not contain double quotes.
        sql = """INSERT INTO base_city_landmark(city, county, landmark) VALUES ("{city}","{county}","{landmark}") """.format(
            city=item['city'],
            county=item['county'],
            landmark=item['landmark'],
        )
        return sql

'''The pipeline must define get_sql(item) and return the SQL statement from it.'''

Random User-Agent

# Add this to settings.py
DOWNLOADER_MIDDLEWARES = {
   'zok.random_UA.ua_random.RandomUserAgentMiddleware': 20,
}
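For orientation, a downloader middleware of this kind usually sets the User-Agent header inside Scrapy's process_request(request, spider) hook. The following is a minimal sketch of that idea and not the actual zok implementation: the class name RandomUserAgentSketch, the USER_AGENTS pool, and the FakeRequest stand-in are all hypothetical and exist only so the example runs without Scrapy installed.

```python
import random

# Hypothetical pool; the list used by zok itself is not shown in this README.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
]


class RandomUserAgentSketch:
    """Illustrative middleware: picks a random User-Agent for every request."""

    def __init__(self, user_agents):
        self.user_agents = list(user_agents)

    def process_request(self, request, spider):
        # Overwrite the header, then return None so Scrapy keeps processing.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        return None


class FakeRequest:
    """Hypothetical stand-in for scrapy.Request, holding only headers."""

    def __init__(self):
        self.headers = {}


request = FakeRequest()
result = RandomUserAgentSketch(USER_AGENTS).process_request(request, spider=None)
```

Returning None from process_request tells Scrapy to pass the request on to the next middleware and eventually to the downloader.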

Proxy IP settings

# Configure this in settings.py
DOWNLOADER_MIDDLEWARES = {
   'zok.proxies.proxies.ProxyMiddleware': 15,  # custom middleware
}
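If the random User-Agent and proxy middlewares are used together, they need to share a single DOWNLOADER_MIDDLEWARES dict, because a second assignment in settings.py replaces the first. A sketch of the combined setting, assuming the priorities shown in this README are kept:

```python
# settings.py (sketch): both zok middlewares enabled in one dict.
# Scrapy calls process_request in increasing priority order,
# so the proxy middleware (15) runs before the User-Agent one (20).
DOWNLOADER_MIDDLEWARES = {
    'zok.proxies.proxies.ProxyMiddleware': 15,
    'zok.random_UA.ua_random.RandomUserAgentMiddleware': 20,
}
```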

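Redis-based content deduplication and update

How it works: before an item is stored, its hash value is looked up in Redis and compared. If a value already exists, the item is skipped and not stored; if not, the hash is recorded with set(md5, id).

  1. Start the Redis service.
  2. Add the Redis settings to zok_config.
  3. Use the MySQL storage component; deduplicated incremental updates are then enabled automatically.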
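The check described above can be sketched roughly as follows. This is an illustration under assumptions, not the zok code: the helper names item_fingerprint and should_store are hypothetical, the fingerprint is assumed to be an MD5 over the item's JSON-serialized fields, and InMemoryStore is a stand-in that mimics only the get/set calls of a Redis client so the example runs without a server.

```python
import hashlib
import json


class InMemoryStore:
    """Hypothetical stand-in exposing the get/set subset of a Redis client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value
        return True


def item_fingerprint(item):
    """Return the MD5 hex digest of the item's fields in a stable key order."""
    payload = json.dumps(item, sort_keys=True, ensure_ascii=False)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def should_store(store, item, item_id):
    """Return True and record the fingerprint if the item was not seen before."""
    key = item_fingerprint(item)
    if store.get(key) is not None:
        return False  # fingerprint already recorded: skip storing
    store.set(key, item_id)  # corresponds to set(md5, id) in the description
    return True


store = InMemoryStore()
item = {"city": "Chengdu", "county": "Wuhou", "landmark": "Wuhou Shrine"}
first = should_store(store, item, 1)   # unseen item
second = should_store(store, item, 2)  # same content again
```

With a real Redis connection, an object such as redis.Redis() could be passed in place of InMemoryStore, since it also provides get and set.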

Contributors

  • wkunzhi
