
zifangsky / weatherspider

363 stars · 29 watchers · 142 forks · 55.04 MB

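A weather crawler that automatically fetches and updates the weather for towns across China on a schedule and exposes a RESTful query API, bundled with a proxy IP pool that is refreshed on a schedule and checked for availability.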

Java 99.02% HTML 0.51% JavaScript 0.47%
java spring spider weather webmagic springboot restful-api

weatherspider's Introduction

WeatherSpider


June 2018 update notes:

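This project started as a demo I wrote two years ago. I wanted to get familiar with the frameworks commonly used in Java development, so I pulled in several frameworks the project did not really need; as a result its framework dependencies became quite tangled, and some of the code was needlessly verbose. Over these two years the project has still won some fans, and I have learned more about Java, so this June I refactored the code. The overall logic is unchanged; I only improved the architecture, reduced the technical dependencies, and cleaned up parts of the code.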

Special note:

Technical dependencies:

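  • Spring Boot: the project's base framework, providing the basic web service and the scheduled-task service
  • Druid: Alibaba's open-source JDBC connection pool
  • MyBatis: a persistence framework, used to access the MySQL database
  • WebMagic: a lightweight crawler framework, used to scrape the weather for each town and to scrape free proxy IPs (the scraped proxy IPs are not used by the project itself)
  • Spring Kafka: connects a Spring application to a Kafka cluster; the project uses it as a message queue to update each city's weather and to check whether scraped proxy IPs are still valid
  • ZooKeeper: a dependency required by the Spring Kafka environment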
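The Spring Kafka bullet mentions checking whether scraped proxy IPs are still valid. The project's own check is not reproduced here; the following is a minimal JDK-only sketch, assuming a proxy counts as alive when a GET request sent through it returns HTTP 200 within a timeout. The class name ProxyChecker, the probe URL and the timeout values are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

/**
 * Hypothetical availability check for a scraped HTTP proxy (not WeatherSpider code).
 * A proxy is treated as alive only if a GET sent through it returns HTTP 200.
 */
final class ProxyChecker {

    private ProxyChecker() {
    }

    static boolean isAlive(String ip, int port, String probeUrl, int timeoutMillis) {
        Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(ip, port));
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(probeUrl).openConnection(proxy);
            conn.setConnectTimeout(timeoutMillis);  // time allowed to reach the proxy
            conn.setReadTimeout(timeoutMillis);     // time allowed for the proxied response
            conn.setRequestMethod("GET");
            conn.setInstanceFollowRedirects(false); // a redirect is not counted as success
            return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
        } catch (IOException e) {
            // Malformed URL, connection refused, timeout or broken response: treat as dead
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

For example, ProxyChecker.isAlive("101.37.79.125", 3128, "http://www.example.com/", 5000) would probe the proxy from the sample response of /proxyIp/selectRandomIP. Because a dead proxy can cost a full timeout, a real checker would probably run many of these probes concurrently.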

Environment requirements:

  • JDK 8+
  • MySQL 5.7+
  • A Kafka cluster
Special note:

Project structure:

[Figure: project structure]

Running the project:

1. Download the source code:

Repository: https://github.com/zifangsky/WeatherSpider

PS: If you find this project interesting, a star would be much appreciated. Thanks!

2. Set up the Kafka cluster:

For the installation steps, refer to the link given above. Then create the two topics the project uses: topic-proxyIp and topic-weather.

3. Set up the database:

For the SQL files used, refer to the link given above.

4. Edit the project's configuration file:

Only the relevant settings in the project's application-dev.properties file need to be changed; the comments in that file explain what each setting means.

5. Run the project:

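Build the project with Maven, then run the generated jar (or download the jar directly from Releases). Once the project is running, it executes the weather update task, the proxy IP fetch task, and the proxy IP availability check task at the times set by the scheduled tasks configured earlier.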

RESTful endpoints exposed by the project:

The project exposes four RESTful endpoints:

i) Return a random available proxy IP:

http://localhost:7080/proxyIp/selectRandomIP

Sample response:

{
  "id": 676,
  "ip": "101.37.79.125",
  "port": 3128,
  "type": "HTTPS",
  "addr": null,
  "used": false,
  "other": null
}
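To use a proxy returned by this endpoint from plain Java, the record can be turned into a java.net.Proxy. This is a minimal sketch: ProxyIp is a hypothetical client-side class mirroring the JSON fields above, and mapping the JSON onto it (for example with Jackson) is left out.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.Locale;

/**
 * Hypothetical client-side model of one record from /proxyIp/selectRandomIP
 * (and of each element of /proxyIp/selectAll). Not a class from WeatherSpider.
 */
final class ProxyIp {
    final long id;
    final String ip;
    final int port;
    final String type;   // "HTTP" or "HTTPS" in the sample responses
    final String addr;   // location text, may be null
    final boolean used;

    ProxyIp(long id, String ip, int port, String type, String addr, boolean used) {
        this.id = id;
        this.ip = ip;
        this.port = port;
        this.type = type;
        this.addr = addr;
        this.used = used;
    }

    /**
     * Builds a java.net.Proxy for HttpURLConnection. java.net.Proxy only has
     * DIRECT, HTTP and SOCKS types, so both "HTTP" and "HTTPS" records map to
     * Proxy.Type.HTTP; HTTPS requests are tunnelled through it with CONNECT.
     */
    Proxy toJavaProxy() {
        String t = type == null ? "" : type.trim().toUpperCase(Locale.ROOT);
        if (!t.equals("HTTP") && !t.equals("HTTPS")) {
            throw new IllegalArgumentException("unsupported proxy type: " + type);
        }
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(ip, port));
    }
}
```

A caller could then open a connection with new URL(target).openConnection(record.toJavaProxy()). Rejecting unknown types, rather than guessing, keeps a SOCKS entry from silently being treated as an HTTP proxy.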

ii) Return all currently available proxy IPs:

http://localhost:7080/proxyIp/selectAll

Sample response:

[
  {
    "id": 667,
    "ip": "61.135.217.7",
    "port": 80,
    "type": "HTTP",
    "addr": "北京",
    "used": false,
    "other": null
  },
  {
    "id": 668,
    "ip": "122.114.31.177",
    "port": 808,
    "type": "HTTP",
    "addr": "河南郑州",
    "used": false,
    "other": null
  },
  {
    "id": 669,
    "ip": "221.228.17.172",
    "port": 8181,
    "type": "HTTPS",
    "addr": "江苏无锡",
    "used": false,
    "other": null
  },
  
  ...
  
]

iii) Return the weather for one town by its station code:

http://localhost:7080/weather/selectByStationCode/{stationCode}

Note: the station code can be looked up in the weather_station table of the database.

Sample response:

{
  "country": {
    "id": 1,
    "code": "101",
    "name": "**",
    "description": null
  },
  "province": {
    "id": 18,
    "code": "10106",
    "countryId": 1,
    "name": "吉林",
    "description": null
  },
  "city": {
    "id": 175,
    "code": "1010604",
    "provinceId": 18,
    "name": "四平",
    "description": null
  },
  "station": {
    "id": 214,
    "code": "101060404",
    "cityId": 175,
    "name": "公主岭",
    "description": null
  },
  "weather": {
    "id": 172,
    "stationId": 214,
    "hour": "[\"21日08时,d01,多云,21℃,西南风,<3级,2\",\"21日11时,d00,晴,24℃,西南风,<3级,1\",\"21日14时,d00,晴,24℃,西南风,<3级,1\",\"21日17时,d00,晴,26℃,西南风,<3级,1\",\"21日20时,n00,晴,23℃,西南风,<3级,0\",\"21日23时,n00,晴,19℃,西南风,<3级,0\",\"22日02时,n00,晴,19℃,西南风,<3级,0\",\"22日05时,n00,晴,19℃,西南风,3-4级,0\",\"22日08时,d00,晴,24℃,西南风,3-4级,1\"]",
    "today": "21日(今天),晴,27/19℃,西南风/西南风,<3级转3-4级",
    "nextday": "22日(明天),晴,30/22℃,西南风/西南风,4-5级转3-4级",
    "next2day": "23日(后天),小雨转晴,29/20℃,西南风/西南风,<3级",
    "next3day": "24日(周日),晴,29/18℃,西南风/西南风,<3级转4-5级",
    "next4day": "25日(周一),晴,28/17℃,西南风/西南风,4-5级转<3级",
    "next5day": "26日(周二),小雨,25/16℃,西南风/西南风,<3级",
    "next6day": "27日(周三),小雨转大雨,26/15℃,西南风/无持续风向,<3级",
    "t24situation": null,
    "other": null
  }
}
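The hour field is itself a JSON array encoded as a string, and each element packs several values into one comma-separated string; the daily fields (today, nextday, ...) follow a similar pattern. The sketch below parses them on the client side. The field meanings (time, icon code, condition, temperature, wind direction, wind force, and a trailing value of unknown meaning) are inferred from the sample response and are assumptions, as is the class name ForecastParser. The ℃ sign is written as a Unicode escape so the file compiles regardless of the compiler's default encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical client-side parser for the string-encoded forecast fields.
 * Field meanings are inferred from the sample responses, not from a spec.
 */
final class ForecastParser {
    private static final String CELSIUS = "\u2103"; // degree Celsius sign
    private static final Pattern HIGH_LOW = Pattern.compile("(-?\\d+)/(-?\\d+)" + CELSIUS);

    /** One 3-hourly entry, split on commas into seven fields. */
    static final class HourEntry {
        final String time;          // day of month plus hour
        final String iconCode;      // "d.." or "n..", presumably day/night icons
        final String condition;     // weather description, e.g. cloudy or sunny
        final int temperatureC;     // parsed from a value such as "21" + CELSIUS
        final String windDirection;
        final String windForce;     // force range such as below 3 or 3-4
        final String trailing;      // meaning unknown; kept as raw text

        HourEntry(String[] f) {
            time = f[0];
            iconCode = f[1];
            condition = f[2];
            temperatureC = Integer.parseInt(f[3].replace(CELSIUS, "").trim());
            windDirection = f[4];
            windForce = f[5];
            trailing = f[6];
        }
    }

    /** Parses the decoded hour value: a JSON array of plain strings without escaped quotes. */
    static List<HourEntry> parseHours(String hourField) {
        String body = hourField.trim();
        if (!body.startsWith("[") || !body.endsWith("]")) {
            throw new IllegalArgumentException("not a JSON array: " + hourField);
        }
        body = body.substring(1, body.length() - 1).trim();
        List<HourEntry> entries = new ArrayList<>();
        if (body.isEmpty()) {
            return entries;
        }
        // Split between array elements ("...","..."), then drop the remaining quotes
        for (String element : body.split("\"\\s*,\\s*\"")) {
            String[] fields = element.replace("\"", "").split(",");
            if (fields.length != 7) {
                throw new IllegalArgumentException("unexpected entry: " + element);
            }
            entries.add(new HourEntry(fields));
        }
        return entries;
    }

    /** Returns {high, low} from a daily summary whose temperature field looks like "27/19" + CELSIUS. */
    static int[] parseHighLow(String daily) {
        Matcher m = HIGH_LOW.matcher(daily);
        if (!m.find()) {
            throw new IllegalArgumentException("no temperature range in: " + daily);
        }
        return new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))};
    }
}
```

Applied to the today value above, parseHighLow returns {27, 19}. Splitting on the quote-comma-quote boundary works only because the elements contain commas but no quotes; if the format ever allowed escaped quotes, a real JSON parser would be the safer choice.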

iv) Fuzzy-search by town name and return the weather for every matching town:

http://localhost:7080/weather/selectByStationName/{stationName}

Note: stationName can be the Chinese name of a city or district, e.g. 朝阳 (Chaoyang).

Sample response:

[
  {
    "country": {
      "id": 1,
      "code": "101",
      "name": "**",
      "description": null
    },
    "province": {
      "id": 16,
      "code": "10107",
      "countryId": 1,
      "name": "辽宁",
      "description": null
    },
    "city": {
      "id": 144,
      "code": "1010712",
      "provinceId": 16,
      "name": "朝阳",
      "description": null
    },
    "station": {
      "id": 13,
      "code": "101071201",
      "cityId": 144,
      "name": "朝阳",
      "description": null
    },
    "weather": {
      "id": 105,
      "stationId": 13,
      "hour": "[\"21日08时,d00,晴,24℃,西北风,<3级,2\",\"21日11时,d00,晴,32℃,西北风,<3级,2\",\"21日14时,d00,晴,33℃,西北风,<3级,2\",\"21日17时,d00,晴,33℃,西北风,<3级,2\",\"21日20时,n00,晴,26℃,西北风,3-4级,0\",\"21日23时,n00,晴,22℃,西南风,3-4级,0\",\"22日02时,n00,晴,21℃,西南风,3-4级,0\",\"22日05时,n00,晴,20℃,西南风,<3级,0\",\"22日08时,d00,晴,26℃,西南风,3-4级,2\"]",
      "today": "21日(今天),晴,34/20℃,西北风/西南风,3-4级",
      "nextday": "22日(明天),多云,34/22℃,西南风/西南风,4-5级",
      "next2day": "23日(后天),多云,35/21℃,西南风/西北风,4-5级转3-4级",
      "next3day": "24日(周日),多云转雷阵雨,34/21℃,东南风/西南风,4-5级转3-4级",
      "next4day": "25日(周一),多云转小雨,33/20℃,西南风/西南风,4-5级转3-4级",
      "next5day": "26日(周二),多云转晴,31/20℃,西南风/西南风,3-4级",
      "next6day": "27日(周三),晴转小雨,33/19℃,西南风/无持续风向,3-4级转<3级",
      "t24situation": null,
      "other": null
    }
  },
  {
    "country": {
      "id": 1,
      "code": "101",
      "name": "**",
      "description": null
    },
    "province": {
      "id": 24,
      "code": "10101",
      "countryId": 1,
      "name": "北京",
      "description": null
    },
    "city": {
      "id": 253,
      "code": "1010100",
      "provinceId": 24,
      "name": "北京",
      "description": null
    },
    "station": {
      "id": 803,
      "code": "101010300",
      "cityId": 253,
      "name": "朝阳",
      "description": null
    },
    "weather": {
      "id": 416,
      "stationId": 803,
      "hour": "[\"21日08时,d01,多云,25℃,东南风,<3级,1\",\"21日11时,d01,多云,33℃,东南风,<3级,1\",\"21日14时,d01,多云,34℃,东南风,<3级,3\",\"21日17时,d01,多云,33℃,东南风,<3级,3\",\"21日20时,n01,多云,32℃,东南风,<3级,0\",\"21日23时,n01,多云,24℃,东南风,<3级,0\",\"22日02时,n01,多云,24℃,东南风,<3级,0\",\"22日05时,n01,多云,23℃,东南风,<3级,0\",\"22日08时,d01,多云,23℃,东南风,<3级,1\"]",
      "today": "21日(今天),多云,35/23℃,东南风/东南风,<3级",
      "nextday": "22日(明天),雷阵雨转多云,32/21℃,南风/北风,<3级",
      "next2day": "23日(后天),多云转晴,35/24℃,南风/南风,<3级",
      "next3day": "24日(周日),多云,33/23℃,东南风/东风,<3级",
      "next4day": "25日(周一),阴转雷阵雨,34/24℃,北风/北风,<3级",
      "next5day": "26日(周二),多云转晴,34/22℃,西南风/西南风,<3级",
      "next6day": "27日(周三),多云,36/24℃,西南风/西南风,3-4级转<3级",
      "t24situation": null,
      "other": null
    }
  }
]
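Both weather endpoints take the station code or name as a path segment, so a Chinese name such as 朝阳 has to be percent-encoded as UTF-8 before it goes into the URL. The following is a minimal JDK-only client sketch, assuming the server decodes path segments as UTF-8; the class name WeatherClient and the timeout values are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical minimal client for the two weather endpoints (not part of WeatherSpider). */
final class WeatherClient {
    private final String baseUrl;

    WeatherClient(String baseUrl) {
        // Drop a trailing slash so endpoint paths can be appended uniformly
        this.baseUrl = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
    }

    /** Percent-encodes one path segment as UTF-8; URLEncoder is form encoding, so '+' becomes %20. */
    static String encodeSegment(String segment) {
        try {
            return URLEncoder.encode(segment, StandardCharsets.UTF_8.name()).replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    String byStationCodeUrl(String stationCode) {
        return baseUrl + "/weather/selectByStationCode/" + encodeSegment(stationCode);
    }

    String byStationNameUrl(String stationName) {
        return baseUrl + "/weather/selectByStationName/" + encodeSegment(stationName);
    }

    /** Performs a GET and returns the raw JSON body decoded as UTF-8. */
    static String get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(10000);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + status + " from " + url);
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

For example, new WeatherClient("http://localhost:7080").byStationNameUrl("朝阳") produces http://localhost:7080/weather/selectByStationName/%E6%9C%9D%E9%98%B3, and passing that URL to WeatherClient.get would return a JSON array like the one shown above when the project is running locally. URLEncoder encodes a literal "+" as %2B first, so rewriting the remaining "+" characters as %20 only affects spaces.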

Development and refactoring notes

  1. Original design notes: https://www.zifangsky.cn/901.html
  2. Notes on this refactoring: https://www.zifangsky.cn/1271.html

weatherspider's People

Contributors

zifangsky


weatherspider's Issues

DruidDataSource.java:688: the connection pool cannot connect to the MySQL database. What is wrong? (I have already changed the username and password.)

18-01-22 00:36:29,373 ERROR com.alibaba.druid.pool.DruidDataSource(DruidDataSource.java:688) ## init datasource error, url: jdbc:mysql://localhost:3306/weatherspider?autoReconnect=true&useUnicode=true&characterEncoding=utf-8&failOverReadOnly=false&useSSL=false
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Could not create connection to database server. Attempted reconnect 3 times. Giving up.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.Util.getInstance(Util.java:386)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1015)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:989)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:975)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:920)
at com.mysql.jdbc.ConnectionImpl.connectWithRetries(ConnectionImpl.java:2388)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2309)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:834)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:416)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:346)
at com.alibaba.druid.filter.FilterChainImpl.connection_connect(FilterChainImpl.java:148)
at com.alibaba.druid.filter.FilterAdapter.connection_connect(FilterAdapter.java:785)
at com.alibaba.druid.filter.FilterChainImpl.connection_connect(FilterChainImpl.java:142)
at com.alibaba.druid.filter.stat.StatFilter.connection_connect(StatFilter.java:211)
at com.alibaba.druid.filter.FilterChainImpl.connection_connect(FilterChainImpl.java:142)
at com.alibaba.druid.pool.DruidAbstractDataSource.createPhysicalConnection(DruidAbstractDataSource.java:1423)
at com.alibaba.druid.pool.DruidAbstractDataSource.createPhysicalConnection(DruidAbstractDataSource.java:1477)
at com.alibaba.druid.pool.DruidDataSource.init(DruidDataSource.java:677)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:988)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:984)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:103)
at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111)
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77)
at org.springframework.jdbc.support.JdbcUtils.extractDatabaseMetaData(JdbcUtils.java:289)
at org.springframework.jdbc.support.JdbcUtils.extractDatabaseMetaData(JdbcUtils.java:329)
at org.springframework.scheduling.quartz.LocalDataSourceJobStore.initialize(LocalDataSourceJobStore.java:150)
at org.quartz.impl.StdSchedulerFactory.instantiate(StdSchedulerFactory.java:1321)
at org.quartz.impl.StdSchedulerFactory.getScheduler(StdSchedulerFactory.java:1525)
at org.springframework.scheduling.quartz.SchedulerFactoryBean.createScheduler(SchedulerFactoryBean.java:597)
at org.springframework.scheduling.quartz.SchedulerFactoryBean.afterPropertiesSet(SchedulerFactoryBean.java:480)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1637)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1574)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:545)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482)
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:753)
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:838)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:537)
at org.springframework.web.context.ContextLoader.configureAndRefreshWebApplicationContext(ContextLoader.java:446)
at org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:328)
at org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:107)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4745)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5207)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1419)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1409)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.sql.SQLException: java.lang.ClassCastException: java.math.BigInteger cannot be cast to java.lang.Long
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1078)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:989)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:975)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:920)
at com.mysql.jdbc.ConnectionImpl.buildCollationMapping(ConnectionImpl.java:1074)
at com.mysql.jdbc.ConnectionImpl.initializePropsFromServer(ConnectionImpl.java:3593)
at com.mysql.jdbc.ConnectionImpl.connectWithRetries(ConnectionImpl.java:2351)
... 53 more

