librauee / reptile
🏀 Hands-on Python 3 web crawlers (some with detailed tutorials): Maoyan, Tencent Video, Douban, Yanzhao (postgraduate admissions), Weibo, Biquge novels, Baidu hot topics, Bilibili, CSDN, NetEase Cloud Reading, Alibaba Literature, Baidu Stocks, Toutiao, WeChat official accounts, NetEase Cloud Music, Lagou, Youdao, unsplash, Shixiseng, Autohome, League of Legends Box, Dianping, Lianjia, LPL schedule, typhoons, Fantasy Westward Journey and Onmyoji Cangbaoge, weather, Nowcoder, Baidu Wenku, bedtime stories, Zhihu, Wish
Why did the crawl succeed but no email was sent?
Regular expressions - bus information = proxy IP
I copied the typhoon history script by hand, and the same error occurs every time I run it
Could you help me find what went wrong?
-*- mode: compilation; default-directory: "~/spider/spider/spiders/" -*-
Compilation started at Thu Mar 4 14:19:50
python3 typhoon.py
Traceback (most recent call last):
  File "typhoon.py", line 114, in <module>
    tfcraw.get_tf_detail()
  File "typhoon.py", line 62, in get_tf_detail
    tf_list = self.get_tf_list()
  File "typhoon.py", line 44, in get_tf_list
    year_list = self.get_year()
  File "typhoon.py", line 34, in get_year
    years = r.json()
  File "/home/steiner/.local/lib/python3.6/site-packages/requests/models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3/dist-packages/simplejson/__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Compilation exited abnormally with code 1 at Thu Mar 4 14:19:51
Here is the code:
import random
import time

import requests
from pymongo import MongoClient


class Typhoon:
    def __init__(self):
        # Pool of User-Agent strings; one is picked at random per instance.
        self.user_agent = [
            "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
            "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
            "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0",
            "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.3; rv:11.0) like Gecko",
            "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
        ]
        self.base_url = 'http://www.wztf121.com/data/complex/{}.json'
        self.headers = {
            'Cookie': '_gscu_1378142123=65572018r5on4x80; _gscbrs_1378142123=1; vjuids=30469f88b.16c835d32ea.0.8062809782e9b; vjlast=1565572019.1565572019.30; Hm_lvt_e592d6befa4f9918e6496980d22c5649=1565572019; Wa_lvt_1=1565572019; Wa_lpvt_1=1565576034; _gscs_1378142123=65572018v2ofkf80|pv:8; Hm_lpvt_e592d6befa4f9918e6496980d22c5649=1565576061',
            'Host': 'www.wztf121.com',
            'Referer': 'http://www.wztf121.com/history.html',
            'User-Agent': random.choice(self.user_agent)
        }
        # Connects to a MongoDB server on localhost with default settings.
        self.client = MongoClient()
        self.db = self.client.typhoon

    def get_year(self):
        year_list = []
        years_url = self.base_url.format('years')
        r = requests.get(years_url, headers=self.headers)
        # This is the call that raises JSONDecodeError in the traceback above.
        years = r.json()
        for year in years:
            year_list.append(year['year'])
        print('Fetched the years of all typhoon records')
        return year_list

    def get_tf_list(self):
        tf_list = []
        year_list = self.get_year()
        for year in year_list:
            url = self.base_url.format(year)
            r = requests.get(url, headers=self.headers)
            tfs = r.json()
            for tf in tfs:
                tfbh = tf['tfbh']
                tf_list.append(tfbh)
            # Short random pause between yearly requests.
            time.sleep(random.random())
        print('Fetched all typhoon IDs (format: year + sequence number)')
        return tf_list

    def get_tf_detail(self):
        tf_list = self.get_tf_list()
        count = 1
        for tf in tf_list:
            tf_url = self.base_url.format(tf)
            r = requests.get(tf_url, headers=self.headers)
            tf_detail = r.json()
            begin_time = tf_detail[0]['begin_time']
            ename = tf_detail[0]['ename']
            end_time = tf_detail[0]['end_time']
            name = tf_detail[0]['name']
            points = tf_detail[0]['points']
            # One document per track point goes into the 'detail' collection.
            for point in points:
                latitude = point['latitude']
                longitude = point['longitude']
                power = point['power']
                speed = point['speed']
                pressure = point['pressure']
                strong = point['strong']
                real_time = point['time']
                detail = {
                    'name': name,
                    'ename': ename,
                    'latitude': latitude,
                    'longitude': longitude,
                    'power': power,
                    'speed': speed,
                    'pressure': pressure,
                    'strong': strong,
                    'time': real_time,
                }
                self.db['detail'].insert_one(detail)
            time.sleep(5 * random.random())
            # One summary document per typhoon goes into the 'info' collection.
            tf_info = {
                'name': name,
                'ename': ename,
                'begin_time': begin_time,
                'end_time': end_time,
            }
            self.db['info'].insert_one(tf_info)
            print('Stored detail record for typhoon #{}!'.format(count))
            count += 1


tfcraw = Typhoon()
tfcraw.get_tf_detail()
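The message "Expecting value: line 1 column 1 (char 0)" means the response body does not begin with a JSON value, which typically happens when the server sends an empty body or an HTML page instead of JSON. The traceback alone does not say which, so the sketch below (not the repository author's code) shows one way to inspect the response before decoding it. The helper name `decode_json_response` is hypothetical, and the two stand-in responses are made up so the example runs without contacting the site.

```python
import json
from types import SimpleNamespace


def decode_json_response(response):
    """Parse JSON from a requests-style response or raise a descriptive error.

    Hypothetical helper: `response` only needs `.status_code`, `.headers`
    and `.text`, which `requests.Response` objects provide.
    """
    content_type = response.headers.get("Content-Type", "")
    if response.status_code != 200:
        raise RuntimeError(
            f"HTTP {response.status_code} (Content-Type: {content_type!r})"
        )
    try:
        return json.loads(response.text)
    except json.JSONDecodeError as exc:
        # Show the start of the body so an HTML error page is easy to spot.
        preview = response.text[:80]
        raise RuntimeError(
            f"body is not JSON (Content-Type: {content_type!r}); "
            f"starts with {preview!r}"
        ) from exc


# Made-up stand-ins for real responses, so no network access is needed.
ok = SimpleNamespace(
    status_code=200,
    headers={"Content-Type": "application/json"},
    text='[{"year": 2019}]',
)
html = SimpleNamespace(
    status_code=200,
    headers={"Content-Type": "text/html"},
    text="<html>Not Found</html>",
)

print(decode_json_response(ok))
try:
    decode_json_response(html)
except RuntimeError as err:
    print(err)
```

In the script above, replacing `years = r.json()` with `years = decode_json_response(r)` would report the status code and the first characters of the body, which should show whether the URL still serves JSON.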