
wistbean / learn_python3_spider


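A Python crawler tutorial series: learn Python web scraping from zero to one. It covers packet capture in the browser and in mobile apps (e.g. fiddler, mitmproxy); the modules crawlers commonly rely on, such as requests, beautifulSoup, selenium, appium and scrapy; IP proxies; CAPTCHA recognition; using MySQL and MongoDB from Python; multithreaded and multiprocess crawlers; reversing CSS-based anti-scraping obfuscation; JS reverse engineering; distributed crawlers; and hands-on crawler project examples.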

Home Page: http://fxxkpython.com

License: MIT License

Python 95.62% Shell 0.02% C 3.02% XSLT 0.66% Roff 0.06% GAP 0.08% Cython 0.54%
python-script python-spider python3

learn_python3_spider's Introduction

小帅b AKA wistbean

😘 Immortal Shuaib, boundless in power 😘

Languages and Tools:

android flask git html5 java javascript linux mysql nginx python pytorch redis spring tensorflow vuejs


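About me

  • My name is wistbean. An English teacher gave me this name a long time ago: my Chinese name contains the character 「彬」 (bin), "wist" suggests wit and intelligence, and "bean" sounds like 彬, so you get the idea~
  • Of course, some people also call me bean哥, 彬哥, 小帅b... whichever you like.
  • I love freedom, writing, coding, sharing, and girls, and I like the kind of showing off that you earn through your own effort.
  • My creed: if I promise myself something, I get it done!
  • This is my blog, where I am going to do something that is all about perseverance.
  • When I commit to something, I take it seriously and stay focused.
  • I'm handsome. How handsome? I think the cracked mirror has already given me the answer.
  • I aspire to be a digital nomad and I'm curious about the world.
  • Currently single.
  • Follow my WeChat official account 「学习 Python 的正确姿势」 and send 「帅书」 to get a fun e-book I wrote.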

learn_python3_spider's People

Contributors

eunknight, fuercaisi, huangwb8, lovevantt, naelsondouglas, ttzc, wistbean

learn_python3_spider's Issues

TypeError: object of type 'NoneType' has no len()

In the Douban Top 250 example, running the file fails with TypeError: object of type 'NoneType' has no len().
The failing statement is soup = BeautifulSoup(html, 'lxml')
How can I fix this, and what is the cause?

What's going on?

list = soup.find(class_='grid_view').findall('li')
TypeError: 'NoneType' object is not callable

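The article has been deleted

The article "python爬虫系列教程14 | 害羞,用多线程秒爬那些万恶的妹纸们,纸巾呢?" (Python crawler tutorial 14, the multithreading one) is gone. Could the author put it back? /滑稽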

Always getting 418

Hello, following your code I can call requests.get, but response.status_code == 418.

How should I change it? Please advise.

Problem with the data

The 2011 Hebei gaokao (college entrance exam) score line is wrong; please double-check it.

Can't figure out the cause, please help

File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\bs4\__init__.py", line 287, in __init__
elif len(markup) <= 256 and (
TypeError: object of type 'NoneType' has no len()

clone failed

19052225@68A036519052225 MINGW64 /d/sipder
$ git clone https://github.com/wistbean/learn_python3_spider
Cloning into 'learn_python3_spider'...
remote: Enumerating objects: 8, done.
remote: Counting objects: 100% (8/8), done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 4546 (delta 1), reused 0 (delta 0), pack-reused 4538
Receiving objects: 100% (4546/4546), 21.89 MiB | 5.62 MiB/s, done.
Resolving deltas: 100% (310/310), done.
error: invalid path '全国高考历年录取分数据/
2006-2017上海高考录取分数线(汇总)
.html'
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'

.xls

Bro, for the book.save() call in the second crawler program, shouldn't the file format be .xls?

The tutorial is a bit outdated

For the Android crawler later on, uiautoformatter no longer works on version 9; you have to use appium desktop instead.

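Crawler 14: ThreadPoolExecutor is used slightly wrong

The ThreadPoolExecutor usage in crawler tutorial 14 has a small mistake:

pool.submit(moyu_time('xiaoshuaib'+str(i),1,3))

should be

pool.submit(moyu_time,'xiaoshuaib'+str(i),1,3)

Otherwise it is not multithreaded at all: moyu_time is called in the submitting thread, and only its return value is handed to the pool.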
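
The difference can be tried out with a small sketch like the one below. The body of `moyu_time` here is a hypothetical stand-in (the tutorial's own function is not shown in this issue), and `run_pool` with its parameters is made up for the illustration. Each task reports which thread ran it, so it is easy to check that the work really happens on pool threads.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def moyu_time(name, delay, counter):
    """Hypothetical stand-in worker: sleep `delay` seconds, `counter` times."""
    for _ in range(counter):
        time.sleep(delay)
    # Report which thread actually executed this call.
    return name, threading.current_thread().name


def run_pool(workers=4, tasks=4, delay=0.01, counter=2):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Pass the callable and its arguments separately, so the pool
        # invokes moyu_time on one of its worker threads.
        futures = [
            pool.submit(moyu_time, 'xiaoshuaib' + str(i), delay, counter)
            for i in range(tasks)
        ]
        # Collect results in submission order.
        return [future.result() for future in futures]


if __name__ == '__main__':
    for name, thread_name in run_pool():
        print(name, 'ran on', thread_name)
```

With `pool.submit(moyu_time(...))` the sleep would instead happen in the main thread before `submit` is even called, one task after another.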

(please help) having problem in my code

Description of the Bug

1. Hi developers, I'm new to coding. I'm trying to build an app that uploads videos to my Firebase storage, and I'm stuck on this problem: when the Upload button is pressed the file chooser opens, and if I select a file it is uploaded to Firebase. But if I cancel and close the file chooser, it raises an error.
2. I don't know whether this file chooser works on Android or not. If it doesn't, please help me with that code too. I've been stuck on this since 6 January, that's how bad I am at this.

Code

from kivy.app import App
from kivy.lang import Builder

kv = '''
<MainScreen>:
    name: 'mainscreen'
    MDLabel:
        id:username_info
        text:'Hello Main'
        font_style:'H1'
        halign:'center'

    MDFloatLayout:
        id:floate
        Video:
            id:vid

        MDToolbar:
            title: 'Bottom navigation'
            md_bg_color: .2, .2, .2, 1
            specific_text_color: 1, 1, 1, 1
        MDBottomNavigation:
            panel_color: 1,1,1,1
            MDBottomNavigationItem:
                name: 'screen 1'
                text: 'Home'
                icon: 'home-outline'
                MDRaisedButton:
                    id:upload
                    text:'Upload'
                    pos_hint:{'center_x':.5, 'center_y':.4}
                    on_release:
                        app.file_chooser()
                        upload.disabled=False
'''


class goodApp(MDApp):

    def build(self):
        self.strng = Builder.load_string(help_str)
        self.url  = "link.json"
        return self.strng

    def file_chooser(self):
        filechooser.open_file(on_selection=self.selected)        
    def selected(self,selection):
        config={
              'Api keys of firebase'
        }

        firebase=pyrebase.initialize_app(config)
        storage=firebase.storage()            
        Directory=selection[0]
        Name=re.findall('[ \w-]+\..*',Directory)
        loginEmail = self.strng.get_screen('loginscreen').ids.login_email.text
        storage.child(str(f"{loginEmail}")).child(str(f"{Name}")).put(str(f"{Name[0]}"))

        if selection==True:
            self.root.ids.vid.source=firebase
            self.strng.get_screen('mainscreen').ids.upload.disabled=True
            self.strng.get_screen('mainscreen').manager.current ='uploadscreen'
            self.strng.get_screen('mainscreen').manager.transition.direction='left'            

        if selection==False:
            self.strng.get_screen('mainscreen').ids.upload.disabled=False



if __name__ == '__main__':
    goodApp().run()

Logs

[INFO ] [Logger ] Record log in C:\Users\Dheeraj\.kivy\logs\kivy_22-01-19_1.txt
[INFO ] [deps ] Successfully imported "kivy_deps.angle" 0.3.0
[INFO ] [deps ] Successfully imported "kivy_deps.glew" 0.3.0
[INFO ] [deps ] Successfully imported "kivy_deps.sdl2" 0.3.1
[INFO ] [Kivy ] v2.0.0
[INFO ] [Kivy ] Installed at "C:\Users\Dheeraj\AppData\Roaming\Python\Python38\site-packages\kivy\__init__.py"
[INFO ] [Python ] v3.8.10 (tags/v3.8.10:3d8993a, May 3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)]
[INFO ] [Python ] Interpreter at "C:\Users\Dheeraj\AppData\Local\Programs\Python\Python38\pythonw.exe"
[INFO ] [Factory ] 186 symbols loaded
[INFO ] [KivyMD ] 0.104.2, git-bc7d1f5, 2021-06-06 (installed at "C:\Users\Dheeraj\AppData\Local\Programs\Python\Python38\lib\site-packages\kivymd\__init__.py")
[INFO ] [Image ] Providers: img_tex, img_dds, img_sdl2, img_pil (img_ffpyplayer ignored)
[INFO ] [Text ] Provider: sdl2
[INFO ] [Window ] Provider: sdl2
[INFO ] [GL ] Using the "OpenGL" graphics system
[INFO ] [GL ] GLEW initialization succeeded
[INFO ] [GL ] Backend used
[INFO ] [GL ] OpenGL version <b'2.1 Mesa 10.0.2 (git-675cd84)'>
[INFO ] [GL ] OpenGL vendor <b'VMware, Inc.'>
[INFO ] [GL ] OpenGL renderer <b'Gallium 0.4 on llvmpipe (LLVM 3.4, 128 bits)'>
[INFO ] [GL ] OpenGL parsed version: 2, 1
[INFO ] [GL ] Shading version <b'1.30'>
[INFO ] [GL ] Texture max size <8192>
[INFO ] [GL ] Texture max units <16>
[INFO ] [Window ] auto add sdl2 input provider
[INFO ] [Window ] virtual keyboard not allowed, single mode, not docked
[INFO ] [KivMob ] init called.
[WARNING] [KivMob ] Ads will not be shown.
[INFO ] [GL ] NPOT texture support is available
[INFO ] [Video ] Provider: null(['video_ffmpeg', 'video_ffpyplayer'] ignored)
[INFO ] [Base ] Start application main loop
Traceback (most recent call last):
File "C:\Users\Dheeraj\AppData\Roaming\Python\Python38\site-packages\plyer\platforms\win\filechooser.py", line 102, in run
self.fname, _, _ = win32gui.GetOpenFileNameW(**args)
pywintypes.error: (0, 'GetOpenFileNameW', 'No error message is available')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\Dheeraj\AppData\Roaming\Python\Python38\site-packages\plyer\platforms\win\filechooser.py", line 108, in run
self._handle_selection(self.selection)
File "C:\Users\Dheeraj\Desktop\kivy codes\Dheeraj.py", line 949, in selected
Directory=selection[0]
IndexError: list index out of range
[INFO ] [Base ] Leaving application in progress...

Versions

  • OS: win 7 pro
  • Python: 3.7
  • Kivy: latest
  • KivyMD: latest (update everyweek)

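Error when scraping official-account articles

Hello author. I followed your code and changed the parameters, but running it in VS Code gives an error. What went wrong?

Traceback (most recent call last):
File "c:\Users\安哥拉\Desktop\vscode_pytest.py", line 78, in <module>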
get_list_data(0)
File "c:\Users\安哥拉\Desktop\vscode_pytest.py", line 54, in get_list_data
can_msg_continue = data['can_msg_continue']
KeyError: 'can_msg_continue'

The second crawler program raises an error

The error is:
Traceback (most recent call last):
File "D:/coding/Python/PyCharm/test1/test2.py", line 127, in <module>
main(i)
File "D:/coding/Python/PyCharm/test1/test2.py", line 119, in main
soup = BeautifulSoup(html, 'lxml')
File "C:\Programs\Python\Python38-32\lib\site-packages\bs4\__init__.py", line 287, in __init__
elif len(markup) <= 256 and (
TypeError: object of type 'NoneType' has no len()

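Official-account crawler: fiddler captures no relevant traffic when scrolling on the phone

Many thanks for sharing the crawler code.
As the screenshots show, I believe my phone and computer are on the same LAN:
image
image
I scrolled through the WeChat official account in question from top to bottom twice, and the result is shown below:
image
Could you help me see what the cause is?
Thank you very much!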

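Firewall circumvention and IP proxies

If I use an IP from the IP pool you mentioned and also run a local v2ray circumvention proxy, which proxy does the traffic end up going through? If an IP gets banned, is it the pool IP or the v2ray IP? Thanks!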

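The WeChat example in the JSON tutorial can no longer be reproduced

The WeChat example in the JSON tutorial (Python crawler 12) no longer works: logging in to the web version of WeChat now reports that, for security reasons, the account can no longer sign in through the web version.
Could the author consider switching to a different example? Or is there perhaps a workaround for this problem?
Thanks to the author! (`・ω・´)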

Dangdang Top 500

Why does Python keep running without producing any result?

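Article links are dead

Most of the links show 【账号已经迁移】 (account has been migrated) or 【文章已被发布者删除】 (article deleted by the publisher). Could someone fix them?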

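Multiprocessing + Douban Top 250: has anyone else ended up with an empty xls file?

image
image
The first file was fetched without multiprocessing and opens normally.
The second one used multiprocessing, and the file has no content. Looking at the printed lists, they are also out of order (although some near the end are in order), and the value n does not increment properly either: it is out of order, only reaches 100, and contains duplicates.
Has anyone else run into this? Much appreciated!!!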

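Solution to the error in the second example (Douban movies)

import requests
from fake_useragent import UserAgent

ua = UserAgent()

def request_douban(url):
    try:
        # Send a browser-like User-Agent header with the request
        headers = {'User-Agent': ua.chrome}
        response = requests.get(url, headers=headers)
        # the rest of the function, including the except clause, is unchanged

Just pass the request headers with headers=headers.
Add that part and leave the rest of the code as it is.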

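Font encryption

Map the glyph outlines onto a coordinate system, then compare the font file with the parsed template file by the cosine similarity of their coordinates.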
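
A minimal sketch of that idea, under loud assumptions: the glyph outlines have already been extracted from the font files as lists of (x, y) points, and the glyphs being compared have the same number of points. The helper names and the toy coordinates below are hypothetical, not taken from the tutorial.

```python
import math


def flatten(points):
    """Turn [(x0, y0), (x1, y1), ...] into [x0, y0, x1, y1, ...]."""
    return [value for point in points for value in point]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same length')
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(glyph, templates):
    """Return the label of the template outline most similar to `glyph`."""
    vector = flatten(glyph)
    return max(
        templates,
        key=lambda label: cosine_similarity(vector, flatten(templates[label])),
    )


# Toy template outlines (hypothetical coordinates), keyed by the real character.
TEMPLATES = {
    '1': [(0, 0), (10, 0), (10, 100), (0, 100)],
    '7': [(0, 100), (80, 100), (30, 0), (20, 0)],
}

if __name__ == '__main__':
    # A glyph from a freshly downloaded font, slightly perturbed.
    unknown = [(0, 1), (11, 0), (10, 99), (1, 100)]
    print(best_match(unknown, TEMPLATES))
```

Because the site can jitter the coordinates on every download, picking the highest similarity instead of requiring an exact match is what makes the comparison tolerant of small changes.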

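After locating the search input box, call .click() first and then .send_keys()

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

# `browser` (the WebDriver) and `WAIT` (a WebDriverWait bound to it) are assumed to be defined earlier in the script.
def search():
    try:
        print('start visit bilibili...')
        browser.get('https://www.bilibili.com/')

        search_input = WAIT.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#nav-searchform > div.nav-search-content > input")))
        # Click the input box first, then type the query
        search_input.click()
        search_input.send_keys('蔡徐坤篮球')
        search_submit = WAIT.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="nav-searchform"]/div[2]')))
        search_submit.click()
        print('jump to new window')
        # The results open in a new window; switch to it
        all_h = browser.window_handles
        browser.switch_to.window(all_h[1])
    except TimeoutException:
        # Retry from the start if an element did not become clickable in time
        return search()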

Originally posted by @ls-6414 in #6 (comment)

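Scraping 200,000 memes

It cannot scrape the memes from all pages: the program stops after downloading a few hundred. I am using the blogger's original code, with pages 1-200 as the range. Request headers have already been added.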

1

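Bro, I'm trying to capture a mobile app's traffic in fiddler. I followed the tutorial, but the phone can't get online. The certificate has been downloaded and the firewall is turned off. How do I solve this?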

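The blogger is clearly an "old driver"

I was reading the article just fine and opened a few of the sites, you all know the kind. For the sake of safety and reputation, I'd suggest the blogger not post that sort of thing in public.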

There's an error

Exception raised: TypeError
object of type 'NoneType' has no len()
File "C:\Users\tkomg\py\doubanTop250.py", line 40, in get_page_urls
soup = BeautifulSoup(html, 'lxml')
File "C:\Users\tkomg\py\doubanTop250.py", line 91, in <module>
list_page_urls = get_page_urls()

There's an error

error

Has anyone run into the error below? Please point me in the right direction:
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='127.0.0.1', port=443): Max retries exceeded with url: /?cdn=nohost (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f3f46c5bba8>: Failed to establish a new connection: [Errno 111] Connection refused',))
