
boris-code / feaplat


A crawler management system with cluster support and elastic scaling. It can run feapder, scrapy, selenium, playwright, and other frameworks and scripts.

Home Page: https://feapder.com/#/feapder_platform/feaplat

spider crawler feapder feaplat

feaplat's Introduction

Crawler Management System - FEAPLAT

Born a crawler, but more than a crawler

The name feaplat is a blend of feapder and platform.

Pronunciation: [ˈfiːplæt]

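Features

  1. Supports any python script, including but not limited to feapder and scrapy
  2. Supports browser rendering, including headed mode. Supported browser drivers: playwright and selenium
  3. Supports deploying services, with automatic load balancing
  4. Supports server cluster management
  5. Supports monitoring, with customizable metrics
  6. Supports launching multiple instances, e.g. for distributed crawlers
  7. Supports elastic scaling
  8. Supports four scheduled start modes
  9. Supports custom worker images, e.g. a custom Java runtime or a machine-learning environment, so the image can be tailored to your needs (feaplat consists of a master, which schedules, and workers, which run the tasks)
  10. One-command deployment with docker, running on a docker swarm cluster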

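Why use the feaplat crawler management system

Crawler management systems on the market

Worker nodes are long-lived and each runs multiple tasks. They cannot scale elastically, tasks can interfere with one another, and stability is not guaranteed.

The feaplat crawler management system

Worker nodes are created dynamically per task. Each worker runs exactly one task instance and is destroyed when the task finishes, which keeps the system stable. Work is balanced automatically across servers, and the cluster scales elastically.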
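The one-worker-per-task-instance model can be illustrated with a toy simulation. This is only a sketch: the `Server` class, the function names, and the "least-loaded server" rule are assumptions made for illustration, and real placement in feaplat is presumably left to Docker Swarm rather than to code like this.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    running: set = field(default_factory=set)  # ids of task instances with a live worker


def start_worker(servers, task_instance_id):
    """Create a worker for one task instance on the least-loaded server."""
    target = min(servers, key=lambda s: len(s.running))
    target.running.add(task_instance_id)
    return target


def finish_worker(servers, task_instance_id):
    """Destroy the worker once its task instance is done."""
    for server in servers:
        server.running.discard(task_instance_id)


servers = [Server("node-1"), Server("node-2")]
placements = [start_worker(servers, i).name for i in range(4)]
finish_worker(servers, 0)  # the worker for instance 0 goes away, freeing node-1
```

Because a worker never outlives its task instance, a crashed or leaky task cannot affect the tasks that follow it on the same server.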

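Feature overview

1. Project management

Add/edit a project

2. Task management

3. Task instances

Logs

4. Crawler monitoring

feaplat can monitor how feapder crawlers are running. Besides data monitoring and request monitoring, users can define their own monitored metrics; see the custom monitoring documentation for details.

scrapy crawlers and other python scripts can also use monitoring through the same custom monitoring feature; see the custom monitoring documentation for details.

Note: requires feapder>=1.6.6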

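Deployment

The deployment below uses CentOS as the example. For installing docker on other platforms, see the official docker documentation: https://docs.docker.com/compose/install/

1. Install docker

Remove old versions (optional; run this when you need to reinstall or upgrade):

yum remove docker docker-common docker-selinux docker-engine

Install: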

yum install -y yum-utils device-mapper-persistent-data lvm2 && python2 /usr/bin/yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo && yum install docker-ce -y

Recommended for users in mainland China:

yum install -y yum-utils device-mapper-persistent-data lvm2 && python2 /usr/bin/yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo && yum install docker-ce -y

Start:

systemctl enable docker
systemctl start docker

2. Initialize docker swarm

docker swarm init

# If your Docker host has multiple network interfaces and therefore multiple IPs, you must specify the IP with --advertise-addr
docker swarm init --advertise-addr 192.168.99.100

3. Install docker-compose

sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

Recommended for users in mainland China:

sudo curl -L "https://get.daocloud.io/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

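4. Deploy the feaplat crawler management system

Prerequisites

Install git (version 1.8.3 is sufficient)

yum -y install git

1. Download the project

For now, clone and run the develop branch using the commands below. The master branch does not support urllib3>=2.0 and no longer runs on a fresh install, although existing deployments are unaffected. Once compatibility has been tested and shown not to affect existing users, develop will be merged into master.

GitHub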

git clone -b develop https://github.com/Boris-code/feaplat.git

Gitee

git clone -b develop https://gitee.com/Boris-code/feaplat.git

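2. Run

The first run has to pull images, which takes a while, and it may fail with an error. Running the command again usually resolves it.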

cd feaplat
docker-compose up -d
  • If a port conflict occurs, change the port in the .env file; see the FAQ.
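Since the first `docker-compose up -d` may fail while images are still being pulled, the "run it again" advice can be automated. The following is a minimal sketch, not part of feaplat: `run_with_retry`, its defaults, and the injectable `runner` parameter are hypothetical names chosen for illustration.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=3, delay=5.0, runner=subprocess.run):
    """Run cmd until it exits with status 0 or the attempts run out.

    Returns True on success, False if every attempt failed.
    `runner` is injectable so the function can be exercised without docker.
    """
    for attempt in range(1, attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return True
        if attempt < attempts:
            time.sleep(delay)  # give the image pull a moment before retrying
    return False
```

From inside the feaplat directory it could be called as `run_with_retry(["docker-compose", "up", "-d"])`. If it still returns False after a few attempts, the cause is probably something other than the first-run pull, such as the port conflict mentioned above.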

3. Access the crawler management system

Default address: http://localhost
Default credentials: admin / admin

4. Stop (optional)

docker-compose stop

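5. Add servers (optional)

Used to build a cluster by adding crawler (worker) node servers.

1. Install docker

See deployment step 1.

2. Deploy

On the master server (the one running the feaplat crawler management system), run the following command to view the token: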

docker swarm join-token worker

On the server being added, run:

docker swarm join --token [token] [ip]

This command joins the server to the cluster as a node.
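`docker swarm join-token worker` prints a ready-made join command, and the address in it normally includes the manager port (for example `192.168.99.100:2377`). A small sketch of extracting the two placeholders from that output follows; the function name and the sample token are made up for illustration.

```python
import re

# Matches the join command that `docker swarm join-token worker` prints.
JOIN_RE = re.compile(r"docker swarm join --token (\S+) (\S+)")


def parse_join_command(output):
    """Extract (token, manager_address) from the join-token output."""
    match = JOIN_RE.search(output)
    if match is None:
        raise ValueError("no 'docker swarm join' command found in output")
    return match.group(1), match.group(2)


# Sample text shaped like the command's output; the token is not a real one.
sample = """To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-example 192.168.99.100:2377
"""
token, address = parse_join_command(sample)
```

Here `token` fills the `[token]` placeholder and `address` fills `[ip]` in the join command above.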

3. Verify

On the master server (the one running the feaplat crawler management system), run:

docker node ls

If the output includes the newly joined server, it was added successfully.
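The check can also be scripted. The sketch below assumes the node list was produced with `docker node ls --format "{{.Hostname}} {{.Status}}"`, which prints one "hostname status" pair per line; the function name and the sample hostnames are hypothetical.

```python
def node_joined(node_ls_output, hostname):
    """Return True if hostname appears in the node list with status Ready.

    Expects the output of:
        docker node ls --format "{{.Hostname}} {{.Status}}"
    """
    for line in node_ls_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == hostname:
            return parts[1] == "Ready"
    return False


# Sample output with made-up hostnames.
sample = "master-1 Ready\nworker-1 Ready\nworker-2 Down\n"
```

With this sample, `node_joined(sample, "worker-1")` is True, while a node that is listed but `Down`, or not listed at all, yields False.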

4. Take a server offline (optional)

On the server to be taken offline, run:

docker swarm leave

Pulling private projects

To pull a private project, add the following public key to the git repository:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCd/k/tjbcMislEunjtYQNXxz5tgEDc/fSvuLHBNUX4PtfmMQ07TuUX2XJIIzLRPaqv3nsMn3+QZrV0xQd545FG1Cq83JJB98ATTW7k5Q0eaWXkvThdFeG5+n85KeVV2W4BpdHHNZ5h9RxBUmVZPpAZacdC6OUSBYTyCblPfX9DvjOk+KfwAZVwpJSkv4YduwoR3DNfXrmK5P+wrYW9z/VHUf0hcfWEnsrrHktCKgohZn9Fe8uS3B5wTNd9GgVrLGRk85ag+CChoqg80DjgFt/IhzMCArqwLyMn7rGG4Iu2Ie0TcdMc0TlRxoBhqrfKkN83cfQ3gDf41tZwp67uM9ZN [email protected]

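Alternatively, configure your own SSH private key on the system settings page and add the matching public key to the git repository.

Note: the key pair must use RSA; other algorithms may cause problems.

Generate an RSA key pair as follows:

ssh-keygen -t rsa -C "comment" -f output-path/filename

For example: ssh-keygen -t rsa -C "feaplat" -f id_rsa
Press Enter at every prompt and do not set a passphrase. This produces two files, id_rsa and id_rsa.pub. Copy the contents of id_rsa.pub into the git repository, and copy the contents of id_rsa into the feaplat crawler management system.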
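The interactive prompts can be skipped by passing an empty passphrase with ssh-keygen's `-N` option. A hedged sketch that only builds the argument list follows; `keygen_command` is a hypothetical helper, not something feaplat provides.

```python
def keygen_command(key_path, comment="feaplat"):
    """Build an ssh-keygen argv that creates an RSA key pair without a passphrase."""
    return [
        "ssh-keygen",
        "-t", "rsa",     # RSA, the key type the text above asks for
        "-C", comment,   # free-form comment stored in the public key
        "-f", key_path,  # private key path; the public key gets a .pub suffix
        "-N", "",        # empty passphrase, so ssh-keygen does not prompt for one
    ]


cmd = keygen_command("id_rsa")
```

The list can be handed to `subprocess.run(cmd, check=True)`. Note that ssh-keygen still asks for confirmation if the target file already exists.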

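Custom crawler image

The default crawler image bundles only the feapder and scrapy frameworks. If you need another environment, build your own image based on the SPIDER_IMAGE image named in the .env file.

For example, to bake commonly used python libraries into the image: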

FROM registry.cn-hangzhou.aliyuncs.com/feapderd/feapder:[latest version tag]

# Install dependencies
RUN pip3 install feapder \
    && pip3 install scrapy

Customize the image however you like. When done, set SPIDER_IMAGE in the .env file to your image.
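Editing the .env file by hand works fine; for scripted deployments, the change can be sketched as a small text transformation. Everything here other than the SPIDER_IMAGE key is an assumption: the helper name, the sample file contents, and the image tag are made up for illustration.

```python
def set_env_value(env_text, key, value):
    """Return env_text with `key=value` replaced, or appended if key is absent."""
    lines = env_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(key + "="):
            lines[i] = f"{key}={value}"
            break
    else:
        # The loop finished without finding the key, so add it at the end.
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Hypothetical .env content and image name.
before = "SPIDER_IMAGE=old-image:1.0\nOTHER_SETTING=1\n"
after = set_env_value(before, "SPIDER_IMAGE", "my-registry/my-spider:1.0")
```

Only the SPIDER_IMAGE line changes; every other setting in the file is carried over untouched.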

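Pricing

Edition    Price     Notes
Trial      0 CNY     Up to 5 tasks can be deployed; deleting a task does not restore the quota
Licensed   288 CNY   Valid for one year; the license can be rebound to another server

A fresh deployment runs as the trial edition. After you purchase a license code and enter it in the system, it becomes the licensed edition.

To purchase: add WeChat boris_tm

Prices will be adjusted gradually as features mature.

Community

Knowledge Planet (知识星球): 17321694; author's WeChat: boris_tm; QQ group: 750614606

When adding the author as a contact, include the note: feaplat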


feaplat's Issues

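Error when viewing instances after running a task

After starting a task, viewing its instances fails.
The logs of the feapder_backend container show an error in the backend: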

INFO:     10.0.1.8:34674 - "GET /feapder/task/get_all_tag?project_id=2 HTTP/1.1" 200 OK
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py", line 398, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/usr/local/python3/lib/python3.6/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/python3/lib/python3.6/site-packages/fastapi/applications.py", line 201, in __call__
    await super().__call__(scope, receive, send)  # pragma: no cover
  File "/usr/local/python3/lib/python3.6/site-packages/starlette/applications.py", line 111, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/python3/lib/python3.6/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc from None
  File "/usr/local/python3/lib/python3.6/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/python3/lib/python3.6/site-packages/starlette/middleware/cors.py", line 86, in __call__
    await self.simple_response(scope, receive, send, request_headers=headers)
  File "/usr/local/python3/lib/python3.6/site-packages/starlette/middleware/cors.py", line 142, in simple_response
    await self.app(scope, receive, send)
  File "/usr/local/python3/lib/python3.6/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc from None
  File "/usr/local/python3/lib/python3.6/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/python3/lib/python3.6/site-packages/starlette/routing.py", line 566, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/python3/lib/python3.6/site-packages/starlette/routing.py", line 227, in handle
    await self.app(scope, receive, send)
  File "/usr/local/python3/lib/python3.6/site-packages/starlette/routing.py", line 41, in app
    response = await func(request)
  File "/usr/local/python3/lib/python3.6/site-packages/fastapi/routing.py", line 202, in app
    dependant=dependant, values=values, is_coroutine=is_coroutine
  File "/usr/local/python3/lib/python3.6/site-packages/fastapi/routing.py", line 148, in run_endpoint_function
    return await dependant.call(**values)
  File "<frozen views.task_instance_view>", line 94, in get_task_instance
  File "task_instance.py", line 40, in core.task_instance.get_task_instance
AttributeError: 'NoneType' object has no attribute 'get'
INFO:     10.0.1.8:34678 - "POST /feapder/task_instance/get_task_instance HTTP/1.1" 500 Internal Server Error

Network problem between containers

After the images were built successfully, startup failed with:

Error response from daemon: Could not attach to network feaplat: rpc error: code = PermissionDenied desc = network feaplat not manually attachable

The swarm needs to be initialized first:

  1. docker swarm init
  2. docker network create --driver overlay --attachable feaplat
  3. If it still fails, force the node out of the swarm with docker swarm leave -f, then repeat steps 1 and 2

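Exit status code when feaplat exits abnormally

[image: log screenshot]
Using TaskSpider in a multi-node deployment, feaplat exited abnormally at runtime with status code 9; the log is shown in the screenshot above. What error does status code 9 correspond to, and where can the specific cause behind each status code be looked up? I could not find this in the documentation or the source code. Thanks.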
