
dataintegrationalliance / data_integration_celery


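Uses Celery to periodically run update tasks that integrate, clean, and normalize data from data terminals such as Wind, iFinD (Tonghuashun), Choice (East Money), Tushare, JQDataSDK, pytdx, and CMC, so that other systems can use it for data analysis.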

License: GNU General Public License v3.0



data_integration_celery


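Uses Celery to periodically run update tasks that integrate, clean, and normalize data from data terminals such as Wind, iFinD (Tonghuashun), and Choice (East Money), so that other systems can use it for data analysis.

To suit different environments, the tasks can also be run directly without Celery.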


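I. Environment Dependencies and Installation

  • Windows or Ubuntu
  • RabbitMQ
  • Python 3.6 and the required packages
  • MySQL 5.7

To download data from Wind, iFinD, Choice, and similar terminals, the corresponding client components must be installed.

So that the project can run standalone on Windows, RabbitMQ rather than Redis (which officially supports only Linux) is used as the Celery broker.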

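II. Configuration Before First Run

All project configuration is stored in ./tasks/config.py, including:

  • Celery settings
  • Database settings
  • Usernames and passwords for iFinD, Wind, Tushare, JQDataSDK, CMC, etc.
  • Whether SQLite export is enabled, and the export path
  • Log output format and level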
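The README does not reproduce ./tasks/config.py itself. As a hedged illustration of the last item (log format and level), here is a minimal sketch using the standard library's logging.config.dictConfig. The names LOG_FORMAT, LOG_LEVEL and build_logging_config are hypothetical and are not taken from the project.

```python
import logging
import logging.config

# Hypothetical settings; the real ./tasks/config.py may use different names.
LOG_FORMAT = '%(asctime)s %(levelname)s %(name)s %(lineno)d| %(message)s'
LOG_LEVEL = 'INFO'


def build_logging_config(fmt: str = LOG_FORMAT, level: str = LOG_LEVEL) -> dict:
    """Return a dictConfig mapping that logs to stderr with one format and level."""
    return {
        'version': 1,
        'disable_existing_loggers': False,
        'formatters': {'default': {'format': fmt}},
        'handlers': {
            'console': {
                'class': 'logging.StreamHandler',  # writes to stderr by default
                'formatter': 'default',
            },
        },
        'root': {'level': level, 'handlers': ['console']},
    }


if __name__ == '__main__':
    logging.config.dictConfig(build_logging_config())
    logging.getLogger(__name__).info('logging configured')
```

Keeping the format and level as plain module-level values means they can be edited in one place, and the same mapping can be applied by both the Celery worker and the local runner.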

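III. RabbitMQ Configuration

Download pages (Windows installers):
RabbitMQ: download rabbitmq-server-3.7.14.exe
RabbitMQ requires Erlang: download OTP 21.3 Windows 64-bit Binary File (92618042)

On Ubuntu, RabbitMQ can be installed directly with apt.

1. Creating the user and setting permissions

Create the user, the virtual hosts, and their access permissions: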

rabbitmqctl add_user mg ****
rabbitmqctl set_user_tags mg broker_backend

rabbitmqctl add_vhost celery_tasks
rabbitmqctl set_permissions -p celery_tasks mg ".*" ".*" ".*"

rabbitmqctl add_vhost backend
rabbitmqctl set_permissions -p backend mg ".*" ".*" ".*"
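The two virtual hosts created above are what the Celery broker URL and result backend URL point at. A small sketch, assuming the default port 5672 and a placeholder password "secret" (substitute the one given to `rabbitmqctl add_user`); the helper name amqp_url is hypothetical:

```python
from urllib.parse import quote


def amqp_url(user: str, password: str, vhost: str,
             host: str = 'localhost', port: int = 5672) -> str:
    """Build an AMQP URL for one RabbitMQ virtual host.

    User, password and vhost are percent-encoded so that characters such as
    '@', ':' or '/' cannot break the URL structure.
    """
    return 'amqp://{}:{}@{}:{}/{}'.format(
        quote(user, safe=''), quote(password, safe=''), host, port,
        quote(vhost, safe=''))


# Placeholder password, for illustration only.
broker_url = amqp_url('mg', 'secret', 'celery_tasks')
result_backend = amqp_url('mg', 'secret', 'backend')
```

Encoding matters mainly for passwords: a password containing "@" would otherwise be read as the start of the host part.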

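2. Enabling the web management UI

rabbitmq-plugins enable rabbitmq_management

RabbitMQ management UI (served by default at http://localhost:15672; to log in, the user needs a management-level tag such as management or administrator)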

IV. Starting from Windows CMD

Run all of the following commands from the data_integration_celery root directory.

scripts\run.bat

The following prompt appears:

run: Active Env[A] Worker[W] Beat[B] Local Tasks[L] Cancel[C] [A,W,B,L,C]?

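A: activate the virtual environment (if there is one)
W: start the worker
B: start beat
L: run tasks locally
C: exit

Option L starts a Python program that shows a selection menu; the code it runs is in the main() function of tasks/__init__.py. For further customization, add your own functions to func_list as needed.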
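The menu behind option L can be pictured roughly as follows. This is a hedged sketch, not the project's actual main(): the two task functions are hypothetical stand-ins, and only the idea of a func_list that you extend comes from the text above.

```python
from typing import Callable, List


def import_stock_info() -> str:
    # Hypothetical stand-in for a real data-import task.
    return 'stock info imported'


def import_stock_daily() -> str:
    # Hypothetical stand-in for a real data-import task.
    return 'stock daily imported'


# Hypothetical counterpart of func_list: append your own functions here.
func_list: List[Callable[[], str]] = [import_stock_info, import_stock_daily]


def run_selected(choice: str) -> List[str]:
    """Run one function by its 0-based index, or all of them for 'a'."""
    if choice.strip().lower() == 'a':
        return [func() for func in func_list]
    return [func_list[int(choice)]()]


def main() -> None:
    # Print the menu, read one choice, and run the selected function(s).
    for num, func in enumerate(func_list):
        print(f'{num}: {func.__name__}')
    choice = input('select a number, or "a" to run all: ')
    for result in run_selected(choice):
        print(result)


if __name__ == '__main__':
    main()
```

Because the menu is generated from func_list, adding a function to the list is the only change needed for it to appear as a new option.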

V. Starting Celery

Run all of the following commands from the data_integration_celery root directory.

1. Start the worker

celery -A tasks worker --loglevel=debug -c 2 -P eventlet

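-P is needed mainly so that the worker runs properly on Windows 10 (see the related issue); it can be omitted in other environments.
-P, --pool Pool implementation: prefork (default), eventlet, gevent or solo.
-c sets the number of workers running in parallel; it is recommended not to exceed the number of CPU cores.
-l, --loglevel Logging level, choose between DEBUG, INFO, WARNING, ERROR, CRITICAL, or FATAL.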
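The two rules above (add `-P eventlet` only on Windows, keep `-c` at or below the core count) can be captured in a small launcher helper. This is a sketch under assumptions: build_worker_argv is a hypothetical name, it only assembles the command line, and eventlet must be installed separately for the Windows case.

```python
import os
import sys
from typing import List, Optional


def build_worker_argv(app: str = 'tasks', loglevel: str = 'debug',
                      concurrency: int = 2,
                      platform: Optional[str] = None) -> List[str]:
    """Assemble the `celery ... worker` command line described above."""
    platform = sys.platform if platform is None else platform
    # Cap -c at the CPU core count (os.cpu_count() may return None).
    cores = os.cpu_count() or 1
    argv = ['celery', '-A', app, 'worker', f'--loglevel={loglevel}',
            '-c', str(min(concurrency, cores))]
    # Only add the eventlet pool on Windows; elsewhere keep the default prefork pool.
    if platform.startswith('win'):
        argv += ['-P', 'eventlet']
    return argv


if __name__ == '__main__':
    print(' '.join(build_worker_argv()))
```

The resulting list can be passed to subprocess.run(...) from the project root, which keeps one launcher script usable on both Windows and Linux.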

2. Start beat

celery -A tasks beat

The scheduled tasks defined in CeleryConfig are triggered automatically by beat.

3. Schedules Configuration

Recommended configuration:

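from celery.schedules import crontab


class CeleryConfig:
    # Celery settings
    # Broker on the celery_tasks vhost (no trailing comma: it would turn the URL into a tuple)
    broker_url = 'amqp://mg:***@localhost:5672/celery_tasks'
    # Results on the backend vhost; the amqp result backend is deprecated since Celery 4.0 ('rpc://' is the usual replacement)
    result_backend = 'amqp://mg:***@localhost:5672/backend'
    accept_content = ['json']  # , 'pickle'
    timezone = 'Asia/Shanghai'
    imports = ('tasks',)
    beat_schedule = {
        # Monday to Friday (day_of_week 1-5) at 16:03
        'daily_task': {
            'task': 'tasks.grouped_task_daily',
            'schedule': crontab(hour='16', minute='03', day_of_week='1-5'),
        },
        # Saturday (day_of_week 6) at 10:00; without minute='0', crontab defaults to
        # minute='*' and the task would fire every minute from 10:00 to 10:59
        'weekly_task': {
            'task': 'tasks.grouped_task_weekly',
            'schedule': crontab(hour='10', minute='0', day_of_week='6'),
        },
    }
    # 0 disables AMQP heartbeats
    broker_heartbeat = 0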

The tasks and their run times can be adjusted as needed.
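To sanity-check a schedule before deploying it, the sketch below mimics how the three crontab fields used here are interpreted. It is a simplified stand-in, not Celery's implementation: it only handles '*', single numbers, ranges such as '1-5', and comma lists (Celery's crontab also accepts step values and day names). cron_matches is a hypothetical helper; the defaults of '*' match Celery's crontab defaults.

```python
from datetime import datetime


def _field_matches(value: int, spec: str) -> bool:
    """Match one cron field: '*', a number, a range 'a-b', or a comma list of these."""
    for part in spec.split(','):
        if part == '*':
            return True
        if '-' in part:
            low, high = (int(x) for x in part.split('-'))
            if low <= value <= high:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(dt: datetime, minute: str = '*', hour: str = '*',
                 day_of_week: str = '*') -> bool:
    """Return True if dt falls on a crontab(minute, hour, day_of_week) tick."""
    # datetime.weekday(): Monday=0 ... Sunday=6; cron: Sunday=0, Monday=1 ... Saturday=6.
    cron_dow = (dt.weekday() + 1) % 7
    return (_field_matches(dt.minute, minute)
            and _field_matches(dt.hour, hour)
            and _field_matches(cron_dow, day_of_week))
```

For example, 2024-01-05 is a Friday, so cron_matches(datetime(2024, 1, 5, 16, 3), minute='03', hour='16', day_of_week='1-5') is True, while the same time on Saturday 2024-01-06 is not. Leaving minute out, cron_matches(datetime(2024, 1, 6, 10, 30), hour='10', day_of_week='6') is also True, which shows why a weekly schedule should pin the minute.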

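VI. MySQL Setup

MySQL 5.7 download page
On Ubuntu, MySQL can be installed directly with apt.

  1. Install MySQL 5.7 on Ubuntu 18.04

    sudo apt install mysql-server
  2. By default the installation never asks for a username or password, so the root password has to be reset manually afterwards: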

    cd /etc/mysql
    sudo more debian.cnf

    The output looks similar to this:

    # Automatically generated for Debian scripts. DO NOT TOUCH!
    [client]
    host     = localhost
    user     = debian-sys-maint
    password = j1bsABuuDRGKCV5s
    socket   = /var/run/mysqld/mysqld.sock
    [mysql_upgrade]
    host     = localhost
    user     = debian-sys-maint
    password = j1bsABuuDRGKCV5s
    socket   = /var/run/mysqld/mysqld.sock

    Log in as debian-sys-maint, using the value after password = in debian.cnf as the password, by running mysql -u debian-sys-maint -p. Once inside MySQL, change the root password with the following commands:

    use mysql;
    
    update user set authentication_string=PASSWORD("Dcba4321") where user='root';
    
    update user set plugin="mysql_native_password" where user='root';
     
    flush privileges;
  3. You can then log in as root:

    mysql -uroot -p
  4. Create the user mg (default password Abcd1234)

    CREATE USER 'mg'@'%' IDENTIFIED BY 'Abcd1234';
  5. Create the md_integration database

    CREATE DATABASE `md_integration` default charset utf8 collate utf8_general_ci;
  6. Grant privileges

    grant all privileges on md_integration.* to 'mg'@'%';
    
    flush privileges; # reload the privilege tables
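Two small helpers can tie the steps above to the Python side: reading the maintenance credentials from debian.cnf (step 2) and building a connection URL for the mg user and md_integration database (steps 4 to 6). Both function names are hypothetical, and the mysql+pymysql driver is an assumption; pass a different driver string if the project uses another MySQL client library.

```python
import configparser
from typing import Tuple
from urllib.parse import quote_plus


def read_client_credentials(cnf_text: str, section: str = 'client') -> Tuple[str, str]:
    """Return (user, password) from a my.cnf-style file such as /etc/mysql/debian.cnf."""
    # interpolation=None keeps '%' in passwords literal; allow_no_value accepts bare flags.
    parser = configparser.ConfigParser(interpolation=None, allow_no_value=True)
    parser.read_string(cnf_text)
    return parser[section]['user'], parser[section]['password']


def mysql_url(user: str, password: str, database: str = 'md_integration',
              host: str = 'localhost', port: int = 3306,
              driver: str = 'mysql+pymysql') -> str:
    """Build a SQLAlchemy-style database URL; the password is percent-encoded."""
    return f'{driver}://{user}:{quote_plus(password)}@{host}:{port}/{database}?charset=utf8'


if __name__ == '__main__':
    # Reading the real file requires root, e.g. run this script with sudo.
    with open('/etc/mysql/debian.cnf') as fh:
        maint_user, _ = read_client_credentials(fh.read())
    print('maintenance user:', maint_user)
    print(mysql_url('mg', 'Abcd1234'))
```

The charset=utf8 query parameter matches the utf8 default charset chosen when the database was created.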

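VII. MySQL Parameter Tuning

Some datasets are large enough to cause connection failures or slow performance while data is being stored, so adjusting the following parameters is recommended:

| Parameter | Value | Description |
|---|---|---|
| max_allowed_packet | 500M | Maximum packet size MySQL accepts; addresses the "Lost connection to MySQL server during query" error |
| wait_timeout | 172800 (seconds, i.e. 48 h) | Addresses the "MySQL server has gone away" error |
| interactive_timeout | 172800 (seconds, i.e. 48 h) | Addresses the "MySQL server has gone away" error |
| innodb_buffer_pool_size | 1024M | Buffer pool usage = Innodb_buffer_pool_pages_data / Innodb_buffer_pool_pages_total × 100%; about 75% of physical memory is recommended |
| sort_buffer_size | 4M | Addresses "Sort aborted: Out of sort memory, consider increasing server sort buffer size"; the default is only 256K. On Linux, there are thresholds of 256KB and 2MB where larger values may significantly slow down memory allocation |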
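The innodb_buffer_pool_size row combines two calculations: current buffer pool usage, from the two status counters (visible via SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages%'), and a target size of about 75% of physical memory. A minimal sketch with hypothetical helper names; note that MySQL 5.7 may further adjust the size to a multiple of innodb_buffer_pool_chunk_size × innodb_buffer_pool_instances.

```python
def buffer_pool_usage_pct(pages_data: int, pages_total: int) -> float:
    """Innodb_buffer_pool_pages_data / Innodb_buffer_pool_pages_total * 100%."""
    return pages_data / pages_total * 100


def recommended_buffer_pool_bytes(physical_ram_bytes: int, fraction: float = 0.75) -> int:
    """Roughly 75% of physical memory, rounded down to a whole MiB."""
    mib = 1024 * 1024
    return int(physical_ram_bytes * fraction) // mib * mib
```

For a dedicated server with 16 GiB of RAM, recommended_buffer_pool_bytes(16 * 1024**3) gives 12 GiB; a usage figure that stays close to 100% suggests the pool is too small for the working set.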

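How to apply the parameters:

  1. Edit the [mysqld] section of my.cnf

  2. Or run SQL commands (SET GLOBAL changes are lost when the server restarts), for example:

    set global max_allowed_packet=500*1024*1024;
  3. To quickly list the row count of each table in the database (TABLE_ROWS is only an estimate for InnoDB tables):

    select a.TABLE_NAME,a.TABLE_ROWS from information_schema.`TABLES` a WHERE a.TABLE_SCHEMA='md_integration' ORDER BY a.TABLE_ROWS desc;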
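Both methods can be generated from one list of values, which keeps my.cnf and any one-off SET GLOBAL commands consistent. This is a sketch with hypothetical helper names; the values come from the parameter table in section VII, and parse_size converts my.cnf-style sizes such as 500M into the plain integers written in the SET GLOBAL statements.

```python
import re
from typing import Dict, List

# Recommended values from the parameter table in section VII.
RECOMMENDED = {
    'max_allowed_packet': '500M',
    'wait_timeout': '172800',
    'interactive_timeout': '172800',
    'innodb_buffer_pool_size': '1024M',
    'sort_buffer_size': '4M',
}

_UNITS = {'': 1, 'K': 1024, 'M': 1024 ** 2, 'G': 1024 ** 3}


def parse_size(value: str) -> int:
    """Convert '500M', '4M', '1G' etc. to an integer; plain numbers pass through."""
    match = re.fullmatch(r'(\d+)([KMG]?)', value.strip().upper())
    if match is None:
        raise ValueError(f'unrecognised size: {value!r}')
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def set_global_statements(params: Dict[str, str]) -> List[str]:
    """One SET GLOBAL per parameter; these changes are lost on server restart."""
    return [f'SET GLOBAL {name} = {parse_size(value)};' for name, value in params.items()]


def mycnf_section(params: Dict[str, str]) -> str:
    """The same settings as a [mysqld] block for my.cnf, which survives restarts."""
    lines = ['[mysqld]'] + [f'{name} = {value}' for name, value in params.items()]
    return '\n'.join(lines) + '\n'
```

A common pattern is to run the SET GLOBAL statements for immediate effect and also add the [mysqld] block to my.cnf so the settings survive the next restart.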

data_integration_celery's Issues

Error in stock.py

sqlalchemy.exc.ProgrammingError: (mysql_exceptions.ProgrammingError) (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '{table_name} (wind_code, trade_code, sec_name, ipo_date, delist_date, mkt, exch' at line 1") [SQL: 'REPLACE INTO {table_name} (wind_code, trade_code, sec_name, ipo_date, delist_date, mkt, exch_city, exch_eng, prename) values (%s, %s, %s, %s, %s, %s, %s, %s, %s)'] [parameters: (('000686.SZ', '000686', '东北证券', '1997-02-27', None, '主板', '深圳', 'SZSE', '锦州六陆;S锦六陆'), ('002293.SZ', '002293', '罗莱生活', '2009-09-10', None, '中小企业板', '深圳', 'SZSE', '罗莱家纺'), ('002171.SZ', '002171', '楚江新材', '2007-09-21', None, '中小企业板', '深圳', 'SZSE', '精诚铜业'), ('603233.SH', '603233', '大参林', '2017-07-31', None, '主板', '上海', 'SSE', None), ('300739.SZ', '300739', '明阳电路', '2018-02-01', None, '创业板', '深圳', 'SZSE', None), ('603056.SH', '603056', '德邦股份', '2018-01-16', None, '主板', '上海', 'SSE', None), ('000807.SZ', '000807', '云铝股份', '1998-04-08', None, '主板', '深圳', 'SZSE', '云铝股份;G云铝'), ('002286.SZ', '002286', '保龄宝', '2009-08-28', None, '中小企业板', '深圳', 'SZSE', None) ... displaying 10 of 3536 total bound parameter sets ... ('603889.SH', '603889', '新澳股份', '2014-12-31', None, '主板', '上海', 'SSE', None), ('300290.SZ', '300290', '荣科科技', '2012-02-16', None, '创业板', '深圳', 'SZSE', None))] (Background on this error at: http://sqlalche.me/e/f405)
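The SQL in this error still contains the literal text {table_name}, which suggests the statement template reached MySQL without being formatted (for example, a missing .format() call or f-string prefix). Table names cannot be sent as bound %s parameters, so they have to be inserted into the string, ideally after checking them against a whitelist. A minimal sketch of that idea, assuming a hypothetical table name wind_stock_info; the column list is the one shown in the error:

```python
from typing import Sequence

ALLOWED_TABLES = {'wind_stock_info'}  # hypothetical table name


def build_replace_sql(table_name: str, columns: Sequence[str]) -> str:
    """Fill in table and column names; row values stay as %s placeholders for the driver."""
    if table_name not in ALLOWED_TABLES:
        raise ValueError(f'unexpected table name: {table_name!r}')
    col_str = ', '.join(columns)
    placeholders = ', '.join(['%s'] * len(columns))
    return f'REPLACE INTO {table_name} ({col_str}) values ({placeholders})'
```

The resulting string can then be passed, together with the row tuples, to a DB-API call such as cursor.executemany(sql, rows).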
