
duyanlong / datax


This project forked from thestyleofme/datax


DataX distributed clustering, custom DataX plugins, source modifications for job monitoring, and a Hook that stores dirty data in a table

License: Other

Java 96.82% Python 3.08% Shell 0.11%


DataX distributed clustering:

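  • Reworked on top of Netty, with ZooKeeper as the registry
      Netty serves as the HTTP server, and you can plug in your own URL handlers, similar to Spring MVC.
      On startup, each DataX instance registers itself in ZooKeeper; downstream services can read the DataX cluster membership from ZooKeeper and use it for load balancing and similar tasks.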
    
  • DataX jobs can be submitted through a REST API
      POST http://ip:port/datax/job
    
  • Can be used together with the datax-admin service
      Create DataX jobs, let datax-admin execute them centrally, and balance load across the cluster.
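The following is a minimal client-side sketch of how a downstream service might use the cluster: it picks a DataX node round-robin from the node list it read from ZooKeeper and builds a `POST` to that node's `/datax/job` endpoint. It assumes each registered node is stored as an `ip:port` string and that the request body is the DataX job JSON. The class and method names are hypothetical, and the actual ZooKeeper read (for example via the ZooKeeper client's `getChildren`) is left out so the example stays dependency-free.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: round-robin node selection plus job-submission request building. */
public class DataxJobSubmitter {
    private final AtomicInteger next = new AtomicInteger();

    // Picks the next node round-robin; assumes each entry is "ip:port" as read from ZooKeeper.
    String pickNode(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalStateException("no DataX node registered");
        }
        // floorMod keeps the index non-negative even after the counter overflows.
        return nodes.get(Math.floorMod(next.getAndIncrement(), nodes.size()));
    }

    // Builds (but does not send) the submission request.
    // Assumption: the endpoint accepts the DataX job JSON as the request body.
    HttpRequest buildSubmit(List<String> nodes, String jobJson) {
        String node = pickNode(nodes);
        return HttpRequest.newBuilder(URI.create("http://" + node + "/datax/job"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jobJson))
                .build();
    }

    public static void main(String[] args) {
        DataxJobSubmitter submitter = new DataxJobSubmitter();
        List<String> nodes = List.of("10.0.0.1:8080", "10.0.0.2:8080");
        for (int k = 0; k < 3; k++) {
            HttpRequest request = submitter.buildSubmit(nodes, "{\"job\":{}}");
            System.out.println(request.method() + " " + request.uri());
        }
    }
}
```

To actually submit, the request would be passed to `java.net.http.HttpClient.send`; the node list would be refreshed whenever ZooKeeper reports a membership change.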
    

Custom plugins built on DataX. Plugins developed so far:

  •   Encrypts and decrypts OTS fields
    
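  •   The hdfsreader plugin reads files stored on HDFS and cannot run custom SQL, so the hdfsplusreader plugin was developed:
      it runs a custom Hive query through the shell, writes the result into a temporary table (ORC), hands that table's data to DataX, and finally drops the table.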
    
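  •   Enhancements on top of hdfswriter:
      1. Added preSql and postSql, which work the same way as preSql and postSql in mysqlWriter
      2. Added delimsReplacement and dropImportDelims, which handle \n, \r and \01 (Hive's default field delimiter) inside field values, as Sqoop does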
    
  •   Enhancements on top of elasticsearchwriter:
      1. Added an option to clear the target index's data
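A minimal sketch of the two hdfswriter field-cleaning options, assuming they mirror Sqoop's `--hive-drop-import-delims` (drop `\n`, `\r` and `\01` from string fields) and `--hive-delims-replacement` (replace them with a user-defined string). The class name `HiveDelims` and the precedence rule (the replacement wins when both options are set) are assumptions, not taken from the plugin's source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper mirroring Sqoop-style Hive delimiter handling. */
public class HiveDelims {
    // Matches newline, carriage return and U+0001 (Ctrl-A, Hive's default field delimiter).
    private static final Pattern HIVE_DELIMS = Pattern.compile("[\\n\\r\\u0001]");

    static String clean(String value, boolean dropImportDelims, String delimsReplacement) {
        if (value == null) {
            return null; // nulls pass through untouched
        }
        if (delimsReplacement != null) {
            // quoteReplacement keeps '$' and '\' in the replacement literal.
            return HIVE_DELIMS.matcher(value).replaceAll(Matcher.quoteReplacement(delimsReplacement));
        }
        if (dropImportDelims) {
            return HIVE_DELIMS.matcher(value).replaceAll("");
        }
        return value; // neither option set: leave the field as-is
    }

    public static void main(String[] args) {
        String raw = "line1\nline2\u0001x";
        System.out.println(clean(raw, true, null));  // delimiters dropped
        System.out.println(clean(raw, false, " "));  // delimiters replaced by a space
    }
}
```

Cleaning matters because a stray `\n` or `\01` inside a value would otherwise split one row into several, or shift columns, when Hive reads a text-format table.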
    

Custom Hooks built on DataX for monitoring DataX jobs and guaranteeing data consistency:

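  •   Developed an Azkaban DataX plugin that schedules DataX jobs and stores DataX monitoring information in tables,
      such as useful information extracted from the sync logs and data-consistency records (collected dirty data)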
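The log-extraction part of such a hook could look roughly like this sketch. It assumes the job log ends with DataX's summary block of `name : value` lines, in particular `读出记录总数` (total records read) and `读写失败总数` (total read/write failures); verify those key names against the DataX version you run. The class is hypothetical, and writing the result to a table (for example over JDBC) is omitted.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical parser for DataX's end-of-job summary lines. */
public class DataxLogSummary {
    // Assumed key names from the DataX summary block; check them against your DataX version.
    static final String READ_TOTAL = "读出记录总数"; // total records read
    static final String FAIL_TOTAL = "读写失败总数"; // total read/write failures (dirty records)

    // Splits each line at its first ':' and trims both sides.
    // Lines without ':' are skipped; unrelated log lines only add keys that are never read.
    static Map<String, String> parse(String log) {
        Map<String, String> fields = new HashMap<>();
        for (String line : log.split("\\R")) {
            int idx = line.indexOf(':');
            if (idx > 0) {
                fields.put(line.substring(0, idx).trim(), line.substring(idx + 1).trim());
            }
        }
        return fields;
    }

    static long count(Map<String, String> fields, String key) {
        String value = fields.get(key);
        if (value == null) {
            throw new IllegalArgumentException("summary line not found: " + key);
        }
        return Long.parseLong(value);
    }

    // Any failed record means there is dirty data worth collecting into a table.
    static boolean hasDirtyData(Map<String, String> fields) {
        return count(fields, FAIL_TOTAL) > 0;
    }

    public static void main(String[] args) {
        String log = String.join("\n",
                "读出记录总数                    :                 100",
                "读写失败总数                    :                   2");
        Map<String, String> fields = parse(log);
        System.out.println("read=" + count(fields, READ_TOTAL)
                + " failed=" + count(fields, FAIL_TOTAL)
                + " dirty=" + hasDirtyData(fields));
    }
}
```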
    

DataX

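DataX is an offline data synchronization tool/platform widely used within Alibaba Group. It provides efficient data synchronization between heterogeneous data sources, including MySQL, Oracle, SqlServer, PostgreSQL, HDFS, Hive, ADS, HBase, TableStore (OTS), MaxCompute (ODPS), DRDS, and more.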

Features

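As a data synchronization framework, DataX abstracts synchronization between different data sources into Reader plugins, which read data from the source, and Writer plugins, which write data to the target. In principle, the framework can therefore synchronize any type of data source. The plugin system also works as an ecosystem: as soon as a new data source is integrated, it can exchange data with every existing one.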
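To make the Reader/Writer split concrete, here is a deliberately simplified sketch: a Reader pushes records into a bounded channel, a Writer drains it, and the framework only wires the two together, so any Reader can be paired with any Writer. The interfaces below are hypothetical and are not DataX's real plugin SPI (which splits each plugin into Job and Task parts); the channel is a plain `ArrayBlockingQueue` with an end-of-stream marker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical, simplified Reader/Writer contracts; NOT DataX's real plugin SPI. */
public class ReaderWriterSketch {
    /** Reads records from a source and pushes them into the channel. */
    interface Reader {
        void read(BlockingQueue<Object[]> channel) throws InterruptedException;
    }

    /** Takes records from the channel until EOF and writes them to a target. */
    interface Writer {
        void write(BlockingQueue<Object[]> channel) throws InterruptedException;
    }

    /** End-of-stream marker, compared by identity. */
    static final Object[] EOF = new Object[0];

    /** The "framework": wires any Reader to any Writer through a bounded channel. */
    static void sync(Reader reader, Writer writer) throws InterruptedException {
        BlockingQueue<Object[]> channel = new ArrayBlockingQueue<>(1024);
        Thread producer = new Thread(() -> {
            try {
                try {
                    reader.read(channel);
                } finally {
                    channel.put(EOF); // always signal end-of-stream, even if the reader fails
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true); // a stuck producer must not keep the JVM alive if the writer fails
        producer.start();
        writer.write(channel); // consume on the calling thread
        producer.join();
    }

    public static void main(String[] args) throws InterruptedException {
        // An in-memory "source" and a list-backed "target" stand in for real data sources.
        List<Object[]> target = new ArrayList<>();
        Reader memoryReader = ch -> {
            ch.put(new Object[] {1, "alice"});
            ch.put(new Object[] {2, "bob"});
        };
        Writer listWriter = ch -> {
            for (Object[] rec = ch.take(); rec != EOF; rec = ch.take()) {
                target.add(rec);
            }
        };
        sync(memoryReader, listWriter);
        System.out.println("records written: " + target.size());
    }
}
```

Because the framework only sees the two interfaces, adding a new data source means writing one Reader and one Writer, after which it can exchange data with every existing plugin.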

DataX in Detail

See: DataX-Introduction

Quick Start

See: Quick Start

Developing a New Plugin

See: DataX Plugin Development Guide

Contributors

trafalgarluo, binaryworld, mr-kidbk, thestyleofme, heljoyliu, asdf2014, wuchase, cch1996, sufism, lazlaz, kevinwangcs, wenshao, datagic, zhouzf05, woaer, wanda1416, tofuhero, terryliu1994, randomgear, lw309637554, ryan-mei, duyong6380, coderxiao, galthen, c-agam, xudaojie, willian-zhang, hanhuimin001, boyizhang
