zml1 / dataxserver

This project forked from tianlangstudio/dataxserver

Adds remote, multi-language invocation (Thrift Server, HTTP Server) and distributed execution (DataX on YARN) to DataX (https://github.com/alibaba/DataX).

License: Apache License 2.0

Java 44.69% Scala 54.78% Thrift 0.52%

dataxserver's Introduction

DataX Server

Provides remote invocation (Thrift Server, HTTP Server) and distributed execution (DataX on YARN) for DataX.

Feature

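    1. Thrift Server
    2. DataX on YARN
    3. HTTP Server
    4. Single-machine, multi-threaded execution
    5. Single-machine, multi-process execution
    6. Distributed execution (on YARN)
    7. Hybrid execution (YARN plus local multi-process)
    8. Automatic scaling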

TODO

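  • 1. HTTP Server
  • 2. Refactor the code
  • 3. Split the code into sub-projects by function and reorganize the package names, so that new features are easier to add
  • 4. Improve the documentation and examples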

Deploy

  Download the release package DataXServer-0.0.1.tar.gz, extract it, and enter the 0.0.1 directory.

  Start the Thrift Server:

./bin/startThriftServer.sh

Submit a test job to the Thrift Server with Node.js:

  cd example/nodejs    
  node submitStream2Stream.js 

 

Develop

Download the source code

 The project depends on Alibaba DataX:

git clone https://github.com/alibaba/DataX.git 
cd DataX    
mvn install

git clone https://github.com/TianLangStudio/DataXServer.git  
cd DataXServer  
mvn clean compile install -DskipTests

Run the HTTP server in single-machine, multi-threaded mode (assumes DataX is already deployed and can run job/test_job.json successfully)

  • Configure the DataX installation directory

Set the datax-home property in pom.xml to the directory where DataX is deployed:

 <datax-home>/data/test/datax</datax-home>
  • Start the HTTP server
 cd httpserver
 mvn scala:run -Dlauncher=httpserver -DskipTests
  • Submit a job and get its task ID
curl -XPOST -d "@<path-to-job-file>" 127.0.0.1:9808/dataxserver/task

tianlang@tianlang:job$ curl -XPOST -d "@job/test_job.json" 127.0.0.1:9808/dataxserver/task
0 (the task ID)

  • Get the task's status, result, and elapsed time
curl  127.0.0.1:9808/dataxserver/task/status/0
curl  127.0.0.1:9808/dataxserver/task/0
curl  127.0.0.1:9808/dataxserver/task/cost/0

(Screenshot: log of a successful run)
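
The curl calls above can also be scripted. The following is a minimal Python sketch, not part of the project: it assumes the server listens on 127.0.0.1:9808 as in the examples, that the POST response body is the plain-text task ID (as in the sample session, which printed `0`), and that the GET endpoints return text. The helper names `task_url`, `submit_task`, and `get_text` are hypothetical. The response formats of the status, result, and cost endpoints are not documented here, so the sketch returns them unparsed.

```python
import urllib.request

# Assumed base URL, copied from the curl examples; adjust host and port as needed.
BASE_URL = "http://127.0.0.1:9808/dataxserver/task"


def task_url(task_id=None, kind=None, base_url=BASE_URL):
    """Build a task endpoint URL.

    With no task_id this is the submit endpoint. With a task_id, kind=None
    gives the result endpoint; kind="status" or kind="cost" gives the others.
    """
    if task_id is None:
        return base_url
    if kind is None:
        return f"{base_url}/{task_id}"
    return f"{base_url}/{kind}/{task_id}"


def submit_task(job_path, base_url=BASE_URL):
    """POST the job file's raw bytes and return the response body as text.

    Assumption: the body is the task ID, as in the curl session above.
    """
    with open(job_path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(base_url, data=body, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8").strip()


def get_text(url):
    """GET a URL and return the response body as stripped text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8").strip()
```

With a server running, `task_id = submit_task("job/test_job.json")` followed by `get_text(task_url(task_id, "status"))` mirrors the first two curl commands. Note that `curl -d "@file"` strips newlines from the file while this sketch sends the bytes unchanged, which should not matter for a JSON job file.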

Single-machine, multi-process mode

  • Configure the DataX installation directory
    Same as in multi-threaded mode.
  • Start the server
  cd hamal-yarn
  mvn scala:run -Dlauncher=httpserver-mp -DskipTests
  • Submit and run jobs the same way as in multi-threaded mode

Multi-machine, multi-process mode (on YARN)

  • Configure the DataX installation directory: in hamal-yarn/src/main/resources/master.conf, set datax.home to the DataX installation directory
  • Build the package
cd hamal-yarn
mvn clean package -DskipTests
  • Upload the artifacts to HDFS: upload hamal-yarn/target/hamal-yarn--with-dependencies.jar to /app/hamal/master.jar and hamal-yarn/target/hamal-yarn--package.zip to /app/hamal/executor.zip
hdfs dfs -put hamal-yarn-*-with-dependencies.jar /app/hamal/master.jar
hdfs dfs -put hamal-yarn-*-package.zip /app/hamal/executor.zip
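  • Run the Master
yarn jar hamal-yarn-*-with-dependencies.jar  org.tianlangstudio.data.hamal.yarn.Client /app/hamal/master.jar

The running Master is visible in the YARN UI.

  • Submit and run jobs the same way as in multi-threaded mode

After a job is submitted, the number of containers increases, and the Master log shows the current number of executors. The maximum number of executors can be configured in master.conf. Setting local.num.max to a non-zero value allows executors to be started on the local machine as well. An executor that has been idle for some time is destroyed automatically.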

(Screenshots: On Yarn Hamal Master; On Yarn Log)
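
The scaling behaviour described above (executors are added as tasks arrive, capped at a configured maximum, and removed after sitting idle) can be illustrated with a toy model. This is only a sketch of the idea, not the project's implementation, which runs on YARN and Akka. The class `ExecutorPool` and the names `max_executors` and `idle_timeout` are hypothetical and do not correspond to actual master.conf keys.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutorPool:
    """Toy model of a capped, idle-reaping executor pool (illustrative only)."""
    max_executors: int    # hypothetical stand-in for the configured maximum
    idle_timeout: float   # hypothetical: seconds an executor may stay idle
    busy: set = field(default_factory=set)
    idle_since: dict = field(default_factory=dict)  # executor id -> time it went idle
    next_id: int = 0

    def size(self):
        """Current number of executors, busy or idle."""
        return len(self.busy) + len(self.idle_since)

    def assign(self):
        """Pick an executor for a new task.

        Reuses an idle executor if there is one, otherwise starts a new one
        while under the cap. Returns None when the task has to wait.
        """
        if self.idle_since:
            ex = next(iter(self.idle_since))
            del self.idle_since[ex]
        elif self.size() < self.max_executors:
            ex = self.next_id
            self.next_id += 1
        else:
            return None
        self.busy.add(ex)
        return ex

    def finish(self, ex, now):
        """Mark an executor idle as of time `now`."""
        self.busy.remove(ex)
        self.idle_since[ex] = now

    def reap(self, now):
        """Remove executors idle for at least idle_timeout; return their ids."""
        expired = [ex for ex, t in self.idle_since.items()
                   if now - t >= self.idle_timeout]
        for ex in expired:
            del self.idle_since[ex]
        return expired
```

In this model a third task submitted to a pool with `max_executors=2` waits until one of the two executors finishes, and a pool with no work shrinks back to zero once every executor has passed the idle timeout.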

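For production use, consider changing the task ID generation strategy, the way submitted tasks are stored, and similar details.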

QA

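  • Compilation fails

Check whether a dependency failed to download; if so, you can install that dependency locally.
You can also try commenting out the recompileMode setting in the pom file.

  • Does DataX need to be installed on every machine in the cluster?

No. You can bundle DataX into the executor deployment zip and put it on HDFS.

  • Do the Executor and the Master communicate over HTTP or Thrift?

Communication between the Executor and the Master is implemented with Akka.

  • Does the number of Executors grow and shrink with the number of tasks?

Yes, but it never exceeds the configured maximum number of Executors.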

Document

TODO

For questions and discussion, join the group:

QQ group: 579896894

KeepLearning QQ

