weiye-jing / datax-web

A visual web UI for DataX: select data sources and generate a data synchronization task with one click. Supports RDBMS, Hive, HBase, ClickHouse, MongoDB and other data sources, batch creation of RDBMS sync tasks, and integration with an open-source scheduler. Features include distributed execution, incremental synchronization, real-time run logs, executor resource monitoring, killing running processes, and data source credential encryption.

Home Page: https://segmentfault.com/u/weiye_jing/articles

License: MIT License

Java 95.57% HTML 0.59% CSS 0.02% Shell 3.83%

datax-web's People

Contributors

alecor-sudo, binaryworld, fantasticke, kyofin, leigao-dev, liukunyuan, lw309637554, shijieqin, sqdf1990, sufism, waterwang, weiye-jing, wuchase, zhouhongfa

datax-web's Issues

where and postSql parameters

The where and postSql parameters are used quite often during synchronization; please add option fields for where and postSql in the task builder.

Should the edit-user logic be changed?

1. The "Password" field displays the stored ciphertext, and BCryptPasswordEncoder is a one-way hash that cannot be decrypted; showing the plaintext would require switching to a reversible encryption scheme.
2. Alternatively, change the logic so that the "Password" field is left blank and the user enters a new plaintext password.
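
A minimal sketch of why the stored value cannot be shown in the form: Spring Security's BCryptPasswordEncoder only supports one-way hashing and verification, so the edit page can only blank the field and re-hash a newly entered password. The update flow below is an assumption for illustration, not the project's actual code.

import org.springframework.security.crypto.bcrypt.BCryptPasswordEncoder;

public class PasswordUpdateSketch {

    private static final BCryptPasswordEncoder ENCODER = new BCryptPasswordEncoder();

    // Hypothetical update rule: the form submits null/empty when the password is unchanged.
    public static String resolveStoredPassword(String submittedPlaintext, String currentHash) {
        if (submittedPlaintext == null || submittedPlaintext.isEmpty()) {
            return currentHash;                    // keep the existing BCrypt hash
        }
        return ENCODER.encode(submittedPlaintext); // re-hash the new plaintext
    }

    public static void main(String[] args) {
        String hash = ENCODER.encode("secret");
        System.out.println(ENCODER.matches("secret", hash)); // true: verification works
        // There is no decode(): the original plaintext cannot be recovered from the hash.
    }
}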

Executor server resource usage

Consider writing your own RPC gateway and load balancing. If a single DataX job's executor is not split across multiple execution servers, the remaining memory on each executor server needs to be monitored, otherwise jobs already running on that machine will hit OOM. One option is to replace xxl's RPC with Ant's open-source SOFA-RPC, add a custom gateway and load balancer, and stop pushing jobs to an executor once its memory reaches a threshold, deferring them instead.
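
A minimal sketch of the memory-threshold idea: before an executor accepts a pushed job, check the machine's free physical memory via the JDK's com.sun.management.OperatingSystemMXBean and defer the job when it falls below a configurable threshold. The class name and threshold value are assumptions for illustration, not datax-web's actual executor code.

import java.lang.management.ManagementFactory;

public class ExecutorMemoryGuard {

    // Hypothetical threshold: defer new jobs when less than 2 GB of physical memory is free.
    private static final long MIN_FREE_BYTES = 2L * 1024 * 1024 * 1024;

    public static boolean canAcceptJob() {
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        return os.getFreePhysicalMemorySize() >= MIN_FREE_BYTES;
    }

    public static void main(String[] args) {
        if (!canAcceptJob()) {
            // A real gateway would push the job back into a delay queue instead of running it now.
            System.out.println("Memory below threshold, deferring job");
        } else {
            System.out.println("Enough memory, job can be dispatched");
        }
    }
}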

User info lookup API

In user management, clicking "Edit" should make the front end query the user from the back end by userId. At the moment the front end appears to take the user info directly from pageList.
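
A minimal sketch of the suggested lookup, assuming a Spring MVC controller; the endpoint path, service interface, and JobUser type below are hypothetical and only illustrate fetching the record by userId instead of reusing the pageList rows.

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class UserQueryController {

    // Hypothetical service; the real project has its own user mapper/service classes.
    public interface UserService {
        JobUser findById(int userId);
    }

    public static class JobUser {
        public int id;
        public String username;
        public String role;
    }

    private final UserService userService;

    public UserQueryController(UserService userService) {
        this.userService = userService;
    }

    // The edit dialog would call this with the row's userId instead of reading the record from pageList.
    @GetMapping("/api/user/{id}")
    public JobUser getUser(@PathVariable("id") int id) {
        return userService.findById(id);
    }
}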

DataX transformer question

Official docs: dx_groovy can only be called once, not multiple times (https://github.com/alibaba/DataX/blob/master/transformer/doc/transformer.md).
Given that it can only be called once, what happens when the job runs on a schedule? How did you solve this?

JSON builder step 4: improve template selection

Currently the user has to click the text "Select template (steps: build -> select template -> next)"; it would be clearer to prompt the user with a button plus a short explanation.

Back end throws a NullPointerException when running a non-DataX task

JobInfo(id=7, jobGroup=1, jobCron=0 0 2 1/1 * ? , jobDesc=hive-sql-test, addTime=Wed Mar 25 14:30:24 CST 2020, updateTime=Wed Mar 25 14:54:49 CST 2020, author=qijun, alarmEmail=, executorRouteStrategy=ROUND, executorHandler=, executorParam=, executorBlockStrategy=SERIAL_EXECUTION, executorTimeout=0, executorFailRetryCount=0, glueType=GLUE_SHELL, glueSource=yesterday=date -d '-1 day' '+%Y-%m-%d'
echo $yesterday, glueRemark=null, glueUpdatetime=Wed Mar 25 14:54:49 CST 2020, childJobId=, triggerStatus=0, triggerLastTime=0, triggerNextTime=0, jobJson=null, replaceParam=, jvmParam=, incStartTime=null, partitionInfo=, lastHandleCode=0)

Fetching the job info by jobId shows that the jobJson field is null.
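
A minimal sketch of a guard for this case, assuming that only DataX-type jobs should carry a jobJson; the glue-type constant and method below are hypothetical and only illustrate skipping the DataX-specific handling for GLUE_SHELL and other non-DataX glue types.

public class JobJsonGuardSketch {

    // Hypothetical glue-type constant; datax-web defines its own glue-type enum.
    static final String GLUE_TYPE_DATAX = "DATAX";

    // Returns the DataX payload for a job, or null for non-DataX jobs, so callers never
    // dereference a missing jobJson (the current source of the NullPointerException).
    static String resolveJobJson(String glueType, String jobJson) {
        if (!GLUE_TYPE_DATAX.equals(glueType)) {
            return null; // shell/Groovy task: no jobJson expected, skip DataX-specific handling
        }
        if (jobJson == null || jobJson.trim().isEmpty()) {
            throw new IllegalStateException("DataX job has no jobJson configured");
        }
        return jobJson;
    }

    public static void main(String[] args) {
        System.out.println(resolveJobJson("GLUE_SHELL", null)); // prints null instead of throwing
    }
}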

Executor startup error: Unparseable date: "callbacklog"

java.text.ParseException: Unparseable date: "callbacklog"
at java.text.DateFormat.parse(DateFormat.java:366)
at com.wugui.datatx.core.thread.JobLogFileCleanThread.lambda$start$0(JobLogFileCleanThread.java:60)
at java.lang.Thread.run(Thread.java:748)
13:21:47.780 exe [datax-web, executor JobLogFileCleanThread] ERROR c.w.d.c.t.JobLogFileCleanThread - Unparseable date: "datax-executor.log"
java.text.ParseException: Unparseable date: "datax-executor.log"
at java.text.DateFormat.parse(DateFormat.java:366)
at com.wugui.datatx.core.thread.JobLogFileCleanThread.lambda$start$0(JobLogFileCleanThread.java:60)
at java.lang.Thread.run(Thread.java:748)
13:21:47.781 exe [datax-web, executor JobLogFileCleanThread] ERROR c.w.d.c.t.JobLogFileCleanThread - Unparseable date: "gluesource"
java.text.ParseException: Unparseable date: "gluesource"
at java.text.DateFormat.parse(DateFormat.java:366)
at com.wugui.datatx.core.thread.JobLogFileCleanThread.lambda$start$0(JobLogFileCleanThread.java:60)
at java.lang.Thread.run(Thread.java:748)
13:21:47.781 exe [datax-web, executor JobLogFileCleanThread] ERROR c.w.d.c.t.JobLogFileCleanThread - Unparseable date: "processcallbacklog"
java.text.ParseException: Unparseable date: "processcallbacklog"
at java.text.DateFormat.parse(DateFormat.java:366)
at com.wugui.datatx.core.thread.JobLogFileCleanThread.lambda$start$0(JobLogFileCleanThread.java:60)
at java.lang.Thread.run(Thread.java:748)
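
The stack trace suggests that JobLogFileCleanThread tries to parse every entry under the executor's log directory as a date, so non-date entries such as gluesource, callbacklog, and datax-executor.log raise ParseException. Below is a minimal sketch of one possible fix, assuming log folders named yyyy-MM-dd; the directory layout, retention argument, and method names are assumptions, not the project's actual code.

import java.io.File;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.regex.Pattern;

public class LogDirCleanSketch {

    // Only directories named like 2020-03-25 are candidates for cleanup.
    private static final Pattern DATE_NAME = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    public static void cleanExpired(File logBaseDir, long keepMillis) {
        File[] children = logBaseDir.listFiles();
        if (children == null) {
            return;
        }
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
        long now = System.currentTimeMillis();
        for (File child : children) {
            // Skip gluesource/, callbacklog files, datax-executor.log, and anything else non-dated.
            if (!child.isDirectory() || !DATE_NAME.matcher(child.getName()).matches()) {
                continue;
            }
            try {
                Date logDate = fmt.parse(child.getName());
                if (now - logDate.getTime() > keepMillis) {
                    deleteRecursively(child);
                }
            } catch (ParseException ignore) {
                // Name matched the pattern but still failed to parse; leave it alone.
            }
        }
    }

    private static void deleteRecursively(File f) {
        File[] sub = f.listFiles();
        if (sub != null) {
            for (File s : sub) {
                deleteRecursively(s);
            }
        }
        f.delete();
    }
}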

Front-end forms do not cache their data

For example, I first open the "DataX JSON builder" module, then click into another module, and when I come back to the "datax-json" module the configuration progress I had already made is gone. That is not reasonable.

Make Oracle data source configuration more flexible

For example, the dedicated user used for incremental sync has been granted privileges on another user's schema; during synchronization the dedicated user should be the one connecting, and the UI should allow selecting the tables that this user has privileges on under the specified schema.

Intermittent error: XxlRpcException: xxl-rpc, request timeout

Frequency: intermittent (less than 1% of runs)
Error message from the alert email:
msg:com.wugui.datax.rpc.util.XxlRpcException: xxl-rpc, request timeout at:1585273504252, request:XxlRpcRequest{requestId='761c6e38-1401-4a1c-b345-01ca66684054', createMillisTime=1585273501247, accessToken='', className='com.wugui.datatx.core.biz.ExecutorBiz', methodName='run', parameterTypes=[class com.wugui.datatx.core.biz.model.TriggerParam], parameters=[TriggerParam{jobId=152, executorHandler='executorJobHandler', executorParams='', executorBlockStrategy='DISCARD_LATER', executorTimeout=0, logId=2551329, logDateTime=1585273501000, glueType='BEAN', glueSource='null', glueUpdatetime=1582114646000, broadcastIndex=0, broadcastTotal=1, jobJson={ "job": { "setting": { "speed": { "channel": 16 } }, "content": [ { "reader": { "name": "mysqlreader", "parameter": { "username": "ods_readonly", "password": "IEOIXNo3QCKDbI4n1VYdYpSlGqGfmR3d", "connection": [ { "jdbcUrl": [ "jdbc:mysql://rm-2zen18ooiq4wsn0rj.mysql.rds.aliyuncs.com:3306/trade_hebei_prod?serverTimezone=Asia/Shanghai&useLegacyDatetimeCode=false&useSSL=false&nullNamePatternMatchesAll=true&useUnicode=true&characterEncoding=UTF-8" ], "querySql": [ "select id,created_by_id,created_time,updated_by_id,updated_time,data_flag,goods_47_kind_id,replace_kind_id,hxProductId,medicalAreaId,pack_convert_factor,replace_goods_id,replace_type,stand_spec_mg,superviseId,medicalConfigId,batch_type from p_goods_47_kind_replace_relation where (created_time >= FROM_UNIXTIME(${lastTime}-60) and created_time < FROM_UNIXTIME(${currentTime})) or (updated_time >= FROM_UNIXTIME(${lastTime}-60) and updated_time < FROM_UNIXTIME(${currentTime}))" ] } ] } }, "writer": { "name": "kafkawriter", "parameter": { "ack": "all", "batchSize": 2000000, "bootstrapServers": "192.168.1.246:9092,192.168.1.249:9092,192.168.1.248:9092", "fieldDelimiter": "#", "keySerializer": "org.apache.kafka.common.serialization.StringSerializer", "retries": 0, "transactionalId": "hb_trd_p_goods_47_kind_replace_relation_prod_trans_id", "topic": "hb_trd_p_goods_47_kind_replace_relation_prod_1", "writeType": "csv", "valueSerializer": "org.apache.kafka.common.serialization.StringSerializer", "columns": [ { "name": "id", "type": "Long" } ] } } } ] } }, processId=null, replaceParam=-DlastTime='%s' -DcurrentTime='%s', jvmParam=, startTime=Fri Mar 27 09:30:01 CST 2020, triggerTime=Fri Mar 27 09:45:01 CST 2020}], version='null'} at com.wugui.datax.rpc.remoting.net.params.XxlRpcFutureResponse.get(XxlRpcFutureResponse.java:120) at com.wugui.datax.rpc.remoting.invoker.reference.XxlRpcReferenceBean$1.invoke(XxlRpcReferenceBean.java:242) at com.sun.proxy.$Proxy164.run(Unknown Source) at com.wugui.datax.admin.core.trigger.JobTrigger.runExecutor(JobTrigger.java:204) at com.wugui.datax.admin.core.trigger.JobTrigger.processTrigger(JobTrigger.java:156) at com.wugui.datax.admin.core.trigger.JobTrigger.trigger(JobTrigger.java:73) at com.wugui.datax.admin.core.thread.JobTriggerPoolHelper.lambda$addTrigger$0(JobTriggerPoolHelper.java:88) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

Backslash escaping problem in the JSON builder

In the DataX JSON builder, if you enter \t the generated JSON ends up containing \t and the task fails when executed; if you enter a plain t instead, the escaped form is missing from the generated JSON altogether.
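
A minimal illustration of the escaping rule involved, using Jackson as an assumed JSON library (the builder may use something else): a real tab character must be emitted as \t inside a JSON string, and the two literal characters backslash and t must be emitted as \\t, so the builder has to escape user input before splicing it into the generated JSON rather than inserting it verbatim.

import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Collections;

public class JsonEscapeSketch {

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        // A real tab character is serialized as \t inside the JSON string.
        System.out.println(mapper.writeValueAsString(
                Collections.singletonMap("fieldDelimiter", "\t")));
        // -> {"fieldDelimiter":"\t"}

        // The two literal characters backslash + t are serialized as \\t.
        System.out.println(mapper.writeValueAsString(
                Collections.singletonMap("fieldDelimiter", "\\t")));
        // -> {"fieldDelimiter":"\\t"}
    }
}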

Encrypt the MySQL credentials in the metadata store

Data sources and job JSON hold a large number of account credentials, so the credentials for the metadata database must not be stored in plaintext.
Encryption/decryption class: JasyptUtil
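
A minimal sketch of symmetric credential encryption with the Jasypt library that a JasyptUtil helper would presumably wrap; the secret key handling below is a placeholder assumption, and in practice the key should come from an environment variable or startup parameter rather than the codebase.

import org.jasypt.util.text.BasicTextEncryptor;

public class CredentialEncryptionSketch {

    public static void main(String[] args) {
        BasicTextEncryptor encryptor = new BasicTextEncryptor();
        // Placeholder key: read it from an env var or JVM argument, never hard-code it.
        encryptor.setPassword(System.getenv().getOrDefault("DATAX_SECRET_KEY", "change-me"));

        String cipherText = encryptor.encrypt("my-db-password");
        System.out.println("stored in metadata DB: " + cipherText);

        String plainText = encryptor.decrypt(cipherText);
        System.out.println("used at runtime: " + plainText);
    }
}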

About the incremental field

Hello, some tables do not contain a timestamp column, so incremental sync needs to substitute the auto-increment primary key into the script instead. I plan to implement this and submit a PR to the repository; what do you think of the idea? It may involve changing some table structures and the related code.
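
A minimal sketch of the substitution this feature would need, assuming hypothetical ${lastId} and ${maxId} placeholders analogous to the existing ${lastTime}/${currentTime} ones; the placeholder names, SQL, and method are illustrative only, not the project's implementation.

public class IdIncrementSketch {

    // Replace hypothetical id placeholders in a querySql template before handing it to DataX.
    static String render(String querySqlTemplate, long lastSyncedId, long currentMaxId) {
        return querySqlTemplate
                .replace("${lastId}", String.valueOf(lastSyncedId))
                .replace("${maxId}", String.valueOf(currentMaxId));
    }

    public static void main(String[] args) {
        String template = "select id, name, updated_time from t_order where id > ${lastId} and id <= ${maxId}";
        // After a successful run, ${maxId} would be persisted as the new ${lastId} for the next trigger.
        System.out.println(render(template, 10_000L, 12_500L));
    }
}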

Add a task timeout setting

Add a task timeout setting that kills any DataX process running longer than the configured time. Combined with the retry mechanism, this avoids runs where a network problem leaves the reader unable to fetch data.
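
A minimal sketch of the kill-on-timeout behavior for a forked DataX process, using the standard java.lang.Process API; the command line and timeout value are placeholders, and handing the failure to the existing retry mechanism is only hinted at in a comment.

import java.util.concurrent.TimeUnit;

public class DataxTimeoutSketch {

    public static void main(String[] args) throws Exception {
        // Placeholder command; the real executor builds the full python datax.py invocation.
        ProcessBuilder pb = new ProcessBuilder("python", "datax.py", "job.json").inheritIO();
        Process process = pb.start();

        long timeoutSeconds = 3600; // would come from the job's executorTimeout configuration
        boolean finished = process.waitFor(timeoutSeconds, TimeUnit.SECONDS);

        if (!finished) {
            process.destroyForcibly(); // kill the hung DataX process
            process.waitFor();         // wait for the kill to take effect
            // Mark the run as failed here so the existing fail-retry mechanism can re-trigger it.
        }
    }
}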

Wanted: Who is using DataX Web

  • Our sincere thanks to everyone who keeps following and using DataX Web. We will keep investing in it, making DataX Web better and the data integration community and ecosystem more prosperous.

Purpose of this issue

  1. Listen to the community and make DataX Web better
  2. Attract more contributors
  3. Learn more about real-world DataX Web use cases to help plan the next steps

What we would like you to provide

  1. Submit a comment on this issue including the items below:
  2. Your company, school, or organization, and its homepage
  3. Your city and country
  4. Your contact info: Weibo, email, or WeChat (at least one)
  5. The business scenarios you use DataX Web for

You can use the sample below as a template:

* Organization: individual, https://github.com/WeiYe-Jing/datax-web
* Location: **Suzhou
* Contact: [email protected]
* Scenario: using DataX for heterogeneous data source synchronization; building the job JSON just by selecting data sources has greatly improved sync efficiency.

Thanks again for participating! Your support is what keeps us moving forward!
The DataX Web community

DataX JSON builder issue

In step 3 of the DataX JSON builder you fill in the field mapping, but the values you enter are not used when the JSON is built; instead the mapping follows the order in which fields were checked in steps 1 and 2.
For example, in step 1 I checked (name, pk, age) and in step 2 I checked (pk, username, userage), and in step 3 I entered this mapping:
[
  {
    "src": [
      { "name": "name" },
      { "name": "pk" },
      { "name": "age" }
    ],
    "des": [
      { "name": "username" },
      { "name": "pk" },
      { "name": "userage" }
    ]
  }
]
The generated JSON still maps the fields incorrectly.

Specified key was too long; max key length is 767 bytes

The job_registry table uses the utf8mb4 character set, so the composite index i_g_k_v exceeds the byte-size limit. Suggest changing the CREATE TABLE statement to size registry_key and registry_value at 191 characters: utf8mb4 uses up to 4 bytes per character, so varchar(191) keeps each indexed column at 764 bytes, within InnoDB's 767-byte index prefix limit, while varchar(255) would exceed it.

DROP TABLE IF EXISTS `job_registry`;
CREATE TABLE `job_registry`  (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `registry_group` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
  `registry_key` varchar(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
  `registry_value` varchar(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
  `update_time` datetime(0) NULL DEFAULT NULL,
  PRIMARY KEY (`id`) USING BTREE,
  INDEX `i_g_k_v`(`registry_group`, `registry_key`, `registry_value`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 26 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;

Doesn't the reader plugin support preSql and postSql?

e.g. requirement: before reading the data, update a status column on the source table; after the sync finishes, update the status again.
Steps:

1. update @table set status=1 where status=0  -- source table
2. select * from @table where status=1        -- insert into the target table
3. update @table set status=2 where status=1  -- postSql on the source table

Allow adjusting field order

When mapping table fields, the order of source fields and target fields may not match; suggest adding buttons to reorder fields so that the mapping is easier to adjust.
