Comments (7)
This is part of the log from executor node 1:
690,0,0.5850,5.6700\n447,0.28960,0.00,9.690,0,0.5850,5.3900\n448,0.26838,0.00,9.690,0,0.5850,5.7940\n449,0.23912,0.00,9.690,0,0.5850,6.0190\n450,0.17783,0.00,9.690,0,0.5850,5.5690\n451,0.22438,0.00,9.690,0,0.5850,6.0270\n452,0.06263,0.00,11.930,0,0.5730,6.5930\n453,0.04527,0.00,11.930,0,0.5730,6.1200\n454,0.06076,0.00,11.930,0,0.5730,6.9760\n455,0.10959,0.00,11.930,0,0.5730,6.7940\n456,0.04741,0.00,11.930,0,0.5730,6.0300\n" hosts:"127.0.0.1:8185" params:<trainParams:<label:"MEDV" regParam:0.1 alpha:0.1 amplitude:0.0001 accuracy:10 idName:"id" BatchSize:4 > modelParams:<> > , otherParts: [127.0.0.1:8185] module=handler.mpc
INFO[2022-01-02 21:20:53] start ToProcess task of loop, taskId: 1bb43886-4ef4-47c5-a6f0-73ba6fdcfe74 module=monitor.task
INFO[2022-01-02 21:20:53] success send task request to others, taskId: 1bb43886-4ef4-47c5-a6f0-73ba6fdcfe74 module=handler.mpc
INFO[2022-01-02 21:20:53] tasks execution finished of each round end_time="2022-01-02 21:20:53" module=monitor.task task_len=1
INFO[2022-01-02 21:20:53] learner[1bb43886-4ef4-47c5-a6f0-73ba6fdcfe74] finished advance . message MsgPsiEnc loopRound=0 messageRound=0 module=mpc.learners.linear_reg_vl
INFO[2022-01-02 21:20:53] learner[1bb43886-4ef4-47c5-a6f0-73ba6fdcfe74] finished advance . message MsgPsiAskReEnc loopRound=0 messageRound=0 module=mpc.learners.linear_reg_vl
INFO[2022-01-02 21:20:53] learner[1bb43886-4ef4-47c5-a6f0-73ba6fdcfe74] finished advance . message MsgPsiIntersect loopRound=0 messageRound=0 module=mpc.learners.linear_reg_vl
DEBU[2022-01-02 21:21:02] no task found amount=0 module=monitor.task
DEBU[2022-01-02 21:21:12] no task found amount=0
This is the task information:
/home/aistudio/PaddleDTX/dai/executor1
TaskID: 1bb43886-4ef4-47c5-a6f0-73ba6fdcfe74
TaskType: LEARN
TaskName: 房价预测任务v4
Description: hahahha
TaskStatus: Failed
PublishTime: 2022-01-02 21:09:01
TaskID: ee2302a8-cbb4-42a8-b3ce-1d4c0f2bd649
TaskType: LEARN
TaskName: 房价预测任务v4
Description: hahahha
TaskStatus: Confirming
PublishTime: 2022-01-02 21:04:11
TaskID: 2851c8cf-ea0d-4fdf-a80a-893704c2e993
TaskType: LEARN
TaskName: 房价预测任务v3
Description: hahahha
TaskStatus: Failed
PublishTime: 2022-01-02 20:16:31
TaskID: 6ea3193b-e626-4c80-a8da-f34a7dcee197
TaskType: LEARN
TaskName: 房价预测任务v3
Description: hahahha
TaskStatus: Failed
PublishTime: 2022-01-02 14:36:14
taskNum : 4
from paddledtx.
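# Start training task 1. It seems only one node needs to start it here?
# Once all executor nodes have confirmed the task, the requester must trigger the start command. The result of a training task is a prediction model.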
%cd ~/PaddleDTX/dai/executor1
!./requester-cli task start --id 1bb43886-4ef4-47c5-a6f0-73ba6fdcfe74 \
-k eae7344064e1d5b53af6da1a23407b1e7e265d15eaf0442c476e3caac3003406 \
--conf ./conf/config.toml
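@skywalk163 Publishing, confirming and starting the task all look fine. If the task fails to execute, you can view the detailed error log with ./requester-cli task getbyid xxx. Also, please make sure the executor nodes are configured with a valid publicAddress.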
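Two questions:
Question 1: Is this the correct flow for publishing, confirming and starting a task?
1 A single node publishes the task
2 Both nodes confirm it
3 A single node starts the task
Question 2: When uploading the Boston training file, I see that ~/PaddleDTX/dai/mpc/testdata/vl/linear_boston_housing/train_dataA.csv
has 457 lines in total, counting the first line. What number should be given when submitting the file? The all-in-one docker code uses 456, while the client tools chapter of the manual uses 457.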
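@skywalk163
Question 1: The flow is correct: 1 the requester publishes the task, 2 the executors (multiple parties) confirm the task, 3 the requester starts the task, 4 the executors (interacting with each other) run the training or prediction task.
Question 2: Follow the client manual. We will update the file upload test script; writing 457 lines is fine.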
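The 456 versus 457 discrepancy comes down to whether the header line is counted. A minimal sketch for checking a CSV file, assuming its first line is a header; `count_csv_lines` is a hypothetical helper, and the in-memory sample with made-up values only stands in for train_dataA.csv:

```python
import csv
import io

def count_csv_lines(stream):
    """Return (total_lines, data_rows) for a CSV stream whose first line is a header."""
    total = sum(1 for _ in csv.reader(stream))
    return total, max(total - 1, 0)

# Tiny in-memory stand-in for the real file: 1 header line + 3 data rows (made-up values).
sample = io.StringIO("id,CRIM,ZN\n1,0.1,0.0\n2,0.2,0.0\n3,0.3,0.0\n")
total, data_rows = count_csv_lines(sample)
print(total, data_rows)  # 4 3
```

For the real file, pass `open(path, newline="")` instead of the `StringIO` object. If the file is as described above, it should report 457 total lines and 456 data rows.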
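After some debugging, when I start the task and then check its status, it now shows Failed. The error message has changed, and at least it is now clear that something went wrong.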
module=monitor.task
INFO[2022-01-06 10:10:53] success update task status into chain, taskId: b5f44aae-b1ba-48f1-91c7-4236fba41649 module=handler.mpc
INFO[2022-01-06 10:10:53] task deleted module=mpc.trainer taskId=b5f44aae-b1ba-48f1-91c7-4236fba41649
DEBU[2022-01-06 10:10:53] stop mpc task, taskId: b5f44aae-b1ba-48f1-91c7-4236fba41649 module=handler.mpc
ERRO[2022-01-06 10:10:53] error occurred when task start prepare, and taskId: b5f44aae-b1ba-48f1-91c7-4236fba41649 error="{\"code\":\"XDAT0002\",\"message\":\"invalid addr: parse \\\"127.0.0.1:8121\\\": first path segment in URL cannot contain colon\"}" module=monitor.task
INFO[2022-01-06 10:10:53] tasks execution finished of each round end_time="2022-01-06 10:10:53" module=monitor.task task_len=1
TaskID: b5f44aae-b1ba-48f1-91c7-4236fba41649
Requester: 6cb69efc0439032b0d0f52bae1c9aada3f8fb46a5f24fa99065910055b77a1174d4afbac3c0529c8927587bb0e2ad90a85eaa600cfddd6b99f1212112135ef2b
TaskType: train
TaskName: 房价预测任务v1
Description: 用飞桨,划时代
Label: MEDV
LabelName:
RegMode:
RegParam: 0.1
Algorithm: linear-vl
Alpha: 0.100000
Amplitude: 0.000100
Accuracy: 10
ModelTaskID:
Status: Failed
PublishTime: 2022-01-06 09:35:56
Task data sets:
DataID: 80b31197-1ae1-4a13-97e4-b1e245702486
Owner: 4637ef79f14b036ced59b76408b0d88453ac9e5baa523a86890aa547eac3e3a0f4a3c005178f021c1b060d916f42082c18e1d57505cdaaeef106729e6442f4e5
Address: 127.0.0.1:8181
PSILabel: id
ConfirmedAt: 2022-01-06 09:40:39
RejectedAt:
DataID: 5f58f937-b23f-4d76-af22-9d8fd2c40d63
Owner: e4530d81ccddc478978070e8f9fcc9f101dfc3b5c3ca1519c522c5e9698f394a35aab9145f242765185689a64b7338e9929c6a32e09050ff15645bb121ce1754
Address: 127.0.0.1:8182
PSILabel: id
ConfirmedAt: 2022-01-06 09:41:36
RejectedAt:
StartTime: 2022-01-06 10:10:53
EndTime: 2022-01-06 10:10:53
ErrMessage: {"code":"XDAT0002","message":"invalid addr: parse \"127.0.0.1:8121\": first path segment in URL cannot contain colon"}
Result:
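The ErrMessage field is a JSON object, so the error code and the offending address can be pulled out programmatically. A minimal sketch using only the Python standard library; the string is copied from the task output above, and `extract_bad_addr` is a hypothetical helper, not part of PaddleDTX:

```python
import json
import re

# ErrMessage value copied from the task output (the raw string keeps the \" escapes intact).
ERR_MESSAGE = r'{"code":"XDAT0002","message":"invalid addr: parse \"127.0.0.1:8121\": first path segment in URL cannot contain colon"}'

def extract_bad_addr(err_message):
    """Return (code, address) from an 'invalid addr' ErrMessage; address is None if not found."""
    err = json.loads(err_message)
    match = re.search(r'parse "([^"]+)"', err.get("message", ""))
    return err.get("code"), (match.group(1) if match else None)

code, addr = extract_bad_addr(ERR_MESSAGE)
print(code, addr)  # XDAT0002 127.0.0.1:8121
```

The wording `first path segment in URL cannot contain colon` matches the error that Go's `net/url` parser returns for an address such as `127.0.0.1:8121` that carries no scheme.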
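Solved!
For the error:
ErrMessage: {"code":"XDAT0002","message":"invalid addr: parse \"127.0.0.1:8121\": first path segment in URL cannot contain colon"}
Result:
Solution:
When setting the data owner node in the executor node's configuration, the address must include the http prefix:
host = "http://127.0.0.1:8121"
None of the other settings need the http prefix, but this one in the executor node does, so it is easy to forget it and trigger this error.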
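A small pre-flight check can catch a missing scheme before the task is started. A minimal sketch, assuming the address is available as a plain string; `ensure_http_scheme` is a hypothetical helper, and defaulting to `http://` rather than `https://` is an assumption based on the fix above:

```python
def ensure_http_scheme(host, default_scheme="http"):
    """Return host unchanged if it already has an http(s) scheme, else prepend one."""
    host = host.strip()
    if host.lower().startswith(("http://", "https://")):
        return host
    return f"{default_scheme}://{host}"

print(ensure_http_scheme("127.0.0.1:8121"))         # http://127.0.0.1:8121
print(ensure_http_scheme("http://127.0.0.1:8121"))  # http://127.0.0.1:8121
```

This only normalizes the one value that needs the prefix; the other addresses in the configuration should stay as bare host:port, as noted above.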