

skywalk163 commented on June 2, 2024

Here is part of the log from executor node 1:

```
690,0,0.5850,5.6700\n447,0.28960,0.00,9.690,0,0.5850,5.3900\n448,0.26838,0.00,9.690,0,0.5850,5.7940\n449,0.23912,0.00,9.690,0,0.5850,6.0190\n450,0.17783,0.00,9.690,0,0.5850,5.5690\n451,0.22438,0.00,9.690,0,0.5850,6.0270\n452,0.06263,0.00,11.930,0,0.5730,6.5930\n453,0.04527,0.00,11.930,0,0.5730,6.1200\n454,0.06076,0.00,11.930,0,0.5730,6.9760\n455,0.10959,0.00,11.930,0,0.5730,6.7940\n456,0.04741,0.00,11.930,0,0.5730,6.0300\n" hosts:"127.0.0.1:8185" params:<trainParams:<label:"MEDV" regParam:0.1 alpha:0.1 amplitude:0.0001 accuracy:10 idName:"id" BatchSize:4 > modelParams:<> > , otherParts: [127.0.0.1:8185]  module=handler.mpc
INFO[2022-01-02 21:20:53] start ToProcess task of loop, taskId: 1bb43886-4ef4-47c5-a6f0-73ba6fdcfe74  module=monitor.task
INFO[2022-01-02 21:20:53] success send task request to others, taskId: 1bb43886-4ef4-47c5-a6f0-73ba6fdcfe74   module=handler.mpc
INFO[2022-01-02 21:20:53] tasks execution finished of each round        end_time="2022-01-02 21:20:53" module=monitor.task task_len=1
INFO[2022-01-02 21:20:53] learner[1bb43886-4ef4-47c5-a6f0-73ba6fdcfe74] finished advance . message MsgPsiEnc  loopRound=0 messageRound=0 module=mpc.learners.linear_reg_vl
INFO[2022-01-02 21:20:53] learner[1bb43886-4ef4-47c5-a6f0-73ba6fdcfe74] finished advance . message MsgPsiAskReEnc  loopRound=0 messageRound=0 module=mpc.learners.linear_reg_vl
INFO[2022-01-02 21:20:53] learner[1bb43886-4ef4-47c5-a6f0-73ba6fdcfe74] finished advance . message MsgPsiIntersect  loopRound=0 messageRound=0 module=mpc.learners.linear_reg_vl
DEBU[2022-01-02 21:21:02] no task found                                 amount=0 module=monitor.task
DEBU[2022-01-02 21:21:12] no task found                                 amount=0
```

Here is the task information:

```
/home/aistudio/PaddleDTX/dai/executor1
TaskID: 1bb43886-4ef4-47c5-a6f0-73ba6fdcfe74
TaskType: LEARN
TaskName: House Price Prediction Task v4
Description: hahahha
TaskStatus: Failed
PublishTime: 2022-01-02 21:09:01

TaskID: ee2302a8-cbb4-42a8-b3ce-1d4c0f2bd649
TaskType: LEARN
TaskName: House Price Prediction Task v4
Description: hahahha
TaskStatus: Confirming
PublishTime: 2022-01-02 21:04:11

TaskID: 2851c8cf-ea0d-4fdf-a80a-893704c2e993
TaskType: LEARN
TaskName: House Price Prediction Task v3
Description: hahahha
TaskStatus: Failed
PublishTime: 2022-01-02 20:16:31

TaskID: 6ea3193b-e626-4c80-a8da-f34a7dcee197
TaskType: LEARN
TaskName: House Price Prediction Task v3
Description: hahahha
TaskStatus: Failed
PublishTime: 2022-01-02 14:36:14

taskNum : 4
```

from paddledtx.

skywalk163 commented on June 2, 2024

```
# Start training task 1. It seems only one node needs to start it here?

# Once every task executor node has confirmed the task, the computation requester must trigger the start command. A training task produces a prediction model.
%cd ~/PaddleDTX/dai/executor1
!./requester-cli task start --id 1bb43886-4ef4-47c5-a6f0-73ba6fdcfe74 \
-k eae7344064e1d5b53af6da1a23407b1e7e265d15eaf0442c476e3caac3003406 \
--conf ./conf/config.toml
```


hongyanwang commented on June 2, 2024

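@skywalk163 Publishing, confirming, and starting the task all look fine. If task execution fails, you can view the specific error log with `./requester-cli task getbyid xxx`. Also, make sure the executor nodes are configured with a valid publicAddress.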


skywalk163 commented on June 2, 2024

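I have two questions:

Question 1: Is this the correct publish/confirm/start workflow?
1 A single node publishes the task
2 Both nodes confirm it
3 A single node starts the task

Question 2: When uploading the Boston housing training file, I see that ~/PaddleDTX/dai/mpc/testdata/vl/linear_boston_housing/train_dataA.csv has 457 lines in total, counting the first line. What number should be entered when submitting the file? The Docker all-in-one code uses 456, while the client tools chapter of the manual uses 457.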
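To check the count for your own copy of the file, here is a minimal Go sketch (Go because PaddleDTX itself is written in Go) that reports both the total number of lines and the number of data rows, assuming exactly one header line. `rowCounts` is a hypothetical helper, not part of PaddleDTX, and it does not decide which of the two numbers the upload step expects.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// rowCounts returns the number of lines in content and the number of data
// rows, assuming exactly one header line. A last line without a trailing
// newline is still counted, so the total can exceed `wc -l` by one.
func rowCounts(content string) (total, data int, err error) {
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		total++
	}
	if err = sc.Err(); err != nil {
		return 0, 0, err
	}
	if total > 0 {
		data = total - 1 // exclude the header line
	}
	return total, data, nil
}

func main() {
	// Path taken from the question; adjust it to your own checkout.
	path := os.ExpandEnv("$HOME/PaddleDTX/dai/mpc/testdata/vl/linear_boston_housing/train_dataA.csv")
	raw, err := os.ReadFile(path)
	if err != nil {
		fmt.Println("cannot read file:", err)
		return
	}
	total, data, err := rowCounts(string(raw))
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	fmt.Printf("lines including header: %d, data rows: %d\n", total, data)
}
```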


MyYuan commented on June 2, 2024

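@skywalk163
Question 1: The workflow is correct: 1 the computation requester publishes the task; 2 the computation executors (multiple parties) confirm the task; 3 the computation requester starts the task; 4 the computation executors (interacting across parties) run the training or prediction.
Question 2: Follow the client manual. We will update the file-upload test script; entering 457 is correct.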
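The four-step flow can be modeled as a small state machine. This is a hypothetical sketch: the `Stage` names and the `Task` type are illustrative and do not correspond to PaddleDTX's internal task statuses or APIs (the CLI output in this thread shows statuses such as `Confirming` and `Failed`). It only encodes the rule that the requester's start command is valid once every executor has confirmed.

```go
package main

import (
	"errors"
	"fmt"
)

// Stage is a hypothetical label for where a task is in the four-step flow.
// These are NOT PaddleDTX's real task statuses.
type Stage int

const (
	Published Stage = iota // step 1: the requester has published the task
	Confirmed              // step 2: every executor has confirmed it
	Started                // step 3: the requester has issued the start command
	Running                // step 4: executors interact to train or predict
)

// Task tracks which executors have confirmed a published task.
type Task struct {
	Stage     Stage
	executors map[string]bool // executor name -> has confirmed
}

// NewTask publishes a task that the listed executors must confirm.
func NewTask(executors ...string) *Task {
	m := make(map[string]bool, len(executors))
	for _, e := range executors {
		m[e] = false
	}
	return &Task{Stage: Published, executors: m}
}

// Confirm records one executor's confirmation; the task moves to Confirmed
// only once every listed executor has confirmed.
func (t *Task) Confirm(executor string) error {
	if t.Stage != Published {
		return errors.New("task is not awaiting confirmation")
	}
	if _, ok := t.executors[executor]; !ok {
		return fmt.Errorf("%q is not an executor of this task", executor)
	}
	t.executors[executor] = true
	for _, done := range t.executors {
		if !done {
			return nil // still waiting for other executors
		}
	}
	t.Stage = Confirmed
	return nil
}

// Start is issued by the requester and is only valid after all confirmations.
func (t *Task) Start() error {
	if t.Stage != Confirmed {
		return errors.New("start requires confirmation from every executor")
	}
	t.Stage = Started
	return nil
}

// Run marks the point where executors begin multi-party training or prediction.
func (t *Task) Run() error {
	if t.Stage != Started {
		return errors.New("task has not been started by the requester")
	}
	t.Stage = Running
	return nil
}

func main() {
	task := NewTask("executor1", "executor2")
	_ = task.Confirm("executor1")
	fmt.Println("start after one confirmation:", task.Start())
	_ = task.Confirm("executor2")
	fmt.Println("start after both confirmations:", task.Start())
	fmt.Println("run:", task.Run(), "running:", task.Stage == Running)
}
```

Keeping confirmation state per executor is what makes step 3 depend on step 2 for all parties, not just one.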


skywalk163 commented on June 2, 2024

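After some debugging, the task now shows Failed when I check its status after starting it. The error message is different now; at least it shows that something went wrong.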

```
module=monitor.task
INFO[2022-01-06 10:10:53] success update task status into chain, taskId: b5f44aae-b1ba-48f1-91c7-4236fba41649  module=handler.mpc
INFO[2022-01-06 10:10:53] task deleted                                  module=mpc.trainer taskId=b5f44aae-b1ba-48f1-91c7-4236fba41649
DEBU[2022-01-06 10:10:53] stop mpc task, taskId: b5f44aae-b1ba-48f1-91c7-4236fba41649  module=handler.mpc
ERRO[2022-01-06 10:10:53] error occurred when task start prepare, and taskId: b5f44aae-b1ba-48f1-91c7-4236fba41649  error="{\"code\":\"XDAT0002\",\"message\":\"invalid addr: parse \\\"127.0.0.1:8121\\\": first path segment in URL cannot contain colon\"}" module=monitor.task
INFO[2022-01-06 10:10:53] tasks execution finished of each round        end_time="2022-01-06 10:10:53" module=monitor.task task_len=1
```
```
TaskID: b5f44aae-b1ba-48f1-91c7-4236fba41649
Requester: 6cb69efc0439032b0d0f52bae1c9aada3f8fb46a5f24fa99065910055b77a1174d4afbac3c0529c8927587bb0e2ad90a85eaa600cfddd6b99f1212112135ef2b
TaskType: train
TaskName: House Price Prediction Task v1
Description: Epoch-making with PaddlePaddle
Label: MEDV
LabelName: 
RegMode: 
RegParam: 0.1
Algorithm: linear-vl
Alpha: 0.100000
Amplitude: 0.000100
Accuracy: 10
ModelTaskID: 
Status: Failed
PublishTime: 2022-01-06 09:35:56

Task data sets: 
DataID: 80b31197-1ae1-4a13-97e4-b1e245702486
Owner: 4637ef79f14b036ced59b76408b0d88453ac9e5baa523a86890aa547eac3e3a0f4a3c005178f021c1b060d916f42082c18e1d57505cdaaeef106729e6442f4e5
Address: 127.0.0.1:8181
PSILabel: id
ConfirmedAt: 2022-01-06 09:40:39
RejectedAt: 
DataID: 5f58f937-b23f-4d76-af22-9d8fd2c40d63
Owner: e4530d81ccddc478978070e8f9fcc9f101dfc3b5c3ca1519c522c5e9698f394a35aab9145f242765185689a64b7338e9929c6a32e09050ff15645bb121ce1754
Address: 127.0.0.1:8182
PSILabel: id
ConfirmedAt: 2022-01-06 09:41:36
RejectedAt: 

StartTime: 2022-01-06 10:10:53
EndTime: 2022-01-06 10:10:53

ErrMessage: {"code":"XDAT0002","message":"invalid addr: parse \"127.0.0.1:8121\": first path segment in URL cannot contain colon"}
Result: 
```


skywalk163 commented on June 2, 2024

It works now!
Regarding this error:

ErrMessage: {"code":"XDAT0002","message":"invalid addr: parse \"127.0.0.1:8121\": first path segment in URL cannot contain colon"}
Result: 
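The text after `invalid addr:` matches the error returned by Go's standard `net/url` package: with no scheme, `url.Parse` reads `127.0.0.1:8121` as a relative path, and a colon in the first path segment is rejected. The sketch below reproduces this, assuming the executor hands the configured address to `url.Parse` or something equivalent (inferred from the message, not confirmed from PaddleDTX source). `inspect` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
)

// inspect is a hypothetical helper: it runs url.Parse on addr and returns
// the scheme, host name and port, or the parse error unchanged.
func inspect(addr string) (scheme, host, port string, err error) {
	u, err := url.Parse(addr)
	if err != nil {
		return "", "", "", err
	}
	return u.Scheme, u.Hostname(), u.Port(), nil
}

func main() {
	for _, addr := range []string{"127.0.0.1:8121", "http://127.0.0.1:8121", "localhost:8121"} {
		scheme, host, port, err := inspect(addr)
		if err != nil {
			fmt.Printf("%-22s error: %v\n", addr, err)
			continue
		}
		fmt.Printf("%-22s scheme=%q host=%q port=%q\n", addr, scheme, host, port)
	}
}
```

Note that `localhost:8121` does not fail at all: `url.Parse` treats `localhost` as the scheme and leaves the host empty, so a missing prefix is not always reported as a parse error.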

Solution:
When setting the data-holder node address in the executor node's configuration, you need to add the http:// prefix:

 host = "http://127.0.0.1:8121"

None of the other settings need the http:// prefix, but this one in the executor node does, so it is easy to forget and end up with this error!
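To catch a missing prefix before starting a task, a pre-flight check could be run over address values taken from the config. This is a hypothetical sketch: `checkHost` is not part of PaddleDTX, and it assumes `http` and `https` are the only acceptable schemes.

```go
package main

import (
	"fmt"
	"net/url"
)

// checkHost is a hypothetical pre-flight check for an address setting such as
// the executor's data-holder host. It requires an explicit http or https
// scheme followed by a host, e.g. "http://127.0.0.1:8121".
func checkHost(value string) error {
	u, err := url.Parse(value)
	if err != nil {
		return fmt.Errorf("cannot parse %q (missing http:// prefix?): %w", value, err)
	}
	// A value like "localhost:8121" parses without error but yields scheme "localhost".
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("%q: scheme must be http or https, got %q", value, u.Scheme)
	}
	if u.Hostname() == "" {
		return fmt.Errorf("%q: no host after the scheme", value)
	}
	return nil
}

func main() {
	for _, v := range []string{"127.0.0.1:8121", "http://127.0.0.1:8121"} {
		if err := checkHost(v); err != nil {
			fmt.Println("reject:", err)
			continue
		}
		fmt.Println("ok:", v)
	}
}
```

Checking the scheme explicitly, rather than relying on `url.Parse` to fail, also rejects hostname-based values that parse "successfully" with the wrong scheme.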

