Ruksana Kabealo
01/18/2023, 11:40 PM
Yee
01/19/2023, 1:40 AM
DynamicJobSpec object but basically it looks just like a workflow
s3 ls that folder where the futures file is supposed to be?
Ketan (kumare3)
01/19/2023, 2:39 AM
Ruksana Kabealo
01/19/2023, 4:23 AM
Ketan (kumare3)
01/19/2023, 4:31 AM
Ruksana Kabealo
01/19/2023, 4:45 AM
Ketan (kumare3)
01/19/2023, 4:47 AM
Ruksana Kabealo
01/19/2023, 1:32 PM
Kevin Su
01/19/2023, 6:59 PM
Yee
01/20/2023, 9:25 PM
logger:
  show-source: true
  level: 6
Ruksana Kabealo
01/20/2023, 10:45 PM
Yee
01/20/2023, 10:46 PM
Dan Rammer (hamersaw)
01/23/2023, 5:12 PM
(1) For execution aq4dj4xctvd84df9cvqm, the error message in the logs is:
{
  "json": {
    "exec_id": "aq4dj4xctvd84df9cvqm",
    "node": "n5/dn0",
    "ns": "delaieine-development",
    "res_ver": "266883",
    "routine": "worker-3",
    "wf": "delaieine:development:flyte.workflows.auto_train.pipeline"
  },
  "level": "error",
  "msg": "handling parent node failed with error: InvalidArgument: Invalid fields for event message, caused by [rpc error: code = InvalidArgument desc = missing project]",
  "ts": "2023-01-19T01:26:46Z"
}
This shows that propeller is failing to send a message to admin because of a 'missing project'. There may be some kind of version mismatch between propeller and admin - do you know what versions you're running?
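To find entries like this one in a larger log dump, a small filter over the structured log lines can help. This is only a minimal sketch: it assumes the raw propeller logs are emitted as one JSON object per line (the entry above is pretty-printed for readability), and find_events is a hypothetical helper name, not part of Flyte. The sample lines are abbreviated copies of the entries quoted in this thread.

```python
import json


def find_events(lines, level="error", exec_id=None):
    """Yield parsed log entries with the given level and, optionally, execution ID.

    Assumes one JSON object per line; anything that does not parse is skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a structured log line
        if not isinstance(entry, dict) or entry.get("level") != level:
            continue
        context = entry.get("json") or {}
        if exec_id is not None and context.get("exec_id") != exec_id:
            continue
        yield entry


# Abbreviated sample lines modelled on the log entries in this thread.
sample = [
    '{"json": {"exec_id": "aq4dj4xctvd84df9cvqm", "node": "n5/dn0"}, '
    '"level": "error", "msg": "handling parent node failed with error: '
    'InvalidArgument: Invalid fields for event message, caused by [rpc error: '
    'code = InvalidArgument desc = missing project]", "ts": "2023-01-19T01:26:46Z"}',
    '{"json": {"exec_id": "a8h2qqfdkxtqzhg49g22", "node": "n5"}, '
    '"level": "warning", "msg": "Failed to read futures file.", '
    '"ts": "2023-01-19T01:50:13Z"}',
]

for entry in find_events(sample, exec_id="aq4dj4xctvd84df9cvqm"):
    print(entry["ts"], entry["msg"])
```

Filtering on exec_id keeps errors from the execution under investigation separate from those of other executions in the same namespace.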
(2) The "Failed to read futures file" errors are printed for other workflows (i.e. not the one depicted). It looks like Flyte is trying to abort the workflow but failing. For example:
{
  "json": {
    "exec_id": "a8h2qqfdkxtqzhg49g22",
    "node": "n5",
    "ns": "delaieine-development",
    "res_ver": "271443",
    "routine": "worker-1",
    "wf": "delaieine:development:flyte.workflows.auto_train.pipeline"
  },
  "level": "warning",
  "msg": "Failed to read futures file. Error: path:<s3://my-s3-bucket/metadata/propeller/delaieine-development-a8h2qqfdkxtqzhg49g22/n5/data/0/futures.pb>: not found",
  "ts": "2023-01-19T01:50:13Z"
}
followed by:
{
  "json": {
    "exec_id": "a8h2qqfdkxtqzhg49g22",
    "ns": "delaieine-development",
    "res_ver": "271443",
    "routine": "worker-1",
    "wf": "delaieine:development:flyte.workflows.auto_train.pipeline"
  },
  "level": "error",
  "msg": "Failed to propagate Abort for workflow:project:\"delaieine\" domain:\"development\" name:\"a8h2qqfdkxtqzhg49g22\" . Error: []",
  "ts": "2023-01-19T01:50:13Z"
}
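To check whether the file named in the warning actually exists, the S3 location can be pulled out of the msg field and split into bucket and key. A minimal sketch, assuming the message always carries the path in the path:<s3://...> form seen in the warning above; futures_location is a hypothetical helper name, not a Flyte API.

```python
import re
from urllib.parse import urlparse

# Matches the "path:<s3://...>" fragment seen in the warning message.
_PATH_RE = re.compile(r"path:<(s3://[^>]+)>")


def futures_location(msg):
    """Return (bucket, key) for the S3 path in the message, or None if absent."""
    match = _PATH_RE.search(msg)
    if match is None:
        return None
    parsed = urlparse(match.group(1))
    return parsed.netloc, parsed.path.lstrip("/")


msg = (
    "Failed to read futures file. Error: path:"
    "<s3://my-s3-bucket/metadata/propeller/"
    "delaieine-development-a8h2qqfdkxtqzhg49g22/n5/data/0/futures.pb>: not found"
)
bucket, key = futures_location(msg)
print(bucket)
print(key)
```

With the bucket and key in hand, listing the parent prefix (for example with aws s3 ls) shows whether futures.pb was ever written there.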
Somehow the futures.pb file is missing. So either (1) it was generated and then deleted, corrupted, etc., or (2) Flyte is looking for the file when it shouldn't be. This may be related to the event issue above.
Ruksana Kabealo
01/24/2023, 5:09 PM
Dan Rammer (hamersaw)
03/03/2023, 9:49 AM
Ruksana Kabealo
03/06/2023, 3:18 PM
Dan Rammer (hamersaw)
03/06/2023, 9:42 PM