Chris Grass
10/12/2023, 9:39 PMpyflyte
the task pod completes successfully and i see metadata written to my blob store; the n0
dir has both input and output and a dir is created for end-node
which contains inputs.pb
. however, the execution never finishes. it remains in the RUNNING
state and the end-node
task/event lists as QUEUED
when i look at execution details:
└── start-node - SUCCEEDED - 2023-10-12 21:27:35.186157884 +0000 UTC - 2023-10-12 21:27:35.186197485 +0000 UTC
└── n0 - SUCCEEDED - 2023-10-12 21:27:35.199313006 +0000 UTC - 2023-10-12 21:27:40.65606235 +0000 UTC
│ ├── Attempt :0
│ │ ├── Task - SUCCEEDED - 2023-10-12 21:27:35.262872565 +0000 UTC - 2023-10-12 21:27:40.644027555 +0000 UTC
│ │ ├── Task Type - python-task
│ │ ├── Reason - [ContainersNotReady|ContainerCreating]: containers with unready status: [f81959f07bf634b93b9c-n0-0]|
│ │ ├── Metadata
│ │ │ ├── Generated Name : f81959f07bf634b93b9c-n0-0
│ │ │ ├── Plugin Identifier : container
│ │ │ ├── External Resources
│ │ │ ├── Resource Pool Info
│ │ ├── Logs :
│ ├── Outputs :
│ └── o0: test-response
└── end-node - QUEUED - 2023-10-12 21:27:40.69112181 +0000 UTC - 2023-10-12 21:27:40.712353931 +0000 UTC
i don't see any error in the task pod but i do see this error in the flyte pod:
E1012 21:27:40.723260 7 workers.go:102] error syncing 'flyte-az-development/f81959f07bf634b93b9c': Operation cannot be fulfilled on <http://flyteworkflows.flyte.lyft.com|flyteworkflows.flyte.lyft.com> "f81959f07bf634b93b9c": the object has been modified; please apply your changes to the latest version and try again
i suspect i have something misconfigured but can't figure out what that is. i have tried both with an azure workload identity and raw creds and see the same behavior.
any ideas?QUEUED
state and never runsWorkflow execution not found in flyteadmin.
as an error in the task pod when i try to re-run the workflow. I don't know if that is related or just a transient errorflyte-binary
and an image i built using the Makefile build_native_flyte
. the custom artifact doesn't have a code change but references a custom stow buildSamhita Alla
Yee
Chris Grass
10/17/2023, 11:41 AME1012
above and stuff like:
2023/10/13 16:00:21 /go/pkg/mod/gorm.io/gorm@v1.24.1-0.20221019064659-5dd2bb482755/finisher_api.go:509
[0.722ms] [rows:1] SELECT count(*) FROM pg_indexes WHERE tablename = 'artifacts' AND indexname = 'artifacts_dataset_uuid_idx' AND schemaname = CURRENT_SCHEMA()
...
# logging Specify configuration for logs emitted by Flyte
logging:
# level Set the log level
level: 1
and this in my .flyte/config.yaml
file:
logger:
show-source: true
### I have tried a variety of different levels
level: 20
values.yaml
)
propeller:
# disabled Disables flytepropeller
disabled: false
# disabledWebhook Disables webhook only
disableWebhook: false
publish-k8s-events: true
create-flyteworkflow-crd: true
Yee
Chris Grass
10/18/2023, 9:33 PMnode_execution_events
show a normal flow but end_node
remains QUEUED
. (i think) this is the log leading up to propeller trying to pick up that event:
likely pertinent block: [Operation cannot be fulfilled on <http://flyteworkflows.flyte.lyft.com|flyteworkflows.flyte.lyft.com> \"f42f996b920d8459b9fe\": the object has been modified; please apply your changes to the latest version and try again]","ts":"2023-10-18T20:22:10Z"}
2023/10/18 20:22:10 [32m/go/pkg/mod/gorm.io/gorm@v1.24.1-0.20221019064659-5dd2bb482755/callbacks.go:134
[0m[33m[3.650ms] [34;1m[rows:1][0m INSERT INTO "node_executions" ("created_at","updated_at","deleted_at","execution_project","execution_domain","execution_name","node_id","phase","input_uri","closure","started_at","node_execution_created_at","node_execution_updated_at","duration","node_execution_metadata","parent_id","parent_task_execution_id","error_kind","error_code","cache_status","dynamic_workflow_remote_closure_reference","internal_data") VALUES ('2023-10-18 20:22:10.162','2023-10-18 20:22:10.162',NULL,'flyte-az','development','f42f996b920d8459b9fe','end-node','QUEUED','<abfs://myflytetest/metadata/propeller/flyte-az-development-f42f996b920d8459b9fe/end-node/data/inputs.pb','><binary>',NULL,'2023-10-18 20:22:10.135','2023-10-18 20:22:10.16',0,'<binary>',NULL,NULL,NULL,NULL,NULL,'','<binary>') RETURNING "id"
{"json":{"exec_id":"f42f996b920d8459b9fe","node":"end-node","src":"noop_notifications.go:32"},"level":"debug","msg":"call to noop publish with notification type [flyteidl.admin.NodeExecutionEventRequest] and proto message [event:\u003cid:\u003cnode_id:\"end-node\" execution_id:\u003cproject:\"flyte-az\" domain:\"development\" name:\"f42f996b920d8459b9fe\" \u003e \u003e producer_id:\"propeller\" phase:QUEUED occurred_at:\u003cseconds:1697660530 nanos:135833222 \u003e input_uri:\"<abfs://myflytetest/metadata/propeller/flyte-az-development-f42f996b920d8459b9fe/end-node/data/inputs.pb>\" spec_node_id:\"end-node\" event_version:1 reported_at:\u003cseconds:1697660530 nanos:160164218 \u003e \u003e ]","ts":"2023-10-18T20:22:10Z"}
{"json":{"exec_id":"f42f996b920d8459b9fe","node":"end-node","src":"noop_notifications.go:32"},"level":"debug","msg":"call to noop publish with notification type [flyteidl.admin.NodeExecutionEventRequest] and proto message [event:\u003cid:\u003cnode_id:\"end-node\" execution_id:\u003cproject:\"flyte-az\" domain:\"development\" name:\"f42f996b920d8459b9fe\" \u003e \u003e producer_id:\"propeller\" phase:QUEUED occurred_at:\u003cseconds:1697660530 nanos:135833222 \u003e input_uri:\"<abfs://myflytetest/metadata/propeller/flyte-az-development-f42f996b920d8459b9fe/end-node/data/inputs.pb>\" spec_node_id:\"end-node\" event_version:1 reported_at:\u003cseconds:1697660530 nanos:160164218 \u003e \u003e ]","ts":"2023-10-18T20:22:10Z"}
{"json":{"exec_id":"f42f996b920d8459b9fe","node":"end-node","ns":"flyte-az-development","res_ver":"342094168","routine":"worker-11","src":"executor.go:991","wf":"flyte-az:development:workflows.simple-workflow.simple_workflow"},"level":"debug","msg":"Node pre-execute completed","ts":"2023-10-18T20:22:10Z"}
{"json":{"exec_id":"f42f996b920d8459b9fe","node":"end-node","ns":"flyte-az-development","res_ver":"342094168","routine":"worker-11","src":"executor.go:1174","wf":"flyte-az:development:workflows.simple-workflow.simple_workflow"},"level":"info","msg":"Node was queued, parallelism is now [1]","ts":"2023-10-18T20:22:10Z"}
{"json":{"exec_id":"f42f996b920d8459b9fe","node":"end-node","ns":"flyte-az-development","res_ver":"342094168","routine":"worker-11","src":"executor.go:1176","wf":"flyte-az:development:workflows.simple-workflow.simple_workflow"},"level":"debug","msg":"Completed node [end-node]","ts":"2023-10-18T20:22:10Z"}
{"json":{"exec_id":"f42f996b920d8459b9fe","ns":"flyte-az-development","res_ver":"342094168","routine":"worker-11","src":"executor.go:403","wf":"flyte-az:development:workflows.simple-workflow.simple_workflow"},"level":"info","msg":"Handling Workflow [f42f996b920d8459b9fe] Done","ts":"2023-10-18T20:22:10Z"}
{"json":{"exec_id":"f42f996b920d8459b9fe","ns":"flyte-az-development","routine":"worker-11","src":"passthrough.go:89"},"level":"debug","msg":"Observed FlyteWorkflow Update (maybe finalizer)","ts":"2023-10-18T20:22:10Z"}
{"json":{"src":"controller.go:206"},"level":"info","msg":"Deletion triggered for f42f996b920d8459b9fe","ts":"2023-10-18T20:22:10Z"}
2023/10/18 20:22:10 [32m/go/pkg/mod/gorm.io/gorm@v1.24.1-0.20221019064659-5dd2bb482755/callbacks.go:134
[0m[33m[4.106ms] [34;1m[rows:1][0m INSERT INTO "node_execution_events" ("created_at","updated_at","deleted_at","execution_project","execution_domain","execution_name","node_id","request_id","occurred_at","phase") VALUES ('2023-10-18 20:22:10.166','2023-10-18 20:22:10.166',NULL,'flyte-az','development','f42f996b920d8459b9fe','end-node','','2023-10-18 20:22:10.135','QUEUED') RETURNING "id"
{"json":{"exec_id":"f42f996b920d8459b9fe","ns":"flyte-az-development","routine":"worker-11","src":"passthrough.go:104"},"level":"error","msg":"Failed to update workflow. Error [Operation cannot be fulfilled on flyteworkflows.flyte.lyft.com \"f42f996b920d8459b9fe\": the object has been modified; please apply your changes to the latest version and try again]","ts":"2023-10-18T20:22:10Z"}
{"json":{"exec_id":"f42f996b920d8459b9fe","ns":"flyte-az-development","routine":"worker-11","src":"handler.go:362"},"level":"info","msg":"Completed processing workflow.","ts":"2023-10-18T20:22:10Z"}
E1018 20:22:10.173073 7 workers.go:102] error syncing 'flyte-az-development/f42f996b920d8459b9fe': Operation cannot be fulfilled on flyteworkflows.flyte.lyft.com "f42f996b920d8459b9fe": the object has been modified; please apply your changes to the latest version and try again
{"json":{"exec_id":"f42f996b920d8459b9fe","ns":"flyte-az-development","routine":"worker-11","src":"handler.go:181"},"level":"info","msg":"Processing Workflow.","ts":"2023-10-18T20:22:10Z"}
{"json":{"exec_id":"f42f996b920d8459b9fe","ns":"flyte-az-development","routine":"worker-11","src":"passthrough.go:40"},"level":"warning","msg":"Workflow not found in cache.","ts":"2023-10-18T20:22:10Z"}
{"json":{"exec_id":"f42f996b920d8459b9fe","ns":"flyte-az-development","routine":"worker-11","src":"handler.go:189"},"level":"warning","msg":"Workflow namespace[flyte-az-development]/name[f42f996b920d8459b9fe] not found, may be deleted.","ts":"2023-10-18T20:22:10Z"}
select * from node_execution_events where execution_name = 'f42f996b920d8459b9fe';
id | created_at | updated_at | deleted_at | execution_project | execution_domain | execution_name | node_id | request_id | occurred_at | phase
----+-------------------------------+-------------------------------+------------+-------------------+------------------+----------------------+------------+------------+-------------------------------+-----------
28 | 2023-10-18 20:22:04.307004+00 | 2023-10-18 20:22:04.307004+00 | | flyte-az | development | f42f996b920d8459b9fe | start-node | | 2023-10-18 20:22:04.299654+00 | SUCCEEDED
29 | 2023-10-18 20:22:04.344218+00 | 2023-10-18 20:22:04.344218+00 | | flyte-az | development | f42f996b920d8459b9fe | n0 | | 2023-10-18 20:22:04.328458+00 | QUEUED
30 | 2023-10-18 20:22:04.480383+00 | 2023-10-18 20:22:04.480383+00 | | flyte-az | development | f42f996b920d8459b9fe | n0 | | 2023-10-18 20:22:04.452153+00 | RUNNING
31 | 2023-10-18 20:22:10.108206+00 | 2023-10-18 20:22:10.108206+00 | | flyte-az | development | f42f996b920d8459b9fe | n0 | | 2023-10-18 20:22:10.089884+00 | SUCCEEDED
32 | 2023-10-18 20:22:10.166419+00 | 2023-10-18 20:22:10.166419+00 | | flyte-az | development | f42f996b920d8459b9fe | end-node | | 2023-10-18 20:22:10.135833+00 | QUEUED
wf
in the tutorialYee
Chris Grass
10/20/2023, 12:49 PMYee
Chris Grass
10/20/2023, 12:51 PM