Hello :slightly_smiling_face: I'm currently trying...
# flyte-deployment
b
Hello 🙂 I'm currently trying to set up flyte 1.3.0 on AWS EKS 1.27 (our current flyte setup is running on 1.23). I haven't made any other changes besides the k8s update and am running into the errors/warnings below from flyteadmin while trying to create a workflow. I'm wondering if I overlooked any k8s version limitations from the flyte perspective, or something else.
{"json":{"x-request-id":"a-nkqkn6ckxxcjrxc45ftj"},"level":"warning","msg":"Failed to fetch override values when assigning task resource default values for [resource_type:TASK project:\"broder\" domain:\"development\" name:\"src.loading.get_data.load_sample_data\" version:\"fUkU2IwuEUHd25UVHD5tFQ==\" ]: Resource [{Project:broder Domain:development Workflow: LaunchPlan: ResourceType:TASK_RESOURCE}] not found","ts":"2023-09-29T07:34:20Z"}
{"json":{"x-request-id":"a-gv9hhtk7jrgbskmkqngn"},"level":"warning","msg":"Failed to fetch override values when assigning task resource default values for [resource_type:TASK project:\"broder\" domain:\"development\" name:\"src.preprocessing.filter.filter_for_job\" version:\"fUkU2IwuEUHd25UVHD5tFQ==\" ]: Resource [{Project:broder Domain:development Workflow: LaunchPlan: ResourceType:TASK_RESOURCE}] not found","ts":"2023-09-29T07:34:20Z"}
{"json":{"x-request-id":"a-dm2qmntbpqjzzd4v2gjh"},"level":"warning","msg":"Failed to fetch override values when assigning task resource default values for [resource_type:TASK project:\"broder\" domain:\"development\" name:\"src.loading.output_result.output_result\" version:\"fUkU2IwuEUHd25UVHD5tFQ==\" ]: Resource [{Project:broder Domain:development Workflow: LaunchPlan: ResourceType:TASK_RESOURCE}] not found","ts":"2023-09-29T07:34:20Z"}
{"json":{"exec_id":"f86447f1c7fb94aa2b60","x-request-id":"a-dt254dxgs9vpnnqzjnvh"},"level":"warning","msg":"Failed to fetch override values when assigning task resource default values for [resource_type:WORKFLOW project:\"broder\" domain:\"development\" name:\"use_case_workflow.sample_workflow\" version:\"fUkU2IwuEUHd25UVHD5tFQ==\" ]: Resource [{Project:broder Domain:development Workflow:use_case_workflow.sample_workflow LaunchPlan: ResourceType:TASK_RESOURCE}] not found","ts":"2023-09-29T07:34:21Z"}
{"json":{"exec_id":"f86447f1c7fb94aa2b60","x-request-id":"a-dt254dxgs9vpnnqzjnvh"},"level":"warning","msg":"Failed to fetch override values when assigning execution queue for [{ResourceType:WORKFLOW Project:broder Domain:development Name:use_case_workflow.sample_workflow Version:fUkU2IwuEUHd25UVHD5tFQ== XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}] with err: Resource [{Project:broder Domain:development Workflow:use_case_workflow.sample_workflow LaunchPlan: ResourceType:EXECUTION_QUEUE}] not found","ts":"2023-09-29T07:34:21Z"}
{"json":{"exec_id":"f86447f1c7fb94aa2b60","x-request-id":"a-dt254dxgs9vpnnqzjnvh"},"level":"warning","msg":"Setting security context from auth Role","ts":"2023-09-29T07:34:21Z
Error from `pyflyte run`:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.INTERNAL
        details = "failed to create workflow in propeller the server could not find the requested resource (post flyteworkflows.flyte.lyft.com)"
        debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"failed to create workflow in propeller the server could not find the requested resource (post flyteworkflows.flyte.lyft.com)", grpc_status:13, created_time:"2023-09-29T09:34:21.458589715+02:00"}"
The CRD exists on the data plane.
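Beyond checking that the CRD exists, one could also look at which API versions it actually serves. The sketch below is not a confirmed diagnosis of this error. It only reads the `spec.versions` list that `apiextensions.k8s.io/v1` CRD objects carry. The helper names are hypothetical, and the version string to check is left to the caller.

```python
import json
import subprocess


def load_crd(name="flyteworkflows.flyte.lyft.com"):
    """Fetch a CRD as a dict via kubectl (needs kubectl and cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "crd", name, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def served_versions(crd):
    """Return (name, served, storage) for each version listed in the CRD."""
    versions = crd.get("spec", {}).get("versions", [])
    return [
        (v["name"], v.get("served", False), v.get("storage", False))
        for v in versions
    ]


def is_served(crd, version):
    """True if the CRD lists the given version and marks it as served."""
    return any(
        name == version and served
        for name, served, _ in served_versions(crd)
    )
```

Calling `served_versions(load_crd())` against the data plane cluster lists what the API server exposes for that CRD, which can then be compared with the version the client is posting to.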
One thing that makes me wonder is a second, crashing flyteadmin pod/replicaset:
panic: Get "https://1234567.gr7.eu-central-1.eks.amazonaws.com/api?timeout=30s": x509: certificate signed by unknown authority
➜  ~ k get pods -n flyte    <aws:dev-playground-tf>
NAME                              READY   STATUS                  RESTARTS       AGE
flyteadmin-56cf447466-xdb7q       0/1     Init:CrashLoopBackOff   20 (71s ago)   78m
flyteadmin-74bffb4bc8-4z8ld       1/1     Running                 0              97m
flyteconsole-f59757bd9-lb65s      1/1     Running                 0              97m
flytescheduler-579dbcb4df-46w5t   1/1     Running                 0              97m
syncresources-5b87d9cb5d-4cb9g    1/1     Running                 0              97m
➜  ~ k get deployment -n flyte    <aws:dev-playground-tf>
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
flyteadmin       1/1     1            1           97m
flyteconsole     1/1     1            1           97m
flytescheduler   1/1     1            1           97m
syncresources    1/1     1            1           97m
➜  ~ k get replicasets -n flyte    <aws:dev-playground-tf>
NAME                        DESIRED   CURRENT   READY   AGE
flyteadmin-56cf447466       1         1         0       79m
flyteadmin-74bffb4bc8       1         1         1       97m
flyteadmin-7889bf56f7       0         0         0       80m
flyteconsole-f59757bd9      1         1         1       97m
flytescheduler-579dbcb4df   1         1         1       97m
syncresources-5b87d9cb5d    1         1         1       97m
k
Why so old
b
Do you mean the flyte or k8s version? ^^
k
Flyte
b
The usual: time and priority to do updates 🤷 Is there a definite connection to the issue? Should I just try a more recent version?
I was also a bit lost. I haven't gone that deep into flyte yet, so I just posted the helm version.
We use flyteadmin v1.1.115, flytescheduler v1.1.70, datacatalog v1.0.50, flytepropeller v1.1.111, and flyteconsole v1.9.0.