# ask-the-community
r
Hey folks, we have a Flyte workflow that runs container tasks, and we've often been running into this error. Not sure how to root-cause what's going on.
[4/4] currentAttempt done. Last Error: USER::[1/1] currentAttempt done. Last Error: USER::[1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
[flyte-copilot-downloader] terminated with exit code (1). Reason [Error]. Message: 
Type to use [iam, accesskey]. (default "iam")
      --storage.connection.disable-ssl             Disables SSL connection. Should only be used for development.
      --storage.connection.endpoint string         URL for storage client to connect to.
      --storage.connection.region string           Region to connect to. (default "us-east-1")
      --storage.connection.secret-key string       Secret to use when accesskey is set.
      --storage.container string                   Initial container (in s3 a bucket) to create -if it doesn't exist-.'
      --storage.defaultHttpClient.timeout string   Sets time out on the http client. (default "0s")
      --storage.enable-multicontainer              If this is true,  then the container argument is overlooked and redundant. This config will automatically open new connections to new containers/buckets as they are encountered
      --storage.limits.maxDownloadMBs int          Maximum allowed download size (in MBs) per call. (default 2)
      --storage.stow.config stringToString         Configuration for stow backend. Refer to github/graymeta/stow (default [])
      --storage.stow.kind string                   Kind of Stow backend to use. Refer to github/graymeta/stow
      --storage.type string                        Sets the type of storage to configure [s3/minio/local/mem/stow]. (default "s3")
      --tls-server-name string                     If provided, this name will be used to validate server certificate. If this is not provided, hostname used to contact the server is used.
      --token string                               Bearer token for authentication to the API server
      --user string                                The name of the kubeconfig user to use
      --username string                            Username for basic authentication to the API server
  -v, --v Level                                    number for the log level verbosity
      --vmodule moduleSpec                         comma-separated list of pattern=N settings for file-filtered logging
k
Hmm, version mismatch between copilot and Flyte
In your backend config
r
oh.. let me check
Why do all pods not have this problem though?
flytecopilot: v0.0.24, flyteadmin and flytescheduler: v1.1.46, datacatalog: v1.0.1, flytepropeller: v1.1.40, flyteconsole: v1.3.4
c
are you seeing this on every ContainerTask run, or only a seemingly-random subset of them?
r
seemingly-random subset
oops sorry.. in this workflow ALL containertasks
c
ok, that's more understandable. semi-random would be harder to debug. we ran into something similar. in our case the flyte-copilot-downloader container that was spawned from our ContainerTask required credentials to the metadata bucket (to fetch an input FlyteFile). those credentials were not being propagated from the PodTemplate containerspec, so we had to manage them independently.
i'd take a look at the container details and make sure it has all the envvars you expect, specifically around creds
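One way to do that check is to dump the pod as JSON (for example with `kubectl get pod <name> -o json`) and list which expected variables each container is missing. This is only a rough sketch: the variable names in `EXPECTED_VARS`, the `primary` container name, and the sample pod are assumptions for illustration, and it only looks at `env` entries, not `envFrom`.

```python
import json

# Assumption: static AWS keys are passed as the standard AWS SDK variables.
# With IAM roles for service accounts you would look for AWS_ROLE_ARN and
# AWS_WEB_IDENTITY_TOKEN_FILE instead.
EXPECTED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_env_vars(pod, expected=EXPECTED_VARS):
    """Return {container name: [expected env vars not set via `env`]}."""
    spec = pod.get("spec", {})
    # Look at init containers as well as regular ones, so the copilot
    # container is covered however it was injected into the pod.
    containers = spec.get("initContainers", []) + spec.get("containers", [])
    report = {}
    for container in containers:
        defined = {entry["name"] for entry in container.get("env", [])}
        report[container["name"]] = [v for v in expected if v not in defined]
    return report


# Hypothetical pod, shaped like the output of `kubectl get pod <name> -o json`.
sample_pod = json.loads("""
{
  "spec": {
    "initContainers": [
      {"name": "flyte-copilot-downloader",
       "env": [{"name": "SOME_OTHER_VAR", "value": "0"}]}
    ],
    "containers": [
      {"name": "primary",
       "env": [{"name": "AWS_ACCESS_KEY_ID", "value": "x"},
               {"name": "AWS_SECRET_ACCESS_KEY", "value": "y"}]}
    ]
  }
}
""")

for name, missing in missing_env_vars(sample_pod).items():
    print(name, "missing:", missing or "nothing")
```

If the copilot container shows missing variables while the main container does not, that would match the propagation gap described above.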
r
@Erich Shan FYI Thanks a lot!
c
good luck, and let us know what the research yields
@Yee this would be a good area to document or extend functionality. we found that it was difficult/impossible to add an explicit set of credentials using secret_requests because it added a prefix that was not configurable. we ended up having to manage the credentials outside of the ContainerTask definition
k
@L godlike did you see this error before?
k
I honestly think we can improve container tasks a lot - I wrote it over a weekend 4 years ago. K8s has new ways of ordering containers now
l
did you see this error before?
r
didn’t see this error… just the one I posted
l
I will investigate it, thanks
e
small update: looks like we're getting the same error that @Sujith Samuel is experiencing. when i check the logs of our copilot downloader container, it's showing a lot of this until it prints the help menu for the cli
{"json":{},"level":"error","msg":"Failed to Get credentials.","ts":"2024-03-26T18:14:27Z"}
{"json":{},"level":"error","msg":"Failed to Get credentials.","ts":"2024-03-26T18:14:32Z"}
{"json":{},"level":"error","msg":"Failed to Get credentials.","ts":"2024-03-26T18:14:37Z"}
i guess what i don't understand so far is what is printing this error since we do have access to our s3 bucket from within this container (i reproduced this issue and can ls the contents of the relevant bucket)
the only thing that has changed is that we upgraded our EKS version from 1.24 to 1.25
The fix was upgrading copilot to its latest version, which is what @Sujith Samuel did as well. still don't understand why, but maybe k8s 1.25 is not compatible with that older version of copilot
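For anyone triaging the same symptom, here is a small sketch that tallies the error messages in the copilot container's log output, skipping non-JSON lines such as the CLI help text. The helper name `count_error_messages` is made up for this example; the JSON lines are the ones pasted above, and the help line is taken from the original error.

```python
import json
from collections import Counter


def count_error_messages(lines):
    """Count messages of JSON log lines whose level is "error"."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log line, e.g. the CLI help menu
        if isinstance(record, dict) and record.get("level") == "error":
            counts[record.get("msg", "")] += 1
    return counts


sample_logs = [
    '{"json":{},"level":"error","msg":"Failed to Get credentials.","ts":"2024-03-26T18:14:27Z"}',
    '{"json":{},"level":"error","msg":"Failed to Get credentials.","ts":"2024-03-26T18:14:32Z"}',
    '{"json":{},"level":"error","msg":"Failed to Get credentials.","ts":"2024-03-26T18:14:37Z"}',
    '      --storage.connection.region string           Region to connect to. (default "us-east-1")',
]

for message, count in count_error_messages(sample_logs).items():
    print(f"{count}x {message}")
```

Feeding it the saved output of `kubectl logs <pod> -c flyte-copilot-downloader` (read line by line) should show whether the credential error is the only one before the help menu appears.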
l
I didn't see this error before, but I think there's a credential issue with the blob type when using the single binary. (Sandbox works well)
flytekit will soon support local execution of raw container tasks. Please try this PR if you are interested, thank you https://github.com/flyteorg/flytekit/pull/2258