Andrew
11/28/2023, 10:21 PM
task submitted to K8s
[ContainersNotReady|ContainerCreating]: containers with unready status: [f787e4fb7001d459f851-n0-0]|
[ContainersNotReady|ErrImagePull]: containers with unready status: [f787e4fb7001d459f851-n0-0]|rpc error: code = Unknown desc = failed to pull and unpack image "<location>-docker.pkg.dev/<project>/<repository>/<image_name>:MHon8F_9TgvC55qoS5mUtw..": failed to resolve reference "<location>-docker.pkg.dev/<project>/<repository>/<image_name>:MHon8F_9TgvC55qoS5mUtw..": failed to authorize: failed to fetch oauth token: unexpected status from GET request to https://<location>-docker.pkg.dev/v2/token?scope=repository%3A<project>%2F<repository>%2F<image_name>%3Apull&service=<location>-docker.pkg.dev: 403 Forbidden
Here are the patch steps I followed:
• Created service account and downloaded the .json key file
• kubectl create secret docker-registry artifact-json-key --docker-server=pkg.dev --docker-username=_json_key --docker-password=(cat artifact_auth.json | string collect) --docker-email=<email>
• kubectl patch serviceaccount default -p '{"imagePullSecrets": [{"name": "artifact-json-key"}]}'
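The secret created in the steps above can be rebuilt offline to inspect what the cluster will receive. This is a minimal sketch, assuming the standard `.dockerconfigjson` layout that `kubectl create secret docker-registry` generates; the helper name `docker_config_json` and the example host `us-central1-docker.pkg.dev` are hypothetical. Image pull secrets are matched against the registry host of the image, so the server key is worth comparing with the `<location>-docker.pkg.dev` host that appears in the 403 error.

```python
import base64
import json


def docker_config_json(server: str, key_file_contents: str, email: str) -> str:
    """Build the .dockerconfigjson payload of a docker-registry secret.

    `server` is the registry host the credentials apply to. The username
    `_json_key` tells the registry that the password is a service account key.
    """
    username = "_json_key"
    # The `auth` field is base64("username:password"), as in ~/.docker/config.json.
    auth = base64.b64encode(f"{username}:{key_file_contents}".encode()).decode()
    config = {
        "auths": {
            server: {
                "username": username,
                "password": key_file_contents,
                "email": email,
                "auth": auth,
            }
        }
    }
    return json.dumps(config)


if __name__ == "__main__":
    # Hypothetical stand-in for the contents of artifact_auth.json.
    fake_key = json.dumps({"type": "service_account", "project_id": "my-project"})
    payload = docker_config_json("us-central1-docker.pkg.dev", fake_key, "me@example.com")
    # The registry host is the only key under "auths".
    print(list(json.loads(payload)["auths"]))
```

If the host printed here differs from the host in the image reference, the pull secret may simply not be selected for that image, which would be consistent with an unauthenticated 403.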
Laura Lin
11/29/2023, 12:52 AM
map_task
but the newer feature of overriding container_image doesn't seem to work with map_tasks. Here's a minimal example where it's still using the default container image and not the overridden one. It does work with regular tasks, just not with map_task?
import typing

from flytekit import map_task, task, workflow

@task
def my_mappable_task(a: int) -> typing.Optional[str]:
    return str(a)

@workflow
def my_map_wf(x: typing.List[int]) -> typing.List[typing.Optional[str]]:
    # The override below is the part that appears to be ignored for map tasks.
    return map_task(
        my_mappable_task,
        concurrency=10,
        min_success_ratio=0.75,
    )(a=x).with_overrides(container_image="random:image")
cc @Kevin Su since he added the original feature: https://github.com/flyteorg/flytekit/commit/eafcc820303367749e63edc62190b9153fd6be5e
Frank Shen
11/29/2023, 2:59 AM
docker pull gcr.io/spark-operator/spark-operator:v1beta2-1.2.0-3.0.0
Error response from daemon: manifest for gcr.io/spark-operator/spark-operator:v1beta2-1.2.0-3.0.0 not found: manifest unknown: Failed to fetch "v1beta2-1.2.0-3.0.0" from request "/v2/spark-operator/spark-operator/manifests/v1beta2-1.2.0-3.0.0".
(base) ➜ mlforge git:(flyte-deployment-2) ✗ docker pull gcr.io/spark-operator/spark-operator
Using default tag: latest
Error response from daemon: manifest for gcr.io/spark-operator/spark-operator:latest not found: manifest unknown: Failed to fetch "latest" from request "/v2/spark-operator/spark-operator/manifests/latest".
I got the image repo URL from https://github.com/helm/charts/blob/master/incubator/sparkoperator/values.yaml
and the image tag from https://googlecloudplatform.github.io/spark-on-k8s-operator/
Could anyone provide a working image tag?
Thanks!
Muhammad Haritsah Mukhlis
11/29/2023, 3:19 AM
Zeeshan Shareef
11/29/2023, 9:34 AM
> apt-get install git:
#5 17.22 perl: warning: Please check that your locale settings:
#5 17.22 LANGUAGE = (unset),
#5 17.22 LC_ALL = "en_US.UTF-8",
#5 17.22 LANG = "C.UTF-8"
#5 17.22 are supported and installed on your system.
#5 17.23 perl: warning: Falling back to a fallback locale ("C.UTF-8").
#5 17.64 debconf: delaying package configuration, since apt-utils is not installed
#5 17.71 Fetched 8904 kB in 1s (10.5 MB/s)
#5 17.76 Error while loading /usr/sbin/dpkg-split: No such file or directory
Sub-process /usr/bin/dpkg returned an error code (1)
------
Failed with Unknown Exception <class 'Exception'> Reason: failed to run command envd build --path /var/folders/01/m569sr9x04j6x8_sr_n1c03r06hms7/T/flyte-qp2ow6ji/sandbox/local_flytekit/13832f2987526bc535adaf7cc3e6fb55 --platform linux/amd64 --output type=image,name=europe-west4-docker.pkg.dev/my-gcp-project/flyte/zeetesting/flytekit:L0sYlwiX5Ne76o2wlm8Y2w..,push=true with error b'time="2023-11-29T10:20:40+01:00" level=fatal msg="failed to build the image: failed to build: failed to wait error group: failed to solve LLB: failed to solve: process \\"/dev/.buildkit_qemu_emulator bash -c apt-get update && apt-get install -y --no-install-recommends git\\" did not complete successfully: exit code: 100"\n'
Additional Information:
Docker Desktop (v 4.25.0)
Flyte v 1.10.0
Python 3.11
Previously it was working for me, but since we updated the Flyte version I am getting this error.
Any idea how I can solve this issue?
Adrian Loy
11/29/2023, 10:32 AM
Fabio Grätz
11/29/2023, 12:09 PM
pyflyte run --max-parallelism
option of executions.
Even when setting it to 1, I see multiple pods scheduled at the same time, while I would have expected only a single one. Am I misunderstanding this option? Is it the same for you as well?
Andrés Gómez Ferrer
11/29/2023, 12:23 PM
Ethan Brown
11/29/2023, 3:09 PM
{"json":{},"level":"fatal","msg":"Lost leader state. Shutting down.","ts":"2023-11-29T150420Z"}
Is this common, or indicative of something that could be misbehaving within the cluster? (The cluster is mostly idling.)
Thomas Newton
11/29/2023, 4:04 PM
Another area of slowdown could be the size of the input-output cache that FlytePropeller maintains in-memory. This can be configured, while configuring the storage for FlytePropeller. Rule of thumb, for FlytePropeller with x memory limit, allocate x/2 to the cache
https://docs.flyte.org/en/latest/deployment/configuration/performance.html#manual-scale-out
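The x/2 rule of thumb quoted above is simple arithmetic, sketched here for Kubernetes-style memory quantities. The helper names `parse_mib` and `cache_size_mib` are hypothetical, only the `Mi` and `Gi` suffixes are handled, and the assumption that the result belongs in a storage cache size setting expressed in megabytes should be checked against the linked docs.

```python
# Conversion factors from Kubernetes binary suffixes to MiB (subset only).
UNITS_IN_MIB = {"Mi": 1, "Gi": 1024}


def parse_mib(quantity: str) -> int:
    """Convert a quantity such as '4Gi' or '512Mi' to whole MiB."""
    for suffix, factor in UNITS_IN_MIB.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported memory quantity: {quantity!r}")


def cache_size_mib(memory_limit: str) -> int:
    """Apply the rule of thumb: give half of the memory limit to the cache."""
    return parse_mib(memory_limit) // 2


if __name__ == "__main__":
    for limit in ("512Mi", "4Gi", "16Gi"):
        print(limit, "->", cache_size_mib(limit), "MiB for the cache")
```

For example, a FlytePropeller pod with a 4Gi memory limit would get a 2048 MiB cache under this rule.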
Frank Shen
11/29/2023, 11:23 PM
task_resources:
  defaults:
    cpu: 2
    memory: 1Gi
    storage: 1Gi
  limits:
    cpu: 20
    memory: 500Gi
    storage: 100Gi
    gpu: 1
...
cluster_resource_manager:
  customData:
    - development:
        - projectQuotaCpu:
            value: "800"
        - projectQuotaMemory:
            value: "3200Gi"
Frank Shen
11/29/2023, 11:35 PM
Alain GALDEMAS
11/30/2023, 9:06 AM
unionml train app:model -i '{"hyperparameters": {"C": 1.0, "max_iter": 1000}}'
and the workflow executes and runs OK in Flyte, but when I try to predict with
unionml predict app:model -f data/sample_features.json
I get an error:
TypeError: No automatic conversion found from type <class 'sklearn.linear_model._logistic.LogisticRegression'> to FlyteFile.Supported (os.PathLike, str, Flytefile)
which is an error originating in flytekit.
My config is unionml 0.2.1, flyteidl 1.5.21, flytekit 1.9.1, scikit-learn 1.3.2, with flyte-binary.
I'm stuck on this issue. Any idea how to fix it, perhaps by specifying a type somewhere?
Or is it a bug caused by a newer version of some package?
Thanks in advance for any help or hint, at least to work around this problem. Or would it be better to use the S3 storage to store and link the features file?
Quentin Chenevier
11/30/2023, 10:06 AM
Filipe Fonseca
11/30/2023, 12:27 PM
Endre Karlson
11/30/2023, 2:15 PM
Nandakumar Raghu
11/30/2023, 3:27 PM
default-node-selector:
in the k8s plugin config to select nodes on our MNG for Flyte worker pods. This allows us to use either spot nodes or on-demand nodes on AWS. We want to set up nodeAffinity selectors with weights to try spot nodes first and, if none are available, fall back to on-demand nodes. We are using the flyte-binary chart, and the deployment
section in values.yaml has an extraPodSpec
which I believe only applies to the flyte-binary pods. Are pod templates the only way to apply nodeAffinity selection rules to worker pods? Or is there any way to apply these via the values.yaml for flyte-binary?
Ethan Brown
11/30/2023, 4:40 PM
pyflyte run --remote
generating an incorrect URL? I assume there's a config value set somewhere that's incorrect -- I'm seeing console
twice in the URI like this:
https://mydomain.com/console/console/projects/flytesnacks/domains/staging/executions/f3974b92e876344a3840
Muhammad Haritsah Mukhlis
11/30/2023, 5:22 PM
Alykhan Tejani
11/30/2023, 6:05 PM
@task(
    task_config=TfJob(
        chief=Chief(replicas=1, image=<some_image>),
        ps=PS(replicas=0, image=<some_image>),
        worker=Worker(replicas=1, image=<some_image>),
    ),
)
def func(some_arg: int):
    # do something...
    ...
Alykhan Tejani
11/30/2023, 6:06 PM
func
get run. Also, if <some_image> is just run (and its entrypoint executed), how can I pass args to it?
Swarup Srinivasan
11/30/2023, 6:21 PM
HORIZONTAL_LAYOUT
env var doesn't seem to work, but setting it to false via the URL parameter horizontal-layout
works (found a related issue: 4047#issuecomment-1723528573). I wanted to check if this is a bug with the release.
Christopher Lee Murray
11/30/2023, 9:17 PM
Clemente Cuevas
11/30/2023, 11:23 PM
Klemens Kasseroller
12/01/2023, 12:21 PM
Garret Cook
12/01/2023, 5:05 PM
flyte-binary
I've added this to my values.yaml:
configuration:
  inline:
    webhook:
      secretManagerType: "AWS"
Where do I put the AWS service account credentials? And may I specify different credentials on different project/environment combinations?
I'd expect to have to put a list of credentials in somewhere, like this for each project/environment:
"AWS_ACCESS_KEY_ID": "some-value"
"AWS_SECRET_ACCESS_KEY": "some-value",
"AWS_DEFAULT_REGION": "some-value"
Thank you for taking a look.
Frank Shen
12/01/2023, 6:47 PM
Frank Shen
12/01/2023, 6:48 PM
Frank Shen
12/01/2023, 6:50 PM
Frank Shen
12/01/2023, 9:16 PM
pyflyte register .... --service-account spark
When the ephemeral task pod runs with service account = spark, as opposed to service account = default, it has no permission to access S3.
Where in the flyte-core helm charts should the fix be made?
Maybe the default for this section needs to be corrected?
- key: ad_spark_service_account
  value: |
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: spark
      namespace: {{ namespace }}
- key: ae_spark_role_binding
  value: |
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: spark-role-binding
      namespace: {{ namespace }}
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: spark-role
    subjects:
    - kind: ServiceAccount
      name: spark
      namespace: {{ namespace }}
I got this.
The spark service account has no AWS IAM role annotation set, unlike the default service account.
kubectl describe sa -n flytesnacks-development
Name: default
Namespace: flytesnacks-development
Labels: <none>
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::245085526351:role/flyte-role
Image pull secrets: artifactory-da-reader-token
Mountable secrets: default-token-x5dbz
Tokens: <none>
Events: <none>
Name: spark
Namespace: flytesnacks-development
Labels: <none>
Annotations: <none>
Image pull secrets: <none>
Mountable secrets: <none>
Tokens: <none>
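One way to close the gap visible in the describe output is to give the spark ServiceAccount the same eks.amazonaws.com/role-arn annotation that the default account carries. This is only a sketch: the helper `service_account_manifest` is hypothetical, the role ARN is copied from the output above, and whether that role's trust policy accepts the spark service account is an assumption that needs checking on the AWS side.

```python
import json
from typing import Optional


def service_account_manifest(
    name: str, namespace: str, role_arn: Optional[str] = None
) -> dict:
    """Build a ServiceAccount manifest, optionally annotated with an IAM role."""
    metadata = {"name": name, "namespace": namespace}
    if role_arn:
        # Same annotation key that the default service account shows above.
        metadata["annotations"] = {"eks.amazonaws.com/role-arn": role_arn}
    return {"apiVersion": "v1", "kind": "ServiceAccount", "metadata": metadata}


if __name__ == "__main__":
    manifest = service_account_manifest(
        "spark",
        "flytesnacks-development",
        role_arn="arn:aws:iam::245085526351:role/flyte-role",
    )
    # JSON is also valid YAML, so this output can be passed to kubectl apply -f -.
    print(json.dumps(manifest, indent=2))
```

In the chart itself, the equivalent change would be an annotations entry under metadata in the ad_spark_service_account template, so that every project namespace gets the annotated account.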