Cody Scandore
08/15/2023, 1:48 PM
flyte-binary chart on EKS. When I try to add auth the container goes to CrashLoopBackOff.
Events:
Type     Reason     Age                    From               Message
----     ------     ----                   ----               -------
Normal   Scheduled  6m49s                  default-scheduler  Successfully assigned flyte/flyte-flyte-binary-6479b586cc-sb75s to ip-192-xxx-xx-xxx.us-west-2.compute.internal
Normal   Pulled     6m48s                  kubelet            Container image "postgres:15-alpine" already present on machine
Normal   Created    6m48s                  kubelet            Created container wait-for-db
Normal   Started    6m48s                  kubelet            Started container wait-for-db
Normal   Started    6m47s                  kubelet            Started container gen-admin-auth-secret
Normal   Created    6m47s                  kubelet            Created container gen-admin-auth-secret
Normal   Pulled     6m47s                  kubelet            Container image "cr.flyte.org/flyteorg/flyte-binary-release:v1.5.0" already present on machine
Normal   Started    6m45s                  kubelet            Started container flyte
Warning  Unhealthy  6m18s (x3 over 6m38s)  kubelet            Liveness probe failed: Get "http://192.168.37.60:8088/healthcheck": dial tcp 192.168.37.60:8088: connect: connection refused
Normal   Killing    6m18s                  kubelet            Container flyte failed liveness probe, will be restarted
Normal   Pulled     5m48s (x2 over 6m45s)  kubelet            Container image "cr.flyte.org/flyteorg/flyte-binary-release:v1.5.0" already present on machine
Normal   Created    5m48s (x2 over 6m45s)  kubelet            Created container flyte
Warning  Unhealthy  107s (x52 over 6m44s)  kubelet            Readiness probe failed: Get "http://192.168.37.60:8088/healthcheck": dial tcp 192.168.37.60:8088: connect: connection refused
...
{"json":{"src":"composite_workqueue.go:98"},"level":"debug","msg":"Dynamically configured batch size [-1]","ts":"2023-08-15T13:46:21Z"}
{"json":{"src":"composite_workqueue.go:129"},"level":"debug","msg":"Exiting SubQueue handler batch round","ts":"2023-08-15T13:46:21Z"}
{"json":{"src":"composite_workqueue.go:88"},"level":"debug","msg":"Subqueue handler batch round","ts":"2023-08-15T13:46:22Z"}
{"json":{"src":"composite_workqueue.go:98"},"level":"debug","msg":"Dynamically configured batch size [-1]","ts":"2023-08-15T13:46:22Z"}
{"json":{"src":"composite_workqueue.go:129"},"level":"debug","msg":"Exiting SubQueue handler batch round","ts":"2023-08-15T13:46:22Z"}
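The log excerpt above contains only debug-level work-queue messages, which do not explain why port 8088 refuses connections. A minimal Python sketch for filtering saved JSON log output down to the non-debug records follows. It assumes the logs were saved as text, one JSON object per line as in the excerpt; `non_debug_records` is a hypothetical helper name, and the third sample record is a labeled placeholder, not a real Flyte log line.

```python
import json


def non_debug_records(raw_log, skip_levels=("debug",)):
    """Parse JSON log lines and keep records whose level is not skipped."""
    kept = []
    for line in raw_log.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # ignore blank lines and non-JSON output
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore truncated or malformed lines
        if record.get("level") not in skip_levels:
            kept.append(record)
    return kept


# Two lines copied from the excerpt, plus one HYPOTHETICAL placeholder record
# (not taken from any real log) so the filter has something to return.
sample_log = "\n".join([
    '{"json":{"src":"composite_workqueue.go:98"},"level":"debug","msg":"Dynamically configured batch size [-1]","ts":"2023-08-15T13:46:21Z"}',
    '{"json":{"src":"composite_workqueue.go:129"},"level":"debug","msg":"Exiting SubQueue handler batch round","ts":"2023-08-15T13:46:21Z"}',
    '{"json":{},"level":"error","msg":"<placeholder: hypothetical error>","ts":"2023-08-15T13:46:22Z"}',
])

for record in non_debug_records(sample_log):
    print(record["ts"], record["level"], record["msg"])
```

Because the container is being restarted, the interesting lines may belong to the previous container instance; `kubectl logs --previous` retrieves those.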
Victor Delépine
08/15/2023, 5:52 PM
Broder Peters
08/18/2023, 2:25 PM
Przemys
08/22/2023, 10:24 AM
flyte-binary, could you help me to raise this limit?
In my task I want to use such decorator:
@task(limits=Resources(mem='3Gi'), requests=Resources(mem='2Gi'))
I've found this config in values.yaml:
flyte-binary:
  deployment:
    # resources Resource limits and requests for Flyte container
    # Uncomment and update to specify resources for deployment
    resources:
      limits:
        memory: 3Gi
      requests:
        cpu: 5
but it doesn't work. It probably just increases the Flyte deployment's own limits.
David Espejo (he/him)
08/24/2023, 3:18 PM
Jan Fiedler
08/29/2023, 2:25 PM
(AZURE_STORAGE_ACCOUNT_NAME, AZURE_STORAGE_ACCOUNT_KEY) in the pods that run my Flyte tasks. In the future I would like to use workload identities instead of storage account keys, but that's another topic.
My question: Is it even possible to connect to two different Azure storage accounts with flytekit / fsspec and the way Flyte works? What I have in mind is having a cluster storage account for Flyte (metadata) and a user storage account where I upload and download user data via FlyteFile / FlyteDirectory. Hope this made sense 🙂
Guy Harel
08/31/2023, 12:56 PM
configuration:
  inline:
    cluster_resources:
      customData:
      - production:
        - defaultIamRole:
            value: arn:aws:iam::<redacted>:role/flyte_user_role_staging
      - staging:
        - defaultIamRole:
            value: arn:aws:iam::<redacted>:role/flyte_user_role_staging
      - development:
        - defaultIamRole:
            value: arn:aws:iam::<redacted>:role/flyte_user_role_staging
describe pod shows it is using the default SA:
Name: ap26hjbb7gbjdzc9fs94-n0-0
Namespace: flytesnacks-development
Priority: 0
Service Account: default
kubectl describe sa: (The default SA is not getting annotated)
Name: default
Namespace: flytesnacks-development
Labels: <none>
Annotations: <none>
Image pull secrets: <none>
Mountable secrets: <none>
Tokens: <none>
Events: <none>
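In Flyte's cluster resource sync, `customData` values are substituted into cluster resource templates, so the role can only reach the `default` service account if a ServiceAccount template that references `{{ defaultIamRole }}` is configured. The thread does not show whether this deployment has such a template. The Python sketch below imitates the substitution with a plain string replace to show what the rendered object would be expected to look like. The template text follows the pattern used in Flyte's EKS examples, and `render_template` is a hypothetical stand-in, not Flyte's implementation.

```python
# Template in the style of Flyte's EKS examples (assumed, not from the thread).
SERVICE_ACCOUNT_TEMPLATE = """\
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: {{ namespace }}
  annotations:
    eks.amazonaws.com/role-arn: {{ defaultIamRole }}
"""

# Same values as the posted customData, flattened per domain; ARN stays redacted.
CUSTOM_DATA = {
    "production": {"defaultIamRole": "arn:aws:iam::<redacted>:role/flyte_user_role_staging"},
    "staging": {"defaultIamRole": "arn:aws:iam::<redacted>:role/flyte_user_role_staging"},
    "development": {"defaultIamRole": "arn:aws:iam::<redacted>:role/flyte_user_role_staging"},
}


def render_template(template, namespace, values):
    """Hypothetical helper: fill {{ namespace }} and each {{ key }} placeholder."""
    rendered = template.replace("{{ namespace }}", namespace)
    for key, value in values.items():
        rendered = rendered.replace("{{ " + key + " }}", value)
    return rendered


rendered = render_template(
    SERVICE_ACCOUNT_TEMPLATE, "flytesnacks-development", CUSTOM_DATA["development"]
)
print(rendered)
```

If no template carries the placeholder, the `customData` entries have nothing to be rendered into, which would match the unannotated `default` service account shown above. Note also that the posted config points all three domains at the same staging role.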
Guy Harel
09/03/2023, 12:33 PM
{
  "level": "error",
  "ts": "2023-09-03T12:17:51Z",
  "msg": "Reconciler error",
  "controller": "ingress",
  "object": {
    "name": "flyte-backend-flyte-binary-http",
    "namespace": "flyte"
  },
  "namespace": "flyte",
  "name": "flyte-backend-flyte-binary-http",
  "reconcileID": "65bc9264-18ff-4930-b29b-ac1abf7cf524",
  "error": "InvalidParameter: 1 validation error(s) found.\n- minimum field value of 1, CreateTargetGroupInput.Port.\n"
}
Probably due to this section in the configuration it is trying to implement:
"AWS::ElasticLoadBalancingV2::TargetGroup": {
  "flyte/flyte-backend-flyte-binary-http-flyte-backend-flyte-binary-http:8088": {
    "spec": {
      "name": "k8s-flyte-flytebac-743fe1ff44",
      "targetType": "instance",
      "port": 0,
      "protocol": "HTTP",
      "protocolVersion": "HTTP1",
      "ipAddressType": "ipv4",
      "healthCheckConfig": {
        "port": "traffic-port",
        "protocol": "HTTP",
        "path": "/",
        "matcher": {
          "httpCode": "200"
        },
        "intervalSeconds": 15,
        "timeoutSeconds": 5,
        "healthyThresholdCount": 2,
        "unhealthyThresholdCount": 2
      }
    }
  }
},
Notice "port: 0" in there, I think that's what it's choking on. But I'm not sure why it's not picking up the correct port... The current ingress for the http backend is:
Name:             flyte-backend-flyte-binary-http
Labels:           app.kubernetes.io/instance=flyte-backend
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=flyte-binary
                  app.kubernetes.io/version=1.16.0
                  helm.sh/chart=flyte-binary-v1.9.1
Namespace:        flyte
Address:
Ingress Class:    <none>
Default backend:  <default>
Rules:
  Host                       Path  Backends
  ----                       ----  --------
  staging.flyte.brain.space
                             /console          flyte-backend-flyte-binary-http:8088 (10.0.60.54:8088)
                             /console/*        flyte-backend-flyte-binary-http:8088 (10.0.60.54:8088)
                             /api              flyte-backend-flyte-binary-http:8088 (10.0.60.54:8088)
                             /api/*            flyte-backend-flyte-binary-http:8088 (10.0.60.54:8088)
                             /healthcheck      flyte-backend-flyte-binary-http:8088 (10.0.60.54:8088)
                             /v1/*             flyte-backend-flyte-binary-http:8088 (10.0.60.54:8088)
                             /.well-known      flyte-backend-flyte-binary-http:8088 (10.0.60.54:8088)
                             /.well-known/*    flyte-backend-flyte-binary-http:8088 (10.0.60.54:8088)
                             /login            flyte-backend-flyte-binary-http:8088 (10.0.60.54:8088)
                             /login/*          flyte-backend-flyte-binary-http:8088 (10.0.60.54:8088)
                             /logout           flyte-backend-flyte-binary-http:8088 (10.0.60.54:8088)
                             /logout/*         flyte-backend-flyte-binary-http:8088 (10.0.60.54:8088)
                             /callback         flyte-backend-flyte-binary-http:8088 (10.0.60.54:8088)
                             /callback/*       flyte-backend-flyte-binary-http:8088 (10.0.60.54:8088)
                             /me               flyte-backend-flyte-binary-http:8088 (10.0.60.54:8088)
                             /config           flyte-backend-flyte-binary-http:8088 (10.0.60.54:8088)
                             /config/*         flyte-backend-flyte-binary-http:8088 (10.0.60.54:8088)
                             /oauth2           flyte-backend-flyte-binary-http:8088 (10.0.60.54:8088)
                             /oauth2/*         flyte-backend-flyte-binary-http:8088 (10.0.60.54:8088)
Annotations:      alb.ingress.kubernetes.io/target-type: instance
                  kubernetes.io/ingress.class: alb
                  meta.helm.sh/release-name: flyte-backend
                  meta.helm.sh/release-namespace: flyte
                  nginx.ingress.kubernetes.io/app-root: /console
Events:
  Type     Reason             Age                  From     Message
  ----     ------             ----                 ----     -------
  Warning  FailedDeployModel  14m (x23 over 3h6m)  ingress  Failed deploy model due to InvalidParameter: 1 validation error(s) found.
           - minimum field value of 1, CreateTargetGroupInput.Port.
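The `port: 0` in the target group spec above is what the `CreateTargetGroupInput.Port` validation rejects. A small Python sketch for spotting such entries in a dumped model follows; `invalid_target_groups` is a hypothetical helper, and the sample dict is trimmed from the posted spec. One common cause of a zero port with `target-type: instance` is a Service that is not of type NodePort, so there is no node port to register. That is an assumption here, not something the thread confirms.

```python
TARGET_GROUP_KIND = "AWS::ElasticLoadBalancingV2::TargetGroup"


def invalid_target_groups(model):
    """Return (key, port) pairs for target groups whose port is missing or below 1."""
    bad = []
    for key, group in model.get(TARGET_GROUP_KIND, {}).items():
        port = group.get("spec", {}).get("port")
        if not isinstance(port, int) or port < 1:
            bad.append((key, port))
    return bad


# Trimmed copy of the target group from the message above.
model = {
    TARGET_GROUP_KIND: {
        "flyte/flyte-backend-flyte-binary-http-flyte-backend-flyte-binary-http:8088": {
            "spec": {
                "name": "k8s-flyte-flytebac-743fe1ff44",
                "targetType": "instance",
                "port": 0,
                "protocol": "HTTP",
            }
        }
    }
}

for key, port in invalid_target_groups(model):
    print(f"{key}: port={port}")
```

If the Service type is the cause, the usual options are switching the Service to NodePort or setting the `alb.ingress.kubernetes.io/target-type` annotation to `ip`.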
Any idea?...
Guy Harel
09/03/2023, 6:28 PM
Ketan (kumare3)
Prasad Bhalerao
09/04/2023, 10:15 AM
David Espejo (he/him)
09/05/2023, 10:58 PM
Prasad Bhalerao
09/06/2023, 7:22 PM
Broder Peters
09/07/2023, 11:54 AM
rules:
- apiGroups:
  - ""
  - flyte.lyft.com
  - rbac.authorization.k8s.io
  resources:
  - configmaps
  - flyteworkflows
  - namespaces
  - pods
  - resourcequotas
  - roles
  - rolebindings
  - secrets
  - services
  - serviceaccounts
  - spark-role
  verbs:
  - '*'
I get that I can override those and restrict them more, but when does flyteadmin need any of those permissions besides the flyteworkflows one?
Prasad Bhalerao
09/07/2023, 3:51 PM
vendor.4cb66fb8bb2bca222104.js:8 Error: invalid wire type 4 at offset 1
    at l.skipType (vendor.4cb66fb8bb2bca222104.js:2:33214)
    at e.decode (vendor.4cb66fb8bb2bca222104.js:8:1795729)
    at t.decodeProtoResponse (main.31219686ea86aafa4419.js:1:541961)
    at t.getAdminEntity (main.31219686ea86aafa4419.js:1:539552)
Receiving this error in the browser's console...
Michael Tinsley
09/08/2023, 7:53 AM
flyte-binary deployment? This is all I can find about it in the docs
Prasad Bhalerao
09/08/2023, 10:14 AM
{
  "error": "Resource [{Project:flytesnacks Domain:development Workflow: LaunchPlan: ResourceType:WORKFLOW_EXECUTION_CONFIG}] not found",
  "code": 5,
  "message": "Resource [{Project:flytesnacks Domain:development Workflow: LaunchPlan: ResourceType:WORKFLOW_EXECUTION_CONFIG}] not found"
}
What's this?
Prasad Bhalerao
09/08/2023, 3:27 PM
Uria Franko
09/10/2023, 3:21 PM
Prasad Bhalerao
09/12/2023, 6:59 AM
error syncing 'flytesnacks-development/fbc2876f45600492bbf4': Operation cannot be fulfilled on flyteworkflows.flyte.lyft.com "fbc2876f45600492bbf4": the object has been modified; please apply your changes to the latest version and try again
Uria Franko
09/12/2023, 11:38 AM
12-9-2023 14:30:33.606
I0912 11:30:33.606660 1 static_autoscaler.go:509] ip-10-2-54-164.eu-central-1.compute.internal is unneeded since 2023-09-12 11:15:47.260684064 +0000 UTC m=+10982.039451071 duration 14m46.343709026s
12-9-2023 14:30:33.606
I0912 11:30:33.606586 1 scale_down.go:448] Node ip-10-2-165-69.eu-central-1.compute.internal - nvidia.com/gpu utilization 0.000000
12-9-2023 14:30:33.606
I0912 11:30:33.606577 1 cluster.go:224] node ip-10-2-165-69.eu-central-1.compute.internal has unready GPU
12-9-2023 14:30:33.606
I0912 11:30:33.606565 1 scale_down.go:448] Node ip-10-2-54-164.eu-central-1.compute.internal - nvidia.com/gpu utilization 0.000000
12-9-2023 14:30:33.606
I0912 11:30:33.606547 1 cluster.go:224] node ip-10-2-54-164.eu-central-1.compute.internal has unready GPU
Node definition
worker-single-gpu = {
  dedicated_node_role = "worker"
  instance_type       = "g4dn.xlarge"
  gpu_accelerator     = "nvidia-tesla-t4"
  gpu_count           = 1
  min_size            = 0
  max_size            = 10
  local_ssd_size_gb   = 160
  root_disk_size_gb   = 200
}
Cody Scandore
09/12/2023, 6:50 PM
Jagadeesh Avasarala
09/14/2023, 10:17 AM
https://<my Ingress DNS name>/flyte is where Flyte is running. It redirects to the Okta login page, and after a successful login it goes to https://<my Ingress DNS name>/callback?code=<code>&state=<state>, which fails with 401 Unauthorized.
Abin Shahab
09/15/2023, 12:29 AM
Laura Lin
09/18/2023, 8:42 PM
flyte-pod-webhook or syncresources have a nodeSelector that's configurable through the Helm chart? All the other Flyte pods seem to have one, but not this.
Terence Kent
09/19/2023, 2:06 PM
stow that adds Azure AD support (flyteorg/stow/pull/9, related convo here).
Part of the effort in shoring up that PR is testing the stow branch in a Flyte deployment. The changes are tested within the stow project, but being able to try it all together would be a nice early check.
Is that something somebody here is interested in trying out? Or should I just follow the development guide?
Uria Franko
09/19/2023, 3:31 PM
Uria Franko
09/19/2023, 8:50 PM
image = ImageSpec(
    name="flyte",
    base_image="docker pull ghcr.io/flyteorg/flytekit:py3.9-1.9.1",
    registry="example.dkr.ecr.eu-central-1.amazonaws.com",
    python_version="3.9",
    cuda="11.6.2",
    packages=["torch"],
)
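One detail in the snippet above stands out: `base_image` is meant to hold an image reference, but the posted value begins with the shell command `docker pull `. The accompanying message text is missing from this archive, so whether this caused the reported problem is unknown. A minimal Python sketch of cleaning such a value before passing it on follows; `strip_pull_prefix` is a hypothetical helper, not part of flytekit.

```python
def strip_pull_prefix(value, prefix="docker pull "):
    """Hypothetical helper: drop a leading 'docker pull ' so only the image reference remains."""
    cleaned = value.strip()
    if cleaned.startswith(prefix):
        cleaned = cleaned[len(prefix):].strip()
    return cleaned


base_image = strip_pull_prefix("docker pull ghcr.io/flyteorg/flytekit:py3.9-1.9.1")
print(base_image)  # ghcr.io/flyteorg/flytekit:py3.9-1.9.1
```

In practice the simpler fix is to write the reference directly, as in `base_image="ghcr.io/flyteorg/flytekit:py3.9-1.9.1"`.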
Nandakumar Raghu
09/26/2023, 2:10 PM
Terence Kent
09/27/2023, 3:22 AM
"Attach the AmazonS3FullAccess policy for now. S3 access can be tweaked later to narrow down the scope."
That's a bit too much access for me to grant to Flyte in most accounts, so I'd like to pare that down. I see from the Opta IaC for Flyte that both those categories are provided the Opta s3 "write" access alias, which seems to translate to this:
"s3:GetObject*",
"s3:PutObject*",
"s3:DeleteObject*",
"s3:ListBucket"
Does that sound about right? It seems a little narrow; I would have expected it to also include things like AbortMultipartUpload, GetBucketAcl, etc.
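A minimal Python sketch of what a policy scoped to the four listed actions could look like as an IAM policy document. The bucket name `my-flyte-bucket` and the helper `flyte_bucket_policy` are hypothetical. The sketch relies on one fact about S3 permissions: object-level actions apply to `arn:aws:s3:::bucket/*`, while `s3:ListBucket` applies to the bucket ARN itself. Whether Flyte also needs extras such as `s3:AbortMultipartUpload` is left open here, so they are an optional parameter.

```python
import json


def flyte_bucket_policy(bucket, extra_object_actions=()):
    """Hypothetical helper: build a bucket-scoped policy from the listed actions."""
    object_actions = [
        "s3:GetObject*",
        "s3:PutObject*",
        "s3:DeleteObject*",
        *extra_object_actions,
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level actions target the objects inside the bucket.
                "Sid": "ObjectAccess",
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # ListBucket is a bucket-level action, so it targets the bucket ARN.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


policy = flyte_bucket_policy(
    "my-flyte-bucket", extra_object_actions=["s3:AbortMultipartUpload"]
)
print(json.dumps(policy, indent=2))
```

Keeping the extras as a parameter makes it easy to start from the narrow set and add an action only when a denied call shows it is needed.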