Geert
07/05/2023, 2:33 PM
flytectl demo start
. I've already increased task resources to mem=6Gi
(the same code ran fine before with 2Gi, so I wouldn't expect that to be the issue); increasing the VM size (running Docker on macOS); removing and re-initializing the demo environment.
I didn't change much since I last ran it w/o issues, other than changing a workflow to a dynamic workflow, and perhaps adding 1 additional task.
jeev
Tommy Nam
07/06/2023, 6:50 AM
Geert
07/07/2023, 8:11 AM
kubectl top pod -n tmp-development
for example. But node memory usage beforehand is around 20%, and I don’t see any sudden spike before the Task fails. @Tommy Nam I don’t see any LimitRange or ResourceQuota applied in the demo environment (only some general resource requests/limits around 100-200Mi, but that's for the Flyte deployment itself, not the task Pods). Where could I find these cluster-wide limits?
FlyteWorkflow
CRD:
Kind: FlyteWorkflow
Execution Config:
  Environment Variables: <nil>
  Interruptible: <nil>
  Max Parallelism: 25
  Overwrite Cache: false
  Recovery Execution:
  Task Plugin Impls:
  Task Resources:
    Limits:
      CPU: 2
      Ephemeral Storage: 0
      GPU: 1
      Memory: 1Gi
      Storage: 0
    Requests:
      CPU: 2
      Ephemeral Storage: 0
      GPU: 0
      Memory: 200Mi
      Storage: 0
Execution Id:
  Domain: development
  Name: fc4eefd4fe2614c0987f
  Project: tmp
I'm not sure where the 1Gi here is set (I think from the default here: https://github.com/flyteorg/flyte/blob/1e3d515550cb338c2edb3919d79c6fa1f0da5a19/charts/flyte-core/values.yaml#L520C4-L531C15). Perhaps I'm also misconfiguring the dynamic task’s resources? I have the resources set as follows. Should I configure the @dynamic workflow to have 6Gi as well?
from flytekit import Resources, dynamic, task, workflow

@task(limits=Resources(mem="6Gi"))
def run_task():
    # do stuff
    ...

@dynamic(limits=Resources(mem="500Mi"))
def base_workflow(config: Config):
    for i in list:  # `list` stands in for the actual iterable
        run_task()

@workflow
def wf(config: Config):
    base_workflow(config=config)
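One way to read the CRD output above: the task asks for a 6Gi limit, but the execution shows a 1Gi memory limit. A minimal sketch of that comparison follows, assuming the platform-wide limit acts as a cap on whatever the task requests. The helper names `parse_memory` and `effective_limit` are made up for illustration and are not Flyte APIs, and the parser only handles plain suffixes like Mi and Gi.

```python
import re

# Subset of the suffixes used by Kubernetes memory quantities.
_UNITS = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '200Mi' or '6Gi' to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(Ki|Mi|Gi|Ti|k|M|G|T)?", quantity.strip())
    if match is None:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    number, unit = match.group(1), match.group(2) or ""
    return int(float(number) * _UNITS[unit])


def effective_limit(task_limit: str, platform_limit: str) -> str:
    """Return the smaller of the two limits.

    Assumption: the platform limit caps the limit declared on the task.
    """
    if parse_memory(task_limit) <= parse_memory(platform_limit):
        return task_limit
    return platform_limit


# Task asks for 6Gi while the platform limit is 1Gi (as in the CRD output).
print(effective_limit("6Gi", "1Gi"))
# Same task once the platform limit has been raised to 8Gi.
print(effective_limit("6Gi", "8Gi"))
```

Under that assumption, raising the limit on the task alone would not help until the platform-level `task_resources` limit is raised too.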
flytectl update cluster-resource-attribute --attrFile cra.yaml
with cra.yaml
attributes:
  projectQuotaCpu: "1000"
  projectQuotaMemory: 8Gi
domain: development
project: tmp

task_resources:
  defaults:
    cpu: 100m
    memory: 200Mi
    storage: 100M
  limits:
    cpu: 500m
    gpu: 1
    memory: 8Gi
    storage: 10G
Gives:
❯ flytectl demo start --config /Users/{user}/.flyte/config-sandbox.yaml
Error:
strict mode is on but received keys [map[task_resources:{}]] to decode with no config assigned to receive them: failed strict mode check
ERRO[0000]
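The error text suggests the config loader refuses top-level keys that no registered section claims. A rough illustration of that kind of strict check is below. It is not flytectl's actual implementation, and `KNOWN_SECTIONS` and `strict_decode` are hypothetical names chosen for the example.

```python
# Hypothetical set of top-level sections a CLI config might register.
KNOWN_SECTIONS = {"admin", "logger", "storage", "console"}


def strict_decode(config: dict, known: set = KNOWN_SECTIONS) -> dict:
    """Return the config unchanged, or raise if it has unclaimed top-level keys."""
    unknown = sorted(set(config) - set(known))
    if unknown:
        raise ValueError(
            f"strict mode is on but received keys {unknown} "
            "with no config section to receive them"
        )
    return config


# A CLI-style config that also carries a server-side section.
cli_config = {
    "admin": {"endpoint": "localhost:30080", "insecure": True},
    "task_resources": {"limits": {"memory": "8Gi"}},
}

try:
    strict_decode(cli_config)
except ValueError as err:
    print(f"Error: {err}")
```

In this sketch, `task_resources` fails because nothing in the CLI config claims it. That matches the symptom above, where a server-side section was placed in the file passed via `--config`.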
flyte-sandbox-config
ConfigMap, and restarted the Flyte Pod:
data:
  000-core.yaml: |
    ...
    task_resources:
      defaults:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 2
        memory: 8Gi
        gpu: 5
    cluster_resources:
      customData:
      - production:
        - projectQuotaCpu:
            value: "8"
        - projectQuotaMemory:
            value: "16Gi"
      - staging:
        - projectQuotaCpu:
            value: "8"
        - projectQuotaMemory:
            value: "16Gi"
      - development:
        - projectQuotaCpu:
            value: "8"
        - projectQuotaMemory:
            value: "16Gi"
      ...
    ...
    flyte:
      admin:
        disableClusterResourceManager: true
    ...
flytectl demo start
?
Tommy Nam
07/07/2023, 9:29 AM
jeev
Samhita Alla
strict mode is on but received keys [map[task_resources:{}]] to decode with no config assigned to receive them: failed strict mode check
ERRO[0000]
Geert
07/07/2023, 2:07 PM
~/.flyte/sandbox/config.yaml
doesn't seem to work. @Samhita Alla I used the following, maybe it helps you (it uses https://github.com/mikefarah/yq for YAML processing):
# gets current configmap and store locally
kubectl get cm -n flyte flyte-sandbox-config -o=yaml > configmap-flyte-sandbox-config.yaml
# updates configmap with new values from local file 000-core.yaml
yq eval '.data."000-core.yaml" = "'"$(< ./flyte/000-core.yaml)"'"' configmap-flyte-sandbox-config.yaml > updated-configmap-flyte-sandbox-config.yaml
kubectl -n flyte apply -f updated-configmap-flyte-sandbox-config.yaml
# restart flyte pods to use new values
kubectl delete pods -l app.kubernetes.io/name=flyte-sandbox -n flyte
# cleanup
rm configmap-flyte-sandbox-config.yaml
rm updated-configmap-flyte-sandbox-config.yaml
000-core.yaml
looks like this (the only thing I changed here is increasing the resource (memory/CPU/etc.) limits):
admin:
  endpoint: localhost:8089
  insecure: true
catalog-cache:
  endpoint: localhost:8081
  insecure: true
  type: datacatalog
task_resources:
  defaults:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: 2
    memory: 8Gi
    gpu: 5
cluster_resources:
  customData:
  - production:
    - projectQuotaCpu:
        value: 8
    - projectQuotaMemory:
        value: 16Gi
  - staging:
    - projectQuotaCpu:
        value: 8
    - projectQuotaMemory:
        value: 16Gi
  - development:
    - projectQuotaCpu:
        value: 8
    - projectQuotaMemory:
        value: 16Gi
  standaloneDeployment: false
  templatePath: /etc/flyte/cluster-resource-templates
logger:
  show-source: true
  level: 6
propeller:
  create-flyteworkflow-crd: true
webhook:
  certDir: /var/run/flyte/certs
  localCert: true
  secretName: flyte-sandbox-webhook-secret
  serviceName: flyte-sandbox-webhook
  servicePort: 443
flyte:
  admin:
    disableClusterResourceManager: true
    disableScheduler: false
    disabled: false
    seedProjects:
    - flytesnacks
  dataCatalog:
    disabled: false
  propeller:
    disableWebhook: false
    disabled: false
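For anyone without yq, the ConfigMap update in the script above (replacing the 000-core.yaml entry) could also be done on the JSON form of the ConfigMap. This is only a sketch: `replace_configmap_entry` is a made-up helper, and the sample ConfigMap below is a trimmed stand-in rather than the real sandbox object.

```python
import json


def replace_configmap_entry(configmap: dict, key: str, new_content: str) -> dict:
    """Return a copy of the ConfigMap with data[key] set to new_content."""
    updated = json.loads(json.dumps(configmap))  # cheap deep copy of JSON data
    updated.setdefault("data", {})[key] = new_content
    return updated


# Trimmed stand-in for `kubectl get cm -n flyte flyte-sandbox-config -o json`.
configmap = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {"name": "flyte-sandbox-config", "namespace": "flyte"},
    "data": {"000-core.yaml": "task_resources:\n  limits:\n    memory: 1Gi\n"},
}

new_core = "task_resources:\n  limits:\n    memory: 8Gi\n"
patched = replace_configmap_entry(configmap, "000-core.yaml", new_core)

# The JSON printed here could be piped to `kubectl apply -f -`.
print(json.dumps(patched, indent=2))
```

The original dict is left untouched, so the before and after can be compared before applying anything to the cluster.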
jeev
> cat ~/.flyte/sandbox/config.yaml
task_resources:
  defaults:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: 2
    memory: 8Gi
    gpu: 5
is passed through to the pod:
> kubectl exec -it flyte-sandbox-79fc858b47-mj5w9 -- cat /etc/flyte/config.d/999-extra-config.yaml
Defaulted container "flyte" out of: flyte, flyteagent, wait-for-db (init)
task_resources:
  defaults:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: 2
    memory: 8Gi
    gpu: 5
Seems to be working as intended.
Geert
07/07/2023, 6:08 PM
flytectl demo reload
gives the strict mode ... message and doesn't apply. My config is in ~/.flyte/sandbox-config.yaml (there is no config file in ~/.flyte/sandbox/, only the kubeconfig).
jeev
~/.flyte/sandbox/config.yaml is not flytectl config, but rather flyte config that's passed through to the pod.
Geert
07/07/2023, 6:09 PM
jeev