Dan Farrell
10/12/2023, 12:37 PM
With flytectl demo, is there a way to increase the allowed total CPU/Memory per task?
My mem="10Gi" requests are received by the Flyte server and silently truncated to "1Gi" -_-
projectQuotaMemory somehow....
following: https://docs.flyte.org/en/latest/deployment/configuration/general.html#cluster-resources
I run flytectl get cluster-resource-attribute -p flytesnacks -d development
but I get:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x70 pc=0x147f37c]
goroutine 1 [running]:
github.com/flyteorg/flytectl/cmd/get.FetchAndUnDecorateMatchableAttr({0x257b478?, 0xc0000560a8?}, {0x7ffc1eb6a9db?, 0xc00099f998?}, {0x7ffc1eb6a9ea?, 0xc000ae5090?}, {0x0?, 0xc000afc701?}, {0x0, 0x0}, ...)
/home/runner/work/flytectl/flytectl/cmd/get/matchable_attribute_util.go:32 +0xbc
github.com/flyteorg/flytectl/cmd/get.getClusterResourceAttributes({0x257b478, 0xc0000560a8}, {0xc000315d80, 0x0, 0x257b120?}, {0x0, {0x0, 0x0}, {0x0, 0x0}, ...})
/home/runner/work/flytectl/flytectl/cmd/get/matchable_cluster_resource_attribute.go:78 +0x2a6
github.com/flyteorg/flytectl/cmd/core.generateCommandFunc.func1(0xc0009fbb80?, {0xc000315d80, 0x0, 0x4})
/home/runner/work/flytectl/flytectl/cmd/core/cmd.go:70 +0x93d
github.com/spf13/cobra.(*Command).execute(0xc0009fbb80, {0xc000315d40, 0x4, 0x4})
/home/runner/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:856 +0x67c
github.com/spf13/cobra.(*Command).ExecuteC(0xc0009fb400)
/home/runner/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:974 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
/home/runner/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:902
github.com/flyteorg/flytectl/cmd.ExecuteCmd()
/home/runner/work/flytectl/flytectl/cmd/root.go:137 +0x1e
main.main()
/home/runner/work/flytectl/flytectl/main.go:12 +0x1d
flytectl update cluster-resource-attribute --attrFile cra.yaml
cra.yaml:
attributes:
  projectQuotaCpu: "1000"
  projectQuotaMemory: 5Ti
domain: development
project: flytesnacks
yields:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x1aeed82]
goroutine 1 [running]:
github.com/flyteorg/flytectl/cmd/update.DecorateAndUpdateMatchableAttr({0x257b478, 0xc0000560a8}, {0xc000a9cb00, 0xb}, {0xc000a9caf0, 0xb}, {0x0, 0x0}, {0x0, 0x0}, ...)
/home/runner/work/flytectl/flytectl/cmd/update/matchable_attribute_util.go:37 +0x2e2
github.com/flyteorg/flytectl/cmd/update.updateClusterResourceAttributesFunc({0x257b478, 0xc0000560a8}, {0xc000a1fb40?, 0x100000049c5e5?, 0x257b120?}, {0x0, {0x0, 0x0}, {0x0, 0x0}, ...})
/home/runner/work/flytectl/flytectl/cmd/update/matchable_cluster_resource_attribute.go:75 +0x206
github.com/flyteorg/flytectl/cmd/core.generateCommandFunc.func1(0xc00083b680?, {0xc000660760, 0x0, 0x2})
/home/runner/work/flytectl/flytectl/cmd/core/cmd.go:70 +0x93d
github.com/spf13/cobra.(*Command).execute(0xc00083b680, {0xc000660740, 0x2, 0x2})
/home/runner/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:856 +0x67c
github.com/spf13/cobra.(*Command).ExecuteC(0xc0003b3b80)
/home/runner/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:974 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
/home/runner/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:902
github.com/flyteorg/flytectl/cmd.ExecuteCmd()
/home/runner/work/flytectl/flytectl/cmd/root.go:137 +0x1e
main.main()
/home/runner/work/flytectl/flytectl/main.go:12 +0x1d
$ flytectl update cluster-resource-attribute --attrFile cra.yaml --config ~/.flyte/config-sandbox.yaml
Updated attributes from flytesnacks project and domain development
seems to work!! My jobs aren't OOMing!!
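The invocation that succeeded differs from the one that panicked only by the explicit --config flag. A minimal sketch of a wrapper that always passes it is below. The function name `flytectl_update_cmd` is hypothetical; the subcommand and flags are the ones used in this thread, and nothing here confirms that a missing config is the actual cause of the panic.

```python
import os


def flytectl_update_cmd(kind: str, attr_file: str, config: str) -> list[str]:
    """Build the argv for `flytectl update <kind>` with an explicit config file.

    `kind` is e.g. "cluster-resource-attribute" or "task-resource-attribute".
    The config path is expanded so that "~" works outside a shell as well.
    """
    return [
        "flytectl", "update", kind,
        "--attrFile", attr_file,
        "--config", os.path.expanduser(config),
    ]


# Example: the command list could be handed to subprocess.run(cmd, check=True).
cmd = flytectl_update_cmd(
    "cluster-resource-attribute", "cra.yaml", "~/.flyte/config-sandbox.yaml"
)
```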
Maybe strangely, if I go grab the k8s definition from the k8s dashboard, I see the pod has the instructions:
resources:
  limits:
    cpu: '2'
    memory: 1Gi
  requests:
    cpu: '2'
    memory: 1Gi
while my Task has defined: requests=Resources(mem="6500Mi"))
So I feel like I'm not doing this correctly....
In addition to the cra.yaml above, I needed to update the task-resource-attribute as well:
tra.yaml:
defaults:
  cpu: "1"
  memory: 1Gi
limits:
  cpu: "1000"
  memory: 5Ti
project: flytesnacks
domain: development
cmd:
flytectl update task-resource-attribute --attrFile tra.yaml --config ~/.flyte/config-sandbox.yaml
resources:
  limits:
    cpu: '1'
    memory: 6500Mi
  requests:
    cpu: '1'
    memory: 6500Mi
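The behaviour seen in this thread (a 10Gi request ending up as 1Gi, and a 6500Mi request passing through once the task limit was raised to 5Ti) is consistent with the request being capped at the configured task limit. The sketch below only models that observation; it is not Flyte's code. The helpers `parse_memory` and `effective_memory` are hypothetical, and only whole-number quantities with a subset of the Kubernetes suffixes are handled.

```python
import re

# A subset of Kubernetes memory suffixes: binary (Ki..Ti) and decimal (k..T).
_UNITS = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '6500Mi' or '1Gi' to a number of bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti|k|M|G|T)?", quantity.strip())
    if match is None:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    number, suffix = match.groups()
    return int(number) * _UNITS[suffix or ""]


def effective_memory(requested: str, limit: str) -> str:
    """Return the request if it fits under the limit, otherwise the limit."""
    if parse_memory(requested) <= parse_memory(limit):
        return requested
    return limit


# Before the task-resource-attribute update: limit of 1Gi caps the request.
before = effective_memory("10Gi", "1Gi")    # "1Gi"
# After the update: limit of 5Ti leaves the 6500Mi request untouched.
after = effective_memory("6500Mi", "5Ti")   # "6500Mi"
```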
Kevin Su
10/12/2023, 8:09 PM
Dan Farrell
10/12/2023, 8:29 PM
Kevin Su
10/12/2023, 11:44 PM
L godlike
10/13/2023, 12:21 AM
Ryuu
10/26/2023, 8:01 AM
L godlike
10/30/2023, 7:08 AM
Dan Farrell
10/30/2023, 3:29 PM
flytectl demo start --image futureoutlier/flyte-sandbox:gpu-v2 --disable-agent --force
does not seem to work on my Ubuntu GPU workstation; it just exits with no logs.
L godlike
10/30/2023, 3:30 PM
Dan Farrell
10/30/2023, 3:30 PM
docker run --rm -it futureoutlier/flyte-sandbox:gpu-v2
also gives no stdout and fails immediately
[2023-10-30T15:31:24+00:00] Running k3d entrypoints...
[2023-10-30T15:31:24+00:00] Running /bin/k3d-entrypoint-cgroupv2.sh
[2023-10-30T15:31:24+00:00] Running /bin/k3d-entrypoint-flyte-sandbox-bootstrap.sh
2023/10/30 15:31:24 failed to apply transformations: lookup host.docker.internal on 205.171.3.26:53: no such host
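The last log line reports that host.docker.internal could not be resolved. A small check like the one below could be run in the same environment to confirm whether the name resolves there. This is only a diagnostic sketch: `can_resolve` is a hypothetical helper, and the `resolver` parameter exists just so the function can be exercised without real DNS.

```python
import socket


def can_resolve(hostname: str, resolver=socket.gethostbyname) -> bool:
    """Return True if the hostname resolves, False if the lookup fails."""
    try:
        resolver(hostname)
        return True
    except OSError:
        # socket.gaierror ("no such host") is a subclass of OSError.
        return False


# Usage, e.g. from a Python shell in the affected environment:
#   can_resolve("host.docker.internal")
```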
Ryuu
10/30/2023, 3:32 PM
Dan Farrell
10/30/2023, 3:35 PM
flytectl demo start --image futureoutlier/flyte-sandbox:gpu-v2 --disable-agent --force
does not exit with no logs; it gives me the following output:
{"status":"Status: Downloaded newer image for futureoutlier/flyte-sandbox:gpu-v2"}
🧑🏭 booting Flyte-sandbox container
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
The flyte-sandbox container that it spawns fails and has no logs:
73a7896f3379 futureoutlier/flyte-sandbox:gpu-v2 "/bin/k3d-entrypoint…" 17 minutes ago Exited (1) 16 minutes ago flyte-sandbox
L godlike
10/30/2023, 3:38 PM
Dan Farrell
10/30/2023, 4:16 PM
$ docker run --gpus=all --rm -it --entrypoint /bin/bash futureoutlier/flyte-sandbox:gpu-v2
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: requirement error: unsatisfied condition: cuda>=12.1, please update your driver to a newer version, or use an earlier cuda container: unknown.
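The error above states a minimum requirement (cuda>=12.1) that the local driver does not meet. A minimal sketch of that comparison is below, assuming the supported CUDA version is known as a dotted string (for example 11.8). The helper names are hypothetical and this is not how nvidia-container-cli itself performs the check.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '12.1' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies_minimum(available: str, required: str) -> bool:
    """Return True if `available` is at least `required`."""
    have, need = parse_version(available), parse_version(required)
    # Pad the shorter tuple with zeros so that '12' compares equal to '12.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


# An 11.8 driver does not satisfy an image that requires cuda>=12.1.
ok = satisfies_minimum("11.8", "12.1")  # False
```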
My CUDA is 11.8
L godlike
10/31/2023, 3:06 AM
Dan Farrell
10/31/2023, 3:14 AM
L godlike
10/31/2023, 3:15 AM
Dan Farrell
10/31/2023, 4:34 AM
Is flyte-sandbox-bootstrap doing something wild here? Or is this a problem with my env?
root@153de33f2937:/# cat /var/log/k3d-entrypoints_231031053514.log
[2023-10-31T05:35:14+00:00] Running k3d entrypoints...
[2023-10-31T05:35:14+00:00] Running /bin/k3d-entrypoint-cgroupv2.sh
[2023-10-31T05:35:14+00:00] Running /bin/k3d-entrypoint-flyte-sandbox-bootstrap.sh
[2023-10-31T05:35:14+00:00] Running /bin/k3d-entrypoint-gpu-check.sh
/bin/k3d-entrypoint.sh: 14: /bin/k3d-entrypoint-gpu-check.sh: Permission denied
root@153de33f2937:/# chmod +x /bin/k3d-entrypoint-gpu-check.sh
this needs to be executable
L godlike
10/31/2023, 5:37 AM
flyte-sandbox-bootstrap works
2. how to make k3s get the host
Dan Farrell
10/31/2023, 5:42 AM
chmod +x ./docker/sandbox-bundled/bin/k3d-entrypoint-gpu-check.sh
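The "Permission denied" in the entrypoint log comes down to a missing execute bit on the script. A rough Python equivalent of the chmod +x fix is sketched below, in case the check is wanted in a build or test step. `ensure_executable` is a hypothetical helper, and unlike the shell command it ignores the umask.

```python
import os
import stat


def ensure_executable(path: str) -> bool:
    """Add execute permission for user, group, and others.

    Returns True if the file mode was changed, False if it was already set.
    """
    mode = os.stat(path).st_mode
    wanted = mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if wanted == mode:
        return False
    os.chmod(path, wanted)
    return True


# Usage, with the path from this thread:
#   ensure_executable("./docker/sandbox-bundled/bin/k3d-entrypoint-gpu-check.sh")
```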
L godlike
10/31/2023, 5:44 AM
kubectl describe node | grep -i gpu
Ryuu
10/31/2023, 5:45 AM
L godlike
10/31/2023, 5:46 AM