sparse-advantage-22780
10/12/2023, 12:37 PM
With flytectl demo, is there a way to increase the allowed total CPU/Memory per task?
My mem="10Gi" requests are received by the flyte server and silently truncated to "1Gi" -_-
sparse-advantage-22780
10/12/2023, 4:02 PM
projectQuotaMemory somehow....
following: https://docs.flyte.org/en/latest/deployment/configuration/general.html#cluster-resources
I run flytectl get cluster-resource-attribute -p flytesnacks -d development
but I get:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x70 pc=0x147f37c]
goroutine 1 [running]:
github.com/flyteorg/flytectl/cmd/get.FetchAndUnDecorateMatchableAttr({0x257b478?, 0xc0000560a8?}, {0x7ffc1eb6a9db?, 0xc00099f998?}, {0x7ffc1eb6a9ea?, 0xc000ae5090?}, {0x0?, 0xc000afc701?}, {0x0, 0x0}, ...)
/home/runner/work/flytectl/flytectl/cmd/get/matchable_attribute_util.go:32 +0xbc
github.com/flyteorg/flytectl/cmd/get.getClusterResourceAttributes({0x257b478, 0xc0000560a8}, {0xc000315d80, 0x0, 0x257b120?}, {0x0, {0x0, 0x0}, {0x0, 0x0}, ...})
/home/runner/work/flytectl/flytectl/cmd/get/matchable_cluster_resource_attribute.go:78 +0x2a6
github.com/flyteorg/flytectl/cmd/core.generateCommandFunc.func1(0xc0009fbb80?, {0xc000315d80, 0x0, 0x4})
/home/runner/work/flytectl/flytectl/cmd/core/cmd.go:70 +0x93d
github.com/spf13/cobra.(*Command).execute(0xc0009fbb80, {0xc000315d40, 0x4, 0x4})
/home/runner/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:856 +0x67c
github.com/spf13/cobra.(*Command).ExecuteC(0xc0009fb400)
/home/runner/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:974 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
/home/runner/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:902
github.com/flyteorg/flytectl/cmd.ExecuteCmd()
/home/runner/work/flytectl/flytectl/cmd/root.go:137 +0x1e
main.main()
/home/runner/work/flytectl/flytectl/main.go:12 +0x1d
sparse-advantage-22780
10/12/2023, 4:03 PM
flytectl update cluster-resource-attribute --attrFile cra.yaml
cra.yaml:
attributes:
  projectQuotaCpu: "1000"
  projectQuotaMemory: 5Ti
domain: development
project: flytesnacks
yields:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x1aeed82]
goroutine 1 [running]:
github.com/flyteorg/flytectl/cmd/update.DecorateAndUpdateMatchableAttr({0x257b478, 0xc0000560a8}, {0xc000a9cb00, 0xb}, {0xc000a9caf0, 0xb}, {0x0, 0x0}, {0x0, 0x0}, ...)
/home/runner/work/flytectl/flytectl/cmd/update/matchable_attribute_util.go:37 +0x2e2
github.com/flyteorg/flytectl/cmd/update.updateClusterResourceAttributesFunc({0x257b478, 0xc0000560a8}, {0xc000a1fb40?, 0x100000049c5e5?, 0x257b120?}, {0x0, {0x0, 0x0}, {0x0, 0x0}, ...})
/home/runner/work/flytectl/flytectl/cmd/update/matchable_cluster_resource_attribute.go:75 +0x206
github.com/flyteorg/flytectl/cmd/core.generateCommandFunc.func1(0xc00083b680?, {0xc000660760, 0x0, 0x2})
/home/runner/work/flytectl/flytectl/cmd/core/cmd.go:70 +0x93d
github.com/spf13/cobra.(*Command).execute(0xc00083b680, {0xc000660740, 0x2, 0x2})
/home/runner/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:856 +0x67c
github.com/spf13/cobra.(*Command).ExecuteC(0xc0003b3b80)
/home/runner/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:974 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
/home/runner/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:902
github.com/flyteorg/flytectl/cmd.ExecuteCmd()
/home/runner/work/flytectl/flytectl/cmd/root.go:137 +0x1e
main.main()
/home/runner/work/flytectl/flytectl/main.go:12 +0x1d
sparse-advantage-22780
10/12/2023, 4:34 PM
$ flytectl update cluster-resource-attribute --attrFile cra.yaml --config ~/.flyte/config-sandbox.yaml
Updated attributes from flytesnacks project and domain development
seems to work!! My jobs aren't OOMing!!
Maybe strangely, if I grab the k8s definition from the k8s dashboard, I see the pod has the following resources:
resources:
  limits:
    cpu: '2'
    memory: 1Gi
  requests:
    cpu: '2'
    memory: 1Gi
while my Task has defined: requests=Resources(mem="6500Mi"))
So I feel like I'm not doing this correctly....
sparse-advantage-22780
10/12/2023, 4:41 PM
sparse-advantage-22780
10/12/2023, 5:11 PM
In addition to the cra.yaml above, I needed to update the task-resource-attribute as well:
tra.yaml:
defaults:
  cpu: "1"
  memory: 1Gi
limits:
  cpu: "1000"
  memory: 5Ti
project: flytesnacks
domain: development
cmd:
flytectl update task-resource-attribute --attrFile tra.yaml --config ~/.flyte/config-sandbox.yaml
sparse-advantage-22780
10/12/2023, 5:12 PM
resources:
  limits:
    cpu: '1'
    memory: 6500Mi
  requests:
    cpu: '1'
    memory: 6500Mi
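The behaviour seen above (a 10Gi request ending up as 1Gi, then 6500Mi being honoured once the task-resource limits were raised to 5Ti) is consistent with the platform capping each request at the configured limit. Below is a minimal sketch of that idea only; the helper names parse_memory and effective_memory are hypothetical, and this is not Flyte's actual implementation.

```python
from decimal import Decimal

# Suffixes used by Kubernetes-style memory quantities (binary and decimal).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '6500Mi' or '5Ti' to a number of bytes."""
    # Try two-letter suffixes first so 'Mi' is matched before 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            number = Decimal(quantity[: -len(suffix)])
            return int(number * _SUFFIXES[suffix])
    return int(Decimal(quantity))  # plain bytes, no suffix


def effective_memory(requested: str, limit: str) -> str:
    """Hypothetical cap: keep the request if it fits, otherwise use the limit."""
    if parse_memory(requested) <= parse_memory(limit):
        return requested
    return limit


# Assumed limit of 1Gi, as seen before the update: the 10Gi request is capped.
print(effective_memory("10Gi", "1Gi"))
# With the limit raised to 5Ti (tra.yaml above), the request passes through.
print(effective_memory("6500Mi", "5Ti"))
```

Comparing in bytes rather than as strings matters here because the request and the limit use different suffixes (Mi versus Ti).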
sparse-advantage-22780
10/12/2023, 6:31 PM
glamorous-carpet-83516
10/12/2023, 8:09 PM
sparse-advantage-22780
10/12/2023, 8:29 PM
glamorous-carpet-83516
10/12/2023, 11:44 PM
damp-lion-88352
10/13/2023, 12:21 AM
full-evening-87657
10/26/2023, 8:01 AM
damp-lion-88352
10/30/2023, 7:08 AM
damp-lion-88352
10/30/2023, 7:09 AM
sparse-advantage-22780
10/30/2023, 3:29 PM
flytectl demo start --image futureoutlier/flyte-sandbox:gpu-v2 --disable-agent --force
does not seem to work on my ubuntu gpu workstation, it just exits with no logs.
damp-lion-88352
10/30/2023, 3:30 PM
sparse-advantage-22780
10/30/2023, 3:30 PM
docker run --rm -it futureoutlier/flyte-sandbox:gpu-v2
also gives no stdout and fails immediately
sparse-advantage-22780
10/30/2023, 3:32 PM
[2023-10-30T15:31:24+00:00] Running k3d entrypoints...
[2023-10-30T15:31:24+00:00] Running /bin/k3d-entrypoint-cgroupv2.sh
[2023-10-30T15:31:24+00:00] Running /bin/k3d-entrypoint-flyte-sandbox-bootstrap.sh
2023/10/30 15:31:24 failed to apply transformations: lookup host.docker.internal on 205.171.3.26:53: no such host
full-evening-87657
10/30/2023, 3:32 PM
sparse-advantage-22780
10/30/2023, 3:35 PM
flytectl demo start --image futureoutlier/flyte-sandbox:gpu-v2 --disable-agent --force
Does not exit with no logs, it gives me the following output,
{"status":"Status: Downloaded newer image for futureoutlier/flyte-sandbox:gpu-v2"}
🧑‍🏭 booting Flyte-sandbox container
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
Waiting for cluster to come up...
sparse-advantage-22780
10/30/2023, 3:36 PM
The flyte-sandbox container that it spawns fails and has no logs:
73a7896f3379 futureoutlier/flyte-sandbox:gpu-v2 "/bin/k3d-entrypoint…" 17 minutes ago Exited (1) 16 minutes ago flyte-sandbox
damp-lion-88352
10/30/2023, 3:38 PM
damp-lion-88352
10/30/2023, 3:38 PM
sparse-advantage-22780
10/30/2023, 4:16 PM
$ docker run --gpus=all --rm -it --entrypoint /bin/bash futureoutlier/flyte-sandbox:gpu-v2
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: requirement error: unsatisfied condition: cuda>=12.1, please update your driver to a newer version, or use an earlier cuda container: unknown.
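The requirement error above is a minimum-version check: the image declares cuda>=12.1 and the host driver has to support at least that CUDA version. A rough sketch of that comparison follows; the helper names are hypothetical, this is not the real nvidia-container-cli logic, and 11.8 is used only as an example host version.

```python
def parse_version(text: str) -> tuple:
    """Turn '12.1' into (12, 1) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(requirement: str, host_cuda: str) -> bool:
    """Check a 'cuda>=X.Y' style requirement against the host's CUDA version."""
    name, separator, minimum = requirement.partition(">=")
    if name.strip() != "cuda" or not separator or not minimum.strip():
        raise ValueError(f"unsupported requirement: {requirement!r}")
    return parse_version(host_cuda) >= parse_version(minimum)


# Example host version 11.8 against the image's stated requirement.
print(satisfies("cuda>=12.1", "11.8"))  # False
```

Comparing integer tuples avoids the string-comparison trap where "12.10" would sort before "12.2".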
My CUDA is 11.8
damp-lion-88352
10/31/2023, 3:06 AM
damp-lion-88352
10/31/2023, 3:07 AM
sparse-advantage-22780
10/31/2023, 3:14 AM
damp-lion-88352
10/31/2023, 3:15 AM
sparse-advantage-22780
10/31/2023, 4:34 AM
sparse-advantage-22780
10/31/2023, 4:35 AM
Is flyte-sandbox-bootstrap doing something wild here? Or is this a problem with my env?
sparse-advantage-22780
10/31/2023, 5:36 AM
sparse-advantage-22780
10/31/2023, 5:37 AM
root@153de33f2937:/# cat /var/log/k3d-entrypoints_231031053514.log
[2023-10-31T05:35:14+00:00] Running k3d entrypoints...
[2023-10-31T05:35:14+00:00] Running /bin/k3d-entrypoint-cgroupv2.sh
[2023-10-31T05:35:14+00:00] Running /bin/k3d-entrypoint-flyte-sandbox-bootstrap.sh
[2023-10-31T05:35:14+00:00] Running /bin/k3d-entrypoint-gpu-check.sh
/bin/k3d-entrypoint.sh: 14: /bin/k3d-entrypoint-gpu-check.sh: Permission denied
root@153de33f2937:/# chmod +x /bin/k3d-entrypoint-gpu-check.sh
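The Permission denied in the log above points at a script that lacks its execute bit. A small sketch of checking and setting that bit from Python, roughly what chmod a+x does; the helper name ensure_executable is hypothetical and not part of the sandbox image.

```python
import os
import stat


def ensure_executable(path: str) -> bool:
    """Add user/group/other execute bits if any are missing.

    Returns True when the mode was changed, False when it was already set.
    """
    mode = os.stat(path).st_mode
    exec_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if mode & exec_bits == exec_bits:
        return False  # already executable for everyone
    os.chmod(path, mode | exec_bits)
    return True
```

For example, ensure_executable("/bin/k3d-entrypoint-gpu-check.sh") inside the container would be a no-op once the file has been fixed in the image.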
this needs to be executable
damp-lion-88352
10/31/2023, 5:37 AM
damp-lion-88352
10/31/2023, 5:37 AM
damp-lion-88352
10/31/2023, 5:41 AM
flyte-sandbox-bootstrap works
2. how to make k3s get the host
damp-lion-88352
10/31/2023, 5:41 AM
sparse-advantage-22780
10/31/2023, 5:42 AM
sparse-advantage-22780
10/31/2023, 5:44 AM
chmod +x ./docker/sandbox-bundled/bin/k3d-entrypoint-gpu-check.sh
damp-lion-88352
10/31/2023, 5:44 AM
damp-lion-88352
10/31/2023, 5:44 AM
damp-lion-88352
10/31/2023, 5:45 AM
damp-lion-88352
10/31/2023, 5:45 AM
kubectl describe node | grep -i gpu
full-evening-87657
10/31/2023, 5:45 AM
damp-lion-88352
10/31/2023, 5:46 AM
damp-lion-88352
10/31/2023, 6:12 AM