Bernhard Stadlbauer
10/26/2023, 6:02 PM
Byron Hsu
10/26/2023, 6:03 PM
Fabio Grätz
10/27/2023, 9:56 AM
`FlyteRecoverableException` from torch elastic worker processes up to the main process to fix retries.
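A minimal sketch of the idea, assuming the worker's failure reaches the main process as a serialized record. `RecoverableError`, `run_worker`, and `raise_from_worker` are hypothetical names for illustration, not flytekit or torch elastic APIs, and the JSON string stands in for whatever channel actually crosses the process boundary. The point is only that the "recoverable" flag must survive the hop so the main process can raise an error that still triggers a retry.

```python
import json


class RecoverableError(Exception):
    """Hypothetical stand-in for a retryable error such as FlyteRecoverableException."""


def run_worker(fn):
    """Run fn as a worker would and return a JSON record describing the outcome."""
    try:
        fn()
        return json.dumps({"ok": True})
    except Exception as exc:
        return json.dumps({
            "ok": False,
            # Record whether the failure was retryable before the type is lost.
            "recoverable": isinstance(exc, RecoverableError),
            "message": str(exc),
        })


def raise_from_worker(record):
    """In the main process: re-raise a worker failure with its retry semantics intact."""
    result = json.loads(record)
    if result["ok"]:
        return
    if result["recoverable"]:
        raise RecoverableError(result["message"])
    raise RuntimeError(result["message"])


def flaky():
    """Example worker body that fails in a retryable way."""
    raise RecoverableError("node preempted")
```

Calling `raise_from_worker(run_worker(flaky))` raises `RecoverableError` in the caller, while any other worker exception surfaces as a plain `RuntimeError`.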
Can somebody please review? 🙏
Heet Vekariya
10/28/2023, 3:59 PM
Heet Vekariya
10/28/2023, 3:59 PM
Heet Vekariya
10/28/2023, 4:00 PM
Heet Vekariya
10/28/2023, 4:01 PM
Fabio Grätz
10/30/2023, 8:43 AM
Dan Farrell
10/31/2023, 7:47 PM
E1031 19:35:27.628662 66 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to create containerd task: cgroups: cgroup mountpoint does not exist: unknown" pod="flyte/flyte-sandbox-postgresql-0"
E1031 19:35:27.628687 66 kuberuntime_manager.go:1166] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to create containerd task: cgroups: cgroup mountpoint does not exist: unknown" pod="flyte/flyte-sandbox-postgresql-0"
E1031 19:35:27.628758 66 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"flyte-sandbox-postgresql-0_flyte(43c0a74e-0d0d-46b9-b4e5-50d76ca102d7)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"flyte-sandbox-postgresql-0_flyte(43c0a74e-0d0d-46b9-b4e5-50d76ca102d7)\\\": rpc error: code = Unknown desc = failed to create containerd task: cgroups: cgroup mountpoint does not exist: unknown\"" pod="flyte/flyte-sandbox-postgresql-0" podUID="43c0a74e-0d0d-46b9-b4e5-50d76ca102d7"
W1031 19:35:28.620888 66 manager.go:1159] Failed to process watch event {EventType:0 Name:/kubepods/besteffort/pod174bfdd2-289e-45f4-8d89-3420e1fe3835/668db6ca7463b2bc7452a10b05dd6e48d41dc989cfb6b0771b958996958c76f2 WatchSource:0}: container "668db6ca7463b2bc7452a10b05dd6e48d41dc989cfb6b0771b958996958c76f2" in namespace "k8s.io": not found
I also have this error at the top of my logs:
sed: couldn't flush stdout: Device or resource busy
time="2023-10-31T19:41:12Z" level=info msg="Acquiring lock file /var/lib/rancher/k3s/data/.lock"
time="2023-10-31T19:41:12Z" level=info msg="Preparing data dir /var/lib/rancher/k3s/data/ab2055bc72380bad965b219e8688ac02b2e1b665cad6bdde1f8f087637aa81df"
time="2023-10-31T19:41:15Z" level=info msg="Starting k3s v1.28.2+k3s1 (6330a5b4)"
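The `cgroup mountpoint does not exist` errors above suggest checking which cgroup hierarchy is actually visible inside the container. A minimal diagnostic sketch, assuming the standard `/sys/fs/cgroup` mount point; the function names are hypothetical, and the v1 check is only a heuristic. It relies on the fact that a cgroup v2 (unified) hierarchy exposes a `cgroup.controllers` file at its root.

```python
from pathlib import Path


def detect_cgroup_version(root="/sys/fs/cgroup"):
    """Return 'v2', 'v1', or 'none' based on what is mounted under root."""
    base = Path(root)
    if (base / "cgroup.controllers").is_file():
        # The unified (v2) hierarchy exposes cgroup.controllers at its root.
        return "v2"
    if base.is_dir() and any(p.is_dir() for p in base.iterdir()):
        # Heuristic: v1 mounts one directory per controller (cpu, memory, ...).
        return "v1"
    return "none"


def available_controllers(root="/sys/fs/cgroup"):
    """List the controllers a v2 hierarchy offers, or [] when the file is absent."""
    path = Path(root) / "cgroup.controllers"
    return path.read_text().split() if path.is_file() else []
```

Running `detect_cgroup_version()` and `available_controllers()` inside the sandbox container shows whether a hierarchy is mounted at all before looking at the entrypoint script.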
Does anyone have any idea why this line might be failing? https://github.com/flyteorg/flyte/blob/master/docker/sandbox-bundled/bin/k3d-entrypoint-cgroupv2.sh#L19
Bernhard Stadlbauer
11/02/2023, 8:17 AM
`conda` packages for `flytekit` (here; latest `1.9.1`) as well as `flyteidl` (here; latest `1.5.17`) are not up to date.
At the moment it’s not possible to release `flytekit` as that depends on `flyteidl>=1.10.0`, for which the auto-updater does not work (status page here; search for “flyteidl” in the “Errored” field).
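To make the blocking constraint concrete, here is a small sketch that compares the two versions quoted above. The helper names are hypothetical, and the parser only handles plain dotted release numbers; real resolvers follow the full version-specifier rules, including pre-releases.

```python
def parse_version(text):
    """Split a dotted release string like '1.10.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies_minimum(available, minimum):
    """True when the available release is at least the required minimum."""
    return parse_version(available) >= parse_version(minimum)


# Versions quoted in the message above.
latest_on_conda = "1.5.17"
required_by_flytekit = "1.10.0"

print(satisfies_minimum(latest_on_conda, required_by_flytekit))  # False
print(latest_on_conda >= required_by_flytekit)  # True: plain string comparison misleads
```

Comparing as integer tuples matters here because `"1.5.17"` sorts after `"1.10.0"` as a string even though it is the older release.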
I’ve gone ahead and fixed the auto-update (by removing the duplicate `tests` section) and bumped `flyteidl==1.10.0` (this PR). I’ve also taken the liberty to add myself as a maintainer (this PR).
Once that is in, I will go ahead and update `flytekit`.
cc @Eduardo Apolinario (eapolinario)
Yi Chiu
11/03/2023, 4:36 AM
David Espejo (he/him)
11/03/2023, 6:32 PM
Ketan (kumare3)
Ketan (kumare3)
Byron Hsu
11/06/2023, 9:22 PM
Laura Lin
11/07/2023, 1:30 AM
Byron Hsu
11/07/2023, 5:50 PM
Daniel Farrell
11/08/2023, 1:01 AM
`docker` to build a container from `ImageSpec` with a defined `Dockerfile`)
https://github.com/flyteorg/flytekit/pull/1926
I wanted to get some opinions on potential changes to the `ImageSpec` interface (none implemented yet; the link is just for reference).
Basically:
If you want to define a container for a task that Flyte manages/builds, you currently must define an `ImageSpec`. I would like to create a new 'builder' that builds images via a `Dockerfile`, but I'm finding the current interface pretty restrictive.
I find the current `ImageSpec` implementation to be mainly targeted at `envD`. This is strange, as `ImageSpec` as-is lives in flytekit core but is essentially useless without `flytekit-envd` installed. The current implementation also requires `flytekit-envd` to be installed on the task node, because the `EnvdImageSpecBuilder` must be instantiated at task runtime, as `.build` gets run at some point.
I was wondering if any devs had thoughts on ways I might decouple `ImageSpec` from `envD`? Or if any thought has been put into this before? I have a few ideas, but each has tradeoffs and I don't want to restart discussions that have already been had.
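One possible shape for the decoupling, sketched under assumptions: the spec carries only a builder key, and builders are looked up from a registry at build time, so nothing builder-specific needs to be imported unless it is actually selected. Every name here (`ImageRequest`, `DockerfileBuilder`, `register_builder`, `plan_build`) is hypothetical and is not flytekit's API; the builder only assembles a `docker build` command instead of running it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol


@dataclass(frozen=True)
class ImageRequest:
    """Hypothetical, builder-agnostic description of an image to build."""
    name: str
    tag: str
    builder: str = "envd"
    dockerfile: Optional[str] = None
    context: str = "."


class Builder(Protocol):
    def build_command(self, request: ImageRequest) -> List[str]: ...


class DockerfileBuilder:
    """Turns a request into a `docker build` invocation without running it."""

    def build_command(self, request: ImageRequest) -> List[str]:
        if request.dockerfile is None:
            raise ValueError("the dockerfile builder needs a Dockerfile path")
        return ["docker", "build", "-f", request.dockerfile,
                "-t", f"{request.name}:{request.tag}", request.context]


# Registry keyed by builder name; core code never imports a concrete builder.
_BUILDERS: Dict[str, Builder] = {}


def register_builder(key: str, builder: Builder) -> None:
    _BUILDERS[key] = builder


def plan_build(request: ImageRequest) -> List[str]:
    """Look up the requested builder and return the command it would run."""
    try:
        builder = _BUILDERS[request.builder]
    except KeyError:
        raise LookupError(f"no builder registered for {request.builder!r}") from None
    return builder.build_command(request)


register_builder("dockerfile", DockerfileBuilder())
```

With this layout, a request with `builder="dockerfile"` resolves without any envd code being present, and an unregistered key fails with a clear `LookupError`. The tradeoff is that the registry must be populated wherever builds happen.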
I will say that I would like to be able to use `Dockerfile`s with the least amount of modification possible, so it's not really ideal to add a `pip install flytekit-docker` step to every `Dockerfile` I want to use, but maybe that is unavoidable.
Fabio Grätz
11/08/2023, 3:45 PM
user
11/08/2023, 10:00 PM
L godlike
11/09/2023, 12:32 PM
Byron Hsu
11/12/2023, 11:00 PM
Fabio Grätz
11/14/2023, 4:53 PM
David Espejo (he/him)
11/14/2023, 6:22 PM
Ketan (kumare3)
Byron Hsu
11/22/2023, 5:28 PM
user
11/22/2023, 10:00 PM
Kevin Su
11/26/2023, 10:24 AM
Byron Hsu
11/26/2023, 5:37 PM
Byron Hsu
11/28/2023, 4:40 AM