Frank Shen
05/16/2023, 9:28 PM

Anooj Patel
05/16/2023, 9:38 PM

Laura Lin
05/16/2023, 10:44 PM
Optional[str], when I try to rerun it using the UI Rerun Button, the field doesn't get populated. But if I launch the job using the UI, it does get populated in the rerun panel.
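(For illustration only: a minimal sketch of a workflow with an Optional[str] input like the one described; the names here are hypothetical.)

from typing import Optional

from flytekit import task, workflow

@task
def greet(name: Optional[str]) -> str:
    # Use the optional input if it was provided, otherwise fall back to a default.
    return f"hello {name}" if name else "hello world"

@workflow
def wf(name: Optional[str] = None) -> str:
    return greet(name=name)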
Derek Yu
05/16/2023, 11:26 PM

Derek Yu
05/17/2023, 5:24 AM

Bernhard Stadlbauer
05/17/2023, 7:15 AM
dask plugin are unschedulable, as limits are not set properly. We’re thinking executionMetadata.GetPlatformResources() returns nil in the dask plugin, leading to limits not being set, which is invalid with `ResourceQuota`s in place. We do have a workaround of explicitly setting those limits for now but wanted to flag this.
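(A minimal sketch of the workaround described, assuming the flytekitplugins-dask Scheduler/WorkerGroup and the task decorator accept explicit requests/limits; the task name and values are illustrative.)

from flytekit import Resources, task
from flytekitplugins.dask import Dask, Scheduler, WorkerGroup

@task(
    task_config=Dask(
        # Explicit limits so scheduler/worker pods satisfy the namespace ResourceQuota.
        scheduler=Scheduler(requests=Resources(cpu="1", mem="2Gi"), limits=Resources(cpu="1", mem="2Gi")),
        workers=WorkerGroup(
            number_of_workers=4,
            requests=Resources(cpu="2", mem="8Gi"),
            limits=Resources(cpu="2", mem="8Gi"),
        ),
    ),
    requests=Resources(cpu="1", mem="2Gi"),
    limits=Resources(cpu="1", mem="2Gi"),
)
def dask_job() -> None:
    ...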
Bernhard Stadlbauer
05/17/2023, 7:34 AM
flytectl? We’ve tried --logger.mute but that doesn’t seem to do the trick:
root@316456918fe4:/# flytectl --logger.mute version
INFO[0000] [0] Couldn't find a config file []. Relying on env vars and pflags.
{
"App": "flytectl",
"Build": "bd6b856",
"Version": "0.6.36",
"BuildTime": "2023-05-17 07:34:12.851237172 +0000 UTC m=+0.290831418"
}
Lukas Bommes
05/17/2023, 10:55 AM
configuration:
  inline:
    task_resource_defaults:
      task_resources:
        defaults:
          cpu: 500m
          memory: 500Mi
          gpu: 0
        limits:
          cpu: 8
          memory: 32Gi
          gpu: 1
2. Add the following to helm values.yaml
configuration:
  inline:
    cluster_resources:
      customData:
        - production:
            - projectQuotaCpu:
                value: "5"
            - projectQuotaMemory:
                value: "4000Mi"
        - staging:
            - projectQuotaCpu:
                value: "2"
            - projectQuotaMemory:
                value: "3000Mi"
        - development:
            - projectQuotaCpu:
                value: "48"
            - projectQuotaMemory:
                value: "100Gi"
3. Update task-resource-attributes through flytectl as described here and here.
Any ideas on this? Would be greatly appreciated.
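(As a companion to step 3, a minimal sketch of how an individual task can request resources within the configured limits; the task name and values here are illustrative.)

from flytekit import Resources, task

@task(requests=Resources(cpu="500m", mem="500Mi"), limits=Resources(cpu="2", mem="4Gi"))
def train_model() -> None:
    # The platform-wide defaults/limits configured above apply whenever these are omitted.
    ...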
er Ksy
05/17/2023, 3:03 PM

Bosco Raju
05/17/2023, 4:19 PM

Jay Ganbat
05/17/2023, 10:02 PM
overwrite cache in the main parent workflow, however that value is not propagating to the subworkflows, so I have this weird execution that partially ran. All tasks and dynamic tasks have been rerun, but the subworkflows used cached results. Is that intended behavior or an oversight in propagating the value?
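(For reference, a rough sketch of setting the flag when launching programmatically, assuming this flytekit version's FlyteRemote.execute accepts an overwrite_cache argument; the endpoint and workflow name are placeholders.)

from flytekit.configuration import Config
from flytekit.remote import FlyteRemote

remote = FlyteRemote(Config.for_endpoint("flyte.example.com"))
wf = remote.fetch_workflow(project="myproject", domain="development", name="my.workflows.parent_wf")
# Assumption: overwrite_cache is forwarded to the new execution; whether it also
# reaches subworkflows is exactly the behavior being asked about above.
execution = remote.execute(wf, inputs={}, overwrite_cache=True)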
Derek Yu
05/18/2023, 1:05 AM
cluster-resource-attribute? Following this section to flytectl update cluster-resource-attribute --attrFile cra.yaml, and when trying to restart flyteadmin it fails in the sync-cluster-resources init container (without any error logs).
juchao song
05/18/2023, 3:22 AM

Victor Gustavo da Silva Oliveira
05/18/2023, 1:15 PM
values.yaml to change the default service account? I have tried k8sServiceAccount but it is not working...

Lukas Bommes
05/18/2023, 2:02 PM
admin:
  endpoint: dns:///A.B.C.D:PPPPP
  insecure: false
  authType: Pkce
  insecureSkipVerify: true
flytectl works just fine. But when I try to fetch an execution on a FlyteRemote with this code
project = "myproject"
domain = "development"
execution = "a5cphsfgc57nt6nxbknt"
flyte_config_file = "flyte.config.yaml"
remote = FlyteRemote(config=Config.auto(config_file=flyte_config_file))
flyte_workflow_execution = remote.fetch_execution(project=project, domain=domain, name=execution)
I get the following error
Traceback (most recent call last):
File "debug_remote.py", line 12, in <module>
flyte_workflow_execution = remote.fetch_execution(project=project, domain=domain, name=execution)
File "/opt/micromamba/envs/OHLI/lib/python3.8/site-packages/flytekit/remote/remote.py", line 353, in fetch_execution
self.client.get_execution(
File "/opt/micromamba/envs/OHLI/lib/python3.8/site-packages/flytekit/clients/friendly.py", line 582, in get_execution
super(SynchronousFlyteClient, self).get_execution(
File "/opt/micromamba/envs/OHLI/lib/python3.8/site-packages/flytekit/clients/raw.py", line 43, in handler
return fn(*args, **kwargs)
File "/opt/micromamba/envs/OHLI/lib/python3.8/site-packages/flytekit/clients/raw.py", line 651, in get_execution
return self._stub.GetExecution(get_object_request, metadata=self._metadata)
File "/opt/micromamba/envs/OHLI/lib/python3.8/site-packages/grpc/_channel.py", line 1030, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/opt/micromamba/envs/OHLI/lib/python3.8/site-packages/grpc/_channel.py", line 910, in _end_unary_response_blocking
raise _InactiveRpcError(state) # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:A.B.C.D:PPPPP: Peer name A.B.C.D is not in peer certificate"
debug_error_string = "UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv4:A.B.C.D:PPPPP: Peer name A.B.C.D is not in peer certificate {created_time:"2023-05-18T14:40:11.018710903+01:00", grpc_status:14}"
>
The cluster is running the flyte-binary Helm chart in version 1.3.0 and I tried flytekit 1.2.11, 1.3.0, and 1.6.1, all resulting in the same error message.
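(One thing worth trying, sketched below: setting the skip-verify option explicitly on the Python side, assuming flytekit's PlatformConfig exposes an insecure_skip_verify field mirroring insecureSkipVerify in the flytectl config; the endpoint is a placeholder.)

from flytekit.configuration import Config, PlatformConfig
from flytekit.remote import FlyteRemote

# Same host/port as in the flytectl config above (placeholder values).
platform = PlatformConfig(endpoint="A.B.C.D:PPPPP", insecure=False, insecure_skip_verify=True)
remote = FlyteRemote(config=Config(platform=platform))
flyte_workflow_execution = remote.fetch_execution(
    project="myproject", domain="development", name="a5cphsfgc57nt6nxbknt"
)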
Nicholas Roberson
05/18/2023, 3:17 PM
flytekit.remote? I have the execution information and can sync it for the workflow; however, I want to be able to give a user in a CLI a command they can run to check the status of a large job (number of tasks in RUNNING, SUCCEEDED, FAILED, etc.).
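(A rough sketch of one approach, assuming sync_execution(..., sync_nodes=True) populates node_executions and that node-level phases are a close enough proxy for task status; the endpoint and execution name are placeholders.)

from collections import Counter

from flytekit.configuration import Config
from flytekit.remote import FlyteRemote

remote = FlyteRemote(Config.for_endpoint("flyte.example.com"))
execution = remote.fetch_execution(project="myproject", domain="development", name="abc123")
execution = remote.sync_execution(execution, sync_nodes=True)

# Tally node execution phases (RUNNING, SUCCEEDED, FAILED, ...) for a quick status summary.
phase_counts = Counter(node.closure.phase for node in execution.node_executions.values())
print(phase_counts)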
Yubo Wang
05/18/2023, 6:47 PM

Melody Lui
05/18/2023, 8:50 PM

Pryce
05/19/2023, 12:57 AM
#!/bin/zsh
REPO=$1
IMAGE=$2
TAG=$3
printf "Building Docker image: %s/%s:%s\n" "$REPO" "$IMAGE" "$TAG"
docker build . -t "$REPO/$IMAGE:$TAG"
printf "Saving Docker image: %s/%s:%s\n" "$REPO" "$IMAGE" "$TAG"
docker save "$REPO/$IMAGE:$TAG" -o ~/Downloads/"$REPO-$IMAGE-$TAG.tar"
printf "Copying Docker image to flyte-sandbox:/tmp\n"
docker cp ~/Downloads/"$REPO-$IMAGE-$TAG.tar" flyte-sandbox:/tmp
printf "Importing Docker image on flyte-sandbox\n"
docker exec -t flyte-sandbox ctr image import /tmp/"$REPO-$IMAGE-$TAG.tar"
Pryce
05/19/2023, 12:59 AM
docker save directly to docker cp, but alas I could not get it working.

Tommy Nam
05/19/2023, 4:45 AM
ValueError: Error encountered while executing 'run_map_task_workflow':
Map tasks can only compose of Python Functon Tasks currently
Is flytekitplugins.papermill.NotebookTask currently incompatible with map_tasks?
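(The error above suggests map_task only accepts plain Python function tasks; a minimal sketch of that supported shape, with the per-item work as a hypothetical placeholder, looks like this.)

from typing import List

from flytekit import map_task, task, workflow

@task
def process_one(x: int) -> int:
    # Hypothetical per-item work; per the error above, a NotebookTask would not be accepted here.
    return x * 2

@workflow
def run_map_task_workflow(xs: List[int]) -> List[int]:
    return map_task(process_one)(x=xs)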
Pradithya Aria Pura
05/19/2023, 6:43 AM

Khalil Almazaideh
05/19/2023, 11:28 AM
@task
def build_models() -> Tuple[LinearRegression, RandomForestRegressor, Any]:
    lr = LinearRegression()
    rf = RandomForestRegressor()
    ann = HyperModel1()
    return lr, rf, ann

@task
def pipline_(models: Tuple[LinearRegression, RandomForestRegressor, Any], (.....etc)) -> List[Dict]:
    def itirate(models):
        for i in models:
            # Do STUFF

def wf() -> pd.DataFrame:
    ....
    ....
    models = build_models()
    s_summary = HyperSearch(models=models, ....)
    return s_summary
Mike Ossareh
05/19/2023, 4:16 PM
@workflow that uses map `@task`s and spins up 10-100s of pods. Our EKS setup is very elastic and sometimes will bin-pack these pods onto the same node. When it does that, there's a high chance that one of the pods in the map @task will fail with an error similar to the issue linked above.
Our kubelet is currently set up to do serial image pulls. So, in theory, once one of the `@task`s pulls the image it should then be available for all of the other pods. But it seems that's not the case. Initially I thought the fact that flyte is setting imagePullPolicy: Always was a problem, but reading the docs more closely it seems that's not the case:
Always: every time the kubelet launches a container, the kubelet queries the container image registry to resolve the name to an image digest. *If the kubelet has a container image with that exact digest cached locally, the kubelet uses its cached image*; otherwise, the kubelet pulls the image with the resolved digest, and uses that image to launch the container.
(emphasis mine) Has anyone observed this issue? Any recommendations?
Cody Scandore
05/19/2023, 5:58 PM
FlyteFile | None? Or Optional[FlyteFile | None]
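(A minimal sketch of annotating an optional FlyteFile output, assuming a flytekit version with Optional/union type support; the task and path are illustrative.)

from typing import Optional

from flytekit import task
from flytekit.types.file import FlyteFile

@task
def maybe_produce_file(produce: bool) -> Optional[FlyteFile]:
    # Only return a file when one was actually written; otherwise return None.
    if produce:
        path = "/tmp/output.txt"
        with open(path, "w") as f:
            f.write("hello")
        return FlyteFile(path)
    return None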
Ping
05/21/2023, 6:34 AM

Y Li
05/21/2023, 11:58 PM

Derek Yu
05/22/2023, 4:11 AM
ValueError: Empty module name when running a map_task() with flytekit and trying to load the actual task object. More details below ⬇️
cc: @Heidi Hurst

Vitali
05/22/2023, 11:47 AM

Xinzhou Liu
05/22/2023, 4:14 PM
[1/1] currentAttempt done. Last Error: USER::containers with unready status: [primary]|context deadline exceeded
I’ve seen this error a few times. What would be the root cause of it?