flyte-org #ask-the-community

Hi, In the script below I have two tasks, one downloads data from S3 to the "flytekit.current_context().working_directory" and returns the working directory as FlyteDirectory. The second tasks reads files from this directory. How can I ensure that both tasks are always running on the same node in a Kubernetes cluster? Is that automatically handled by passing FlyteDirectory?

@task()

def download_data() -> FlyteDirectory:

working_dir = flytekit.current_context().working_directory

download_to_working_dir()

# now data is in "working_dir"

return FlyteDirectory(path=working_dir)

@task()

def train(data_dir: FlyteDirectory): -> None

train_model(data_dir=data_dir)  # reads files from disk and trains model

Frank Shen

11/29/2022, 8:10 PM

Hello, In a task I tried to return a nested dict like below as -> Dict[str, Any]

Copy code

{'eval-logloss': 0.40768596817526903,
 'done': True,
 'time_this_iter_s': 0.03399085998535156,
 'timesteps_total': None,
 'episodes_total': None,
 'training_iteration': 1,
 'trial_id': '665dc_00005',
 'experiment_id': '55faf486d8e24b999e269e485a88ccbc',
 'date': '2022-11-29_11-39-00',
 'timestamp': 1669750740,
 'time_total_s': 0.03399085998535156,
 'pid': 51901,
 'hostname': 'Franks-MacBook-Pro-2.local',
 'node_ip': '127.0.0.1',
 'config': {'objective': 'binary:logistic',
  'eval_metric': ['logloss', 'error'],
  'max_depth': 2,
  'min_child_weight': 1,
  'subsample': 0.9626982657573901,
  'eta': 0.05413737207295681,
  'tree_method': 'hist'},
 'time_since_restore': 0.03399085998535156,
 'timesteps_since_restore': 0,
 'iterations_since_restore': 1,
 'warmup_time': 0.005497932434082031,
 'experiment_tag': '5_eta=0.0541,max_depth=2,min_child_weight=1,subsample=0.9627'}

And I kept getting error:

Copy code

TypeError: Failed to convert return value for var o0 for function dai_mle_models.xgboost.xgboost_tune_wf.xgb_tune with error <class 'flytekit.core.type_engine.TypeTransformerFailedError'>: Python value cannot be None, expected <class 'flytekit.types.pickle.pickle.FlytePickle.__class_getitem__.<locals>._SpecificFormatClass'>/blob {
  format: "PythonPickle"
}

Frank Shen

11/29/2022, 8:10 PM

Do you have idea why and how to fix it?

Frank Shen

11/29/2022, 8:12 PM

I can return a non-nested dict in another task without problem.

Frank Shen

11/29/2022, 8:46 PM

I think it because of these None values:

Copy code

'timesteps_total': None,
 'episodes_total': None,

Sabrina Lui

11/29/2022, 9:38 PM

Hello, is there a way to handle "workflow termination from UI" from python task plugins? Our plugin spins up a

dask

cluster and we would like to ensure that the cluster gets cleaned up in every case that the workflow finishes (successful completion, failure, or user terminates). We are currently using the dask-kubernetes

KubeCluster

operator to create the cluster in

pre_execute

, then close everything in the

post_execute

(pseudo-code). However,

post_execute

doesn't get called when a user terminates from the UI, so we are seeing the

dask

cluster consistently hang. Thanks for your help!

Ailin Yu

11/30/2022, 6:39 PM

Hello! Maybe this is a known thing, but does the Flyte UI not support inputs of

List[Type]

where the

Type

is complex? Or maybe I’m just specifying my input incorrectly: I specifically have a workflow with an input

List[FlyteFile]

that is passing that input to task that takes a

List[FlyteFile]

. When I try to run this workflow through the Flyte UI, passing an input like

["test_file.txt"]

and the workflow starts running, the UI shows that this was my input:

Copy code

{
  "test_list": [
    "(empty)"
  ]
}

So as you can imagine, the workflow then fails to run:

Copy code

Message:

    Cannot convert from scalar {
  none_type {
  }
}
 to <class 'flytekit.types.file.file.FlyteFile'>

SYSTEM ERROR! Contact platform administrators.

(More details in thread)

Frank Shen

11/30/2022, 7:09 PM

Hello, I experience a long delay (about 5-10 minutes) before tasks starts to run when I do pyflyte run … locally. It used to run tasks immediately. The difference might have been caused by me installing a new python 3.8 vs previously 3.7. Does anyone have idea on how to resolve this? It greatly slows down my development.

Nischel Kandru (Woven Planet)

11/30/2022, 7:35 PM

Hello all I am interested in knowing the resources (CPU, GPU and Memory) requested by a task. Can you please let me know which table-field in Flyte Admin Db contains this data?

Tarmily Wen

11/30/2022, 9:32 PM

Hello, I followed the ray k8s guide, and ran the example task. It has been queuing indefinitely for a day. I can't find any logs tellign me what the issue is. Does anyone know what the issue is or how to search for the problem?

Jun Kwan

12/01/2022, 7:00 AM

I am following this tutorial https://docs.flyte.org/en/latest/getting_started/index.html

flytectl demo start

works well.

pyflyte run example.py wf --n 500 --mean 42 --sigma 2

works well.

export FLYTECTL_CONFIG=...

done. But when i do

pyflyte run --remote example.py wf --n 500 --mean 42 --sigma 2

Kubernetes is giving me this error

Pod failed. No message received from kubernetes.

Message: tar: Removing leading '/' from member names.

Any ideas?

Ekku Jokinen

12/01/2022, 6:37 PM

Hello everyone 🙂 I’m looking for some higher level advice on how to structure one of our workflows. We have a wf that runs a lot (10-100 million) of repetitive tasks, for which the execution logic is very stable. For now, we split the original input list into a more suitable number (10-100) of chunks and use

map_task

to map the chunks to equally many worker nodes. However, now I would like to add another task into the mix. The task would be a ShellTask, which would prefetch data into Flyte filesystem for the worker nodes to use in processing. Reasoning behind this is that currently the fetching of the data happens inside of the processing loop, which creates a sizeable I/O bottleneck. The problem is that according to the docs, one should not call another task from inside a mapped task. So I’m looking for a more flexible approach to distribute processing to multiple pods, which would allow calling tasks from inside the worker nodes. I’ve looked into

@dynamic

and subworkflows. Which would be better, or is there a better option? Thanks a ton

Yee

12/01/2022, 9:36 PM

cross-posting from announcements channel https://flyte-org.slack.com/archives/CNMKCU6FR/p1669927238849689

Alex Pozimenko

12/02/2022, 12:38 AM

Hi team, spark question here... We're having issue with some spark applications not cleared after completing/failing. This leads to 1,000s of "active" applications and aggregate size of env vars for spark pods exceeding ARG_MAX limit (as there are multiple vars or each exec/driver pod) and all jobs start failing. Whose responsibility is it to clear completed/failed applications, spark operator or propeller?

Kim Junil

12/02/2022, 9:08 AM

Hi Flyte! Our team have many projects. so I want to filter the project by label on the main screen like

flytectl get project --filter.fieldSelector

command. Is there a way?

Brandon Segal

12/05/2022, 2:00 PM

Is it possible to create a task that returns a flyte workflow? I was interested to see if I could replicate the logic in this proposed Airlfow DAG orchestration that orchestrates a dbt project. In the workflow they do the following: 1. Runs

dbt compile

to create a fresh copy of

manifest.json

(Where dependencies are stored) 2. Reads the model selectors defined in the YAML file 3. Uses the

dbt ls

command to list all of the models associated with each model selector in the YAML file 4. Turns the dbt DAG from

manifest.json

into a

Graph

object with the

networkx

library 5. Uses the methods available on the

Graph

object to figure out the correct set of dependencies for each group of models defined in the YAML file 6. Writes the dependencies for each group of models (stored as a list of tuples) to file 7. Create an Airflow DAG for each group of models based on the given dependencies

Frank Shen

12/05/2022, 8:46 PM

Hello, How can we have the ability of disabling and re-enabling launch plans from the UI? https://github.com/flyteorg/flyteconsole/issues/633

Frank Shen

12/05/2022, 11:21 PM

Hello, In the task , I am trying to dynamically set the env variable like this: if domain is ‘development’, then env=‘dev’ if domain is ‘stage’, then env=‘stage’ if domain is ‘production’, then env=‘prod’ The question is how to determine the domain in code? I tried flytekit.current_context() with no avail. Could someone shed some light? Thanks

Laura Lin

12/06/2022, 12:14 AM

Why does Recover rerun map tasks that succeeded? Shouldn't it only rerun tasks that failed.

Tom Melendez

12/06/2022, 12:30 AM

looking to set maxMessageSizeBytes option as mentioned here in a more declarative way (i.e. via Helm), it doesn't seem clear how to do this. Any pointers?

Chirag Gosalia

12/06/2022, 2:05 PM

what is the unit of the timeout when specifying it as task meta data. for example

Copy code

@task(timeout=60)
def foo:
    return "foo"

safe to assume

is seconds?milliseconds? cc @Louis DiNatale

Louis DiNatale

12/06/2022, 5:53 PM

I noticed in the flyte UI that dynamics are grouped in “attempts” what does this mean?

Or Itzary

12/06/2022, 9:36 PM

Hey, I’m having trouble installing the demo flyte cluster and would appreciate some help 🙂

varsha Parthasarathy

12/07/2022, 7:23 PM

Hi team, What is the best way to solve this use case I have a wf

with inputs

a,b,c

I have another wf B which find out what the inputs should be to wf A and “invokes” it. My current model is as below: Are there better ways to handle this?

Copy code

@workflow
A:

@workflow
B:
result = B(..)

Laura Lin

12/07/2022, 7:58 PM

https://docs.flyte.org/projects/flytekit/en/latest/generated/flytekit.Resources.html#flytekit.Resources When would I want to change the storage param for kube tasks? Is it always just ephemeral-storage that I should be upping when I anticipate more storage needs?

Tarmily Wen

12/07/2022, 8:50 PM

After a ray task is completed, the pods for the head and workers remain alive, preventing my autoscaling group from downscaling. Does anyone else see this happening? If so how do I ensure it cleans itself up after the ray task is completed?

Sandra Youssef

12/07/2022, 9:16 PM

Hello Flyte Community 👋 As our most trusted contributors and users of Flyte, if we were to select a handful of 2023 ML, AI, Data Science or other conferences to participate in - as a Flyte talk, workshop or booth - where would you like to see us? All suggestions welcome until Friday Dec 16. Much appreciated!

Fabio Grätz

12/08/2022, 3:44 PM

Is there a way to list projcts with flyte remote (python api)?

Hank Fanchiu

12/08/2022, 4:25 PM

👋 what kubernetes solution do folks use? anyone use k3s in production? at stripe, we’ve prototyped flyte with k3s but are now evaluating different solutions. curious to learn about your experiences!

Louis DiNatale

12/08/2022, 4:55 PM

Is there a hard limit on allowed for cache lookup? We are running into this error.

Copy code

array size > max allowed for cache lookup. requested [15434]. writer allowed [10000] reader allowed [10000]