https://flyte.org logo
Join the conversationJoin Slack
Channels
announcements
ask-the-community
auth
conference-talks
contribute
databricks-integration
datahub-flyte
deployment
ecosystem-unionml
engineeringlabs
events
feature-discussions
flyte-bazel
flyte-build
flyte-console
flyte-deployment
flyte-documentation
flyte-github
flyte-ui-ux
flytekit
flytekit-java
flytelab
great-content
hacktoberfest-2022
helsing-flyte
in-flyte-conversations
introductions
jobs
konan-integration
linkedin-flyte
random
ray-integration
ray-on-flyte
release
scipy-2022-sprint
sig-large-models
workflow-building-ui-proj
writing-w-sfloris
Powered by Linen
ask-the-community
  • r

    Rahul Mehta

    11/29/2022, 12:49 PM
    Hi, how does caching w/
    NamedTuple
    s work? Does each field in the
    NamedTuple
    just need to have an appropriate annotation w/ the
    HashMethod
    ?
    k
    • 2
    • 2
  • l

    Lukas Bommes

    11/29/2022, 4:04 PM
    Hi, In the script below I have two tasks, one downloads data from S3 to the "flytekit.current_context().working_directory" and returns the working directory as FlyteDirectory. The second tasks reads files from this directory. How can I ensure that both tasks are always running on the same node in a Kubernetes cluster? Is that automatically handled by passing FlyteDirectory?
    @task()
    def download_data() -> FlyteDirectory:
    working_dir = flytekit.current_context().working_directory
    download_to_working_dir()
    # now data is in "working_dir"
    return FlyteDirectory(path=working_dir)
    @task()
    def train(data_dir: FlyteDirectory): -> None
    train_model(data_dir=data_dir)  # reads files from disk and trains model
    k
    • 2
    • 4
  • f

    Frank Shen

    11/29/2022, 8:10 PM
    Hello, In a task I tried to return a nested dict like below as -> Dict[str, Any]
    {'eval-logloss': 0.40768596817526903,
     'done': True,
     'time_this_iter_s': 0.03399085998535156,
     'timesteps_total': None,
     'episodes_total': None,
     'training_iteration': 1,
     'trial_id': '665dc_00005',
     'experiment_id': '55faf486d8e24b999e269e485a88ccbc',
     'date': '2022-11-29_11-39-00',
     'timestamp': 1669750740,
     'time_total_s': 0.03399085998535156,
     'pid': 51901,
     'hostname': 'Franks-MacBook-Pro-2.local',
     'node_ip': '127.0.0.1',
     'config': {'objective': 'binary:logistic',
      'eval_metric': ['logloss', 'error'],
      'max_depth': 2,
      'min_child_weight': 1,
      'subsample': 0.9626982657573901,
      'eta': 0.05413737207295681,
      'tree_method': 'hist'},
     'time_since_restore': 0.03399085998535156,
     'timesteps_since_restore': 0,
     'iterations_since_restore': 1,
     'warmup_time': 0.005497932434082031,
     'experiment_tag': '5_eta=0.0541,max_depth=2,min_child_weight=1,subsample=0.9627'}
    And I kept getting error:
    TypeError: Failed to convert return value for var o0 for function dai_mle_models.xgboost.xgboost_tune_wf.xgb_tune with error <class 'flytekit.core.type_engine.TypeTransformerFailedError'>: Python value cannot be None, expected <class 'flytekit.types.pickle.pickle.FlytePickle.__class_getitem__.<locals>._SpecificFormatClass'>/blob {
      format: "PythonPickle"
    }
  • f

    Frank Shen

    11/29/2022, 8:10 PM
    Do you have idea why and how to fix it?
  • f

    Frank Shen

    11/29/2022, 8:12 PM
    I can return a non-nested dict in another task without problem.
  • f

    Frank Shen

    11/29/2022, 8:46 PM
    I think it because of these None values:
    'timesteps_total': None,
     'episodes_total': None,
    j
    k
    +2
    • 5
    • 12
  • s

    Sabrina Lui

    11/29/2022, 9:38 PM
    Hello, is there a way to handle "workflow termination from UI" from python task plugins? Our plugin spins up a
    dask
    cluster and we would like to ensure that the cluster gets cleaned up in every case that the workflow finishes (successful completion, failure, or user terminates). We are currently using the dask-kubernetes
    KubeCluster
    operator to create the cluster in
    pre_execute
    , then close everything in the
    post_execute
    (pseudo-code). However,
    post_execute
    doesn't get called when a user terminates from the UI, so we are seeing the
    dask
    cluster consistently hang. Thanks for your help!
    k
    t
    • 3
    • 5
  • a

    Ailin Yu

    11/30/2022, 6:39 PM
    Hello! Maybe this is a known thing, but does the Flyte UI not support inputs of
    List[Type]
    where the
    Type
    is complex? Or maybe I’m just specifying my input incorrectly: I specifically have a workflow with an input
    List[FlyteFile]
    that is passing that input to task that takes a
    List[FlyteFile]
    . When I try to run this workflow through the Flyte UI, passing an input like
    ["test_file.txt"]
    and the workflow starts running, the UI shows that this was my input:
    {
      "test_list": [
        "(empty)"
      ]
    }
    So as you can imagine, the workflow then fails to run:
    Message:
    
        Cannot convert from scalar {
      none_type {
      }
    }
     to <class 'flytekit.types.file.file.FlyteFile'>
    
    SYSTEM ERROR! Contact platform administrators.
    (More details in thread)
    j
    k
    d
    • 4
    • 11
  • f

    Frank Shen

    11/30/2022, 7:09 PM
    Hello, I experience a long delay (about 5-10 minutes) before tasks starts to run when I do pyflyte run … locally. It used to run tasks immediately. The difference might have been caused by me installing a new python 3.8 vs previously 3.7. Does anyone have idea on how to resolve this? It greatly slows down my development.
    k
    d
    • 3
    • 6
  • n

    Nischel Kandru (Woven Planet)

    11/30/2022, 7:35 PM
    Hello all I am interested in knowing the resources (CPU, GPU and Memory) requested by a task. Can you please let me know which table-field in Flyte Admin Db contains this data?
    y
    • 2
    • 6
  • t

    Tarmily Wen

    11/30/2022, 9:32 PM
    Hello, I followed the ray k8s guide, and ran the example task. It has been queuing indefinitely for a day. I can't find any logs tellign me what the issue is. Does anyone know what the issue is or how to search for the problem?
    k
    t
    d
    • 4
    • 46
  • j

    Jun Kwan

    12/01/2022, 7:00 AM
    I am following this tutorial https://docs.flyte.org/en/latest/getting_started/index.html
    flytectl demo start
    works well.
    pyflyte run example.py wf --n 500 --mean 42 --sigma 2
    works well.
    export FLYTECTL_CONFIG=...
    done. But when i do
    pyflyte run --remote example.py wf --n 500 --mean 42 --sigma 2
    Kubernetes is giving me this error
    Pod failed. No message received from kubernetes.
    Message: tar: Removing leading '/' from member names.
    Any ideas?
    s
    d
    +2
    • 5
    • 5
  • e

    Ekku Jokinen

    12/01/2022, 6:37 PM
    Hello everyone 🙂 I’m looking for some higher level advice on how to structure one of our workflows. We have a wf that runs a lot (10-100 million) of repetitive tasks, for which the execution logic is very stable. For now, we split the original input list into a more suitable number (10-100) of chunks and use
    map_task
    to map the chunks to equally many worker nodes. However, now I would like to add another task into the mix. The task would be a ShellTask, which would prefetch data into Flyte filesystem for the worker nodes to use in processing. Reasoning behind this is that currently the fetching of the data happens inside of the processing loop, which creates a sizeable I/O bottleneck. The problem is that according to the docs, one should not call another task from inside a mapped task. So I’m looking for a more flexible approach to distribute processing to multiple pods, which would allow calling tasks from inside the worker nodes. I’ve looked into
    @dynamic
    and subworkflows. Which would be better, or is there a better option? Thanks a ton
    j
    k
    • 3
    • 14
  • y

    Yee

    12/01/2022, 9:36 PM
    cross-posting from announcements channel https://flyte-org.slack.com/archives/CNMKCU6FR/p1669927238849689
    • 1
    • 1
  • a

    Alex Pozimenko

    12/02/2022, 12:38 AM
    Hi team, spark question here... We're having issue with some spark applications not cleared after completing/failing. This leads to 1,000s of "active" applications and aggregate size of env vars for spark pods exceeding ARG_MAX limit (as there are multiple vars or each exec/driver pod) and all jobs start failing. Whose responsibility is it to clear completed/failed applications, spark operator or propeller?
    k
    d
    • 3
    • 10
  • k

    Kim Junil

    12/02/2022, 9:08 AM
    Hi Flyte! Our team have many projects. so I want to filter the project by label on the main screen like
    flytectl get project --filter.fieldSelector
    command. Is there a way?
    y
    d
    s
    • 4
    • 8
  • b

    Brandon Segal

    12/05/2022, 2:00 PM
    Is it possible to create a task that returns a flyte workflow? I was interested to see if I could replicate the logic in this proposed Airlfow DAG orchestration that orchestrates a dbt project. In the workflow they do the following: 1. Runs
    dbt compile
    to create a fresh copy of
    manifest.json
    (Where dependencies are stored) 2. Reads the model selectors defined in the YAML file 3. Uses the
    dbt ls
    command to list all of the models associated with each model selector in the YAML file 4. Turns the dbt DAG from
    manifest.json
    into a
    Graph
    object with the
    networkx
    library 5. Uses the methods available on the
    Graph
    object to figure out the correct set of dependencies for each group of models defined in the YAML file 6. Writes the dependencies for each group of models (stored as a list of tuples) to file 7. Create an Airflow DAG for each group of models based on the given dependencies
    d
    e
    • 3
    • 7
  • f

    Frank Shen

    12/05/2022, 8:46 PM
    Hello, How can we have the ability of disabling and re-enabling launch plans from the UI? https://github.com/flyteorg/flyteconsole/issues/633
    y
    t
    e
    • 4
    • 3
  • f

    Frank Shen

    12/05/2022, 11:21 PM
    Hello, In the task , I am trying to dynamically set the env variable like this: if domain is ‘development’, then env=‘dev’ if domain is ‘stage’, then env=‘stage’ if domain is ‘production’, then env=‘prod’ The question is how to determine the domain in code? I tried flytekit.current_context() with no avail. Could someone shed some light? Thanks
    k
    e
    • 3
    • 4
  • l

    Laura Lin

    12/06/2022, 12:14 AM
    Why does Recover rerun map tasks that succeeded? Shouldn't it only rerun tasks that failed.
    e
    j
    +2
    • 5
    • 21
  • t

    Tom Melendez

    12/06/2022, 12:30 AM
    looking to set maxMessageSizeBytes option as mentioned here in a more declarative way (i.e. via Helm), it doesn't seem clear how to do this. Any pointers?
    k
    • 2
    • 2
  • c

    Chirag Gosalia

    12/06/2022, 2:05 PM
    what is the unit of the timeout when specifying it as task meta data. for example
    @task(timeout=60)
    def foo:
        return "foo"
    safe to assume
    60
    is seconds?milliseconds? cc @Louis DiNatale
    d
    • 2
    • 1
  • l

    Louis DiNatale

    12/06/2022, 5:53 PM
    I noticed in the flyte UI that dynamics are grouped in “attempts” what does this mean?
    d
    y
    • 3
    • 13
  • o

    Or Itzary

    12/06/2022, 9:36 PM
    Hey, I’m having trouble installing the demo flyte cluster and would appreciate some help 🙂
    y
    • 2
    • 8
  • v

    varsha Parthasarathy

    12/07/2022, 7:23 PM
    Hi team, What is the best way to solve this use case I have a wf
    A
    with inputs
    a,b,c
    I have another wf B which find out what the inputs should be to wf A and “invokes” it. My current model is as below: Are there better ways to handle this?
    @workflow
    A:
    
    @workflow
    B:
    result = B(..)
    y
    b
    • 3
    • 20
  • l

    Laura Lin

    12/07/2022, 7:58 PM
    https://docs.flyte.org/projects/flytekit/en/latest/generated/flytekit.Resources.html#flytekit.Resources When would I want to change the storage param for kube tasks? Is it always just ephemeral-storage that I should be upping when I anticipate more storage needs?
    s
    e
    • 3
    • 2
  • t

    Tarmily Wen

    12/07/2022, 8:50 PM
    After a ray task is completed, the pods for the head and workers remain alive, preventing my autoscaling group from downscaling. Does anyone else see this happening? If so how do I ensure it cleans itself up after the ray task is completed?
    k
    d
    • 3
    • 3
  • s

    Sandra Youssef

    12/07/2022, 9:16 PM
    Hello Flyte Community 👋 As our most trusted contributors and users of Flyte, if we were to select a handful of 2023 ML, AI, Data Science or other conferences to participate in - as a Flyte talk, workshop or booth - where would you like to see us? All suggestions welcome until Friday Dec 16. Much appreciated!
  • f

    Fabio Grätz

    12/08/2022, 3:44 PM
    Is there a way to list projcts with flyte remote (python api)?
    r
    y
    • 3
    • 4
  • h

    Hank Fanchiu

    12/08/2022, 4:25 PM
    👋 what kubernetes solution do folks use? anyone use k3s in production? at stripe, we’ve prototyped flyte with k3s but are now evaluating different solutions. curious to learn about your experiences!
    k
    f
    d
    • 4
    • 12
Powered by Linen
Title
h

Hank Fanchiu

12/08/2022, 4:25 PM
👋 what kubernetes solution do folks use? anyone use k3s in production? at stripe, we’ve prototyped flyte with k3s but are now evaluating different solutions. curious to learn about your experiences!
k

Ketan (kumare3)

12/08/2022, 4:28 PM
Cc @Michael Lujan ?
f

Felix Ruess

12/08/2022, 4:32 PM
I'm running it on k3s with metallb, traefik and longhorn... and don't see any reason to move to a different one.
d

David Espejo (he/him)

01/19/2023, 4:45 PM
👀 Bumping up @Hank Fanchiu's question ⬆️
f

Felix Ruess

01/19/2023, 4:50 PM
k3s runs fine here so far, any specific questions @David Espejo (he/him)?
d

David Espejo (he/him)

01/19/2023, 4:55 PM
thanks for your input @Felix Ruess! Not much into the specifics, but more interested in having a sense of what other orgs are using to run Flyte/ML workloads? From what I can tell so far EKS/GKE/AKS (in that order) seem like common options, and k3s also makes a ton of sense
f

Felix Ruess

01/19/2023, 5:08 PM
sure So if it's of interest: we run 3 master nodes in HA setup in VMs on our Proxmox cluster and the agent nodes (with various GPUs) are "bare-metal" machines. Kube-vip for control plane and metallb for other loadbalancer services. Longhorn for persistent storage (e.g. for Flyte postgres db), actual ML data in our MinIO cluster...
and Grafana agent ships metrics and logs to Prometheus and Loki, viewing Flyte task logs via Grafana/Loki works nicely
k

Ketan (kumare3)

01/19/2023, 8:11 PM
@Felix Ruess would you be open to sharing with the community your progress with Flyte and share thr journey during a community meeting
f

Felix Ruess

01/19/2023, 8:22 PM
sure, but to be honest we can't run it in prod yet and run most things outside flyte still (e.g. because I can't assign affinity per task yet)
And it seems that we are a bit atypical Flyte users, with only a few container tasks in a workflow so far. Not using most of the "fancy" Flyte features... But sure, I would be up to talking about it...
d

David Espejo (he/him)

01/19/2023, 8:29 PM
that's great @Felix Ruess! I'm sure we all can learn from your journey so far. @Tyler Su 👀
k

Ketan (kumare3)

01/19/2023, 11:27 PM
the affinity per task is coming soon
View count: 4