https://flyte.org logo
Join the conversationJoin Slack
Channels
announcements
ask-the-community
auth
conference-talks
contribute
databricks-integration
datahub-flyte
deployment
ecosystem-unionml
engineeringlabs
events
feature-discussions
flyte-bazel
flyte-build
flyte-console
flyte-deployment
flyte-documentation
flyte-github
flyte-ui-ux
flytekit
flytekit-java
flytelab
great-content
hacktoberfest-2022
helsing-flyte
in-flyte-conversations
introductions
jobs
konan-integration
linkedin-flyte
random
ray-integration
ray-on-flyte
release
scipy-2022-sprint
sig-large-models
workflow-building-ui-proj
writing-w-sfloris
Powered by Linen
ask-the-community
  • f

    Frank Shen

    10/21/2022, 6:01 PM
    I think I found it https://github.com/flyteorg/flytesnacks/tree/master/cookbook/integrations/kubernetes/k8s_spark
  • f

    Frank Shen

    10/21/2022, 9:52 PM
    Hello, I have a workflow that requires a custom client connection (Feathr client). I need to get the client and pass it to the subsequent tasks. E.g.
    from feathr_client_wrapper.feathr_client import generate_feathr_client
    from feathr.client import FeathrClient
    
    
    # @task
    def get_feathr_client() -> FeathrClient:
        client = generate_feathr_client(
            team="customer",
            environment="dev",
            organization="wm",
            cluster_size="small",
            project_name="test_project",
            email="<mailto:test@warnermedia.com|test@warnermedia.com>",
            aws_registry=True
        )
        return client
    
    @task
    def list_features(client: FeathrClient, project_name: str) -> typing.List:
        return client.list_registered_features(project_name=project_name)
    
    @workflow
    def wf(project_name: str = 'feathr_demo') -> typing.List:
        client = get_feathr_client()
        return list_features(client = client, project_name = project_name)
    
    if __name__ == "__main__":
        print(wf())
  • f

    Frank Shen

    10/21/2022, 9:55 PM
    It works when I comment out the @task & @workflow and run from the main(). When running as workflow, I got error:
    TypeError: can't pickle _thread.lock objects
    ...
    Failed to Bind variable client for function feathr_example.list_features
  • f

    Frank Shen

    10/21/2022, 9:59 PM
    Basically, input variable client is not a data json but a custom class object, therefore is not appropriate to be an input or output in a task is what I figured. How do I initialize a shared object in the workflow script to be used by all the tasks?
    e
    • 2
    • 2
  • a

    Alex Pozimenko

    10/21/2022, 9:59 PM
    is it possible to delete a project using
    flytectl
    ? The documentation says that
    flytectl delete
    can do that (here), but then I run the tool, project is not listed in available commands:
    flytectl delete --help
    
    Delete a resource; if an execution:
    ::
    
     flytectl delete execution kxd1i72850  -d development  -p flytesnacks
    
    Usage:
      flytectl delete [command]
    
    Available Commands:
      cluster-resource-attribute Deletes matchable resources of cluster attributes.
      execution                  Terminates/deletes execution resources.
      execution-cluster-label    Deletes matchable resources of execution cluster label.
      execution-queue-attribute  Deletes matchable resources of execution queue attributes.
      plugin-override            Deletes matchable resources of plugin overrides.
      task-resource-attribute    Deletes matchable resources of task attributes.
      workflow-execution-config  Deletes matchable resources of workflow execution config.
    k
    e
    • 3
    • 4
  • s

    Sampath Vaddadi

    10/21/2022, 10:12 PM
    Hi all, I'm using PyTorchJob job in flyte to train a pytorch model following this tutorial https://docs.flyte.org/projects/cookbook/en/latest/auto/integrations/kubernetes/kfpytorch/pytorch_mnist.html I do not need the distributed training and so I change num_workers=0 but the job gets queued and never starts. If I give num_workers >= 1 the pytorchjob runs. Did anyone face this issue? Any help is greatly appreciated
    k
    s
    • 3
    • 4
  • l

    Laura Lin

    10/21/2022, 10:26 PM
    how does fast registration work? When I try giving it an s3 output like the docs
    pyflyte register -p project_name -d domain_name --output <s3://my-s3-bucket/raw_data>
    <-- subbing in my own bucket, I get
    FileNotFoundError: [Errno 2] No such file or directory: '<GIT REPO PATH>/s3:/my-s3-bucket/fast-serialize/fast3e198a8e9dd654e746828c0ae929fce3.tar.gz'
    and when I don't feed in a
    --output
    , it creates a new folder inside
    my-s3-bucket
    . When I run it from the UI, I get
    tar: development/fast3e198a8e9dd654e746828c0ae929fce3.tar.gz: Cannot open: Not a directory
    y
    e
    • 3
    • 17
  • l

    Laura Lin

    10/25/2022, 1:39 AM
    How do you modify k8-array plugin config? I tried adding a k8s-array at this level for
    max-array-job-size
    but it didn't seem to work.
    k
    d
    • 3
    • 5
  • p

    Panos Strouth

    10/25/2022, 9:27 AM
    Hi everyone, I am new to K8S and Flyte but I managed to install Flyte on EKS by following this guide: https://docs.flyte.org/en/latest/deployment/aws/manual.html I tried to access flyte using flytectl and it worked. Unfortunately, when I try to use pyflyte to execute a workflow remotely I get the following error:
    grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNKNOWN
    details = "failed to create a signed url. Error: WebIdentityErr: failed to retrieve credentials
    caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
    status code: 403, request id: 88d09420-d2e3-4772-8767-83cff32d91af"
    debug_error_string = "UNKNOWN:Error received from peer ipv4:xx.xx.xx.xx:443 {grpc_message:"failed to create a signed url. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403
    Seems like an error in IRSA (IAM Role for ServiceAccount). The installation guide suggests to attach IAM roles to the whole EC2 node. Personally I decided to use IRSA because I think this is the correct way to provide permissions to applications. Using EC2-wide roles means that every application running on the instance has the role permissions. With IRSA you allow IAM roles be assumed by applications running in specific namespaces…some kind of more fine-grained control. But as I said I am still a K8S beginner so no strong opinion. My IAM setup has 2 roles: flyte-user-role and iam-role-flyte. Both roles have full s3 permissions. The most important part is the trust policy. Since I use IRSA both roles have the following trust policy:
    {
    "Version": "2012-10-17",
    "Statement": [
    {
    "Sid": "",
    "Effect": "Allow",
    "Principal": {
    "Federated": "arn:aws:iam::xxxxxxxx:oidc-provider/oidc.eks.eu-central-1.amazonaws.com/id/yyyyyy"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
    "StringEquals": {
    "<http://oidc.eks.eu-central-1.amazonaws.com/id/yyyyyy:aud|oidc.eks.eu-central-1.amazonaws.com/id/yyyyyy:aud>": "<http://sts.amazonaws.com|sts.amazonaws.com>",
    "<http://oidc.eks.eu-central-1.amazonaws.com/id/yyyyyy:sub|oidc.eks.eu-central-1.amazonaws.com/id/yyyyyy:sub>": "system:serviceaccount:flyte:default"
    }
    }
    }
    ]
    }
    Note the “flyte” namespace in the Condition. My flyte services run in “flyte” namespace and they should be able to assume the above roles. I think the problem is related to IAM trust policies because flyte service does not have the required permissions to assume the IAM role. Has anyone faced a similar issue? Any help is appreciated!
    r
    • 2
    • 8
  • a

    Adedeji Ayinde

    10/25/2022, 5:56 PM
    what is the correct way to use the existing flyte-sandbox-lite image or (other local image) to register a task to a remote cluster programmatically. The task failed to run in the remote cluster.
    img = ImageConfig.from_images(
        "<http://cr.flyte.org/flyteorg/flyte-sandbox-lite:sha-4f73dc6994dfeafb9eecd9b17d16d7f9275b577a|cr.flyte.org/flyteorg/flyte-sandbox-lite:sha-4f73dc6994dfeafb9eecd9b17d16d7f9275b577a>",
    )
    
    t1 = remote.register_task(
        entity=generate_normal_df,
        serialization_settings=SerializationSettings(image_config=img),
        version="v1.0",
    )
    I got the following error message
    [1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
    [abdmkkdhj6rvsx6cvkww-n0-0] terminated with exit code (1). Reason [Error]. Message: 
    Starting Docker daemon...
    Terminated
    Timed out while waiting for dockerd to start
    .
    k
    • 2
    • 1
  • t

    Tom Melendez

    10/25/2022, 8:56 PM
    Hey Folks, using Flyte remote and would like to pull out some data that I see in the console. I use recent_executions, which returns a list of FlyteWorkflowExecution, but inputs property seems to be blank. Yet, in the console I see the inputs for that same execution ID. How do access those?
    k
    • 2
    • 4
  • c

    Craig Amundsen

    10/25/2022, 10:36 PM
    Greetings - I am trying to do some introspection while running a workflow. It is my understanding that since flytekit 1.1.x it is possible to (perhaps via a FlyteContext instance) to retrieve the workflow version that is running. I have been looking through the docs and asking around where I work and haven't been able to figure out how to do this. So, is this possible? and if so, could someone point me in the right direction?
    k
    • 2
    • 5
  • e

    Edgar Trujillo

    10/26/2022, 2:26 AM
    Hey all, Currently seeing a flyte workflow issue, specifically:
    Failed to convert return value for var o0 for function src.tasks.batch_transform.load_bt_result with error <class 'TypeError'>: unhashable type: 'list'. 
    
    SYSTEM ERROR! Contact platform administrators.
    We've double checked that this task is actually returning a pandas df. We've bumped the cache version as well. The flyte propeller logs show:
    {"json":{"exec_id":"f6d3b8ca69b2149bbbbb","node":"n3/n3","ns":"forecaster-training-dev","res_ver":"38555329","routine":"worker-19","tasktype":"python-task","wf":"forecaster-training:dev:src.workflow.end_to_end"},"level":"warning","msg":"No plugin found for Handler-type [python-task], defaulting to [container]","ts":"2022-10-26T01:19:57Z"}
    {"json":{"exec_id":"f6d3b8ca69b2149bbbbb","node":"n4/n3","ns":"forecaster-training-dev","res_ver":"38555329","routine":"worker-19","wf":"forecaster-training:dev:src.workflow.end_to_end"},"level":"warning","msg":"No plugin found for Handler-type [python-task], defaulting to [container]","ts":"2022-10-26T01:19:57Z"}
    {"json":{"exec_id":"f6d3b8ca69b2149bbbbb","node":"n4/n3","ns":"forecaster-training-dev","res_ver":"38555329","routine":"worker-19","wf":"forecaster-training:dev:src.workflow.end_to_end"},"level":"warning","msg":"Failed to record taskEvent, error [EventAlreadyInTerminalStateError: conflicting events; destination: ABORTED, caused by [rpc error: code = FailedPrecondition desc = invalid phase change from FAILED to ABORTED for task execution {resource_type:TASK project:\"forecaster-training\" domain:\"dev\" name:\"src.tasks.batch_transform.load_bt_result\" version:\"KZ5UyzZ2adcfdjkL4edUwg==\"  node_id:\"n4-0-n3\" execution_id:\u003cproject:\"forecaster-training\" domain:\"dev\" name:\"f6d3b8ca69b2149bbbbb\" \u003e  0 {} [] 0}]]. Trying to record state: ABORTED. Ignoring this error!","ts":"2022-10-26T01:19:57Z"}
    k
    e
    • 3
    • 8
  • s

    Schleppo

    10/26/2022, 6:51 AM
    Hello all, i am new in the flyte game 🙂 Does anyone have experience with installation and configuration on an on premise cluster? Is there anything to consider?
    k
    • 2
    • 1
  • k

    Katrina P

    10/26/2022, 1:36 PM
    What's the best way to put a hard limit the number of pods that Flyte & SparkOperator will spin up? (let's say we have some vCPU limit of 100 or something)
    k
    • 2
    • 10
  • g

    George D. Torres

    10/26/2022, 4:39 PM
    Does any one have any tips for Unmarshaling responses from flyte admin (in Go)? For example when I send a GET to
    /api/v1/data/executions/{id}
    with a execution id, the json response I get can't be Unmarshaled to the proper struct (located here). I'm especially having a hard time Unmarshaling `core.Literal`s
    s
    e
    k
    • 4
    • 7
  • a

    Augie Palacios

    10/26/2022, 10:12 PM
    do yall have an
    flyteconsole
    image that would be able to be deployed on an airgapped network? we are trying to allow users access to the flyte UI but when it is deployed onsite the UI breaks because they are unable to hit the following endpoints:
    <script crossorigin="" src="<https://unpkg.com/react@16.13.1/umd/react.production.min.js>"></script>
    <script crossorigin="" src="<https://unpkg.com/react-dom@16.13.1/umd/react-dom.production.min.js>"></script>
    <script async="" src="<https://www.googletagmanager.com/gtag/js?id=G-0QW4DJWJ20>"></script>
    can probably fork and make quick edit to enable this to work with no internet connection but didn't want to reproduce work if yall had something already
    e
    k
    • 3
    • 2
  • c

    Carlos Cervantes

    10/27/2022, 12:46 AM
    Hi! I'm having trouble understanding what's supposed to happen with k8s pod specs in Flyte. I have a default-node-selector set like so:
    configmap:
    ...
      k8s:
        plugins:
          k8s:
    ...
            default-node-selector:
              algorithm-node: "true"
              "<http://cloud.google.com/gke-smt-disabled|cloud.google.com/gke-smt-disabled>": "false"
    but in my pod task, I add a V1PodSpec with a different node-selector
    @task(task_config=Pod(pod_spec=V1PodSpec(
            node_selector={
                "large-ssd-node": "true",
                "<http://cloud.google.com/gke-large-ssd|cloud.google.com/gke-large-ssd>": "true",
            },
    ...)
    def mytask():
    However, when I look at the yaml that ultimately gets generated, it seems to be a merge of the two different node-selectors
    nodeSelector:
        algorithm-node: "true"
        <http://cloud.google.com/gke-large-ssd|cloud.google.com/gke-large-ssd>: "true"
        <http://cloud.google.com/gke-smt-disabled|cloud.google.com/gke-smt-disabled>: "false"
        large-ssd-node: "true"
    Is this expected behavior? If so, is there a way to replace the default-node-selector, so I only end up with
    nodeSelector:
        <http://cloud.google.com/gke-large-ssd|cloud.google.com/gke-large-ssd>: "true"
        large-ssd-node: "true"
    y
    • 2
    • 6
  • s

    Sujith Samuel

    10/27/2022, 6:50 AM
    #ask-the-community #announcements I am trying to allocate NVIDIA MIGS to a flyte job. The guide here https://docs.flyte.org/projects/cookbook/en/stable/auto/deployment/configure_use_gpus.html gives information only to allocate GPU's to flyte jobs. Migs however are like smaller chunks of an allocatable GPU which can be given to individual pods. I see that Flyte only has the default
    <http://nvidia.com/gpu|nvidia.com/gpu>
    Is there any way to get Migs into this mix of things. @SeungTaeKim I see that you were working on this 4 months back, did you get a solution to this issue. Please assist.
    s
    s
    k
    • 4
    • 17
  • s

    Sujith Samuel

    10/27/2022, 2:16 PM
    #ask-the-community Has anyone tried installing Flyte in a service mesh like Istio. What I am trying is to setup Istio with an oauth provider which then authenticates users to flyte....
    e
    y
    • 3
    • 6
  • t

    Tarmily Wen

    10/27/2022, 4:35 PM
    Hi, I am wondering if there is an example/tutorial on multi-node multi-gpu training. I only see the single-node multi-gpu training example.
    k
    k
    • 3
    • 13
  • j

    Jay Ganbat

    10/27/2022, 8:18 PM
    Hi all, i have seen this odd error in dynamic task where • Dynamic task does not generate a node ◦ one of the condition is not met • Dynamic task returns a Map type • So we just return some default value Then we are getting hit with this error
    Traceback (most recent call last):
      File "/fn/lib/venv/lib/python3.10/site-packages/flytekit/core/base_task.py", line 479, in dispatch_execute
        native_outputs = self.execute(**native_inputs)
      File "/fn/lib/venv/lib/python3.10/site-packages/flytekit/core/python_function_task.py", line 163, in execute
        return self.dynamic_execute(self._task_function, **kwargs)
      File "/fn/lib/venv/lib/python3.10/site-packages/flytekit/core/python_function_task.py", line 268, in dynamic_execute
        return self.compile_into_workflow(ctx, task_function, **kwargs)
      File "/fn/lib/venv/lib/python3.10/site-packages/flytekit/core/python_function_task.py", line 204, in compile_into_workflow
        literals={
      File "/fn/lib/venv/lib/python3.10/site-packages/flytekit/core/python_function_task.py", line 205, in <dictcomp>
        binding.var: binding.binding.to_literal_model() for binding in workflow_spec.template.outputs
      File "/fn/lib/venv/lib/python3.10/site-packages/flytekit/models/literals.py", line 467, in to_literal_model
        return Literal(map=LiteralMap(literals={k: binding.to_literal_model() for k, binding in self.map.bindings}))
      File "/fn/lib/venv/lib/python3.10/site-packages/flytekit/models/literals.py", line 467, in <dictcomp>
        return Literal(map=LiteralMap(literals={k: binding.to_literal_model() for k, binding in self.map.bindings}))
    ValueError: too many values to unpack (expected 2)
    So we do construct the dictionary in the dynamic task and return it though like
    return in_fastq1, in_fastq2, {k: get_empty_flyte_file() for k in in_metrics_keys}
    is this expected?
    e
    • 2
    • 7
  • s

    Sathish kumar Venkatesan

    10/28/2022, 7:50 AM
    Team, i followed the flyte documentation and updated the snowflake JWT_TOKEN. it is working fine, however token validity is just one day. we need to create new everyday and new token require editing the kubernetes secret. we can create jwt_token without expiry?
    FLYTE_SNOWFLAKE_CLIENT_TOKEN: <JWT_TOKEN>
    s
    y
    k
    • 4
    • 9
  • a

    Andrew Korzhuev

    10/28/2022, 11:45 AM
    Hello! We're deploying Flyte on EKS and prefer to use workflow-specific IAM roles. Could you say if there is a naming schema for writing temporary files to S3 we could depend on to restrict permissions at least on the project level? For example we see that currently there are multiple different naming, which makes it complicated to create reliable bucket access policies:
    s3://.../2y/project-x/development/
    s3://.../metadata/propeller/project-x-development-aq9kwx7nbqmxhccp2wgj/
    y
    t
    • 3
    • 12
  • h

    Hampus Rosvall

    10/28/2022, 12:07 PM
    Hey, we have a workflow where one of the task Pods are not starting. It started to happen after adjusting resource requests, but I can’t see that the Pod is being scheduled. If there were no worker nodes matching resources we should see Pod in pending state, waiting for a worker node to be scheduled up. Tailing all logs from
    flyte
    namespaces grepping after task name yields following logs (in comment).
    d
    • 2
    • 2
  • r

    Rahul Mehta

    10/28/2022, 3:52 PM
    Hey folks, does Flyte's support for integrating w/ the various Kubeflow training operators also include XGBoost support? I only found references to the PyTorch/TF operators in the docs
    k
    k
    • 3
    • 11
  • a

    Adedeji Ayinde

    10/28/2022, 4:20 PM
    Hello! Is there a guide on the use of git-actions for workflow/launchplan registration and launchplan activation in a remote flyte cluster. The goal is to use git actions for production deployment.
    r
    • 2
    • 5
  • y

    Yash Panchwatkar

    10/28/2022, 5:36 PM
    hello all I am trying to deploy flyte I guess reached at the end just got the issue that when I am trying to hit the endpoint flyte console is not reachable please help me to get where I am getting wrong... here is the svc pic
  • y

    Yash Panchwatkar

    10/28/2022, 5:39 PM
    h
    e
    y
    • 4
    • 18
  • a

    Adedeji Ayinde

    10/28/2022, 9:14 PM
    Hi, Can a launchplan be activated and deactivated using the flyte UI?
    k
    j
    • 3
    • 6
Powered by Linen
Title
a

Adedeji Ayinde

10/28/2022, 9:14 PM
Hi, Can a launchplan be activated and deactivated using the flyte UI?
k

Kevin Su

10/28/2022, 10:15 PM
you can deactivated by using flytectl. https://docs.flyte.org/projects/flytectl/en/latest/gen/flytectl_update_launchplan.html#synopsis
cc @Eugene Jahn can LP be deactivated on flyteconsole? I did’t see the button to archive it.
a

Adedeji Ayinde

11/02/2022, 5:59 PM
@Eugene Jahn Do you have any update on this please?
k

Kevin Su

11/02/2022, 6:52 PM
cc @Jason Porter
j

Jason Porter

11/02/2022, 8:31 PM
The UI currently cannot archive LP’s
But I created a ticket to track this 👍 https://github.com/flyteorg/flyteconsole/issues/633
View count: 5