Powered by Linen
announcements
  • j

    Jake Neyer

    05/25/2022, 5:40 PM
    Hey all! As promised, thanks largely to @Matthew Griffin, we are dropping our Flyte VSCode extension for anyone who is interested https://github.com/Striveworks/Flyte-Wingman We would love feedback and contributions!
    🎉 15
    ❤️ 10
    :partyparrot: 4
  • g

    George Odette

    05/25/2022, 8:59 PM
    Hi all, not flyte related. But has anyone ever experienced this while working with docker? Trying to upload a workflow
  • s

    Samhita Alla

    05/26/2022, 9:46 AM
    📣 Contributors of the Month 📣 Hi everyone! For the month of May, we want to give shout-outs and send swag to the following list of contributors who are sweating it out to make Flyte even better!
    • @eugene jahn contributed to FlyteConsole, our UI. He’s now part of our team, so you can expect consistent contributions from him.
    • @Nick Müller (MorpheusXAUT) worked on a couple of features and fixes, including standardizing flyteidl docs generation, improving the local set-up for contributors, and adding an interruptible override to launch forms.
    We hope to have many more of your contributions, Eugene and Nick! All contributors of the month should be available on flyte.org.
    🎉 11
    :partyparrot: 2
    🙌 7
    💯 2
  • s

    Sandra Youssef

    05/26/2022, 11:33 PM
    Hi Flyers, Learn how MLOps is actually a combination of Machine Learning, DevOps, and Data Engineering, and how Flyte can help solve MLOps challenges in this blog by Union.ai's @Samhita Alla in cooperation with the MLOps Community. https://mlops.community/mlops-with-flyte-the-convergence-of-workflows-between-machine-learning-and-engineering/
    ❤️ 6
  • s

    Sandra Youssef

    05/27/2022, 4:41 PM
    Learn how we are growing the Flyte ecosystem, the latest releases and improvements, upcoming conference talks, and soon-to-be-launched UnionML! All in Flyte Monthly Issue #8. Out now! https://www.getrevue.co/profile/flyte/issues/flyte-monthly-issue-8-1157760 Subscribe here: https://www.getrevue.co/profile/flyte
    ❤️ 4
  • s

    Sandra Youssef

    05/30/2022, 5:50 PM
    Hi Flyers, Join our Community Sync tomorrow 5/31, featuring:
    • Flyte Contributor of the Month guest appearances
    • Flyte blog writer guest appearance
    • Flyte ML projects - UnionML and Flyte-Wingman VS Code Extension - short talks
    • FlyteConsole UI updates with @Jason Porter, and
    • MethaneSAT guest speaker, @Nicholas LoFaso presenting "Transforming Satellite Data with Flyte @ MethaneSAT."
    Not to be missed! See you tomorrow 9am PT on Zoom! Flyte Team https://addevent.com/event/EA7823958
  • e

    Eugene Cha

    05/31/2022, 6:13 AM
    we're trying to run the caching.py example to see how the caching works, but it appears to only work sometimes. we increased the sleep time to 50 seconds
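    import time

    import pandas
    from flytekit import HashMethod, task, workflow
    from flytekit.core.node_creation import create_node
    from typing_extensions import Annotated


    def hash_pandas_dataframe(df: pandas.DataFrame) -> str: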
        return str(pandas.util.hash_pandas_object(df))
    
    
    @task
    def uncached_data_reading_task() -> Annotated[
        pandas.DataFrame, HashMethod(hash_pandas_dataframe)
    ]:
        return pandas.DataFrame({"column_1": [1, 2, 3]})
    
    
    @task(cache=True, cache_version="1.0")
    def cached_data_processing_task(df: pandas.DataFrame) -> pandas.DataFrame:
        time.sleep(50)
        return df * 2
    
    
    @task
    def compare_dataframes(df1: pandas.DataFrame, df2: pandas.DataFrame):
        assert df1.equals(df2)
    
    
    @workflow
    def cached_dataframe_wf():
        raw_data = uncached_data_reading_task()
    
        # We execute `cached_data_processing_task` twice, but we force those
        # two executions to happen serially to demonstrate how the second run
        # hits the cache.
        t1_node = create_node(cached_data_processing_task, df=raw_data)
        t2_node = create_node(cached_data_processing_task, df=raw_data)
        t1_node >> t2_node
    
        # Confirm that the dataframes actually match
        compare_dataframes(df1=t1_node.o0, df2=t2_node.o0)
    
    
    if __name__ == "__main__":
        df1 = cached_dataframe_wf()
        print(f"Running cached_dataframe_wf once : {df1}")
    but sometimes the caching works and sometimes it doesn't. we've tried running with pyflyte run --remote caching.py cached_dataframe_wf as well as the relaunch button, but as you can see in the pictures it tends not to work and i'm not sure why. any ideas?
  • e

    Emirhan Karagül

    06/01/2022, 11:13 AM
    Hi everybody, We have been using flyte schedules for a while now. Yesterday something mysterious happened. A scheduled launchplan got executed 15 hours early. Usually the execution starts around 3 seconds after 5:00 AM UTC. Does anybody have an idea how I can debug this and see what happened? (I hope it happened due to some cosmic bit flipping 😄) Thank you!
    ~ flytectl get execution -p default -d production f38f2b53cfda08ccb000 -o yaml
    closure:
      createdAt: "2022-05-31T14:03:10.025255897Z"
      duration: 501.772263113s
      outputs:
        uri: gs://<our-flyte-store>/metadata/propeller/default-production-f38f2b53cfda08ccb000/end-node/data/0/outputs.pb
      phase: SUCCEEDED
      startedAt: "2022-05-31T14:03:15.128550714Z"
      updatedAt: "2022-05-31T14:11:36.900814113Z"
      workflowId:
        domain: production
        name: flyte_workflows.collaborative_filtering.workflow.pipeline
        project: default
        resourceType: WORKFLOW
        version: 0.2.2
    id:
      domain: production
      name: f38f2b53cfda08ccb000
      project: default
    spec:
      launchPlan:
        domain: production
        name: hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0
        project: default
        resourceType: LAUNCH_PLAN
        version: 0.2.2
      metadata:
        mode: SCHEDULED
        scheduledAt: "2022-06-01T05:00:00Z"
        systemMetadata: {}
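The gap between the two timestamps in the output above can be checked with a few lines of Python. This is only arithmetic on the `createdAt` and `scheduledAt` values shown (fractional seconds dropped); it does not explain why the schedule fired early.

```python
from datetime import datetime, timezone

# Values copied from the flytectl output above, truncated to whole seconds.
created_at = datetime(2022, 5, 31, 14, 3, 10, tzinfo=timezone.utc)
scheduled_at = datetime(2022, 6, 1, 5, 0, 0, tzinfo=timezone.utc)

# How far ahead of its scheduled time the execution was created.
early_by = scheduled_at - created_at
print(early_by)  # 14:56:50
```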
  • r

    Robin Kahlow

    06/01/2022, 11:51 AM
    Is it not possible to have default arguments for lists on workflows? eg. when i try to pyflyte run on
    @workflow
    def wf(
        total_samples: List[int] = [16, 32, 64, 256],
    ):
    I get
    TypeError: the JSON object must be str, bytes or bytearray, not list
    but without specifying a default it does work (and i can pass it to pyflyte as a string json list)
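The wording of that error matches what Python's standard `json.loads` raises when it is handed a list object instead of JSON text. The sketch below reproduces only that standard-library behaviour; whether `pyflyte run` hits exactly this call path for list defaults is an assumption, not something confirmed in this thread.

```python
import json

default_value = [16, 32, 64, 256]

try:
    # Passing a Python list (not JSON text) to the parser.
    json.loads(default_value)
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# The same values given as a JSON string parse without error.
parsed = json.loads("[16, 32, 64, 256]")
print(parsed)
```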
  • r

    Robin Kahlow

    06/01/2022, 3:57 PM
    Trying to use GPUs, I added a tolerations section as described here https://docs.flyte.org/projects/cookbook/en/stable/auto/deployment/configure_use_gpus.html (and in a previous comment where it was clarified where to apply this https://flyte-org.slack.com/archives/CNMKCU6FR/p1651591056890689?thread_ts=1651584781.772139&cid=CNMKCU6FR), ie.
    # -- Kubernetes specific Flyte configuration
      k8s:
        plugins:
          # -- Configuration section for all K8s specific plugins [Configuration structure](https://pkg.go.dev/github.com/lyft/flyteplugins/go/tasks/pluginmachinery/flytek8s/config)
          k8s:
            default-env-vars: []
            #  DEFAULT_ENV_VAR: VALUE
            default-cpus: 100m
            default-memory: 100Mi
    
            resource-tolerations:
              - nvidia.com/gpu:
                - key: "key1"
                  operator: "Equal"
                  value: "value1"
                  effect: "NoSchedule"
    and I applied that with helm and also tried restarting the Flyte pods (kubectl rollout restart deploy), but the pods that get started by Flyte workflows don't get these tolerations (although they do get a default nvidia.com/gpu "exists" toleration regardless of my addition above). Anything I'm doing wrong?
  • s

    Sandra Youssef

    06/01/2022, 5:18 PM
    Hi Flyers, Thank you for attending this week's Flyte Community Sync. Special thanks to our contributors who made guest appearances to talk about their work with Flyte, and our guest speaker Nicholas LoFaso. The recordings are now available on YouTube:
    • Community Updates & Contributor guest appearances, by @Martin Stein, Union.ai.
    • FlyteConsole UI Updates, by @Jason Porter, Union.ai.
    • Transforming Satellite Data with Flyte @ MethaneSAT, by @Nicholas LoFaso, MethaneSAT.
    Meeting notes
    Join us for the next community sync on June 14th! Flyte Team
    :flyte: 4
    🙌 7
  • d

    David Przybilla

    06/02/2022, 6:25 AM
    👋 Hey Flyte community, I am sketching what I learnt while doing some tiny contributions, hopefully to pave the road for upcoming contributors. Currently sketching them in the link below. Any advice on where the best place for such content to live would be? I was thinking of adding a contributor_guide.rst to flyteorg/flyte. https://github.com/dav009/flyte/blob/dp-contributor-fastrack/rsts/community/contributor_fast_track.md
    :flyte: 1
    ❤️ 7
  • m

    Maarten de Jong

    06/02/2022, 11:31 AM
    Hi guys! Is there a way to invalidate the cache for a certain input for a task within a workflow, without increasing the cache version/invalidating the cache of other inputs?
  • n

    Nastya Rusina

    06/02/2022, 7:51 PM
    Hi all 👋🏽 The new release of FlyteConsole v1.1.1 is now available. 🎉 More info available in #flyte-console channel.
    🎉 8
  • m

    Matheus Moreno

    06/03/2022, 3:08 PM
    Hey, everyone! I'm trying to execute the Flyte sandbox but this is happening. What could it be?
    Deploying Flyte...
    Getting updates for unmanaged Helm repositories...
    ...Successfully got an update from the "https://googlecloudplatform.github.io/spark-on-k8s-operator" chart repository
    ...Successfully got an update from the "https://kubernetes.github.io/dashboard/" chart repository
    ...Successfully got an update from the "https://charts.bitnami.com/bitnami" chart repository
    Error: can't get a valid version for repositories contour. Try changing the version constraint in Chart.yaml
  • s

    SeungTaeKim

    06/07/2022, 4:22 AM
    Hi, I am trying to assign a GPU (Nvidia MIG - Multi-Instance GPU) on Flyte. However, I cannot see how to assign it to each task in Flyte (it is not mentioned in the Flyte docs). Does anyone have a solution?
  • r

    Rahul Mehta

    06/07/2022, 10:21 PM
    Is anyone in the flyte community currently using pants as their build system? In particular, curious if there are any particular considerations for running flyte workloads that are packaged as pex binaries
  • k

    Klemens Kasseroller

    06/08/2022, 12:33 PM
    Hi, I am trying to use lists of FlyteFiles inside dataclasses. It seems to me that when passing the dataclass from one task to the other, the reference to the remote source is lost. See the following example:
    from dataclasses import dataclass
    from flytekit import task, workflow
    from typing import List
    
    from dataclasses_json import dataclass_json
    from flytekit.types.file import FlyteFile
    
    
    @dataclass_json
    @dataclass
    class InputsContainer:
        files: List[FlyteFile]
    
    
    @task
    def task1(inputs: List[FlyteFile]) -> InputsContainer:
        print("TASK1 remote source: ", inputs[0].remote_source)
        return InputsContainer(files=inputs)
    
    
    @task
    def task2(inputs: InputsContainer) -> None:
        print("TASK2 remote source: ", inputs.files[0].remote_source)
    
    
    @workflow
    def main_workflow(inputs: List[FlyteFile]) -> None:
        task1_outputs = task1(inputs=inputs)
        task2(inputs=task1_outputs)
    
    
    if __name__ == '__main__':
        file_path = FlyteFile("s3://test-bucket/test.json")
        main_workflow(inputs=[file_path])
    The output generated is:
    TASK1 remote source:  s3://test-bucket/test.json
    TASK2 remote source:  None
    Could anyone help me out here? Thanks!
  • a

    Andrew Dye

    06/08/2022, 4:08 PM
    \
  • s

    SeungTaeKim

    06/09/2022, 8:33 AM
    I resolved this issue by using the pod example at this link. Anyone who wants to try MIG can define a pod spec through the k8s module. I have attached sample code in a gist.
    👍 2
  • s

    Sandra Youssef

    06/09/2022, 4:58 PM
    📣📣📣 Union.ai is proud to announce the release of UnionML, an open-source MLOps framework built on Flyte, that can bundle Python functions into ML microservices. It is the only library that seamlessly manages both data science workflows and production lifecycle tasks. UnionML's release is being announced today at MLOps World 2022 - Toronto, where creator @Niels Bantilan will be hosting a demo booth this afternoon and presenting a talk tomorrow, 6/10 at 1:45pm EDT, titled UnionML: a Microframework for Building Machine Learning Applications. Check it out:
    • Press release: Union.ai releases UnionML for seamless creation of web-native machine learning applications
    • Website: https://www.union.ai/unionml
    • Demo video
    • Blog post: Union.ai releases UnionML for seamless creation of web-native machine learning applications
    • Release notes: https://github.com/unionai-oss/unionml/releases/tag/v0.1.0
    • Star it on GitHub!
    🎉 17
    🔥 17
  • s

    Sandra Youssef

    06/09/2022, 5:23 PM
    Also, join #unionml for more information and latest updates!
  • b

    Brian Tang

    06/10/2022, 11:21 PM
    Cross posting from https://flyte-org.slack.com/archives/C01P3B761A6/p1654537014728059 to see if anybody here might know 🙂 . I tried restarting the flytepropeller deployment and it’s hard to tell if the system is picking up the changes:
    $ kubectl logs -f deployment/flytepropeller -nflyte | grep -i logs
    Found 6 pods, using pod/flytepropeller-785bcc6f6d-sr8bl
    {"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [plugins.logs] updated. No update handler registered.","ts":"2022-06-08T16:32:15Z"}
    {"json":{"src":"viper.go:396"},"level":"debug","msg":"Config section [plugins.logs] hasn't changed.","ts":"2022-06-10T22:36:33Z"}
    From looking at the plugins — it seems like adding that config should just automatically work
  • r

    Robin Kahlow

    06/12/2022, 12:23 AM
    Started working on a cookiecutter template here https://github.com/RobinKa/flyte-template, I wanted to make it easy to build and execute workflows for new people, so after creating a project with it all you have to do (besides having docker, flytectl set up) is install the dev dependencies and run
    python commands.py build-execute.py --version v1
    ❤️ 6
  • a

    Afiz

    06/12/2022, 2:52 PM
    👋 Hi everyone! Is this the correct channel to post questions about UnionML?
    ❤️ 2
  • s

    Sandra Youssef

    06/13/2022, 7:40 PM
    Hi Flyers, Join the Flyte Community Sync tomorrow 6/14 for:
    • Community member appearances
    • The all-new FlyteDecks by @Eduardo Apolinario (eapolinario)
    • Guest speaker, @Matthew Griffin, presenting "Pterodactyl, Javascript SDK for Flyte."
    9am PT Calendar Invite & Zoom Link See you there! Flyte Team
    🌶️ 3
    🔥 6
  • g

    George Odette

    06/14/2022, 6:27 PM
    Hi @Ketan (kumare3) and anyone else, is there a flyte version of this functionality?
  • v

    Vijay Saravana

    06/14/2022, 7:23 PM
    Hello guys, Is there detailed documentation on Flyte map tasks? I only see a very superficial page on the website.
  • s

    Slackbot

    06/15/2022, 9:14 AM
    This message was deleted.
  • f

    Fabio Grätz

    06/15/2022, 10:56 AM
    Hey, is there documentation for the new pod templates feature?
d

Dan Rammer (hamersaw)

06/15/2022, 11:21 AM
Hey @Fabio Grätz, are you referring to setting a default PodTemplate for k8s Pod configuration? If so, I don't believe we have written anything up (will file an issue for it). Hopefully configuration is pretty simple, you just need to set the default-pod-template-name configuration option on FlytePropeller. When executing, FlytePropeller attempts to use the PodTemplate in the namespace that the Pod will be created in (ex. by default a pod in the project flytesnacks and domain development will look for a PodTemplate in the flytesnacks-development namespace). If that PodTemplate does not exist, it then attempts to find one in the namespace that FlytePropeller runs in.
The only thing to note is that PodTemplates are required to have a container set. In the implementation we override this value, because Flyte requires certain containers to be running. So when defining the default PodTemplate you need to do something like:
apiVersion: v1
kind: PodTemplate
metadata:
  name: flyte-default-template
  namespace: flyte
template:
  metadata:
  spec:
    containers:
      - name: noop
        image: docker.io/rwgrim/docker-noop
    subdomain: "default-subdomain"
Where in this example I defined a noop container.
cc @Marc Paquette
👍 1
✅ 1
A link to the github issue for visibility.
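The lookup order described above (the Pod's own namespace first, then the namespace FlytePropeller runs in) can be sketched in a few lines of Python. This is an illustration of the described behaviour only, not FlytePropeller's code: `resolve_default_pod_template` and the `existing` set are hypothetical, and the `<project>-<domain>` namespace naming is taken from the default case in the example above.

```python
from typing import Optional, Set, Tuple


def resolve_default_pod_template(
    project: str,
    domain: str,
    template_name: str,
    propeller_namespace: str,
    existing: Set[Tuple[str, str]],
) -> Optional[Tuple[str, str]]:
    """Return the (namespace, name) of the PodTemplate to use, or None.

    `existing` holds the (namespace, name) pairs of PodTemplates assumed
    to be present in the cluster.
    """
    # Assumption: the default mapping of project/domain to namespace.
    pod_namespace = f"{project}-{domain}"
    # Try the Pod's namespace first, then fall back to FlytePropeller's.
    for namespace in (pod_namespace, propeller_namespace):
        if (namespace, template_name) in existing:
            return (namespace, template_name)
    return None


templates = {("flyte", "flyte-default-template")}
found = resolve_default_pod_template(
    "flytesnacks", "development", "flyte-default-template", "flyte", templates
)
print(found)
```

Here no template exists in flytesnacks-development, so the lookup falls back to the one in the flyte namespace.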
f

Fabio Grätz

06/15/2022, 11:56 AM
Thank you, I will try this out! The original motivation for looking into this is that I need to start my workflow tasks on a tainted node pool of high-powered machines that e.g. the flyte backend etc. wouldn’t be able to use. I have been using the default tolerations to do this but the Spark tasks don’t use them currently. I fixed this in this PR (Ketan already tagged you I think). Does this fix make sense to you? Spark on K8s currently unfortunately doesn’t use v1.PodTemplateSpec but a custom pod spec (see discussion). So I think that the pod template wouldn’t solve my problem as flytepropeller would have to translate the pod template into Spark’s custom pod spec. Is this correct?
d

Dan Rammer (hamersaw)

06/15/2022, 1:51 PM
I think your PR is certainly justified. Looking a bit deeper I wonder if we should take it a step further. It looks like (and sounds like, from the discussion) the spark PodSpec is a subset of the k8s PodSpec. Should we set all spark configuration according to the k8s plugin config? I see that NodeSelector, HostNetwork, SchedulerName, Affinity (among others) are in both PodSpecs but not carried over from Flyte k8s configuration. Is there any reason that these values should not be?
f

Fabio Grätz

06/15/2022, 1:54 PM
No, i don’t think there is a good reason against also carrying those over. I will add this to the PR in the next days and ping you again once ready?
d

Dan Rammer (hamersaw)

06/15/2022, 1:56 PM
And yeah, currently the default PodTemplate is only applied to Pods launched using the Pod plugin (ie. Container and Pod tasks). Maybe it makes sense to apply these values as default to all resources, including Spark. There seem to be a few options:
• Spark 3 allows passing default PodTemplates, but requires a volume mount I think - so it could be a little messy
• We could use the Flyte default PodTemplate to initialize a default Spark PodSpec - similar to the mapping in the opposite direction that Spark 3 seems to do.
If we want to transition to using the Flyte default PodTemplate rather than adding one-off configuration in the k8s plugin configuration (as we certainly want to) this may set some precedent. We would have to have a deeper discussion.
Sounds great! Thanks for putting this together, I'll be waiting on the changes!
f

Fabio Grätz

06/15/2022, 1:58 PM
👍 Will implement the changes regarding the default config in the k8s plugin (probably on the weekend) and I’m happy to have a deeper discussion about using pod templates also for spark later on.
👍 2
d

Dan Rammer (hamersaw)

06/15/2022, 2:02 PM
cc @Ketan (kumare3) @Haytham Abuelfutuh RE: Spark plugin configuration and extending application of Flyte default PodTemplate
❤️ 1
f

Fabio Grätz

06/19/2022, 8:40 PM
Hey @Dan Rammer (hamersaw), I continued working on this but converted the PR to draft because I’m not done yet. I could use your input on something 🙂 Flyteplugins currently uses github.com/GoogleCloudPlatform/spark-on-k8s-operator v0.0.0-20200723154620-6f35a1152625. In this version, SparkPodSpec has:
// SecurityContenxt specifies the PodSecurityContext to apply.
	// +optional
	SecurityContenxt *apiv1.PodSecurityContext
Notice the SecurityContext of type PodSecurityContext. In flyteplugins we set the spark pod's security context to the DefaultPodSecurityContext accordingly:
SecurityContenxt: config.GetK8sPluginConfig().DefaultPodSecurityContext.DeepCopy(),
The k8s plugin config has both a DefaultPodSecurityContext as well as a DefaultSecurityContext. In the newer spark-on-k8s-operator versions, this has been fixed and there is now both the PodSecurityContext as well as the SecurityContext. Do you agree that this should be fixed in flyteplugins by using a newer version of the spark-on-k8s-operator? I tried fixing this but
go get -v -u github.com/GoogleCloudPlatform/spark-on-k8s-operator@master
gives me the following error:
go: github.com/GoogleCloudPlatform/spark-on-k8s-operator@v0.0.0-20220615230608-94775cd89ca0 requires
        k8s.io/kubernetes@v1.19.6 requires
        k8s.io/api@v0.0.0: reading k8s.io/api/go.mod at revision v0.0.0: unknown revision v0.0.0
This appears to be a known issue due to the way k8s uses its go.mod and people have written bash scripts to work around this. I wonder whether you or others within flyteorg have experienced this before and can give me a hint how to handle this (in case you agree that upgrading spark-on-k8s-operator makes sense). Thanks 🙂
k

Ketan (kumare3)

06/20/2022, 12:13 AM
We have sadly. This is why eventually we want out of binary backend plugins
But solution- cc @Haytham Abuelfutuh and @Yuvraj have seen I think
d

Dan Rammer (hamersaw)

06/22/2022, 4:02 AM
Pinging to keep this active. I agree, if we are updating the Spark plugin configuration, we might as well try to clean up everything. In reading the issue a bit, it seems this k8s dependency problem is not going away - so if we ever want to update spark-on-k8s-operator we will need a fix here. If I understand correctly, the proposed solution adds replace commands for all of the k8s internal dependencies because the kubernetes repo is not meant to be a dependency so they declare v0.0.0 for all and use replace to point to a local version. Is this going to be a fix we can isolate to flyteplugins? Or would this need to be in flytepropeller as well? Not sure how replace cascades in the build. I think that integrating a script to pull k8s dependencies and insert replace statements in the go.mod may be a solution? @Haytham Abuelfutuh / @Yuvraj thoughts?
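To make the replace idea concrete, here is a small Python sketch that generates go.mod replace directives pinning k8s.io staging modules to the tag matching a given kubernetes release (staging modules are tagged v0.X.Y for kubernetes v1.X.Y). It is not the script referred to in this thread, which is not shown here; the function names and the three-module list are illustrative assumptions, and a real go.mod would need every k8s.io module that kubernetes declares at v0.0.0.

```python
from typing import Iterable, List


def staging_version(kubernetes_version: str) -> str:
    """Map a kubernetes tag such as "v1.21.3" to the staging tag "v0.21.3"."""
    major, minor, patch = kubernetes_version.lstrip("v").split(".")
    if major != "1":
        raise ValueError(f"unexpected kubernetes version: {kubernetes_version}")
    return f"v0.{minor}.{patch}"


def replace_directives(kubernetes_version: str, modules: Iterable[str]) -> List[str]:
    """Build one go.mod replace line per module, all pinned to the same tag."""
    version = staging_version(kubernetes_version)
    return [f"replace {module} => {module} {version}" for module in sorted(modules)]


# Illustrative subset of staging modules, not the complete list.
example_modules = ["k8s.io/client-go", "k8s.io/api", "k8s.io/apimachinery"]
directives = replace_directives("v1.21.3", example_modules)
print("\n".join(directives))
```

The generated lines would be appended to go.mod; whether they also have to be repeated in flytepropeller is exactly the open question raised above.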
f

Fabio Grätz

06/23/2022, 4:27 PM
I can get back to this on the weekend and try whether adding the replace via the script in flyteplugins works without cascading to flytepropeller @Dan Rammer (hamersaw).
d

Dan Rammer (hamersaw)

06/23/2022, 5:34 PM
Thanks, really appreciate your time on this! I would love to get this upgraded fully to set the latest spark pod spec fields, but if it gets to be too much I think working with the existing spark-on-k8s version is still a good sight better than the current situation.
f

Fabio Grätz

06/26/2022, 9:35 PM
Hey @Dan Rammer (hamersaw), quick update, I haven’t forgotten about this 🙂 In order to use the latest spark-on-k8s-operator, I used this script to add the required replace instructions for all used k8s packages (to later test whether this workaround can be used in flyteplugins without cascading to flytepropeller). Works smoothly. In order to get the tests green again, I’m now working on fixing tasks/pluginmachinery/k8s/client.go since sigs.k8s.io/controller-runtime is updated from v0.8.2 to v0.12.2 and two tags after v0.8.2, in v0.9.0-alpha.0, the ClientBuilder which flyteplugins uses here was deprecated in favor of NewClientFunc (see commit message). I haven’t figured out how to adapt flyteplugins to that change yet but will continue working on this in the next few days… Might have to get back to you for some guidance 😅🙏
d

Dan Rammer (hamersaw)

07/18/2022, 4:53 PM
Hey @Fabio Grätz, sorry I was out for a little bit so I let this go stale. Let's dive back in and wrap this up! So it looks like there was some previous attempt to bump the spark-on-k8s-operator version in this PR by just hardcoding the k8s version dependencies. In discussing this a bit further, how do you feel about just hardcoding these? It should make wrapping this up pretty quick, right?
f

Fabio Grätz

07/19/2022, 11:46 AM
Hey Dan, I also had a few busy weeks and couldn’t work on it but will have a few hours waiting at an airport tomorrow. I’ll use these to work on this 🙂
👍 1
Hey @Dan Rammer (hamersaw), hard-coding the k8s requirements (which can be automated nicely using this script) works but it will have to be done in flytepropeller as well and upgrading the k8s version there will require some other minor fixes. Also this leads a bit into dependency hell since when upgrading k8s to newer versions >=1.22 the kubeflow training operators cause problems because flyteplugins doesn’t use the new kubeflow train operators mono-repo yet and the individual pytorch operator, tensorflow operator, … repos haven’t been updated to support k8s versions >=1.22. I’m happy to continue trying to find a good solution for this (first finish the ‘k8sPluginConfig -> spark plugin’ ticket while upgrading to newest spark operator and k8s to 1.21.x and then in a separate PR maybe continue upgrading to up-to-date k8s versions while replacing the separate legacy kubeflow train operator repositories with the new mono-repo.) But I would like to confirm with you first that hard-coding the k8s requirements not only in flyteplugins but also in flytepropeller is ok for you. Do you see another way if one ever wants to upgrade to newer k8s versions?
d

Dan Rammer (hamersaw)

07/26/2022, 1:51 PM
@Fabio Grätz again, very much appreciate all the effort you're dumping into this. So without (hopefully) diving too deep I want to explain the hesitancy. We have been wanting to get out-of-core plugins integrated for some time. This means something like hashicorp's grpc plugin or go's dll plugin system - where we no longer need a flyteplugins repo with everything bundled. This should solve the current issue, each plugin maintains its own dependencies and is not compiled into FlytePropeller, etc. Out-of-core plugins will certainly not be completed by flyte 1.2 (october release), but i'm thinking i'll try to make it my project for 1.3. It sounds like you have this pretty thought out; from our side we just want to ensure that the refactor on out-of-core plugins is not too messy. It sounds like we can upgrade to the latest spark operator with the 1.21 upgrade by just setting dependencies in flyteplugins and flytepropeller. Then the refactoring necessary for out-of-core plugins is removing these. Does that sound right? Trying to make sure we are not going so deep that we're adding a ton of work for ourselves later on. Also, is the PodSecurityContext the only field that we're gaining by upgrading to the latest spark operator?
k

Ketan (kumare3)

07/26/2022, 1:55 PM
Cc @Yee
f

Fabio Grätz

07/26/2022, 4:35 PM
What you are saying makes a ton of sense after having tried to find a way through the dependency hell of the different plugins compiled into flytepropeller for several hours yesterday. It is super difficult to find k8s versions that allow upgrading, in this case, to newer spark-on-k8s versions without requiring other plugins to upgrade as well which then in turn require other things to be adapted… Also with 1.21 I’m not fully through all problems yet. So I absolutely understand the hesitancy and why you want to go for a plugin system where the plugins don’t have to agree on a certain set of dependencies but maintain their own. I see two good options:
• I can figure out a k8s version/way through dependency hell and upgrade to a new version of spark-on-k8s by pinning k8s dependencies in flytepropeller and flyteplugins without requiring any other changes that would later complicate the refactoring for out-of-core plugins. We would gain being up to date with all fields but I actually need to look into whether there are even others than PodSecurityContext. I will time-box this effort.
• I will transfer all fields from the k8s plugin config that can be set with the current spark-on-k8s version (which would solve my current problem of using the default tolerations) and I would be happy to upgrade spark-on-k8s and also other plugins later once the plugins have been moved out of core.
Does that make sense to you?
d

Dan Rammer (hamersaw)

07/26/2022, 5:25 PM
Absolutely, I think the latter makes the most sense. Regarding upgrading versions, what would take 10 hours for a hacky solution now should be a simple version bump once out-of-core plugins are implemented. Additionally, it will unblock you guys in the short-term. @Yee what do you think?
f

Fabio Grätz

07/26/2022, 5:58 PM
Agreed, will go for 2. I’ll take a look at which of the fields can be updated with the current version and amend the PR. Will ping you then 👍
y

Yee

07/26/2022, 6:12 PM
yes! let’s just add everything that can be added today without wrangling dependencies
👍 1