  • Evan Sadler

    Evan Sadler

    2 months ago
    Hello! I am trying to use the Flyte Spark plugin with the SynapseML package on the local runner. It works until I add spark.jars.packages to the config (see example below), at which point I get a connection refused error: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused. I imagine this has to do with some backend configuration that I don't quite understand. Any help is much appreciated. I tested with another package and hit the same error.
    import random
    from operator import add

    import flytekit
    from flytekit import Resources, task, workflow

    from flytekitplugins.spark import Spark


    def f(_: int) -> int:
        # Sample a point in the unit square; count it if it lands inside the unit circle.
        x = random.random() * 2 - 1
        y = random.random() * 2 - 1
        return 1 if x ** 2 + y ** 2 <= 1 else 0


    @task(
        task_config=Spark(
            # this configuration is applied to the spark cluster
            spark_conf={
                "spark.driver.memory": "8g",
                "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
                "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.9.5",  # adding this causes problems
            }
        ),
        limits=Resources(mem="2000M"),
        cache_version="1",
    )
    def hello_spark(partitions: int) -> float:
        print("Starting Spark with Partitions: {}".format(partitions))

        n = 100000 * partitions
        sess = flytekit.current_context().spark_session
        count = (
            sess.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
        )
        pi_val = 4.0 * count / n
        print("Pi val is :{}".format(pi_val))
        return pi_val
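    For reference, a plain local SparkSession with the same two settings (assuming pyspark is installed) is a quick way to check whether package resolution itself is the problem, independent of the Flyte plugin:

    from pyspark.sql import SparkSession

    # Plain pyspark session using the same repository/package settings as the task above.
    spark = (
        SparkSession.builder
        .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven")
        .config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:0.9.5")
        .getOrCreate()
    )
    print(spark.version)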
    Evan Sadler
    Samhita Alla
    5 replies
  • Sandra Youssef

    Sandra Youssef

    2 months ago
    Hi Flyers, Join us tomorrow for our very first Fireside Chat with Flyte Contributors, plus community updates and roadmap items in our biweekly community sync. Enjoy guest appearances from:
    • @Sugato Ray, PhD Candidate & UnionML contributor
    • @Matheus Moreno, ML Engineer at Hurb
    • @Robin Kahlow, ML Engineer at Ntropy
    • @Mike Zhong, Sr. Software Engineer at Embark Veterinary
    • @krishna Yerramsetty, Data Scientist at Infinome Biosciences
    Tuesday 7/12 at 9am PT. Calendar Invite and Zoom Link. See you there! Flyte Team
  • Bernhard Stadlbauer

    Bernhard Stadlbauer

    2 months ago
    Hi! We are currently trying to add automatic tracing through Datadog to the pods started by Flyte. To do so, we need to set a few environment variables and mount a volume. While we can set the environment variables in the config, there is no way to add volumes there. Is there another way that we're missing? The only other option I can think of is switching all of our tasks to pod tasks, where we would have control over the whole spec, but we would like to avoid that if possible. Also happy to contribute here if this is something you'd see a need for.
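    For context, the pod-task route I mention would look roughly like this (a sketch assuming flytekitplugins-pod and the kubernetes Python client; the Datadog env var and hostPath volume are illustrative):

    from flytekit import task
    from flytekitplugins.pod import Pod
    from kubernetes.client.models import (
        V1Container,
        V1EnvVar,
        V1HostPathVolumeSource,
        V1PodSpec,
        V1Volume,
        V1VolumeMount,
    )

    # Illustrative pod spec: a Datadog env var plus a hostPath volume for the agent socket.
    pod_spec = V1PodSpec(
        containers=[
            V1Container(
                name="primary",
                env=[V1EnvVar(name="DD_TRACE_ENABLED", value="true")],
                volume_mounts=[
                    V1VolumeMount(name="apmsocketpath", mount_path="/var/run/datadog")
                ],
            )
        ],
        volumes=[
            V1Volume(
                name="apmsocketpath",
                host_path=V1HostPathVolumeSource(path="/var/run/datadog/"),
            )
        ],
    )

    @task(task_config=Pod(pod_spec=pod_spec, primary_container_name="primary"))
    def traced_task() -> str:
        return "hello"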
    Bernhard Stadlbauer
    Dan Rammer (hamersaw)
    +1
    17 replies
  • Rahul Mehta

    Rahul Mehta

    2 months ago
    We've been putting Flyte through its paces on high fan-out workflows (i.e. ~50 parallel subworkflows using @dynamic) and we've noticed some flakiness with the graph view when expanding subworkflows. In light of that, it seems like the timeline view is better suited to finding failing nodes. Would there be interest in a contribution to improve filtering in the timeline view (i.e. using the same categories as the graph view)?
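    For context, the fan-out shape in question is roughly the following (task and workflow names are illustrative):

    from typing import List

    from flytekit import dynamic, task

    @task
    def process(i: int) -> int:
        return i * i

    @dynamic
    def fan_out(n: int) -> List[int]:
        # Each call becomes its own node; with n around 50 this expands into the
        # wide graph described above.
        return [process(i=i) for i in range(n)]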
    Rahul Mehta
    Ketan (kumare3)
    +1
    9 replies
  • seunggs

    seunggs

    2 months ago
    Is the name of the workflow (i.e. the function decorated with @workflow) the unique identifier for that workflow (combined with project and domain)?
  • seunggs

    seunggs

    2 months ago
    i.e. you can’t have two workflows with the same name?
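    For illustration, this is how a registered workflow is addressed through flytekit's FlyteRemote (the project/domain/name/version values below are placeholders, and the exact kwargs may differ between flytekit versions):

    from flytekit.configuration import Config
    from flytekit.remote import FlyteRemote

    remote = FlyteRemote(
        config=Config.auto(),
        default_project="flytesnacks",
        default_domain="development",
    )

    # A workflow is fetched by project/domain/name plus a version.
    wf = remote.fetch_workflow(name="my_module.my_wf", version="v1")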
    seunggs
    Samhita Alla
    4 replies
  • Prada Souvanlasy

    Prada Souvanlasy

    2 months ago
    Hello! Assuming that we're in a @dynamic workflow, is there a way to define dependencies between tasks without relying on their respective outputs? I.e., we would like to run task2() after task1() even though the latter does not output anything. In a classic @workflow we could rely on create_node() + >>, but we can't figure out the equivalent for @dynamic.
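    For reference, the classic @workflow pattern I mean is roughly this (task names are illustrative); what we can't figure out is the equivalent inside @dynamic:

    from flytekit import task, workflow
    from flytekit.core.node_creation import create_node

    @task
    def task1() -> None:
        ...

    @task
    def task2() -> None:
        ...

    @workflow
    def wf() -> None:
        n1 = create_node(task1)
        n2 = create_node(task2)
        # Explicit ordering: task2 runs after task1 even though no data flows between them.
        n1 >> n2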
    Prada Souvanlasy
    Ketan (kumare3)
    +2
    18 replies
  • Rémy Dubois

    Rémy Dubois

    2 months ago
    Run a task on others' error
    Rémy Dubois
    Ketan (kumare3)
    10 replies
  • Sandra Youssef

    Sandra Youssef

    2 months ago
    Hi Flyers, Flyte will be at SciPy 2022 this week! @Niels Bantilan will talk about the basics of Flyte and the programming model that gives it its productionizing strengths, in "Reliable, Reproducible, Recoverable, and Auditable Machine Learning for Production Workloads with Flyte", Thursday 7/14, 3:50pm CDT.
  • Niels Bantilan

    Niels Bantilan

    2 months ago
    Developing some memes for SciPy… what do you think?