Hi all wave Question Is there any way to configure different Flyte #flyte-support

abundant-judge-84756

03/10/2025, 10:51 AM

Hi all! 👋 Question: Is there any way to configure different Flyte agent endpoints that would get automatically used by tasks in different domains? Use case: if running a workflow in 'dev' I want it to get sent to my dev agent deployment, and then 'live' to go to a live agent. Currently we have a config similar to:

flyteagent:
  enabled: true
  plugin_config:
    plugins:
      agent-service:
        defaultAgent:
          endpoint: "dns:///flyteagent.flyte-core.svc.cluster.local:8000"
          insecure: true
        agents:
          my-custom-agent:
            # I want to use a different endpoint here depending on the domain of my task
            endpoint: "dns:///my-custom-agent.live:8000"
        agentForTaskTypes:
          - custom_task: my-custom-agent
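For readers following along, the lookup implied by this config can be modelled roughly as below. This is a minimal sketch, not Flyte's actual implementation: `resolve_endpoint` and the dictionary names are hypothetical, and the values are copied from the config above. It only illustrates that the task type is the sole routing key, so the execution domain has no way to influence which endpoint is chosen.

```python
# Hypothetical model of the routing implied by the YAML above; not Flyte's code.
DEFAULT_AGENT = {
    "endpoint": "dns:///flyteagent.flyte-core.svc.cluster.local:8000",
    "insecure": True,
}
AGENTS = {
    "my-custom-agent": {"endpoint": "dns:///my-custom-agent.live:8000"},
}
# Flattened form of the agentForTaskTypes list: task type -> agent name.
AGENT_FOR_TASK_TYPES = {"custom_task": "my-custom-agent"}


def resolve_endpoint(task_type: str, domain: str) -> str:
    """Return the agent endpoint for a task type.

    `domain` is accepted only to show that it is never consulted:
    the lookup is keyed on the task type alone.
    """
    agent_name = AGENT_FOR_TASK_TYPES.get(task_type)
    agent = AGENTS.get(agent_name, DEFAULT_AGENT)
    return agent["endpoint"]


if __name__ == "__main__":
    # Both domains resolve to the same (live) endpoint.
    print(resolve_endpoint("custom_task", "dev"))
    print(resolve_endpoint("custom_task", "live"))
```

Under this model, a 'dev' execution and a 'live' execution of `custom_task` always land on the same agent, which is the gap the question is about.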

damp-lion-88352

03/10/2025, 2:15 PM

interesting

damp-lion-88352

03/10/2025, 2:15 PM

First time I've heard of this use case

abundant-judge-84756

03/10/2025, 2:31 PM

@damp-lion-88352 Does it feel like an anti-pattern/is there another way we could consider to test changes to an agent?

glamorous-carpet-83516

03/10/2025, 6:28 PM

We don’t support this right now. Mind creating an issue for this?

billowy-church-83438

03/11/2025, 12:41 AM

+1 To add on what Hanru and Kevin said. @abundant-judge-84756, is your use case for both local and remote testing? If so, running the agent locally or remotely requires proper Kubernetes configuration, including authentication and authorization to the Kubernetes API and the Flyte control/data planes. Our standard practice is to use the Flyte Sandbox to spin up a local Kubernetes cluster, allowing us to run the agent locally while leveraging the existing agent configuration.

That said, there doesn’t seem to be a strong need to complicate the Flyte Agent stack by explicitly supporting both local and remote execution, as developers can already use the same agent configuration schema and functionalities for testing. Even if Flyte Agent were to support workflow requests for both environments, they would still require separate setup. Given that users can already leverage the sandbox for local testing, the return on investment (RoI) for adding this complexity appears low.

However, if there’s a use case where different execution paths are needed for different types of workflows, there might be a justification, but it would depend on the exact requirements. If the agent is meant to handle distinct workflow types differently, a more scalable approach might be to create separate agents dedicated to specific runtimes rather than overloading a single agent with multiple execution modes.

glamorous-carpet-83516

03/11/2025, 7:35 PM

might be to create separate agents dedicated to specific runtimes rather than overloading a single agent with multiple execution modes

agree, the alternative is to create a separate agent.

abundant-judge-84756

03/13/2025, 3:27 PM

@billowy-church-83438 The use case here is for remote testing - we have a self-hosted Flyte cluster which sends tasks to our self-hosted agents. If we want to make a change to this agent code, we would like to be able to integration test the connection between the agents and our remote workflows. Ideally, these tests would involve running our workflows in the dev/next domain on the Flyte cluster, and having these automatically talk to the next/staging versions of the agents. In the current setup, the Flyte cluster only knows about one agent endpoint for a particular task type - the live agent - so we don't have an easy way to integration test agent changes without deploying the changes to live agents. Pointing a local sandbox workflow to the -next agents is an option but isn't ideal - we can't always replicate the conditions on our remote cluster in the local sandbox environment.

If the agent is meant to handle distinct workflow types differently, a more scalable approach might be to create separate agents dedicated to specific runtimes rather than overloading a single agent with multiple execution modes.

I'm not 100% sure I follow this comment, but it might be a misunderstanding of our use case. We already have separate agent instances, next (/staging) and live, which have identical functionality and handle identical tasks, and we're hoping to use one as a staging ground to deploy and test new changes.

There are only two ways around this that we've been able to think of at the moment:

- option 1) self-host a second Flyte cluster that's used for testing workflows before making changes to live infrastructure. This confuses the concept of domains, but would allow us to run a 'staging' configuration that points to different agent endpoints.
- option 2) create different task definitions, MyAgentTaskNext and MyAgentTaskLive, that get picked up by different agents, and use conditionals in the workflows to decide which task to run based on the execution domain. This feels like a bit of an anti-pattern and it introduces a lot of conditional steps to our workflows 🤔

Happy to make an issue - I'll wait until we've confirmed whether this is something that does indeed make sense as a potential feature.
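To make option 2 concrete, here is a rough sketch of the branching it implies. Everything in it is hypothetical: the task type names `custom_task_next` / `custom_task_live`, the agent names, and the helper functions are placeholders for illustration, and no flytekit API is used. It models the idea as plain Python rather than as real workflow code.

```python
# Hypothetical sketch of option 2: one task type per environment,
# chosen from the execution domain. Names are illustrative only.
TASK_TYPE_BY_DOMAIN = {
    "next": "custom_task_next",
    "live": "custom_task_live",
}
# Each task type would need its own agentForTaskTypes entry in the config.
AGENT_FOR_TASK_TYPES = {
    "custom_task_next": "my-custom-agent-next",
    "custom_task_live": "my-custom-agent-live",
}


def pick_task_type(domain: str) -> str:
    """Choose the task type to run for a given execution domain."""
    try:
        return TASK_TYPE_BY_DOMAIN[domain]
    except KeyError:
        raise ValueError(f"no agent task defined for domain {domain!r}") from None


def agent_for_domain(domain: str) -> str:
    """Resolve the agent that would pick up the task in this domain."""
    return AGENT_FOR_TASK_TYPES[pick_task_type(domain)]


if __name__ == "__main__":
    print(agent_for_domain("next"))
    print(agent_for_domain("live"))
```

Every workflow step that calls the agent would need an equivalent branch, which is where the extra conditional steps mentioned above come from.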

glamorous-carpet-83516

03/13/2025, 11:55 PM

@abundant-judge-84756 we got some ideas to support this in the backend. we can share the PR with you next week.

billowy-church-83438

03/14/2025, 3:50 AM

Sorry, I've been tied up with things over the past few days. Thanks for the clarification. Now I have a better understanding of your use case

The use case here is for remote testing

identical code/execution but for different domains

we’re hoping to use one as a staging ground to deploy and test new changes.

I think what you are proposing is a cost-effective way to do integration testing in PROD. 🙂 For example, we currently have the luxury of spinning up another Flyte data plane with a bunch of k8s GPU/nodes provisioned for staging. Once tested, the same code and the same execution are rolled out to PROD. Your use case could probably enable testing in PROD directly, without the additional resource cost and operational overhead. :) Looking forward to seeing the PR that Kevin mentioned as well. 👍
