# flyte-support
a
Hi all! 👋 Question: Is there any way to configure different Flyte agent endpoints that would get automatically used by tasks in different domains? Use case: if running a workflow in 'dev' I want it to get sent to my dev agent deployment, and then 'live' to go to a live agent. Currently we have a config similar to:
flyteagent:
  enabled: true
  plugin_config:
    plugins:
      agent-service:
        defaultAgent:
          endpoint: "dns:///flyteagent.flyte-core.svc.cluster.local:8000"
          insecure: true
        agents:
          my-custom-agent:
            # I want to use a different endpoint here depending on the domain of my task
            endpoint: "dns:///my-custom-agent.live:8000"
        agentForTaskTypes:
          - custom_task: my-custom-agent
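To make the request concrete, here is a minimal Python sketch of the lookup being asked for. It is not Flyte's implementation or config schema: `resolve_agent_endpoint`, the `DOMAIN_OVERRIDES` table, and the `my-custom-agent.next` endpoint are hypothetical names used only for illustration. The idea is that a (task type, domain) pair resolves to an endpoint, falling back to the agent's configured endpoint and then to the default agent.

```python
from typing import Dict, Tuple

# Values copied from the config above.
DEFAULT_ENDPOINT = "dns:///flyteagent.flyte-core.svc.cluster.local:8000"
AGENTS: Dict[str, str] = {"my-custom-agent": "dns:///my-custom-agent.live:8000"}
AGENT_FOR_TASK_TYPES: Dict[str, str] = {"custom_task": "my-custom-agent"}

# Hypothetical per-domain overrides. This is NOT an existing Flyte option,
# and the ".next" endpoint is an assumed name for a staging deployment.
DOMAIN_OVERRIDES: Dict[Tuple[str, str], str] = {
    ("my-custom-agent", "dev"): "dns:///my-custom-agent.next:8000",
}


def resolve_agent_endpoint(task_type: str, domain: str) -> str:
    """Pick an endpoint for a task type, preferring a domain-specific override."""
    agent = AGENT_FOR_TASK_TYPES.get(task_type)
    if agent is None:
        # Task types with no dedicated agent go to the default agent.
        return DEFAULT_ENDPOINT
    override = DOMAIN_OVERRIDES.get((agent, domain))
    if override is not None:
        return override
    # No override for this domain: use the agent's regular endpoint.
    return AGENTS.get(agent, DEFAULT_ENDPOINT)
```

With this table, a `custom_task` run in 'dev' would resolve to the staging endpoint, while the same task in 'live' keeps using the live agent.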
d
interesting
First time I've heard of this use case.
a
@damp-lion-88352 Does it feel like an anti-pattern/is there another way we could consider to test changes to an agent?
g
We don’t support this right now. Mind creating an issue for this?
b
+1 To add on what Hanru and Kevin said. @abundant-judge-84756, is your use case for both local and remote testing? If so, running the agent locally or remotely requires proper Kubernetes configuration, including authentication and authorization to the Kubernetes API and flyte control/data planes. Our standard practice is to use the Flyte Sandbox to spin up a local Kubernetes cluster, allowing us to run the agent locally while leveraging the existing agent configuration. That said, there doesn’t seem to be a strong need to complicate the Flyte Agent stack by explicitly supporting both local and remote execution, as developers can already use the same agent configuration schema and functionalities for testing. Even if Flyte Agent were to support workflow requests for both environments, they would still require separate setup. Given that users can already leverage the sandbox for local testing, the return on investment (RoI) for adding this complexity appears low. However, if there’s a use case where different execution paths are needed for different types of workflows, there might be a justification—but it would depend on the exact requirements. If the agent is meant to handle distinct workflow types differently, a more scalable approach might be to create separate agents dedicated to specific runtimes rather than overloading a single agent with multiple execution modes.
g
might be to create separate agents dedicated to specific runtimes rather than overloading a single agent with multiple execution modes
agree, the alternative is to create a separate agent.
a
@billowy-church-83438 The use case here is for remote testing - we have a self-hosted flyte cluster which sends tasks to our self-hosted agents. If we want to make a change to this agent code, we would like to be able to integration test the connection between the agents and our remote workflows. Ideally, these tests would involve running our workflows in the 'dev' or 'next' domain on the flyte cluster, and have these automatically talk to the next/staging versions of the agents. In the current setup, the flyte cluster only knows about one agent endpoint for a particular task type - the live agent - so we don't have an easy way to integration test agent changes without deploying the changes to live agents. Pointing a local sandbox workflow to the -next agents is an option but isn't ideal - we can't always replicate the conditions on our remote cluster in the local sandbox environment.
If the agent is meant to handle distinct workflow types differently, a more scalable approach might be to create separate agents dedicated to specific runtimes rather than overloading a single agent with multiple execution modes.
I'm not 100% sure I follow this comment, but it might be a misunderstanding of our use case. We do already have separate agent instances - next (/staging) and live, which have identical functionality and handle identical tasks, and we're hoping to use one as a staging ground to deploy and test new changes. There are only two ways around this that we've been able to think of at the moment:
- option 1) self-host a second Flyte cluster that's used for testing workflows before making changes to live infrastructure. This confuses the concept of domains, but would allow us to run a 'staging' configuration that points to different agent endpoints.
- option 2) create different task definitions - MyAgentTaskNext and MyAgentTaskLive - that get picked up by different agents, and use conditionals in the workflows to decide which task to run based on the execution domain. This feels like a bit of an anti-pattern and it introduces a lot of conditional steps to our workflows 🤔
Happy to make an issue - I'll wait until we've confirmed whether this is something that does indeed make sense as a potential feature.
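For illustration, the branching that option 2 implies can be sketched in plain Python. This is not flytekit code: `my_agent_task_next`, `my_agent_task_live`, `TASK_BY_DOMAIN`, and `run_agent_step` are hypothetical stand-ins, and the plain functions only mimic tasks that would be picked up by the next and live agents. In a real workflow every agent-backed step would need a branch like this.

```python
from typing import Callable, Dict


def my_agent_task_next(payload: str) -> str:
    # Stand-in for MyAgentTaskNext, served by the next/staging agent.
    return f"next agent handled {payload}"


def my_agent_task_live(payload: str) -> str:
    # Stand-in for MyAgentTaskLive, served by the live agent.
    return f"live agent handled {payload}"


# Assumed mapping from execution domain to the task variant to run.
TASK_BY_DOMAIN: Dict[str, Callable[[str], str]] = {
    "dev": my_agent_task_next,
    "next": my_agent_task_next,
    "live": my_agent_task_live,
}


def run_agent_step(domain: str, payload: str) -> str:
    """Dispatch one workflow step to the task variant for the given domain."""
    try:
        task = TASK_BY_DOMAIN[domain]
    except KeyError:
        raise ValueError(f"no agent task registered for domain {domain!r}") from None
    return task(payload)
```

The duplication is the cost: two task definitions with identical behaviour, plus a dispatch step that exists only to select an endpoint.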
g
@abundant-judge-84756 we got some ideas to support this in the backend. we can share the PR with you next week.
b
Sorry, I've been tied up with things over the past few days. Thanks for the clarification. Now I have a better understanding of your use case.
The use case here is for remote testing
identical code/execution but for different domains
we’re hoping to use one as a staging ground to deploy and test new changes.
I think what you are proposing is a cost-effective way to do integration testing in PROD. 🙂 For example, currently, we have the luxury of spinning up another Flyte data plane with a bunch of k8s GPUs/nodes provisioned for staging. Once tested, the same code and the same execution are rolled out to PROD. Your use case could probably enable testing in PROD directly, without the additional resource cost and operational overhead. :) Looking forward to seeing the PR that Kevin mentioned as well. 👍