Hi, could you help me with AWS Batch job setup? - ...
# flyte-support
m
Hi, could you help me with AWS Batch job setup?
• I have Flyte deployed into AWS/EKS using deploy-flyte TF.
• I updated my configuration following https://www.union.ai/docs/v1/flyte/deployment/flyte-plugins/batch/ so that the AWS Batch plugin is configured in the chart. I'm not sure I entered all values correctly, since the documentation is not very descriptive. For example, what should the roleNameKey value be here? I used the value of flyte_backend_irsa_role that was set up by deploy-flyte.
• The configuration for flyteadmin and flytepropeller from the page above already includes the job queue names that should be used.
• Now I'm trying the example from https://www.union.ai/docs/v1/flyte/integrations/external-service-backend-plugins/aws-batch-plugin/batch/ where only an AWSBatch configuration is added as task_config (I did not add any parameters or platformCapabilities, as I was hoping some good defaults would be used).
Now I have registered the workflow in Flyte and I am told I should execute it from the UI. The UI asks me for a Role ARN and a Kubernetes service account. Do I need to provide these when I have already added them to the server configuration in the plugin setup?
It seems that the region is set incorrectly, I'm getting
{
  "json": {
    "exec_id": "aqjx7k4klv8kgwd9mcbx",
    "ns": "jiri-test-development",
    "res_ver": "459661272",
    "routine": "worker-32",
    "wf": "jiri-test:development:aws_batch_example.my_wf"
  },
  "level": "error",
  "msg": "Error when trying to reconcile workflow. Error [failed at Node[n0]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [aws_array]: AccessDeniedException: User: arn:aws:sts::999548594534:assumed-role/flyte-dev-backend-role/1757078478030632749 is not authorized to perform: batch:RegisterJobDefinition on resource: arn:aws:batch:us-east-2:999548594534:job-definition/flytekit\n\tstatus code: 403, request id: 4190a36d-4da6-47a4-be0e-4721d52ecb9a]. Error Type[*errors.NodeErrorWithCause]",
  "ts": "2025-09-05T13:23:27Z"
}
While I have the job queues configured for us-east-1, this complains about us-east-2. The docs say this about plugin config and region: https://www.union.ai/docs/v1/flyte/deployment/flyte-plugins/batch/#update-flytepropellers-configuration It shows
plugins:
    aws:
      batch:
        # Must match that set in flyteAdmin's configMap flyteadmin.roleNameKey
        roleAnnotationKey: eks.amazonaws.com/role-arn
      # Must match the desired region to launch these tasks.
      region: us-east-2
In the snippet. However I have put this under "enabled_plugins"
configmap:  
  enabled_plugins:
    aws:
      batch:
        roleAnnotationKey: "{{ .Values.userSettings.backendIAMRole }}"
      region: us-east-1
because there was no "plugins:" section. Is this a problem? Should it be under "configmap.plugins" instead? It's kind of confusing.
f
i think the external service backend plugin for batch is not used anymore
we prefer connectors
thats probably bad in the docs
m
Oh. But is there a connector for AWS Batch? I don't see any under https://www.union.ai/docs/v1/flyte/integrations/#connectors. There's Sagemaker, but documentation how to set it up is even shorter
f
I am sorry, I was mistaken. I thought someone wrote one. We have the generic boto base, which should make it trivial to write one
m
Sorry, I just want to know how I can run workflows remotely on AWS Batch. Is it not possible with AWSBatchConfig and the current AWS Batch plugin? It seems that these parts were not removed from the code, so they should provide something, right? I think my configuration is just a bit off, as I'm getting errors
flyte-dev-backend-role/1757078478030632749 is not authorized to perform: batch:RegisterJobDefinition on resource: arn:aws:batch:us-east-2
So it seems almost correct, it just needs to realize correct region: I have tried to set up us-east-1 through the configuration snippet I shared above. Is there something missing?
Or do I need to use this boto connector? But how should I configure Flyte to use it? I can't find much information in the documentation...
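For anyone hitting the same AccessDeniedException: the region the plugin is actually calling can be read from the resource ARN in the error. Below is a minimal sketch in Python using only the ARN string quoted above. The helper name `arn_region` and the `EXPECTED_REGION` constant are made up for illustration and are not part of Flyte or the AWS SDK.

```python
# Resource ARN copied from the AccessDeniedException in the propeller log above.
ARN = "arn:aws:batch:us-east-2:999548594534:job-definition/flytekit"

# Assumption for this sketch: the region where the job queues were created.
EXPECTED_REGION = "us-east-1"


def arn_region(arn: str) -> str:
    """Return the region field of an ARN.

    ARNs have the form arn:partition:service:region:account-id:resource,
    so the region is the fourth colon-separated field.
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return parts[3]


actual = arn_region(ARN)
if actual != EXPECTED_REGION:
    print(f"plugin is calling Batch in {actual}, but the queues are in {EXPECTED_REGION}")
```

If the two regions differ, as they do here, the region setting is probably not being picked up by the plugin, which is worth ruling out before changing IAM policies.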
f
it should be possible, but the AWS Batch backend plugin is not in a maintained state, IMO. We prefer external service connections using connectors now. May I ask what the use case is for Batch instead of k8s native?
m
That's more of an architecture decision in the company. For example right now we are able to use GPU-enabled machine with AWS Batch, but not on Kubernetes... 🤷🏻
😢 1
f
Interesting
Yes then connectors would be the best way - let me talk to @glamorous-carpet-83516 and some other maintainers about the status of AWS batch backend plugin
👍 1
m
FYI I found out the issue with my configuration. For helm chart, I added
enabled_plugins:
    # -- Tasks specific configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#GetConfig)
    plugins:
      aws:
        batch:
          roleAnnotationKey: "{{ .Values.userSettings.backendIAMRole }}"
        region: "{{ .Values.userSettings.accountRegion }}"
"plugins" section into the "enabled_plugins" and now the region is set up correctly. This was quite unclear from documentation