# flyte-support
m
Hi! I've finally managed to deploy Flyte (binary) on my EKS cluster. I'm now trying to run my first workflow there, as I have been able to access the UI console (I have exposed the UI with Traefik; let's call it my-flyte.co/console). I've modified my local `config.yaml` to look like this:
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: dns:///my-flyte.co
  insecure: true
  authType: Pkce
logger:
  show-source: true
  level: 0
But when I run the command:
pyflyte run --remote basics/hello_world.py hello_world_wf
from the basics repo I end up with the following error:
RuntimeError: Failed to get signed url for fastfa12345.tar.gz.
Any clue what may be happening? I've deployed it via the Helm chart, and I've also configured Google auth in front of the exposed service (the console, `flyte-binary-http`).
a
looks like an IAM permissions error. Are you using IRSA?
m
Hey @average-finland-92144, thanks for your answer! I'm using IRSA for the backend role (`data-flyte-backend-sa-role`). Both roles (the backend one and the workers' one) have a policy attached that allows reading from and writing to the same bucket (let's call it `flyte-metadata`). In the Helm chart, I have this statement:
serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::12345:role/data-flyte-backend-sa-role
I think I have no IRSA for the workers, but I'm not sure if it's needed. I've attached my whole Helm chart below in case it helps:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ./external-secrets.yaml
helmCharts:
  - name: flyte-binary
    version: 1.14.1
    repo: https://flyteorg.github.io/flyte
    releaseName: flyte-binary
    namespace: flyte
    valuesInline:
      deployment:
        extraVolumes:
          - name: flyte-db-credentials
            secret:
              secretName: flyte-db-credentials
        extraVolumeMounts:
          - name: flyte-db-credentials
            mountPath: /etc/flyte/secrets
            readOnly: true
      configuration:
        database:
          username: flyteadmin
          passwordPath: /etc/flyte/secrets/password
          host: data-flyte.blabla.region.rds.amazonaws.com
          port: 5432
          dbname: flyteadmin
        storage:
          metadataContainer: flyte-metadata
          userDataContainer: test-data-lake
          provider: s3
          providerConfig:
            s3:
              region: "region"
              authType: "iam"
        inline:
          cluster_resources:
            custom_data:
            - production:
              - defaultIamRole:
                  value: arn:aws:iam::12345:role/data-flyte-default-sa-role
            - staging:
              - defaultIamRole:
                  value: arn:aws:iam::12345:role/data-flyte-default-sa-role
            - development:
              - defaultIamRole:
                  value: arn:aws:iam::12345:role/data-flyte-default-sa-role
          task_resources:
            defaults:
              cpu: 500m
              memory: 500Mi
              storage: 500Mi
          plugins:
            k8s:
              inject-finalizer: true
              default-env-vars:
                - AWS_METADATA_SERVICE_TIMEOUT: 5
                - AWS_METADATA_SERVICE_NUM_ATTEMPTS: 20
          storage:
            cache:
              max_size_mbs: 100
              target_gc_percent: 100
        serviceAccount:
          create: true
          annotations:
            eks.amazonaws.com/role-arn: arn:aws:iam::12345:role/data-flyte-backend-sa-role
Do you see anything wrong? To reiterate, I have also configured Google OAuth in front of port 8088. To access the console I have to log in with my Google account; I'm not sure whether that could be causing an issue when trying to connect remotely:
# Flyte
    - match: Host(`my-flyte-cluster.me`)
      kind: Rule
      priority: 90
      middlewares:
        - name: traefik-forward-auth
          namespace: traefik
        - name: gzip
          namespace: traefik
        - name: add-platform-info
      services:
        - name: flyte-binary-http
          namespace: flyte
          port: 8088
a
So if you do a `kubectl describe sa default -n flytesnacks-development`, is it annotated with the role?
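The same check can be scripted. This is only a minimal sketch: it assumes the output of `kubectl get sa default -n flytesnacks-development -o json` has been loaded into a dict. The helper name `irsa_role_arn` and the sample manifest are hypothetical; only the `eks.amazonaws.com/role-arn` annotation key and the role ARN come from this thread.

```python
import json

# Annotation key that EKS uses to link a ServiceAccount to an IAM role (IRSA).
IRSA_ANNOTATION = "eks.amazonaws.com/role-arn"


def irsa_role_arn(service_account: dict):
    """Return the IAM role ARN annotated on a ServiceAccount manifest, or None."""
    annotations = service_account.get("metadata", {}).get("annotations") or {}
    return annotations.get(IRSA_ANNOTATION)


# Hypothetical sample standing in for the JSON printed by kubectl.
sample = json.loads("""
{
  "kind": "ServiceAccount",
  "metadata": {
    "name": "default",
    "namespace": "flytesnacks-development",
    "annotations": {
      "eks.amazonaws.com/role-arn": "arn:aws:iam::12345:role/data-flyte-default-sa-role"
    }
  }
}
""")

print(irsa_role_arn(sample))
```

A `None` result would mean the `default` SA in that namespace carries no role annotation at all.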
m
Yes, it is. I've figured out a couple of things that are not working and that partly explain the situation:
1. When setting the endpoint (my-flyte-cluster.me) in the `config.yaml`, it seems to be mandatory to use a gRPC endpoint, but my K8s setup only supports HTTPS. Is it supported to use HTTPS instead of gRPC as the endpoint?
2. By the way, I finally managed to run a workflow by forwarding the K8s port to my machine, which supports HTTPS connections. I ended up with this error:
from s3://flyte-metadata/flytesnacks/development/B5LQ==/fastfa1234.tar.gz to ./ (recursive=False). Original exception: Forbidden
It's pretty obvious that this again has to do with the policies and the role attached to the workers, because when I look at the pod in charge of running it, the created role is not associated with it.
cluster_resources:
  customData:
  - production:
    - defaultIamRole:
        value: arn:aws:iam::12345:role/data-flyte-default-sa-role
  - staging:
    - defaultIamRole:
        value: arn:aws:iam::12345:role/data-flyte-default-sa-role
  - development:
    - defaultIamRole:
        value: arn:aws:iam::12345:role/data-flyte-default-sa-role
Why are these roles not associated? Should I create a new ServiceAccount for the workers? In the docs I checked the `ServiceAccount` section, but it is only used to generate the SA for the backend... Thanks in advance for your help, @average-finland-92144!
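One way to confirm whether a worker pod actually picked up the role, as a hedged sketch: on EKS, pods whose ServiceAccount carries a usable IRSA annotation normally get the `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` environment variables injected. The helper below scans a pod manifest (as returned by `kubectl get pod <pod> -o json`) for them. The function names and both sample pods are hypothetical.

```python
# Environment variables that the EKS pod identity webhook injects for IRSA.
IRSA_ENV_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")


def injected_irsa_env(pod: dict) -> dict:
    """Map each container name to the IRSA env vars present in its spec."""
    found = {}
    for container in pod.get("spec", {}).get("containers", []):
        env = {e["name"]: e.get("value") for e in container.get("env", [])}
        found[container["name"]] = {k: env[k] for k in IRSA_ENV_VARS if k in env}
    return found


def has_irsa(pod: dict) -> bool:
    """True only if the pod has containers and each one has both variables."""
    per_container = injected_irsa_env(pod)
    return bool(per_container) and all(
        len(found) == len(IRSA_ENV_VARS) for found in per_container.values()
    )


# Hypothetical pod that only received the default env vars from the Helm values.
pod_without_role = {
    "spec": {"containers": [{
        "name": "task",
        "env": [{"name": "AWS_METADATA_SERVICE_TIMEOUT", "value": "5"}],
    }]}
}

# Hypothetical pod after the webhook injected the IRSA variables.
pod_with_role = {
    "spec": {"containers": [{
        "name": "task",
        "env": [
            {"name": "AWS_ROLE_ARN",
             "value": "arn:aws:iam::12345:role/data-flyte-default-sa-role"},
            {"name": "AWS_WEB_IDENTITY_TOKEN_FILE",
             "value": "/var/run/secrets/eks.amazonaws.com/serviceaccount/token"},
        ],
    }]}
}

print(has_irsa(pod_without_role), has_irsa(pod_with_role))
```

Note that the variables being present only shows the annotation was seen; the role's trust policy still has to allow the pod's ServiceAccount to assume it.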
a
Is it supported to use HTTPS instead of GRPC as endpoint?
Even if you put everything into a single K8s service, your Ingress controller needs to support gRPC for the flytekit client's communication with admin. There are some notes on Traefik configuration in this issue.
Why are these roles not associated?
Do you have a section like this on your Helm values?
configuration:
  clusterResourceTemplates:
    inline:
      001_namespace.yaml: |
        apiVersion: v1
        kind: Namespace
        metadata:
          name: '{{ namespace }}'
      002_serviceaccount.yaml: |
        apiVersion: v1
        kind: ServiceAccount
        metadata:
          name: default
          namespace: '{{ namespace }}'
          annotations:
            eks.amazonaws.com/role-arn: '{{ defaultIamRole }}'
That template will automatically annotate the `default` SA (the one the workers use by default) with the corresponding IAM role.
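To illustrate how the per-domain `defaultIamRole` values end up in that template, here is a minimal sketch. It is a simplified model with plain string substitution, not Flyte's actual templating code. The `{project}-{domain}` namespace naming is an assumption inferred from the `flytesnacks-development` namespace mentioned in this thread, and the function names are hypothetical.

```python
# ServiceAccount template from the Helm values, with its two placeholders.
TEMPLATE = """\
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: '{{ namespace }}'
  annotations:
    eks.amazonaws.com/role-arn: '{{ defaultIamRole }}'
"""

# Per-domain values, mirroring the cluster_resources data shown earlier.
ROLE = "arn:aws:iam::12345:role/data-flyte-default-sa-role"
CUSTOM_DATA = {
    "production": {"defaultIamRole": ROLE},
    "staging": {"defaultIamRole": ROLE},
    "development": {"defaultIamRole": ROLE},
}


def render(template: str, namespace: str, values: dict) -> str:
    """Fill in '{{ namespace }}' and every '{{ key }}' found in values."""
    rendered = template.replace("{{ namespace }}", namespace)
    for key, value in values.items():
        rendered = rendered.replace("{{ %s }}" % key, value)
    return rendered


def render_for_project(project: str, custom_data: dict) -> dict:
    """Render one manifest per domain, keyed by the assumed namespace name."""
    manifests = {}
    for domain, values in custom_data.items():
        namespace = f"{project}-{domain}"  # assumed naming scheme
        manifests[namespace] = render(TEMPLATE, namespace, values)
    return manifests


manifests = render_for_project("flytesnacks", CUSTOM_DATA)
print(manifests["flytesnacks-development"])
```

If a domain has no `defaultIamRole` entry, the placeholder would be left unfilled in this model, which is one reason to compare the rendered SA against what `kubectl describe sa` reports.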
m
Yes, I do, but isn't `clusterResourceTemplates` at the same level as `configuration`? At least it looks like that in the `values.yaml` file here and also in this demo. Regarding the workers' role, I am using this:
module "flyte_irsa_default_role" {
  source    = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version   = "5.34.0"
  role_name = "${local.cluster_names[0]}-flyte-default-sa-role"

  role_policy_arns = {
    s3_metadata = aws_iam_policy.read_write_flyte_metadata_bucket.arn
  }

  oidc_providers = {
    ex = {
      provider_arn               = module.eks.eks.oidc_provider_arn
      namespace_service_accounts = ["*:default"]
    }
  }

  tags = {
    Name        = "${local.cluster_names[0]}-flyte-default-sa-role"
    Environment = var.environment
    Owner       = "data"
  }
}
The attached policy is the same as the one used for the backend, and that one is actually working so I'm kind of lost 😕
a
You're right. And is the `default` SA annotated with the IAM role? If you do a `kubectl describe sa default -n flytesnacks-development`, what do you see?
m
Hey @average-finland-92144, sorry for the late response; I was digging deeper to see what was going on. It seems to be a TF issue: it does not interpret the wildcard as assigning the role to every possible namespace, but only to the one literally called `*` (xd). I finally managed to run a workflow successfully 🥹 Now I'll have to play a bit with authentication methods, as I am forced to use Google OAuth when exposing my service through Traefik, and I'm not sure whether it is natively supported or whether I should find a workaround...
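A likely explanation for the literal `*`, offered as an assumption rather than something confirmed in this thread: an IAM trust policy that compares the token's `sub` claim with `StringEquals` treats `*` as an ordinary character, whereas `StringLike` expands it. The sketch below imitates the two operators in plain Python. It is a simplified model, not AWS's implementation, and the function names are hypothetical. The `system:serviceaccount:<namespace>:<name>` shape is the standard subject of a Kubernetes ServiceAccount token.

```python
import re


def string_equals(policy_value: str, claim: str) -> bool:
    """Exact, case-sensitive comparison: '*' has no special meaning."""
    return policy_value == claim


def string_like(policy_value: str, claim: str) -> bool:
    """Case-sensitive match where '*' is any run of characters and '?' any one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in policy_value
    )
    return re.fullmatch(regex, claim, flags=re.DOTALL) is not None


# What "*:default" becomes in the trust policy condition on the sub claim.
POLICY_SUB = "system:serviceaccount:*:default"
# Subject presented by a worker pod in the flytesnacks-development namespace.
POD_SUB = "system:serviceaccount:flytesnacks-development:default"

print(string_equals(POLICY_SUB, POD_SUB))  # False
print(string_like(POLICY_SUB, POD_SUB))    # True
```

If that is indeed the cause, switching the trust policy's condition operator to `StringLike` would let `*:default` cover every namespace. I believe the Terraform module used above exposes an `assume_role_condition_test` input for this, but verify it against the module's documentation for the pinned version.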
a
Oh, but are you using the Terraform in the `deploy-flyte` repo? Also, yes, Google OAuth is supported; it should work regardless of the ingress controller in most cases.