# flyte-deployment
g
Hi all, I'm having issues deploying the basic (non-production) Flyte on EKS. I have been following this tutorial with partial success.

TL;DR:
1. The pod wasn't able to pull the image `cr.flyte.org/flyteorg/flytekit:py3.10-1.10.1b0`, and I don't know where to find the list of available images.
2. I wasn't able to configure nodes with a private subnet as described in the walkthrough.

I created the roles described under 01. I then moved on to creating a cluster. This step assumes "API server endpoint access - public and private", and I'm not entirely sure what this refers to. An EKS cluster can be configured for "public and private", but this can only be set after the cluster is created. I created a VPC with 2 public and 2 private subnets and used them when creating the EKS cluster via the given command line.

NODEGROUP: I followed the guide, but the node groups were NOT created successfully. If I selected all 4 subnets on creation, the node group randomly used one of them when creating each node. The nodes created on the private subnets couldn't join the EKS cluster. I switched to using only the public subnets, and the nodes started OK.

The next steps, creating the bucket and databases, went through OK.

On to the deployment part: I downloaded the starter YAML (which was missing `database.username`) and deployed Flyte. It was successful, placing 1 pod on the public node. I noticed that the liveness probe failed, and then noticed the configuration was wrong (in the chart itself in your repo): it specified "http" as the port instead of 8088. I added this configuration on my end.

Lastly, I ran port forwarding (I needed to run 2 instances because http and grpc are defined as separate services) and successfully accessed the Flyte console and triggered a workflow. However, the pod wasn't able to schedule because it requested 2 CPUs, which sounds like way too much! How can that be configured per task, if at all? I increased the node size and was then able to schedule, but now I'm receiving a `containers with unready status: [f830d928e559a4ba7be9-n0-0]|Back-off pulling image "cr.flyte.org/flyteorg/flytekit:py3.10-1.10.1b0"` error.
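The 2-CPU scheduling problem above comes down to how Kubernetes places pods: a pod only lands on a node whose allocatable CPU (usually somewhat below the raw vCPU count, because some capacity is reserved for the system), minus what other pods already request, still covers the pod's own request. Below is a minimal Python sketch of that check. It is a simplified model that considers CPU only and uses hypothetical node numbers; the real scheduler also checks memory, taints, affinity and more.

```python
# Simplified sketch of the Kubernetes scheduler's CPU "fit" check.
# Assumption: only CPU requests are considered; the real scheduler also
# checks memory, taints/tolerations, affinity, and more.
from typing import List


def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "2", "0.5" or "500m" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def fits_on_node(pod_request: str, node_allocatable: str,
                 already_requested: List[str]) -> bool:
    """True if the pod's CPU request fits in the node's unrequested allocatable CPU."""
    used = sum(cpu_millicores(q) for q in already_requested)
    free = cpu_millicores(node_allocatable) - used
    return cpu_millicores(pod_request) <= free


if __name__ == "__main__":
    # Hypothetical node: 1930m allocatable, with system pods already requesting 250m.
    print(fits_on_node("2", "1930m", ["100m", "150m"]))     # False: 2000m > 1680m free
    print(fits_on_node("500m", "1930m", ["100m", "150m"]))  # True
```

This is why either a bigger node or a smaller per-task request makes the pod schedulable.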
c
Regarding images, I was pointed to this page for recent tagged versions.
g
I'm not sure where this is configured though... can't seem to find it
ah, I think I know
c
> how can it be configured per task? if at all?

I'm pretty sure you can do that in the K8s task resource config or with task-level PodTemplate attributes, but I manage it using the flytectl CLI, e.g.:
```shell
flytectl update task-resource-attribute --attrFile taskconfig.yaml
```
where `taskconfig.yaml` is:
```yaml
domain: development
project: flyte-az
defaults:
  cpu: "1"
  memory: "1Gi"
limits:
  cpu: "1"
  memory: "1Gi"
```
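Before running flytectl, it can help to generate the attribute file from code and sanity-check that no default exceeds its limit. The sketch below writes a file in the same shape as the `taskconfig.yaml` above and parses common Kubernetes quantity suffixes. The helper names `to_number` and `render_attribute_file` are hypothetical (not part of flytectl), and the check is our own, not a reproduction of flytectl's validation.

```python
# Sketch: build the attribute file for `flytectl update task-resource-attribute`
# and sanity-check it first. Helper names are hypothetical.
from typing import Dict

# Common Kubernetes quantity suffixes (binary and decimal), plus "m" (milli).
UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
         "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "m": 10**-3}


def to_number(quantity: str) -> float:
    """Parse a quantity such as "1", "500m", "1Gi" or "512Mi" into a plain number."""
    for suffix in sorted(UNITS, key=len, reverse=True):  # try "Mi" before "M"
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * UNITS[suffix]
    return float(quantity)


def render_attribute_file(project: str, domain: str,
                          defaults: Dict[str, str], limits: Dict[str, str]) -> str:
    """Return YAML text shaped like taskconfig.yaml, after checking defaults <= limits."""
    for key, value in defaults.items():
        if key in limits and to_number(value) > to_number(limits[key]):
            raise ValueError(f"default {key}={value} exceeds limit {limits[key]}")
    lines = [f"domain: {domain}", f"project: {project}", "defaults:"]
    lines += [f'  {k}: "{v}"' for k, v in defaults.items()]
    lines.append("limits:")
    lines += [f'  {k}: "{v}"' for k, v in limits.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Prints the same YAML as the taskconfig.yaml above.
    print(render_attribute_file("flyte-az", "development",
                                defaults={"cpu": "1", "memory": "1Gi"},
                                limits={"cpu": "1", "memory": "1Gi"}), end="")
```

Save the output as `taskconfig.yaml` and pass it to the `flytectl update task-resource-attribute --attrFile` command shown above.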
d
@Guy Arad I'm sorry about your experience here. We've been putting effort into making the process more automated. Current progress using Terraform (for EKS) is here: https://github.com/davidmirror-ops/deploy-flyte/blob/main/environments/aws/README.md
Could you please file an issue for the port you had to fix?
g
Thanks @David Espejo (he/him)! I have taken a step back for a while... I will review it in a week or two.
@Chris Grass Does this let me control the resources per task? This seems generic for all tasks (I'm not well versed in Flyte yet).
@David Espejo (he/him) Did you one better: I created a PR.
d
@Guy Arad You cant set default resource requests and limits at the platform, project or task level. For tasks, you only need to add them to the decorator, for example:
```python
@task(requests=Resources(cpu='2', mem='6Gi'))
```
For projects, you'd use the `task-resource-attribute`, and for platform-wide settings (which control the defaults any task can request), you can set them in your values file under `configmap`. Example:
```yaml
task_resource_defaults:
  task_resources:
    defaults:
      cpu: 1000m
      memory: 1000Mi
      storage: 1000Mi
    limits:
      storage: 2000Mi
```
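To make the three levels concrete, here is a minimal Python sketch of how they could combine, using the values from this thread. It assumes per-key precedence of task over project/domain over platform defaults; that merge rule, the `effective_requests` function and the `ResourceMap` alias are illustrative assumptions, not Flyte's implementation. Limits are not modelled, and the decorator's `mem` argument is written as `memory` to match the YAML keys.

```python
from typing import Dict, Optional

# Hypothetical alias: resource name -> Kubernetes quantity, e.g. {"cpu": "500m"}.
ResourceMap = Dict[str, str]


def effective_requests(task: Optional[ResourceMap],
                       project_domain: Optional[ResourceMap],
                       platform: ResourceMap) -> ResourceMap:
    """Resolve each resource key, preferring task > project/domain > platform.

    Assumption: keys are merged individually, so a task that only sets "cpu"
    still inherits "memory" from the next level down.
    """
    resolved: ResourceMap = {}
    # Apply the broadest layer first so narrower layers overwrite it.
    for layer in (platform, project_domain or {}, task or {}):
        resolved.update(layer)
    return resolved


if __name__ == "__main__":
    platform = {"cpu": "1000m", "memory": "1000Mi", "storage": "1000Mi"}  # values file defaults
    project = {"cpu": "1", "memory": "1Gi"}  # taskconfig.yaml defaults
    print(effective_requests({"cpu": "2", "memory": "6Gi"}, project, platform))
    # {'cpu': '2', 'memory': '6Gi', 'storage': '1000Mi'}
    print(effective_requests(None, project, platform))
    # {'cpu': '1', 'memory': '1Gi', 'storage': '1000Mi'}
```

Under this model, a task that sets nothing falls back to the project's `taskconfig.yaml` values, and a project without an attribute falls back to the platform defaults.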
c
> You cant set default resource requests and limits at the platform, project or task level.

@David Espejo (he/him) I think you mean to say you can set default resource limits at each level?
d
@Chris Grass right! sorry