Hi ! We are looking in to Flyte for fine-tuning LL...
# ask-the-community
a
Hi! We are looking into Flyte for fine-tuning LLMs in our organization. I am trying to set up a simple Flyte cluster on GCP GKE following this tutorial (https://docs.flyte.org/en/latest/deployment/deployment/cloud_simple.html), but I only see a Helm values file for EKS, eks-starter.yaml. Is there a configuration for GKE?
OK, I assume I need to use https://github.com/flyteorg/flyte/blob/master/charts/flyte-binary/values.yaml and configure the relevant settings for GCP?
k
ohh man this is a big miss - cc @David Espejo (he/him) / @jeev i thought we had examples of gke now? There are a few discussions in the past - https://discuss.flyte.org/t/13166006/hi-all-working-on-setting-up-flyte-on-gke-as-part-of-a-side-
Also the channel #flyte-on-gcp cc @Fabio Grätz is the area lead
a
Thanks
f
I think we should bring back this GCP getting started page which is not part of the latest docs anymore.
@Ashika UMAGILIYA can you please try to follow it regardless?
I suggest that you deploy using the Nginx ingress. This means that when you reach the “certificate” section, don’t use the `kind: ManagedCertificate` resource but instead install cert-manager, which is documented right below.
To configure auth for the nginx ingress on gcp, follow this guide here.
Please let me know if you are stuck at any step, will try to help.
a
Thank you. I managed to get the cluster up and running
f
Awesome 😄
a
any idea on how to fix the issue with the CLI ? https://flyte-org.slack.com/archives/CP2HDHKE1/p1694484098954209
f
Haven’t seen this exact one. Did you configure a domain and TLS cert for your ingress?
a
No ingress. First I wanted to try a simple setup, so I am using port-forwarding.
f
Ah
There is an `insecure` flag in the client config. Set it to true. Then it won’t try to establish a TLS connection.
For port-forwarding, this is ok
Do you have the client config file `~/.flyte/config.yaml`?
a
Yes, this error is after setting that flag to "true".
Does the CLI use gRPC? If so, is SSL a must?
f
The CLI does use gRPC, but SSL is not necessarily a must for this. I can confirm that I have worked with port-forwarding flyteadmin before and there wasn’t any TLS involved.
What is admin.endpoint in your client config? And what is your port-forwarding command?
a
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: dns:///127.0.0.1:8088
  authType: Pkce
  insecure: true
logger:
  show-source: true
  level: 0
Screen Shot 2023-09-12 at 16.58.07.png
port-forwarding.
f
Just as a sanity check, can you put localhost instead of 127… in the client config?
a
yeah, first i tried that 😞
f
But you put 8088 into the client config.
Is this the http port?
Needs to be the grpc one
a
Oh I see, so it's 8089.
Thanks, let me try.
f
(I always used flyte-core not binary. In this case I would use port 81 which is the flyteadmin gRPC one)
a
"flyte-core not binary" >> I only see one pod
flyte-backend-flyte-binary-5876c5745b-hhtrd   1/1     Running   0          17h
f
Yes this is correct for flyte binary.
a
With Helm it only started this pod. Is there a way to start the "core" pod?
f
There are two helm charts, one called flyte-core -> multiple pods, one called flyte-binary -> one pod. But there is currently no need for you to switch to the other one I’d say.
a
I see. With 8089 I am still getting the same error.
f
Just wanted to say that I’m 100% sure it needs to be the gRPC port for the flyte-core Helm chart. I would be very surprised if it was different for the flyte-binary Helm chart.
Maybe try with localhost again now.
a
no luck 😞
f
also maybe check that port-forwarding is still running. Also you do 8089:8090, maybe check again that this is correct and not the wrong way round
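For reference, `kubectl port-forward` takes the mapping as LOCAL:REMOTE, so the first number is the port on your machine and the second is the port inside the cluster. A minimal sketch that only assembles and prints the command: the deployment name is inferred from the pod name above, and 8089 as the remote gRPC port is an assumption about the flyte-binary chart defaults, so please check it against your values file.

```shell
# Assemble (but do not run) a port-forward command for the flyteadmin gRPC port.
NAMESPACE="flyte"
TARGET="deploy/flyte-backend-flyte-binary"  # inferred from the pod name in this thread
LOCAL_PORT=8089    # port on your machine; must match admin.endpoint in ~/.flyte/config.yaml
REMOTE_PORT=8089   # ASSUMPTION: gRPC port exposed by the flyte-binary pod

# kubectl expects LOCAL:REMOTE, in that order.
PORT_FORWARD_CMD="kubectl -n ${NAMESPACE} port-forward ${TARGET} ${LOCAL_PORT}:${REMOTE_PORT}"
echo "${PORT_FORWARD_CMD}"
```

If the printed command looks right, run it in a separate terminal and keep that terminal open while using the CLI.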
😞
a
oh, new error. I think CLI to server communication works now ! Seems like some GCS permission issue. Should be able to fix this.
f
Awesome
a
Screen Shot 2023-09-12 at 17.09.41.png
Thanks a lot! Let me play around a little bit.
f
Yeah
communication works
The server needs to create a signed URL for blob storage so that flytekit can upload the code.
The service account of the server needs the permission to create these signed URLs.
a
OK, got it. Let me add that and try again. Thanks a lot.
f
Give `iam.serviceAccounts.signBlob` to the respective service account and it should work.
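A minimal sketch of one way to grant it, assuming you use the predefined role `roles/iam.serviceAccountTokenCreator` (which includes `iam.serviceAccounts.signBlob`) and let the GCP service account sign as itself. `GSA_EMAIL` is a placeholder, and the script only prints the command so you can review it before running it.

```shell
# Build (but do not run) the gcloud command that lets a service account sign blobs as itself.
GSA_EMAIL="my-flyte-sa@my-project.iam.gserviceaccount.com"  # PLACEHOLDER: your Flyte backend GCP service account
SIGN_ROLE="roles/iam.serviceAccountTokenCreator"            # predefined role that contains iam.serviceAccounts.signBlob

GRANT_CMD="gcloud iam service-accounts add-iam-policy-binding ${GSA_EMAIL} --member=serviceAccount:${GSA_EMAIL} --role=${SIGN_ROLE}"
echo "${GRANT_CMD}"
```

A custom role containing only `iam.serviceAccounts.signBlob` would be narrower; the predefined role is used here just to keep the sketch short.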
a
Btw, shouldn't this SA also have permissions to dynamically create pods/containers via the k8s API (to execute the task logic)? I didn't see such permissions here: https://docs.flyte.org/en/v1.0.0/deployment/gcp/manual.html#permissions
f
I think these come from the k8s service account
Can you do `kubectl -n flyte get sa`?
And then `kubectl -n flyte get sa <name> -o yaml`.
I think there you should see permissions to create pods.
a
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    iam.gke.io/gcp-service-account: dev01-flyte-poc-iam@fr-stg-datalake-k8s.iam.gserviceaccount.com
    meta.helm.sh/release-name: flyte-backend
    meta.helm.sh/release-namespace: flyte
  creationTimestamp: "2023-09-11T14:31:45Z"
  labels:
    app.kubernetes.io/instance: flyte-backend
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: flyte-binary
    app.kubernetes.io/version: 1.16.0
    helm.sh/chart: flyte-binary-v1.9.1
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:meta.helm.sh/release-name: {}
          f:meta.helm.sh/release-namespace: {}
        f:labels:
          .: {}
          f:app.kubernetes.io/instance: {}
          f:app.kubernetes.io/managed-by: {}
          f:app.kubernetes.io/name: {}
          f:app.kubernetes.io/version: {}
          f:helm.sh/chart: {}
    manager: helm
    operation: Update
    time: "2023-09-11T14:31:45Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:iam.gke.io/gcp-service-account: {}
    manager: kubectl-annotate
    operation: Update
    time: "2023-09-11T14:32:02Z"
  name: dev01-flyte-gke-sa
  namespace: flyte
  resourceVersion: "227723"
  uid: f0b5b2bd-b98f-43d8-8f90-bdf0e0ecf66d
f
Sorry, you also need to check role and rolebinding for this service account. In role you see the permission to create pods I guess
kubectl get role <name> -o yaml
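A quicker check than reading the role YAML might be `kubectl auth can-i` with impersonation. A sketch, assuming the backend service account is `dev01-flyte-gke-sa` in namespace `flyte` (as in the output above) and that tasks run in a namespace like `flytesnacks-development`, which is only an example name. It just prints the command, and running it requires that your own user is allowed to impersonate service accounts.

```shell
# Build (but do not run) a permission check for the Flyte backend service account.
BACKEND_NAMESPACE="flyte"
BACKEND_KSA="dev01-flyte-gke-sa"          # k8s service account shown earlier in this thread
TASK_NAMESPACE="flytesnacks-development"  # ASSUMPTION: example <project>-<domain> namespace

CAN_I_CMD="kubectl auth can-i create pods --namespace ${TASK_NAMESPACE} --as=system:serviceaccount:${BACKEND_NAMESPACE}:${BACKEND_KSA}"
echo "${CAN_I_CMD}"
```

The command answers "yes" or "no", which tells you whether the backend can create task pods in that namespace.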
a
OK, btw these are created by Helm, right? Because I don't think I created any k8s SA or rolebindings.
f
yeah helm
a
Screen Shot 2023-09-12 at 18.11.39.png
f
Tbh I wouldn’t worry about this unless you start an execution and notice that the pods are not appearing.
a
Yes, actually the task failed.
Oh sorry, it again has something to do with GCS permissions.
Let me play around with some permissions. Thanks a lot for the help.
f
Do `kubectl get sa <service account name> -o yaml`.
There should be a gcp service account mentioned in the annotations.
This one needs permissions to view and create objects.
a
that SA has all the permissions (GCS Storage Admin). But I just noticed in the UI the sa / iam is shown as "default" .
f
That means the `default` Kubernetes service account in the namespace the task pod runs in is used.
Can you check whether the default service account has the annotation with the iam workload identity -> the gcp service account it is bound to?
a
I just bound the "default" account to the GCP SA using
gcloud iam service-accounts add-iam-policy-binding dev01-flyte-poc-iam@fr-stg-datalake-k8s.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:fr-stg-datalake-k8s.svc.id.goog[flyte/default]"
But why does it use the "default" SA? The SA is defined in the Helm values, right?
f
For every flyte project or flyte project/domain, there is a namespace in which the task pods run.
Each of these namespaces has a k8s service account called default.
This service account is used for tasks.
It is not the same k8s service account that is used for the flyte backend.
This is correct.
But you also need to annotate the default service account so that it uses the workload identity mapping I think
a
Hmm I see, I assumed this would fix it:
kubectl annotate serviceaccount default \
    --namespace flyte \
    iam.gke.io/gcp-service-account=dev01-flyte-poc-iam@fr-stg-datalake-k8s.iam.gserviceaccount.com
where "dev01-flyte-poc-iam" is the GCP IAM service account.
Still doesn't work! Oh man, this is harder than I imagined 😉 I thought by today I would be running a training example on a Ray cluster 😉 I'll try this tomorrow. Thanks again for the help.
f
What is the current error message though?
In the UI that it can’t access gcs 403?
a
Yes, it's the same.
f
Ok 😕
Well, have a nice evening!
Ping me tomorrow if still stuck
a
Thanks, I'll do a fresh installation tomorrow and try again.
Finally 🙂 "Each of these namespaces has a k8s service account called default." >> this really helped. I wasn't aware of that concept.
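For anyone landing here with the same 403, a minimal sketch of what that concept implies: the Workload Identity binding and the annotation have to target the `default` service account in each namespace where task pods run, not only `flyte/default`. This is not necessarily the exact set of commands used in this thread. The project ID, GCP service account, and namespace names below are placeholders (assuming namespaces named `<project>-<domain>`), and the script only prints the commands rather than running them.

```shell
# Print the Workload Identity commands for the default service account of each task namespace.
PROJECT_ID="my-project"                                        # PLACEHOLDER: GCP project ID
GSA_EMAIL="my-flyte-sa@${PROJECT_ID}.iam.gserviceaccount.com"  # PLACEHOLDER: GCP service account with GCS access
TASK_NAMESPACES="flytesnacks-development flytesnacks-staging"  # ASSUMPTION: <project>-<domain> namespaces

for ns in ${TASK_NAMESPACES}; do
  # Workload Identity member format: PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]
  MEMBER="serviceAccount:${PROJECT_ID}.svc.id.goog[${ns}/default]"
  echo "gcloud iam service-accounts add-iam-policy-binding ${GSA_EMAIL} --role roles/iam.workloadIdentityUser --member \"${MEMBER}\""
  echo "kubectl annotate serviceaccount default --namespace ${ns} iam.gke.io/gcp-service-account=${GSA_EMAIL}"
done
```

`kubectl get ns` lists the namespaces that actually exist in the cluster, which is the safer source for `TASK_NAMESPACES`.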
d
Thanks so much @Fabio Grätz, and sorry for the struggles @Ashika UMAGILIYA. We're working on a GCP reference implementation and we'll make sure to apply the learnings from this thread. If you have any further questions, please let us know.
f
Awesome it works now 🙂