#ask-the-community

Durgai Vel

09/11/2023, 2:48 PM
Hi all, I need help setting up a single-cluster, multi-node environment, and also with scheduling tasks between nodes in that environment (for example, optionally specifying a node for a task so that it is executed on that node). Kindly point me to the docs. I have searched through docs.flyte.org, but I could only find multi-cluster or single-cluster, single-node examples. If I already have a multi-node Kubernetes cluster running, can I just start the flyte sandbox on the control plane node? If so, will Flyte schedule the tasks on different nodes based on resource availability? Thanks in advance :)

David Espejo (he/him)

09/11/2023, 5:52 PM
Hi @Durgai Vel Is your K8s environment on the cloud or is it on-premises?
like optionally specifying node values for tasks, so it can be executed on that node
You can make use of PodTemplates, adding the nodeAffinity (K8s docs) config you want there and then calling the PodTemplate from your Task(s). If your use case is about nodes with GPUs, then using taints and tolerations might be better.
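For illustration, here is a minimal sketch of the nodeAffinity fragment such a PodTemplate could carry, written as a plain Python dict that mirrors the Kubernetes Pod spec fields. The node name `gpu-node-1` and the helper `node_affinity_for_host` are hypothetical; `kubernetes.io/hostname` is the standard node label. Attaching the resulting spec to a task through a PodTemplate is not shown here.

```python
import json


def node_affinity_for_host(hostname: str) -> dict:
    """Build a Pod spec fragment that requires scheduling onto one node.

    `hostname` is a hypothetical node name; it is matched against the
    well-known `kubernetes.io/hostname` node label.
    """
    return {
        "affinity": {
            "nodeAffinity": {
                # Hard requirement: the Pod stays Pending if no node matches.
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [
                        {
                            "matchExpressions": [
                                {
                                    "key": "kubernetes.io/hostname",
                                    "operator": "In",
                                    "values": [hostname],
                                }
                            ]
                        }
                    ]
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the fragment as JSON so it can be compared with a Pod manifest.
    print(json.dumps(node_affinity_for_host("gpu-node-1"), indent=2))
```

Pinning a task to a single hostname is the strictest option; matching on a custom node label instead would let the K8s scheduler choose among several eligible nodes.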

Durgai Vel

09/11/2023, 5:57 PM
My K8s environment is on-premises, with two running nodes on the same network. In my case both nodes have GPUs, but how should I deploy the Flyte sandbox on a running K8s cluster, and which Flyte services should be deployed on the control plane node and which on the worker nodes? Is there any documentation for this case? What are taints and tolerations? Can you please expand on those? @David Espejo (he/him)

David Espejo (he/him)

09/11/2023, 6:03 PM
Sure, what K8s distribution are you using? There's a tutorial for on-premise K8s deployment that uses K3s: https://github.com/davidmirror-ops/flyte-the-hard-way#flyte-on-local-kubernetes
so do you plan to run workflows on both nodes?

Durgai Vel

09/11/2023, 6:16 PM
I was planning to use kubeadm to deploy the cluster, but if K3s is good I can move to it. If by distribution you mean version, I guess it's 1.28 something, not sure. I am planning to execute parallel tasks on different nodes. By parallel I mean two different, independent tasks in the workflow graph being executed at the same time, scheduled on different nodes. I hope I'm not being confusing; if so, please let me know @David Espejo (he/him)

David Espejo (he/him)

09/11/2023, 6:31 PM
No, it's fine, thanks Durgai. By distribution I mean the tool/method you're using to deploy K8s. K3s is lightweight compared with what you get from kubeadm, and it's far easier to get started with. If you plan to use GPUs on both nodes, I don't think you need tolerations at all; those only matter when you have non-GPU nodes in the cluster and want control over Pod placement. Regarding Flyte, in this case you can have the control plane and data plane running on both nodes; Flyte will rely on the K8s scheduler to assign nodes to Pods. This recent Flyte School episode describes the architecture:

https://www.youtube.com/watch?v=EQSHqtlTXwM


Durgai Vel

09/12/2023, 2:38 PM
@David Espejo (he/him), I read through the docs you mentioned above and found https://github.com/davidmirror-ops/flyte-the-hard-way/blob/main/README.md#part-ii-multipler-worker-nodes-and-ingress-coming-soon. I think this is what I am searching for. Does Flyte schedule tasks on worker nodes from the master node itself, or should it be deployed on the worker nodes too? Any update on this would be helpful. Also, the docs in that repo for starting the control plane and data plane in a running K3s cluster point to certain files which were removed. E.g. in https://github.com/davidmirror-ops/flyte-the-hard-way/blob/main/docs/on-premises/002-install-local-flyte.md#configure-dependencies-and-install-flyte, at the step "2. Review the manifest located here", the "here" points to a removed file location. Are there any updated docs or a guide on how to start Flyte with all of its components (flyteadmin, propeller, UI, minio, kubernetes-dashboard, docker registry) manually, or using a YAML file and kubectl? I had a lot of issues installing Flyte on my local cluster by following the docs above and need help.

David Espejo (he/him)

09/12/2023, 4:06 PM
@Durgai Vel sorry, I just fixed the broken link and changed the steps a bit to simplify them: https://github.com/davidmirror-ops/flyte-the-hard-way/blob/main/docs/on-premises/002-install-local-flyte.md Please let me know how it goes.
Does flyte schedule tasks on worker nodes from the master node itself or should it be deployed on worker nodes too
I think this division of "master vs worker nodes" is more meaningful in K8s than in Flyte. Typically, a K8s master node is configured to prevent workloads from running there. With Flyte, a single-node or multi-node cluster is treated as both control plane and data plane, meaning workflows can run on all nodes (unless you need something different). Conversely, the "control plane and data plane" distinction in Flyte is meaningful if you have multiple Kubernetes clusters (which doesn't seem to be your case), where you have control plane and data plane nodes, but even there you could run workflows in both. Not sure if this is completely clear; otherwise, happy to have a call.

Durgai Vel

09/13/2023, 6:26 AM
@David Espejo (he/him)
1. There is also a broken link in "6. Install Flyte using the values file provided here:", kindly fix this too.
2. It would be more helpful if there were a straightforward YAML file, like the Postgres and MinIO deployment YAML files, for Flyte's admin, propeller, registry, etc.
3. From your statement, I think that in a single-cluster, multi-node setup all the Flyte deployments should be deployed on the master node, and the workloads will be scheduled and executed on the worker nodes (by the Flyte scheduler or the Kubernetes scheduler). If this is correct, thanks for the clarification; I'll let you know if I have any further doubts.

David Espejo (he/him)

09/13/2023, 2:26 PM
Hi Durgai
1. Links fixed, please check again.
2. Well, that would be a lot of YAML. That's why Flyte is distributed as a Helm package (chart) that not only handles the YAML rendering but also centralizes the resource management lifecycle (I mean, uninstalling Flyte would be a matter of just doing helm uninstall... instead of deleting resources manually). If you need YAML files you could render the Helm chart locally, but that's not typically needed.
3. In a single Flyte cluster with multiple nodes, all nodes can be used to run workloads. If you have two servers with GPUs, I guess you want to use both to run workloads, and that's the default behavior. Node placement is controlled by the K8s scheduler by default, unless you need something else. If you want to use your two nodes, you just need to make sure that the K8s master node doesn't have a taint that prevents it from running Pods (K3s does not apply such a taint by default). As long as that's in place, Flyte will submit executions to the K8s API, which will in turn pick the node where they will run as Pods. Propeller is a controller that helps K8s translate a DAG definition into actual K8s resources and manages the execution lifecycle, but it doesn't control node placement.
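To make the taint point concrete, here is a rough sketch of how a node taint blocks a Pod unless the Pod carries a matching toleration. It models only the taint/toleration check, not the rest of the K8s scheduler (resources, affinity, and so on). The helper names `tolerates` and `can_schedule` are made up for this example; `node-role.kubernetes.io/control-plane` with effect `NoSchedule` is the taint kubeadm typically puts on control plane nodes.

```python
# Taint effects that keep a Pod off a node unless it is tolerated.
BLOCKING_EFFECTS = {"NoSchedule", "NoExecute"}


def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches a single taint."""
    # An empty toleration effect matches every taint effect.
    if toleration.get("effect") and toleration["effect"] != taint.get("effect"):
        return False
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        # An empty key combined with Exists tolerates every taint.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    # Equal: both key and value have to match.
    return (
        toleration.get("key") == taint["key"]
        and toleration.get("value", "") == taint.get("value", "")
    )


def can_schedule(node_taints: list, pod_tolerations: list) -> bool:
    """Return True if no blocking taint on the node is left untolerated."""
    for taint in node_taints:
        if taint.get("effect") not in BLOCKING_EFFECTS:
            continue  # PreferNoSchedule is only a soft preference
        if not any(tolerates(t, taint) for t in pod_tolerations):
            return False
    return True


if __name__ == "__main__":
    control_plane_taint = {
        "key": "node-role.kubernetes.io/control-plane",
        "effect": "NoSchedule",
    }
    # A task Pod with no tolerations is kept off a tainted control plane node.
    print(can_schedule([control_plane_taint], []))
    # A node without taints accepts the same Pod.
    print(can_schedule([], []))
```

On a two-node cluster where neither node is tainted, both calls to `can_schedule` with an empty taint list succeed, which is why no tolerations are needed in that setup.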

Durgai Vel

09/13/2023, 5:19 PM
@David Espejo (he/him) In the YAML file, the database host is a hostname instead of an IP. Should that also be added to /etc/hosts? Also, we haven't specified any port value for the DB host in https://github.com/davidmirror-ops/flyte-the-hard-way/blob/main/docs/on-premises/local-values.yaml (line 4); just asking out of curiosity. Also, when using flytectl demo start, flyte propeller, flyte admin, the flyte docker registry, etc. get deployed as separate pods, but in this case you've only specified one pod named flyte-binary. Which services does it contain? Can you list them individually? It would be helpful for our development. Thanks in advance…

David Espejo (he/him)

09/13/2023, 5:48 PM
No. The host here is in the format where the Flyte Pod will use K8s service discovery to contact the postgres service in the flyte namespace inside the same cluster. So it's not something resolvable from your machine.
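As a small sketch of that naming format: in-cluster DNS names for K8s Services follow `<service>.<namespace>.svc.<cluster-domain>`. The helper `service_dns_name` is made up for this example, `cluster.local` is the usual default cluster domain and may differ on your cluster, and 5432 is PostgreSQL's conventional default port (whether the chart falls back to it when no port is set is an assumption here).

```python
# PostgreSQL's conventional default port; assumed here, not read from the chart.
DEFAULT_POSTGRES_PORT = 5432


def service_dns_name(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS name of a K8s Service.

    The name resolves only from Pods inside the cluster, which is why
    adding it to /etc/hosts on your own machine is not needed.
    """
    return f"{service}.{namespace}.svc.{cluster_domain}"


if __name__ == "__main__":
    host = service_dns_name("postgres", "flyte")
    print(f"{host}:{DEFAULT_POSTGRES_PORT}")
```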

Durgai Vel

09/13/2023, 6:26 PM
Ok, and what about the second question, regarding the services in flyte-binary pod. @David Espejo (he/him)

David Espejo (he/him)

09/13/2023, 7:40 PM
Also when using flytectl demo start, the flyte propeller, flyte admin, flyte docker registry etc. will get deployed as a separate pods
flyte-binary packages all those services in, err, a single binary (single Pod). The flyte-core Helm chart deploys separate Pods, but it's more geared towards multi K8s cluster deployments.

Durgai Vel

09/14/2023, 6:23 AM
@David Espejo (he/him),
1. From the README you specified, in "5. save DB password in secret": I haven't manually set any DB password for the Postgres DB, and I also couldn't find any password field in local-flyte-resources.yaml, so what should I specify in the secret?
2. On what port will the Flyte Docker registry be running? With flytectl demo start the registry runs on port: 30000, but now all the services are handled within a single Pod.
3. In our case, we start the cluster using K3s and then use Helm to deploy flyte-binary. When I stopped the K3s cluster using k3s-uninstall.sh and then started the cluster again from the initial step, I couldn't find the registered tasks in my Flyte UI. That is not the case with flytectl demo teardown: I can get the previously registered workflows back by starting the flytectl demo cluster. So is this the expected behaviour, or am I making a mistake (does it relate to the Postgres DB connection, since I haven't specified any password when creating the secret)?
@David Espejo (he/him) Any updates?

David Espejo (he/him)

09/15/2023, 5:19 PM
Hi Durgai, for some reason I missed this thread.
1. You're right. While the password can be stored in a Secret, it's actually not being used due to this: https://github.com/davidmirror-ops/flyte-the-hard-way/blob/0ed5ee8260f4d56be2ee2f03f7c89d36d9d62b9c/manifests/local-flyte-resources.yaml#L79 which came straight from the flyte-deps chart. It allows local connections without requiring a password. Not great. Would you mind filing an issue for this? I'll need to test the alternative, more secure options. If you also want to help, that'd be awesome.
2. I don't think flyte-binary ships with a Docker registry.
3. I think this is expected behavior. If you uninstall the K3s cluster that holds Flyte and its dependencies (including the DB), it won't be able to retrieve any pre-existing object. Sandbox is different because (not 100% sure of this) if you teardown the cluster, the Docker daemon may still retain the volumes, and a new cluster will then reuse them. In the tutorial, the DB survives Pod deletions/crashes but not a K3s cluster deletion, because even the PersistentVolumes go away. We can find ways to persist that data outside the lifecycle of the K3s cluster; it just takes a bit more time and still won't meet every possible on-prem scenario, but feel free to file an issue.
I hope some of this is helpful. You can still use the sandbox on the two nodes, but that would be two separate clusters. Sandbox is not prepared for multicluster ops.

Durgai Vel

09/27/2023, 8:20 AM
@David Espejo (he/him) I raised a pull request to update the auth method for Postgres from trust to password authentication here, kindly review. Also, sorry for the delay.