# ask-the-community
d
Hi everyone, I'm exploring the installation of Flyte in a multi-cluster setup with a specific requirement: one execution cluster in the cloud and another on-premises. The cloud cluster will run both the control and data planes, while the on-premises cluster will run only the data plane (flytepropeller). Ideally, the on-prem setup would use MinIO, and the cloud data plane would use S3. My understanding is that FlyteAdmin generates presigned URLs for client data uploads. Is it possible to configure FlyteAdmin to direct source code distributions (tar.gz files) and other uploads to the on-premises MinIO when scheduling a workflow there? Currently, if I schedule a workflow on the on-prem execution cluster, it's unable to pull data from S3, because it's configured to use MinIO for local object storage and is missing the AWS credentials. I am using the latest `flyte-core` helm chart, which I adjusted to my needs by following the Multiple K8s Cluster Deployment docs.
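For context, the multi-cluster wiring in my values override looks roughly like this — a trimmed sketch based on the multi-cluster deployment docs; the label, cluster id, endpoint, and credential paths are placeholders:

```yaml
configmap:
  clusters:
    labelClusterMap:
      onprem:                        # label used to route executions to a cluster
        - id: dataplane_onprem       # placeholder cluster id
          weight: 1
    clusterConfigs:
      - name: dataplane_onprem
        endpoint: https://onprem-cluster.example.com:443   # placeholder endpoint
        enabled: true
        auth:
          type: file_path
          tokenPath: "/var/run/credentials/token"
          certPath: "/var/run/credentials/cacert"
```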
y
you can’t have both… s3 and minio. at least not without some work.
you can have different buckets, but not different endpoints
can you use minio for everything?
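to illustrate: in `flyte-core` the object store is configured once, with a single endpoint that every bucket goes through. a rough sketch of the `storage` block, assuming the MinIO-style custom stow config (key names may vary by chart version, and the bucket name and credentials here are placeholders):

```yaml
storage:
  type: custom
  bucketName: my-flyte-bucket            # metadata bucket (placeholder name)
  custom:
    type: stow
    stow:
      kind: s3
      config:
        auth_type: accesskey
        access_key_id: minio             # placeholder credentials
        secret_key: miniostorage
        disable_ssl: true
        region: us-east-1
        endpoint: http://minio.flyte.svc.cluster.local:9000   # the one endpoint
```

note there's exactly one `endpoint` here — pointing some executions at MinIO and others at S3 just isn't expressible in this config.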
d
I could, I was just researching if this is possible. The problem is that if I go with the shared S3 object storage, executions on the on-prem cluster (because of its limited network bandwidth) will be slower compared to the same solution using MinIO. Is there a way to keep task outputs locally on the same execution cluster, or potentially in memory, and pass them to the next task?
I might just go with two separate control plane installations, one for on-prem and another for the cloud
y
wait i thought on prem == minio
d
That's right. MinIO on-prem, S3 for cloud. I am saying that if I use S3 as object storage for on-prem, it will be slower.
y
not really… not the primitive data at least. you understand that distinction right?
primitive i/o like floats and strings is what we call metadata; it goes into the metadata bucket (along with other things, like the offloaded objects admin uses).
and off-loaded data types like files and dataframes go into another bucket.
(think of this as stack and heap kinda)
so there’s a natural built-in distinction between the two buckets but not endpoints. user containers need access to both buckets. flyte itself only needs access to the metadata one.
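concretely, the two buckets map to two separate knobs in `flyte-core` — a sketch with config paths from memory, so verify against your chart version; bucket names are placeholders:

```yaml
# metadata bucket: primitives and other state flyte itself reads/writes
storage:
  bucketName: flyte-metadata                  # placeholder bucket name

# raw-data prefix: where offloaded user data (files, dataframes) lands
configmap:
  core:
    propeller:
      rawoutput-prefix: s3://flyte-user-data/   # placeholder bucket name
```

both still resolve through the single endpoint in the `storage` block, which is why you can split buckets but not endpoints. iirc the raw output data prefix can also be overridden per launch plan / execution, if you want different workflows to land in different buckets.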
d
Got it, thank you 🙏