Terence Kent
11/04/2023, 4:20 AM
Samhita Alla
Terence Kent
11/06/2023, 7:54 AM
flyte-ws-{whatever}.
flyte-{server-side-project-name}/
  README.md
  domains/    <-- code/tools for per-namespace resources.
    development/
      sync-resources.sh
      do-some-admin-thing.sh
      rsrcs/
        default-pod-template.yaml
        something-else.yaml
    staging/
    production/
  workflowsets/    <-- projects created with pyflyte init
    register-all.sh
    app-x-workflows/
    model-y-workflows/
Here, the per-project+domain resources are managed in code alongside the workflows. We assume two things: that it makes sense for a workflow set to be bound to a single project, and that a project should expect multiple workflow sets. Because of the layout, this also promotes a master-branch-only development style for workflows.
Another approach is to lay the code out as above, but without the workflowsets directory. Instead, each workflow set gets its own repository (maybe with a naming convention like flyte-ws-{purpose}). The assumption in this case is that workflows are not bound to a project (just as software projects are not bound to a deployment). Here, the workflow sets get branches and typical development flows.
The second approach feels similar to what is implied in the flytesnacks example directory, but that's a special case and may not be a good reference for this.
The last option we've considered is to assume a server-side project should have a single workflow set. In this case, the project structure would look like this.
flyte-{server-side-project-name}/
  README.md
  Dockerfile
  domains/
    development/
      sync-all.sh
    staging/
    production/
  workflows/
    some-workflow.py
This promotes the creation of more server-side projects and moves back to the master-only branching approach.
Samhita Alla
workflowsets directory.
This again depends on what you want to accomplish. If there are multiple people/teams working on workflow sets in a single project, it may make sense to have them in separate repositories.
Terence Kent
11/07/2023, 5:47 PM
Samhita Alla
I’m surprised to hear that folks don’t usually manage the backend config (project+domain) resources in code.
Maybe some do, but I haven't come across any.
should we not be managing resources per-namespace as we've been doing? Is there an easier path?
I don't think so. @Yee, would like to know your thoughts.
Yee
Terence Kent
11/08/2023, 8:06 PM
i’m confused, apologies. what’s a workflow set?
That is a new term to this thread! See the starting comment.
are you guys on a mono repo?
We are debating repo scope here. Our current thinking is a repo per flyte server-side project.
typically people use different flyte projects for different teams.
We are leaning that way now. Initially, we wanted to scope flyte projects per purpose (productx-microbatching, productx-etl, modely-training, modely-validation). But that seems to go against how projects are intended to be used.
the “code/tools for per-namespace resources” what are those? could you give us more examples of what these are?
Sure thing. Say we have 3 types of k8s resources we need to load into a project+domain namespace (podtemplates, service-accounts, secrets). We may have the podtemplates + service-accounts stored in code as YAML manifests. A tool would be a sync-all.sh that applies those manifests to the namespace (typical IaC flow).
Then, say the secret has a credential that needs to be rotated every N days. We may have some rotate-x-secret.sh script that grabs the credential from a secret store, waits until all pods are terminated, then updates the secret.
So, it's things like that.
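To make the two tools above concrete, here is a minimal sketch in Python rather than shell. It is only an illustration under stated assumptions: the namespace is taken to be `{project}-{domain}`, the manifests are assumed to be `*.yaml` files in one directory, `kubectl` is assumed to be on the path, and every function name here is hypothetical. The rotation helper takes its three steps as injected callables because the secret store and pod lookup are specific to each setup.

```python
from __future__ import annotations

import subprocess
import time
from pathlib import Path
from typing import Callable


def namespace_for(project: str, domain: str) -> str:
    # Assumption: namespaces follow the "{project}-{domain}" pattern.
    return f"{project}-{domain}"


def build_apply_commands(project: str, domain: str, manifest_dir: Path) -> list[list[str]]:
    """Build one `kubectl apply` command per YAML manifest, in sorted order."""
    ns = namespace_for(project, domain)
    manifests = sorted(manifest_dir.glob("*.yaml"))
    return [["kubectl", "apply", "-n", ns, "-f", str(m)] for m in manifests]


def sync_all(project: str, domain: str, manifest_dir: Path, dry_run: bool = True) -> list[list[str]]:
    """Apply every manifest to the namespace, or only print the commands when dry_run is set."""
    commands = build_apply_commands(project, domain, manifest_dir)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


def rotate_secret(
    fetch_credential: Callable[[], str],
    count_running_pods: Callable[[], int],
    update_secret: Callable[[str], None],
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Fetch the new credential, wait until no pods are running, then write the secret."""
    credential = fetch_credential()
    while count_running_pods() > 0:
        sleep(poll_seconds)
    update_secret(credential)
```

Calling `sync_all("my-project", "development", Path("domains/development/rsrcs"))` with the default `dry_run=True` only prints the commands, which is a cheap way to review what would be applied before running it for real.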
some users use the cluster resource controller to create k8s resources on the cluster.
That's not an obvious one to me. I've seen how using that mechanism can allow us to set certain per-task or per-project settings easily (esp. around resources), but I didn't notice anywhere I could define the resources pods need to reference (service accounts, secrets, etc).
with respect to the initial question btw, I think lots of people do do one repo = one flyte project.
Great, that hints that our last proposed example may be the one that matches Flyte's usage model the best.
Yee
Terence Kent
11/09/2023, 1:33 AM
Yee
Terence Kent
11/09/2023, 4:31 AM
Yee
Samhita Alla
Terence Kent
11/09/2023, 4:11 PM
pyflyte) per server-side project. That means it's probably going to be the use case considered as the project evolves.
2. Managing boilerplate resources in each project+domain namespace is best done with the clusterresource-template feature (this guy).
That's very helpful! The only thing left over is the language to disambiguate between the two things both called a "flyte project".
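As a rough illustration of the template idea in point 2, the sketch below fills a `{{ namespace }}` placeholder in a service-account manifest once per project+domain. In a real deployment the substitution is done by Flyte's cluster resource controller, not by user code. The manifest text, the `workflow-runner` name, the `{project}-{domain}` namespace pattern, and the helper functions are all assumptions made for this example, so check the Flyte documentation for the template variables your version supports.

```python
from __future__ import annotations

import re

# Hypothetical template: one service account that every namespace should contain.
SERVICE_ACCOUNT_TEMPLATE = """\
apiVersion: v1
kind: ServiceAccount
metadata:
  name: workflow-runner
  namespace: {{ namespace }}
"""

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, values: dict[str, str]) -> str:
    """Replace each {{ key }} placeholder, failing loudly on unknown keys."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for template variable {key!r}")
        return values[key]

    return PLACEHOLDER.sub(substitute, template)


def render_for_domains(project: str, domains: list[str]) -> dict[str, str]:
    """Render the template once per project+domain namespace."""
    rendered = {}
    for domain in domains:
        # Assumption: namespaces follow the "{project}-{domain}" pattern.
        namespace = f"{project}-{domain}"
        rendered[namespace] = render(SERVICE_ACCOUNT_TEMPLATE, {"namespace": namespace})
    return rendered
```

The point of the template approach is that the manifest is written once and stamped into every namespace, instead of keeping a copy per domain directory.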