:wave: I have a question about the server-side pro...
# ask-the-community
t
👋 I have a question about the server-side project resources (created using flytectl) and the code projects referred to in the documentation (this guy). They are both described as "flyte projects", but are very different things. Are there more specific names for these two things? To avoid confusion, I've been calling the code projects "workflow sets" but if there is a more accepted name, I'd love to use that instead.
s
I don't think we have one but I like "workflow set"!
t
Interesting! Well, FWIW, we've taken to naming our "workflow set" repositories `flyte-ws-{whatever}`.
@Samhita Alla following on that topic, we've been going back and forth on how to organize our Flyte code. How to think of projects is at the center of it. We can see things easily ending up hard to manage if we go down the wrong path. I think an example is the easiest way to communicate the different choices. Say we organize our local code ("workflow sets") by server-side project. That might look like this.
flyte-{server-side-project-name}/
  README.md
  domains/                <-- code/tools for per-namespace resources.
    development/
       sync-resources.sh
       do-some-admin-thing.sh
       rsrcs/
          default-pod-template.yaml
          something-else.yaml
    staging/
    production/
  workflowsets/          <-- projects created with pyflyte init
    register-all.sh
    app-x-workflows/
    model-y-workflows/
Here, the per-project+domain resources are managed in code alongside the workflows. And, we assume two things: it makes sense for workflow sets to be bound to a single project and a project should expect multiple workflow sets. This also promotes a master-branch-only development style for workflows due to the layout.
Another approach is to lay code out like above, but without the `workflowsets` directory. Instead, each workflow set gets its own repository (maybe a naming convention would be `flyte-ws-{purpose}`). The assumption being made in this case is that workflows are not bound to a project (just like software projects are not bound to a deployment). Here, the workflow sets get branches and typical development flows. The second approach feels similar to what is implied in the flytesnacks example directory, but that's a special case and may not be a good reference for this.
The last option we've considered is to assume a server-side project should have a single workflow set. In this case, the project structure would look like this.
flyte-{server-side-project-name}/
  README.md
  Dockerfile
  domains/
    development/
       sync-all.sh
    staging/
    production/
  workflows/
     some-workflow.py
This promotes the creation of more server-side projects and moves back to the master-only branching approach.
All that back story leads to a short question: What is the intended path for mapping local code projects (containing workflow sets) to flyte server-side projects? Is there some documentation about best practices (or at least the intentions) there?
s
I don't think we have best practices documented around this topic as it depends on the use-case under consideration and our users' preferences as well. From what I've seen, configurations related to projects/domains aren't managed in code, but I like the way you're organizing the backend config and your Flyte projects aka workflow sets. It seems reasonable to have a server-side project with config and all your workflow sets.
> And, we assume two things: it makes sense for workflow sets to be bound to a single project and a project should expect multiple workflow sets.
+1. Each server-side project can correspond to a directory with the relevant config and workflows, and you can have multiple directories to manage your projects. You would probably benefit from having a cookiecutter template to kick this off.
> Another approach is to lay code out like above, but without the `workflowsets` directory.
This again depends on what you want to accomplish. If there are multiple people/teams working on workflow sets in a single project, it may make sense to have them in separate repositories.
t
That's very helpful, thank you! I'm surprised to hear that folks don't usually manage the backend config (project+domain) resources in code. We're finding that we start out needing at least 3-4 k8s resources in each project namespace (podtemplates, service accounts, and secrets). Since it's 3x namespaces per project, even with only a handful of projects you get quite a few. This brings up a follow-up question - should we not be managing resources per-namespace as we've been doing? Is there an easier path?
s
> I'm surprised to hear that folks don't usually manage the backend config (project+domain) resources in code.
Maybe some do, but I haven't come across any.
> should we not be managing resources per-namespace as we've been doing? Is there an easier path?
I don't think so. @Yee, would like to know your thoughts.
y
i'm confused, apologies. what's a workflow set?
are you guys on a mono repo? typically people use different flyte projects for different teams.
some users use the cluster resource controller to create k8s resources on the cluster.
the "code/tools for per-namespace resources" what are those? could you give us more examples of what these are?
the cluster resource controller is good at creating and updating k8s resources by project/domain, but I don't think it has the capability to not run something for certain combinations of project/domain.
with respect to the initial question btw, I think lots of people do do one repo = one flyte project.
t
> i'm confused, apologies. what's a workflow set?
That is a new term to this thread! See the starting comment.
> are you guys on a mono repo?
We are debating repo scope here. Our current thinking is a repo per flyte server-side project.
> typically people use different flyte projects for different teams.
We are leaning that way now. Initially, we wanted to scope flyte projects per purpose (`productx-microbatching`, `productx-etl`, `modely-training`, `modely-validation`). But that seems to go against how projects are intended to be used.
> the "code/tools for per-namespace resources" what are those? could you give us more examples of what these are?
Sure thing. Say we have 3 types of k8s resources we need to load into a project+domain namespace (podtemplates, service-accounts, secrets). We may have the podtemplates + service-accounts stored in code as YAML manifests. A tool would be a `sync-all.sh` that applies the values to the namespace (typical IaC flow). Then, say the secret has a credential that needs to be rotated every N days. We may have some `rotate-x-secret.sh` script that grabs the credential from a secret store, waits until all pods are terminated, then updates the secret. So, it's things like that.
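For illustration, the sync step described above could be sketched in Python roughly as follows. This is only a sketch under stated assumptions: the function names are hypothetical, the `domains/{domain}/rsrcs/` layout is taken from the directory tree earlier in the thread, and the `{project}-{domain}` namespace naming is assumed and should be checked against the deployment's actual namespace mapping.

```python
from pathlib import Path

DOMAINS = ("development", "staging", "production")


def namespace_for(project: str, domain: str) -> str:
    # Assumption: namespaces are named "{project}-{domain}".
    return f"{project}-{domain}"


def sync_commands(project: str, repo_root: Path) -> list[list[str]]:
    """Build one `kubectl apply` command per domain that has an rsrcs/ directory."""
    commands = []
    for domain in DOMAINS:
        manifests = repo_root / "domains" / domain / "rsrcs"
        if not manifests.is_dir():
            continue  # no manifests checked in for this domain
        commands.append(
            [
                "kubectl", "apply",
                "--namespace", namespace_for(project, domain),
                "-f", str(manifests),
            ]
        )
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in sync_commands("my-project", Path(".")):
        print(" ".join(cmd))
```

Printing the commands keeps the sketch side-effect free; passing each list to `subprocess.run(cmd, check=True)` would actually apply the manifests.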
> some users use the cluster resource controller to create k8s resources on the cluster.
That's not an obvious one to me. I've seen how using that mechanism can allow us to set certain per-task or per-project settings easily (esp. around resources), but I didn't notice anywhere I could define the resources pods need to reference (service accounts, secrets, etc).
> with respect to the initial question btw, I think lots of people do do one repo = one flyte project.
Great, that hints that our last proposed example may be the one that matches Flyte's usage model the best.
t
Can you be more specific about where that would help out for managing per-project resources? I want to be sure I'm thinking about it correctly. I think that will allow me to define a series of template resources to be loaded into each project+domain namespace on creation. If so, for resources that are immutable-after-initialization and the same across all projects+domains, that makes total sense. We don't happen to have any of those ATM, but still interesting.
y
these are customizable by project/domain though
t
ah, I can see that - it has per-domain values that are also specified in that chart. That might be helpful for resources that are pretty similar between projects and only need a few variables customized. What happens when you add/remove/change one of these templates or values? Does it sync resources in the various project+domain namespaces?
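To make the "template plus per-domain values" idea concrete, here is a minimal Python sketch of that kind of rendering. It is not Flyte's implementation: the `{{ name }}` placeholder syntax, the `serviceAccountName` variable, the example values, and the `{project}-{domain}` namespace naming are all assumptions made for illustration.

```python
import re

# Hypothetical template, shared by every project+domain namespace.
TEMPLATE = """\
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ serviceAccountName }}
  namespace: {{ namespace }}
"""

# Hypothetical per-domain values, analogous to values kept in a chart.
DOMAIN_VALUES = {
    "development": {"serviceAccountName": "workflows-dev"},
    "production": {"serviceAccountName": "workflows-prod"},
}

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, project: str, domain: str) -> str:
    """Fill the template with the namespace and the domain's custom values."""
    values = {"namespace": f"{project}-{domain}", **DOMAIN_VALUES.get(domain, {})}

    def substitute(match: re.Match) -> str:
        # A missing value raises KeyError rather than rendering a broken manifest.
        return values[match.group(1)]

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    print(render(TEMPLATE, "etl", "production"))
```

Re-rendering every project+domain pair after a template or value changes, and applying the result, is the kind of sync being asked about here.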
y
it does.
sorry was slammed today.
maybe samhita can give you the details?
if not i'll respond tomorrow, so sorry
s
@Terence Kent you just need to update the helm chart and it should sync resources. Have you tried that?
t
@Samhita Alla - no, haven't used that feature at all, actually. Was just curious about it since @Yee had brought it up.
Trying to sum up this thread (so it doesn't end up too off-topic):
1. The general convention is a single client-side project (created with `pyflyte`) per server-side project. That means it's probably going to be the use case considered as the project evolves.
2. Managing boilerplate resources in each project+domain namespace is best done with the `clusterresource-template` feature (this guy). That's very helpful!
The only leftover thing is the language to disambiguate between the two things both called a "flyte project".