# ask-the-community
r
Hey! My team is about to start using Flyte, but we are stuck interpreting some parts of its terminology and features. Please see my questions below.
My first question is about the hierarchy that Flyte uses to structure workflows in separate spaces. I'm trying to understand what `domains` are intended for. I guess it is an abstract term and I can use it for whatever I want. However, all the examples I have seen so far use `domains` to separate environments (e.g. `dev`, `staging`, `prod`). If that is the main purpose, then I'm confused: I really don't see the point of Flyte taking responsibility for managing these environments. I would rather use a CI/CD tool to deploy the workflows to different Kubernetes environments that we maintain. I would like to know if I'm missing something here.
My second question partly derives from the first. Because of the large number of ML dependencies that our modules depend on, I'm trying to use `reference workflows` to decouple some of them, which would let us build smaller Docker images. It seems to be the perfect feature for this; however, defining a reference workflow requires specifying the target workflow's domain. This confuses me because tasks and workflows become tied to a domain only when they are registered. For reference workflows I have to pass the domain in the workflow definition, polluting our source code with information that belongs to an abstraction level that is not even present at that point. I would like to know if there is a way to register reference workflows in the same transparent way as tasks and workflows, i.e. without having to pass the domain in the workflow definition, only at registration time. (Of course, in that case no actual registration would take place; it would only verify that the target workflow exists in the specified domain.) Thanks in advance!
k
Hi Richard, thank you for joining the community and trying out Flyte. My name is Ketan and I created Flyte at Lyft. Domains are optional: you can use just one domain. The same goes for projects. These are logical groupings. The reasons Flyte has them: ML/data products need a lot of iteration, and they need different hardware and frameworks. You want to use a common set of infra/resources across teams and experiments, and maintaining entire separate Flyte clusters can become challenging and does not allow for sharing things like reference tasks, cache, etc. Thus Flyte provides projects, and within each project a clear separation between development and production workloads, all without having to change your endpoint. It is also possible to set up Flyte in multi-cluster mode, where each project or domain can be a separate cluster. This saves users from having to understand where to do what, gives them one endpoint/portal to look at, and makes the infra simpler to manage.
m
I'm also interested in this conversation. What do you mean by domains being optional? As far as I know, there has to be at least one "default" domain explicitly set when you're building up your environment. My other concern with the above is that production and development data are stored in the same bucket. At least I haven't stumbled upon a configuration that separates them. This can raise security concerns for most teams. In this one-cluster setup I can see problems when you're trying to use `reference_task` and `reference_workflow`, since you explicitly have to set the project, domain, etc. in the source of the higher-level package you're creating. To me this is problematic because the referenced workflow or task is not specified at deployment/runtime but at packaging time (a kind of compilation time). Do I also see this wrong?
k
You can change the setup so that production uses a different bucket. Buckets for data can be set per project/domain/launch plan.
And then IAM roles.
Yes, to set the project and domain at compilation time you can use env vars or get the current domain from the context.
Let me share some examples later. We were in fact trying to add this to the productionization docs.
And yes, one domain is the default, but you can have just one.
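As a rough illustration of the env-var approach mentioned above, here is a minimal sketch that resolves the project and domain from the environment at packaging time. The variable names `FLYTE_REF_PROJECT` and `FLYTE_REF_DOMAIN`, the helper `reference_coordinates`, and the fallback values are all hypothetical, chosen only for this example; they are not Flyte settings.

```python
import os
from typing import Dict

# Hypothetical variable names: use whatever your CI/CD pipeline exports.
PROJECT_ENV = "FLYTE_REF_PROJECT"
DOMAIN_ENV = "FLYTE_REF_DOMAIN"


def reference_coordinates(name: str, version: str) -> Dict[str, str]:
    """Build the keyword arguments that identify a registered entity.

    Project and domain are read from the environment when the code is
    packaged, with fallbacks (assumed values) so local runs still work.
    """
    return {
        "project": os.environ.get(PROJECT_ENV, "flytesnacks"),
        "domain": os.environ.get(DOMAIN_ENV, "development"),
        "name": name,
        "version": version,
    }


# Placeholder name and version, for illustration only.
coords = reference_coordinates(name="my_module.my_workflow", version="v1")
print(coords)

# With flytekit installed, the dictionary could be unpacked into the decorator:
#   @reference_workflow(**coords)
#   def my_workflow(...): ...
```

The source then contains no hard-coded domain, although the value is still fixed when the package is built rather than when it is registered.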
m
Thanks, looking forward to those examples
k
Hi @Mike Olasz / @Richard Bellon, I know I said I would do it, but I was completely off. I had absolutely no time today to do this, but I will try to get to it soon. The problem is that I am traveling through early next week. If someone else can help, please chime in. cc @Samhita Alla / @Kevin Su / @Eduardo Apolinario (eapolinario)
s
Here's how you can send the domain only at registration time:
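```python
from typing import List

from flytekit import reference_task
from flytekit.types.file import FlyteFile


# The domain is a template placeholder that is filled in at registration time.
@reference_task(
    project="flytesnacks",
    domain="{{ registration.domain }}",
    name="advanced_composition.files.normalize_columns",
    version="ef84Eg5pvPJd3IFtJ-o0Bg==",
)
def normalize_columns(
    csv_url: FlyteFile,
    column_names: List[str],
    columns_to_normalize: List[str],
    output_location: str,
) -> FlyteFile:
    # The body is never executed; the registered task is called instead.
    ...
```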
The following are the commands I ran to package and register the code:
```shell
pyflyte --pkgs productionizing.reference_task package -f
flytectl register files --project flytesnacks --domain development --archive flyte-package.tgz --version v2
```
r
Thank you for your help, much appreciated! So there is a macro we can use to capture the registration domain, which indeed solves our problem. Out of curiosity, can we use macros for the project name and the other params (where it makes sense) as well? Do you have a compiled list of these macros?
s
I think project, domain and version are the available macros.
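To make the macro idea concrete, here is a conceptual sketch of what filling such placeholders at registration time amounts to. It is not Flyte's implementation: the function `fill_registration_placeholders` is hypothetical, and the `{{ registration.project }}` and `{{ registration.version }}` spellings are assumed by analogy with the `{{ registration.domain }}` macro shown earlier in the thread.

```python
import re

# Matches "{{ registration.<field> }}" with optional inner whitespace.
_PLACEHOLDER = re.compile(
    r"\{\{\s*registration\.(project|domain|version)\s*\}\}"
)


def fill_registration_placeholders(
    value: str, project: str, domain: str, version: str
) -> str:
    """Replace registration placeholders in a string with concrete values.

    Strings that contain no placeholder are returned unchanged.
    """
    values = {"project": project, "domain": domain, "version": version}
    return _PLACEHOLDER.sub(lambda match: values[match.group(1)], value)


# Values mirror the flytectl command used above.
resolved = fill_registration_placeholders(
    "{{ registration.domain }}",
    project="flytesnacks",
    domain="development",
    version="v2",
)
print(resolved)
```

The packaged definition carries the placeholder text verbatim, and the concrete value is only chosen by the `--project`, `--domain`, and `--version` flags passed at registration.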