# flyte-support
m
Hi everyone! 👋 I've been testing Flyte for a few weeks using the local sandbox before deploying it to production on EKS. I am working on a template that simplifies creating workflows and registering them in Flyte (I attach the directory structure in the image to give the full context). The file in which the tasks and workflows are defined is `pipelines/flyte.py`. The workflows and tasks depend on custom Python modules (like `steps`) and also read other files, such as configuration templates in `config/project_conf.yml`. I was able to register my workflow and get it to work properly by running `pyflyte register steps pipelines` from the `src` directory. I would like to do this programmatically using `FlyteRemote`. I have been able to register my workflow using `register_workflow(entity, version, project, domain)`, but it then fails to find files (like `config/project_conf.yml` itself). I think it may be related to the root directory it uses by default when registering this way, but I'm not sure how to proceed. The `register_workflow` function is called when running the `main.py` file. Do you know where I am going wrong and how I could fix it? Let me know if further explanation or details are needed to fully understand the issue, and thanks a lot in advance 😄
a
Hey Hugo, welcome! You can set the root for `.register_script` by including `source_path`: https://docs.flyte.org/en/latest/api/flytekit/generated/flytekit.remote.remote.FlyteRemote.html#flytekit.remote.remote.FlyteRemote.register_script From the docs, it seems `copy_all` is deprecated, but you should be able to use something like `fast_package_options= {copy_style=copy}` (reference) to make it go recursively through the contents of the folder.
m
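Awesome David, thanks a lot for the quick response! It worked with this configuration, in case it helps anyone in the same situation:
```python
from flytekit.configuration import Config
from flytekit.remote import FlyteRemote
from flytekit.tools.fast_registration import FastPackageOptions  # assumed import path

from src.pipelines.flyte import inference_workflow

remote = FlyteRemote(config=Config.for_sandbox(), default_project='flytesnacks', default_domain='development')

# source_path="." packages the directory the script is launched from
registered_script = remote.register_script(
    entity=inference_workflow,
    version="1.0.0",
    source_path=".",
    fast_package_options=FastPackageOptions([], copy_style="copy", show_files="show_files")
)
```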
😄
a
awesome, thanks for sharing!
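A sketch of the same registration that does not depend on the directory `main.py` is launched from, since `source_path="."` is resolved against the current working directory. `project_root` and `register_with_all_files` are hypothetical helper names; the `flytekit.tools.fast_registration` import path, the `CopyFileDetection.ALL` value, and the boolean `show_files` flag are assumptions about recent flytekit versions and should be checked against the installed release.

```python
from pathlib import Path


def project_root(anchor_file: str, levels_up: int = 0) -> Path:
    """Return an absolute directory derived from a file path, independent of the CWD.

    levels_up=0 gives the directory containing the file, 1 gives its parent, etc.
    """
    return Path(anchor_file).resolve().parents[levels_up]


def register_with_all_files(remote, workflow, version: str, root: Path):
    """Register `workflow`, packaging everything under `root` (hypothetical helper)."""
    # Imported lazily so the path helper above works without flytekit installed.
    # Assumption: these names exist at this path in the installed flytekit version.
    from flytekit.tools.fast_registration import CopyFileDetection, FastPackageOptions

    options = FastPackageOptions(
        ignores=[],
        copy_style=CopyFileDetection.ALL,  # assumption: enum member meaning "copy every file"
        show_files=True,  # assumption: boolean flag that lists the packaged files
    )
    return remote.register_script(
        entity=workflow,
        version=version,
        source_path=str(root),
        fast_package_options=options,
    )
```

Calling `register_with_all_files(remote, inference_workflow, "1.0.0", project_root(__file__))` from `main.py` would then package the directory that contains `main.py`, whatever directory the script is started from.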
m
Hello again! 👋 Along the same lines, I am now trying to register a launch plan using the Flyte `remote` module. For this, I am using this code:
```python
from flytekit import CronSchedule, LaunchPlan
from flytekit.configuration import Config
from flytekit.remote import FlyteRemote

from src.pipelines.flyte import inference_workflow

remote = FlyteRemote(config=Config.for_sandbox(), default_project='flytesnacks', default_domain='development')

cron_lp_every_five_minutes = LaunchPlan.get_or_create(
    name="scheduled_lp",
    workflow=inference_workflow,
    schedule=CronSchedule(schedule="*/5 * * * *"),  # every 5 minutes
)

registered_launchplan = remote.register_launch_plan(
    entity=cron_lp_every_five_minutes,
    version="4.0.0"
)
```
As far as I have seen in the documentation, it seems to be correct, but I get an error when registering (even including the `name` field):
```
Users/me/python3.11/site-packages/flytekit/core/tracker.py:337 in _task_module_from_callable
AttributeError: 'LaunchPlan' object has no attribute '__name__'
```
I have tried several configurations but I can't find the problem. Any suggestions I could try? Thanks again in advance!
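The error message itself can be reproduced without Flyte, which may help when reading the traceback: functions carry a `__name__` attribute, while instances of ordinary classes do not. The sketch below uses a hypothetical `FakeLaunchPlan` class and a hypothetical `module_and_name` helper as stand-ins. It is not flytekit's tracker code, only an illustration of the failure mode the traceback points to.

```python
def module_and_name(obj):
    """Hypothetical stand-in for code that expects a plain function."""
    return obj.__module__, obj.__name__


class FakeLaunchPlan:
    """Hypothetical stand-in for an object that is not a function."""

    def __init__(self, name: str):
        self.name = name  # an instance attribute called `name`, not `__name__`


def some_task():
    """A plain function: it has both __module__ and __name__."""


# Works: functions expose __name__.
print(module_and_name(some_task)[1])

# Fails: instances have no __name__, even if they have a `name` attribute.
try:
    module_and_name(FakeLaunchPlan("scheduled_lp"))
except AttributeError as err:
    print(f"AttributeError: {err}")
```

This would explain why passing the `name` field made no difference: the lookup is for the `__name__` attribute of the object, not for the launch plan's name.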
a
@mammoth-parrot-74806 you seem to be hitting this issue: https://github.com/flyteorg/flyte/issues/6062
m
Exactly @average-finland-92144, my case is the same as in the last message of the issue: a workflow previously registered with the earlier code (`register_script`), to which I want to attach a new launch plan in order to schedule it when I need it. It seems that by default `register_launch_plan` tries to re-register my workflow, whereas the goal is to modify the existing one. Despite using the same `name` and `version` as the already deployed workflow, the error persists. Is there a solution or alternative for scheduling existing workflows, or is it an ongoing issue?
Okay, I just saw that the PR was merged last week, and updating `flytekit` was enough. I've managed to register the launch plan this way:
```python
cron_lp_every_five_minutes = LaunchPlan.get_or_create(
    name="scheduled_lp",
    workflow=inference_workflow,
    schedule=CronSchedule(schedule="*/5 * * * *"),  # every 5 minutes
    default_inputs={}
)
remote.register_launch_plan(entity=cron_lp_every_five_minutes, version="4.0.0")
```
The key thing was to use the same version for the `LaunchPlan` as the one used when registering the `Workflow` with the `register_script` function. Just in case it is useful for anyone else 🙌 Thanks again for your help David!
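One way to keep the two versions from drifting apart is to pass a single version value to both calls. The sketch below assumes only the two `FlyteRemote` methods already used in this thread (`register_script` and `register_launch_plan`) with the keyword arguments shown above; `register_workflow_and_schedule` is a hypothetical helper name.

```python
def register_workflow_and_schedule(
    remote, workflow, launch_plan, version, source_path=".", **script_kwargs
):
    """Register a workflow and its launch plan under one shared version.

    `remote` is expected to behave like a FlyteRemote; extra keyword
    arguments (e.g. fast_package_options) are forwarded to register_script.
    """
    registered_workflow = remote.register_script(
        entity=workflow,
        version=version,
        source_path=source_path,
        **script_kwargs,
    )
    # Same version as the workflow, which is what made registration succeed above.
    registered_launch_plan = remote.register_launch_plan(
        entity=launch_plan,
        version=version,
    )
    return registered_workflow, registered_launch_plan
```

With the objects from the messages above, this would be called as `register_workflow_and_schedule(remote, inference_workflow, cron_lp_every_five_minutes, version="4.0.0")`.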
Hi again! After testing Flyte's behaviour locally, I've decided to deploy it to my EKS cluster in AWS. I'm following this guide to deploy in Single Cluster (simple) mode, but I have a few questions about the docs:
1. We already have an existing EKS cluster, so I can skip the first steps that create the role for the cluster and pods.
2. I created an S3 bucket for metadata, let's call it `flyte-metadata-bucket`.
3. A new policy must be created to allow access to S3, as indicated here. Should the allowed S3 bucket be `flyte-metadata-bucket`, or a new one that must be created to store the outputs of my workflows?
4. Regarding the roles, for both `flyte-system-role` and `flyte-workers-role`: apart from the `sts:AssumeRoleWithWebIdentity` permissions, should the policy created in step 3 also be attached to both, or, since the bucket holds metadata, is it only needed on `flyte-system-role`?
5. Finally, regarding the Helm chart: is it in charge of creating the aforementioned Service Accounts for both `flyte-system-role` and `flyte-workers-role`, or should I create them myself? It is not clear to me, because the documentation says Helm will take care of it, but the commands mentioned there also create the Service Account 🤔

I am creating every resource with Terraform. I saw the tf template for the deployment, but it only exists for the `flyte-core` deployment, not for the `flyte-binary` one, and a few things such as naming and required resources differ between the guide and the template's code (even compared with the code under the `flyte-binary` tf template in this branch). Thanks a lot in advance, step by step I am getting closer to having my first ML pipeline running in production! 😄
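For reference while comparing the guide with the Terraform code, here is a sketch of the two IAM documents the questions above revolve around: a policy scoped to one bucket and a trust policy that allows `sts:AssumeRoleWithWebIdentity` from a Kubernetes service account. The helper names, the account ID, the OIDC provider string, the namespace, and the service account name are all hypothetical placeholders, and the list of S3 actions is an assumption rather than the exact list from the Flyte guide. It does not settle whether one bucket or two is needed, nor which role needs the policy.

```python
import json


def s3_bucket_policy(bucket: str) -> dict:
    """Policy allowing list/read/write/delete on a single bucket (assumed action set)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


def irsa_trust_policy(
    account_id: str, oidc_provider: str, namespace: str, service_account: str
) -> dict:
    """Trust policy letting one Kubernetes service account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(json.dumps(s3_bucket_policy("flyte-metadata-bucket"), indent=2))
```

The resulting dictionaries can be dumped to JSON and compared with what the Terraform template renders, or fed to a Terraform `aws_iam_policy` resource as a JSON string.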