mammoth-parrot-74806
01/16/2025, 2:55 PM
My workflows and tasks are defined in pipelines/flyte.py. They depend on custom Python modules (like steps) as well as on other files, such as the configuration template in config/project_conf.yml.
I was able to register my workflow and get it working properly by running, from the src directory: pyflyte register steps pipelines.
I would like to do this programmatically using FlyteRemote. I have been able to register my workflow using register_workflow(entity, version, project, domain), but it then complains about missing files (like config/project_conf.yml itself). I think it may be related to the root directory it uses by default when registering this way, but I'm not sure how to proceed. The register_workflow call happens when running the main.py file. Do you know where I am going wrong and how I could fix it? Let me know if further explanation or details are needed to fully understand the issue, and thanks a lot in advance 😄
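For reference, this is roughly how the call is made from main.py (a simplified sketch; the version string is just an example):

    from flytekit.remote import FlyteRemote
    from flytekit.configuration import Config
    from src.pipelines.flyte import inference_workflow

    remote = FlyteRemote(
        config=Config.for_sandbox(),
        default_project="flytesnacks",
        default_domain="development",
    )

    # Registers the workflow object, but the packaged code root does not seem to
    # include sibling files such as config/project_conf.yml, so tasks fail to find them.
    remote.register_workflow(entity=inference_workflow, version="1.0.0")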
average-finland-92144
01/16/2025, 3:06 PM
You can use .register_script and set the root by including source_path: https://docs.flyte.org/en/latest/api/flytekit/generated/flytekit.remote.remote.FlyteRemote.html#flytekit.remote.remote.FlyteRemote.register_script
From the docs, it seems like copy_all is deprecated, but you should be able to use something like `fast_package_options={copy_style=copy}` (reference) to make it go recursively through the contents of the folder.
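If your flytekit version exposes the CopyFileDetection enum, my understanding is that the option can be constructed roughly like this (unverified sketch; import paths may vary between releases):

    # Unverified sketch: copy everything under source_path into the
    # fast-registration package, not only the Python modules that get imported.
    from flytekit.constants import CopyFileDetection
    from flytekit.tools.fast_registration import FastPackageOptions

    options = FastPackageOptions(
        ignores=[],
        copy_style=CopyFileDetection.ALL,
    )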
mammoth-parrot-74806
01/16/2025, 4:25 PM
from flytekit.remote import FlyteRemote
from flytekit.configuration import Config
from flytekit.tools.fast_registration import FastPackageOptions
from src.pipelines.flyte import inference_workflow

remote = FlyteRemote(config=Config.for_sandbox(), default_project='flytesnacks', default_domain='development')

registered_script = remote.register_script(
    entity=inference_workflow,
    version="1.0.0",
    source_path=".",
    fast_package_options=FastPackageOptions([], copy_style="copy", show_files=True),
)
😄
average-finland-92144
01/16/2025, 4:43 PM
mammoth-parrot-74806
01/17/2025, 10:07 AM
Now I am trying to register a scheduled LaunchPlan programmatically with the remote module. For this, I am using this code:
from flytekit import LaunchPlan, CronSchedule
from src.pipelines.flyte import inference_workflow
from flytekit.remote import FlyteRemote
from flytekit.configuration import Config

remote = FlyteRemote(config=Config.for_sandbox(), default_project='flytesnacks', default_domain='development')

cron_lp_every_five_minutes = LaunchPlan.get_or_create(
    name="scheduled_lp",
    workflow=inference_workflow,
    schedule=CronSchedule(schedule="*/5 * * * *"),  # every 5 minutes
)

registered_launchplan = remote.register_launch_plan(
    entity=cron_lp_every_five_minutes,
    version="4.0.0"
)
As far as I have seen in the documentation this seems correct, but I get an error when registering (even when including the name field):
Users/me/python3.11/site-packages/flytekit/core/tracker.py:337 in _task_module_from_callable
AttributeError: 'LaunchPlan' object has no attribute '__name__'
I have tried several configurations but I can't find the problem. Any suggestions I could try?
Thanks again in advance!
average-finland-92144
01/17/2025, 7:39 PM
mammoth-parrot-74806
01/20/2025, 9:30 AM
I already have a workflow registered (via register_script) to which I want to assign a new LaunchPlan so I can schedule it when I need to. It seems that by default register_launch_plan tries to re-register my workflow, but that is not the goal; the goal is to modify an existing one. Despite using the same name and version as the already deployed workflow, the error persists.
Is there a solution or alternative for scheduling existing workflows, or is this an ongoing issue?
mammoth-parrot-74806
01/20/2025, 10:08 AM
flytekit. I've managed to register the launch plan this way:
cron_lp_every_five_minutes = LaunchPlan.get_or_create(
    name="scheduled_lp",
    workflow=inference_workflow,
    schedule=CronSchedule(schedule="*/5 * * * *"),  # every 5 minutes
    default_inputs={}
)
remote.register_launch_plan(entity=cron_lp_every_five_minutes, version="4.0.0")
The key thing was to use the same version for the LaunchPlan as the one used when registering the Workflow with the register_script function. Just in case it is useful for anyone else 🙌
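Putting the two registration steps together, the pattern that ended up working for me looks roughly like this (simplified sketch, assuming the same imports and remote object as in the snippets above):

    # Same version string for both the workflow and the launch plan.
    VERSION = "4.0.0"

    remote.register_script(
        entity=inference_workflow,
        version=VERSION,
        source_path=".",
        fast_package_options=FastPackageOptions([], copy_style="copy"),
    )

    cron_lp_every_five_minutes = LaunchPlan.get_or_create(
        name="scheduled_lp",
        workflow=inference_workflow,
        schedule=CronSchedule(schedule="*/5 * * * *"),
        default_inputs={},
    )
    remote.register_launch_plan(entity=cron_lp_every_five_minutes, version=VERSION)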
Thanks again for your help David!
mammoth-parrot-74806
01/22/2025, 12:10 PM
flyte-metadata-bucket.
3. A new policy must be created to be able to access S3, as indicated here. Should the allowed S3 bucket be the flyte-metadata-bucket, or a new one that must be created to store the outputs from my workflows?
4. Regarding the Roles, for both the flyte-system-role and the flyte-workers-role, apart from the sts:AssumeRoleWithWebIdentity permissions, should the policy created in 3. also be attached to both, or, since the bucket is related to metadata, is it only needed on the flyte-system-role?
5. Finally, regarding the Helm chart, is it in charge of creating the aforementioned Service Accounts for both the flyte-system-role and the flyte-workers-role, or should I create them myself? I am not clear on this, as the documentation says that Helm will take care of it, but the commands mentioned there also create the Service Account 🤔
I am creating every resource with Terraform. I saw the tf template for the deployment, but it only exists for the flyte-core deployment, not for the flyte-binary one, and a few things like naming and required resources differ between the guide and the template's code (even when comparing it with the code under the flyte-binary tf template in this branch).
Thanks a lot in advance, step by step I am getting closer to having my first ML pipeline running in production! 😄