hi all :wave:, I just wanted to ping the community...
# announcements
n
hi all 👋, I just wanted to ping the community to ask a quick question 🤔: Say you have a workflow that uses a trained model to generate predictions on a scheduled launchplan. The question is, how do you typically want to get features for that prediction? I.e. when the scheduled workflow kicks off, where are you reading those features from? Do you need the kick-off time as a parameter to fetching data from, say, an s3 bucket or DB?
k
What I have seen in the past is a query against a metastore or a data warehouse
or if you are using something like Firehose, then data comes partitioned by timestamps
n
Do you need the kick-off time as a parameter to fetching data from, say, an s3 bucket or DB?
Cool, so that’s a “yes” to this question
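A minimal sketch of what "kick-off time as a parameter" can look like when the data is partitioned by timestamp. The bucket name, the `features/` prefix, and the hourly `YYYY/MM/DD/HH` layout are assumptions for illustration, not something stated in this thread.

```python
from datetime import datetime, timezone


def feature_partition_prefix(bucket: str, kickoff_time: datetime) -> str:
    """Return the (hypothetical) S3 prefix holding features for the kick-off hour."""
    if kickoff_time.tzinfo is None:
        # Refuse naive datetimes so the partition never depends on the local timezone.
        raise ValueError("kickoff_time must be timezone-aware")
    # Normalize to UTC so the same instant always maps to the same partition.
    t = kickoff_time.astimezone(timezone.utc)
    return f"s3://{bucket}/features/{t:%Y/%m/%d/%H}/"


prefix = feature_partition_prefix(
    "my-bucket", datetime(2024, 3, 5, 14, 30, tzinfo=timezone.utc)
)
```

The scheduled workflow would then read only the objects under `prefix`, so a rerun with the same kick-off time reads the same data.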
r
Yep agreed that a date parameter is important. Whether we're training a model or batch inference, we'll always do it 'as of' some date. Something like Airflow's logical_date would be great.
s
Old thread but I want to chime in and say "yes absolutely". Tangential to this is being able to re-run a failed workflow at the originally scheduled time. I have not found a way of doing it but this is a crucial feature for e.g. backfilling jobs.
k
@Sebastian this is something we are thinking of adding. Today you should be able to run it, as running is simply an ad hoc execution
Also, does the rerun not work? Or recover?
s
How would you expose a timestamp and at the same time have it work with scheduling? If you set the timestamp parameter to datetime.now() in your launch plan, it is evaluated at build time, not execution time.
k
No, Flyte allows the timestamp to be variable for scheduled workflows, right?
It’s called kickoff time input arg - you have to explicitly bind it - https://docs.flyte.org/projects/cookbook/en/stable/auto/core/scheduled_workflows/lp_schedules.html
s
That has not been my experience
k
Would love to understand
s
Thank you for the link. So kickoff_time is a special arg which gets supplied by the scheduler? Is it exposed through the web UI as well?
k
Yes
You can call it whatever
In Flyte everything has inputs. All inputs are exposed in the UI
To schedule, you can only have one variable input; all others need to be fixed. This is why launchplans exist
You can fix all other inputs and tell the launch plan which input the time value should be sent in; in this case it is kickoff_time
Does that help - cc @Samhita Alla maybe we have a better doc here?
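A minimal sketch of the setup described above, assuming flytekit's `LaunchPlan.get_or_create` and `CronSchedule` as used on the linked docs page. The workflow, its `model_version` input, the cron expression, and the launch plan name are hypothetical; only `kickoff_time_input_arg` comes from this thread.

```python
from datetime import datetime

from flytekit import CronSchedule, LaunchPlan, task, workflow


@task
def predict(kickoff_time: datetime, model_version: str) -> str:
    # Placeholder body: a real task would load features "as of" kickoff_time.
    return f"{model_version} predictions as of {kickoff_time.isoformat()}"


@workflow
def batch_inference_wf(kickoff_time: datetime, model_version: str) -> str:
    return predict(kickoff_time=kickoff_time, model_version=model_version)


scheduled_lp = LaunchPlan.get_or_create(
    workflow=batch_inference_wf,
    name="batch_inference_hourly",  # hypothetical name
    # Every input except the time is fixed on the launch plan.
    fixed_inputs={"model_version": "v1"},
    schedule=CronSchedule(
        schedule="0 * * * *",  # hypothetical: top of every hour
        # The scheduler binds the scheduled time to this workflow input.
        kickoff_time_input_arg="kickoff_time",
    ),
)
```

The input name is free to choose, as long as the string passed to `kickoff_time_input_arg` matches a `datetime` input of the workflow.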
s
Thank you very much. So to clarify the error I made: I had in the launch plan specified something like
default_inputs={"execution_time": datetime.now()}
, which is evaluated at build time
k
Sorry for confusion @Sebastian
Aah ya, that is fixing the time to when you build
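The difference can be shown in plain Python, without Flyte. This is only an illustration of build-time versus run-time evaluation; the function names are made up and the explicit `kickoff_time` argument stands in for the value the scheduler would supply.

```python
from datetime import datetime, timedelta, timezone

# Evaluated once, when this module is loaded (the "build time").
FROZEN_DEFAULTS = {"execution_time": datetime.now(timezone.utc)}


def run_with_frozen_default() -> datetime:
    """Every run sees the same timestamp that was captured at load time."""
    return FROZEN_DEFAULTS["execution_time"]


def run_with_kickoff_time(kickoff_time: datetime) -> datetime:
    """The caller (standing in for the scheduler) supplies the time per run."""
    return kickoff_time


# Simulate two scheduled runs one hour apart.
first = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
second = first + timedelta(hours=1)

frozen_runs = [run_with_frozen_default(), run_with_frozen_default()]
scheduled_runs = [run_with_kickoff_time(first), run_with_kickoff_time(second)]
```

Both entries in `frozen_runs` are identical no matter when the runs happen, while `scheduled_runs` tracks the time each run was scheduled for.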
s
But if I understand you correctly i can remove that line and add
kickoff_time_input_arg="execution_time",
and it should be fine
k
Sadly this will be allowed, as Flyte thinks you want the build time as a constant arg
Correct
s
Yeah that makes sense in hindsight and was an error on my side
Thank you for your help, this has been very valuable!
k
No, docs should cater to avoiding confusion. Please recommend an edit
We are always here
s
If you think other people will make the same mistake, it could be worth adding to the docs page something like "the scheduler specification and its arguments are evaluated when the Flyte resources are compiled, so something like
default_inputs={"kickoff_time": datetime.now()}
won't get the scheduled time but the build time. That's why we have
kickoff_time_input_arg="kickoff_time"
..."
k
Good idea