prehistoric-mechanic-34647
08/11/2022, 10:36 PM
Each step of my ML pipeline lives in its own .py file (e.g. download_data.py, preprocess_data.py, train_model.py, eval_model.py, etc.).
Currently I have wrangled these scattered .py scripts into somewhat of a workflow using a Makefile, such that each step in the pipeline can be executed through a make command (e.g. make download_data, make preprocess_data, etc.). Each Makefile target calls a .sh shell script that executes the .py file for that step. The command make run_entire_pipeline calls each of the ~7 steps in sequence, as a rudimentary (linear) DAG.
Obviously this rough pipeline misses a few benefits, such as caching earlier steps so they don't need to be re-executed if they've already run (e.g. no need to re-download data on a subsequent training run if the data was already downloaded on an earlier run and hasn't changed).
What is the best way to migrate this Make-based workflow into a Flyte-based workflow? Specifically, is there a way to map each .py script to a @task when building a @workflow pipeline in Flyte? I learned about Flyte's "Script mode", and it sounds somewhat akin to what I'm trying to do, but I'm totally new to Flyte. Thanks for any help and direction.
I'm working with very large digital pathology whole slide image (WSI) files, BTW. Does Flyte support inputs of the WSI variety, i.e. .mrxs, .tiff, .czi, .jpeg, .png, etc.?
freezing-airport-6809
Re the caching point: the benefits you get are failure tolerance, distributed execution, caching, and isolation. Today, with Make, you do not get the benefit of re-using data that has already been downloaded.
Re mapping each .py script to a @task: there are 2 ways.
1. Use ShellTask to model what you have today, with a little more data passing, and thus model it as a Flyte workflow.
2. Or update your scripts so that each one becomes a @task function:
@task
def foo(...):
    ...  # the script's top-level code moves in here
3. You can also mix and match.
The workflow can also be constructed either using the imperative model or using the @workflow syntax/DSL.
Note: you can of course mix and match and migrate slowly if you want. Ideally, migrate to the @task syntax, since your steps are already Python.
Re WSI inputs: any type of file can be handled using FlyteFile. It will automatically upload and download files to S3/GCS, etc. Example: "Working With Files" in the docs.
prehistoric-mechanic-34647
08/12/2022, 2:57 PM
Re the ShellTask object: from the ShellTask docs it looks like the user would have to refactor the shell script to explicitly specify the script's inputs and outputs, specifically using the syntax {inputs.input_name} and {outputs.output_name}.
Obviously we could do this manually for each input and output. But what if you're passing in an entire dictionary of config params (using something like Hydra's ConfigDict object)? Could you simply pass in all those numerous hyperparameters at once using {inputs.myHydraConfigObject}, rather than spelling each one out like {inputs.hydra_object.learning_rate}, {inputs.hydra_object.num_epochs}, etc.?
freezing-airport-6809
late-pencil-3873
08/15/2022, 7:07 AM
tall-lock-23197
prehistoric-mechanic-34647
08/16/2022, 10:02 PM
hydra-core with flyte
freezing-airport-6809
pyflyte run