Taylor Stout 02/16/2023, 3:06 PM
Dan Rammer (hamersaw) 02/16/2023, 3:14 PM
Taylor Stout 02/16/2023, 3:38 PM
Dan Rammer (hamersaw) 02/16/2023, 4:44 PM
Taylor Stout 02/16/2023, 4:49 PM
Dan Rammer (hamersaw) 02/16/2023, 7:22 PM
With fast registration, Flyte packages the Python code and writes it to the blobstore. When a task is executed, it first has to download the code from the blobstore and decompress it, and only then can it run. The alternative approach is to build a specific image that already contains the code. It is explained in more depth here. This way, executing a task just starts a container and runs the Python function (without needing to download the code from the blobstore first). This is the preferred approach for productionized workflows because of the performance. I don't think it will have a significant impact on the task execution time, but it may be worth trying. If blobstore access is very slow, fast registration will add more overhead.
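To make the extra steps concrete, here is a rough stdlib-only sketch of the package, download, decompress, execute flow described above. It is not how Flyte implements it: a local directory stands in for the blobstore, and `package_code`, `run_task`, `tasks.py`, and `double` are hypothetical names made up for the illustration.

```python
import importlib.util
import shutil
import tempfile
import time
from pathlib import Path


def package_code(source_dir, blobstore):
    """Compress the source tree and 'upload' it.

    Assumption: a local directory plays the role of the blobstore.
    """
    base_name = str(Path(blobstore) / "code")
    return Path(shutil.make_archive(base_name, "zip", root_dir=str(source_dir)))


def run_task(archive, workdir, module, function, *args):
    """Download, decompress, import, and call the function.

    Returns (result, overhead_seconds), where the overhead covers only the
    download and decompress steps that a pre-built image would skip.
    """
    start = time.perf_counter()
    local_copy = Path(workdir) / Path(archive).name
    shutil.copy(str(archive), str(local_copy))            # stands in for the download
    shutil.unpack_archive(str(local_copy), str(workdir))  # decompress
    overhead = time.perf_counter() - start

    # Load the unpacked module from its file path and run the task function.
    spec = importlib.util.spec_from_file_location(module, Path(workdir) / f"{module}.py")
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return getattr(mod, function)(*args), overhead


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, store, work = Path(tmp) / "src", Path(tmp) / "store", Path(tmp) / "work"
        for directory in (src, store, work):
            directory.mkdir()
        (src / "tasks.py").write_text("def double(x):\n    return 2 * x\n")

        archive = package_code(src, store)
        result, overhead = run_task(archive, work, "tasks", "double", 21)
        print(f"result={result}, download+decompress overhead={overhead:.4f}s")
```

With a pre-built image, everything timed as `overhead` disappears, which is why the gap only matters when the blobstore is slow or the code archive is large.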
pyflyte run --remote
set that as part of the environment variable in the @task decorator itself.
Taylor Stout 02/16/2023, 7:55 PM
Automatically registering file:// as file with <flytekit.types.structured.basic_dfs.PandasToParquetEncodingHandler object at 0x7f6a5a8593f0>
Registered <flytekit.types.structured.basic_dfs.PandasToParquetEncodingHandler object at 0x7f6a5a8593f0> as handler for <class 'pandas.core.frame.DataFrame'>, protocol gs, fmt parquet
Automatically registering file:// as file with <flytekit.types.structured.basic_dfs.ParquetToPandasDecodingHandler object at 0x7f6a5a7eb670>
Taylor Stout 02/16/2023, 8:00 PM
Taylor Stout 02/17/2023, 2:48 PM