Taylor Stout
02/16/2023, 3:06 PM
Dan Rammer (hamersaw)
02/16/2023, 3:14 PM
Dan Rammer (hamersaw)
02/16/2023, 3:15 PM
Dan Rammer (hamersaw)
02/16/2023, 3:16 PM
Dan Rammer (hamersaw)
02/16/2023, 3:17 PM
Dan Rammer (hamersaw)
02/16/2023, 3:18 PM
Taylor Stout
02/16/2023, 3:38 PM
Dan Rammer (hamersaw)
02/16/2023, 4:44 PM
Taylor Stout
02/16/2023, 4:49 PM
Dan Rammer (hamersaw)
02/16/2023, 7:22 PM
With pyflyte run --remote, Flyte packages the Python code and writes it to the blobstore. When a task executes, it first has to download the code from the blobstore and decompress it, and only then can it run.
The alternative approach is to build a specific image that already contains the code. It is explained in more depth here. This way, executing a task just starts a container and runs the Python function (without needing to download the code from the blobstore first). This is the preferred approach for productionized workflows because of the performance.
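To make the extra steps in the fast-registration path concrete, here is a rough standard-library sketch of package, upload, download, decompress, then run. It is not Flyte's implementation: the local `blobstore` directory stands in for a real object store, and `package_code`, `fetch_and_run`, `demo`, and the `my_tasks` module are all hypothetical names made up for the illustration.

```python
import importlib.util
import tarfile
import tempfile
from pathlib import Path


def package_code(source_dir: Path, blobstore_dir: Path) -> Path:
    """Compress the Python sources and 'upload' them (here: a local directory)."""
    archive = blobstore_dir / "code.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for path in source_dir.rglob("*.py"):
            tar.add(path, arcname=str(path.relative_to(source_dir)))
    return archive


def fetch_and_run(archive: Path, workdir: Path, module: str, func: str, *args):
    """'Download' and decompress the archive, then import and call the function."""
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(workdir)  # the per-execution overhead lives here
    spec = importlib.util.spec_from_file_location(module, workdir / f"{module}.py")
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return getattr(mod, func)(*args)


def demo() -> int:
    """Run the whole round trip inside a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src, blob, work = root / "src", root / "blobstore", root / "work"
        for d in (src, blob, work):
            d.mkdir()
        # Hypothetical task code that would normally live in your repo.
        (src / "my_tasks.py").write_text("def double(x):\n    return 2 * x\n")
        archive = package_code(src, blob)
        return fetch_and_run(archive, work, "my_tasks", "double", 21)


if __name__ == "__main__":
    print(demo())
```

With a prebuilt image, everything in `fetch_and_run` before the final function call goes away, since the code is already in the container's filesystem.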
I don't think it will have a significant impact on the task execution time, but it may be worth trying. If blobstore access is very slow, fast registration will result in more overhead.
Yee
FLYTE_SDK_LOGGING_LEVEL=10
set that as an environment variable in the @task decorator itself.
Yee
@task(environment={"FLYTE_SDK_LOGGING_LEVEL": "10"})
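For reference, 10 is the numeric value of DEBUG in Python's standard logging module, which is presumably why that value turns on the verbose output. The sketch below only shows how a numeric level read from an environment variable maps onto a standard logging level. It is not flytekit's own parsing code: `level_from_env` is a hypothetical helper and the WARNING fallback is an assumption.

```python
import logging
import os


def level_from_env(var: str = "FLYTE_SDK_LOGGING_LEVEL",
                   default: int = logging.WARNING) -> int:
    """Read a numeric logging level from the environment (hypothetical helper)."""
    raw = os.environ.get(var)
    # Environment values are always strings, hence "10" rather than 10.
    return int(raw) if raw is not None else default


if __name__ == "__main__":
    os.environ["FLYTE_SDK_LOGGING_LEVEL"] = "10"
    level = level_from_env()
    logging.basicConfig(level=level)
    logging.getLogger("demo").debug("visible because the level is %s",
                                    logging.getLevelName(level))
```

Note that the value in the decorator is the string "10", since environment variables carry strings.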
Yee
Taylor Stout
02/16/2023, 7:55 PM
Automatically registering file:// as file with <flytekit.types.structured.basic_dfs.PandasToParquetEncodingHandler object at 0x7f6a5a8593f0>
Registered <flytekit.types.structured.basic_dfs.PandasToParquetEncodingHandler object at 0x7f6a5a8593f0> as handler for <class 'pandas.core.frame.DataFrame'>, protocol gs, fmt parquet
Automatically registering file:// as file with <flytekit.types.structured.basic_dfs.ParquetToPandasDecodingHandler object at 0x7f6a5a7eb670>
Yee
Yee
Yee
Taylor Stout
02/16/2023, 8:00 PM
Taylor Stout
02/16/2023, 8:01 PM
Yee
Yee
Yee
Yee
Taylor Stout
02/17/2023, 2:48 PM
Taylor Stout
02/17/2023, 2:48 PM
Taylor Stout
02/17/2023, 4:37 PM
Taylor Stout
02/17/2023, 4:38 PM