Jan Fiedler
08/29/2023, 2:25 PM
AZURE_STORAGE_ACCOUNT_NAME, AZURE_STORAGE_ACCOUNT_KEY) in the pods that run my Flyte Tasks. In the future I would like to use workload identities instead of storage account keys, but that's another topic.
My question: Is it even possible to connect to 2 different Azure storage accounts with flytekit / fsspec, given the way Flyte works? What I have in mind is a cluster storage account for Flyte (metadata) and a separate user storage account where I upload and download user data via FlyteFile / FlyteDirectory. Hope this made sense 🙂
Thomas Newton
09/12/2023, 4:54 PM
AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_ACCOUNT_KEY cause varying levels of disruption to other libraries that try to authenticate to Azure. I concluded that I can't accept a solution that sets any pod-level Azure auth, e.g. standard environment variables.
I can think of 2 ways to do this:
1. Use a single credential and fix things so that Azure default credentials work in Flyte.
a. Use your fix from https://github.com/flyteorg/flytekit/pull/1813
b. Fix how paths and fsspec filesystems are managed so that it extracts the storage account name from the path.
c. Use one role based authentication method e.g. service principal or workload identity for everything in the pod.
d. This almost worked, except that the outputPrefix is not a valid Azure fsspec path. It looks like it's {.configmap.remoteData.schema}://{.storage.custom.container}/{.configmap.cor.propeller.metadata-prefix}/{some unique ID}, but a valid path would look something like abfs://<container>@<storage-account>.dfs.core.windows.net/<path-within-container>. I think this is generated somewhere in flytepropeller: https://docs.flyte.org/en/latest/concepts/data_management.html#:~:text=The%20argument%20%2D%2DoutputLocationPrefix%20allows,path%20to%20store%20the%20data.&text=In%20the%20sandbox%2C%20the%20default,root%20of%20the%20local%20bucket https://github.com/flyteorg/flytepropeller/blob/9aec46ddcf1995f41f893938b59e3d29f9494f62/pkg/controller/nodes/task/taskexec_context.go#L211.
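To illustrate 1b and the path problem in 1d, here is a minimal sketch of pulling the container and storage account out of an abfs path. It is not flytekit's or adlfs's actual code; the names parse_abfs_url and AbfsLocation, and the example container and account names, are made up for illustration. It only assumes the two path shapes mentioned above.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class AbfsLocation(NamedTuple):
    """Hypothetical container for the pieces of an abfs path."""
    container: str
    account: Optional[str]  # None when the path does not name a storage account
    path: str


def parse_abfs_url(url: str) -> AbfsLocation:
    """Split an abfs/abfss URL into container, storage account and path.

    Handles two shapes:
      abfs://<container>@<storage-account>.dfs.core.windows.net/<path>
      abfs://<container>/<path>   (no storage account in the path)
    """
    parts = urlsplit(url)
    if parts.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an abfs URL: {url!r}")

    if parts.username:
        # Fully qualified form: the container sits before '@' and the
        # storage account is the first label of the host name.
        container = parts.username
        account = parts.hostname.split(".", 1)[0]
    else:
        # Short form: the "host" position holds the container, so the
        # storage account has to come from somewhere else (e.g. config).
        container = parts.netloc
        account = None

    return AbfsLocation(container, account, parts.path.lstrip("/"))


if __name__ == "__main__":
    print(parse_abfs_url("abfs://userdata@myaccount.dfs.core.windows.net/raw/file.csv"))
    print(parse_abfs_url("abfs://flyte/metadata/propeller/abc123"))
```

With something like this, a filesystem could be picked per storage account instead of relying on one pod-wide account name. The short form shows why the generated outputPrefix is a problem: it carries no account at all.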
2. Add explicit Azure storage configs that don't overlap with Azure defaults.
a. Add an Azure config similar to GCSConfig and S3Config that has authentication attributes.
b. Set the new Flyte-specific Azure config options using the pod template.
c. Tasks are now free to use whatever Azure authentication they like for everything else.
d. For Flyte's authentication we can use any method that does not require pod-level configuration, e.g. pod identity or workload identity.
e. I actually have a working version of this: https://github.com/Tom-Newton/flytekit/commit/f714dbed827ce2f9d9052591bd1dda3429dd8d22. Use the FLYTE_AZURE_ACCOUNT_NAME and FLYTE_AZURE_ACCOUNT_KEY environment variables.
Jan Fiedler
09/12/2023, 5:49 PM