Hey Flyte-on-Azure warriors :wave: I am now at the...
# flyte-deployment
j
Hey Flyte-on-Azure warriors 👋 I am now at the point where I have a healthy Flyte deployment on Azure with ingress and TLS in place. With the custom stow storage configuration for Azure I am also able to run simple workflows. To make it work, I needed to set environment variables (`AZURE_STORAGE_ACCOUNT_NAME`, `AZURE_STORAGE_ACCOUNT_KEY`) in the pods that run my Flyte tasks. In the future I would like to use workload identities instead of storage account keys, but that's another topic. My question: is it even possible to connect to 2 different Azure storage accounts with flytekit / fsspec and the way Flyte works? What I have in mind is a cluster storage account for Flyte (metadata) and a separate user storage account where I upload and download user data via `FlyteFile` / `FlyteDirectory`. Hope this made sense 🙂
t
Did you make any progress on this? I've been working on the same problem. I found that setting `AZURE_STORAGE_ACCOUNT_NAME` and `AZURE_STORAGE_ACCOUNT_KEY` causes varying levels of disruption to other libraries that try to authenticate to Azure. I concluded that I can't accept a solution that sets any pod-level Azure auth, e.g. the standard environment variables. I can think of 2 ways to do this:
1. Use a single credential and fix things so that Azure default credentials work in Flyte.
    a. Use your fix from https://github.com/flyteorg/flytekit/pull/1813
    b. Fix how paths and fsspec filesystems are managed so that the storage account name is extracted from the path.
    c. Use one role-based authentication method, e.g. service principal or workload identity, for everything in the pod.
    d. This almost worked, except that the `outputPrefix` is not a valid Azure fsspec path. It looks like it is `{.configmap.remoteData.schema}://{.storage.custom.container}/{.configmap.core.propeller.metadata-prefix}/{some unique ID}`, but a valid path would look something like `abfs://<container>@<storage-account>.dfs.core.windows.net/<path-within-container>`. I think this is generated somewhere in flytepropeller: https://docs.flyte.org/en/latest/concepts/data_management.html#:~:text=The%20argument%20%2D%2DoutputLocationPrefix%20allows,path%20to%20store%20the%20data.&text=In%20the%20sandbox%2C%20the%20default,root%20of%20the%20local%20bucket https://github.com/flyteorg/flytepropeller/blob/9aec46ddcf1995f41f893938b59e3d29f9494f62/pkg/controller/nodes/task/taskexec_context.go#L211
2. Add explicit Azure storage configs that don't overlap with the Azure defaults.
    a. Add an Azure config similar to `GCSConfig` and `S3Config` that has authentication attributes.
    b. Set the new Flyte-specific Azure config options using the pod template.
    c. Tasks are now free to use whatever Azure authentication they like for everything else.
    d. For Flyte's authentication we can use any method that does not require pod-level configuration, e.g. pod identity or workload identity.
    e. I actually have a working version of this: https://github.com/Tom-Newton/flytekit/commit/f714dbed827ce2f9d9052591bd1dda3429dd8d22. It uses the `FLYTE_AZURE_ACCOUNT_NAME` and `FLYTE_AZURE_ACCOUNT_KEY` environment variables.
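To make option 1b concrete, here is a minimal sketch of pulling the container and storage account name out of a path of the form `abfs://<container>@<storage-account>.dfs.core.windows.net/<path-within-container>`. It uses only the Python standard library; `AbfsLocation` and `parse_abfs_url` are hypothetical names for illustration, not part of flytekit or fsspec, and this is not how flytekit currently handles paths.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class AbfsLocation:
    """Hypothetical holder for the parts of an abfs URL."""
    container: str
    account: str
    path: str


def parse_abfs_url(url: str) -> AbfsLocation:
    """Split abfs://<container>@<account>.dfs.core.windows.net/<path> into its parts."""
    parts = urlsplit(url)
    if parts.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an abfs URL: {url!r}")
    # The container sits in the "user" position and the account is the first host label.
    if parts.username is None or parts.hostname is None:
        raise ValueError(f"missing <container>@<storage-account> in: {url!r}")
    account = parts.hostname.split(".", 1)[0]
    return AbfsLocation(
        container=parts.username,
        account=account,
        path=parts.path.lstrip("/"),
    )
```

A path shaped like the `outputPrefix` described in 1d (scheme, container, then prefix, with no `@<storage-account>` part) is rejected with a `ValueError`, which is the mismatch described above.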
cc @Victor Delépine and maybe @Yee is interested too.
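For option 2, a rough sketch of what a Flyte-specific Azure config could look like, reading only the `FLYTE_AZURE_ACCOUNT_NAME` and `FLYTE_AZURE_ACCOUNT_KEY` variables mentioned above and ignoring the standard `AZURE_STORAGE_*` ones. The class name `AzureBlobStorageConfig` and its methods are assumptions for illustration; the linked commit is the actual working version and may be structured differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AzureBlobStorageConfig:
    """Hypothetical Flyte-only Azure settings, kept separate from the Azure SDK defaults."""
    account_name: Optional[str] = None
    account_key: Optional[str] = None

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "AzureBlobStorageConfig":
        # Only FLYTE_-prefixed variables are read, so other libraries in the
        # pod are not affected by what Flyte uses for its own storage.
        return cls(
            account_name=env.get("FLYTE_AZURE_ACCOUNT_NAME"),
            account_key=env.get("FLYTE_AZURE_ACCOUNT_KEY"),
        )

    def storage_options(self) -> dict:
        """Keyword arguments to pass explicitly to the filesystem; unset values are omitted."""
        opts = {"account_name": self.account_name, "account_key": self.account_key}
        return {k: v for k, v in opts.items() if v is not None}
```

Assuming adlfs is installed, the resulting dict could be passed explicitly, e.g. `fsspec.filesystem("abfs", **AzureBlobStorageConfig.from_env().storage_options())`, so the credentials never need to live in the pod-wide Azure variables.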
j
Definitely sounds like the right approach. I haven't found time to tackle this yet and won't until the middle of next week.
Hey Tom, I am back from vacation. I'm interested in what you think about my comment.