# ask-the-community
Brian Tang
👋 Does Flyte support project/domain-level prefixes for metadata storage, similar to how `workflow-execution-config` allows customizing the `raw_output_data` location? Context: we're trying to upgrade flytekit, and the latest version uses `botocore` to write this metadata whereas the previous version (1.3.2) used the AWS CLI. This is somewhat specific to us, but the way our S3 libraries are patched means that usages of `botocore` carry a user identity, while the AWS CLI executes under the machine identity. Since this metadata prefix (`metadata/propeller`) seems to be set at the global level, we'd have to grant all users read/write access to this prefix if we want to use the user identity, which is a security issue since the metadata appears to contain user data.
Ketan (kumare3)
Yes
Brian Tang
Not urgent, but do let me know where I might find this configuration when you have a moment! 🙂
@Ketan (kumare3) would you be able to share the documentation for that? I wasn't able to find a similar attribute in the IDLs.
Ketan (kumare3)
Ohh my bad, you can set it:

```yaml
raw_output_data_config:
  output_location_prefix: s3://example/...
```

This is an entry in the execution config. Set it as follows:

```shell
flytectl update workflow-execution-config --attrFile ...
```
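For reference, a fuller attrFile might look like the sketch below; the project and domain values, the bucket path, and the `attrs.yaml` file name are placeholders to substitute with your own:

```yaml
# attrs.yaml (hypothetical file name); applied per project/domain
# via: flytectl update workflow-execution-config --attrFile attrs.yaml
domain: development
project: flytesnacks
raw_output_data_config:
  output_location_prefix: s3://my-bucket/raw-data
```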
Brian Tang
Oh, `workflow-execution-config` just has the raw output data location though, right? I was hoping to set the metadata prefix per project/domain, the one that currently defaults to `metadata/propeller`.
Ketan (kumare3)
huh?
Ohh, you want the metadata to be segregated. This is not possible in Flyte today.
Union has this, by separating data plane clusters.
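(For context on where that global default lives: the prefix comes from FlytePropeller's configuration, which applies cluster-wide rather than per project/domain. A sketch, assuming a flyte-core-style deployment; the exact nesting of the config may differ in your setup:)

```yaml
# FlytePropeller configuration (e.g., in the propeller ConfigMap).
# Changing metadata-prefix moves metadata for the whole cluster,
# not for an individual project or domain.
propeller:
  metadata-prefix: metadata/propeller
```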
Brian Tang
Follow-up question on metadata (the data under `metadata/propeller`): is it purely required during execution of a Flyte workflow? I'm trying to understand the implications of setting a retention policy on this prefix. I just tried deleting the metadata folder for a recent run and am unclear whether it had any effect on that previous run.
David Espejo (he/him)
@Brian Tang
> Is it purely required during execution of a Flyte workflow?
I think so. During an execution, the task pod needs access to the metadata object to materialize inputs or to fetch large (offloaded) objects. Also, after each evaluation loop the execution status is updated with metadata (status, node/task phases, etc.), so if a lifecycle policy removes that object I'd guess the execution would fail completely or be left otherwise inconsistent.
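If you want to see what an execution actually writes there, listing the per-execution subtree is a quick check. Bucket name and execution ID below are placeholders, and the path layout may vary by Flyte version:

```shell
# Inspect one execution's metadata objects under the propeller prefix.
aws s3 ls --recursive s3://my-flyte-bucket/metadata/propeller/flytesnacks-development-abc123/
```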
Brian Tang
👍 Thanks for the clarification @David Espejo (he/him)! Seems like it only matters during execution, so a retention policy set sufficiently long (e.g., 30 days) should be safe.
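For anyone setting this up, a sketch of such a rule as an S3 lifecycle configuration; the bucket name and rule ID are placeholders, and 30 days assumes no execution runs longer than that:

```json
{
  "Rules": [
    {
      "ID": "expire-flyte-propeller-metadata",
      "Filter": { "Prefix": "metadata/propeller/" },
      "Status": "Enabled",
      "Expiration": { "Days": 30 }
    }
  ]
}
```

Apply it with `aws s3api put-bucket-lifecycle-configuration --bucket my-flyte-bucket --lifecycle-configuration file://lifecycle.json`.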