# ask-the-community
e:
Question for folks who have set up a split between metadata and data buckets. I'm trying to dial in the ideal config, and I think there's something I'm missing. In the Helm chart for flyte-core, the `storage` section is shared amongst flyteadmin, flytepropeller, etc. -- so my understanding is that the bucket specified there should be the metadata bucket. Flyteadmin generates signed URLs for:
• `pyflyte register` - to store the workflow definition
• `pyflyte run` - to store the initial inputs
But I would think that inputs are data rather than metadata ... am I misunderstanding?
So doesn't that indicate that flyteadmin has to differentiate based on context and generate different signed URLs accordingly?
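One way to check where these artifacts actually land is to list the bucket contents after a remote `pyflyte register` / `pyflyte run`. A minimal sketch, assuming boto3 is available and `my-metadata-bucket` is a placeholder for whatever bucket the chart's `storage` section names:

```python
# Sketch for verifying where signed-URL uploads land;
# "my-metadata-bucket" is a hypothetical placeholder.
import boto3

s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="my-metadata-bucket")
for obj in resp.get("Contents", []):
    # Expect the registered workflow package and uploaded inputs here.
    print(obj["Key"])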
k:
> But I would think that inputs are data rather than metadata ... am I misunderstanding?
Plain values like int, string, and float are metadata.
flytekit will offload the data to the other bucket for `FlyteFile` or `StructuredDataset`.
IIRC, flytekit sends the metadata to flyteadmin through gRPC, and admin will upload the `input.pb` to S3.
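To illustrate that split, here is a minimal flytekit sketch (a hypothetical workflow, not from this thread): the primitive inputs travel as literals inside the execution's input protobuf, while the `FlyteFile` contents are offloaded and only referenced by URI:

```python
from flytekit import task, workflow
from flytekit.types.file import FlyteFile


@task
def process(count: int, name: str, infile: FlyteFile) -> str:
    # `count` and `name` are primitive literals: they are serialized
    # directly into the execution's input protobuf (metadata bucket).
    # `infile`'s contents are offloaded to the raw-data location; only
    # a URI reference to them appears in the input protobuf.
    return f"{name}: {count}"


@workflow
def wf(count: int, name: str, infile: FlyteFile) -> str:
    return process(count=count, name=name, infile=infile)
```

This is why the primitives end up alongside the other execution metadata, while file and dataframe payloads can live in a separate data bucket.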
n:
I also see this issue with splitting things between data and metadata buckets... the problem for me stems from the way `CreateUploadLocation` works. For instance, if I run something like
```
pyflyte run remote-launchplan workflow.test_input --infile test.txt
```
I see calls to `/flyteidl.service.DataProxyService/CreateUploadLocation`. This will return a signed URL for uploading `test.txt` to my metadata bucket, but I would like for the file to go to my data bucket. I have tried modifying flyteadmin's storage location, but then all of the `.pb` files it writes also go to my data bucket.
Is there a different way I should be launching workflow executions so the inputs are stored in my data bucket?
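For reference, here is a minimal sketch of the data proxy call that `pyflyte run` makes under the hood, assuming the flyteidl Python bindings and a placeholder plaintext endpoint (`localhost:8089` is hypothetical). Notably, the request carries no bucket or prefix field, so the destination is chosen entirely server-side from flyteadmin's configured storage:

```python
import grpc
from flyteidl.service import dataproxy_pb2, dataproxy_pb2_grpc
from google.protobuf.duration_pb2 import Duration

# Placeholder endpoint; a real deployment would use the admin gRPC address.
channel = grpc.insecure_channel("localhost:8089")
stub = dataproxy_pb2_grpc.DataProxyServiceStub(channel)

req = dataproxy_pb2.CreateUploadLocationRequest(
    project="flytesnacks",
    domain="development",
    filename="test.txt",
    expires_in=Duration(seconds=3600),
    # Depending on the Admin version, a content_md5 of the payload
    # may also be required here.
)
resp = stub.CreateUploadLocation(req)

# The signed URL (and the native URL it resolves to) point wherever
# flyteadmin's own `storage` config points -- the metadata bucket in
# a split setup, regardless of what the client wants.
print(resp.signed_url, resp.native_url)
```

In other words, as long as flyteadmin's shared `storage` section names the metadata bucket, everything uploaded through the data proxy will land there.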