# ask-the-community
e:
Question for folks who have set up a split between metadata and data buckets. I'm trying to dial in the ideal config, and I think there's something I'm missing. In the Helm chart for flyte-core, the `storage` section is shared amongst flyteadmin, flytepropeller, etc. -- so my understanding is that the bucket specified there should be the metadata bucket. Flyteadmin generates signed URLs for:
• `pyflyte register` - to store the workflow definition
• `pyflyte run` - to store the initial inputs
But I would think that inputs are data rather than metadata ... am I misunderstanding?
So doesn't that indicate that flyteadmin has to differentiate based on context and generate different signed URLs accordingly?
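One way to check where these artifacts actually land is to list the bucket contents after a remote `pyflyte register` / `pyflyte run`. A minimal sketch, assuming boto3 is available and `my-metadata-bucket` is a placeholder for whatever bucket the chart's `storage` section names:

```python
# Sketch for verifying where signed-URL uploads land;
# "my-metadata-bucket" is a hypothetical placeholder.
import boto3

s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="my-metadata-bucket")
for obj in resp.get("Contents", []):
    # Expect the registered workflow package and uploaded inputs here.
    print(obj["Key"])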
k:
> But I would think that inputs are data rather than metadata ... am I misunderstanding?
Plain values like int, string, and float are metadata.
flytekit will offload the data to the other bucket for `FlyteFile` or `StructuredDataset`.
IIRC, flytekit sends the metadata to flyteadmin through gRPC, and admin will upload the `input.pb` to S3.
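To illustrate that split, here is a minimal flytekit sketch (a hypothetical workflow, not from this thread): the primitive inputs travel as literals inside the execution's input protobuf, while the `FlyteFile` contents are offloaded and only referenced by URI:

```python
from flytekit import task, workflow
from flytekit.types.file import FlyteFile


@task
def process(count: int, name: str, infile: FlyteFile) -> str:
    # `count` and `name` are primitive literals: they are serialized
    # directly into the execution's input protobuf (metadata bucket).
    # `infile`'s contents are offloaded to the raw-data location; only
    # a URI reference to them appears in the input protobuf.
    return f"{name}: {count}"


@workflow
def wf(count: int, name: str, infile: FlyteFile) -> str:
    return process(count=count, name=name, infile=infile)
```

This is why the primitives end up alongside the other execution metadata, while file and dataframe payloads can live in a separate data bucket.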
n:
I also see this issue with splitting things between data and metadata buckets... the problem for me stems from the way `CreateUploadLocation` works. For instance, if I run something like
```
pyflyte run remote-launchplan workflow.test_input --infile test.txt
```
I see calls to `/flyteidl.service.DataProxyService/CreateUploadLocation`. This will return a signed URL for uploading `test.txt` to my metadata bucket, but I would like for the file to go to my data bucket. I have tried modifying flyteadmin's storage location, but then all of the `.pb` files it writes also go to my data bucket.
Is there a different way I should be launching workflow executions so the inputs are stored in my data bucket?
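For reference, here is a minimal sketch of the data proxy call that `pyflyte run` makes under the hood, assuming the flyteidl Python bindings and a placeholder plaintext endpoint (`localhost:8089` is hypothetical). Notably, the request carries no bucket or prefix field, so the destination is chosen entirely server-side from flyteadmin's configured storage:

```python
import grpc
from flyteidl.service import dataproxy_pb2, dataproxy_pb2_grpc
from google.protobuf.duration_pb2 import Duration

# Placeholder endpoint; a real deployment would use the admin gRPC address.
channel = grpc.insecure_channel("localhost:8089")
stub = dataproxy_pb2_grpc.DataProxyServiceStub(channel)

req = dataproxy_pb2.CreateUploadLocationRequest(
    project="flytesnacks",
    domain="development",
    filename="test.txt",
    expires_in=Duration(seconds=3600),
    # Depending on the Admin version, a content_md5 of the payload
    # may also be required here.
)
resp = stub.CreateUploadLocation(req)

# The signed URL (and the native URL it resolves to) point wherever
# flyteadmin's own `storage` config points -- the metadata bucket in
# a split setup, regardless of what the client wants.
print(resp.signed_url, resp.native_url)
```

In other words, as long as flyteadmin's shared `storage` section names the metadata bucket, everything uploaded through the data proxy will land there.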