Another question: How can I use Data Persistence L...
# ask-the-community
Another question: How can I use Data Persistence Layer in GCS? I had looked on the documentation, but found nothing... Is there an example of some sort?
@aqui Can anybody help with this?
Hi Victor, are you asking about a Flyte deployment that uses GCS as the blob store? We’re currently revamping our deployment guides, which will include a reference terraform implementation for deploying Flyte to GCP. The legacy guide for deploying to GCP can be found here, which includes a section on configuring a GCS bucket for Flyte here.
flytekit’s data persistence layer handles uploading/downloading artifacts to and from the blob store that’s configured in the control plane (the FlyteAdmin backend)
so the short answer is “yes and no”… “yes” because the data persistence API will use whatever blob store is configured in the cluster. “no” because you’ll need a GCP-backed Flyte cluster to use that blob store in the first place
if I can ask, what are you trying to do?
I'm aiming to create some tasks that download and pre-process some images, but due to the large number of files, we are trying to optimize it by reducing storage across tasks and parallelizing it. We are aiming to use map tasks for the parallelization part, but my understanding is that we would need a data persistence layer to use image metadata between tasks
So from a flytekit user perspective you generally don’t have to think about the data persistence plugin: you need to just
pip install gsutil
and flyte handles serialization/deserialization of data structures (including files and images) to and from the blob store. what kind of metadata do you want to attach to the images?
Just information regarding image quality of some sort, nothing too complex
right, so we recommend using dataclasses to attach metadata to files (see custom python objects in docs):
import typing
from dataclasses import dataclass

from dataclasses_json import dataclass_json
from flytekit.types.file import PNGImageFile


@dataclass_json
@dataclass
class ImageWithMetadata:
    """Example of a simple custom class that is modeled as a dataclass."""

    file: PNGImageFile  # the image itself, stored in the blob store
    metadata: typing.Dict[str, str]  # e.g. image-quality information
ImageWithMetadata will behave like a regular dataclass, but Flyte will automatically handle writing to/reading from the blob store when it’s used in tasks and workflows
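for intuition only, here's a minimal plain-Python sketch of the kind of round trip a dataclass goes through when it's handed from one task to the next. It assumes a JSON representation (as the dataclass_json import suggests) and is not Flyte's actual implementation: ImageRecord, to_json and from_json are hypothetical names, and a plain str path stands in for PNGImageFile so it runs without flytekit. It also doesn't upload or download the file itself, which the real persistence layer would do for you.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class ImageRecord:
    """Hypothetical stand-in for ImageWithMetadata (str path instead of PNGImageFile)."""

    file: str
    metadata: Dict[str, str]


def to_json(record: ImageRecord) -> str:
    # Roughly what happens to a task's output: the dataclass becomes a JSON document.
    return json.dumps(asdict(record))


def from_json(payload: str) -> ImageRecord:
    # Roughly what happens to a task's input: the JSON document is rebuilt into the dataclass.
    return ImageRecord(**json.loads(payload))


record = ImageRecord(file="images/cat.png", metadata={"sharpness": "0.82"})
restored = from_json(to_json(record))
```

the point is just that the downstream task gets back an equal object, metadata included, without you writing any storage code in the task body.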
Oh, that's cool! Thank you so much!
no problem! feel free to ping back here if you have any issues