Calvin Leather
09/20/2022, 4:53 PM
Yee
return file_a, file_b
that they end up close to each other somehow? like possibly in the same prefix in s3?
Calvin Leather
09/20/2022, 8:08 PM
Yee
Niels Bantilan
09/20/2022, 9:08 PM
FlyteDirectory would be appropriate to use here… how are these “file families” distinct from directories?
Calvin Leather
09/20/2022, 9:09 PM
Niels Bantilan
09/20/2022, 9:16 PM
Sometimes you want to download just the index so you can check whether something exists in the large file it indexes before you download it
interesting… yeah I think if you can write down requirements like this in an issue it would help us figure out how to extend FlyteDirectories: https://github.com/flyteorg/flyte/issues/new?assignees=&labels=enhancement%2Cuntriaged&template=feature_request.yaml&title=%5BCore+feature%5D+
Greg Gydush
09/21/2022, 12:21 AM
class FlyteFileWithIndex(FlyteFile, metaclass=TypeTransformerMeta):
    index_extensions: typing.List[str] = []
    index_requirement: typing.Literal["all", "any"] = "any"
The type transformer looks at the list of index extensions and checks whether “any” or “all” of the index files exist, based on the specified index_requirement (“any” is useful for things like BAM, whose index can have either a “.bai” or a “.bam.bai” suffix; “all” is useful for strict matching). If the required index file(s) exist, it downloads both the file and its associated index files into the same directory; otherwise it raises an error.
So for example, this is what a type would look like to handle BAM files, which can be defined inside the workflow or in a library that is used by the workflow:
class BamFile(FlyteFileWithIndex):
    index_extensions = [".bai", ".bam.bai"]
    index_requirement = "any"
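For illustration, the “any” / “all” check described above could be sketched in plain Python roughly as follows. This is only a sketch under stated assumptions, not flytekit's API and not the actual transformer: the helper names `candidate_index_paths` and `resolve_index_files` are hypothetical, the `exists` callable stands in for whatever existence check the storage layer provides, and it assumes an index path is formed by replacing the data file's last extension with the index extension (so `sample.bam` maps to `sample.bai` and `sample.bam.bai`).

```python
import os
import typing


def candidate_index_paths(path: str, index_extensions: typing.List[str]) -> typing.List[str]:
    # Assumption: an index path is the data file's path with its last
    # extension replaced by the index extension, e.g.
    # "sample.bam" + ".bai" -> "sample.bai", and + ".bam.bai" -> "sample.bam.bai".
    stem, _ = os.path.splitext(path)
    return [stem + ext for ext in index_extensions]


def resolve_index_files(
    path: str,
    index_extensions: typing.List[str],
    index_requirement: str = "any",
    exists: typing.Callable[[str], bool] = os.path.exists,
) -> typing.List[str]:
    """Return the index files to fetch alongside `path`, or raise if the requirement is unmet.

    Hypothetical helper: `exists` is a stand-in for a storage-specific
    existence check (local disk by default).
    """
    if index_requirement not in ("any", "all"):
        raise ValueError(f"index_requirement must be 'any' or 'all', got {index_requirement!r}")

    candidates = candidate_index_paths(path, index_extensions)
    found = [p for p in candidates if exists(p)]

    if index_requirement == "all":
        # Strict matching: every listed index must be present.
        satisfied = bool(candidates) and len(found) == len(candidates)
    else:
        # Lenient matching: at least one listed index must be present.
        satisfied = bool(found)

    if not satisfied:
        raise FileNotFoundError(
            f"{path}: index requirement {index_requirement!r} not met; looked for {candidates}"
        )
    return found
```

With a BAM file whose only index is `sample.bam.bai`, the “any” requirement would be satisfied and that one index returned, while “all” would raise because `sample.bai` is missing.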
Calvin Leather
09/21/2022, 2:42 PM
Yee
Calvin Leather
09/21/2022, 4:54 PM
Yee
Calvin Leather
09/22/2022, 1:12 AM