# ask-the-community
e
Hi folks - I'm looking to build an index of workflow outputs, but have a couple of questions to resolve. I need a way of indicating exactly which outputs those are.
• By convention, `end_node` is possible, but it's not flexible enough.
• It should be possible to mark anything in the type system as indexable, so using a specific custom type won't fit.
• I see that LiteralType has both `metadata` and `annotations` fields. Would either be appropriate?
I see that `Annotated` is used to generate custom hash keys for cached outputs, so I thought something like this might work more generically:
@workflow
def training_workflow() -> Annotated[LogisticRegression, "IndexMe"]:
However, looking through the protobuf files in my S3 bucket, I don't see the annotation anywhere. Thanks!
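For illustration, here is a minimal sketch of how a marker attached with `Annotated` could be read back from a function signature using only the standard library. It does not use flytekit and says nothing about what Flyte persists to blob storage. `INDEX_MARKER`, `is_indexable`, and the stand-in `LogisticRegression` class are assumptions made up for this example.

```python
from typing import Annotated, get_args, get_origin, get_type_hints

# Hypothetical marker string, matching the "IndexMe" used in the snippet above.
INDEX_MARKER = "IndexMe"


class LogisticRegression:
    """Stand-in for the real model class so the sketch is self-contained."""


def training_workflow() -> Annotated[LogisticRegression, "IndexMe"]:
    # Plain function here; the @workflow decorator is omitted on purpose.
    return LogisticRegression()


def is_indexable(fn) -> bool:
    """Return True if fn's return annotation carries the index marker."""
    # include_extras=True keeps the Annotated metadata instead of stripping it.
    hints = get_type_hints(fn, include_extras=True)
    ret = hints.get("return")
    if get_origin(ret) is not Annotated:
        return False
    # For Annotated types, get_args yields (base_type, *metadata).
    _, *extras = get_args(ret)
    return INDEX_MARKER in extras
```

Calling `is_indexable(training_workflow)` returns `True`, while a function with a plain return type returns `False`. Whether a bare string marker like this survives into the serialized type information is exactly the open question in the message above.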
Follow-on question, assuming what I'm interested in is possible: Is there a pluggable way to extend Flyte to index these files as they're generated / written? Rather than setting up a batch job to process outputs / index them after they've been written to blob storage, it would be great to be able to hook into Flyte to do it in real time. Any thoughts on the right way to do this?
s
@Ketan (kumare3) would you mind providing your input here?
e
@Ketan (kumare3) I just stumbled on https://github.com/unionai-oss/artifact-demo and it looks like you're already building out a first-class `Artifact` concept, which might do the things I'm looking for. I also see the PR at https://github.com/flyteorg/flyte/pull/4077. Anything further to disclose here yet on direction / roadmap?
k
@Ethan Brown yes we are building it out
this will be ready early next year
there is a working group in this slack too
also, what you need is available as an event stream, but artifacts take it to the next level
all built in and working cohesively
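As a rough sketch of the event-stream approach mentioned above, a consumer could index output locations as completion events arrive rather than in a later batch job. The event shape here is an assumption: the field names `phase`, `workflow`, and `output_uri` are hypothetical and are not taken from Flyte's event schema, and `OutputIndex` is an in-memory stand-in for a real index.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class OutputIndex:
    """In-memory stand-in for a real search index or database."""

    entries: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, workflow_name: str, output_uri: str) -> None:
        self.entries.setdefault(workflow_name, []).append(output_uri)


def index_events(events: Iterable[dict], index: OutputIndex) -> int:
    """Index the output URI of each succeeded event; return the number indexed."""
    count = 0
    for event in events:
        # Hypothetical field names; adapt them to the actual event payload.
        if event.get("phase") != "SUCCEEDED":
            continue
        uri = event.get("output_uri")
        if not uri:
            continue
        index.add(event.get("workflow", "unknown"), uri)
        count += 1
    return count
```

In practice `events` would be an iterator over messages from whatever queue or topic the events are published to, and `OutputIndex.add` would write to the real index.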
e
Ah, that's great / exciting news! I will be following this space pretty closely then -- it looks like what you have in mind is pretty well aligned with what I was thinking.