# ask-the-community
Hi folks - I'm looking to build an index of workflow outputs, but have a couple of questions to resolve. I need a way of indicating exactly which outputs should be indexed.
• Doing this by convention is possible, but it's not flexible enough.
• Any type in the type system should be markable as indexable, so using a specific custom type won't fit.
• I see that LiteralType has both metadata and annotation fields. Would either be appropriate?
I see that Annotated is used to generate custom hash keys for cached outputs, so I thought something like this might work more generically:
def training_workflow() -> Annotated[LogisticRegression, "IndexMe"]:
However, looking through the protobuf files in my S3 bucket, I don't see the marker anywhere. Thanks!
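For reference, here is a minimal sketch of how a marker attached with Annotated can be read back in plain Python. It uses only the standard typing module and says nothing about whether Flyte carries the marker into the serialized LiteralType, which is the open question above. The LogisticRegression stand-in class, the INDEX_MARKER constant, and the is_marked_indexable helper are hypothetical names introduced for illustration.

```python
from typing import Annotated, get_args, get_origin, get_type_hints

# Hypothetical marker string, taken from the snippet above.
INDEX_MARKER = "IndexMe"


class LogisticRegression:
    """Stand-in for the real model class so the sketch is self-contained."""


def training_workflow() -> Annotated[LogisticRegression, "IndexMe"]:
    return LogisticRegression()


def is_marked_indexable(fn) -> bool:
    """Return True if fn's return annotation carries the index marker."""
    # include_extras=True keeps the Annotated metadata instead of stripping it.
    hints = get_type_hints(fn, include_extras=True)
    ret = hints.get("return")
    if get_origin(ret) is not Annotated:
        return False
    # get_args yields (underlying_type, *metadata); skip the underlying type.
    return INDEX_MARKER in get_args(ret)[1:]
```

Calling is_marked_indexable(training_workflow) returns True, while a function with an unannotated return type returns False. This only checks the Python-side type hint; it requires Python 3.9 or later.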
Follow-on question, assuming what I'm interested in is possible: Is there a pluggable way to extend Flyte to index these files as they're generated / written? Rather than setting up a batch job to process outputs / index them after they've been written to blob storage, it would be great to be able to hook into Flyte to do it in real time. Any thoughts on the right way to do this?
@Ketan (kumare3) would you mind providing your input here?
@Ketan (kumare3) I just stumbled on https://github.com/unionai-oss/artifact-demo and it looks like you're already building out a first-class Artifact concept, which might do the things I'm looking for. I also see the PR at https://github.com/flyteorg/flyte/pull/4077. Anything further to disclose here yet on direction / roadmap?
@Ethan Brown yes we are building it out
this will be ready early next year
there is a working group in this slack too
also, what you need is available as an event stream, but artifacts take it to the next level
all built in and working cohesively
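As a rough illustration of the event-stream route mentioned above, the sketch below indexes outputs as events arrive instead of in a later batch job. Everything in it is an assumption: OutputEvent is a hypothetical event shape and not Flyte's actual event schema, OutputIndex is a toy in-memory index, and the S3 URIs are made up. A real consumer would read from whatever stream the deployment exposes.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class OutputEvent:
    """Hypothetical event shape; not Flyte's actual event schema."""
    execution_id: str
    output_uri: str
    markers: List[str] = field(default_factory=list)


@dataclass
class OutputIndex:
    """Toy in-memory index mapping execution id to output URIs."""
    entries: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, event: OutputEvent) -> None:
        self.entries.setdefault(event.execution_id, []).append(event.output_uri)


def consume(events: Iterable[OutputEvent], index: OutputIndex,
            marker: str = "IndexMe") -> int:
    """Index each marked event as it arrives; return how many were indexed."""
    count = 0
    for event in events:
        if marker in event.markers:
            index.add(event)
            count += 1
    return count


# Example stream with made-up executions and URIs.
sample_events = [
    OutputEvent("exec-1", "s3://example-bucket/exec-1/model", ["IndexMe"]),
    OutputEvent("exec-2", "s3://example-bucket/exec-2/tmp"),
]
sample_index = OutputIndex()
indexed = consume(sample_events, sample_index)
```

Only the first event carries the marker, so only its URI ends up in the index. Filtering on a marker keeps the indexer independent of any particular output type.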
Ah, that's great / exciting news! I will be following this space pretty closely then -- it looks like what you have in mind is pretty well aligned with what I was thinking.