Hello all,
@dry-ability-69144 here is interested in the Datahub and Flyte integration. In the past, @jolly-whale-9142 and @colossal-painter-70298 have also worked on it, and there are a few other folks in the community.
I have also added folks from LinkedIn who might be interested (I do not know yet).
@thankful-minister-83577 from my team was working with some folks, but he has been out for the past 2 weeks and will be out for the next 2 weeks. Eventually, though, he can help a lot.
There is a sample that we created some time ago with some community folks: https://github.com/unionai/flyteevents-datahub
This is a prototype.
It uses Flyte Events egress and replicates them to Datahub. You have to run the `flytelineage` script.
Problems
• This is not resilient; it does not handle failures well.
• Ideally we would love this to be a framework, so that folks can add new plugins if needed. For example: Amundsen (Spotify has an internal catalog), etc.
• Currently it simply listens to all events, keeps them in memory per workflow, and then replicates them. This is not needed; it could use Flyte remote to fetch all the data only on receiving a terminal event.
• No one uses this in production.
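To make the last two ideas concrete, here is a rough sketch of what "fetch only on a terminal event" plus pluggable sinks could look like. This is not how the prototype works today. The event shape (`execution_id`, `phase`), the `CatalogSink` interface, `InMemorySink`, and `handle_event` are all made-up names for illustration, and `fetch_execution` is an injected callable that in practice might be a thin wrapper around flytekit's `FlyteRemote`. The phase names are the terminal workflow execution phases Flyte reports.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Protocol

# Terminal workflow execution phases as named by Flyte.
TERMINAL_PHASES = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED_OUT"}


class CatalogSink(Protocol):
    """Hypothetical plugin interface: one implementation per catalog."""

    def publish(self, execution: Dict) -> None: ...


@dataclass
class InMemorySink:
    """Stand-in sink for illustration; a real one would call the catalog's API."""

    published: List[Dict] = field(default_factory=list)

    def publish(self, execution: Dict) -> None:
        self.published.append(execution)


def handle_event(
    event: Dict,
    fetch_execution: Callable[[str], Dict],
    sinks: Iterable[CatalogSink],
) -> bool:
    """Ignore non-terminal events; on a terminal one, fetch once and fan out.

    Assumes (hypothetically) that each event is a dict with "execution_id"
    and "phase" keys. Returns True if the execution was published.
    """
    if event["phase"] not in TERMINAL_PHASES:
        return False  # nothing is buffered in memory for running workflows
    execution = fetch_execution(event["execution_id"])
    for sink in sinks:
        sink.publish(execution)
    return True
```

With something like this, a `RUNNING` event is a no-op and a `SUCCEEDED` event triggers exactly one fetch and one publish per sink. It does nothing yet for the resiliency problem (retries, replay after a crash); that would need to be layered on separately.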