Powered by Linen
datahub-flyte
  • k

    Ketan (kumare3)

    02/09/2023, 6:47 PM
    Hello all, @Victor Gustavo da Silva Oliveira here is interested in Datahub and Flyte integration. In the past, @Stephen and @Fredrik have also worked on it, and there are a few other folks in the community. I have also added folks from LinkedIn who might be interested as well (I do not know yet). @Yee from my team was working with some folks, but he has been out for the past 2 weeks and will be out for the next 2 weeks. Eventually, though, he can help a lot. There is a sample that we created some time ago with some community folks: https://github.com/unionai/flyteevents-datahub. This is a prototype. It uses Flyte Events egress and replicates the events to Datahub. You have to run the `flytelineage` script.
    Problems
    • This is not resilient; it does not handle failures well.
    • We would ideally love this to be a framework that folks can add new plugins to if needed, for example Amundsen (Spotify has an internal catalog), etc.
    • Currently it simply listens to all events, keeps them in memory for a workflow, and then replicates them. This is not needed; it could use flyte remote to get all the data only on receiving a terminal event.
    • No one uses this in production.
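The point above about fetching data only on a terminal event can be sketched roughly as follows. This is a minimal illustration and not the code in the prototype: the event dictionaries, the phase names, and the `fetch_execution` callable are all hypothetical stand-ins. In a real setup the fetch step would presumably go through flyte remote.

```python
from typing import Callable, Dict, List

# Assumed phase names, for illustration only; check the real Flyte event schema.
TERMINAL_PHASES = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED_OUT"}


class TerminalEventReplicator:
    """Skips intermediate events and replicates once per finished execution."""

    def __init__(
        self,
        fetch_execution: Callable[[str], dict],  # hypothetical: returns full execution data
        sinks: List[Callable[[dict], None]],     # hypothetical: one callable per catalog
    ) -> None:
        self._fetch = fetch_execution
        self._sinks = sinks
        self._done = set()  # only execution ids are kept, not the events themselves

    def handle(self, event: Dict[str, str]) -> bool:
        """Return True if this event triggered a replication."""
        exec_id = event["execution_id"]
        if event["phase"] not in TERMINAL_PHASES:
            return False  # nothing is buffered for non-terminal events
        if exec_id in self._done:
            return False  # duplicate delivery of a terminal event
        record = self._fetch(exec_id)
        for sink in self._sinks:
            sink(record)
        # Mark as done only after every sink succeeded, so a redelivered
        # event retries the replication if a sink raised.
        self._done.add(exec_id)
        return True


if __name__ == "__main__":
    sent = []
    replicator = TerminalEventReplicator(
        fetch_execution=lambda exec_id: {"id": exec_id},
        sinks=[sent.append],
    )
    replicator.handle({"execution_id": "abc", "phase": "RUNNING"})
    replicator.handle({"execution_id": "abc", "phase": "SUCCEEDED"})
    print(sent)
```

Passing the sinks in as plain callables is one possible way to leave room for the plugin idea mentioned in the list: a Datahub sink and an Amundsen sink would simply be two entries in `sinks`.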
  • k

    Ketan (kumare3)

    02/09/2023, 6:48 PM
    We are not Datahub experts. I think there is also a channel on the Datahub side for integrating with Flyte.
  • f

    Fredrik

    02/10/2023, 6:15 AM
    We chatted a lot about this with @Yee last fall, and got to a point where I would have needed to start experimenting with implementing some rudimentary support for emitting different types of events from Flyte. Life, however, got in the way and I have not been able to work on this since. The idea we were discussing was indeed to emit events from Flyte that would then be translated into Datahub/Amundsen/OpenLineage/etc. events by a catalog-specific translator service. The event needs have been mapped out (to some extent) in this document, but it might be quite a tall order for me to start implementing these in Flyte in my current situation. Any help on this front would be highly appreciated. I'd be happy to work on a reference event translator (for Datahub), though!
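The translator idea described above might look something like the sketch below. Everything here is an assumption for illustration: `LineageEvent` is a made-up catalog-neutral event, not an actual Flyte message type, and the output only loosely follows the shape of an OpenLineage run event rather than being a validated payload.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LineageEvent:
    """Hypothetical catalog-neutral event emitted on a terminal phase."""
    workflow: str
    execution_id: str
    phase: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


Translator = Callable[[LineageEvent], dict]
_TRANSLATORS: Dict[str, Translator] = {}


def register(catalog: str) -> Callable[[Translator], Translator]:
    """Decorator that registers a translator under a catalog name."""
    def decorator(fn: Translator) -> Translator:
        _TRANSLATORS[catalog] = fn
        return fn
    return decorator


@register("openlineage-like")
def to_openlineage_like(event: LineageEvent) -> dict:
    # Illustrative field names only; assumes the event is terminal.
    return {
        "eventType": "COMPLETE" if event.phase == "SUCCEEDED" else "FAIL",
        "run": {"runId": event.execution_id},
        "job": {"namespace": "flyte", "name": event.workflow},
        "inputs": [{"namespace": "flyte", "name": n} for n in event.inputs],
        "outputs": [{"namespace": "flyte", "name": n} for n in event.outputs],
    }


def translate(catalog: str, event: LineageEvent) -> dict:
    """Look up the translator for a catalog and apply it to the event."""
    if catalog not in _TRANSLATORS:
        raise ValueError(f"no translator registered for {catalog!r}")
    return _TRANSLATORS[catalog](event)


if __name__ == "__main__":
    evt = LineageEvent("my_wf", "exec-1", "SUCCEEDED", ["raw"], ["clean"])
    print(translate("openlineage-like", evt))
```

A Datahub or Amundsen translator would be another function registered under its own name, so the event emitted from Flyte stays the same regardless of the catalog.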
  • v

    Victor Gustavo da Silva Oliveira

    02/16/2023, 8:24 PM
    After a great call with @Fredrik, we ran into some open questions that I believe we can work out together. Some of them: How and where does Flyte send its callbacks? Is it possible to send a callback in OpenLineage format? How can we capture it on DataHub's side? Fredrik tried using a REST call, but we believe that's not the best way to do it... There were some other considerations that Fredrik can help me remember. From there, I think we can open a new issue on Flyte's repository, is that right?
  • v

    Victor Gustavo da Silva Oliveira

    03/06/2023, 4:39 PM
    I had a chat with @Yee last week, and we talked about syncing our agendas to see if we can talk this week. @Fredrik, when will you be available to talk? Let's try to do it at a time when Yee can participate as well.
  • v

    Victor Gustavo da Silva Oliveira

    03/08/2023, 6:07 PM
    @Yee here