I can't seem to find a way to manage my Flyte deployment's state with respect to executions and their data. I can only find commands to create and update entities, but none to remove them. What's the intended way of keeping my environment clean of many unsuccessful runs?
I dug through the source code and indeed can't find a way to remove data from the data stores. This really seems like a key missing feature, and it should be flyteadmin that knows which records have to be removed whenever you delete an execution, a workflow, a project, or anything else.
Do you plan on adding this feature anytime soon? Directly accessing the data stores seems to be a big no-no.
09/27/2023, 4:15 PM
@Eduardo Apolinario (eapolinario) do we have a ticket for this?
i think this at least deserves a ticket or an incubator, cuz we’ve heard this quite a few times. but the answer is no, not right now.
and the reason is that the bar for removing data is much higher than for what it does today, which is just archiving/adding a soft-delete flag to the record.
and by a higher bar i mean that the potential for error is way higher.
to date, not a single flyte installation has ever deleted data from s3 or the db unintentionally, because there is no deletion code. it just doesn't exist.
but once you add that… it opens up a rabbit hole.
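To make the soft-delete vs. hard-delete distinction concrete, here is a minimal sketch, assuming a hypothetical `executions` table with an `archived` flag. This is not flyteadmin's actual schema or code. Archiving only flips a flag and filters listings, so no row (and nothing the row points to) is ever destroyed:

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """In-memory stand-in for an executions table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE executions ("
        "id TEXT PRIMARY KEY, phase TEXT, archived INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def archive_execution(conn: sqlite3.Connection, execution_id: str) -> None:
    """Soft delete: set a flag; the row itself stays in the table."""
    conn.execute("UPDATE executions SET archived = 1 WHERE id = ?", (execution_id,))


def list_visible(conn: sqlite3.Connection) -> list:
    """Listings hide archived rows but never remove them."""
    rows = conn.execute("SELECT id FROM executions WHERE archived = 0 ORDER BY id")
    return [row[0] for row in rows]


def count_rows(conn: sqlite3.Connection) -> int:
    """Total rows, archived or not: this never shrinks under soft delete."""
    return conn.execute("SELECT COUNT(*) FROM executions").fetchone()[0]


if __name__ == "__main__":
    conn = make_store()
    conn.executemany(
        "INSERT INTO executions (id, phase) VALUES (?, ?)",
        [("exec-a", "FAILED"), ("exec-b", "SUCCEEDED")],
    )
    archive_execution(conn, "exec-a")
    print(list_visible(conn))  # ['exec-b']
    print(count_rows(conn))    # 2
```

The flip side is exactly what the original question complains about: the archived row, and whatever it references in the object store, keeps taking up space until something actually deletes it.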
09/27/2023, 6:28 PM
I get that cascading the deletion requires careful design, but it's the application that knows best what to delete and when. It's very difficult for us mere mortals to figure out what to delete from the relational database and from the object storage, mostly because we lack knowledge of how the records relate to each other. Writing our own tooling for this is also really error-prone: we'd have to be very careful with every upgrade, since the tooling can get out of sync pretty easily. From an administration and operations point of view, I think it's necessary to be able to hard-delete things without being fully aware of the implementation details.
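The argument that only the application knows the relations can be illustrated with a small sketch. The entity names, parent/child links, and `s3://my-bucket/...` keys below are all hypothetical and do not reflect Flyte's real data model. Given a registry of which records depend on which, and which object-store keys each record owns, a cascading hard delete is a post-order walk that schedules dependents before the records they reference:

```python
from collections import defaultdict


class Catalog:
    """Hypothetical registry of parent -> child relations and owned blob keys."""

    def __init__(self):
        self.children = defaultdict(list)  # record id -> ids of records depending on it
        self.blobs = defaultdict(list)     # record id -> object-store keys it owns

    def add(self, record_id, parent=None, blobs=()):
        if parent is not None:
            self.children[parent].append(record_id)
        self.blobs[record_id].extend(blobs)

    def deletion_plan(self, root):
        """Return (records, blob keys) to delete, dependents before their parents."""
        records, keys, seen = [], [], set()

        def visit(node):
            if node in seen:
                return
            seen.add(node)
            for child in self.children[node]:
                visit(child)
            # Post-order: all dependents are already scheduled before this record.
            records.append(node)
            keys.extend(self.blobs[node])

        visit(root)
        return records, keys


if __name__ == "__main__":
    catalog = Catalog()
    catalog.add("project/demo")
    catalog.add("workflow/train", parent="project/demo")
    catalog.add("exec/1", parent="workflow/train",
                blobs=["s3://my-bucket/exec-1/inputs.pb", "s3://my-bucket/exec-1/outputs.pb"])
    catalog.add("exec/2", parent="workflow/train",
                blobs=["s3://my-bucket/exec-2/inputs.pb"])
    records, keys = catalog.deletion_plan("workflow/train")
    print(records)  # ['exec/1', 'exec/2', 'workflow/train']
    print(keys)
```

The hard part in practice is keeping a registry like this accurate as the schema evolves across releases, which is why maintaining it outside the application tends to drift out of sync.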