The ideas of contracts and lineage are also very elegant. Reading this article made me realize that I’ve internally normalized needlessly over-engineered solutions from my previous experiences.
For example, I spent a whole day working out a way to create a versioned label for the models I build internally, so that the label stays compatible with artifact caching (multiple finetuned models can originate from any given cached pretrained model). I use this versioned label for artifact storage, training logs, and model deployment. This is pretty standard, so I’ve grown accustomed to handling it manually.
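To make that concrete, here is a rough sketch of what such a hand-rolled versioned label might look like. Everything in it is an assumption for illustration: the function names `versioned_label` and `artifact_paths`, the hash-of-config scheme, and the directory layout are hypothetical, not the exact setup described above and not anything from the article.

```python
import hashlib
import json


def versioned_label(base_model: str, finetune_config: dict, digest_len: int = 8) -> str:
    """Build a deterministic label that ties a finetuned model to its pretrained base.

    Hypothetical scheme: '<base_model>__ft-<short hash of the finetuning config>'.
    """
    # Serialize with sorted keys so that key order does not change the hash.
    canonical = json.dumps(finetune_config, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:digest_len]
    return f"{base_model}__ft-{digest}"


def artifact_paths(label: str) -> dict:
    """Reuse the same label for storage, logs, and deployment (illustrative layout)."""
    return {
        "weights": f"artifacts/{label}/weights",
        "logs": f"logs/{label}",
        "deployment": f"deployments/{label}",
    }


label = versioned_label("bert-base", {"lr": 1e-5, "epochs": 3})
print(label)
print(artifact_paths(label))
```

Because the label is derived from the base model name plus the finetuning config, many finetuned variants can share one cached pretrained model while still getting distinct, reproducible names. The catch is that every consumer of the label has to know this convention, which is exactly the kind of bookkeeping a managed artifact system takes over.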
However, my over-engineered solution is just yet another thing to manually manage and keep in mind. I’ll upgrade my system to better utilize these managed artifacts and enjoy the benefits.
I absolutely agree with the premise of this post. Passing around artifacts as URIs is bad practice. Querying them via partitions is beautiful. Tracking lineage allows for accountability. The sum of all of the above allows for better MLOps at scale.
It is very well written. Let me integrate it this weekend and better collect my thoughts.