Does anyone use DVC with Flyte? I'm currently tryi...
# ask-the-community
m
Does anyone use DVC with Flyte? I'm currently trying to set something up, potentially creating a plugin, but I'm wondering if anyone has found a neat way of passing in `*.dvc` files, or has any tips/tricks in general. With Flyte files/directories we can just pass the original S3 object back nicely wrapped, which is ideal.
h
Hey @Michael Tinsley, did you go forward with this idea? We are evaluating a potential combination of DVC and Flyte to improve collaboration between team members.
m
Hey, it's still on my to-do list; I've not had much time to look at it yet. I do have a few findings, though.
• FlyteFile works really nicely when you're grabbing a single file from DVC, and it means you don't need to duplicate the file in S3 etc.
• Directories in DVC work differently, which I hadn't realised before. I haven't yet found a way to avoid downloading the data via DVC and re-uploading it to Flyte's data bucket. This is not ideal IMO, but it's due to DVC rather than Flyte.
• I'm not sure what the developer experience should be like, or what the API requirements should be. Should we make the user pass in the `file.dvc` files to strongly tie data versions to pipeline versions? We can then lean into Flyte caching. Or should it be more free-form, where a user can specify the DVC data they want when defining a step? Again, we can cache with this, but it's possibly more error prone? 🤷
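For the first option (passing the `.dvc` pointer file in so that data versions are tied to pipeline versions), here is a rough sketch of one way the cache key could be derived. It is only an illustration, not an existing plugin API: the helper name `dvc_pointer_version` and the 16-character truncation are made up for this example, and it uses only the Python standard library, treating the pointer file as opaque text rather than parsing it.

```python
import hashlib
from pathlib import Path


def dvc_pointer_version(pointer_path: str) -> str:
    """Return a short, stable fingerprint of a .dvc pointer file.

    Hypothetical helper (not part of DVC or flytekit). The pointer file is
    hashed as opaque bytes, so any change to it, including the recorded
    data hash, produces a different fingerprint.
    """
    raw = Path(pointer_path).read_bytes()
    # Normalise Windows line endings so the same pointer file gives the
    # same fingerprint regardless of how it was checked out.
    normalised = raw.replace(b"\r\n", b"\n")
    # Truncating to 16 hex characters is an arbitrary choice for readability.
    return hashlib.sha256(normalised).hexdigest()[:16]


if __name__ == "__main__":
    import tempfile

    # Demo with a made-up pointer file; the contents are placeholders.
    with tempfile.TemporaryDirectory() as tmp:
        pointer = Path(tmp) / "data.csv.dvc"
        pointer.write_text("outs:\n- md5: 0123456789abcdef\n  path: data.csv\n")
        print(dvc_pointer_version(str(pointer)))
```

The resulting string could then be passed as an ordinary input to a task declared with flytekit's `@task(cache=True, cache_version=...)`. Since Flyte's cache lookup takes task inputs into account, a changed pointer file would give a cache miss, while an unchanged one could reuse the earlier result. This does not address the directory re-upload issue mentioned above.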