# ask-the-community
hi team, what’s the recommended way to pass transformers models and tokenizers between tasks? i load a model in one task, but i cannot return it directly and pass it to the next task because the model cannot be pickled: `cannot pickle 'torch._C._distributed_c10d.ProcessGroup' object`. do i need to save the model with transformers to nfs and load it again when i need it in the next task?
Hi Melody! Flyte has an extensible type engine, so we can add a new type for this library. I believe this is Hugging Face? I would be happy to add a special transformer. It would help if you could share a bit more of your example task so I can make sure that it works.
hi Evan! yes, this is a transformers model from huggingface. i’m trying to run the stanford alpaca pipeline as tasks in flyte as an experiment. The highlighted line loads a pre-trained model, and i want to pass this model to the next task in memory but this model cannot be pickled.
Just to add to what Evan said, today flytekit has a huggingface plugin that supports datasets: https://pypi.org/project/flytekitplugins-huggingface/ Contributions are welcome to cover a larger portion of the huggingface API. 🙂
It would be great to add this to the plugin. I think in @Melody Lui's use case you might want to use a helper function to load the model rather than a separate task. Tasks run in separate containers, so the model wouldn’t stay in memory between them. The pickle would get saved to s3 and reloaded, which is slow for larger objects.
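For anyone reading later, here is a minimal sketch of the helper-function idea in plain Python, with no Flyte or transformers imports so it runs anywhere. `FakeModel`, `load_model`, `is_picklable`, and `run_step` are hypothetical names made up for illustration; in a real task the helper would call the actual transformers loading code, and `run_step` would be the body of a Flyte task. The point is only that the picklable model name crosses the task boundary, while the model itself is loaded inside the task that uses it.

```python
import pickle
import threading
from functools import lru_cache


class FakeModel:
    """Hypothetical stand-in for a model that holds an unpicklable handle."""

    def __init__(self, name: str) -> None:
        self.name = name
        # A lock cannot be pickled, similar to the ProcessGroup in the error above.
        self._handle = threading.Lock()

    def generate(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


def is_picklable(obj: object) -> bool:
    """Return True if pickle can serialize obj, False if it raises."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


@lru_cache(maxsize=1)
def load_model(name: str) -> FakeModel:
    """Helper that loads the model once per process and reuses it afterwards."""
    return FakeModel(name)


def run_step(name: str, prompt: str) -> str:
    """Stand-in for a task body: takes the model name, not the model object."""
    model = load_model(name)
    return model.generate(prompt)
```

Calling `run_step("my-model", "hello")` loads the stand-in model on first use and reuses the cached instance on later calls in the same process. Each container would still pay the load cost once, so this trades repeated loading for not having to serialize the model at all.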