hi team, what’s the recommended way to pass transformers models between tasks? (Flyte #flyte-support)


clever-scientist-39504

06/06/2023, 1:51 AM

hi team, what’s the recommended way to pass transformers models and tokenizers between tasks? e.g. i load using `model = transformers.AutoModelForCausalLM.from_pretrained(...)`, but i cannot directly return `model` and pass it to the next task because the model cannot be pickled: `cannot pickle 'torch._C._distributed_c10d.ProcessGroup' object`. Do i need to save the model with transformers’ `save_pretrained` to NFS and load it when i need to use it in the next task?
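The error is standard `pickle` behavior: an object that holds a live runtime handle (here a `torch.distributed` `ProcessGroup`) cannot be serialized. The save-then-reload workaround from the question, mirroring `model.save_pretrained(path)` followed by `AutoModelForCausalLM.from_pretrained(path)` in transformers, can be sketched with the standard library alone. `ToyModel`, `producer_task`, and `consumer_task` are hypothetical stand-ins; the `save_pretrained`/`from_pretrained` methods only borrow the transformers method names, and a `threading.Lock` plays the role of the unpicklable handle.

```python
import json
import pickle
import tempfile
import threading
from pathlib import Path


class ToyModel:
    """Hypothetical stand-in for a transformers model (not the real API)."""

    def __init__(self, weights: dict):
        self.weights = weights
        # A live runtime handle, like a ProcessGroup: pickle cannot serialize it.
        self._handle = threading.Lock()

    def save_pretrained(self, directory: str) -> None:
        # Persist only the serializable state, never the runtime handle.
        Path(directory, "weights.json").write_text(json.dumps(self.weights))

    @classmethod
    def from_pretrained(cls, directory: str) -> "ToyModel":
        # Rebuild the object (with a fresh handle) from the saved files.
        weights = json.loads(Path(directory, "weights.json").read_text())
        return cls(weights)


def try_pickle(obj) -> str:
    # Returns "ok" on success, otherwise the pickling error message.
    try:
        pickle.dumps(obj)
        return "ok"
    except TypeError as exc:
        return str(exc)


def producer_task(out_dir: str) -> str:
    # First "task": save to a shared location and return only the path.
    model = ToyModel({"w": 0.5, "b": -1.0})
    model.save_pretrained(out_dir)
    return out_dir


def consumer_task(model_dir: str) -> float:
    # Next "task": reload from the path instead of receiving the object itself.
    model = ToyModel.from_pretrained(model_dir)
    return model.weights["w"] * 2 + model.weights["b"]


if __name__ == "__main__":
    print(try_pickle(ToyModel({"w": 1.0})))  # a "cannot pickle ..." message
    with tempfile.TemporaryDirectory() as tmp:
        print(consumer_task(producer_task(tmp)))
```

In a real Flyte workflow the path returned by the first task would more likely be a `FlyteDirectory` (so flytekit uploads the files to blob storage) than an NFS path; that mapping is an assumption of this sketch, not something stated in the thread.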

billowy-winter-86593

06/06/2023, 4:33 AM

Hi Melody! Flyte has an extensible type engine, so we can add a new type for this library. I believe this is Hugging Face? I would be happy to add a dedicated type transformer for it. It would help if you could share a bit more of your example task so I can make sure that it works.


clever-scientist-39504

06/06/2023, 4:56 PM

hi Evan! yes, this is a transformers model from Hugging Face. i’m trying to run the Stanford Alpaca pipeline as tasks in Flyte as an experiment. The highlighted line loads a pre-trained model, and i want to pass this model to the next task in memory, but this model cannot be pickled.

high-accountant-32689

06/06/2023, 5:46 PM

Just to add to what Evan said, flytekit already has a Hugging Face plugin that supports datasets: https://pypi.org/project/flytekitplugins-huggingface/. Contributions are welcome to cover a larger portion of the Hugging Face API. 🙂

billowy-winter-86593

06/07/2023, 1:42 PM

It would be great to add this to the plugin. I think in @clever-scientist-39504's use case, you might want to load the model with a helper function inside the task rather than in a separate task. Tasks run in separate containers, so the model wouldn’t stay in memory between them anyway: the pickle would be saved to S3 and reloaded, which is slow for large objects.
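The helper-function suggestion can be sketched the same way: the model is loaded by a plain function called inside the task body, so it never crosses a task boundary and nothing is pickled. `ToyModel`, `load_model`, and `run_pipeline` are hypothetical names, a `threading.Lock` again stands in for the unpicklable handle, and the `functools.lru_cache` is an optional choice of this sketch (not from the thread) so that repeated helper calls within one process reuse a single instance.

```python
import threading
from functools import lru_cache


class ToyModel:
    """Hypothetical stand-in for a model holding an unpicklable handle."""

    def __init__(self, name: str):
        self.name = name
        # Would break pickle, but is harmless while the object stays in-process.
        self._handle = threading.Lock()

    def generate(self, prompt: str) -> str:
        with self._handle:
            return f"{self.name}: {prompt[::-1]}"


@lru_cache(maxsize=1)
def load_model(name: str) -> ToyModel:
    # Plain helper, not a task: the result lives in this process's memory,
    # and lru_cache keeps one instance for repeated calls.
    return ToyModel(name)


def run_pipeline(name: str, prompts: list) -> list:
    # One task body: load through the helper, then use the model directly.
    model = load_model(name)
    return [model.generate(p) for p in prompts]
```

The trade-off is task granularity: a separate loading task would have to serialize its output to blob storage and the next container would download and deserialize it, which is the slow path described above for large checkpoints, while loading and using the model in one task avoids that round-trip.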
