# ask-the-community
Hi! Just ran into an issue when training a model with the HuggingFace Trainer on multiple GPUs inside a Flyte task. Pinned it down to the shared memory that's used by the HuggingFace Trainer running out (I think for us it had defaulted to about 60 MB). The solution we are thinking of is to mount more storage at /dev/shm, but we're finding it a bit tricky to figure out where to do that.
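For reference, here's a rough sketch of what we're considering: attaching a Memory-backed `emptyDir` at /dev/shm via flytekit's `PodTemplate` support. The task name, the `8Gi` size limit, and the container name `"primary"` are assumptions on our side (primary is flytekit's default primary container name, I believe), so treat this as untested:

```python
from flytekit import task, PodTemplate
from kubernetes.client import (
    V1PodSpec,
    V1Container,
    V1Volume,
    V1VolumeMount,
    V1EmptyDirVolumeSource,
)

# Pod template that mounts a Memory-backed emptyDir at /dev/shm so the
# HuggingFace Trainer's inter-process communication has more shared memory.
shm_pod_template = PodTemplate(
    pod_spec=V1PodSpec(
        containers=[
            V1Container(
                name="primary",  # assumed default primary container name
                volume_mounts=[
                    V1VolumeMount(name="dshm", mount_path="/dev/shm"),
                ],
            )
        ],
        volumes=[
            V1Volume(
                name="dshm",
                # Memory medium backs the volume with RAM; size_limit caps it
                # (and counts against the pod's memory usage) -- 8Gi is a guess.
                empty_dir=V1EmptyDirVolumeSource(medium="Memory", size_limit="8Gi"),
            )
        ],
    )
)


@task(pod_template=shm_pod_template)
def train_model() -> None:
    # hypothetical placeholder for our multi-GPU Trainer run
    ...
```

Does this look like the right place to do it, or is there a more idiomatic Flyte way (task resources, plugin config, etc.) to grow /dev/shm?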