# announcements
b
I am looking for a way to pass NumPy arrays (ndarray) and PyTorch/TensorFlow tensors as Flyte task inputs/outputs. I haven’t come across any examples yet. I’m aware of the native support for DataFrames, but it seems inefficient to convert ndarrays/tensors back and forth through DataFrames. How are folks handling this?
k
cc @Niels Bantilan @Eduardo Apolinario (eapolinario)
n
unfortunately the flytekit TypeEngine doesn’t have native support for numpy arrays or pytorch/tensorflow tensors… would you mind opening up an issue for that @Badar Ahmed? Currently there are 3 paths to doing this:
1. passing dataframes around (as you’ve suggested)
2. passing `List[int]` or `List[float]` and reconstituting your arrays/tensors at the beginning of the next task
3. using a `np.ndarray` or `torch.Tensor` annotation purely for human-readability. Under the hood this will pickle your array/tensor and unpickle it on the other side.

(3) is convenient, but you run the risk of deserialization issues if you happen to use different versions of python/numpy/pytorch/tensorflow across your tasks that are not cross-compatible. (2) is really for smaller data use cases since these are stored as FlyteIDL literals. (1) is nice because flyte understands this and stores dataframes as parquet files, which is a more efficient/reliable storage format than pickle. Rough sketches of (2) and (3) follow below.
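A minimal sketch of paths (2) and (3); the task names, shapes, and values are made up for illustration:

```python
from typing import List

import numpy as np
import torch
from flytekit import task


# Path 2: pass primitive lists between tasks and rebuild the array downstream.
@task
def featurize() -> List[float]:
    return np.random.rand(100).tolist()


@task
def train(features: List[float]) -> float:
    x = np.array(features)  # reconstitute the ndarray at the start of the task
    return float(x.mean())


# Path 3: annotate with np.ndarray / torch.Tensor purely for readability;
# under the hood flytekit pickles the value, so keep Python/library versions
# consistent across the tasks that produce and consume it.
@task
def make_embedding() -> torch.Tensor:
    return torch.ones(4, 8)
```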
👍 1
b
Thanks @Niels Bantilan. This is helpful! I’ll open an issue for this.
k
Also @Badar Ahmed, our goal is to add support for automatic marshal/unmarshal of tf.Tensor - this can be added using type engine plugins, just not done yet. Contributions welcome. Docs: https://docs.flyte.org/projects/cookbook/en/latest/auto/core/extend_flyte/custom_types.html#sphx-glr-auto-core-extend-flyte-custom-types-py
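A rough sketch of what such a type engine plugin could look like for `np.ndarray`, following the structure in the custom types docs linked above. The class name and `.npy` blob format are illustrative, and the exact `FlyteContext.file_access` helpers may differ between flytekit versions:

```python
import pathlib
import typing

import numpy as np

from flytekit import FlyteContext
from flytekit.extend import TypeEngine, TypeTransformer
from flytekit.models.core.types import BlobType
from flytekit.models.literals import Blob, BlobMetadata, Literal, Scalar
from flytekit.models.types import LiteralType


class NumpyArrayTransformer(TypeTransformer[np.ndarray]):
    """Teach the flytekit TypeEngine to store np.ndarray values as single-file .npy blobs."""

    _TYPE_INFO = BlobType(format="npy", dimensionality=BlobType.BlobDimensionality.SINGLE)

    def __init__(self):
        super().__init__(name="numpy-ndarray", t=np.ndarray)

    def get_literal_type(self, t: typing.Type[np.ndarray]) -> LiteralType:
        return LiteralType(blob=self._TYPE_INFO)

    def to_literal(self, ctx: FlyteContext, python_val: np.ndarray, python_type, expected) -> Literal:
        # Serialize the array locally, then upload it to the task's remote storage.
        local_path = ctx.file_access.get_random_local_path() + ".npy"
        pathlib.Path(local_path).parent.mkdir(parents=True, exist_ok=True)
        np.save(local_path, python_val)
        remote_path = ctx.file_access.get_random_remote_path(local_path)
        ctx.file_access.put_data(local_path, remote_path, is_multipart=False)
        return Literal(
            scalar=Scalar(blob=Blob(metadata=BlobMetadata(type=self._TYPE_INFO), uri=remote_path))
        )

    def to_python_value(self, ctx: FlyteContext, lv: Literal, expected_python_type) -> np.ndarray:
        # Download the blob and load it back into an ndarray.
        local_path = ctx.file_access.get_random_local_path()
        ctx.file_access.get_data(lv.scalar.blob.uri, local_path, is_multipart=False)
        return np.load(local_path)


# Once registered, np.ndarray can be used directly in task signatures
# without falling back to pickling.
TypeEngine.register(NumpyArrayTransformer())
```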
👍 2