Niels Bantilan 10/25/2022, 2:04 PM
is a TF-specific file format that we wanted to support out-of-the-box?
type and it internally stores the data in a tfrecord file for serialization. I’ve been wondering why someone might use
as a data type.
Ryan Nazareth 10/25/2022, 3:57 PM
unless I misunderstood the requirements of the original issue. The main use cases seem to be image data and training on TPUs (see next comment in thread). The alternative, I guess, is to provide support for a
type which then gets converted to
type and serialised and stored as
- see the images i/o example at the end of https://www.tensorflow.org/tutorials/load_data/tfrecord
and, quoting from the docs regarding the use case: “An important use case of the TFRecord data format is training on TPUs. First, TPUs are fast enough to benefit from optimized I/O operations. In addition, TPUs require data to be stored remotely (e.g. on Google Cloud Storage) and using the TFRecord format makes it easier to load the data without batch-downloading.” https://keras.io/examples/keras_recipes/creating_tfrecords/
<-> tfrecord conversion available. We may need to add support for
as well.
• IMO, this Flyte type should only perform conversion from `tf.train.Example`/`tf.data` to a tfrecord file (like the Flyte ONNX types), because the tfrecord format per se can be used while training the model, but not vice versa. We can also have a task to read the tfrecord file, but it can involve a lot of customization, e.g., https://www.tensorflow.org/tutorials/load_data/tfrecord#read_the_tfrecord_file & https://keras.io/examples/keras_recipes/creating_tfrecords/#train-a-simple-model-using-the-generated-tfrecords. So I’m not sure we want a task to read the tfrecord file. cc: @Niels Bantilan @Ketan (kumare3)
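For context on what "stored as a tfrecord file" means on disk, here is a minimal sketch of the record framing described in the TensorFlow tutorial linked above: a little-endian uint64 length, a masked CRC-32C of the length, the payload bytes, and a masked CRC-32C of the payload. It is written in pure Python for illustration only and is not how TensorFlow or flytekit implement it. The function names (`crc32c`, `masked_crc`, `write_record`, `read_records`) are made up for this sketch, and the payloads are arbitrary bytes standing in for serialized `tf.train.Example` protos.

```python
import io
import struct

_CRC32C_POLY = 0x82F63B78  # Castagnoli polynomial, bit-reflected form


def _build_table():
    # Standard byte-wise lookup table for a reflected CRC.
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ _CRC32C_POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc32c(data: bytes) -> int:
    """Plain CRC-32C (Castagnoli) of `data`."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """CRC-32C rotated right by 15 bits plus a constant, per the TFRecord format."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def write_record(stream, payload: bytes) -> None:
    """Append one framed record to a binary stream."""
    header = struct.pack("<Q", len(payload))
    stream.write(header)
    stream.write(struct.pack("<I", masked_crc(header)))
    stream.write(payload)
    stream.write(struct.pack("<I", masked_crc(payload)))


def read_records(stream):
    """Yield payloads from a binary stream, checking both checksums."""
    while True:
        header = stream.read(8)
        if not header:
            return  # clean end of file
        (length,) = struct.unpack("<Q", header)
        (length_crc,) = struct.unpack("<I", stream.read(4))
        if length_crc != masked_crc(header):
            raise ValueError("corrupt length field")
        payload = stream.read(length)
        (data_crc,) = struct.unpack("<I", stream.read(4))
        if data_crc != masked_crc(payload):
            raise ValueError("corrupt payload")
        yield payload


# Round-trip a few stand-in payloads through an in-memory "file".
buffer = io.BytesIO()
for item in [b"first example", b"second example"]:
    write_record(buffer, item)
buffer.seek(0)
recovered = list(read_records(buffer))
```

Writing is a fixed, mechanical framing step, while reading back into something useful depends on the feature schema and parsing logic, which is consistent with the hesitation above about shipping a generic reader task.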
Ryan Nazareth 10/26/2022, 1:03 PM
. Also, regarding your second point, I'm assuming in that case I should create a new FlyteFile type
which would then be returned as `TfRecordFile(path=lv.scalar.blob.uri)` in
method of TypeTransformer, similar to the PyTorch ONNX implementation? Let me know once @Niels Bantilan and @Ketan (kumare3) have confirmed.
def to_python_value(self, ...)
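To make the shape of that proposal concrete, here is a rough sketch of the round trip being described. It deliberately does not import flytekit: `Blob`, `Scalar`, and `Literal` are tiny stand-in dataclasses that only mimic the `lv.scalar.blob.uri` attribute path quoted in the message, and `TfRecordFileTransformer` is a hypothetical name. The real `TypeTransformer` methods take further arguments (elided as `...` in the message), which this sketch leaves out.

```python
from dataclasses import dataclass


# Stand-ins for illustration only; these are NOT flytekit classes.
@dataclass
class Blob:
    uri: str


@dataclass
class Scalar:
    blob: Blob


@dataclass
class Literal:
    scalar: Scalar


@dataclass
class TfRecordFile:
    """Hypothetical file type that just carries the location of a tfrecord file."""
    path: str


class TfRecordFileTransformer:
    """Hypothetical transformer showing only the two conversion directions."""

    def to_literal(self, python_val: TfRecordFile) -> Literal:
        # Wrap the file location in a blob-style literal.
        return Literal(scalar=Scalar(blob=Blob(uri=python_val.path)))

    def to_python_value(self, lv: Literal) -> TfRecordFile:
        # Mirrors the expression quoted in the message above.
        return TfRecordFile(path=lv.scalar.blob.uri)


transformer = TfRecordFileTransformer()
literal = transformer.to_literal(TfRecordFile(path="s3://bucket/data.tfrecord"))
restored = transformer.to_python_value(literal)
```

The bucket path is a made-up placeholder. The point is only that the Python-side value is a thin handle around a URI, so the transformer does not need to parse the records itself.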
Niels Bantilan 10/26/2022, 8:36 PM
, `TFRecord`s and
in real life, and how do we create reasonable i/o and serialization/deserialization boundaries between these objects, again, in the context of how people actually use them, and in a way that provides value to users? I created a draft proposal on the PR: https://github.com/flyteorg/flytekit/pull/1240#issuecomment-1292623521
Dennis O'Brien 10/29/2022, 5:03 PM
to Flyte (and whether it's even a good idea) and created this feature request issue: https://github.com/flyteorg/flyte/issues/3038