Samhita Alla
Niels Bantilan
10/25/2022, 2:04 PM
`.tfrecord` is a TF-specific file format that we wanted to support out-of-the-box?
Samhita Alla
`tf.train.Example` type and it internally stores the data in a tfrecord file for serialization. I've been wondering why someone might use `tf.train.Example` as a data type.
Ryan Nazareth
10/25/2022, 3:57 PM
`.tfrecord` unless I misunderstood the requirements of the original issue. The main use cases seem to be image data and training on TPUs (see next comment in thread).
The alternative, I guess, is to provide support for a `tf.train.Features` type, which then gets converted to an `Example` type and serialised and stored as `.tfrecord`; see the image I/O example at the end of https://www.tensorflow.org/tutorials/load_data/tfrecord
`tf.Example` <-> `tf.record`
and quoting below from the docs regarding the use case:
“An important use case of the TFRecord data format is training on TPUs. First, TPUs are fast enough to benefit from optimized I/O operations. In addition, TPUs require data to be stored remotely (e.g. on Google Cloud Storage) and using the TFRecord format makes it easier to load the data without batch-downloading.”
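For reference, the `tf.train.Features` -> `Example` -> `.tfrecord` path described above can be sketched roughly as follows. This is a minimal illustration, assuming TensorFlow 2.x is installed; the feature names (`image_raw`, `label`), the fake image bytes and the temp-file path are made up for the example and are not taken from the issue or the PR.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical toy record: placeholder bytes standing in for an image, plus a label.
image_bytes = b"not-a-real-image"
label = 3

# tf.train.Features wraps a mapping of feature name -> tf.train.Feature.
features = tf.train.Features(
    feature={
        "image_raw": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[image_bytes])
        ),
        "label": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])
        ),
    }
)

# The Features message is wrapped in an Example, which is what gets serialised.
example = tf.train.Example(features=features)

# Write the serialised Example as a single record in a .tfrecord file.
path = os.path.join(tempfile.mkdtemp(), "sample.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    writer.write(example.SerializeToString())

# Read the raw record back and decode it to check the round trip.
raw_record = next(iter(tf.data.TFRecordDataset(path)))
decoded = tf.train.Example.FromString(raw_record.numpy())
```

Here `decoded` carries the same two features that were written, so a Flyte type that only handles the write side would still produce a file that plain TensorFlow code can consume.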
https://keras.io/examples/keras_recipes/creating_tfrecords/
Samhita Alla
`tf.data` <-> tfrecord conversion available. We may need to add support for `tf.data.Dataset` as well.
• IMO, this Flyte type should only perform conversion from `tf.train.Example`/`tf.data` to a tfrecord file (like the Flyte ONNX types), because the tfrecord format per se can be used while training the model, but not vice versa. We can also have a task to read the tfrecord file, but it can include a lot of customizations, e.g., https://www.tensorflow.org/tutorials/load_data/tfrecord#read_the_tfrecord_file & https://keras.io/examples/keras_recipes/creating_tfrecords/#train-a-simple-model-using-the-generated-tfrecords. So I'm not sure if we want a task to read the tfrecord file.
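To illustrate why the read side tends to be user-specific, here is a rough sketch, assuming TensorFlow 2.x. The feature names (`x`, `y`), the toy values and the helper names `to_example` and `parse` are hypothetical. Writing only needs the data, while reading needs a feature spec (names, dtypes, shapes) plus whatever batching the training loop wants.

```python
import os
import tempfile

import tensorflow as tf


def to_example(x: float, y: int) -> bytes:
    """Serialise one (float, int) pair as a tf.train.Example."""
    features = tf.train.Features(
        feature={
            "x": tf.train.Feature(float_list=tf.train.FloatList(value=[x])),
            "y": tf.train.Feature(int64_list=tf.train.Int64List(value=[y])),
        }
    )
    return tf.train.Example(features=features).SerializeToString()


# Write side: a small tf.data.Dataset is dumped record by record.
dataset = tf.data.Dataset.from_tensor_slices(([0.5, 1.5, 2.5], [0, 1, 0]))
path = os.path.join(tempfile.mkdtemp(), "pairs.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    for x, y in dataset.as_numpy_iterator():
        writer.write(to_example(float(x), int(y)))

# Read side: the caller has to supply the schema of each record.
feature_spec = {
    "x": tf.io.FixedLenFeature([], tf.float32),
    "y": tf.io.FixedLenFeature([], tf.int64),
}


def parse(record):
    """Decode one serialised Example using the user-provided feature spec."""
    return tf.io.parse_single_example(record, feature_spec)


# Batching (and shuffling, decoding, augmentation, ...) is also up to the user.
parsed = tf.data.TFRecordDataset(path).map(parse).batch(2)
batches = list(parsed.as_numpy_iterator())
```

The `feature_spec` and the `map`/`batch` pipeline are exactly the parts that differ from project to project, which is the customization concern raised above.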
cc: @Niels Bantilan @Ketan (kumare3)
Ryan Nazareth
10/26/2022, 1:03 PM
`tf.data`.
Also, regarding your second point, I'm assuming in that case I should create a new FlyteFile type, `TfRecordFile = FlyteFile[typing.TypeVar('tfrecord')]`, which would then be returned as `TfRecordFile(path=lv.scalar.blob.uri)` in the `def to_python_value(self, ...)` method of the TypeTransformer, similar to the PyTorch ONNX implementation?
Let me know once @Niels Bantilan and @Ketan (kumare3) have confirmed.
Niels Bantilan
10/26/2022, 8:36 PM
`tf.train.Example`, `TFRecord`s and `tf.data.Dataset` in real life, and how do we create reasonable I/O and serialization/deserialization boundaries between these objects, again, in the context of how people actually use them, and in a way that provides value to users?
I created a draft proposal on the PR: https://github.com/flyteorg/flytekit/pull/1240#issuecomment-1292623521
Samhita Alla
Dennis O'Brien
10/29/2022, 5:03 PM
`tf.data.Dataset` to Flyte (and whether it's even a good idea) and created this feature request issue: https://github.com/flyteorg/flyte/issues/3038