# hacktoberfest-2022
Also @Niels Bantilan, regarding, are there any use cases you could think of?
not sure I understand the question… I assumed
is a TF-specific file format that we wanted to support out-of-the-box?
This PR adds a
type and it internally stores the data in a tfrecord file for serialization. I’ve been wondering why someone might use
as a data type.
@Samhita Alla This is the pattern the TF docs suggest for serialising to
unless I misunderstood the requirements of the original issue. The main use case(s) seem to be image data and training on TPUs (see the next comment in the thread). The alternative, I guess, is to provide support for a
type which then gets converted to
type and serialised and stored as
- see the images i/o example at the end of
similar example(s) in the Keras documentation using
and quoting below from the docs regarding the use case “An important use case of the TFRecord data format is training on TPUs. First, TPUs are fast enough to benefit from optimized I/O operations. In addition, TPUs require data to be stored remotely (e.g. on Google Cloud Storage) and using the TFRecord format makes it easier to load the data without batch-downloading.”
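For context on what "stores the data in a tfrecord file" means on disk: according to the TensorFlow TFRecord guide, a TFRecord file is a sequence of length-prefixed binary records, each protected by masked CRC-32C checksums. In practice each record is a serialized `tf.train.Example`, and you would use `tf.io.TFRecordWriter` / `tf.data.TFRecordDataset` rather than hand-rolling this. The sketch below is a dependency-free illustration of that framing only, assuming plain `bytes` payloads; the function names are hypothetical and this is not the implementation in the PR.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator


def _make_crc32c_table():
    # Lookup table for reflected CRC-32C (Castagnoli polynomial 0x1EDC6F41).
    poly = 0x82F63B78
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    # TFRecord stores a "masked" CRC: rotate right by 15 bits, then add a constant.
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def write_records(stream: BinaryIO, records: Iterable[bytes]) -> None:
    # Each record: uint64 length | uint32 masked CRC of length | data | uint32 masked CRC of data
    for payload in records:
        header = struct.pack("<Q", len(payload))
        stream.write(header)
        stream.write(struct.pack("<I", masked_crc(header)))
        stream.write(payload)
        stream.write(struct.pack("<I", masked_crc(payload)))


def read_records(stream: BinaryIO) -> Iterator[bytes]:
    # Yields payloads in order; raises ValueError if a checksum does not match.
    while True:
        header = stream.read(8)
        if not header:
            return
        (length,) = struct.unpack("<Q", header)
        (header_crc,) = struct.unpack("<I", stream.read(4))
        if header_crc != masked_crc(header):
            raise ValueError("corrupt length header")
        payload = stream.read(length)
        (payload_crc,) = struct.unpack("<I", stream.read(4))
        if payload_crc != masked_crc(payload):
            raise ValueError("corrupt payload")
        yield payload


# Usage: round-trip a few payloads through an in-memory "file".
buf = io.BytesIO()
write_records(buf, [b"a", b"", b"hello"])
buf.seek(0)
print(list(read_records(buf)))
```

Because the format is only framing, the interesting I/O boundary for a Flyte type is what goes *inside* each record (e.g. serialized `tf.train.Example` protos), which is exactly the question raised further down the thread.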
@Ryan Nazareth, makes sense. I have a couple of suggestions: • There’s no
<-> tfrecord conversion available. We may need to add support for
as well. • IMO, this Flyte type should only perform conversion from `tf.train.Example`/`` to a tfrecord file (like the Flyte ONNX types), because the tfrecord format itself can be used while training the model, but not vice versa. We can also have a task to read the tfrecord file, but it can include a lot of customizations, e.g., & So I'm not sure if we want a task to read the tfrecord file. cc: @Niels Bantilan @Ketan (kumare3)
@Samhita Alla Sure, I can make the change and add support for
. Also, regarding your second point, I'm assuming that in that case I should create a new FlyteFile type
which would then be returned as `TfRecordFile(path=lv.scalar.blob.uri)` in the
def to_python_value(self, ...)
method of the TypeTransformer, similar to the PyTorch ONNX implementation? Let me know once @Niels Bantilan and @Ketan (kumare3) have confirmed.
So after discussing with @Eduardo Apolinario (eapolinario) and thinking about this a little more, I think we still need to discuss the core question of: how do people actually use
, `TFRecord`s and
in real life, and how do we create reasonable i/o and serialization/deserialization boundaries between these objects, again in the context of how people actually use them, and in a way that provides value to users? I created a draft proposal on the PR:
@Samhita Alla @Ryan Nazareth @Ketan (kumare3) @Eduardo Apolinario (eapolinario) @Yee would appreciate your thoughts/feedback
Thanks, @Niels Bantilan! Added a suggestion.
Related to this discussion, I've been thinking about how to add support for
to Flyte (and whether it's even a good idea) and created this feature request issue: