Ryan Nazareth11/23/2022, 2:28 PM
, which has tasks to generate TFRecordFile and TFRecordDirectory from tf.train.Example outputs and subsequently deserialise back to
if required. The bit thats causing an error is
which, seems to not be recognising the Annotated TFRecordDatasetV2 type although it should be the expected type in the transformers i have registered here flytekit/extras/tensorflow/record.py. I presume this is why it is defaulting to FlytePickle as in the traceback (this is only my assumption and based on some other logic i have seen elsewhere in the codebase). Also, my other test
@task def consume(dataset: Annotated[TFRecordDatasetV2, TFRecordDatasetConfig(name="testing")]):
which is testing the
logic works fine so i feel something else is an issue which I am finding difficult to spot at the moment.
expected_python_type = typing.Annotated[flytekit.types.pickle.pickle.FlytePickle.__class_getitem__.<locals>._SpecificFormatClass, TFRecordDatasetConfig(compression_type=None, buffer_size=None, num_parallel_reads=None, name='testing')] def to_python_value(self, ctx: FlyteContext, lv: Literal, expected_python_type: Type[T]) -> T: uri = lv.scalar.blob.uri # Deserialize the pickle, and return data in the pickle, # and download pickle file to local first if file is not in the local file systems. if ctx.file_access.is_remote(uri): local_path = ctx.file_access.get_random_local_path() ctx.file_access.get_data(uri, local_path, False) uri = local_path with open(uri, "rb") as infile: > data = cloudpickle.load(infile) E _pickle.UnpicklingError: invalid load key, '5'.
Ryan Nazareth11/23/2022, 5:20 PM
but not either of
for which you defined TypeTransformers, and hence it’s defaulting to pickle.
Ryan Nazareth11/24/2022, 1:28 PM
for example which is doing something similar ie. transformer defined for TensorFlow2ONNX and converting to ONNXFile type in `to_literal`: https://github.com/flyteorg/flytekit/blob/430795d9ee4aa48957554a6eb1446fa80993681a[…]ytekit-onnx-tensorflow/flytekitplugins/onnxtensorflow/schema.py in the associated test
here (https://github.com/flyteorg/flytekit/blob/430795d9ee4aa48957554a6eb1446fa80993681a/plugins/flytekit-onnx-tensorflow/tests/test_onnx_tf.py#L49-L75 the model promise object is is annotated in output from task
which is then subsequently passed into downstream task
Annotated[ TensorFlow2ONNX, TensorFlow2ONNXConfig( input_signature=(tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),), opset=13 ), ],
, but the parameter is annotated as the expected type
. This is similar to what im trying to do here with
(which ive annoated in upstream task) and then need to handle conversion to
when it is passed into new task, as per the design suggested by @Niels Bantilan in the last code block here https://github.com/flyteorg/flytekit/pull/1240#issuecomment-1292623521
is a FlyteFile for which a TypeTransformer has already been defined. Hence, it won’t default to pickle. What is
? Can you tell me the type of it?
Ryan Nazareth11/25/2022, 12:54 PM
. - so i would need to register this in the transformer i have defined along with
but can multiple types be registered in same transformer via Union like below ?
TENSORFLOW_FORMAT = "TensorFlowRecord" def __init__(self): super().__init__(name="TensorFlow Record", t=Union[TFRecordFile, TFRecordDatasetV2])
? Can’t you use
Ryan Nazareth11/25/2022, 1:15 PM
constuctor with filenames (maybe i misunderstood), as was discussed in https://github.com/flyteorg/flytekit/pull/1240#issuecomment-1293507708
to denote that the value is
, right? The reason I used two separate types in ONNX Flyte type is because
returns a file and it made more sense to use
as the type when the value is consumed in a task. Moreover, the
type needed annotated metadata while the other,
doesn’t require any metadata cause it’s just a file. So what’s the type of
? In ONNX, it was a file, but here, it isn’t case. I didn’t add any additional type transformer for the file cause it already exists. I just added a type:
, similar to
Ryan Nazareth11/25/2022, 3:36 PM
, part of tensorflow dataset api which is wrapper and container for storing collections (in our case tensorflow record files). I don’t think it falls into any of the existing supported types in flytekit/types which i could use or create custom types from. Do you mean what does
return ? I think its
but can check.
and supported) and also include metatdata (
) which user would provide to instantiate
, if that makes sense ?
), you’d say the type to be
and while accepting it as an input (
), you’d say
. Let me know what you think. This should simplify implementing the transformer.
, respectively. Since input will have metadata, that’s where you need to use the user-provided metadata.
Ryan Nazareth11/28/2022, 5:37 AM
needs to be supported here, as I can see there is another issue created for supporting this as another transformer https://github.com/flyteorg/flyte/issues/3038 ?
should be good for now. 🙂
Ryan Nazareth11/28/2022, 11:59 PM
! I’ve added more tests as well and they all pass (ci build failures unrelated to this PR) ! Ive pushed latest changes https://github.com/flyteorg/flytekit/pull/1240/files
Ryan Nazareth11/29/2022, 9:45 PM