melodic-magician-71351
04/17/2023, 11:23 AM
`FlyteFile[Format]` syntax with `StructuredDataset`s? It looks like they are backed by a different proto, so it's not clear to me how that works.
freezing-airport-6809
melodic-magician-71351
04/17/2023, 1:34 PM
`FlyteFile[structured_dataset.PARQUET]`?
freezing-airport-6809
freezing-airport-6809
melodic-magician-71351
04/17/2023, 2:03 PM
melodic-magician-71351
04/17/2023, 2:07 PM
`FlyteFile[PyTorchModule]` etc. and then creating a FlyteFile when registering the launch plan, but I'm not sure if there's a way to do this with a `StructuredDataset`
melodic-magician-71351
04/17/2023, 2:23 PM
`FlyteFile[structured_dataset.PARQUET]` works fine for inputting as a launch plan parameter, but we can't figure out how to pass the promise to a task that takes a `StructuredDataset` as an input.
melodic-magician-71351
04/17/2023, 2:25 PM
`nn.Module` or `np.ndarray`). That doesn't work for `StructuredDataset`. We get an error (will paste when I find it) saying that the input type doesn't match the expected type.
melodic-magician-71351
04/17/2023, 2:27 PM
glamorous-carpet-83516
04/17/2023, 3:48 PM
`StructuredDataset.uri`
enough-car-91616
04/17/2023, 6:07 PM
@task
def do_task(a: StructuredDataset) -> int:
    ...

@workflow
def do_workflow(a: FlyteFile[structured_dataset.PARQUET]):
    ...

LaunchPlan.create('PlanB', do_workflow, default_inputs={'a': FlyteFile('<gs://path_to_flyte_parquet_output>')})
And this is the error we are getting:
Error 0: Code: MismatchingTypes, Node Id: n1, Description: Variable [a] (type [blob:<format:"parquet" > ]) doesn't match expected type [structured_dataset_type:<> ].
Using `FlyteFile[NumpyArrayTransformer.NUMPY_ARRAY_FORMAT]` in the workflow and receiving `np.ndarray` in the task works fine.
thankful-minister-83577
thankful-minister-83577
thankful-minister-83577
enough-car-91616
04/17/2023, 9:30 PM
`default_inputs={'a': StructuredDataset(uri='gs://...', file_format=structured_dataset.PARQUET)}`
This runs into the following error:
TypeError: int() argument must be a string, a bytes-like object or a number, not '_NoValueType'
when reading the received parameter in the task using:
`a.open(pd.DataFrame).all()`
(I declared `a` as being `StructuredDataset` in the task)
enough-car-91616
04/17/2023, 9:30 PM
enough-car-91616
04/18/2023, 11:35 PM
@task
def do_task(a: Annotated[StructuredDataset, kwtypes(my_column=float)]) -> int:
    ...

@workflow
def do_workflow(a: StructuredDataset):
    ...

LaunchPlan.create('PlanB', do_workflow, default_inputs={'a': StructuredDataset(uri='gs://...', file_format=PARQUET)})
Obviously one can just use `StructuredDataset` instead of `Annotated[StructuredDataset, kwtypes(my_column=float)]`.
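
For anyone reading this thread later: the `MismatchingTypes` error quoted earlier can be pictured with a small toy model. This is only a sketch, not Flyte's actual implementation. The class names, the `types_match` helper, and the rule that an empty format accepts any format are all assumptions made for illustration. It shows one idea from the thread: a file-style input and an `np.ndarray` input both sit in the blob family, while `StructuredDataset` sits in a different family, so a blob never satisfies a structured dataset input regardless of its format string.

```python
from dataclasses import dataclass

# Toy stand-ins for the two type families named in the error message.
# These are NOT Flyte's real classes; they only mimic the shape of the check.


@dataclass(frozen=True)
class BlobType:
    format: str = ""


@dataclass(frozen=True)
class StructuredDatasetType:
    format: str = ""


def types_match(upstream, downstream) -> bool:
    """Return True when a value typed `upstream` may feed an input typed `downstream`."""
    # Different type families never match, whatever the format string says.
    if type(upstream) is not type(downstream):
        return False
    # Assumption for this sketch: an empty format on the receiving side accepts anything.
    return downstream.format in ("", upstream.format)


# Hypothetical mapping from the annotations in the thread to the toy types.
# The "NumpyArray" format string is an assumption for illustration.
flyte_file_parquet = BlobType(format="parquet")      # FlyteFile[structured_dataset.PARQUET]
numpy_file = BlobType(format="NumpyArray")           # FlyteFile[...NUMPY_ARRAY_FORMAT]
ndarray_input = BlobType(format="NumpyArray")        # task input typed np.ndarray
structured_dataset_input = StructuredDatasetType()   # task input typed StructuredDataset

print(types_match(numpy_file, ndarray_input))                    # True
print(types_match(flyte_file_parquet, structured_dataset_input))  # False
print(types_match(StructuredDatasetType(format="parquet"), structured_dataset_input))  # True
```

Under these assumptions, the second call corresponds to the failing `FlyteFile[structured_dataset.PARQUET]` workflow input, and the third to declaring the workflow input as `StructuredDataset` directly.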