Scott Blackwood
09/27/2023, 12:15 PM
@dataclass_json
@dataclass
class InputDocument:
    document_id: str
    s3_path: str
The workflow looks like this and goes through multiple tasks that return intermediate dataclass types:
def my_workflow(inputs: list[InputDocument]) -> list[ProcessedDocument]:
...
When executing the workflow via FlyteRemote.execute, I get the following error:
File "flytekit/core/type_engine.py", line 321, in assert_type
for f in dataclasses.fields(type(v)): # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/dataclasses.py", line 1244, in fields
raise TypeError('must be called with a dataclass type or instance') from None
TypeError: must be called with a dataclass type or instance
With the data:
{
    "inputs": [
        {
            "document_id": "test1",
            "s3_path": "s3://inputs/123.txt"
        }
    ]
}
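For context, the traceback can be reproduced with the standard library alone: dataclasses.fields raises exactly this TypeError when handed a list rather than a dataclass, which is what happens if the type engine treats the whole inputs list as a single dataclass value. A minimal stdlib-only sketch (InputDocument redeclared here without the Flyte decorators):

```python
from dataclasses import dataclass, fields

@dataclass
class InputDocument:
    document_id: str
    s3_path: str

docs = [InputDocument(document_id="test1", s3_path="s3://inputs/123.txt")]

# fields() works on a dataclass instance...
print([f.name for f in fields(docs[0])])  # ['document_id', 's3_path']

# ...but not on the surrounding list, which is the failure mode in the traceback
try:
    fields(docs)
except TypeError as e:
    print(e)  # must be called with a dataclass type or instance
```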
Essentially, FlyteRemote thinks that the list should be a dataclass. Any idea what is going on here? It seems like flytekit can't handle a native list of dataclasses.
I can get the workflow to run by adding an additional type_hints={"inputs": list[dict[str, str]]}, but I would really rather not constrain the workflow to this. I want to be able to arbitrarily call multiple different workflows with different inputs.
Joe Kelly
09/27/2023, 5:26 PM
If your InputDocument class extends DataClassJsonMixin (from dataclasses_json), can you see if that resolves your issue?
Scott Blackwood
09/27/2023, 6:29 PM
Kevin Su
09/27/2023, 10:48 PM
typing.List instead of list
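For what it's worth, typing.List[InputDocument] and the builtin list[InputDocument] carry the same element type at runtime, so the workaround presumably papers over a gap in how this flytekit version introspects builtin generics rather than a real semantic difference. A quick stdlib check (with a placeholder InputDocument dataclass standing in for the real one):

```python
import typing
from dataclasses import dataclass

@dataclass
class InputDocument:
    document_id: str
    s3_path: str

# Both spellings expose the same origin and args via typing introspection,
# which is what a type engine would normally inspect.
for hint in (typing.List[InputDocument], list[InputDocument]):
    print(typing.get_origin(hint), typing.get_args(hint))
```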
Scott Blackwood
09/29/2023, 10:56 AM