Ena Škopelja

03/28/2023, 12:05 PM
Hi all, I have a task returning a StructuredDataset that's failing if I turn on cache_serialize with this error:
[3/3] currentAttempt done. Last Error: SYSTEM::Traceback (most recent call last):

      File "/opt/venv/lib/python3.9/site-packages/flytekit/exceptions/scopes.py", line 165, in system_entry_point
        return wrapped(*args, **kwargs)
      File "/opt/venv/lib/python3.9/site-packages/flytekit/core/base_task.py", line 572, in dispatch_execute
        raise TypeError(
    Failed to convert return value for var o0 for function {my task} with error <class 'pyarrow.lib.ArrowInvalid'>: ("Could not convert ('... ... (580 characters truncated) ... ...',) with type Row: did not recognize Python value type when inferring an Arrow data type", 'Conversion failed for column sequence with type object')
The sequence column of the dataframe I return is a string (no non-standard characters). Any idea on what could be causing this? Regular (no cache_serialize) cache works just fine.

Ketan (kumare3)

03/28/2023, 1:03 PM
IMO cache_serialize has nothing to do with it. It looks like a data issue: some value in the column is unexpected.
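The error text says the failing value has "type Row", which hints that at least one cell in the column may not be a plain str. A minimal sketch for checking that, using only the standard library: the helper names `type_counts` and `non_string_positions` are made up for illustration, and `Row` below is a hypothetical stand-in, since the thread does not say where the real Row type comes from.

```python
from collections import Counter, namedtuple


def type_counts(values):
    """Count how many values of each Python type appear in an iterable."""
    return Counter(type(v).__name__ for v in values)


def non_string_positions(values):
    """Return (index, type name) for every value that is not a str."""
    return [
        (i, type(v).__name__)
        for i, v in enumerate(values)
        if not isinstance(v, str)
    ]


# Hypothetical stand-in for whatever "Row" type the error refers to.
Row = namedtuple("Row", ["sequence"])

# Illustrative column: two real strings and one wrapped value.
column = ["ACGT", "TTGA", Row("GGCC")]

print(type_counts(column))
print(non_string_positions(column))
```

With a pandas dataframe, the same helpers could be called on `df["sequence"]` inside the task just before returning, since a Series is iterable. If anything other than str shows up, converting those cells to strings before returning would be one thing to try.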

Ena Škopelja

03/29/2023, 8:47 AM
Why does it work with regular cache then? It's a dataclass with string columns
I don't understand what it could be. I can provide an example but the column it's complaining about is just a string column
@Franziska Geiger

Ketan (kumare3)

03/29/2023, 1:26 PM
Hmm this is really odd
I want to help, but I cannot reproduce it
If you can help us reproduce