Dylan Wilder

    1 month ago
    Upgrading to flytekit 1.0 and i have a task taking a single schema which can't marshal it to a dataframe because this conditional is false: `lv and lv.scalar and lv.scalar.schema`. looking at the inputs it seems that old versions of the task don't have the `format` field in the flyte literal, but now they do. also looks like this code returns an empty dataframe rather than erroring. any context on these changes that might help point to the behavior we're seeing?
    cc @Govind Raghu
    also cc @Babis Kiosidis for visibility
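To make the failure mode above concrete, here is a minimal sketch of a guard that raises instead of silently producing an empty result when the conditional is false. The `Literal` and `Scalar` classes and the `schema_uri_or_raise` helper are hypothetical stand-ins written for this illustration; they are not flytekit classes and only assume that a literal's scalar carries either a schema or a structured dataset.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for the literal messages discussed in the thread.
# These are NOT flytekit classes, only minimal shapes for illustration.
@dataclass
class Scalar:
    schema: Optional[str] = None              # set for schema-style literals
    structured_dataset: Optional[str] = None  # set for structured-dataset literals


@dataclass
class Literal:
    scalar: Optional[Scalar] = None


def schema_uri_or_raise(lv: Optional[Literal]) -> str:
    """Return the schema URI, raising rather than falling through silently."""
    # Same shape as the conditional quoted in the message above.
    if lv and lv.scalar and lv.scalar.schema:
        return lv.scalar.schema
    # Report what was actually received so the mismatch is visible.
    if lv and lv.scalar and lv.scalar.structured_dataset:
        kind = "structured dataset"
    else:
        kind = "unknown"
    raise TypeError(f"expected a schema literal, got: {kind}")
```

With a guard like this, a structured-dataset literal passed to a schema-only reader would surface as a `TypeError` at the point of conversion rather than as an empty dataframe downstream.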
    Ketan (kumare3)

    1 month ago
    Cc @Yee / @Niels Bantilan
    @Kevin Su
    Yee

    1 month ago
    can you copy your type signatures for the failing task and also the tasks responsible for any inputs to the failing task?
    is anything cached?
    also can you send the full stack trace?
    Dylan Wilder

    1 month ago
    i can, but unfortunately this is an error in JasperSchema (or proto-structureddataset that we need to deprecate) so it won't point to your code
    the main diff on flytebackend is the addition of the `format` field in the protobuf repr
    Yee

    1 month ago
    yeah
    but it’s a different proto message.
    Dylan Wilder

    1 month ago
    signature is basically
    @task
    def t1() -> pd.DataFrame:
        ...
    @task
    def t2(s: JasperSchema) -> ???:
        ...

    @workflow
    def w():
        t2(s=t1())
    Yee

    1 month ago
    we didn’t add the field to schema, we changed the backend representation entirely of what a `pd.DataFrame` is. now it’s this
    but similar to flyteschema, the type message itself is in the literal, which is why you’re seeing it
    Dylan Wilder

    1 month ago
    oh shoot so it's serialized as a structured dataset 😢
    well that explains it
    Yee

    1 month ago
    yeah 😞
    sorry we didn’t think of this use-case.
    Dylan Wilder

    1 month ago
    ok i think we can fix this by changing t1 to output jasperschema
    Yee

    1 month ago
    yes
    should be able to at least.
    jasper is the thing that writes to bq?
    Dylan Wilder

    1 month ago
    yea, it's tech debt we need to unwind but want to upgrade to 1.0 first
    should this not be caught at registration time btw?
    Yee

    1 month ago
    i guess… it’s hard to know though right?
    if that had been FlyteSchema instead of JasperSchema, it should work
    cuz we also added code going the other way
    so the default flyteschema transformer should know how to handle cached structureddatasets
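The registration-time question can be pictured as a check on each workflow edge. The sketch below is purely illustrative: the `READABLE_KINDS` table and `check_edge` helper are hypothetical, and the entries simply encode what is suggested in this thread (a FlyteSchema input may be able to read structured datasets, while a custom type such as JasperSchema only reads schema literals). It is not how flytekit decides compatibility.

```python
# Hypothetical compatibility table: which literal kinds each declared input
# type is assumed to be able to read. Illustrative only, not flytekit's rules.
READABLE_KINDS = {
    "FlyteSchema": {"schema", "structured_dataset"},
    "JasperSchema": {"schema"},
}


def check_edge(producer_kind: str, consumer_type: str) -> None:
    """Raise if the consumer type is not known to read the producer's literal kind."""
    readable = READABLE_KINDS.get(consumer_type, set())
    if producer_kind not in readable:
        raise TypeError(f"{consumer_type} cannot read a {producer_kind} literal")


# A pd.DataFrame output is now serialized as a structured dataset (per the thread),
# so this edge would be accepted under the assumptions above:
check_edge("structured_dataset", "FlyteSchema")
```

The difficulty raised above is that such a table cannot be known in general for user-defined types, which is why a mismatch like this one may only show up at run time.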
    Dylan Wilder

    1 month ago
    oh, so flyteschemas can read structured datasets
    got it, ok that's good to know, it's an issue on our side
    Yee

    1 month ago
    hang on… am i forgetting things again