Hi team, I recently try my workflow in v1.14.0 fly...
# flyte-support
h
Hi team, I recently tried my workflow with flytekit v1.14.0 and hit a strange issue when working with a Union type. I have a task A that returns a Union type, Plan.
def create_plan() -> Plan:
    report_plan = ReportPlans.simple(
        model_plan,
        *charts,
        report_type=report_type,
    )

    return report_plan

# ReportPlans.simple will return SimpleReportPlan.

ReportPlan = Union[SimpleReportPlan, UserDefinedReportPlan]
ReleasePlan = Union[
    SimpleModelReleasePlan,
    GetModelShaReleasePlan,
    MapReleasePlan,
    SelectModelThresholdsReleasePlan,
    SetThresholdsReleasePlan,
    AttachFeatureGeneratorsReleasePlan,
    ModelDeployReleasePlan,
    ActivateReleaseGatesReleasePlan,
]
# Plan is defined last so that both aliases it refers to already exist.
Plan = Union[ReportPlan, ReleasePlan]
However, when I run this task, it fails with:
Traceback (most recent call last):
      File "/app/src/python/flyte/ml_exploration/mlp_example/py_mlp_example.binary.runfiles/py_deps_ml_golden_202407_00_00_py311/flytekit/flytekit/flytekit/core/base_task.py", line 800, in dispatch_execute
        literals_map, native_outputs_as_map = run_sync(
                                              ^^^^^^^^^
      File "/app/src/python/flyte/ml_exploration/mlp_example/py_mlp_example.binary.runfiles/py_deps_ml_golden_202407_00_00_py311/flytekit/flytekit/flytekit/utils/asyn.py", line 93, in run_sync
        return self._runner_map[name].run(coro)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/app/src/python/flyte/ml_exploration/mlp_example/py_mlp_example.binary.runfiles/py_deps_ml_golden_202407_00_00_py311/flytekit/flytekit/flytekit/utils/asyn.py", line 72, in run
        res = fut.result(None)
              ^^^^^^^^^^^^^^^^
      File "/app/src/python/flyte/ml_exploration/mlp_example/py_mlp_example.binary.runfiles/python_interpreter_3_11/lib/python3.11/concurrent/futures/_base.py", line 456, in result
        return self.__get_result()
               ^^^^^^^^^^^^^^^^^^^
      File "/app/src/python/flyte/ml_exploration/mlp_example/py_mlp_example.binary.runfiles/python_interpreter_3_11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
        raise self._exception
      File "/app/src/python/flyte/ml_exploration/mlp_example/py_mlp_example.binary.runfiles/py_deps_ml_golden_202407_00_00_py311/flytekit/flytekit/flytekit/core/base_task.py", line 655, in _output_to_literal_map
        raise e
      File "/app/src/python/flyte/ml_exploration/mlp_example/py_mlp_example.binary.runfiles/py_deps_ml_golden_202407_00_00_py311/flytekit/flytekit/flytekit/core/type_engine.py", line 1425, in async_to_literal
        lv = await transformer.async_to_literal(ctx, python_val, python_type, expected)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/app/src/python/flyte/ml_exploration/mlp_example/py_mlp_example.binary.runfiles/py_deps_ml_golden_202407_00_00_py311/flytekit/flytekit/flytekit/core/type_engine.py", line 1934, in async_to_literal
        raise TypeTransformerFailedError(f"Cannot convert from {python_val} to {python_type}")
    flytekit.core.type_engine.TypeTransformerFailedError: Failed to convert outputs of task 'src.python.flyte.ml_exploration.mlp_example.main.create_plan' at position 0.
    Failed to convert type <class 'src.python.flyte.ml_pipeline.config.SimpleReportPlan'> to type typing.Union[src.python.flyte.ml_pipeline.config.SimpleReportPlan, src.python.flyte.ml_pipeline.config.UserDefinedReportPlan, src.python.flyte.ml_pipeline.config.SimpleModelReleasePlan, src.python.flyte.ml_pipeline.config.GetModelShaReleasePlan, src.python.flyte.ml_pipeline.config.MapReleasePlan, src.python.flyte.ml_pipeline.config.SelectModelThresholdsReleasePlan, src.python.flyte.ml_pipeline.config.SetThresholdsReleasePlan, src.python.flyte.ml_pipeline.config.AttachFeatureGeneratorsReleasePlan, src.python.flyte.ml_pipeline.config.ModelDeployReleasePlan, src.python.flyte.ml_pipeline.config.ActivateReleaseGatesReleasePlan].
Could someone point me in the right direction for troubleshooting?
d
does this work in flytekit 1.13.15?
h
Yeah... We use dataclass + dataclass_json
d
interesting
h
Sorry, it doesn't work with 1.13.0. It works with 1.11.0.
d
so in 1.14.0
we have stricter type checking for Union types
I guess old flytekit supported Union with ambiguous types (which is a bug) in some edge cases
and we fixed it in the new flytekit
h
What fix do you suggest here?
d
can you run pyflyte run -vv xxx.py xxx?
I want to read more of the logs to help you
also please help me check whether there's any ambiguity problem
h
can you run pyflyte run -vv xxx.py xxx?
I believe I can!
Which specific file should I look at?
d
run the code that gives you the error
I mean you got this error by running
pyflyte run xxx.py t1
h
Gotcha
Untitled
Hi Han-Ru, I am still trying to run the code in my devbox, but I'm having some difficulty there. I did find the full log in flyteadmin; it is attached above.
I was able to run the code. The log follows:
Untitled
h
@helpful-van-10149, @thankful-minister-83577 and I were tracing through the code and the conclusion is twofold:
1. This is not an ambiguous transformer error: https://github.com/flyteorg/flytekit/blob/master/flytekit/core/type_engine.py#L1909
2. The union transformer tries to convert the value into a literal using each of the types described in the Union type, and (in flytekit 1.14.0) it logs a warning if an attempt fails. If all those calls fail, we raise a TypeTransformerFailedError.
Since you have a local repro, can you bump the verbosity of the flytekit logs, try again, and reason from the exception why the Dataclass transformer is failing in all cases?
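For readers following along, the try-each-variant behaviour described in this thread can be sketched roughly as below. This is a simplified illustration only, not flytekit's actual implementation: `ConversionFailed`, `convert_union`, `to_int`, and `to_str` are hypothetical names invented for this sketch.

```python
from typing import Any, Callable, Dict, List, Tuple


class ConversionFailed(Exception):
    """Hypothetical stand-in for a single variant failing to convert a value."""


def convert_union(value: Any, converters: Dict[str, Callable[[Any], Any]]) -> Tuple[str, Any]:
    """Try every variant; succeed only if exactly one converter accepts the value."""
    matches: List[Tuple[str, Any]] = []
    failures: List[str] = []
    for name, convert in converters.items():
        try:
            matches.append((name, convert(value)))
        except ConversionFailed as exc:
            # A failed attempt is only recorded (flytekit logs it); the loop continues.
            failures.append(f"{name}: {exc}")
    if len(matches) > 1:
        raise TypeError("Ambiguous choice of variant for union type")
    if not matches:
        raise TypeError(f"Cannot convert {value!r}; attempts: {failures}")
    return matches[0]


def to_int(value: Any) -> int:
    if not isinstance(value, int):
        raise ConversionFailed("not an int")
    return value


def to_str(value: Any) -> str:
    if not isinstance(value, str):
        raise ConversionFailed("not a str")
    return value


print(convert_union(1, {"int": to_int, "str": to_str}))
```

The two failure modes are distinct: zero matching variants gives a "cannot convert" style error (the case in the original traceback), while two or more matching variants gives the "ambiguous" error.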
h
Thank you all!
bump the verbosity of the flytekit logs
May I know how I can do this?
btw, I also see a lot of logs like this:
19:53:52.172255 ERROR    type_engine.py:616 - Failed to extract schema for
                         object <class
                         'src.python.flyte.ml_pipeline.config.SimpleReportPlan'>
                         , error: Type
                         src.python.flyte.ml_pipeline.config.CreateModelProtocol
                         of field "create_model_fn" in
                         src.python.flyte.ml_pipeline.config.CreateBuiltInModelP
                         lan isn't supported
                         Please remove `DataClassJsonMixin` and `dataclass_json`
                         decorator from the dataclass definition
19:53:52.192758 WARNING  type_engine.py:639 - Failed to extract schema for
                         object <class
                         'src.python.flyte.ml_pipeline.config.SimpleReportPlan'>
                         , (will run schemaless) error: Field.__init__() takes 1
                         positional argument but 2 were givenIf you have
                         postponed annotations turned on (PEP 563) turn it off
                         please. Postponedevaluation doesn't work with json
                         dataclasses
19:53:52.210011 ERROR    type_engine.py:616 - Failed to extract schema for
                         object <class
                         'src.python.flyte.ml_pipeline.config.UserDefinedReportP
                         lan'>, error: Type
                         src.python.flyte.ml_pipeline.config.ReportProtocol of
                         field "report_fn" in
                         src.python.flyte.ml_pipeline.config.UserDefinedReportPl
                         an isn't supported
                         Please remove `DataClassJsonMixin` and `dataclass_json`
                         decorator from the dataclass definition
19:53:52.227690 WARNING  type_engine.py:639 - Failed to extract schema for
                         object <class
                         'src.python.flyte.ml_pipeline.config.UserDefinedReportP
                         lan'>, (will run schemaless) error: Field.__init__()
                         takes 1 positional argument but 2 were givenIf you have
                         postponed annotations turned on (PEP 563) turn it off
                         please. Postponedevaluation doesn't work with json
                         dataclasses
Should I set export FLYTE_SDK_LOGGING_LEVEL=10?
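As a quick reference for the numeric values used in this thread, here is a small sketch that maps them to Python's standard logging level names. It assumes flytekit interprets FLYTE_SDK_LOGGING_LEVEL as a standard `logging` level number; `describe_level` is a hypothetical helper, not a flytekit function.

```python
import logging


def describe_level(raw: str) -> str:
    """Map a numeric level string such as "10" to its standard logging name."""
    return logging.getLevelName(int(raw))


# Values as they would be passed via the environment variable.
for raw in ("10", "30"):
    print(raw, describe_level(raw))
```

Under that assumption, 10 is the most verbose of the two values (DEBUG), while 30 keeps only warnings and above. The variable needs to be exported in the shell before running pyflyte so that it is set when flytekit configures its logger.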
h
was about to type ^
t
also set the developer logger
h
FLYTE_SDK_DEV_LOGGING_LEVEL
Is it this one?
Untitled
Many more logs now
I tried to repro with a simple workflow:
from dataclasses import dataclass
from typing import Union

from flytekit import task, workflow

@dataclass
class A:
    a: int

@dataclass
class B:
    a: int

YOUNG = Union[A, B]

@task
def foo(inp: int) -> YOUNG:
    return B(a=inp)

@task
def foo2(inp: YOUNG) -> int:
    return inp.a

@workflow
def wf():
    inp_return = foo(inp=1)
    foo2(inp=inp_return)
But this error seems to fall into the ambiguous case:
Message:

    TypeError: Failed to convert outputs of task 'src.python.flyte.ml_exploration.airail.sweeps_example.main.foo' at position 0.
    Failed to convert type <class 'src.python.flyte.ml_exploration.airail.sweeps_example.main.B'> to type typing.Union[src.python.flyte.ml_exploration.airail.sweeps_example.main.A, src.python.flyte.ml_exploration.airail.sweeps_example.main.B].
    Error Message: Ambiguous choice of variant for union type.
d
yes this should fail
local execution: the dataclass is converted to msgpack bytes, and both variants can be converted to msgpack bytes
remote execution: the backend compiler checks the JSON schemas, judges them to be the same, and fails
h
What should we change to make the code run?
d
turn
@dataclass
class B:
    a: int
to
@dataclass
class B:
    b: int
don't use the same attr name
turn
@dataclass
class A:
    a: int

@dataclass
class B:
    a: int
to
@dataclass
class A:
    a: int

@dataclass
class B:
    b: int
t
@helpful-van-10149 there’s no notion of mro or class definitions when working with dataclasses/pydantic models, right? Not having this is what allows these objects to work in a flyteremote/jupyter context. flytekit is comparing the json schemas of these objects. In this most recent example, yes, it’s a conflict because the dataclass/pydantic can’t tell the two apart. (the fact that one is named A and one is named B is irrelevant - the schemas are identical)
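To make the "schemas are identical" point concrete, here is a minimal stdlib-only sketch. The `shape` helper is a hypothetical, crude stand-in for a JSON schema (field name to declared type name), and `RenamedB` is an illustrative name for the fix suggested earlier; none of this is flytekit's actual comparison logic.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class A:
    a: int


@dataclass
class B:
    a: int


@dataclass
class RenamedB:
    b: int


def shape(cls) -> dict:
    """Crude stand-in for a JSON schema: field name -> declared type name."""
    return {f.name: getattr(f.type, "__name__", str(f.type)) for f in fields(cls)}


# Same field names and types: the class name is the only difference.
print(shape(A) == shape(B))
# Serialized payloads are indistinguishable too.
print(asdict(A(a=1)) == asdict(B(a=1)))
# Renaming the attribute makes the two variants distinguishable.
print(shape(A) == shape(RenamedB))
```

Once the class name is dropped, nothing remains to tell A from B, which is why renaming the attribute resolves the ambiguity in the simple repro.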
@damp-lion-88352 we can do a better job with the error messages though… the error messages in the union transformer in particular aren’t as helpful as i’d like.
d
YES can do this today
thank you
t
like i was seeing this in testing…
TypeError: Ambiguous choice of variant for union type. Both Object-Dataclass-Transformer and Object-Dataclass-Transformer transformers match
we can make that more clear - let’s print the user type also
h
@thankful-minister-83577 @damp-lion-88352 Thanks for explaining the simple workflow, I get it now. What about the actual workflow's issue?
d
in a Union type, we iterate over every type in the Union and see if we can transform the python val to a literal
in your case
there is more than 1 possibility
so you have to change the attr names of some dataclass types
or it will not work
when iterating over every type, if we can transform to >1 kind of type, then we raise an error
this PR can fix your problem
do you mind taking a look? cc @helpful-van-10149
h
This PR to improve the ambiguous message is a step in the right direction for flytekit, but I'm not convinced that's what's causing the issue @helpful-van-10149 is seeing. The reason I say this is that we don't see any mention of ambiguous in the stack traces he shared. In fact, this message is intriguing. I want to confirm that all attempts to turn the python value into a literal are failing in the Union transformer loop, so we end up raising the TypeTransformerFailedError (btw, does that error message about Field.__init__ make sense?). I'm especially interested in these exceptions, and since you have a local repro, if you set FLYTE_SDK_LOGGING_LEVEL=30 you should be able to see those without a lot of clutter. Also, can you confirm the versions of the dependencies you're running in each case? Specifically, what are the versions of dataclasses-json, mashumaro, and marshmallow when you run this in flytekit 1.11.0 and 1.14.0?
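One way to collect those versions from inside each environment is the standard library's `importlib.metadata`. This is a small sketch; `installed_versions` is a hypothetical helper name, and the package list is just the one asked about in this thread.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if it is absent."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    wanted = ["flytekit", "dataclasses-json", "mashumaro", "marshmallow"]
    for name, ver in installed_versions(wanted).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this once in the flytekit 1.11.0 environment and once in the 1.14.0 environment gives two lists that can be compared directly.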
d
I think the schema doesn't matter since we use the try except logic to handle these cases
h
Untitled
what are the versions of dataclasses-json, mashumaro, and marshmallow when you run this in flytekit 1.11.0 and 1.14.0?
For flytekit v1.14.0:
dataclasses-json: 0.5.7
mashumaro: 3.13
marshmallow: 3.21.3
For flytekit v1.11.0:
dataclasses-json: 0.5.7
mashumaro: 3.13
marshmallow: 3.21.3
d
FLYTE_USE_OLD_DC_FORMAT=true pyflyte run xxx
can you help me try this with flytekit 1.14.0?
h
@high-accountant-32689 I don't think there are any logs matching UnionTransformer failed attempt to convert from {python_val} to {t} error: {e}
FLYTE_USE_OLD_DC_FORMAT=true pyflyte run xxx
Same error with this ^
d
oh nice
so this might not be a msgpack problem
maybe related to Union Transformer
h
Is it possible that get_args(python_type) returns nothing, so we never execute the for loop?
d
I think it's impossible
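A quick stdlib check of what `typing.get_args` returns for a nested Union like the one in this thread. The classes below are empty stand-ins that reuse a few names from the thread for illustration; they are not the real plan classes.

```python
from typing import Union, get_args


class SimpleReportPlan: ...
class UserDefinedReportPlan: ...
class SimpleModelReleasePlan: ...
class MapReleasePlan: ...


ReportPlan = Union[SimpleReportPlan, UserDefinedReportPlan]
ReleasePlan = Union[SimpleModelReleasePlan, MapReleasePlan]
Plan = Union[ReportPlan, ReleasePlan]

# Nested Unions are flattened, so every leaf variant is visited by a loop over get_args.
variants = get_args(Plan)
print([v.__name__ for v in variants])
```

This matches the flattened `typing.Union[...]` shown in the original error message. `get_args` only returns an empty tuple for types with no type arguments, such as a plain class.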
h
I see.
Morning @thankful-minister-83577 Any clues from your side?
h
To clarify, this log is shown when I serialize the workflow.
But it is NOT shown when pyflyte
There is a difference in the pb between the two versions. In v1.14.0:
union_type {
              variants {
                simple: STRUCT
                annotation {
                  annotations {
                    fields {
                      key: "cache-key-metadata"
                      value {
                        struct_value {
                          fields {
                            key: "serialization-format"
                            value {
                              string_value: "msgpack"
                            }
                          }
                        }
                      }
                    }
                  }
                }
                structure {
                  tag: "SimpleReportPlanTransformer"
                }
              }
              variants {
                simple: STRUCT
                annotation {
                  annotations {
                    fields {
                      key: "cache-key-metadata"
                      value {
                        struct_value {
                          fields {
                            key: "serialization-format"
                            value {
                              string_value: "msgpack"
                            }
                          }
                        }
                      }
                    }
                  }
                }
                structure {
                  tag: "UserDefinedReportPlanTransformer"
                }
              }
in v1.11.0
union_type {
              variants {
                simple: STRUCT
                structure {
                  tag: "SimpleReportPlanTransformer"
                }
              }
              variants {
                simple: STRUCT
                structure {
                  tag: "UserDefinedReportPlanTransformer"
                }
              }
d
Hi, @helpful-van-10149 can you give me a local reproducible example?
I can use a debugger to find the bug for you
h
@damp-lion-88352 Thanks for the help! I can write one now.
The workflow is a bit long. We have custom logic to register our transformer to get the Union type value. Here is the code:
Untitled
pyflyte run example.py report_workflow
After deleting RegisterPolymorphicDataclasses._register_flyte_transfomers(union_type, name), the workflow works. Let me dig deeper into this.
^ It doesn't work when registering the workflow to Flyte, though. It just hangs there.
h
@helpful-van-10149, I see that you have a custom dataclass type transformer. We recently fixed a bug that caused this regression. Can you install flytekit from master and try again? We're close to releasing flytekit 1.14.5 which will contain this fix, but since you have a local repro it should be pretty easy to verify if this solves your issue.
h
@high-accountant-32689 Will try now, thanks!
Will 1.15.0b0 contain this fix?
h
no, but we'll be releasing 1.14.5 which will.
Actually, there's no harm in getting 1.15.0b1 now. Give me a few min.
@helpful-van-10149, https://pypi.org/project/flytekit/1.15.0b1/ is out. Can you give it a try?
h
Trying
🥲It does not work..
I found the log of this line in k8s pod..
[flytekit] UnionTransformer failed attempt to convert from SimpleReportPlan(model_plan=TrainTrainableModelsPlan(upstream=ExperimentModelPlan(upstream=CreateBuiltInModelPlan(create_model_fn=FunctionReference(local_reference='src.python.flyte.ml_exploration.mlp_example.main/create_model/M1PRv3kOk3U4XiMjtCwB1Q9R/jy7dFZaWv+LRjkcxLY=', remote_reference=RemoteReference(project='ml-exploration', domain='adhoc', name='src.python.flyte.ml_exploration.mlp_example.main.create_model', version='upgrade-flytekit-14-7ce2dd26d4e3-zoz7')), model_details=BuiltInModelTypeDetails(model_type=<ModelType.XGBOOST: 'xgboost'>, serialized_kwargs='{"boolean_features":["server_ipmaxmind__is_anonymous"],"categorical_features":["card_bin__bin"],"label":"has_fraud_dispute","numerical_features":["stripe_jstime_on_page","amount_in_usd"]}', _type='built_in_model_type_details'), dataset_plans=DatasetPlanSet(plans={'train': SimpleDatasetPlan(path='<s3://stripe-data/ml-platform/examples/txn-fraud-train/>', is_day_partitioned=False, bounds=TemporalBounds(start_date='2019-11-06', end_date='2019-12-16', _type='temporal'), timestamp_column_for_bounds='created', _type='simple_dataset_plan'), 'eval': SimpleDatasetPlan(path='<s3://stripe-data/ml-platform/examples/txn-fraud-train/>', is_day_partitioned=False, bounds=TemporalBounds(start_date='2019-12-16', end_date='2019-12-21', _type='temporal'), timestamp_column_for_bounds='created', _type='simple_dataset_plan'), 'test': SimpleDatasetPlan(path='<s3://stripe-data/ml-platform/examples/txn-fraud-train/>', is_day_partitioned=False, bounds=TemporalBounds(start_date='2019-12-21', end_date='2019-12-31', _type='temporal'), timestamp_column_for_bounds='created', _type='simple_dataset_plan')}), serialized_kwargs='{"boolean_features":["server_ipmaxmind__is_anonymous"],"categorical_features":["card_bin__bin"],"label":"has_fraud_dispute","numerical_features":["stripe_jstime_on_page","amount_in_usd"]}', _type='create_built_in_model_plan'), experiments=[('baseline', 
CreateBuiltInModelPlan(create_model_fn=FunctionReference(local_reference='src.python.flyte.ml_exploration.mlp_example.main/create_model/M1PRv3kOk3U4XiMjtCwB1Q9R/jy7dFZaWv+LRjkcxLY=', remote_reference=RemoteReference(project='ml-exploration', domain='adhoc', name='src.python.flyte.ml_exploration.mlp_example.main.create_model', version='upgrade-flytekit-14-7ce2dd26d4e3-zoz7')), model_details=BuiltInModelTypeDetails(model_type=<ModelType.XGBOOST: 'xgboost'>, serialized_kwargs='{"boolean_features":["server_ipmaxmind__is_anonymous"],"categorical_features":["card_bin__bin"],"label":"has_fraud_dispute","numerical_features":["stripe_jstime_on_page","amount_in_usd"]}', _type='built_in_model_type_details'), dataset_plans=DatasetPlanSet(plans={'train': SimpleDatasetPlan(path='<s3://stripe-data/ml-platform/examples/txn-fraud-train/>', is_day_partitioned=False, bounds=TemporalBounds(start_date='2019-11-06', end_date='2019-12-16', _type='temporal'), timestamp_column_for_bounds='created', _type='simple_dataset_plan'), 'eval': SimpleDatasetPlan(path='<s3://stripe-data/ml-platform/examples/txn-fraud-train/>', is_day_partitioned=False, bounds=TemporalBounds(start_date='2019-12-16', end_date='2019-12-21', _type='temporal'), timestamp_column_for_bounds='created', _type='simple_dataset_plan'), 'test': SimpleDatasetPlan(path='<s3://stripe-data/ml-platform/examples/txn-fraud-train/>', is_day_partitioned=False, bounds=TemporalBounds(start_date='2019-12-21', end_date='2019-12-31', _type='temporal'), timestamp_column_for_bounds='created', _type='simple_dataset_plan')}), serialized_kwargs='{"boolean_features":["server_ipmaxmind__is_anonymous"],"categorical_features":["card_bin__bin"],"label":"has_fraud_dispute","numerical_features":["stripe_jstime_on_page","amount_in_usd"]}', _type='create_built_in_model_plan'))], _type='experiment_model_plan'), n=1, random_seed_path=None, resources=TaskResources(mem='200Gi', cpu='30', gpu=None, ephemeral_storage='200Gi'), store_configs_path=None, 
_type='train_trainable_models_plan'), charts=[MultiComparisonChart(title='Multi Comparison Chart: AUROC', segment=Segment(dataset_plan_name='test', _type='segment'), eval_config=EvalConfig(unique_id_col='charge_id', label_col='has_fraud_dispute'), metric=ROCBasedMetric(fpr_start=0.0, fpr_end=1.0, num_buckets=1000, num_bootstraps=50), gating_config=None, raise_if_baseline_missing=False, grouping_columns=[], selector=None, explanation=None, weight_col=None, created_col='created', time_grouper=None, scores_col='score', output_curves=False, _type='multi_comparison_chart')], report_type=<ReportingType.WANDB: 'wandb'>, snapshot_date='', slack_message=None, _type='simple_report_plan') to <class 'src.python.flyte.ml_pipeline.config.SimpleReportPlan'> error: Field "create_model_fn" of type CreateModelProtocol in CreateBuiltInModelPlan is not serializable
error: Field "create_model_fn" of type CreateModelProtocol in CreateBuiltInModelPlan is not serializable
e
Untitled
^ Hey team, wanting to follow up on this thread as I'm seeing the same error on Flytekit 1.15 with Flyte backend components updated to the 1.15 tag versions. The key error seems to be union types again (line 32):
Failed to convert type <class 'src.python.flyte.ml_pipeline.config.SimpleReportPlan'> to type typing.Union[src.python.flyte.ml_pipeline.config.SimpleReportPlan, src.python.flyte.ml_pipeline.config.UserDefinedReportPlan, src.python.flyte.ml_pipeline.config.SimpleModelReleasePlan, src.python.flyte.ml_pipeline.config.GetModelShaReleasePlan, src.python.flyte.ml_pipeline.config.MapReleasePlan, src.python.flyte.ml_pipeline.config.SelectModelThresholdsReleasePlan, src.python.flyte.ml_pipeline.config.SetThresholdsReleasePlan, src.python.flyte.ml_pipeline.config.AttachFeatureGeneratorsReleasePlan, src.python.flyte.ml_pipeline.config.ModelDeployReleasePlan, src.python.flyte.ml_pipeline.config.ActivateReleaseGatesReleasePlan].
Any suggestions for resolving this?
c
@freezing-airport-6809 just wanted to bump this issue following our chat. Would be very grateful if someone from the team could help unblock @echoing-kilobyte-84070 🙏
f
is this problem still ongoing?
cc @high-accountant-32689 / @average-finland-92144 seems like both of you looked into this?
@crooked-lifeguard-46802 / @echoing-kilobyte-84070 would love a summary