calm-zoo-68637
04/16/2024, 4:11 PM
freezing-airport-6809
icy-tent-98067
04/16/2024, 11:17 PM
I use @dataclass and DataClassJSONMixin a lot. I find it extremely helpful to do this routinely. This pattern lets you pass very complex objects around. Additionally, they make your code noticeably more readable and let you benefit from Flyte's type checking.
Alternatively, when I want to apply even stricter requirements to my objects, I use @pydantic.dataclasses.dataclass and DataClassJSONMixin.
icy-tent-98067
04/16/2024, 11:39 PM
from dataclasses import dataclass
from mashumaro.mixins.json import DataClassJSONMixin

@dataclass
class MyOutput(DataClassJSONMixin):
    _value: float
    _is_int: bool

    @classmethod
    def from_value(cls, value: float | int) -> "MyOutput":
        assert isinstance(value, (float, int)), \
            f"value must be of type int or float, not {type(value)}"
        if isinstance(value, int):
            return cls(_is_int=True, _value=float(value))
        else:
            return cls(_is_int=False, _value=value)

    @property
    def value(self) -> float | int:
        if self._is_int:
            return int(self._value)
        else:
            return float(self._value)

    def __repr__(self):
        _type = "int" if self._is_int else "float"
        return f"{self.__class__.__name__}({self.value}: {_type})"

my_output_instance = MyOutput.from_value(3.1232)
serialized = my_output_instance.to_json()
deserialized = MyOutput.from_json(serialized)
print(deserialized)  # MyOutput(3.1232: float)

my_output_instance = MyOutput.from_value(33)
serialized = my_output_instance.to_json()
deserialized = MyOutput.from_json(serialized)
print(deserialized)  # MyOutput(33: int)
I frequently use `@classmethod`s like this with my data classes to automate complex initialization logic, so that I can use the data classes cleanly within my tasks.
calm-zoo-68637
04/16/2024, 11:50 PM
icy-tent-98067
04/16/2024, 11:55 PM
@property and initiating @classmethod)
icy-tent-98067
04/16/2024, 11:59 PM
from flytekit import workflow, task

@task
def a() -> MyOutput:
    return MyOutput.from_value(32)

@task
def b(arg: MyOutput):
    print(arg)  # MyOutput(32: int)

@workflow
def test():
    result = a()
    b(arg=result)

test()
Seems to work as expected.
calm-zoo-68637
04/17/2024, 2:17 AM
_is_int parameter every time they initialize a value. So where previously they could have done Foo(_value=5), they now have to do Foo(_value=5, _is_int=True).
calm-zoo-68637
04/17/2024, 2:18 AM
icy-tent-98067
04/17/2024, 2:39 AM
MyOutput using the class method (MyOutput.from_value), which will automatically handle all of that.
For example:
@task
def make_an_integer() -> MyOutput:
    return MyOutput.from_value(32)

@task
def make_a_float() -> MyOutput:
    return MyOutput.from_value(332.23123)
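To make the point of the flag concrete, here is a minimal standard-library sketch of the same idea. It assumes a transport that hands every number back as a float (protobuf's Struct stores numbers as doubles, for example). `TaggedNumber` and `lossy_round_trip` are hypothetical names used only for illustration; they are not part of flytekit or mashumaro.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TaggedNumber:
    """Hypothetical stand-in for MyOutput, using only the standard library."""

    _value: float
    _is_int: bool

    @classmethod
    def from_value(cls, value: "float | int") -> "TaggedNumber":
        # Record the original type once, at construction time.
        return cls(_value=float(value), _is_int=isinstance(value, int))

    @property
    def value(self) -> "float | int":
        # Restore the original type from the stored flag.
        return int(self._value) if self._is_int else self._value


def lossy_round_trip(obj: TaggedNumber) -> TaggedNumber:
    """Simulate a transport that returns every number as a float (assumption)."""
    wire = json.loads(json.dumps(asdict(obj)))
    wire["_value"] = float(wire["_value"])
    return TaggedNumber(**wire)


restored_int = lossy_round_trip(TaggedNumber.from_value(32))
restored_float = lossy_round_trip(TaggedNumber.from_value(332.23123))
print(type(restored_int.value).__name__, restored_int.value)      # int 32
print(type(restored_float.value).__name__, restored_float.value)  # float 332.23123
```

Callers only ever touch `from_value` and `value`, so the flag never has to be passed by hand.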
calm-zoo-68637
04/17/2024, 3:44 AM
icy-tent-98067
04/17/2024, 3:49 AM
calm-zoo-68637
04/17/2024, 4:07 AM
freezing-airport-6809
calm-zoo-68637
04/17/2024, 4:20 AM
from flytekit import task, workflow, dynamic
from mashumaro.mixins.json import DataClassJSONMixin
from dataclasses import dataclass

@dataclass
class Foo(DataClassJSONMixin):
    x: float | int

@task
def print_foo(*, foo: Foo) -> Foo:
    print(foo)
    print(f"x={type(foo.x)}")
    return foo

@dynamic
def _print_foo_dynamic(*, foo: Foo) -> Foo:
    return print_foo(foo=foo)

@workflow
def print_foo_wf(*, foo: Foo) -> Foo:
    return _print_foo_dynamic(foo=foo)
freezing-airport-6809
calm-zoo-68637
04/17/2024, 4:23 AM
freezing-airport-6809
calm-zoo-68637
04/17/2024, 4:23 AM
>>> from mashumaro.mixins.json import DataClassJSONMixin
>>> from dataclasses import dataclass
>>> from typing import Any
>>>
>>> @dataclass
... class Foo(DataClassJSONMixin):
...     x: float | int
...
>>> test = Foo(5)
>>> test1 = Foo.from_json(test.to_json())
>>> test1
Foo(x=5)
>>> type(test1.x)
<class 'int'>
freezing-airport-6809
calm-zoo-68637
04/17/2024, 4:26 AM
calm-zoo-68637
04/17/2024, 5:19 AM
"jsonschema has no support for union"
What makes you say this? When I try:
from mashumaro.jsonschema import build_json_schema
I get:
build_json_schema(Foo).to_json()
'{"type": "object", "title": "Foo", "properties": {"x": {"anyOf": [{"type": "number"}, {"type": "integer"}]}}, "additionalProperties": false, "required": ["x"]}'
freezing-airport-6809
dict
I do not think these two are related.
freezing-airport-6809
def print_type():
    from flytekit.core.type_engine import TypeEngine
    t = TypeEngine.to_literal_type(Foo)
    print(t)
Output
<FlyteLiteral simple: STRUCT metadata { fields { key: "type" value { string_value: "object" } } fields { key: "title" value { string_value: "Foo" } } fields { key: "required" value { list_value { values { string_value: "x" } } } } fields { key: "properties" value { struct_value { fields { key: "x" value { struct_value { fields { key: "anyOf" value { list_value { values { struct_value { fields { key: "type" value { string_value: "number" } } } } values { struct_value { fields { key: "type" value { string_value: "integer" } } } } } } } } } } } } } fields { key: "additionalProperties" value { bool_value: false } } } structure { dataclass_type { key: "x" value { union_type { variants { simple: FLOAT structure { tag: "float" } } variants { simple: INTEGER structure { tag: "int" } } } } } }>
It seems it knows the anyOf is int | float
freezing-airport-6809
freezing-airport-6809
jsonschema
auto-form
freezing-airport-6809
Union
that is the problem
>>> print(Foo.from_json('{"x": 1.0}'))
Foo(x=1.0)
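The REPL output above shows a value of 1.0 coming back as a float. A small standard-library sketch of why that is a problem for a float | int union follows. It assumes a transport hop that re-encodes every number as a double; `double_only_hop` is a hypothetical helper written for this illustration, not a flytekit or mashumaro function.

```python
import json


def double_only_hop(payload: str) -> str:
    """Hypothetical transport: re-encode the payload with every number as a float."""
    data = json.loads(payload)
    coerced = {
        key: float(val) if isinstance(val, (int, float)) and not isinstance(val, bool) else val
        for key, val in data.items()
    }
    return json.dumps(coerced)


from_int = double_only_hop('{"x": 5}')
from_float = double_only_hop('{"x": 5.0}')
print(from_int)                # {"x": 5.0}
print(from_float)              # {"x": 5.0}
print(from_int == from_float)  # True
```

After such a hop the two payloads are byte-for-byte identical, so no deserializer can tell which member of the union was originally meant. A field typed as plain int does not have this ambiguity, because the target type is known.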
freezing-airport-6809
@dataclass
class Foo(DataClassJSONMixin):
    x: int
works fine
freezing-airport-6809
calm-zoo-68637
04/17/2024, 1:10 PM
from_dict and/or from_json methods to persist additional metadata about whether the number is an integer, along the lines of what Grantham was saying. Do you know which methods Flyte will actually call when rehydrating a DataClassJSONMixin object?
freezing-airport-6809
calm-zoo-68637
04/17/2024, 1:26 PM
to_json method gets called here but I don't see any invocations of from_json - how do user-provided JSON dataclasses get deserialized?
calm-zoo-68637
04/17/2024, 1:26 PMfreezing-airport-6809
calm-zoo-68637
04/17/2024, 1:42 PM
FlyteRemote?
freezing-airport-6809
calm-zoo-68637
04/17/2024, 1:44 PMfreezing-airport-6809
calm-zoo-68637
04/17/2024, 1:59 PM
from flytekit import task, workflow
from dataclasses import dataclass, field
from mashumaro.mixins.json import DataClassJSONMixin
import json

@dataclass
class Foo(DataClassJSONMixin):
    x: float
    _is_int: bool = field(init=False)

    def __post_init__(self):
        if not hasattr(self, "_is_int"):
            self._is_int = isinstance(self.x, int)

    @classmethod
    def from_json(cls, input: str) -> "Foo":
        input = json.loads(input)
        print("from_json was called", flush=True)
        new_instance = cls(input["x"])
        # Persist metadata about integers if already present
        if "_is_int" in input:
            new_instance._is_int = input["_is_int"]
        return new_instance

    @property
    def typed_x(self) -> float | int:
        if self._is_int:
            return int(self.x)
        else:
            return float(self.x)
And then start the task with `FlyteRemote`:
flyte.execute_remote_wf(execution_name="foo", entity=flyte_task, inputs={"foo": foo})
I get:
│ │
│ /remote.py:1055 in _execute │
│ │
│ ❱ 1055 │ │ │ │ │ lit = TypeEngine.to_literal(ctx, v, hint, variable.type) │
│ │
│ /type_engine.py:1059 in to_literal │
│ │
│ ❱ 1059 │ │ │ transformer.assert_type(python_type, python_val) │
│ │
│ /type_engine.py:354 in assert_type │
│ │
│ ❱ 354 │ │ │ expected_type = expected_fields_dict[f.name] │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: '_is_int'
Any idea why?
freezing-airport-6809
calm-zoo-68637
04/17/2024, 2:09 PM
from flytekit import task, workflow
from dataclasses import dataclass, field
from mashumaro.mixins.json import DataClassJSONMixin
import json

@dataclass
class Foo(DataClassJSONMixin):
    x: float
    is_int: bool = field(init=False)

    def __post_init__(self):
        if not hasattr(self, "is_int"):
            self.is_int = isinstance(self.x, int)

    @classmethod
    def from_json(cls, input: str) -> "Foo":
        input = json.loads(input)
        print("from_json was called", flush=True)
        new_instance = cls(input["x"])
        # Persist metadata about integers if already present
        if "is_int" in input:
            new_instance.is_int = input["is_int"]
        return new_instance

    @property
    def typed_x(self) -> float | int:
        if self.is_int:
            return int(self.x)
        else:
            return float(self.x)
still fails
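The traceback shows the lookup `expected_fields_dict[f.name]` failing, and renaming the field did not help. One possible reading, which the thread does not confirm, is that the type being compared against simply has no such field, whatever it is called. The sketch below reproduces a KeyError of that shape with the standard library only. `Expected`, `Actual`, and `check_fields` are hypothetical and are not flytekit's implementation.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Expected:
    """Hypothetical view of the type on the receiving side: only x is known."""

    x: float


@dataclass
class Actual:
    """Local class carrying an extra field that is excluded from __init__."""

    x: float
    is_int: bool = field(init=False)

    def __post_init__(self):
        self.is_int = isinstance(self.x, int)


def check_fields(expected_type, value) -> None:
    """Look up every field of the value's class in the expected type."""
    expected = {f.name: f.type for f in fields(expected_type)}
    for f in fields(type(value)):
        # fields() also returns init=False fields, so is_int is looked up here.
        _ = expected[f.name]  # raises KeyError when the expected type lacks it


missing = None
try:
    check_fields(Expected, Actual(x=5))
except KeyError as exc:
    missing = exc.args[0]
print("missing field:", missing)  # missing field: is_int
```

If this reading is right, any extra dataclass field would trigger the same error, which would match the rename from _is_int to is_int making no difference.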