# contribute
d
Hi everyone, I need some attention on the upcoming feature `JSON IDL`. I'd like to have an async discussion regarding the JSON IDL implementation. Currently, we're using Solution 1, but I want to compare it with Solution 2.

Solution 1
1. Encode: python val → JSON string → msgpack bytes
2. Decode: msgpack bytes → JSON string → python val

Solution 2
1. Encode: python val → msgpack bytes
2. Decode: msgpack bytes → python val

Comparison
1. Performance: Solution 2 is lighter and faster than Solution 1.
2. Speed: [PR comment](https://github.com/flyteorg/flyte/pull/5607#issuecomment-2329179934)
3. Size: [PR comment](https://github.com/flyteorg/flyte/pull/5607#issuecomment-2329375392)

Other Considerations
1. Solution 1 is reliable and proven to avoid additional issues, whereas Solution 2 still needs to resolve certain problems. (See related [PR](https://github.com/flyteorg/flytekit/pull/2613)) There's an open issue in the mashumaro repo regarding `Solution 2`, but it may take time as the author is currently on vacation.
2. Solution 2 doesn't support Pydantic at the moment, so one workaround is to first convert the Pydantic `BaseModel` to a JSON string, and then to msgpack bytes (e.g., python val → JSON string → msgpack bytes). However, this introduces complexity, as both the Flyte backend and FlyteConsole would need to support two different methods of deserialization.

Looking forward to your thoughts on this! Thank you all. cc @cool-lifeguard-49380 @damp-rain-31363
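To make the two encode paths concrete, here is a minimal sketch, not the flytekit implementation. It assumes a hand-rolled `pack` helper that covers only a small subset of MessagePack (None, bool, small non-negative ints, str, short lists and dicts) so the example runs with the standard library alone; real code would more likely call the `msgpack` package. The names `pack`, `encode_solution_1`, and `encode_solution_2` are hypothetical.

```python
import json
import struct


def pack(value):
    """Encode a small subset of Python values as MessagePack bytes (sketch only)."""
    if value is None:
        return b"\xc0"
    if isinstance(value, bool):  # checked before int because bool subclasses int
        return b"\xc3" if value else b"\xc2"
    if isinstance(value, int):
        if 0 <= value <= 0x7F:
            return bytes([value])                          # positive fixint
        if 0 <= value <= 0xFF:
            return b"\xcc" + bytes([value])                # uint 8
        if 0 <= value <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", value)      # uint 16
        if 0 <= value <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", value)      # uint 32
        raise ValueError("integer out of range for this sketch")
    if isinstance(value, str):
        raw = value.encode("utf-8")
        n = len(raw)
        if n <= 31:
            return bytes([0xA0 | n]) + raw                 # fixstr
        if n <= 0xFF:
            return b"\xd9" + bytes([n]) + raw              # str 8
        if n <= 0xFFFF:
            return b"\xda" + struct.pack(">H", n) + raw    # str 16
        return b"\xdb" + struct.pack(">I", n) + raw        # str 32
    if isinstance(value, list) and len(value) <= 15:
        return bytes([0x90 | len(value)]) + b"".join(pack(v) for v in value)  # fixarray
    if isinstance(value, dict) and len(value) <= 15:
        body = b"".join(pack(k) + pack(v) for k, v in value.items())
        return bytes([0x80 | len(value)]) + body           # fixmap
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def encode_solution_1(value):
    # python val -> JSON string -> msgpack bytes
    return pack(json.dumps(value))


def encode_solution_2(value):
    # python val -> msgpack bytes
    return pack(value)


sample = {"a": 1, "name": "flyte"}
s1 = encode_solution_1(sample)
s2 = encode_solution_2(sample)
print(len(s1), len(s2))
```

For this sample, Solution 1 produces the JSON text plus a one-byte string header, while Solution 2 encodes the map structure directly and comes out smaller. How large the gap is on real payloads is what the linked PR comments measure.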
d
For size+speed comparisons, I'd generally think it'd be important to see how much of the overhead of double serialization is caused by the input size. If the objects are 2x the size (longer strings, more content, etc), is that significantly worse?
❤️ 1
d
Just updated it! It looks like msgpack does not encode the bytes of a JSON string into a smaller size. https://github.com/flyteorg/flyte/pull/5607#issuecomment-2329375392
for the speed test, I am writing
The official msgpack documentation says: "MessagePack is an efficient binary serialization format. It lets you exchange data among multiple languages like JSON. But it's faster and smaller. Small integers are encoded into a single byte, and typical short strings require only one extra byte in addition to the strings themselves." I also found that if we use `msgpack` to serialize strings, we don't get any benefit from it.
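A rough way to see why wrapping a JSON string in msgpack cannot shrink it: the MessagePack string formats store the UTF-8 bytes unchanged and add a length header in front of them. The sketch below computes that header size from the format's length thresholds. The helper names `msgpack_str_overhead` and `wrapped_size` are made up for illustration, and the payloads are synthetic, not the ones from the PR benchmarks.

```python
import json


def msgpack_str_overhead(payload_len):
    """Header bytes MessagePack puts in front of a UTF-8 string of this length."""
    if payload_len <= 31:
        return 1      # fixstr: length lives in the type byte
    if payload_len <= 0xFF:
        return 2      # str 8: type byte + 1 length byte
    if payload_len <= 0xFFFF:
        return 3      # str 16: type byte + 2 length bytes
    return 5          # str 32: type byte + 4 length bytes


def wrapped_size(value):
    """Return (JSON size, size after wrapping that JSON text as a msgpack string)."""
    json_bytes = json.dumps(value).encode("utf-8")
    return len(json_bytes), len(json_bytes) + msgpack_str_overhead(len(json_bytes))


# Synthetic payloads of growing size, only to show how the header scales.
for n in (1, 10, 1000):
    json_size, wrapped = wrapped_size({"items": ["x" * 8] * n})
    print(n, json_size, wrapped, wrapped - json_size)
```

The header stays between 1 and 5 bytes regardless of input size, so the size cost of the JSON string route is essentially the size of the JSON text itself. Any speed cost from the extra JSON pass is a separate question that this sketch does not measure.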
👍 2
msgpack works now. In summary, here's the new implementation:
1. python val -> msgpack bytes
2. python val -> dict -> msgpack bytes

1 is for most cases. 2 is for Pydantic BaseModel and cases like discriminated classes.
🎉 2
This ensures the backend and frontend handle JSON IDL the same way.
I haven't updated the RFC.
But this approach at least works.
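The choice between the two paths in the summary above could look roughly like this. It is only a sketch under assumptions: it detects Pydantic models by duck-typing on `model_dump()` (the Pydantic v2 method that returns a plain dict), uses a stand-in class so it runs without Pydantic installed, and stops before the final msgpack step. `to_packable` and `FakeModel` are hypothetical names, and the actual flytekit logic may decide differently.

```python
def to_packable(value):
    """Return the object that would be handed to the msgpack encoder."""
    dump = getattr(value, "model_dump", None)
    if callable(dump):
        # Path 2: python val -> dict (-> msgpack bytes)
        return dump()
    # Path 1: python val (-> msgpack bytes) with no intermediate step
    return value


class FakeModel:
    """Stand-in for a Pydantic BaseModel so the sketch needs no third-party package."""

    def __init__(self, kind, name):
        self.kind = kind
        self.name = name

    def model_dump(self):
        # "kind" plays the role of a discriminator field in this toy example
        return {"kind": self.kind, "name": self.name}


model_result = to_packable(FakeModel("cat", "tom"))
plain_result = to_packable({"a": 1})
print(model_result, plain_result)
```

Either way the encoder receives a plain msgpack-friendly value, which is why the backend and frontend only need one decoding path.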
Hi folks, after discussions with Yee and Eduardo, here is the new question about the JSON IDL. In summary, there are 2 ways to support JSON IDL: `Create a new IDL type called JSON` or re-use `Binary IDL`. Here is the full description. I would like to know what you think. Thank you all. https://github.com/flyteorg/flyte/pull/5607#issuecomment-2333174325
Update: The discussion around JSON IDL is ongoing regarding whether we should provide a better interface. I believe we are getting closer to an answer now. https://github.com/flyteorg/flyte/pull/5607#issuecomment-2339562855