Fabio Grätz
02/05/2024, 12:45 PM
David Espejo (he/him)
02/09/2024, 9:17 PM
Byron Hsu
02/09/2024, 10:30 PM
Fabio Grätz
02/09/2024, 11:45 PM
L godlike
02/10/2024, 10:09 AM
Fabio Grätz
02/14/2024, 5:11 PM
Bernhard Stadlbauer
02/14/2024, 9:08 PM
user
02/14/2024, 10:00 PM
Hampus Rosvall
02/15/2024, 4:51 PM
Arvind Singharpuria
02/19/2024, 12:37 PM
David Espejo (he/him)
02/21/2024, 5:25 PM
Ethan Brown
02/21/2024, 8:14 PM
An entity tag (ETag) that represents a specific version of an object. For objects that are not uploaded as a multipart upload and are either unencrypted or encrypted by server-side encryption with Amazon S3 managed keys (SSE-S3), the ETag is an MD5 digest of the data.
In other words, a client-generated MD5 != ETag when using SSE-KMS, so basic "is this the same file content" checks that compare the two will not work reliably. I wouldn't be surprised if Azure Blob Storage has the same issue with encryption enabled.
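To make the concern concrete, here is a minimal sketch of what an ETag comparison can and cannot tell a client. It uses only hashlib. The helper names are hypothetical, and the multipart ETag shape (MD5 of the concatenated part digests plus a "-N" suffix) is commonly observed S3 behavior that is assumed here, not a documented guarantee.

```python
import hashlib
from typing import Optional


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of the payload, as a client would compute it locally."""
    return hashlib.md5(data).hexdigest()


def multipart_style_etag(data: bytes, part_size: int) -> str:
    """Assumed shape of a multipart ETag: MD5 over the part digests, then '-<part count>'."""
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    digests = b"".join(hashlib.md5(p).digest() for p in parts)
    return f"{hashlib.md5(digests).hexdigest()}-{len(parts)}"


def etag_confirms_content(etag: str, data: bytes) -> Optional[bool]:
    """Return True if the ETag equals the local MD5, otherwise None (inconclusive).

    A mismatch never proves the content differs: the object may have been
    uploaded in parts or encrypted with SSE-KMS, in which case the ETag is
    not an MD5 of the data.
    """
    if etag.strip('"') == md5_hex(data):  # S3 returns the ETag wrapped in double quotes
        return True
    return None


if __name__ == "__main__":
    payload = b"a" * 10
    print(etag_confirms_content('"' + md5_hex(payload) + '"', payload))
    print(etag_confirms_content(multipart_style_etag(payload, 4), payload))
```

The second call is the problem case: the bytes are identical, but the ETag alone cannot confirm it.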
Anyone have thoughts about this? Does anyone know off the top of their head what sort of problems we're going to run into without ETags always being MD5?
It seems like a custom header controlled by clients, such as x-flyte-checksum-md5, should be used instead. Looking at the API docs, S3, Azure Blob Storage, GCS and MinIO all support custom metadata (though some may require specific header prefixes, so x-flyte might not work). So I'm wondering if there's a path to addressing this problem with client-controlled checksums? Maintaining backwards compatibility with this approach seems tricky...
Fabio Grätz
02/23/2024, 5:43 PM
Rafael Raposo
02/26/2024, 2:09 PM
David Espejo (he/him)
02/28/2024, 4:13 PM
Ketan (kumare3)
Ketan (kumare3)
Daniel Farrell
03/11/2024, 8:27 PM
Fabio Grätz
03/13/2024, 8:21 AM
David Espejo (he/him)
03/13/2024, 6:04 PM
flyte and flytekit teams.
Thanks for your contributions. Let's keep building together!
Fabio Grätz
03/14/2024, 4:10 PM
Nikki Everett
03/14/2024, 4:53 PM
David Espejo (he/him)
03/14/2024, 5:56 PM
L godlike
03/21/2024, 9:20 AM
@dataclass decorator to serialize and deserialize dataclasses in flytekit.
(Which means that we no longer need to inherit from DataClassJSONMixin or add the @dataclass_json decorator.)
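For anyone who wants to try this out, here is a rough sketch of what "a plain dataclass, no mixin" means in practice. It is not flytekit's implementation and uses only the standard library. TrainingConfig, to_json and from_json are hypothetical names for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class TrainingConfig:
    """Plain dataclass: no DataClassJSONMixin base and no extra decorator."""
    learning_rate: float
    epochs: int
    tags: List[str]


def to_json(cfg: TrainingConfig) -> str:
    # The serializer lives outside the class; the class itself carries no JSON methods.
    return json.dumps(asdict(cfg))


def from_json(payload: str) -> TrainingConfig:
    # Works for flat fields only; nested dataclasses would need extra handling.
    return TrainingConfig(**json.loads(payload))


if __name__ == "__main__":
    original = TrainingConfig(learning_rate=0.01, epochs=3, tags=["baseline"])
    print(from_json(to_json(original)) == original)
```

Round-tripping values like this through a task's inputs and outputs would be one simple way to test the PR.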
I am a little worried that this PR might cause issues after it's merged.
Could anyone take a look or run some tests on this PR?
Thank you!
https://github.com/flyteorg/flytekit/pull/2279
Ethan Brown
03/22/2024, 4:43 PM
Ethan Brown
03/22/2024, 8:28 PM
Rob Ulbrich
03/26/2024, 2:47 PM
Austin Liu (Austinnn)
03/27/2024, 8:50 AM
nested_types to work in StructuredDataset dataframe, both locally and remotely, even via Google Storage or BigQuery. This improvement was motivated by an issue raised by folks at Spotify. cc @Dylan Wilder, @Govind Raghu and @Kevin Su at Union.ai
I've elaborated on many useful cases in the PR. Please take a look and report any unexpected behavior you find. Thanks a lot.
Austin Liu (Austinnn)
03/27/2024, 9:01 AM
asyncio. This initiative proved to be extremely beneficial, especially under high-latency network conditions.
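As a rough illustration of why asyncio helps under high latency (this is not the code from the change being discussed), the sketch below overlaps several slow operations so that the total wait is close to one round trip, not the sum of all of them. fetch and fetch_all are hypothetical names, and asyncio.sleep stands in for a network call.

```python
import asyncio
from typing import List


async def fetch(name: str, latency: float) -> str:
    """Pretend to download one object; the sleep stands in for network latency."""
    await asyncio.sleep(latency)
    return f"{name}:done"


async def fetch_all(names: List[str], latency: float) -> List[str]:
    """Start every fetch at once and wait for all of them together."""
    # gather returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*(fetch(n, latency) for n in names))


if __name__ == "__main__":
    print(asyncio.run(fetch_all(["a", "b", "c"], 0.05)))
```

With three items at 50 ms each, a sequential loop would wait roughly 150 ms, while the gathered version waits roughly 50 ms.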
Feel free to give it a try and share any feedback you may have. Your input is greatly appreciated.
David Espejo (he/him)
03/27/2024, 11:35 AM