gorgeous-waitress-5026
02/21/2024, 8:14 PM
An entity tag (ETag) that represents a specific version of an object. For objects that are not uploaded as a multipart upload and are either unencrypted or encrypted by server-side encryption with Amazon S3 managed keys (SSE-S3), the ETag is an MD5 digest of the data.
In other words, a client-generated MD5 != ETag when using SSE-KMS, so basic "is this the same file content" checks that compare against the ETag will never work properly there. I wouldn't be surprised if Azure Blob Storage has the same issue with encryption enabled.
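To make the failure mode concrete, here is a rough sketch of the kind of check that breaks. The helper name `etag_matches_md5` is made up for illustration and is not from any SDK. It only trusts a match; it returns None when the ETag cannot be an MD5 at all (multipart ETags look like `<hash>-<part count>`), and a False result is inconclusive when the object is encrypted with SSE-KMS.

```python
import hashlib
import re
from typing import Optional

# An MD5 hex digest is exactly 32 lowercase hex characters.
_MD5_HEX = re.compile(r"^[0-9a-f]{32}$")


def etag_matches_md5(data: bytes, etag: str) -> Optional[bool]:
    """Hypothetical helper: compare a local MD5 against an object ETag.

    Returns None when the ETag is not MD5-shaped (for example a multipart
    ETag such as "abc...-3"). Returns True/False otherwise, but a False is
    only meaningful when the ETag really is an MD5 of the data, which is
    not the case with SSE-KMS.
    """
    value = etag.strip('"').lower()  # ETags are usually returned quoted
    if not _MD5_HEX.match(value):
        return None
    return hashlib.md5(data).hexdigest() == value
```

So a True can be trusted, but None and False both end up meaning "don't know" unless the caller also knows how the object was uploaded and encrypted.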
Anyone have thoughts about this? Does anyone know off the top of their head what sort of problems we're going to run into without ETags always being MD5?
It seems like a custom header controlled by clients should be used instead, like x-flyte-checksum-md5. Looking at API docs -- S3, Azure Blob Storage, GCS and Minio all support custom metadata (though some may require specific header prefixes, so x-flyte might not work). So I'm wondering if there's a path to addressing this problem with client controlled checksums? Maintaining backwards compat with this approach seems tricky...
freezing-airport-6809
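A minimal sketch of what the client-controlled checksum could look like, with one possible backwards-compat fallback. Everything here is an assumption for discussion, not existing Flyte behavior: the key `flyte-checksum-md5`, the helper names, and the idea that callers pass in the object's metadata as a plain dict. The SDKs normally add the provider prefix themselves (`x-amz-meta-` on S3, `x-goog-meta-` on GCS, `x-ms-meta-` on Azure), and key naming rules differ per provider, so the exact key would need checking.

```python
import hashlib
from typing import Dict, Mapping, Optional

# Hypothetical metadata key; the provider-specific prefix is left to the SDK.
CHECKSUM_KEY = "flyte-checksum-md5"


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def checksum_metadata(data: bytes) -> Dict[str, str]:
    """Metadata a client would attach at upload time."""
    return {CHECKSUM_KEY: md5_hex(data)}


def same_content(
    local: bytes,
    remote_metadata: Mapping[str, str],
    remote_etag: Optional[str] = None,
) -> Optional[bool]:
    """True/False if it can be decided, None if unknown."""
    stored = remote_metadata.get(CHECKSUM_KEY)
    if stored is not None:
        # New-style object: the client-written checksum is authoritative.
        return stored.lower() == md5_hex(local)
    # Legacy object without the metadata: an ETag match is still good
    # evidence, but a mismatch proves nothing (multipart, SSE-KMS).
    if remote_etag is not None and remote_etag.strip('"').lower() == md5_hex(local):
        return True
    return None
```

With this shape, objects written before the change return None instead of a wrong answer, and the caller can decide whether "unknown" means re-upload or skip.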
thankful-minister-83577
thankful-minister-83577
gorgeous-waitress-5026
02/25/2024, 3:13 AM
glamorous-carpet-83516
02/26/2024, 7:11 PM