cuddly-jelly-27016
03/06/2025, 8:31 PM
pickle if the data type doesn't have a registered TypeTransformer. This format is known to be insecure because it allows remote code execution during deserialization.
If FlyteFile supported a metadata field, we could store a hash in it as an additional control against pickling attacks and other forms of data-at-rest corruption.
It would also help position Flyte as the right system for building a robust and secure ML supply chain.
### Goal: What should the final outcome look like, ideally?
If this were available, we could do something like:
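    import hashlib

    from flytekit import task
    from flytekit.types.file import FlyteFile


    def calculate_file_hash(file_path: str) -> str:
        """Calculate the SHA256 hash of a file."""
        sha256_hash = hashlib.sha256()
        with open(file_path, "rb") as f:
            # Read in chunks so large files are not loaded into memory at once
            for chunk in iter(lambda: f.read(8192), b""):
                sha256_hash.update(chunk)
        return sha256_hash.hexdigest()


    @task
    def process_file(file_path: str) -> FlyteFile:
        # Calculate the hash of the file
        file_hash = calculate_file_hash(file_path)
        # Create a FlyteFile with the hash as metadata (`metadata` is the argument proposed here)
        flyte_file = FlyteFile(path=file_path, metadata={"hash": file_hash})
        return flyte_file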
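On the consuming side, the stored hash would be checked before the file is deserialized. A minimal sketch of that check, using only the standard library: `sha256_of_file`, `verify_file_hash` and `load_pickle_if_trusted` are hypothetical helper names, and reading the expected hash back from the FlyteFile metadata is left out because that is the API this issue proposes.

```python
import hashlib
import hmac
import pickle


def sha256_of_file(file_path: str) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file_hash(file_path: str, expected_hash: str) -> bool:
    """Return True only if the file's current hash matches the expected one."""
    return hmac.compare_digest(sha256_of_file(file_path), expected_hash)


def load_pickle_if_trusted(file_path: str, expected_hash: str):
    """Unpickle the file only after its hash has been verified (hypothetical helper)."""
    if not verify_file_hash(file_path, expected_hash):
        raise ValueError(f"Hash mismatch for {file_path}; refusing to unpickle")
    with open(file_path, "rb") as f:
        return pickle.load(f)
```

The check only adds protection if the expected hash is recorded somewhere that cannot be rewritten together with the file itself.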
### Describe alternatives you've considered
• Create and register a custom type like ExtendedFlyteFile
• Encode models into a custom data class with a method that calculates and validates the hash
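For illustration, the second alternative could look roughly like the sketch below. `HashedArtifact` and its methods are hypothetical names, and wiring the class into Flyte task signatures is not shown.

```python
import hashlib
from dataclasses import dataclass


def _sha256(path: str) -> str:
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


@dataclass(frozen=True)
class HashedArtifact:
    """Hypothetical wrapper that pairs a file path with its SHA256 hash."""

    path: str
    sha256: str

    @classmethod
    def from_path(cls, path: str) -> "HashedArtifact":
        # Compute the hash once, when the artifact is produced
        return cls(path=path, sha256=_sha256(path))

    def validate(self) -> None:
        # Recompute the hash and fail loudly if the file has changed
        actual = _sha256(self.path)
        if actual != self.sha256:
            raise ValueError(
                f"Hash mismatch for {self.path}: expected {self.sha256}, got {actual}"
            )
```

A producing task would return `HashedArtifact.from_path(...)` and a consuming task would call `validate()` before loading the model. The drawback compared with a metadata field on FlyteFile is that every user has to remember to do this themselves.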
### Propose: Link/Inline OR Additional context
No response
### Are you sure this issue hasn't been raised already?
• Yes
### Have you read the Code of Conduct?
• Yes
flyteorg/flyte