white-teacher-47376
06/07/2024, 6:15 AM
Original exception: [Errno 5] An error occurred () when calling the PutObject operation: , cause=[Errno 5] An error occurred () when calling the PutObject operation:
The error seems to come from fsspec/s3fs, deep inside botocore, where a 408 Request Time-out can be observed. However, I suspect neither botocore nor our infrastructure as the cause, because the error could not be reproduced with the AWS CLI. Has anyone else encountered such issues using FlyteDirectories?
I was able to reproduce this error with the following code. The S3 directory contained roughly 300 files of about 300 kB each.
import multiprocessing
import os
import tempfile

from s3fs import S3FileSystem

# Endpoint and credentials come from the environment (AWS-style or fsspec-style names).
s3_endpoint = os.environ.get("S3_ENDPOINT") or os.environ.get("FSSPEC_S3_ENDPOINT_URL")
s3_access_key = os.environ.get("AWS_ACCESS_KEY_ID") or os.environ.get("FSSPEC_S3_KEY")
s3_secret_key = os.environ.get("AWS_SECRET_ACCESS_KEY") or os.environ.get("FSSPEC_S3_SECRET")
src = "<s3://path/to/folder>"


def download_folder(src, dst):
    # Each process creates its own filesystem object and downloads the whole folder.
    fs = S3FileSystem(
        key=s3_access_key,
        secret=s3_secret_key,
        client_kwargs={"endpoint_url": s3_endpoint},
    )
    try:
        fs.get(src, dst, recursive=True)
    except Exception as exc:
        print(str(exc))


if __name__ == "__main__":
    temp_dir = tempfile.mkdtemp()
    processes = []
    # Start 500 concurrent downloads of the same folder, each into its own subdirectory.
    for i in range(500):
        dst = os.path.join(temp_dir, str(i))
        process = multiprocessing.Process(target=download_folder, args=(src, dst))
        processes.append(process)
        process.start()
    # Wait for all downloads to finish.
    for process in processes:
        process.join()
freezing-airport-6809
white-teacher-47376
06/10/2024, 11:25 AM
config_kwargs={
    "connect_timeout": 86400,
    "read_timeout": 86400,
    "retries": {
        "total_max_attempts": 100,
        "max_attempts": 100,
        "mode": "standard",
    },
    "tcp_keepalive": True,
}
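For reference, a minimal sketch of how these settings could be wired into the repro script, assuming they are passed through the `config_kwargs` argument of `S3FileSystem` (which s3fs documents as being forwarded to the botocore client config). The helper names `build_s3fs_kwargs` and `make_filesystem` are hypothetical and only for illustration; the thread does not show how the settings were actually applied.

```python
import os

# Values copied from the config_kwargs in the message above.
CONFIG_KWARGS = {
    "connect_timeout": 86400,
    "read_timeout": 86400,
    "retries": {
        "total_max_attempts": 100,
        "max_attempts": 100,
        "mode": "standard",
    },
    "tcp_keepalive": True,
}


def build_s3fs_kwargs(env=os.environ):
    """Collect S3FileSystem keyword arguments from an environment mapping (hypothetical helper)."""
    return {
        "key": env.get("AWS_ACCESS_KEY_ID") or env.get("FSSPEC_S3_KEY"),
        "secret": env.get("AWS_SECRET_ACCESS_KEY") or env.get("FSSPEC_S3_SECRET"),
        "client_kwargs": {
            "endpoint_url": env.get("S3_ENDPOINT") or env.get("FSSPEC_S3_ENDPOINT_URL"),
        },
        # Assumption: timeouts and retry settings reach botocore via config_kwargs.
        "config_kwargs": CONFIG_KWARGS,
    }


def make_filesystem():
    """Create the filesystem; s3fs is imported lazily so the helper above needs only the stdlib."""
    from s3fs import S3FileSystem

    return S3FileSystem(**build_s3fs_kwargs())
```

In the repro script, `fs = make_filesystem()` would replace the inline `S3FileSystem(...)` call inside `download_folder`.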
Unfortunately, these settings don't fix the problem. Below is the message I am extracting from somewhere inside botocore. It looks like botocore doesn't even retry (RetryAttempts: 0), even though I can confirm from another log message that the config parameters have been set properly.
{'Error': {'Message': '', 'Code': ''}, 'body': {'h1': '408 Request Time-out'}, 'ResponseMetadata': {'HTTPStatusCode': 408, 'HTTPHeaders': {'content-length': '110', 'cache-control': 'no-cache', 'content-type': 'text/html', 'connection': 'close'}, 'RetryAttempts': 0}}
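Since the response above reports `'RetryAttempts': 0`, one possible workaround is to retry at the application level. The following is only a sketch under assumptions the thread does not confirm: that the underlying botocore error is reachable through the raised exception's `__cause__`/`__context__` chain, and that it exposes a `response` dict shaped like the one above. The helpers `http_status` and `retry_on_408` are hypothetical names, not part of s3fs or botocore.

```python
import random
import time


def http_status(exc):
    """Walk the exception chain and return the first HTTP status code found, else None."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        # Assumption: the botocore error carries a parsed response dict.
        response = getattr(exc, "response", None)
        if isinstance(response, dict):
            status = response.get("ResponseMetadata", {}).get("HTTPStatusCode")
            if status is not None:
                return status
        exc = exc.__cause__ or exc.__context__
    return None


def retry_on_408(func, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call func(); retry with exponential backoff and jitter while it fails with HTTP 408."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:
            # Re-raise anything that is not a 408, and the final failed attempt.
            if http_status(exc) != 408 or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

In the repro script this could wrap the download, for example `retry_on_408(lambda: fs.get(src, dst, recursive=True))`. Note that retrying the whole recursive `get` downloads every file again, so this only helps to check whether the 408 is transient, not as an efficient fix.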
freezing-airport-6809
white-teacher-47376
06/10/2024, 2:35 PM