# flyte-support
w
Hello everyone! Has anyone ever seen an issue where files in a FlyteDirectory are only partially uploaded to the blob store? For me the files max out at 500 MB, but I can't find any information about how this might be configured. In this particular case it means the parquet files can't be read because they're missing their magic end bytes 😢
h
The task reports success, I presume? This is so odd. Can you double-check that the local files are correct? You can either log that or use @vscode to open VS Code in the browser from the running pod and inspect them.
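One way to do the local check from inside the task is to log each file's size and confirm the Parquet magic bytes are present. This is only a sketch: it assumes unencrypted Parquet files (which start and end with the 4 bytes PAR1), and the function names are made up for illustration.

```python
import os
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # unencrypted Parquet files start and end with these 4 bytes


def check_parquet_file(path):
    """Return (size_in_bytes, has_valid_magic) for one local file."""
    size = os.path.getsize(path)
    if size < 2 * len(PARQUET_MAGIC):
        return size, False
    with open(path, "rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        tail = f.read(len(PARQUET_MAGIC))
    return size, head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def report_directory(directory):
    """Print and return the size and magic-byte status of every .parquet file."""
    results = {}
    for path in sorted(Path(directory).rglob("*.parquet")):
        size, ok = check_parquet_file(path)
        print(f"{path}: {size / 1e6:.2f} MB, magic bytes ok: {ok}")
        results[str(path)] = (size, ok)
    return results
```

Running report_directory on the output directory just before the task returns gives a baseline to compare against what ends up in the blob store.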
w
Thanks Haytham - I wondered the same thing and ran some logs. Here's some example output:
{
    "time": "2024-10-28T12:07:47.163260856Z",
    "stream": "stdout",
    "_p": "F",
    "log": "[1] \"file size: 472.85mb\""
}

{
    "time": "2024-10-28T12:08:35.32849339Z",
    "stream": "stdout",
    "_p": "F",
    "log": "[1] \"file size: 540.53mb\""
}

{
    "time": "2024-10-28T12:09:19.831678048Z",
    "stream": "stdout",
    "_p": "F",
    "log": "[1] \"file size: 430.79mb\""
}
and so on
h
Can you turn on a higher log level for flytekit? Launch with this env var: FLYTE_SDK_LOGGING_LEVEL=10
It should output more verbose logging for the upload operation
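For reference, 10 is the numeric value of DEBUG in Python's standard logging module. A small sketch for confirming the variable actually reached the pod, assuming it is set as a plain integer string (the helper name is hypothetical, not part of flytekit):

```python
import logging
import os


def flyte_sdk_log_level_name(environ=os.environ):
    """Name the standard logging level set via FLYTE_SDK_LOGGING_LEVEL, if any."""
    raw = environ.get("FLYTE_SDK_LOGGING_LEVEL")
    if raw is None:
        return None
    return logging.getLevelName(int(raw))


print(flyte_sdk_log_level_name({"FLYTE_SDK_LOGGING_LEVEL": "10"}))  # DEBUG
```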
w
👍 will do, will get back when it's done
There doesn't seem to be anything of particular interest:
Execute user level code. [Time: 318.131370s]
Adding trailing sep to
Upload data to s3://[...]/data/lz/a48k7lsvrmxrxljrgx2t-f1fbp2xy-0/370af024b0d41214706087f0c0490e1a. [Time: 4.046042s]
Translate the output to literals. [Time: 4.209805s]
Adding trailing sep to
Upload data to s3://[...]/metadata/propeller/flytesnacks-development-a48k7lsvrmxrxljrgx2t/[...]/data/0. [Time: 0.041005s]
Engine folder written successfully to the output prefix s3://[...]/metadata/propeller/flytesnacks-development-a48k7lsvrmxrxljrgx2t/[...]/data/0
Finished _dispatch_execute
🤔 After a bit of experimenting I think this is not a problem with Flyte, as it seems to happen if I upload the files with s3fs directly. It probably has implications for using FlyteDirectory/FlyteFile objects though.
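A rough way to make the truncation visible, with or without Flyte in the loop, is to compare each local file's size with the size the store reports after upload. The sketch below takes the remote lookup as a callable so it does not depend on a particular client; the function name and the shape of that callable are assumptions for illustration.

```python
from pathlib import Path


def find_truncated_uploads(local_dir, remote_size):
    """Return {relative_path: (local_bytes, remote_bytes)} for every size mismatch.

    remote_size is any callable that maps a relative POSIX path to the size
    in bytes reported by the blob store (or None if the object is missing).
    """
    mismatches = {}
    root = Path(local_dir)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        local = path.stat().st_size
        remote = remote_size(rel)
        if remote != local:
            mismatches[rel] = (local, remote)
    return mismatches
```

With s3fs, the callable could wrap the filesystem's size method, for example lambda rel: fs.size("my-bucket/some/prefix/" + rel), where fs is an s3fs.S3FileSystem and the bucket and prefix are placeholders.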
f
It should not; this would break Flyte's promise. By default it uses s3fs too, much to our dislike.
h
@wide-soccer-37846, can you confirm which version of s3fs and fsspec you're using?
w
Both log version 2024.10.0 when running in the task.
I've worked around this for now by explicitly uploading to a random directory in the blob store using boto3 and using that path in the FlyteDirectory, but I'm curious if anyone can reproduce this - I wasn't able to figure out the root cause (perhaps to do with multi-part chunk size inconsistencies?)
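The workaround described above might look roughly like the following. It is a sketch, not the code actually used in the thread: the function name, the base_prefix default, and the bucket are placeholders, and s3_client is assumed to behave like boto3.client("s3"), whose upload_file method takes a local filename, a bucket, and a key.

```python
import uuid
from pathlib import Path


def upload_directory(s3_client, local_dir, bucket, base_prefix="data"):
    """Upload every file under local_dir to a fresh random prefix.

    Returns the s3:// URI of that prefix. s3_client is expected to expose
    upload_file(filename, bucket, key), as boto3.client("s3") does.
    """
    prefix = f"{base_prefix.strip('/')}/{uuid.uuid4().hex}"
    root = Path(local_dir)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        key = f"{prefix}/{path.relative_to(root).as_posix()}"
        s3_client.upload_file(str(path), bucket, key)
    return f"s3://{bucket}/{prefix}/"
```

Inside a task, the returned URI would then be used as the path of the FlyteDirectory that the task returns, so Flyte records the remote location and does not re-upload the local files through s3fs.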