faint-smartphone-23356
05/10/2023, 2:45 PM
Tasks that succeed with requests.memory=2Gi, limits.memory=2Gi under flytekit 1.4.2 fail under 1.5.0.
We bumped these tasks to requests.memory=64Gi, limits.memory=64Gi and they succeed under 1.5.0.
Here are two graphs that illustrate the difference in RAM usage. The k8s request differences are listed on the graphs. Everything else (inputs, etc.) is the same. The only difference is flytekit 1.4.2 vs 1.5.0.
What changed?
microscopic-furniture-57275
05/10/2023, 3:07 PM
download the data to the compute node. But we aren't even getting to the downstream task. Setting the folder on the FlyteDirectory appears to cause something, presumably the large amount of data, to be loaded into memory.
microscopic-furniture-57275
05/10/2023, 9:38 PM
>>> import fsspec
>>> fsspec.__version__
'2023.5.0'
I'm not sure how to get the version of s3fs, and I don't know if we are using flytekit-data-fsspec - I don't see this in our dependencies.
faint-smartphone-23356
05/10/2023, 9:39 PM
root@078cd129a8a3:/app# pip list | grep -E '(fsspec|flytekit|s3fs)'
flytekit 1.4.2
flytekitplugins-pod 1.4.2
fsspec 2023.5.0
faint-smartphone-23356
05/10/2023, 9:40 PM
root@078cd129a8a3:/app# pip show s3fs
WARNING: Package(s) not found: s3fs
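On the question of how to get the s3fs version: a minimal sketch that does the same check as the pip commands above from inside Python, using only the standard library. The helper name `installed_versions` is made up for illustration, and the package list is just the set of names mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Same situation pip reports as "Package(s) not found".
            found[name] = None
    return found


# Names taken from the thread; adjust to whatever your image is expected to ship.
for name, ver in installed_versions(
    ["flytekit", "fsspec", "s3fs", "flytekit-data-fsspec"]
).items():
    print(f"{name}: {ver or 'not installed'}")
```

Running this in both the 1.4.2 and the 1.5.0 image should make it easy to spot a package that is present in one and missing in the other.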
thankful-minister-83577
which aws? (in the container)
microscopic-furniture-57275
05/10/2023, 9:41 PM
root@e28c57ef87a7:/app# pip list | grep -E '(fsspec|flytekit|s3fs)'
flytekit 1.5.0
flytekitplugins-pod 1.5.0
fsspec 2023.5.0
s3fs 2023.5.0
faint-smartphone-23356
05/10/2023, 9:42 PM
root@078cd129a8a3:/app# which aws
/app/venv/bin/aws
root@078cd129a8a3:/app# aws --version
aws-cli/1.27.132 Python/3.9.16 Linux/5.10.178-162.673.amzn2.x86_64 botocore/1.29.132
thankful-minister-83577
Without the flytekit-data-fsspec plugin, you would default to the aws cli.
thankful-minister-83577
import time
import subprocess

from flytekit import task, workflow, Resources
from flytekit.types.directory import FlyteDirectory


@task(requests=Resources(mem="1Gi"), limits=Resources(mem="1Gi"))
def waiter_task(a: int) -> str:
    # Keep the pod alive: sleep for a day when a == 0, otherwise for a seconds.
    if a == 0:
        time.sleep(86400)
    else:
        time.sleep(a)
    return "hello world"


@task(requests=Resources(mem="1Gi"), limits=Resources(mem="1Gi"))
def dd_and_upload() -> FlyteDirectory:
    # count=0 with seek=10G writes no data; dd only sets the output size to 10G.
    command = ["dd", "if=/dev/random", "of=/root/temp_10GB_file", "bs=1", "count=0", "seek=10G"]
    subprocess.run(command)
    # Returning the path makes flytekit upload it as the task output.
    return FlyteDirectory("/root/temp_10GB_file")


@workflow
def waiter(a: int = 0) -> str:
    return waiter_task(a=a)


@workflow
def uploader() -> FlyteDirectory:
    return dd_and_upload()
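A note on the dd call in the reproducer: with count=0 and seek=10G it does not write 10G of random bytes. It only extends the output file to that size, which on most Linux filesystems gives a sparse file. A minimal standard-library sketch of the same idea at a smaller scale follows; the function name `make_sparse_file` is hypothetical.

```python
import os


def make_sparse_file(path, size_bytes):
    """Create a file whose apparent size is size_bytes without writing any data."""
    with open(path, "wb") as handle:
        # Extending via truncate leaves a hole that reads back as zeros.
        handle.truncate(size_bytes)
    return os.stat(path)


# Usage sketch:
#   st = make_sparse_file("/tmp/sparse.bin", 10 * 1024 * 1024)
#   st.st_size is the apparent size (what an upload will read).
#   On Unix, st.st_blocks * 512 is the space actually allocated on disk,
#   which is typically far smaller when the filesystem supports sparse files.
```

An upload still has to read the full apparent size, so the file remains a reasonable stand-in for a large output even though it occupies little disk space.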
microscopic-furniture-57275
07/11/2023, 5:49 PM
c = Config(folder='', size=1, mem=1, many_files=True)
When folder is blank, it just creates a 'test' folder under current_context().working_directory.
The other params mean we'll write/upload 1GB worth of files, the pod will request 1G memory, and we'll write many smaller files instead of one large one.
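The test script itself is not shown in the thread, so the following is only a guess at its data-generation step: a minimal sketch that writes a given number of bytes either as many small files or as one large file. The names `write_test_data` and `file_bytes`, and the file naming scheme, are assumptions for illustration and are not taken from the original Config.

```python
import os

BLOCK = 1024 * 1024  # write at most 1 MiB at a time so the generator stays small in RAM


def _write_random(path, size):
    """Write size random bytes to path in bounded blocks."""
    with open(path, "wb") as handle:
        remaining = size
        while remaining > 0:
            n = min(BLOCK, remaining)
            handle.write(os.urandom(n))
            remaining -= n


def write_test_data(folder, total_bytes, many_files, file_bytes=1024 * 1024):
    """Write total_bytes under folder as many file_bytes-sized files or one large file."""
    os.makedirs(folder, exist_ok=True)
    if many_files:
        sizes = [min(file_bytes, total_bytes - start)
                 for start in range(0, total_bytes, file_bytes)]
    else:
        sizes = [total_bytes]
    paths = []
    for index, size in enumerate(sizes):
        path = os.path.join(folder, f"part-{index:06d}.bin")
        _write_random(path, size)
        paths.append(path)
    return paths
```

With total_bytes set to 1 GiB and many_files=True this would produce 1024 files of 1 MiB each, which is the kind of directory that exercises a recursive upload.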
thankful-minister-83577
import fsspec

target_bucket = "s3://my-bucket/yt/memtest1"
container_dir = "/tmp/flyte-ox9aa6ku/sandbox/local_flytekit/e69fd8f684d1e5f02eadd7f427aeb2d8/test"

# Upload the whole local directory to S3 with fsspec directly, outside flytekit.
fs = fsspec.filesystem("s3")
fs.put(container_dir, target_bucket, recursive=True)
thankful-minister-83577
import fsspec.config
fsspec.config.conf["gather_batch_size"] = 100
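To illustrate why a smaller batch size can lower memory use: the sketch below is a simplified stand-in for batched async transfers, not fsspec's actual implementation. It assumes each in-flight upload holds some buffer, so the peak number of concurrent uploads is what matters. All names here (`run_in_batches`, `InFlightCounter`, `fake_upload`, `demo`) are hypothetical.

```python
import asyncio


async def run_in_batches(coro_factories, batch_size):
    """Await coroutines batch_size at a time instead of launching all of them at once."""
    results = []
    for start in range(0, len(coro_factories), batch_size):
        batch = [make() for make in coro_factories[start:start + batch_size]]
        results.extend(await asyncio.gather(*batch))
    return results


class InFlightCounter:
    """Tracks how many fake uploads are running concurrently."""

    def __init__(self):
        self.current = 0
        self.peak = 0

    async def fake_upload(self, index):
        self.current += 1
        self.peak = max(self.peak, self.current)
        await asyncio.sleep(0)  # yield to the event loop; stands in for network I/O
        self.current -= 1
        return index


def demo(num_files, batch_size):
    """Return (results, peak number of uploads in flight)."""
    counter = InFlightCounter()
    factories = [lambda i=i: counter.fake_upload(i) for i in range(num_files)]
    results = asyncio.run(run_in_batches(factories, batch_size))
    return results, counter.peak
```

In this toy model the peak number of uploads in flight equals the batch size whenever there are at least that many files, which is why reducing the batch size bounds whatever per-upload memory is being held.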
microscopic-furniture-57275
10/20/2023, 11:47 PM
import fsspec.config
fsspec.config.conf["gather_batch_size"] = 100  # or whatever, we keep reducing it!
2. Re: "s3fs will try to read all the files in the directory into memory by default": I still don't understand (and haven't studied the code) the need to load all files into memory, even in small batches. This is an odd pattern for a file copy, isn't it? What if you had huge files that exceed memory size? This is what has been so confusing about this issue all along: the large memory requirement for just copying files.
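For contrast with the behavior being questioned, this is what a bounded-memory file copy usually looks like: a minimal standard-library sketch, not how s3fs or flytekit perform uploads. The function name `copy_in_chunks` and the 1 MiB default are arbitrary choices for illustration.

```python
CHUNK_SIZE = 1024 * 1024  # 1 MiB; arbitrary


def copy_in_chunks(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Copy src_path to dst_path holding at most chunk_size bytes in memory at a time."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # empty read means end of file
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

With this pattern the memory needed is independent of file size, so a file larger than the pod's memory limit can still be copied.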