Hi,
I have a ContainerTask, shown below:
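from flytekit import ContainerTask, Resources, TaskMetadata, kwtypes

# The Python variable must be a valid identifier (my_task); the Flyte task name stays "my-task".
my_task = ContainerTask(
    metadata=TaskMetadata(cache=True, cache_version="1.0"),
    name="my-task",
    image="my-image",
    input_data_dir="/var/inputs",
    output_data_dir="/var/outputs",
    inputs=kwtypes(inDir=str),
    outputs=kwtypes(out=str),
    requests=Resources(gpu="1"),
    limits=Resources(gpu="1"),
    command=[
        "/bin/bash",
    ],
    arguments=[
        "-c",
        # Writes the dummy output "out" as a file named after the declared output.
        "echo \"out\" > /var/outputs/out; ... other commands",
    ],
    # .... (other arguments omitted)
)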
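With input_data_dir and output_data_dir set, the arguments above assume that each declared output is read back from a file named after it under output_data_dir (here /var/outputs/out for the output out). When chasing an "output doesn't exist" error, a small helper like the one below could confirm whether the container actually left those files behind. It is a hypothetical sketch, not part of flytekit; the name missing_outputs is made up for illustration.

```python
import os
from typing import Iterable, List


def missing_outputs(output_dir: str, output_names: Iterable[str]) -> List[str]:
    """Return the declared output names that have no file in output_dir.

    Assumes the raw-container convention used by the task above: each
    declared output is a file named after the output, e.g. /var/outputs/out
    for the output "out".
    """
    return [
        name
        for name in output_names
        if not os.path.isfile(os.path.join(output_dir, name))
    ]
```

For example, missing_outputs("/var/outputs", ["out"]) returns an empty list if the echo in the arguments succeeded, and ["out"] if the file was never written (or was written somewhere else).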
To cache the task, I found that I had to declare inputs and outputs even though I don't need them. So I just write the string "out" to /var/outputs/out, as shown in the arguments, and pass a placeholder string as inDir when calling the task, as below:
from typing import Dict

from flytekit import workflow


@workflow
def aeb_sanity_workflow(data: Dict):
    ## -----------------------------------------------------------------------------
    # .......
    my_task_promise = my_task(inDir="some string")
    # ........
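Why a constant dummy input is enough for caching can be illustrated with a toy model. This is an assumption for illustration only, not Flyte's actual cache-key computation (Flyte's documented key also involves fields such as project, domain and the task signature): the key depends on the task name, cache_version and input values, so calling the task again with the same inDir maps to the same key. The function name toy_cache_key is hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def toy_cache_key(task_name: str, cache_version: str, inputs: Dict[str, Any]) -> str:
    """Toy cache key (NOT Flyte's implementation), for illustration only.

    The key changes only when the task name, cache_version, or an input
    value changes; identical calls produce identical keys.
    """
    payload = json.dumps(
        {"task": task_name, "version": cache_version, "inputs": inputs},
        sort_keys=True,  # stable ordering so equal inputs hash identically
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

In this model, bumping cache_version (for example from "1.0" to "1.1") or changing the placeholder inDir value yields a different key, which is the usual way to invalidate a cached result.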
The task and workflow above worked for me with the earlier Flyte sandbox image below:
cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-1a8d37570cda76cc01bf8c26354f4aad4debcd0a
However, I'm now using Flyte master patched with https://github.com/flyteorg/flyte/pull/3256, built manually in docker/sandbox-bundled with make build-gpu, because I needed GPU support in the sandbox.
With this build I'm seeing two issues that did not occur with cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-1a8d37570cda76cc01bf8c26354f4aad4debcd0a (tag: v1.8.1):
1. The ContainerTask above fails after the workflow executes, with errors saying the output doesn't exist. I haven't changed a single line of code in my-task; the only change is the newer Flyte image.
2. For the task that needs a GPU, the image is huge (~24 GB), so the k8s node came under disk pressure and several pods were evicted.
> kubectl describe pod <gpu-pod>
Warning Evicted 8m10s (x3 over 9m30s) kubelet The node was low on resource: ephemeral-storage.
Warning ExceededGracePeriod 8m (x3 over 9m20s) kubelet Container runtime did not kill the pod within specified grace period.
Normal Pulled 7m59s kubelet Successfully pulled image "my-gpu-image" in 8m40.26240502s
Normal Created 7m59s kubelet Created container primary
Normal Started 7m58s kubelet Started container primary
Normal Killing 7m58s kubelet Stopping container primary
Warning Evicted 7m30s kubelet The node was low on resource: ephemeral-storage. Container primary was using 13516Ki, which exceeds its request of 0.
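The last pod event says the container used 13516Ki of ephemeral storage against a request of 0. Converting the quantity shows it is only about 13 MiB, so the container's own usage is small; but with no ephemeral-storage request set, any usage counts as exceeding the request, which the kubelet takes into account when ranking pods for eviction under disk pressure. A minimal stdlib helper for the conversion (hypothetical; it covers only integer quantities with common suffixes, not every Kubernetes quantity form):

```python
import re

# Binary (powers of 1024) and decimal (powers of 1000) suffixes used in
# Kubernetes resource quantities; only the common ones are covered here.
_SUFFIXES = {
    "": 1,
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a simple integer Kubernetes quantity such as '13516Ki' to bytes."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", quantity.strip())
    if match is None or match.group(2) not in _SUFFIXES:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    number, suffix = match.groups()
    return int(number) * _SUFFIXES[suffix]


usage = quantity_to_bytes("13516Ki")
print(usage, f"{usage / 1024**2:.1f} MiB")  # 13840384 13.2 MiB
```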
> kubectl describe nodes <>
Warning FreeDiskSpaceFailed 52m kubelet failed to garbage collect required amount of images. Wanted to free 110758122291 bytes, but freed 155692522 bytes
Warning ImageGCFailed 52m kubelet failed to garbage collect required amount of images. Wanted to free 110758122291 bytes, but freed 155692522 bytes
Warning FreeDiskSpaceFailed 47m kubelet failed to garbage collect required amount of images. Wanted to free 111138763571 bytes, but freed 0 bytes
Warning ImageGCFailed 47m kubelet failed to garbage collect required amount of images. Wanted to free 111138763571 bytes, but freed 0 bytes
Warning EvictionThresholdMet 7m56s (x3 over 11m) kubelet Attempting to reclaim ephemeral-storage
Normal NodeNotReady 7m49s node-controller Node 1fefe346c083 status is now: NodeNotReady
Normal NodeHasSufficientMemory 7m47s (x3 over 57m) kubelet Node 1fefe346c083 status is now: NodeHasSufficientMemory
Normal NodeHasDiskPressure 7m47s (x2 over 11m) kubelet Node 1fefe346c083 status is now: NodeHasDiskPressure
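The node events show image garbage collection falling far short of its target. A small hypothetical stdlib parser for those kubelet messages (the pattern is taken from the event text above; gc_shortfall is a made-up name) makes the numbers easier to read:

```python
import re
from typing import Tuple

# Matches the byte counts in the kubelet's ImageGCFailed/FreeDiskSpaceFailed text.
_GC_PATTERN = re.compile(r"Wanted to free (\d+) bytes, but freed (\d+) bytes")


def gc_shortfall(event_message: str) -> Tuple[int, int, int]:
    """Return (wanted, freed, shortfall) in bytes from an image-GC event message."""
    match = _GC_PATTERN.search(event_message)
    if match is None:
        raise ValueError("not an image garbage-collection event")
    wanted, freed = (int(g) for g in match.groups())
    return wanted, freed, wanted - freed


events = [
    "failed to garbage collect required amount of images. "
    "Wanted to free 110758122291 bytes, but freed 155692522 bytes",
    "failed to garbage collect required amount of images. "
    "Wanted to free 111138763571 bytes, but freed 0 bytes",
]
for message in events:
    wanted, freed, shortfall = gc_shortfall(message)
    # Decimal gigabytes (1 GB = 10**9 bytes) for readability.
    print(f"wanted {wanted / 1e9:.1f} GB, freed {freed / 1e9:.2f} GB, "
          f"short by {shortfall / 1e9:.1f} GB")
```

For the two events above this reports about 110.8 GB wanted vs 0.16 GB freed, and 111.1 GB wanted vs nothing freed, i.e. the kubelet could reclaim almost none of the space it was trying to free.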
I didn't observe either issue with cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-1a8d37570cda76cc01bf8c26354f4aad4debcd0a (tag: v1.8.1).