happy-bird-19790
01/31/2022, 7:43 PM
map_task(move_and_split_txns, concurrency=1)(
move_and_split_txns_config_file=config_files
).with_overrides(retries=5)
config_files is a list with 3 FlyteFiles returned from a dependent task. move_and_split_txns is a task that reads the FlyteFile and does the heavy lifting. The nodes I was using didn't have enough memory and the move_and_split_txns tasks all got OOMKilled by Kubernetes. The map_task says it completed successfully with no retries. I would expect the map_task to at least fail. What am I doing wrong?
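[A minimal self-contained sketch of the wiring described above. Task bodies and config contents are hypothetical stand-ins; the int outputs match the collection-of-integer interface in the task template pasted further down.]

from typing import List
from flytekit import map_task, task, workflow
from flytekit.types.file import FlyteFile

@task
def make_config_files() -> List[FlyteFile]:
    # Stand-in for the dependent task that returns the 3 config files.
    paths = []
    for i in range(3):
        path = f"/tmp/config_{i}.json"  # hypothetical contents
        with open(path, "w") as f:
            f.write("{}")
        paths.append(FlyteFile(path))
    return paths

@task
def move_and_split_txns(move_and_split_txns_config_file: FlyteFile) -> int:
    # Stand-in for the memory-heavy work; FlyteFile is os.PathLike, so
    # opening it triggers the download of the remote blob.
    with open(move_and_split_txns_config_file, "r") as f:
        return len(f.read())

@workflow
def bulk_load() -> List[int]:
    config_files = make_config_files()
    return map_task(move_and_split_txns, concurrency=1)(
        move_and_split_txns_config_file=config_files
    ).with_overrides(retries=5)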
hallowed-mouse-14616
01/31/2022, 8:01 PM

hallowed-mouse-14616
01/31/2022, 8:12 PM

happy-bird-19790
01/31/2022, 8:15 PM
{
"config": {},
"id": {
"resourceType": 1,
"project": "flytesnacks",
"domain": "development",
"name": "flyte_workflows.xxxxx.xxxxx_bulk_load.mapper_move_and_split_txns_3",
"version": "v41"
},
"type": "container_array",
"metadata": {
"discoverable": true,
"runtime": {
"type": 1,
"version": "0.26.1",
"flavor": "python"
},
"retries": {},
"discoveryVersion": "2"
},
"interface": {
"inputs": {
"variables": {
"move_and_split_txns_config_file": {
"type": {
"collectionType": {
"blob": {}
}
},
"description": "move_and_split_txns_config_file"
}
}
},
"outputs": {
"variables": {
"o0": {
"type": {
"collectionType": {
"simple": 1
}
},
"description": "o0"
}
}
}
},
"custom": {
"fields": {
"parallelism": {
"stringValue": "1"
}
}
},
"taskTypeVersion": 1,
"container": {
"command": [],
"args": [
"pyflyte-map-execute",
"--inputs",
"{{.input}}",
"--output-prefix",
"{{.outputPrefix}}",
"--raw-output-data-prefix",
"{{.rawOutputDataPrefix}}",
"--resolver",
"flytekit.core.python_auto_container.default_task_resolver",
"--",
"task-module",
"flyte_workflows.xxxxx.xxxxx_bulk_load",
"task-name",
"move_and_split_txns"
],
"env": [
{
"key": "FLYTE_INTERNAL_IMAGE",
"value": "<http://518673686532.dkr.ecr.us-west-2.amazonaws.com/flyte/test:v41|518673686532.dkr.ecr.us-west-2.amazonaws.com/flyte/test:v41>"
}
],
"config": [],
"ports": [],
"image": "<http://518673686532.dkr.ecr.us-west-2.amazonaws.com/flyte/test:v41|518673686532.dkr.ecr.us-west-2.amazonaws.com/flyte/test:v41>",
"resources": {
"requests": [],
"limits": []
}
}
}
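[For illustration: the "custom" block above is a protobuf Struct, so a key that flytekit never wrote, like min_success_ratio, is simply absent rather than carrying an explicit value. A standalone sketch with google.protobuf; the real serialization happens inside flytekit/flyteidl.]

from google.protobuf import json_format, struct_pb2

# Mimic what ends up in the task template when only concurrency was set.
custom = struct_pb2.Struct()
json_format.ParseDict({"parallelism": "1"}, custom)

print(json_format.MessageToDict(custom))     # {'parallelism': '1'}
print("min_success_ratio" in custom.fields)  # False - the key was never serialized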
hallowed-mouse-14616
01/31/2022, 9:07 PM"custom": {
"fields": {
"parallelism": {
"stringValue": "1"
}
}
},
flyteplugins attempts to read the array job taskTemplate custom from the raw protobuf. If it doesn't exist, we use default values (e.g. 1.0 for min_success_ratio). If it does exist, we read it. Since the custom exists in this case, we use it. However, min_success_ratio is not defined there, and by protobuf standards it defaults to 0.0. Therefore, when we compute the min successes for the map task, it uses a 0.0 ratio: the map task executes, waits for everything to complete, and succeeds with 0 minSuccesses.
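[In other words, a pure-Python sketch of that decision; function and constant names are hypothetical, the real logic is Go code in flyteplugins.]

from typing import Optional

DEFAULT_MIN_SUCCESS_RATIO = 1.0  # applied only when no custom block exists at all

def effective_min_success_ratio(custom: Optional[dict]) -> float:
    if custom is None:
        return DEFAULT_MIN_SUCCESS_RATIO
    # The custom block exists (setting concurrency created it), so it is read
    # wholesale; a float field that was never set reads as proto's 0.0 default.
    return custom.get("min_success_ratio", 0.0)

# The case in this thread: concurrency=1 serialized {"parallelism": "1"},
# so the map task needs zero successful subtasks and "succeeds" even when
# every pod is OOMKilled.
assert effective_min_success_ratio({"parallelism": "1"}) == 0.0
assert effective_min_success_ratio(None) == 1.0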
hallowed-mouse-14616
01/31/2022, 9:09 PM

happy-bird-19790
01/31/2022, 9:13 PM

hallowed-mouse-14616
01/31/2022, 9:14 PM

hallowed-mouse-14616
01/31/2022, 9:15 PM

freezing-airport-6809
0.0 means 1.0?

high-accountant-32689
01/31/2022, 9:34 PM
> Thanks for looking into this. As a workaround, can I set min_success_ratio = 1 when I set a concurrency?
Yes, this should unblock you. IMO we should default to 1.0 in flytekit and think about the 0.0 case separately. This won't solve the problem for other languages obviously, but that's a separate problem.
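[The workaround invocation would look roughly like this; it is the snippet from the top of the thread with the ratio made explicit.]

# Make min_success_ratio explicit whenever concurrency is set, so the
# serialized custom block carries 1.0 instead of proto's 0.0 default.
map_task(move_and_split_txns, concurrency=1, min_success_ratio=1.0)(
    move_and_split_txns_config_file=config_files
).with_overrides(retries=5)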
hallowed-mouse-14616
01/31/2022, 9:44 PM
1.0 as the flytekit default. I can't think of a scenario where somebody would want to set the min_success_ratio to 0.0, but in that case it would still work as expected.

high-accountant-32689
02/01/2022, 12:49 AMconcurrency
from the invocation of map_task
.
Parallelism is not implemented in the back-end yet (it's a feature in the next release).
We'll be fixing this bug in the coming flyte release.happy-bird-19790
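[So the other workaround is simply this sketch, which drops the not-yet-implemented knob.]

# Omit concurrency entirely; per the explanation above, when no custom block
# is serialized at all, the 1.0 default min_success_ratio applies server-side.
map_task(move_and_split_txns)(
    move_and_split_txns_config_file=config_files
).with_overrides(retries=5)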
happy-bird-19790
02/01/2022, 3:03 PM

happy-bird-19790
02/01/2022, 3:25 PM

high-park-82026
plugins:
  k8s-array:
    resourceConfig:
      primaryLabel: k8sArray
      limit: 300
    maxArrayJobSize: 10000
This means there can be at most 300 subtasks running in the system at any time, and any array task can have at most 10K subtasks.