acoustic-carpenter-78188
02/03/2023, 2:10 AM
PS:
  replicas: 1
  template:
    spec:
      containers:
        - resources:
            limits:
              cpu: "1"
Worker:
  replicas: 2
  template:
    spec:
      containers:
        - resources:
            limits:
              nvidia.com/gpu: 1
              memory: "10G"
However, this is not currently supported in Flyte.
Goal: What should the final outcome look like, ideally?
Users should be able to override the resources specified in the task definition by providing additional per-replica resource configs in the TfJob task config.
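from flytekit import Resources, task
from flytekitplugins.kftensorflow import TfJob

# Proposed API: the worker/ps arguments below are not yet supported by TfJob.
@task(
    task_config=TfJob(
        worker=dict(num=2, requests=Resources(cpu="1", mem="2Gi"), limits=Resources(cpu="1", mem="4Gi")),
        ps=dict(num=1, requests=Resources(cpu="1", mem="2Gi"), limits=Resources(cpu="1", mem="4Gi")),
    ),
    cache=True,
    cache_version="1.0",
)
def mnist_tensorflow_job(hyperparameters: Hyperparameters) -> training_outputs:
    ...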
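To make the intended mapping concrete, here is a minimal sketch in plain Python (no flytekit dependency) of how per-replica settings like those in the proposal could be turned into the replica-spec structure from the YAML above. `ReplicaConfig` and `to_replica_specs` are hypothetical names used only for illustration; they are not part of flytekit or the Flyte backend plugins.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ReplicaConfig:
    """Hypothetical per-replica settings; not a flytekit type."""
    num: int
    requests: Dict[str, str] = field(default_factory=dict)
    limits: Dict[str, str] = field(default_factory=dict)


def to_replica_specs(replicas: Dict[str, ReplicaConfig]) -> Dict[str, dict]:
    """Build a dict shaped like the PS/Worker replica specs in the YAML."""
    specs = {}
    for role, cfg in replicas.items():
        resources = {}
        # Only emit the sections that were actually configured.
        if cfg.requests:
            resources["requests"] = dict(cfg.requests)
        if cfg.limits:
            resources["limits"] = dict(cfg.limits)
        specs[role] = {
            "replicas": cfg.num,
            "template": {"spec": {"containers": [{"resources": resources}]}},
        }
    return specs


specs = to_replica_specs(
    {
        "PS": ReplicaConfig(num=1, limits={"cpu": "1"}),
        "Worker": ReplicaConfig(num=2, limits={"nvidia.com/gpu": "1", "memory": "10G"}),
    }
)
```

Keeping the per-role settings inside the task config means the task-level resources can stay as the default for any role that does not override them.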
Describe alternatives you've considered
We could make the resources field of the task function accept a new type, TFJobResources, and implement different handling in the related backend plugins. However, this requires a lot of code changes and undermines the consistency of task definitions.
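For comparison, this alternative might look roughly like the sketch below. `TFJobResources` is only the name floated above; its fields, the `for_role` helper, and the local `Resources` stand-in are assumptions made for illustration, not an existing flytekit API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resources:
    """Local stand-in with the cpu/mem fields used in the example above."""
    cpu: Optional[str] = None
    mem: Optional[str] = None
    gpu: Optional[str] = None


@dataclass(frozen=True)
class TFJobResources:
    """Hypothetical type: one optional Resources entry per replica role."""
    worker: Optional[Resources] = None
    ps: Optional[Resources] = None

    def for_role(self, role: str, default: Resources) -> Resources:
        """Return the role-specific override, or the task-level default."""
        if role not in ("worker", "ps"):
            raise ValueError(f"unknown replica role: {role}")
        override = getattr(self, role)
        return override if override is not None else default


task_default = Resources(cpu="1", mem="2Gi")
limits = TFJobResources(worker=Resources(cpu="1", mem="4Gi", gpu="1"))
worker_limits = limits.for_role("worker", task_default)
ps_limits = limits.for_role("ps", task_default)
```

With this shape, every plugin that reads the task's resources would have to handle both the plain type and the per-role type, which is the consistency cost noted above.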
Propose: Link/Inline OR Additional context
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte