creamy-shampoo-53278
12/30/2024, 2:35 PM
A Slurm agent supporting `PythonFunctionTask` has been implemented. It provides the following three core methods:
1. `create`: Uses `srun` to run a Slurm job that executes the Flyte entrypoints, `pyflyte-fast-execute` and `pyflyte-execute`.
2. `get`: Uses `scontrol show job <job_id>` to monitor the Slurm job state.
3. `delete`: Uses `scancel <job_id>` to cancel the Slurm job (this method is still under test).
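As a rough illustration of the `get` step: `scontrol show job <job_id>` prints whitespace-separated `key=value` pairs, so the agent can read the job state by parsing that output. The helper below (`parse_scontrol_output` is a hypothetical name, not the agent's actual code) is a minimal sketch of that parsing.

```python
# Minimal sketch (not the agent's actual implementation) of parsing
# the key=value output of `scontrol show job <job_id>`, e.g.
# "JobId=123 JobName=demo-slurm JobState=RUNNING Reason=None ..."
def parse_scontrol_output(output: str) -> dict:
    """Split whitespace-separated key=value tokens into a dict."""
    fields = {}
    for token in output.split():
        if "=" in token:
            key, _, value = token.partition("=")
            fields[key] = value
    return fields

# Example line shaped like scontrol's output:
info = parse_scontrol_output("JobId=123 JobName=demo-slurm JobState=RUNNING Reason=None")
# info["JobState"] == "RUNNING"
```

The `JobState` field is what `get` would map onto Flyte's task phases (running, succeeded, failed, etc.).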
We set up an environment to test it locally without running the agent gRPC server. The setup is divided into three components: a client (localhost), a remote tiny Slurm cluster, and an Amazon S3 bucket that facilitates communication between the two. The attached figure below illustrates the interaction between the client and the remote Slurm cluster.

creamy-shampoo-53278
12/30/2024, 2:38 PM
[attached figure: interaction between the client and the remote Slurm cluster]
creamy-shampoo-53278
12/30/2024, 3:56 PM
`sbatch` combined with the user-defined script (e.g., setting env vars, loading modules, nested `srun`, etc.).
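To make the `sbatch` idea concrete: the user-defined script body (env vars, module loads, nested `srun`) would be prefixed with `#SBATCH` directives generated from the task's options. This is a hypothetical sketch (`build_sbatch_script` is an illustrative name, not part of the agent):

```python
# Hypothetical helper: prepend #SBATCH directives to a user-defined
# script body, producing a submittable batch script.
def build_sbatch_script(user_script: str, options: dict) -> str:
    """Render sbatch long-form options as #SBATCH lines, then append
    the user's script (env vars, module loads, nested srun, ...)."""
    lines = ["#!/bin/bash"]
    for key, value in options.items():
        lines.append(f"#SBATCH --{key}={value}")
    lines.append(user_script)
    return "\n".join(lines)

script = build_sbatch_script(
    "export FOO=bar\nmodule load cuda\nsrun python train.py",
    {"partition": "debug", "job-name": "demo-slurm"},
)
```

The resulting text would be written to the remote cluster and submitted with `sbatch <script>`.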
creamy-shampoo-53278
12/31/2024, 2:04 PM
@task(
    task_config=Slurm(
        ssh_conf={
            "host": "<ssh_host>",
            "port": "<ssh_port>",
            "username": "<ssh_username>",
            "password": "<ssh_password>",
        },
        srun_conf={
            "partition": "debug",
            "job-name": "demo-slurm",
            # Remote working directory
            "chdir": "<your_remote_working_dir>",
        },
    )
)
def plus_one(x: int) -> int:
    return x + 1
It's now possible to define a task by passing in `ssh_conf`!
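For intuition on how `srun_conf` feeds into `create`: each key/value pair maps naturally onto a long-form `srun` flag placed in front of the Flyte entrypoint. A minimal sketch, assuming this flag-mapping scheme (`build_srun_command` is a hypothetical name, not the agent's actual code):

```python
# Hypothetical sketch: translate srun_conf entries into long-form
# srun flags, followed by the Flyte entrypoint command.
def build_srun_command(srun_conf: dict, entrypoint: list) -> list:
    cmd = ["srun"]
    for key, value in srun_conf.items():
        cmd.extend([f"--{key}", str(value)])
    return cmd + entrypoint

cmd = build_srun_command(
    {"partition": "debug", "job-name": "demo-slurm"},
    ["pyflyte-execute"],
)
# e.g. ["srun", "--partition", "debug", "--job-name", "demo-slurm", "pyflyte-execute"]
```

The `ssh_conf` block, by contrast, only tells the agent where and how to open the SSH connection on which such commands run.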