# slurm-flyte-wg
c
Hello everyone! We're excited to share that a very naive Slurm agent with synchronous `create` and `get` methods works fine locally. At this early stage, we set up a single-host (Ubuntu 20.04) machine for local development and testing. Some takeaways:
1. Run the controller daemons (`slurmctld`, `slurmdbd`), one compute node daemon (`slurmd`), and the REST API daemon (`slurmrestd`) on the same machine.
   a. The Slurm agent can interact with `slurmrestd` through the host base URL <http://localhost:6820>.
   b. Authentication is done via JWT.
2. Test the Slurm agent locally based on this guide to mimic FlytePropeller’s behavior.
We'll keep pushing forward to make this feature a reality!
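For anyone who wants to poke at `slurmrestd` directly, here is a minimal sketch of how a JWT-authenticated request against the base URL above could be built. The API version segment (`v0.0.40`) and the helper names are assumptions for illustration, not what the agent actually uses; the version depends on the installed Slurm release. The two `X-SLURM-USER-*` headers are the ones `slurmrestd` reads for JWT auth, and the token is assumed to be exported as `SLURM_JWT` (for example from `scontrol token`).

```python
import os
import urllib.request
from typing import Dict

BASE_URL = "http://localhost:6820"  # host base URL from the message above
API_VERSION = "v0.0.40"  # assumption: depends on the installed Slurm release


def build_auth_headers(user: str, token: str) -> Dict[str, str]:
    """Return the headers slurmrestd expects for JWT authentication."""
    return {
        "X-SLURM-USER-NAME": user,
        "X-SLURM-USER-TOKEN": token,
    }


def build_ping_request(user: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request to the ping endpoint."""
    url = f"{BASE_URL}/slurm/{API_VERSION}/ping"
    return urllib.request.Request(
        url, headers=build_auth_headers(user, token), method="GET"
    )


if __name__ == "__main__":
    # Only contact the daemon when a token is available in the environment.
    token = os.environ.get("SLURM_JWT")
    if token:
        req = build_ping_request(os.environ.get("USER", "flyte"), token)
        with urllib.request.urlopen(req, timeout=5) as resp:
            print(resp.status)
    else:
        print("SLURM_JWT is not set; skipping the request.")
```

Keeping request construction separate from sending makes it easy to check the URL and headers without a running `slurmrestd`.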
We test the agent with the following script:
```python
import os
from typing import Any, Dict

from flytekit import workflow
from flytekitplugins.slurm import SlurmTask


# Job definition handed to the Slurm agent
slurm_tiny_job = SlurmTask(
    name="demo-slurm",
    slurm_config={
        "script": "#!/bin/bash\necho Hello Slurm Agent!",
        "account": "flyte",
        "partition": "debug",
        "name": "hello-slurm-agent",
        "environment": ["PATH=/bin/:/sbin/:/usr/bin/:/usr/sbin/"],
        "current_working_directory": "/tmp"
    },
)


@workflow
def hi_slurm(dummy: str) -> Dict[str, Any]:
    """Return slurm job information."""
    res = slurm_tiny_job(dummy=dummy)

    return res


if __name__ == "__main__":
    from flytekit.clis.sdk_in_container import pyflyte
    from click.testing import CliRunner

    runner = CliRunner()
    path = os.path.realpath(__file__)

    # Local run
    print(">>> LOCAL EXEC <<<")
    result = runner.invoke(pyflyte.main, ["run", path, "hi_slurm", "--dummy", "dummy_input"])
    print(result.output)
```
The result is shown as follows:
f
Can we not use the REST API
and use an SSH-based API to run slurmctl `sbatch` and `srun`?
c
Sure! I’ll try it! Thanks for the response 😁
f
@creamy-shampoo-53278 you should use the `asyncssh` library; we actually discussed this. cc @damp-lion-88352. This is because most people do not use the REST API.
d
yes we should use a shell task
c
Thanks for the guidance. Let me handle it!
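As a starting point for the SSH route suggested above, here is a rough sketch of submitting a job with `sbatch` over `asyncssh`. The helper names (`build_sbatch_command`, `parse_job_id`, `submit_job`), the host, and the script path are hypothetical and only for illustration; this is not the agent's actual design. It assumes the remote user can run `sbatch` and that `sbatch` prints its usual `Submitted batch job <id>` line.

```python
import asyncio
import re
import shlex
from typing import Optional


def build_sbatch_command(script_path: str, partition: Optional[str] = None,
                         account: Optional[str] = None) -> str:
    """Assemble a shell-safe sbatch command line."""
    parts = ["sbatch"]
    if partition:
        parts.append(f"--partition={shlex.quote(partition)}")
    if account:
        parts.append(f"--account={shlex.quote(account)}")
    parts.append(shlex.quote(script_path))
    return " ".join(parts)


def parse_job_id(sbatch_stdout: str) -> int:
    """Extract the job ID from sbatch's 'Submitted batch job <id>' output."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    if match is None:
        raise ValueError(f"Unexpected sbatch output: {sbatch_stdout!r}")
    return int(match.group(1))


async def submit_job(host: str, username: str, script_path: str) -> int:
    """Run sbatch on the remote host over SSH and return the job ID."""
    import asyncssh  # third-party; imported lazily so the helpers work without it

    command = build_sbatch_command(script_path, partition="debug", account="flyte")
    async with asyncssh.connect(host, username=username) as conn:
        # check=True raises if sbatch exits with a non-zero status
        result = await conn.run(command, check=True)
    return parse_job_id(result.stdout)


# Example usage (hypothetical host and script path):
# job_id = asyncio.run(submit_job("localhost", "flyte", "/tmp/hello.sh"))
```

Splitting command construction and output parsing from the SSH call keeps those two pieces testable without a Slurm cluster.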