# ray-integration
p
Hi, can we run Ray Tune experiments in Flyte? Because I am getting an error while executing.
import typing

import ray
from ray import tune
from flytekit import Resources, task, workflow
from flytekitplugins.ray import HeadNodeConfig, RayJobConfig, WorkerNodeConfig


@ray.remote
def objective(config):
    return (config["x"] * config["x"])


ray_config = RayJobConfig(
    head_node_config=HeadNodeConfig(ray_start_params={"log-color": "True"}),
    worker_node_config=[WorkerNodeConfig(group_name="ray-group", replicas=2)],
    runtime_env={"pip": ["numpy", "pandas"]},
)


@task(task_config=ray_config, limits=Resources(mem="2000Mi", cpu="1"))
def ray_task(n: int) -> int:
    model_params = {
        "x": tune.randint(-10, 10)
    }

    tuner = tune.Tuner(
        objective,
        tune_config=tune.TuneConfig(
            num_samples=10,
            max_concurrent_trials=n,
        ),

        param_space=model_params,
    )
    results = tuner.fit()
    return results


@workflow
def ray_workflow(n: int) -> int:
    return ray_task(n=n)
Is there any other way to run hyperparameter tuning in a distributed manner, like Ray Tune?
k
This should be possible; it seems like something is wrong in pickling.
Can you file this as a bug?
Also not sure why this is failing; the error is non-descriptive.
If you drop the distributed part (drop the Ray config), does it work?
Or does it work locally?
p
The above error is what I got when I ran locally using `python example.py`. When I executed using the `pyflyte run` command, I got this error.
After removing the distributed config, I am still getting the same error.
k
Yes, Ray Tune should work; you could test it locally first. Did you use Ray 2.0? I got the same error, but fixed it by upgrading Ray.
BTW, the type of `result` isn't `int`; it's `ResultGrid`.
p
What is the method to specify the result type as `ResultGrid`?
@workflow
def ray_workflow(n: int) -> ResultGrid:
    return ray_task(n=n)
Is this the way?
k
Yes. Flytekit will serialize it to pickle by default, but you could register a new type transformer to serialize it to protobuf: https://docs.flyte.org/projects/cookbook/en/latest/auto/core/extend_flyte/custom_types.html
p
I have upgraded the Ray version; it is 2.1.0 now. But I am still getting this error when I specify the result type as `ResultGrid`. Is it compulsory to register a new type transformer? Is the error caused by that? Now the Ray cluster is getting initiated, but after that I get an error.
For demo purposes I just returned the length of the result grid. The Ray instance is getting initiated.
AttributeError: 'NoneType' object has no attribute 'encode'
ray.tune.error.TuneError: The Ray Tune run failed. Please inspect the previous error messages for a cause. After fixing the issue, you can restart the run from scratch or continue this run.
import ray
from ray import tune, air
from ray.air import Result
from ray.tune import ResultGrid
from flytekit import Resources, task, workflow
from flytekitplugins.ray import HeadNodeConfig, RayJobConfig, WorkerNodeConfig


@ray.remote
def objective(config):
    return (config["x"]+2)


ray_config = RayJobConfig(
    head_node_config=HeadNodeConfig(ray_start_params={"log-color": "True"}),
    worker_node_config=[WorkerNodeConfig(group_name="ray-group", replicas=2)],
    runtime_env={"pip": ["numpy", "pandas"]},

)


@task(task_config=ray_config, limits=Resources(mem="2000Mi", cpu="1"))
def ray_task() -> int:
    model_params = {
        "x": tune.randint(-10, 10)
    }
    tuner = tune.Tuner(
        objective,
        tune_config=tune.TuneConfig(
            num_samples=10,
            max_concurrent_trials=2,
        ),
        param_space=model_params,
    )
    result_grid = tuner.fit()
    return len(result_grid)

@workflow
def ray_workflow() -> int:
    return ray_task()
k
So your Ray run itself is failing.
k
Hey @Padma Priya M. IIUC, the function (`objective`) passed to `tune.Tuner` should be a regular function instead of a Ray remote function.