# ray-integration
p
Hi, can we run Ray Tune experiments in Flyte? Because I am getting an error while executing.
import typing

import ray
from ray import tune
from flytekit import Resources, task, workflow
from flytekitplugins.ray import HeadNodeConfig, RayJobConfig, WorkerNodeConfig


@ray.remote
def objective(config):
    return (config["x"] * config["x"])


ray_config = RayJobConfig(
    head_node_config=HeadNodeConfig(ray_start_params={"log-color": "True"}),
    worker_node_config=[WorkerNodeConfig(group_name="ray-group", replicas=2)],
    runtime_env={"pip": ["numpy", "pandas"]},
)


@task(task_config=ray_config, limits=Resources(mem="2000Mi", cpu="1"))
def ray_task(n: int) -> int:
    model_params = {
        "x": tune.randint(-10, 10)
    }

    tuner = tune.Tuner(
        objective,
        tune_config=tune.TuneConfig(
            num_samples=10,
            max_concurrent_trials=n,
        ),

        param_space=model_params,
    )
    results = tuner.fit()
    return results


@workflow
def ray_workflow(n: int) -> int:
    return ray_task(n=n)
Is there any other way to run hyperparameter tuning in a distributed manner, like Ray Tune?
k
This should be possible; it seems like something is wrong in pickling.
Can you file this as a bug?
Also not sure why this is failing; the error is non-descriptive.
If you drop the distributed part (drop the Ray config), does it work?
Or does it work locally?
p
The above error is what I got when I ran locally using `python example.py`. When I executed using the `pyflyte run` command, I got this error.
After removing the distributed config, I am still getting the same error.
k
Yes, Ray Tune should work; you could test it locally first. Did you use Ray 2.0? I got the same error, but fixed it by upgrading Ray.
BTW, the type of `result` isn't `int`; it's `ResultGrid`.
p
What is the method to specify the result type as `ResultGrid`?
@workflow
def ray_workflow(n: int) -> ResultGrid:
    return ray_task(n=n)
Is this the way?
k
Yes. Flytekit will serialize it to pickle by default, but you could register a new type transformer to serialize it to protobuf: https://docs.flyte.org/projects/cookbook/en/latest/auto/core/extend_flyte/custom_types.html
p
I have upgraded the Ray version; it is 2.1.0 now. But I am still getting this error when I specify the result type as `ResultGrid`. Is it compulsory to register a new type transformer? Is the error caused by that? Now the Ray cluster is getting initiated, but after that I get an error.
For demo purposes I just returned the length of the result grid. The Ray instance is getting initiated.
AttributeError: 'NoneType' object has no attribute 'encode'
ray.tune.error.TuneError: The Ray Tune run failed. Please inspect the previous error messages for a cause. After fixing the issue, you can restart the run from scratch or continue this run.
import ray
from ray import tune, air
from ray.air import Result
from ray.tune import ResultGrid
from flytekit import Resources, task, workflow
from flytekitplugins.ray import HeadNodeConfig, RayJobConfig, WorkerNodeConfig


@ray.remote
def objective(config):
    return (config["x"]+2)


ray_config = RayJobConfig(
    head_node_config=HeadNodeConfig(ray_start_params={"log-color": "True"}),
    worker_node_config=[WorkerNodeConfig(group_name="ray-group", replicas=2)],
    runtime_env={"pip": ["numpy", "pandas"]},

)


@task(task_config=ray_config, limits=Resources(mem="2000Mi", cpu="1"))
def ray_task() -> int:
    model_params = {
        "x": tune.randint(-10, 10)
    }
    tuner = tune.Tuner(
        objective,
        tune_config=tune.TuneConfig(
            num_samples=10,
            max_concurrent_trials=2,
        ),
        param_space=model_params,
    )
    result_grid = tuner.fit()
    return len(result_grid)

@workflow
def ray_workflow() -> int:
    return ray_task()
k
So your Ray run itself is failing.
k
Hey @Padma Priya M. IIUC, the function (`objective`) passed to `tune.Tuner` should be a regular function instead of a Ray remote function.