Hello! What’s the best way to define a task config...
# announcements
n
Hello! What’s the best way to define a task config that launches a custom SageMaker hyperparameter tuning job? I am trying to create a task similar to this example. I see I can provide the task_config as an HPOJob, but how do I specify which training_task it’s supposed to execute? And where should I specify parameters like HyperparameterTuningJobConfig and ParameterRanges?
Also, how will different values of hyperparameters be passed into the custom train script? Will I need to read them from SageMaker’s hyperparameters.json?
s
Hi, @Nada Saiyed! If you already have a training task defined, be it a custom or a built-in training task, you can pass that task as the training_task to SagemakerHPOTask. You can send HyperparameterTuningJobConfig and ParameterRanges as inputs to an HPO task when executing it, as can be seen here. Here’s how the task inputs would render on the UI:
Also, how will different values of hyperparameters be passed into the custom train script?
I guess you can send them as inputs to it directly? Like how you send to a Flyte task? Lemme know if I misunderstood your question.
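For context on the hyperparameters.json part of the question: in a plain SageMaker training container (outside of Flyte), the hyperparameters for each training job are written to /opt/ml/input/config/hyperparameters.json, with every value encoded as a string. A minimal sketch of reading that file follows. The helper name load_hyperparameters and the casting step are illustrative assumptions, and whether a Flyte custom training task needs this at all (rather than receiving the values as ordinary task inputs) is not confirmed in this thread.

```python
import json
from pathlib import Path

# Default location SageMaker uses inside the training container.
DEFAULT_PATH = Path("/opt/ml/input/config/hyperparameters.json")


def load_hyperparameters(path: Path = DEFAULT_PATH) -> dict:
    """Hypothetical helper: read the hyperparameters a training job received.

    SageMaker stores every value as a string, so numeric values are
    converted back to int or float where possible.
    """
    if not path.exists():
        return {}  # e.g. when running locally, outside SageMaker
    raw = json.loads(path.read_text())
    parsed = {}
    for key, value in raw.items():
        try:
            parsed[key] = json.loads(value)  # "3" -> 3, "0.1" -> 0.1
        except (TypeError, ValueError):
            parsed[key] = value  # leave non-JSON strings untouched
    return parsed
```

Each training job launched by a tuning job would see its own copy of this file, with the tuned values filled in for that particular run.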
n
Can I define the task as a Python function? E.g.
@task(
    task_config=HPOJob(
        max_number_of_training_jobs=3,
        max_parallel_training_jobs=2,
        tunable_params=["num_round", "max_depth", "gamma"],
    ),

)
def my_hpo_task(x: int):
    print(x)
In this case, where do I define HyperparameterTuningJobConfig and ParameterRanges?
And I have my training task defined like this:
@task(
    task_config=SagemakerTrainingJobConfig(
        algorithm_specification=AlgorithmSpecification(
            input_mode=InputMode.FILE,
            algorithm_name=AlgorithmName.CUSTOM,
            algorithm_version="",
            metric_definitions=[MetricDefinition(name="score", regex="score: ([0-9\\.]+)")],
            input_content_type=InputContentType.TEXT_CSV,
        ),
        training_job_resource_config=TrainingJobResourceConfig(
            instance_type="ml.m4.4xlarge",
            instance_count=1,
            volume_size_in_gb=25,
        ),
    ),
    interruptible=True,
)
def custom_training_task(x: int):
    print(x)
So in this case, will training_task = custom_training_task?
Also, how can I use this task as part of a workflow?
@katrina any thoughts?
Also, what inputs does this task expect? The last line on the page you linked says inputs=hpo_inputs. What are hpo_inputs?
I was able to create an HPO task with hyperparameters as inputs, but the task fails to launch a SageMaker HPO job with this error message:
[BadTaskSpecification] Error occurred when checking if all the required inputs exist, caused by: [SAGEMAKER_ERROR] Required input not specified: [train]
This is one of the parameters that SagemakerBuiltinAlgorithmsTask expects, but I am working with a SagemakerCustomTrainingTask, so I’m not sure why it’s expecting these built-in parameters.
I also tried giving dummy values to these “expected” parameters, i.e. static_hyperparameters, train, and validation, but of course it failed with an incorrect type.
k
hey sorry can you summarize what you're running into here?
a
I think we’re basically running into HPO + custom SageMaker training not being implemented. The error above is coming from here: https://github.com/flyteorg/flyteplugins/blob/v1.0.8/go/tasks/plugins/k8s/sagemaker/hyperparameter_tuning.go#L68-L73 It checks only for the inputs defined by a built-in SageMaker algorithm. That’s fairly confusing because the flytekit code suggests both are supported, but the backend doesn’t seem to have this yet. Here’s an old PR that aims to add that support: https://github.com/flyteorg/flyteplugins/pull/137
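The failure mode described here can be illustrated with a simplified Python sketch. It is not the plugin’s code (the real check is written in Go in hyperparameter_tuning.go and may differ in detail); the function name check_required_inputs is hypothetical, and the required input names are the ones mentioned earlier in this thread (train, validation, static_hyperparameters).

```python
# Inputs a built-in algorithm training task declares, per this thread.
BUILTIN_REQUIRED_INPUTS = ("train", "validation", "static_hyperparameters")


def check_required_inputs(task_inputs: dict) -> None:
    """Hypothetical check: fail if any built-in input is missing."""
    for name in BUILTIN_REQUIRED_INPUTS:
        if name not in task_inputs:
            raise ValueError(f"Required input not specified: [{name}]")


# A custom training task only carries its own inputs (here, x), so a
# check written for built-in algorithms rejects it.
custom_task_inputs = {"x": 1}
try:
    check_required_inputs(custom_task_inputs)
    error_message = None
except ValueError as err:
    error_message = str(err)
```

A custom task never declares train, so a check of this shape fails on the first required name regardless of what the task itself accepts.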
This init signature and validation logic made me think both types should be supported: https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-aws-sagemaker/flytekitplugins/awssagemaker/hpo.py#L48-L58
s
OH NO! @Ketan (kumare3), any idea why we haven’t merged https://github.com/flyteorg/flyteplugins/pull/137?
k
This is because at Lyft we decided not to use this, as it was not a great API. That being said, we can help here. So IIRC the way it works: you need to emit metrics from the custom training task, which are sent to CloudWatch, and then it should work.
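As a rough illustration of the metrics point: SageMaker extracts metrics by applying each metric definition’s regex to the training job’s log output, which is what ends up in CloudWatch. The sketch below reuses the score regex from the training task config earlier in the thread; the log lines and the helper name extract_metric are made up for the example.

```python
import re
from typing import Optional

# Same pattern as the MetricDefinition in the training task config above.
SCORE_REGEX = re.compile(r"score: ([0-9\.]+)")


def extract_metric(log_line: str) -> Optional[float]:
    """Hypothetical helper: pull the metric value out of one log line."""
    match = SCORE_REGEX.search(log_line)
    return float(match.group(1)) if match else None


# The training script only has to print the metric in the agreed format.
print("score: 0.93")
```

Checking the regex locally against a sample log line like this is a cheap way to confirm the tuning job will be able to read its objective metric.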
@Andrew Achkar / @Nada Saiyed this is a terrible experience and we are extremely sorry about that
We also don't implement things that people don't use, since they easily go stale
If you folks will use it we can implement this
@Samhita Alla let's merge this one - https://github.com/flyteorg/flytekit/pull/1112. Cc @Yee