    Nada Saiyed

    1 month ago
    Hello! What’s the best way to define a task config that launches a custom SageMaker hyperparameter tuning job? I am trying to create a task similar to this example. I see I can provide the task_config as an HPOJob, but how do I specify which training_task it’s supposed to execute? And where should I specify parameters like HyperparameterTuningJobConfig and ParameterRanges?
    Also, how will different values of hyperparameters be passed into the custom train script? Will I need to read them from SageMaker’s hyperparameters.json?
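For the last question, here is a minimal sketch of reading hyperparameters inside a script that runs directly in a SageMaker training container. SageMaker writes them to /opt/ml/input/config/hyperparameters.json with every value stored as a string. The helper name load_hyperparameters is hypothetical, and whether a Flyte custom training task needs this at all (rather than receiving the values as ordinary task inputs) is an open question in this thread.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Location SageMaker uses inside a training container.
HYPERPARAMETERS_PATH = Path("/opt/ml/input/config/hyperparameters.json")


def load_hyperparameters(path: Path = HYPERPARAMETERS_PATH) -> Dict[str, Any]:
    """Read the hyperparameter file and decode values where possible.

    Values arrive as strings, so "5" becomes 5 and "0.1" becomes 0.1.
    Values that are not valid JSON literals are kept as plain strings.
    """
    raw = json.loads(path.read_text())
    parsed: Dict[str, Any] = {}
    for key, value in raw.items():
        try:
            parsed[key] = json.loads(value)
        except (TypeError, ValueError):
            parsed[key] = value
    return parsed
```

Inside the container the script would call load_hyperparameters() with no arguments; passing an explicit path makes the function easy to try locally.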
    Samhita Alla

    1 month ago
    Hi, @Nada Saiyed! If you already have a training task defined, whether custom or built-in, you can pass that task as the training_task to SagemakerHPOTask. You can pass HyperparameterTuningJobConfig and ParameterRanges as inputs to the HPO task when executing it, as can be seen here. Here’s how the task inputs would render on the UI: [screenshot not included]
    "Also, how will different values of hyperparameters be passed into the custom train script?"
    I guess you can send them as inputs to it directly, the same way you send inputs to a Flyte task? Let me know if I misunderstood your question.
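For orientation, here is a rough sketch of the information that a tuning config and parameter ranges carry at the SageMaker API level. It builds a plain dictionary in the shape of the HyperParameterTuningJobConfig field of SageMaker's CreateHyperParameterTuningJob request. It does not use the Flyte plugin's classes, whose fields may be organized differently. The helper names and the numeric bounds are hypothetical; the parameter names and job limits are borrowed from the snippet later in this thread.

```python
from typing import Any, Dict


def integer_range(name: str, low: int, high: int) -> Dict[str, str]:
    """One IntegerParameterRanges entry; SageMaker expects bounds as strings."""
    return {"Name": name, "MinValue": str(low), "MaxValue": str(high), "ScalingType": "Linear"}


def continuous_range(name: str, low: float, high: float) -> Dict[str, str]:
    """One ContinuousParameterRanges entry; bounds are also strings."""
    return {"Name": name, "MinValue": str(low), "MaxValue": str(high), "ScalingType": "Linear"}


def build_tuning_config(max_jobs: int, max_parallel: int, metric_name: str) -> Dict[str, Any]:
    """Assemble a tuning config dictionary (illustrative values only)."""
    return {
        "Strategy": "Bayesian",
        "HyperParameterTuningJobObjective": {"Type": "Maximize", "MetricName": metric_name},
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": max_jobs,
            "MaxParallelTrainingJobs": max_parallel,
        },
        "ParameterRanges": {
            "IntegerParameterRanges": [
                integer_range("num_round", 3, 10),
                integer_range("max_depth", 5, 7),
            ],
            "ContinuousParameterRanges": [continuous_range("gamma", 0.0, 0.3)],
            "CategoricalParameterRanges": [],
        },
    }


config = build_tuning_config(max_jobs=3, max_parallel=2, metric_name="score")
```

The objective metric name has to match a metric that the training job actually reports, which is why the metric definition on the training task matters.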
    Nada Saiyed

    1 month ago
    Can I define the task as a Python function? E.g.:
    @task(
        task_config=HPOJob(
            max_number_of_training_jobs=3,
            max_parallel_training_jobs=2,
            tunable_params=["num_round", "max_depth", "gamma"],
        ),
    )
    def my_hpo_task(x: int):
        print(x)
    In this case, where do I define HyperparameterTuningJobConfig and ParameterRanges?
    And I have my training task defined like this:
    @task(
        task_config=SagemakerTrainingJobConfig(
            algorithm_specification=AlgorithmSpecification(
                input_mode=InputMode.FILE,
                algorithm_name=AlgorithmName.CUSTOM,
                algorithm_version="",
                metric_definitions=[MetricDefinition(name="score", regex="score: ([0-9\\.]+)")],
                input_content_type=InputContentType.TEXT_CSV,
            ),
            training_job_resource_config=TrainingJobResourceConfig(
                instance_type="ml.m4.4xlarge",
                instance_count=1,
                volume_size_in_gb=25,
            ),
        ),
        interruptible=True,
    )
    def custom_training_task(x: int):
        print(x)
    So in this case, will training_task=custom_training_task?
    Also, how can I use this task as part of a workflow?
    @katrina any thoughts?
    Also, what inputs does this task expect? The last line on the page you linked says inputs=hpo_inputs. What are hpo_inputs?
    I was able to create an HPO task with hyperparameters as inputs, but the task fails to launch a SageMaker HPO job with this error message:
    [BadTaskSpecification] Error occurred when checking if all the required inputs exist, caused by: [SAGEMAKER_ERROR] Required input not specified: [train]
    This is one of the parameters that SagemakerBuiltinAlgorithmsTask expects, but I am working with a SagemakerCustomTrainingTask, so I'm not sure why it's expecting these built-in parameters.
    I also tried giving dummy values to these “expected” parameters, i.e. static_hyperparameters, train, and validation, but of course that failed with an incorrect type error.
    katrina

    1 month ago
    Hey, sorry, can you summarize what you're running into here?
    Andrew Achkar

    1 month ago
    I think what we’re running into is that HPO with custom SageMaker training is not implemented. The error above is coming from here: https://github.com/flyteorg/flyteplugins/blob/v1.0.8/go/tasks/plugins/k8s/sagemaker/hyperparameter_tuning.go#L68-L73 It checks only for the inputs defined by a built-in SageMaker algorithm. That’s fairly confusing, because the flytekit code suggests both are supported, but the backend doesn't seem to have this yet. Here’s an old PR that aims to add that support: https://github.com/flyteorg/flyteplugins/pull/137
    This init signature and validation logic made me think both types should be supported: https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-aws-sagemaker/flytekitplugins/awssagemaker/hpo.py#L48-L58
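To make the failure mode concrete, here is a loose Python paraphrase of the kind of check described above. It is not the plugin's Go code. The required input names are taken from the earlier messages in this thread, and the function name and check order are assumptions.

```python
from typing import Any, Iterable, Mapping

# Input names mentioned in the thread; assumed to be what the backend requires.
BUILTIN_REQUIRED_INPUTS = ("train", "validation", "static_hyperparameters")


def check_required_inputs(
    inputs: Mapping[str, Any],
    required: Iterable[str] = BUILTIN_REQUIRED_INPUTS,
) -> None:
    """Raise on the first required input name that is missing."""
    for name in required:
        if name not in inputs:
            raise ValueError(f"Required input not specified: [{name}]")


# A custom training task that only takes `x` fails on the first missing name.
try:
    check_required_inputs({"x": 1})
except ValueError as err:
    print(err)
```

Under these assumptions, a custom task whose only input is x can never pass the check, whatever the flytekit side accepts.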
    Samhita Alla

    1 month ago
    OH NO! @Ketan (kumare3), any idea why we haven’t merged https://github.com/flyteorg/flyteplugins/pull/137?
    Ketan (kumare3)

    1 month ago
    This is because at Lyft we decided not to use this, as it was not a great API. That being said, we can help here. IIRC, the way it works is that you need to emit metrics from the custom training task, which are sent to CloudWatch, and then it should work.
    @Andrew Achkar / @Nada Saiyed this is a terrible experience and we are extremely sorry about that.
    We also don't implement things that people don't use, since they easily go stale.
    If you folks will use it, we can implement this.
    @Samhita Alla let's merge this one - https://github.com/flyteorg/flytekit/pull/1112. Cc @Yee
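On the point about emitting metrics: SageMaker extracts metrics by applying each metric definition's regex to the training job's log output and publishes the captured values to CloudWatch. The sketch below assumes the "score" regex from the training task earlier in this thread. The helper names are hypothetical, and the extraction function only mimics what SageMaker does on its side so the log format can be checked locally.

```python
import re
from typing import Optional

# Same pattern as the "score" metric definition earlier in the thread.
SCORE_REGEX = r"score: ([0-9\.]+)"


def format_score(score: float) -> str:
    """Build the log line the training script should print."""
    return f"score: {score}"


def extract_score(log_line: str, pattern: str = SCORE_REGEX) -> Optional[float]:
    """Apply the metric regex to one log line, as a local sanity check."""
    match = re.search(pattern, log_line)
    return float(match.group(1)) if match else None


# In the training script, printing the line is what exposes the metric.
print(format_score(0.93))
```

Checking the printed line against the regex before launching a job is cheap, since a metric that never matches leaves the tuning job with no objective value to optimize.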