#announcements

Robin Kahlow

06/29/2022, 11:06 AM
Anyone know a good way to do non-grid (e.g. some kind of Bayesian) hyperparameter optimization with Flyte, with multiple trials running in parallel? I.e. is there some library that makes this easy, or do I have to implement most of the optimization logic myself? A library that spits out parameters to try, and that I feed the results back to, would be pretty easy to use with Flyte, as opposed to the library calling an objective function itself like hyperopt does.

Niels Bantilan

06/29/2022, 1:30 PM
hi @Robin Kahlow there’s currently no canonical way of doing this, although I believe it’s technically possible with Flyte + <some bayes opt library>. One would have to use dynamic workflows to collect results and feed them back to the bayesopt sampler for subsequent trials. Do you have a bayesopt library in mind?
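A minimal, Flyte-free sketch of the suggest-evaluate-register (ask/tell) loop described here, under stated assumptions: `RandomSearchSampler` is a hypothetical stand-in that samples uniformly instead of fitting a Bayesian model, and `objective` is a toy function in place of a training run. In a Flyte version, each batch of `objective` calls would become parallel tasks inside a dynamic workflow; here they run sequentially for simplicity. The batch size also caps how many trials can run at once.

```python
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


class RandomSearchSampler:
    """Hypothetical ask/tell sampler: uniform random search, not Bayesian optimization."""

    def __init__(self, bounds: Dict[str, Tuple[float, float]], seed: int = 0) -> None:
        self.bounds = bounds
        self.rng = random.Random(seed)
        self.history: List[Tuple[Params, float]] = []

    def suggest(self) -> Params:
        # "ask": propose one point inside the bounds (a Bayesian optimizer would use self.history)
        return {name: self.rng.uniform(lo, hi) for name, (lo, hi) in self.bounds.items()}

    def register(self, params: Params, target: float) -> None:
        # "tell": record the observed target for a proposed point
        self.history.append((params, target))

    def best(self) -> Tuple[Params, float]:
        # point with the highest target seen so far (raises ValueError if history is empty)
        return max(self.history, key=lambda pair: pair[1])


def run_batched(
    sampler: RandomSearchSampler,
    objective: Callable[[Params], float],
    n_iter: int,
    concurrency: int,
) -> Tuple[Params, float]:
    for _ in range(n_iter):
        # at most `concurrency` trials per batch, so the batch size caps parallelism
        points = [sampler.suggest() for _ in range(concurrency)]
        targets = [objective(point) for point in points]  # would be parallel tasks in Flyte
        for point, target in zip(points, targets):
            sampler.register(point, target)
    return sampler.best()


def objective(params: Params) -> float:
    # toy objective with its maximum (0.0) at x = 2
    return -(params["x"] - 2.0) ** 2


sampler = RandomSearchSampler({"x": (-5.0, 5.0)}, seed=0)
best_params, best_target = run_batched(sampler, objective, n_iter=5, concurrency=3)
print(best_params, best_target)
```

Keeping the sampler behind a `suggest`/`register` interface is what makes it drivable from outside, which is the property the question asks for (as opposed to the library calling the objective itself).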

Robin Kahlow

06/29/2022, 1:31 PM
I don't have a specific one in mind, I've used hyperopt before though. Is there a way to limit concurrency with dynamic? // I guess that happens automatically since the sampling is sequential

Niels Bantilan

06/29/2022, 1:59 PM
Using BayesianOptimization (basically the suggest-evaluate-register loop from its advanced guide), something like this might work:
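```python
from typing import Dict, List

from bayes_opt import BayesianOptimization, UtilityFunction
from flytekit import dynamic, task


@task
def black_box_function(point: Dict[str, float]) -> float:
    ...  # inner training loop here; return the target metric to maximize


@task
def suggest_points(
    optimizer: BayesianOptimization,
    utility: UtilityFunction,
    concurrency: int,
) -> List[Dict[str, float]]:
    # ask the optimizer for `concurrency` new points to evaluate in parallel
    return [optimizer.suggest(utility) for _ in range(concurrency)]


@task
def register_targets(
    optimizer: BayesianOptimization,
    points: List[Dict[str, float]],
    targets: List[float],
) -> BayesianOptimization:
    # tell the optimizer the observed target for each evaluated point
    for point, target in zip(points, targets):
        optimizer.register(params=point, target=target)
    return optimizer


@task
def get_max(optimizer: BayesianOptimization) -> Dict:
    # optimizer is only a promise inside the dynamic, so read .max inside a task
    return optimizer.max


@dynamic
def concurrent_trials(points: List[Dict[str, float]]) -> List[float]:
    # one black_box_function task per point; Flyte runs them in parallel
    targets = []
    for point in points:
        targets.append(black_box_function(point=point))
    return targets


@dynamic
def bayesopt(n_iter: int = 5, concurrency: int = 3) -> Dict:
    optimizer = BayesianOptimization(...)  # constructor arguments elided
    utility = UtilityFunction(kind="ucb", kappa=2.5, xi=0.0)
    for _ in range(n_iter):
        points = suggest_points(optimizer=optimizer, utility=utility, concurrency=concurrency)
        targets = concurrent_trials(points=points)
        # flytekit only accepts keyword arguments when calling tasks
        optimizer = register_targets(optimizer=optimizer, points=points, targets=targets)
    # return the point that maximized the target
    return get_max(optimizer=optimizer)
```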
caveat: this relies heavily on the `PythonPickle` type for types that Flyte doesn’t know how to handle natively, like `BayesianOptimization` and `UtilityFunction`

Robin Kahlow

06/29/2022, 2:00 PM
oh great, that's really helpful Niels thank you 🙂

Niels Bantilan

06/29/2022, 2:01 PM
also, I’m not entirely sure whether the `optimizer = register_targets(optimizer, points=points, targets=targets)` line in the `bayesopt` dynamic will work as intended… I do believe it will unroll the dynamic graph correctly, but I’ll have to confirm in practice

Robin Kahlow

06/29/2022, 2:02 PM
ya I'll try it out!

Niels Bantilan

06/29/2022, 2:04 PM
great! please let me know if this works, would love to work a canonical example into our tutorials.
also happy to help debug if you can share a minimal reproducible example
also as an FYI we’re working on a Flyte-Ray integration, so when that happens RayTune will open up to Flyte users. however, I do think it’s still worth it to explore using Flyte exclusively for hyperparam optimization use cases
Hey @Robin Kahlow I got excited to try it out myself, so here’s a working example 🙂
```shell
pip install flytekit bayesian-optimization scipy==1.7.0
```
need to install a specific version of scipy, as `1.8.0` causes issues
this works locally ^^ testing on a demo cluster now

Robin Kahlow

06/30/2022, 10:19 AM
cool, got it working too!

Niels Bantilan

06/30/2022, 2:07 PM
great @Robin Kahlow! let me know if you have any other questions on this front… would love to know how this works out for your use case

Robin Kahlow

06/30/2022, 2:09 PM
yea will do! one small issue right now: if the trainings take different amounts of time, we're always waiting for all of them to complete (vs. there always being N workers up that just fetch more work when they're done)
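The cost of waiting on every batch can be made concrete with a small scheduling simulation. This is a sketch under stated assumptions: the trial durations are hypothetical numbers, the sampler is ignored, and trials are dispatched in list order. `batched_wall_time` models the synchronous batches of the dynamic-workflow approach (each batch takes as long as its slowest trial); `async_wall_time` models N workers that each pick up the next trial as soon as they finish.

```python
import heapq
from typing import List


def batched_wall_time(durations: List[float], workers: int) -> float:
    """Synchronous batches: each batch waits for its slowest trial."""
    total = 0.0
    for start in range(0, len(durations), workers):
        total += max(durations[start:start + workers])
    return total


def async_wall_time(durations: List[float], workers: int) -> float:
    """Asynchronous pool: a free worker immediately picks up the next trial."""
    # min-heap holding the time at which each worker becomes free
    free_at = [0.0] * workers
    for duration in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
    return max(free_at)


# hypothetical trial durations in hours, e.g. with the number of epochs as a hyperparameter
durations = [1.0, 3.0, 1.0, 1.0, 3.0, 1.0]
print(batched_wall_time(durations, workers=3))  # 6.0
print(async_wall_time(durations, workers=3))  # 4.0
```

When every trial takes the same time the two models agree, which matches the observation that fixed-epoch trials made the batched approach acceptable.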

Niels Bantilan

06/30/2022, 7:43 PM
Right, that’s definitely a limitation of this approach.
> there always being N workers up that just fetch more work when they're done
The pure Flyte execution model won’t allow for this, hence the integrations with Spark (and Ray, [coming soon]). With those, you can just wrap everything in a single `@task` and use the underlying Spark/Ray cluster to distribute the computation, while having access to all of the state in the hyperopt routine.
> if the trainings take different amounts of time
What’s the min, max, and mean runtime of each trial in your case? I.e., are they on the order of minutes, hours, days (or 😖 weeks)? The benefit of the pure Flyte approach is that all trials are subject to Flyte’s data lineage tracking, cacheable with `@task(cache=True, …)`, and recoverable under the Flyte system.
@Ketan (kumare3) perhaps `@eager` would help in this case: a central eager workflow asynchronously keeps `N` trial workers running at any given time, and whenever `x` trials complete, the hyperparam sampler is updated with their results, samples `x` new sets of parameters, and spins up a new trial task for each.

Ketan (kumare3)

06/30/2022, 10:07 PM
yes, `@eager`, when we build it, should allow for this. @Sebastian Schulze is actually working on a prototype at his company
cc @Robin Kahlow / @Sebastian Schulze maybe you folks can get together to try it out?

Sebastian Schulze

07/01/2022, 9:19 AM
Hi, I'd be happy to have a chat about this. We adopted an approach in which a central "master"-task starts, monitors and terminates trial-workflows as necessary. We use optuna as our hyperopt library but I imagine other choices would work just as well.

Robin Kahlow

07/01/2022, 9:38 AM
@Niels Bantilan the model I'm training right now takes about an hour (plus or minus a couple of minutes). I fixed the number of epochs (keeping only the best, though), so the trials all took around the same time, but making the number of epochs another hyperparameter would be nice, and some of our other models take days to train. The caching is indeed very cool: I liked that I could run 5 hyperparameter optimization iterations, then come back later, see how it did, and maybe run more iterations while keeping the progress of the first 5. Eager sounds interesting, is that already on a branch somewhere?

Niels Bantilan

07/01/2022, 1:25 PM
Awesome @Sebastian Schulze! Will you be free some time next week (Tue or after)? Would love to learn about the approach you describe. @Robin Kahlow would you be interested in joining? We don’t have an implementation of `eager` yet, but it sounds like @Sebastian Schulze’s solution is an early shot at something like it.

Robin Kahlow

07/01/2022, 1:26 PM
yea sure! although I probably won't be working on this much, unfortunately I have a lot of other higher-prio stuff on my plate

Niels Bantilan

07/01/2022, 1:31 PM
@Sebastian Schulze @Robin Kahlow what time zones are y’all in? Does Tue 7/5 11AM EST work for you?

Robin Kahlow

07/01/2022, 1:33 PM
works for me unless something at my job comes up

Sebastian Schulze

07/01/2022, 1:33 PM
that should work for me as well

Robin Kahlow

07/01/2022, 1:33 PM
I'm in the UK, but probably closer to being on US-East time 😄

Niels Bantilan

07/01/2022, 1:41 PM
great! just sent invite

Ketan (kumare3)

07/01/2022, 1:58 PM
Nice

Niels Bantilan

07/05/2022, 3:02 PM
hey @Sebastian Schulze friendly ping: https://meet.google.com/tne-dhjf-nfs
oh, btw @Sebastian Schulze I forgot to ask: was there a particular reason y’all decided not to try RayTune for hyperopt?