brash-honey-82710
05/07/2024, 2:27 PM
ancient-wolf-19325
05/07/2024, 2:27 PM
```python
import pandas as pd
from flytekit import task, workflow, Resources
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression


# Request 500Mi of memory for the task that loads the dataset.
@task(requests=Resources(mem="500Mi"))
def get_data() -> pd.DataFrame:
    """Fetches the wine dataset."""
    return load_wine(as_frame=True).frame


@task
def process_data(data: pd.DataFrame) -> pd.DataFrame:
    """Converts a 3-class to a binary classification problem."""
    # Keep class 0 as 0 and map classes 1 and 2 to 1.
    return data.assign(target=lambda x: x["target"].where(x["target"] == 0, 1))


@task
def train_model(data: pd.DataFrame, hyperparameters: dict) -> LogisticRegression:
    """Trains a logistic regression model."""
    features = data.drop("target", axis="columns")
    target = data["target"]
    # Extra hyperparameters (e.g. {"C": 0.1}) are passed straight to the estimator.
    return LogisticRegression(max_iter=5000, **hyperparameters).fit(features, target)


@workflow
def ml_workflow(hyperparameters: dict) -> LogisticRegression:
    """Defines the ML workflow."""
    # Each task consumes the previous task's output; Flyte orders them accordingly.
    data = get_data()
    processed_data = process_data(data=data)
    model = train_model(data=processed_data, hyperparameters=hyperparameters)
    return model
```
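To see what the `where` call in `process_data` does, here is a quick check on a few made-up labels (toy data, not the wine dataset). `Series.where` keeps values where the condition is true and substitutes `1` everywhere else, so class 0 stays 0 and classes 1 and 2 both become 1:

```python
import pandas as pd

# Toy labels standing in for the wine dataset's three classes (0, 1, 2).
data = pd.DataFrame({"target": [0, 1, 2, 0, 2]})

# Same expression as in process_data: keep 0, replace every other label with 1.
binary = data.assign(target=lambda x: x["target"].where(x["target"] == 0, 1))

print(binary["target"].tolist())  # [0, 1, 1, 0, 1]
```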
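This workflow chains three tasks: data retrieval (`get_data`), data processing (`process_data`), and model training (`train_model`). Flyte orchestrates the workflow: each task runs once the outputs it depends on are available, so the steps execute in the correct order, and resource requests such as the 500Mi memory request on `get_data` are applied to the corresponding task.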
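The ordering comes from the data each task consumes: `process_data` needs the output of `get_data`, and `train_model` needs the output of `process_data`. As a rough conceptual sketch (it uses Python's standard-library `graphlib`, not Flyte's own scheduler, and the `dependencies` mapping is a hypothetical stand-in for the workflow graph), sorting that dependency graph yields the execution order:

```python
from graphlib import TopologicalSorter

# Hypothetical mapping: each task -> the tasks whose outputs it consumes,
# mirroring the data flow in ml_workflow (not Flyte's internal representation).
dependencies = {
    "get_data": set(),
    "process_data": {"get_data"},
    "train_model": {"process_data"},
}

# static_order() yields each node only after all of its predecessors.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['get_data', 'process_data', 'train_model']
```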
Would you like more details on any specific part of this workflow?
Sources:
- GitHub Pull Request #2340
- Running and Scheduling Workflows
- GitHub Pull Request #1181