Flyte #flyte-support


handsome-sandwich-69169

09/27/2023, 4:31 PM

Hi team! 👋 I am new to Flyte 😃. I have a couple of questions (apologies if they were answered before, I couldn't find the answers in the channel):
1. How best to deal with code dependencies? I see from the examples that all the task/workflow code lives in a single file. However, I am thinking about some of our current training pipelines, and it would be good to have some code living outside the main entry point for better readability etc. I can't seem to make it work though 🤔. Will paste a code example in a 🧵.
2. Maybe related to the first question: I see that there is a Python projects template here. Is this structure what people are following atm as best practice?
Thank you in advance 🙂

handsome-sandwich-69169

09/27/2023, 4:39 PM

Let's say I have a simple pipeline which is working


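import pandas as pd
from flytekit import task, workflow
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression


# This is the code I want to move out
def do_something_with_data(data: pd.DataFrame) -> pd.DataFrame:
    # Some complex logic
    return data


@task
def get_data() -> pd.DataFrame:
    """Get the wine dataset."""
    return load_wine(as_frame=True).frame


@task
def process_data(data: pd.DataFrame) -> pd.DataFrame:
    new_data = do_something_with_data(data)
    return new_data


@task
def train_model(data: pd.DataFrame, hyperparameters: dict) -> LogisticRegression:
    # Placeholder training step: fit on every column except the "target" label
    model = LogisticRegression(**hyperparameters)
    model.fit(data.drop("target", axis="columns"), data["target"])
    return model


@workflow
def training_workflow(hyperparameters: dict) -> LogisticRegression:
    data = get_data()
    processed_data = process_data(data=data)
    return train_model(
        data=processed_data,
        hyperparameters=hyperparameters,
    )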

but if I move the function out to a separate module, I can't get it to work:


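import pandas as pd
from flytekit import task, workflow
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression

# The helper now lives in module.py next to this file
try:
    from module import do_something_with_data
except ImportError:
    from .module import do_something_with_data


@task
def get_data() -> pd.DataFrame:
    """Get the wine dataset."""
    return load_wine(as_frame=True).frame


@task
def process_data(data: pd.DataFrame) -> pd.DataFrame:
    new_data = do_something_with_data(data)
    return new_data


@task
def train_model(data: pd.DataFrame, hyperparameters: dict) -> LogisticRegression:
    # Placeholder training step: fit on every column except the "target" label
    model = LogisticRegression(**hyperparameters)
    model.fit(data.drop("target", axis="columns"), data["target"])
    return model


@workflow
def training_workflow(hyperparameters: dict) -> LogisticRegression:
    data = get_data()
    processed_data = process_data(data=data)
    return train_model(
        data=processed_data,
        hyperparameters=hyperparameters,
    )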

gray-ocean-62145

09/27/2023, 5:26 PM

You might be running into issues with how you're registering your workflow. If you haven't already, I'd recommend reading through the Registering Workflows documentation.

👀 1

handsome-sandwich-69169

09/27/2023, 7:08 PM

Ah, maybe because I was running them using pyflyte run --remote

glamorous-carpet-83516

09/27/2023, 10:47 PM

When you use pyflyte run, flytekit will copy your code to S3 and download it once the pod is running. You could try pyflyte run --remote --copy-all …, which will copy your other module to S3 as well.
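To make the failure mode concrete, here is a minimal sketch in plain Python. It is an analogy only and does not use flytekit or reproduce how pyflyte actually packages code: the `project` and `remote` directories, `entry.py`, and the `simulate` and `load_entry_point` helpers are all hypothetical names introduced for illustration. It shows why an entry point that imports a sibling module breaks when only the entry file is shipped, and works once every file is copied.

```python
import importlib
import importlib.util
import shutil
import sys
import tempfile
from pathlib import Path


def load_entry_point(directory: Path):
    """Import entry.py with `directory` as the only extra import root."""
    sys.modules.pop("module", None)  # forget a helper imported by an earlier run
    importlib.invalidate_caches()
    sys.path.insert(0, str(directory))
    try:
        spec = importlib.util.spec_from_file_location("entry", directory / "entry.py")
        entry = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(entry)  # runs "from module import ..."
        return entry
    finally:
        sys.path.remove(str(directory))


def simulate(copy_all: bool) -> str:
    """Ship a two-file project to a stand-in 'remote' directory and import it."""
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "project"
        remote = Path(tmp) / "remote"
        project.mkdir()
        remote.mkdir()
        # Hypothetical project: a helper module plus the entry point that imports it.
        (project / "module.py").write_text(
            "def do_something_with_data(data):\n    return data\n"
        )
        (project / "entry.py").write_text(
            "from module import do_something_with_data\n"
        )
        # Without copy_all only the entry file is shipped, so the helper is missing.
        shipped = ["entry.py", "module.py"] if copy_all else ["entry.py"]
        for name in shipped:
            shutil.copy(project / name, remote / name)
        try:
            load_entry_point(remote)
            return "ok"
        except ImportError as exc:
            return f"failed: {exc}"


if __name__ == "__main__":
    print(simulate(copy_all=False))
    print(simulate(copy_all=True))
```

In this sketch the first call reports an import failure and the second succeeds, which mirrors the behaviour described in this thread.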

👀 1

handsome-sandwich-69169

09/28/2023, 8:56 AM

@glamorous-carpet-83516 Thanks Kevin, I managed to run it correctly with --copy-all

👍 1
