#ask-the-community

Jose Navarro

09/27/2023, 4:31 PM
Hi team! 👋 I am new to Flyte 😃. I have a couple of questions (apologies if they have been answered before; I couldn't find the answers in the channel):
1. What is the best way to deal with code dependencies? In the examples, all the task/workflow code lives in a single file. However, for some of our current training pipelines it would be good to have some code living outside the main entry point, for better readability. I can't seem to make it work, though 🤔. I'll paste a code example in a 🧵.
2. Maybe related to the first question: I see that there is a Python project template here. Is this structure what people are currently following as best practice?
Thank you in advance 🙂
Let's say I have a simple pipeline that works:
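```python
import pandas as pd
from flytekit import task, workflow
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression


# This is the code I want to move out
def do_something_with_data(data: pd.DataFrame) -> pd.DataFrame:
    # Some complex logic
    return data


@task
def get_data() -> pd.DataFrame:
    """Get the wine dataset."""
    return load_wine(as_frame=True).frame


@task
def process_data(data: pd.DataFrame) -> pd.DataFrame:
    new_data = do_something_with_data(data)
    return new_data


@task
def train_model(data: pd.DataFrame, hyperparameters: dict) -> LogisticRegression:
    # Illustrative training step: the wine frame keeps its labels in the "target" column
    features, target = data.drop(columns="target"), data["target"]
    model = LogisticRegression(**hyperparameters)
    model.fit(features, target)
    return model


@workflow
def training_workflow(hyperparameters: dict) -> LogisticRegression:
    data = get_data()
    processed_data = process_data(data=data)
    return train_model(
        data=processed_data,
        hyperparameters=hyperparameters,
    )
```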
But if I move the function out into a separate module, I can't get it to work:
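```python
import pandas as pd
from flytekit import task, workflow
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression

try:
    from module import do_something_with_data
except ImportError:
    from .module import do_something_with_data


@task
def get_data() -> pd.DataFrame:
    """Get the wine dataset."""
    return load_wine(as_frame=True).frame


@task
def process_data(data: pd.DataFrame) -> pd.DataFrame:
    new_data = do_something_with_data(data)
    return new_data


@task
def train_model(data: pd.DataFrame, hyperparameters: dict) -> LogisticRegression:
    # Illustrative training step: the wine frame keeps its labels in the "target" column
    features, target = data.drop(columns="target"), data["target"]
    model = LogisticRegression(**hyperparameters)
    model.fit(features, target)
    return model


@workflow
def training_workflow(hyperparameters: dict) -> LogisticRegression:
    data = get_data()
    processed_data = process_data(data=data)
    return train_model(
        data=processed_data,
        hyperparameters=hyperparameters,
    )
```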
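A minimal, stdlib-only sketch of why the second version can fail remotely, assuming (as the replies in this thread suggest) that only the workflow file reaches the container. The file names `workflow.py` and `module.py` and the helper `try_import_workflow` are hypothetical; pandas and flytekit are left out so the snippet runs anywhere. It writes the files into a throwaway directory and imports `workflow` from there:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-ins for the two files in the thread (pandas/flytekit omitted).
MODULE_SRC = "def do_something_with_data(data):\n    return data\n"
WORKFLOW_SRC = (
    "try:\n"
    "    from module import do_something_with_data\n"
    "except ImportError:\n"
    "    from .module import do_something_with_data\n"
)


def try_import_workflow(files: dict[str, str]) -> str:
    """Write `files` into a fresh directory, import `workflow` from it, report the outcome."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, source in files.items():
            Path(tmp, name).write_text(source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("workflow")
            return "ok"
        except ImportError as exc:
            return f"{type(exc).__name__}: {exc}"
        finally:
            # Undo side effects so each call starts from a clean interpreter state.
            sys.path.remove(tmp)
            sys.modules.pop("workflow", None)
            sys.modules.pop("module", None)


# Both files present, as on a laptop where everything sits in one folder.
print(try_import_workflow({"workflow.py": WORKFLOW_SRC, "module.py": MODULE_SRC}))  # ok
# Only the workflow file present, as when just that one file is shipped.
print(try_import_workflow({"workflow.py": WORKFLOW_SRC}))  # an ImportError message
```

With both files present the import succeeds. With only `workflow.py`, the absolute import raises `ModuleNotFoundError`, and the `except ImportError` fallback fails too, because a relative import needs a parent package and a top-level script has none. So the try/except cannot help when the helper file itself is missing.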

Michael Tinsley

09/27/2023, 5:26 PM
You might be running into issues with how you're registering your workflow. If you haven't already, I'd recommend reading through this Registering Workflows documentation.

Jose Navarro

09/27/2023, 7:08 PM
Ah, maybe it's because I was running them using `pyflyte run --remote`.

Kevin Su

09/27/2023, 10:47 PM
When you use `pyflyte run`, flytekit copies your code to S3 and downloads it once the pod is running. You could try `pyflyte run --remote --copy-all …`; it will copy your other module to S3 as well.
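The difference Kevin describes can be pictured as a question of what ends up in the uploaded code bundle. This is a simplified model, not flytekit's actual packaging code: `build_bundle` and `bundle_contents` are hypothetical helpers, and "default" versus "copy-all" here just means shipping the one script versus every `.py` file under the source root.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def build_bundle(source_root: Path, files: list[Path]) -> bytes:
    """Pack the given files into an in-memory .tar.gz, with paths relative to source_root."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for path in files:
            tar.add(str(path), arcname=str(path.relative_to(source_root)))
    return buffer.getvalue()


def bundle_contents(bundle: bytes) -> list[str]:
    """List the file names stored in a .tar.gz bundle."""
    with tarfile.open(fileobj=io.BytesIO(bundle), mode="r:gz") as tar:
        return sorted(tar.getnames())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "workflow.py").write_text("from module import do_something_with_data\n")
    (root / "module.py").write_text("def do_something_with_data(data):\n    return data\n")

    # Modelled default: only the script named on the command line is shipped.
    single_file = bundle_contents(build_bundle(root, [root / "workflow.py"]))
    # Modelled copy-all: every .py file under the source root is shipped.
    copy_all = bundle_contents(build_bundle(root, sorted(root.rglob("*.py"))))

print(single_file)  # ['workflow.py']
print(copy_all)     # ['module.py', 'workflow.py']
```

In this model, the container that unpacks `single_file` has no `module.py` to import, which matches the failure above, while the copy-all bundle carries the helper along with the workflow.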

Jose Navarro

09/28/2023, 8:56 AM
@Kevin Su Thanks Kevin, I managed to run it correctly with `--copy-all`.