# flyte-support
e
Hello all! I was hoping to get some help on what I believe to be a trivial issue. I currently have a workflow in my local repo that I can successfully launch from my machine against the remote cluster by running the following `pyflyte` command from the project root in my terminal.
pyflyte run --remote --project some-project --domain development tests/regression/workflows/test_deidentify_workflow.py test_deidentify_clinical_note_file --storage_account 'abc12345' --base_path 'container_name/validation/clinical_note/'
However, when I try to run this same workflow against the remote cluster from a local Jupyter notebook using `flytekit.FlyteRemote`, it fails. The notebook is located in the `root/tests/` directory of my project, and this is the code I run:
from flytekit import Config, FlyteRemote
from tests.regression.workflows.test_deidentify_workflow import test_deidentify_clinical_note_file

remote = FlyteRemote(
    config=Config.auto(),
    default_project="some-project",
    default_domain="development",
    interactive_mode_enabled=True,
)

remote.fast_register_workflow(entity=test_deidentify_clinical_note_file)

execution = remote.execute(
    test_deidentify_clinical_note_file,
    inputs={
        "storage_account": "abc12345",
        "base_path": "container-name/validation/clinical_note/",
    },
    wait=True,
)
print(execution.outputs)
I can see the workflow execution attempt in the UI, but it fails with the following error:
FlyteAssertion: USER:AssertionError: error=Outputs could not be found because the execution ended in failure. Error
message: Trace:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/site-packages/flytekit/bin/entrypoint.py", line 163, in _dispatch_execute
        task_def = load_task()
                   ^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/flytekit/bin/entrypoint.py", line 578, in load_task
        return resolver_obj.load_task(loader_args=resolver_args)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/flytekit/core/utils.py", line 312, in wrapper
        return func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/flytekit/core/python_auto_container.py", line 332, in load_task
        loaded_data = cloudpickle.load(f)
                      ^^^^^^^^^^^^^^^^^^^
    ModuleNotFoundError: No module named 'tests'

Message:

    ModuleNotFoundError: No module named 'tests'
The module `tests` it’s referring to is a local directory immediately under the project root. It houses the test code and is where the workflow I am trying to run is located (albeit a few directories down). It seems like the project code isn’t being packaged and registered correctly before the workflow is executed. I have tried manually adding the project root to the sys path in the notebook before registering and executing the workflow, but that makes no difference. I suspect I am misconfiguring `FlyteRemote` or need to further configure Jupyter for Flyte usage in some way. Does anyone have any insight, or could anyone help me solve this problem?
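For context, the traceback is consistent with how pickle-style serialization handles functions that live in an importable module: they are stored by reference (module path plus name) rather than by value, so the loading side must be able to import the same module. The sketch below reproduces that behaviour with the standard library `pickle` instead of `cloudpickle`, and with a hypothetical throwaway module named `demo_local_pkg` standing in for the local `tests` package. It illustrates the failure mode only and is not a description of flytekit's internals.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path


def pickle_then_load_without_module() -> str:
    """Pickle a function from an importable module, then try to unpickle it
    once that module can no longer be imported. Returns the error text."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for the local `tests` package.
        Path(tmp, "demo_local_pkg.py").write_text("def task():\n    return 'ok'\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            module = importlib.import_module("demo_local_pkg")
            # Only the module name and function name are stored, not the code.
            payload = pickle.dumps(module.task)
        finally:
            # Simulate the remote container: the module is not on the path.
            sys.path.remove(tmp)
            sys.modules.pop("demo_local_pkg", None)
    importlib.invalidate_caches()
    try:
        pickle.loads(payload)
    except ModuleNotFoundError as exc:
        return str(exc)
    return "loaded"


print(pickle_then_load_without_module())
```

If the same mechanism is at play here, changing the sys path inside the notebook would only affect the local process, not the container that loads the pickled task.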
c
hey @echoing-park-83350 what does your folder structure look like? Also, I think your `tests` folder should include an `__init__.py` file that flytekit can use to understand it should copy from this folder into the container image. BTW, are you using ImageSpec?
e
Hey @average-finland-92144 Here is the current project structure with respect to the notebook and test workflow I am attempting to run:
.
├── tests/                      # Tests directory
│   ├── __init__.py             
│   ├── regression/             # Regression tests directory
│   │   ├── __init__.py
│   │   ├── # Other directories that contain classes related to the test workflow (models, utilities, etc.)        
│   │   └── workflows/          # Regression test workflows directory
│   │       ├── __init__.py
│   │       └── test_deidentify_workflow.py <- The test workflow I am attempting to run from the Jupyter notebook
│   ├── integration/
│   │   ├── __init__.py
│   │   └── # Other test related classes
│   └── flyte_test_notebook.ipynb <- Jupyter notebook I'm attempting to run the test workflow from using FlyteKit
└── workflows/
    ├── __init__.py 
    └── # Other classes that hold the base code for the service
I am not using ImageSpec. Is this something that is necessary to run workflows remotely with FlyteKit?
a
It's not required. I thought it could help here, but it's not clear that it would.
You could also pass a `serialization_settings` argument to the `remote.execute` method, including a `source_root` set to the path of the folder where your workflow is.
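A minimal sketch of that suggestion, under a few assumptions: the installed flytekit version accepts `serialization_settings` in `remote.execute`, the default image is acceptable, and the right `source_root` is the directory that contains the `tests` package (the project root). The helper names `find_project_root` and `execute_with_source_root` are hypothetical, and this has not been verified against the cluster in question.

```python
from pathlib import Path


def find_project_root(start: Path, package: str = "tests") -> Path:
    """Walk upward from `start` to the first directory that contains
    `package/__init__.py` and return that directory."""
    for candidate in [start, *start.parents]:
        if (candidate / package / "__init__.py").is_file():
            return candidate
    raise FileNotFoundError(f"no directory at or above {start} contains {package}/__init__.py")


def execute_with_source_root(entity, inputs: dict, start: Path):
    """Execute `entity` remotely with `source_root` set to the project root."""
    # Imported lazily so the path helper works without flytekit installed.
    from flytekit import Config, FlyteRemote
    from flytekit.configuration import ImageConfig, SerializationSettings

    remote = FlyteRemote(
        config=Config.auto(),
        default_project="some-project",
        default_domain="development",
        interactive_mode_enabled=True,
    )
    settings = SerializationSettings(
        image_config=ImageConfig.auto_default_image(),
        source_root=str(find_project_root(start)),
    )
    # Assumption: this flytekit version supports `serialization_settings` here.
    return remote.execute(entity, inputs=inputs, wait=True, serialization_settings=settings)
```

From the notebook in `root/tests/`, this could be called as `execute_with_source_root(test_deidentify_clinical_note_file, inputs, Path.cwd())`, since walking up from `root/tests/` finds `root` as the first directory containing `tests/__init__.py`.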
e
Thanks David, I’ll give that a shot!