#ask-the-community

Alex Beach

02/06/2024, 1:54 AM
I have been unable to remote run my workflow. I am running:
pyflyte run --remote workflows/training.py pytorch_training_wf
But the workflow fails with:
/opt/conda/envs/envd/lib/python3.10/importlib/__init__.py:126 in import_module
❱ 126 │   return _bootstrap._gcd_import(name[level:], package, level)
in _gcd_import:1050
in _find_and_load:1027
in _find_and_load_unlocked:1004
ModuleNotFoundError: No module named 'training'
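For context, the traceback ends inside importlib.import_module, which raises ModuleNotFoundError whenever the named module cannot be found on sys.path. Below is a minimal local sketch of that failure mode using a throwaway directory. The file layout, the module name training_demo, and the helper try_import are illustrative assumptions, not what pyflyte does inside the container.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def try_import(module_name: str) -> str:
    """Return 'ok' if the module imports, else the ModuleNotFoundError message."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
        return "ok"
    except ModuleNotFoundError as exc:
        return str(exc)


# Hypothetical layout mirroring the thread: <tmp>/workflows/training_demo.py
tmp_root = Path(tempfile.mkdtemp())
(tmp_root / "workflows").mkdir()
(tmp_root / "workflows" / "training_demo.py").write_text("VALUE = 1\n")

# The directory holding the file is not on sys.path, so the import fails.
before = try_import("training_demo")

# Once that directory is on sys.path, the same import succeeds.
sys.path.insert(0, str(tmp_root / "workflows"))
after = try_import("training_demo")

print(before)
print(after)
```

The same file imports or fails depending only on which directory the interpreter treats as the import root, which is why the working directory inside the container matters here.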

L godlike

02/06/2024, 1:58 AM
Try another file name
Maybe that will help

Alex Beach

02/06/2024, 1:59 AM
i haven't gotten any examples from the docs working with 1.10.3; the above error has happened to me a few times with different examples

L godlike

02/06/2024, 2:00 AM
I mean change the file name from training.py to another
Maybe that will help

Alex Beach

02/06/2024, 2:00 AM
trying it
has to rebuild
same issue
pyflyte run --remote --copy-all workflows/mypyworkflow.py pytorch_training_wf
│ ModuleNotFoundError: No module named 'mypyworkflow'
i think it might be that the image is not getting rebuilt, it's still using the same docker image version

L godlike

02/06/2024, 2:29 AM
How about going into the directory
and running again?

Alex Beach

02/06/2024, 2:36 AM
same issue

L godlike

02/06/2024, 2:37 AM
Will take a look in 4 hours
Thx

Alex Beach

02/06/2024, 2:38 AM
also for some reason, --image doesn't use the image that i passed in

L godlike

02/06/2024, 5:22 AM
--image doesn't work because you had already specified the image in container_image
no idea why the file doesn't work, need someone else's help
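One rough way to picture the precedence described here, as a plain-Python sketch: an image pinned on the task through container_image wins, and --image only changes the default used by tasks that do not pin one. The function resolve_image and all image names are hypothetical. This models the behavior described in the thread and is not flytekit's own code.

```python
from typing import Optional


def resolve_image(
    task_container_image: Optional[str],
    cli_image: Optional[str],
    default_image: str = "example.registry/default:latest",  # hypothetical name
) -> str:
    """Pick the image a task would run with under the assumed precedence."""
    if task_container_image is not None:
        # Pinned on the task: the CLI flag does not override it.
        return task_container_image
    if cli_image is not None:
        # No pin on the task: the CLI flag replaces the default.
        return cli_image
    return default_image


# A task with container_image set ignores the CLI value.
pinned = resolve_image("example.registry/kfpytorch:abc", "example.registry/other:1")
# A task without container_image picks up the CLI value.
unpinned = resolve_image(None, "example.registry/other:1")
print(pinned, unpinned)
```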

Alex Beach

02/06/2024, 5:25 AM
i think i know why

L godlike

02/06/2024, 5:25 AM
why

Alex Beach

02/06/2024, 5:25 AM
https://docs.flyte.org/en/latest/api/flytekit/pyflyte.html#pyflyte-run
Note: This command is compatible with regular Python packages, but not with namespace packages. When determining the root of your project, it identifies the first folder without an __init__.py file.
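The rule in that note can be sketched in a few lines: walk up from the workflow file until a folder without an __init__.py is found, treat that folder as the project root, and build the module name from the path below it. The helper find_root_and_module and the temporary project layout are assumptions for illustration. This is not flytekit's implementation, only a reading of the documented rule.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def find_root_and_module(file_path: Path) -> Tuple[Path, str]:
    """Walk up from the file until a folder without __init__.py is found."""
    file_path = file_path.resolve()
    folder = file_path.parent
    parts = [file_path.stem]
    while (folder / "__init__.py").exists() and folder.parent != folder:
        # This folder is a regular package, so it becomes part of the module name.
        parts.insert(0, folder.name)
        folder = folder.parent
    return folder, ".".join(parts)


# Hypothetical project: <tmp>/project/workflows/training.py
project = Path(tempfile.mkdtemp()) / "project"
(project / "workflows").mkdir(parents=True)
target = project / "workflows" / "training.py"
target.write_text("")

# Without workflows/__init__.py, the root is workflows/ itself.
root_plain, module_plain = find_root_and_module(target)

# With workflows/__init__.py, the root moves up one level.
(project / "workflows" / "__init__.py").write_text("")
root_pkg, module_pkg = find_root_and_module(target)

print(module_plain, module_pkg)
```

Under this reading, the module name being imported is training when workflows/ has no __init__.py and workflows.training when it does, which lines up with the two different module names seen in the errors in this thread.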

L godlike

02/06/2024, 5:25 AM
oh nice
try it

Alex Beach

02/06/2024, 5:26 AM
it worked for a different image spec workflow. I am trying it for my pytorch flow

L godlike

02/06/2024, 5:26 AM
ok nice
thank you

Alex Beach

02/06/2024, 6:36 AM
issue is not solved. the only difference between my build specs is
custom_image = ImageSpec(
    name="kfpytorch-flyte",
    packages=["torch", "torchvision", "flytekitplugins-kfpytorch", "matplotlib", "tensorboardX"],
    registry="xxxxxxxx/flyte",
    cuda="11.2.2",
    cudnn="8", 
    python_version="3.10"
)
and the other does not contain cuda or cudnn
so i suspect the cuda or cudnn settings are the culprit, causing some workflows to fail with the module not found issue

L godlike

02/06/2024, 6:38 AM
thank you for telling me that
cc @Kevin Su, help

Alex Beach

02/06/2024, 7:29 AM
I was able to replicate it with this workflow. If i run other workflows that don't have cuda or cudnn, those don't have the module error
from flytekit import ImageSpec, task, workflow

custom_image = ImageSpec(
    name="kfpytorch-flyte",
    packages=["flytekit", "torch", "torchvision", "flytekitplugins-kfpytorch", "matplotlib", "tensorboardX"],
    registry="xxxxx",
    cuda="11.2.2",
    cudnn="8", 
    python_version="3.10"
)

@task(container_image=custom_image)
def bar() -> str:
    return 'foo'

@workflow
def foo_wf():
    bar()


if __name__ == "__main__":
    foo_wf()
This should produce:
ModuleNotFoundError: No module named 'workflows'
if that is run with `pyflyte run --remote workflows/example.py foo_wf`

Kevin Su

02/06/2024, 7:30 AM
which version of envd are you using?

Alex Beach

02/06/2024, 7:40 AM
envd==0.3.45
flytekit==1.10.3 flytekitplugins-envd==1.10.3
this is a show-stopping bug, basically i can't run any of the pytorch examples
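When reporting versions like these, one option is to read them straight from the active environment so they match what pyflyte actually uses. Here is a small sketch with the standard library's importlib.metadata. The helper name installed_versions is an assumption, and the package list simply mirrors the three names mentioned above.

```python
from importlib import metadata
from typing import Dict, Iterable


def installed_versions(packages: Iterable[str]) -> Dict[str, str]:
    """Map each distribution name to its installed version, or 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


report = installed_versions(["envd", "flytekit", "flytekitplugins-envd"])
for name, version in report.items():
    print(f"{name}=={version}")
```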

Gaurav Panta

02/22/2024, 8:49 PM
@Alex Beach did you figure it out?

Alex Beach

02/22/2024, 9:00 PM
no i haven't looked at this in a while