I have been unable to `remote run` my workflow. I...
# flyte-support
h
I have been unable to `remote run` my workflow. I am running:
pyflyte run --remote workflows/training.py pytorch_training_wf
But the workflow fails with:
│ │ /opt/conda/envs/envd/lib/python3.10/importlib/__init__.py:126 in             │
│ │ import_module                                                                │
│ │                                                                              │
│ │ ❱ 126 │   return _bootstrap._gcd_import(name[level:], package, level)        │
│ │ in _gcd_import:1050                                                          │
│ │ in _find_and_load:1027                                                       │
│ │ in _find_and_load_unlocked:1004                                              │
│ ╰──────────────────────────────────────────────────────────────────────────────╯
│ ModuleNotFoundError: No module named 'training'
d
Try another file name
Maybe will help
h
I haven't gotten any of the examples from the docs working with 1.10.3; the above error has happened to me a few times with different examples
d
I mean, change the file name from training.py to something else
Maybe will help
h
trying it
has to rebuild
same issue
pyflyte run --remote --copy-all workflows/mypyworkflow.py pytorch_training_wf
│ ModuleNotFoundError: No module named 'mypyworkflow'
I think it might be that the image is not getting rebuilt; it's still using the same docker image version
d
How about going into the directory and running it again?
h
same issue
d
Will take a look 4 hours later
Thx
h
also for some reason, --image doesn't use the image that i passed in
d
`--image` doesn't work because you had already specified the image in `container_image`
no idea why the file doesn't work, need someone else's help
h
i think i know why
d
why
h
https://docs.flyte.org/en/latest/api/flytekit/pyflyte.html#pyflyte-run
Note: This command is compatible with regular Python packages, but not with namespace packages. When determining the root of your project, it identifies the first folder without an __init__.py file.
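A rough sketch of the rule quoted above may help show why the error names different modules in different runs. This is not flytekit's actual code; the helper names `find_project_root` and `module_name_for` are hypothetical, and the sketch only assumes what the docs note says: walk up from the workflow file until a folder without an `__init__.py` is found, and treat that folder as the project root.

```python
from pathlib import Path


def find_project_root(source_file):
    """Return the first ancestor folder of source_file that has no __init__.py.

    Hypothetical helper illustrating the rule from the pyflyte run docs note;
    not taken from flytekit.
    """
    directory = Path(source_file).resolve().parent
    # Keep climbing while the current folder looks like a regular package.
    while (directory / "__init__.py").exists() and directory != directory.parent:
        directory = directory.parent
    return directory


def module_name_for(source_file):
    """Return the dotted module name of source_file relative to the detected root."""
    source = Path(source_file).resolve()
    root = find_project_root(source)
    return ".".join(source.relative_to(root).with_suffix("").parts)
```

Under this assumption, `workflows/training.py` resolves to the module `training` when `workflows/` has no `__init__.py`, and to `workflows.training` once an `__init__.py` is added, which lines up with the two different `ModuleNotFoundError` messages seen in this thread.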
d
oh nice
try it
h
it worked for a different image spec workflow. I am trying it for my pytorch flow
d
ok nice
thank you
h
The issue is not solved. The only difference between my build specs is:
custom_image = ImageSpec(
    name="kfpytorch-flyte",
    packages=["torch", "torchvision", "flytekitplugins-kfpytorch", "matplotlib", "tensorboardX"],
    registry="xxxxxxxx/flyte",
    cuda="11.2.2",
    cudnn="8", 
    python_version="3.10"
)
and the other does not contain cuda or cudnn,
so I suspect the cuda or cudnn settings are the culprit, causing some workflows to fail with the `module not found` issue
d
thank you for telling me that
cc @glamorous-carpet-83516, help
h
I was able to replicate it with this workflow. If I run other workflows that don't have cuda or cudnn, those don't have the module error:
from flytekit import ImageSpec, task, workflow

custom_image = ImageSpec(
    name="kfpytorch-flyte",
    packages=["flytekit", "torch", "torchvision", "flytekitplugins-kfpytorch", "matplotlib", "tensorboardX"],
    registry="xxxxx",
    cuda="11.2.2",
    cudnn="8", 
    python_version="3.10"
)

@task(container_image=custom_image)
def bar() -> str:
    return 'foo'

@workflow
def foo_wf(
): 
    bar()


if __name__ == "__main__":
    foo_wf()
This should produce:
ModuleNotFoundError: No module named 'workflows'
if it is run with `pyflyte run --remote workflows/example.py foo_wf`
g
Which version of envd are you using?
h
envd==0.3.45
flytekit==1.10.3 flytekitplugins-envd==1.10.3
This is a show-stopping bug; basically I can't run any of the pytorch examples
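For anyone comparing setups against the versions listed above, here is a minimal sketch that collects the installed versions of the three packages using only the standard library. The function name `installed_versions` is made up for illustration; the package names are the ones mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names taken from the messages above.
    for name, ver in installed_versions(
        ["envd", "flytekit", "flytekitplugins-envd"]
    ).items():
        print(f"{name}=={ver}" if ver else f"{name} is not installed")
```

Running this both locally and inside the built image would show whether the two environments actually agree.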
s
@hallowed-dog-74273 did you figure it out?
h
no i haven't looked at this in a while