I have run into a strange bug a few times over the...
# ask-the-community
d
I have run into a strange bug a few times over the last six months and I'm curious if anyone else has any ideas about this or tips for debugging the issue. The manifestation is that the first task in my workflow gets an error
Copy code
ModuleNotFoundError: No module named 'site-packages'
Looking at the task JSON in the UI, I see it does in fact contain "site-packages":
Copy code
--resolver site-packages.flytekit.core.python_auto_container.default_task_resolver
This has happened about three times in the last few months after I have upgraded the version of
flytekit
and others. This has happened in both fast mode and from scheduled launch plans in production. I've spent some time looking into the flyte code, but each time I've resorted to downgrading flytekit and the problem resolves. This has been reported in our team slack by two other people at my company. I'm currently on 1.7.0 but this has occurred on previous versions as well. I suspect something is going wrong in our registration process, but I don't know how to debug this easily. Any advice on tracking down why "site-packages" is sometimes getting appended to the resolver path?
k
do you have site-packages module in your directory?
which pyflyte command are you using? pyflyte run?
d
Yes, there is a site-packages directory. It's the standard location for Python modules. And I didn't change anything with PYTHONPATH or sys.path between the working release and the broken release. Here's a section from the task json from the UI for a failed task run as a scheduled launch plan:
Copy code
"container": {
        "command": [],
        "args": [
            "pyflyte-execute",
            "--inputs",
            "{{.input}}",
            "--output-prefix",
            "{{.outputPrefix}}",
            "--raw-output-data-prefix",
            "{{.rawOutputDataPrefix}}",
            "--checkpoint-path",
            "{{.checkpointOutputPrefix}}",
            "--prev-checkpoint",
            "{{.prevCheckpointPrefix}}",
            "--resolver",
            "site-packages.flytekit.core.python_auto_container.default_task_resolver",
            "--",
            "task-module",
            "...",
            "task-name",
            "task_get_inference_dataset_params"
        ],
        "env": [],
        "config": [],
        "ports": [],
        "image": "...",
        "resources": {
            "requests": [],
            "limits": [
                {
                    "name": 1,
                    "value": "1"
                },
                {
                    "name": 3,
                    "value": "2Gi"
                }
            ]
        }
    }
You can see "site-packages" in
--resolver
argument. Diffing the task json between the previous working run and this failed run, the only differences are in the
--resolver
argument as well and some image/version strings.
s
Looks like having an
__init.py__
in site-packages can result in this error. https://discuss.flyte.org/t/2177664/hello-when-i-am-packaging-using-pyflyte-package-command-foll
d
Oh, that's interesting. I'll check that out. I have not found the root cause, but I have been able to work around this. One of my colleagues pointed out that I can quickly test this in an environment that is having this problem.
Copy code
$ python3 -c 'from flytekit.core.python_auto_container import default_task_resolver; print(default_task_resolver.location)'
site-packages.flytekit.core.python_auto_container.default_task_resolver
We use poetry for package magagement, so I commented out everything in poetry's
pyproject.toml
, created a new environment, installed, tested, added a package, updated the environment, tested, etc. until I could pinpoint the package that introduced the problem. Unfortunately, I'm still at the point where it is a group of packages included in a metapackage, so I'm not sure which module is responsible.