# announcements
s
I'm having issues when running a fast-registered workflow. It looks like flytekit (0.30.3) can't find the code archive:
[1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.

[vej152kn4b-n1-0] terminated with exit code (1). Reason [Error]. Message: 
tar: /root/8143d4634b6d53c26072284ce429e69af1278102-fast1-fast32b90af686df921bd623ae6df68f5c48.tar.gz: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
Traceback (most recent call last):
  File "/opt/venv/bin/pyflyte-fast-execute", line 8, in <module>
    sys.exit(fast_execute_task_cmd())
  File "/opt/venv/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/opt/venv/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/opt/venv/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/venv/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/opt/venv/lib/python3.9/site-packages/flytekit/bin/entrypoint.py", line 467, in fast_execute_task_cmd
    _download_distribution(additional_distribution, dest_dir)
  File "/opt/venv/lib/python3.9/site-packages/flytekit/tools/fast_registration.py", line 112, in download_distribution
    result.check_returncode()
  File "/usr/lib/python3.9/subprocess.py", line 460, in check_returncode
    raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '['tar', '-xvf', '/root/8143d4634b6d53c26072284ce429e69af1278102-fast1-fast32b90af686df921bd623ae6df68f5c48.tar.gz', '-C', '/root']' returned non-zero exit status 2.
Here's what's executing from the pod spec
spec:
  containers:
  - args:
    - pyflyte-fast-execute
    - --additional-distribution
    - gs://my-bucket/fast/8143d4634b6d53c26072284ce429e69af1278102-fast1-fast32b90af686df921bd623ae6df68f5c48.tar.gz
...
I checked the bucket and the object exists, so the upload during registration seems to be working correctly. Perhaps something is wrong with the download, or with the path it is copied to in the container. Any ideas?
k
cc @Yee?
y
is there a chance you can get the pod logs?
if you can access it could you print out the whole pod log?
s
@Yee I checked, but this actually is the full log. Perhaps I have to increase the log level?
Is there a way to enable more verbose logging for a task?
y
sorry are you trying to run this on sandbox?
no right?
s
No this is on GCP
Oh wait, I'm using fsspec/gcsfs and I don't have gsutil installed. Do I need gsutil for downloading the archive?
k
ohh you should not
let me check
y
are you using the fsspec plugin?
s
Yes. It is all working fine if I do normal registration. Here's what my poetry deps look like:
[tool.poetry.dependencies]
python = "^3.9"
flytekit = "^0.30.3"
flytekitplugins-data-fsspec = "^0.30.3"
gcsfs = "^2022.1.0"
...
k
so I checked, it is using the persistence layer
y
having trouble replicating this locally…
and unfortunately it doesn’t look like there’s any logging we can turn on.
is there a way you can replicate the permissions locally and just run the pyflyte-fast-execute command? that command shouldn’t do anything beyond download the tar file
s
Good idea, I'll try that tomorrow.
@Yee another observation: when switching to gsutil instead of the fsspec plugin, the download works fine.
y
oh so the issue is in flytekitplugins-data-fsspec?
s
Looks like it. Could also be in fsspec/gcsfs of course. I get the same error if I run pyflyte-fast-execute locally. It seems to choose the right plugin but skip the actual download. I haven't found the root cause yet.
y
i need to dig up my gcp account, will do that and try to debug this
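For anyone debugging the symptom described above (the right plugin is chosen but the archive never lands on disk, so `tar` fails with "Cannot open"), here is a minimal sketch of a guard that separates "download silently did nothing" from "extraction failed". It is not flytekit's code: `fetch_and_extract` and the `download` callable are hypothetical names, and it uses Python's `tarfile` module rather than the `tar` subprocess shown in the traceback. The assumption is that you plug in whatever download call you want to test (gsutil, fsspec/gcsfs, etc.) as `download`.

```python
import os
import tarfile
from typing import Callable


def fetch_and_extract(
    remote: str,
    dest_dir: str,
    download: Callable[[str, str], None],
) -> str:
    """Download `remote` into `dest_dir`, verify it arrived, then extract it.

    `download(remote, local_path)` is a hypothetical hook: pass in the
    download implementation under test. Returns the local archive path.
    """
    # Same naming scheme as in the error above: <dest_dir>/<archive file name>.
    local_path = os.path.join(dest_dir, os.path.basename(remote))

    download(remote, local_path)

    # Fail loudly here if the download step was a no-op, instead of letting
    # the extraction step report a confusing "Cannot open" error later.
    if not os.path.isfile(local_path):
        raise FileNotFoundError(
            f"download of {remote} reported success but {local_path} is missing"
        )

    with tarfile.open(local_path) as archive:
        archive.extractall(dest_dir)
    return local_path
```

If this raises `FileNotFoundError` with one download implementation but not with another, that points at the download path rather than at the archive or the extraction step.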