# flytekit
d
hey y'all, we have kind of an urgent issue related to the raw output data prefix again. We are setting it both in the flyte config and when running via the flyte CLI (to the same value), but it's not being picked up.
Error
```text
Failed to convert return value for var o0 for function workflows.utils.get_params.get_param_tab_dfs with error <class 'flytekit.common.exceptions.user.FlyteAssertion'>: Failed to put data from /tmp/flyteys2jd582/local_flytekit/3a3db53a418d3972a766afd9674492ad to /kd/f4406b4e311fe405c99d-fgyaxjti-3/f42b11a70f3f5c6abe4df5efe49f14bf (recursive=True).

Original exception: could not create '/kd': Permission denied

SYSTEM ERROR! Contact platform administrators.
```
proof ^
cc @John Russell
More context: this might be related to a backend upgrade that happened this morning and caused an outage. However, the outage was resolved and we are still experiencing issues.
Flyte version: 0.19.4, flytekit version: 0.25
y
hey @Dylan Wilder, just seeing this
feel free to tag us next time
this is happening on a backend run, correct? Not local?
d
yes
y
can you confirm what value is being passed for that flag in the pod's command?
d
it is being set incorrectly
y
got it.
d
```shell
pyflyte-execute \
  --inputs gs://flytepropeller-production-storage/metadata/propeller/sp-one-model-staging-f4406b4e311fe405c99d/msr-forecast/data/0/model-params-2-of-2-grab-params/inputs.pb \
  --output-prefix gs://flytepropeller-production-storage/metadata/propeller/sp-one-model-staging-f4406b4e311fe405c99d/msr-forecast/data/0/model-params-2-of-2-grab-params/0 \
  --raw-output-data-prefix /9c/f4406b4e311fe405c99d-fgyaxjti-0 \
  --resolver flytekit.core.python_auto_container.default_task_resolver \
  -- task-module workflows.utils.get_params task-name get_param_tab_dfs
```
--raw-output-data-prefix /9c/f4406b4e311fe405c99d-fgyaxjti-0
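The flag value above has no storage scheme or bucket: it is `/9c/...` rather than a fully qualified URI such as `gs://<bucket>/.../9c/...`. flytekit therefore treats it as a local filesystem path, which is consistent with the earlier `could not create '/kd': Permission denied` error. The following is a small diagnostic sketch for spotting this in a pod's command. It uses only the Python standard library, is not part of flytekit, and the helper names `extract_flag` and `has_remote_scheme` are hypothetical.

```python
import shlex
from urllib.parse import urlparse


def extract_flag(command: str, flag: str) -> str:
    """Return the token that follows `flag` in a shell-style command line.

    Raises ValueError if the flag is missing or has no value after it.
    """
    tokens = shlex.split(command)
    idx = tokens.index(flag)  # raises ValueError if the flag is absent
    if idx + 1 >= len(tokens):
        raise ValueError(f"{flag} has no value")
    return tokens[idx + 1]


def has_remote_scheme(prefix: str) -> bool:
    """True if `prefix` looks like a remote URI such as gs://bucket/path."""
    parsed = urlparse(prefix)
    return bool(parsed.scheme) and bool(parsed.netloc)


# Trimmed version of the pod command from this thread.
cmd = (
    "pyflyte-execute --inputs gs://flytepropeller-production-storage/metadata/propeller/"
    "sp-one-model-staging-f4406b4e311fe405c99d/msr-forecast/data/0/"
    "model-params-2-of-2-grab-params/inputs.pb "
    "--raw-output-data-prefix /9c/f4406b4e311fe405c99d-fgyaxjti-0 "
    "--resolver flytekit.core.python_auto_container.default_task_resolver "
    "-- task-module workflows.utils.get_params task-name get_param_tab_dfs"
)

prefix = extract_flag(cmd, "--raw-output-data-prefix")
print(prefix, has_remote_scheme(prefix))  # /9c/f4406b4e311fe405c99d-fgyaxjti-0 False
```

A healthy command would print a `gs://...` prefix followed by `True`.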
What I'm unclear on is where, when, and how it's being set, given there have been a few changes to this functionality across versions.
y
can you check this entry in the flyte config?
```yaml
core.yaml: |
    propeller:
      rawoutput-prefix: s3://bucket-name/
```
k
hey @Dylan Wilder, separately, do you mind letting us know what the outage during your upgrade was?
d
ask @Babis Kiosidis @katrina
@Yee is that flyteadmin config?
k
re: your issue, are you setting the role, max parallelism or labels/annotations in the CreateExecutionRequest?
y
no, this is propeller config
it’s set in both places, but propeller is ultimately the code that does the replacement
d
it's not set in the propeller config:
`rawoutput-prefix: ''`
isn't that overridden by the workflow setting?
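A simplified model of the behavior being discussed, assuming the most specific non-empty setting wins (an execution-level override, then the launch plan, then propeller's `rawoutput-prefix` default) and that propeller then appends a shard and a per-node ID. This is a hypothetical sketch, not flyteadmin or flytepropeller source; the function names and the `example-bucket` value are invented for illustration. It shows how an empty propeller default yields exactly the `/9c/...` value seen in the pod command if the workflow-level setting never reaches propeller.

```python
from typing import Optional


def resolve_raw_output_prefix(
    execution_override: Optional[str],
    launch_plan_value: Optional[str],
    propeller_default: str,
) -> str:
    """Hypothetical precedence: the most specific non-empty value wins."""
    for candidate in (execution_override, launch_plan_value):
        if candidate:  # None and '' both fall through
            return candidate
    return propeller_default


def node_raw_output_prefix(base: str, shard: str, unique_id: str) -> str:
    """Join the base prefix, a shard and a per-node ID with '/'."""
    return "/".join([base.rstrip("/"), shard, unique_id])


node_id = "f4406b4e311fe405c99d-fgyaxjti-0"

# Expected path: the workflow-level value reaches propeller.
ok = resolve_raw_output_prefix(None, "gs://example-bucket/raw", "")
print(node_raw_output_prefix(ok, "9c", node_id))
# gs://example-bucket/raw/9c/f4406b4e311fe405c99d-fgyaxjti-0

# Failure mode: the value is lost upstream and propeller's default is ''.
lost = resolve_raw_output_prefix(None, None, "")
print(node_raw_output_prefix(lost, "9c", node_id))
# /9c/f4406b4e311fe405c99d-fgyaxjti-0
```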
k
@Dylan Wilder are you setting the role, max parallelism or labels/annotations in the CreateExecutionRequest?
d
no
at least not intentionally anywhere
k
how about in the launch plan?
d
nope, not there or at launch time
k
oh interesting, so you never run with a role in general?
b
The outage is unrelated to this and we can discuss it separately. We are planning to take another look tomorrow morning before.
d
I believe it's related to the upgrade, though
sorry, I mean this issue is related to the upgrade
b
Yeah, sounds like it. The raw output prefix used to work on the previous version.
Could it be a mismatch with the flytekit version used in the code? Or is it purely up to propeller?
d
that's a possibility, we're a bit behind
k
oh! what version of flyteadmin are you on?
d
flytekit 0.25.0
b
We just upgraded the system to the v0.19.4 milestone release
d
However, there have been a number of breaking flytekit changes since then, so upgrading is difficult.
k
@Babis Kiosidis by "just", do you mean this morning? We had some snafus and actually had to re-release v0.19.4 a few times 😅
b
Yes we upgraded this morning, roughly 10h ago
d
yes that's when this issue started
k
and just to be extra paranoid, your flyteadmin version is 0.6.148?
b
I will have to check brb
flyteadmin 0.6.147
k
ah okay, that version already includes the fix for what I suspected was the root cause. Never mind then!
could I ask you to share the execution spec and the corresponding launch plan spec for one of these failing executions (with whatever sensitive info you need redacted)?
b
we will trigger an execution so we can inspect the spec and check whether it contains the prefix
(hypothesis) if yes, then there is probably an issue with propeller, and maybe you could help us identify how far back we could roll safely to avoid the bug
k
there were some changes to the handling of overridable values in general, so it would be helpful if you could share the entire spec for both
b
which both do you mean?
k
launch plan & execution spec, sorry
b
👍
d
we use `LaunchPlan.from_config("...").fetch_launch_plan()` to launch, and we do not explicitly set the prefix at launch
b
Also, in Flyte Console we can see the prefix configured on the execution itself, so it looks like it's there.
k
@Prafulla Mahindrakar has a fix! Can you share?
@Prafulla Mahindrakar can you build a docker image that the spotify team can try out once you have the PR ready?
b
flytepropeller image?
k
flyteadmin
p
I am building the image, give me a few mins
b
👍
p
Until the fix is in an official release, can you try this image?
```text
docker push ghcr.io/flyteorg/flyteadmin:v0.6.147-fix
The push refers to repository [ghcr.io/flyteorg/flyteadmin]
e9d2c88c7662: Pushed
99c33ff2aa60: Pushed
26b69aebec89: Pushed
4fc242d58285: Layer already exists
v0.6.147-fix: digest: sha256:5ab74964cb02aececd9fe165ae954dbe55e3cda8a2cf07680279fa9504162bb0 size: 1159
```
b
we are trying it, thank you
it works, thank you very much 🙂
MVPs
thank you very much for the quick response, great work
k
awesome, thank you @Prafulla Mahindrakar
p
Cool. Sorry for the trouble. We plan to add more tests to catch such issues earlier. The fix is also now part of this official build: https://github.com/flyteorg/flyteadmin/releases/tag/v0.6.149