Dylan Wilder

04/21/2022, 5:06 PM
hey y'all, we have kind of an urgent issue related to the raw output data prefix again. We are setting it both in the flyte config and when running via the flyte cli (to the same value), but it's not being picked up
Error
Failed to convert return value for var o0 for function workflows.utils.get_params.get_param_tab_dfs with error <class 'flytekit.common.exceptions.user.FlyteAssertion'>: Failed to put data from /tmp/flyteys2jd582/local_flytekit/3a3db53a418d3972a766afd9674492ad to /kd/f4406b4e311fe405c99d-fgyaxjti-3/f42b11a70f3f5c6abe4df5efe49f14bf (recursive=True).

Original exception: could not create '/kd': Permission denied

SYSTEM ERROR! Contact platform administrators.
message has been deleted
proof ^
cc @John Russell
More context: this might be related to a backend upgrade that happened this morning and caused an outage. however the outage was resolved and we are still experiencing issues
flyte version: 0.19.4 flytekit version: 0.25
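One way to read the error above: the destination `/kd/...` carries no `gs://` or `s3://` scheme, so the upload ends up as a write to the local filesystem and fails with `Permission denied`. A minimal sketch of that distinction, assuming a hypothetical helper `is_remote_prefix` and an illustrative scheme list (neither is part of flytekit):

```python
from urllib.parse import urlparse

# Schemes treated as remote object stores in this sketch.
# This set is an assumption for illustration, not flytekit's own list.
REMOTE_SCHEMES = {"gs", "s3"}


def is_remote_prefix(prefix: str) -> bool:
    """Return True if the prefix names a remote bucket rather than a local path."""
    return urlparse(prefix).scheme in REMOTE_SCHEMES


# The prefix from the failing task has no scheme, so it looks like a local path.
print(is_remote_prefix("/kd/f4406b4e311fe405c99d-fgyaxjti-3"))  # False
# A bucket URL (hypothetical bucket name) is recognised as remote.
print(is_remote_prefix("gs://example-bucket/raw"))  # True
```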

Yee

04/21/2022, 5:53 PM
hey @Dylan Wilder just seeing this
feel free to tag us next time
this is happening on a backend run correct? not local

Dylan Wilder

04/21/2022, 5:57 PM
yes

Yee

04/21/2022, 5:57 PM
can you confirm what is being set in the command to the pod for that switch?

Dylan Wilder

04/21/2022, 5:57 PM
it is being set incorrectly

Yee

04/21/2022, 5:57 PM
got it.

Dylan Wilder

04/21/2022, 5:57 PM
pyflyte-execute --inputs gs://flytepropeller-production-storage/metadata/propeller/sp-one-model-staging-f4406b4e311fe405c99d/msr-forecast/data/0/model-params-2-of-2-grab-params/inputs.pb --output-prefix gs://flytepropeller-production-storage/metadata/propeller/sp-one-model-staging-f4406b4e311fe405c99d/msr-forecast/data/0/model-params-2-of-2-grab-params/0 --raw-output-data-prefix /9c/f4406b4e311fe405c99d-fgyaxjti-0 --resolver flytekit.core.python_auto_container.default_task_resolver -- task-module workflows.utils.get_params task-name get_param_tab_dfs
--raw-output-data-prefix /9c/f4406b4e311fe405c99d-fgyaxjti-0
what i have no clarity on is where/when/how it's being set given there's been a few changes in this functionality across versions
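To confirm what a pod actually received for that switch, the flag can be pulled out of the container command. A rough sketch, assuming a hypothetical helper `flag_value` and a shortened command with placeholder bucket paths (only the `--raw-output-data-prefix` value is taken from the thread):

```python
import shlex
from typing import Optional


def flag_value(command: str, flag: str) -> Optional[str]:
    """Return the token that follows `flag` in a shell-style command, or None."""
    tokens = shlex.split(command)
    for i, token in enumerate(tokens[:-1]):
        if token == flag:
            return tokens[i + 1]
    return None


# Shortened command; the gs:// paths are placeholders, not the real ones.
command = (
    "pyflyte-execute --inputs gs://example-bucket/inputs.pb "
    "--output-prefix gs://example-bucket/0 "
    "--raw-output-data-prefix /9c/f4406b4e311fe405c99d-fgyaxjti-0 "
    "--resolver flytekit.core.python_auto_container.default_task_resolver "
    "-- task-module workflows.utils.get_params task-name get_param_tab_dfs"
)

print(flag_value(command, "--raw-output-data-prefix"))
# /9c/f4406b4e311fe405c99d-fgyaxjti-0
```

A value without a bucket scheme here means the configured prefix was lost before the pod was created, not inside the task code.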

Yee

04/21/2022, 6:00 PM
can you check this entry in the flyte config?
core.yaml: |
    propeller:
      rawoutput-prefix: s3://bucket-name/

katrina

04/21/2022, 6:01 PM
hey @Dylan Wilder separately, do you mind letting us know what the outage from your upgrade was

Dylan Wilder

04/21/2022, 6:02 PM
ask @Babis Kiosidis @katrina
👍 1
@Yee is that flyte admin config?

katrina

04/21/2022, 6:04 PM
re: your issue, are you setting the role, max parallelism or labels/annotations in the CreateExecutionRequest?

Yee

04/21/2022, 6:05 PM
no this is propeller config
it’s set in both places… but propeller is ultimately the code that is doing the replacement

Dylan Wilder

04/21/2022, 6:07 PM
it's not set in propeller config
rawoutput-prefix: ''
is that not overridden by the workflow setting?
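The override behaviour being asked about can be pictured as "first non-empty value wins". The sketch below is only an illustration of that expectation: the function name `resolve_prefix` and the precedence order (execution, then launch plan, then propeller default) are assumptions, not confirmed Flyte behaviour.

```python
from typing import Optional


def resolve_prefix(
    execution: Optional[str],
    launch_plan: Optional[str],
    propeller_default: str,
) -> str:
    """Pick the first non-empty prefix; the order here is an assumed precedence."""
    for candidate in (execution, launch_plan):
        if candidate:
            return candidate
    return propeller_default


# With a workflow-level value set, the empty propeller default would not matter.
print(resolve_prefix(None, "gs://example-bucket/raw", ""))  # gs://example-bucket/raw
# If the workflow-level value is dropped along the way, only the empty default is left.
print(repr(resolve_prefix(None, None, "")))  # ''
```

Under this reading, an empty `rawoutput-prefix` in the propeller config is harmless only while the workflow-level value actually reaches propeller.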

katrina

04/21/2022, 6:09 PM
@Dylan Wilder are you setting the role, max parallelism or labels/annotations in the CreateExecutionRequest?

Dylan Wilder

04/21/2022, 6:11 PM
no
at least not intentionally anywhere

katrina

04/21/2022, 6:14 PM
how about in the launch plan?

Dylan Wilder

04/21/2022, 6:15 PM
nope, not there or at launch time

katrina

04/21/2022, 6:17 PM
oh interesting, so you never run with a role in general?

Babis Kiosidis

04/21/2022, 6:17 PM
The outage is unrelated to this and we can discuss it separately. We are planning to take another look tomorrow morning before.

Dylan Wilder

04/21/2022, 6:17 PM
it's related to the upgrade i believe though
sorry this issue is related to the upgrade

Babis Kiosidis

04/21/2022, 6:18 PM
Yeah sounds like it. The raw output prefix used to work on the previous version
Could it be a mismatch with the flytekit version used in the code? Or is it purely up to propeller?

Dylan Wilder

04/21/2022, 6:20 PM
that's a possibility, we're a bit behind

katrina

04/21/2022, 6:21 PM
oh! what version of flyteadmin are you on?

Dylan Wilder

04/21/2022, 6:21 PM
flytekit 0.25.0

Babis Kiosidis

04/21/2022, 6:22 PM
We just upgraded the system to the milestone v0.19.4

Dylan Wilder

04/21/2022, 6:22 PM
however, there have been a number of breaking changes since then so upgrading is difficult

katrina

04/21/2022, 6:23 PM
@Babis Kiosidis by "just" do you mean this morning? we had some snafus and actually had to re-release v0.19.4 a few times 😅

Babis Kiosidis

04/21/2022, 6:23 PM
Yes we upgraded this morning, roughly 10h ago

Dylan Wilder

04/21/2022, 6:24 PM
yes that's when this issue started

katrina

04/21/2022, 6:24 PM
and just to be extra paranoid, your flyteadmin version is 0.6.148?

Babis Kiosidis

04/21/2022, 6:24 PM
I will have to check brb
flyteadmin 0.6.147

katrina

04/21/2022, 6:28 PM
ah okay, that has exactly the fix i was thinking might be the root cause. nevermind then!
could I ask you to share the execution spec and corresponding launch plan spec for one of these failing executions (with whatever sensitive info you need redacted)

Babis Kiosidis

04/21/2022, 6:36 PM
we will trigger an execution so we can see the spec and to see if the spec contains the prefix
(hypothesis) if yes, then there is probably an issue with propeller, and maybe you could help us identify how far back we could roll safely to avoid the bug

katrina

04/21/2022, 6:37 PM
there were some changes to the handling of overridable values in general, so it would be helpful if you could share the entire spec for both

Babis Kiosidis

04/21/2022, 6:38 PM
which both do you mean?

katrina

04/21/2022, 6:38 PM
launch plan & execution spec, sorry

Babis Kiosidis

04/21/2022, 6:38 PM
👍

Dylan Wilder

04/21/2022, 6:56 PM
we use
LaunchPlan.from_config("...").fetch_launch_plan()
to launch
we do not explicitly set the prefix at launch

Babis Kiosidis

04/21/2022, 6:58 PM
and also in the execution itself we see the prefix configured in flyte console. So it looks like it's there

katrina

04/21/2022, 7:02 PM
@Prafulla Mahindrakar has a fix! can you share
@Prafulla Mahindrakar can you build a docker image that the spotify team can try out once you have the PR ready?

Babis Kiosidis

04/21/2022, 7:06 PM
flytepropeller image?

katrina

04/21/2022, 7:09 PM
flyteadmin

Prafulla Mahindrakar

04/21/2022, 7:17 PM
I am building the image give me a few mins

Babis Kiosidis

04/21/2022, 7:18 PM
👍

Prafulla Mahindrakar

04/21/2022, 7:27 PM
Until the official release fix can you try this image
docker push ghcr.io/flyteorg/flyteadmin:v0.6.147-fix
The push refers to repository [ghcr.io/flyteorg/flyteadmin]
e9d2c88c7662: Pushed 
99c33ff2aa60: Pushed 
26b69aebec89: Pushed 
4fc242d58285: Layer already exists 
v0.6.147-fix: digest: sha256:5ab74964cb02aececd9fe165ae954dbe55e3cda8a2cf07680279fa9504162bb0 size: 1159

Babis Kiosidis

04/21/2022, 7:32 PM
we are trying it thank you
🙏 1
it works thank you very much 🙂
mvps
thank you very much for the quick response, great work

katrina

04/21/2022, 7:49 PM
awesome, thank you @Prafulla Mahindrakar

Prafulla Mahindrakar

04/21/2022, 8:02 PM
Cool. Sorry for the trouble. It's also now part of this official build: https://github.com/flyteorg/flyteadmin/releases/tag/v0.6.149 We plan to add more tests to catch such issues earlier.