# ask-the-community
r
Hi community! I'm trying to run a dummy Python job on a 🧱 job cluster (_new_cluster_) using `pyflyte`. For some reason, when I add `policy_id` to the cluster config, the Flyte job fails with an error like this:
k
The `Databricks Console` link looks like this:
Is that link correct? Are you able to open it?
r
No, it's an invalid link; the run id is `nil`.
I guess the request is not even submitted to Databricks. Let me check the audit logs.
OK, I think I found the root cause. When we submit a valid job run, the 🧱 API returns:
HTTP 200
{
    "run_id": <valid_run_id>
}
When we add a cluster policy to the cluster config and the cluster config conflicts with the policy, the 🧱 API returns:
HTTP 400
{
  "error_code": "INVALID_PARAMETER_VALUE",
  "message": "Cluster validation error: The instance profile arn (arn:aws:iam::<account_id>:instance-profile/<instance_profile>) has been removed from Databricks, Please contact your administrator."
}
I think this error case is not handled properly in Flyte. It's probably still trying to parse the `run_id` from the response, which would explain why it returns a URL with a `nil` run_id.
I don't think this is a big problem, but it would be nice to see the above error message displayed in the Flyte UI.
The error can probably be found in the pod logs via `kubectl`, but as far as I know that's only available to admins.
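To illustrate the kind of handling I mean, here is a minimal sketch that checks the HTTP status before reading `run_id`, using the two response shapes above. This is not Flyte's actual plugin code; `parse_submit_response` is a hypothetical helper, and the run id and the shortened error message in the usage lines are made up for the example.

```python
import json
from typing import Optional, Tuple


def parse_submit_response(status_code: int, body: str) -> Tuple[Optional[int], Optional[str]]:
    """Hypothetical helper: return (run_id, error_message); exactly one is set."""
    try:
        payload = json.loads(body) if body else {}
    except json.JSONDecodeError:
        return None, f"HTTP {status_code}: unparseable response body"
    if not isinstance(payload, dict):
        return None, f"HTTP {status_code}: unexpected response body"

    # Only trust run_id when the submission actually succeeded.
    if status_code == 200 and "run_id" in payload:
        return payload["run_id"], None

    # Otherwise surface the API's own error instead of building a console URL.
    code = payload.get("error_code", "UNKNOWN")
    message = payload.get("message", "no message in response")
    return None, f"HTTP {status_code} {code}: {message}"


# Usage with made-up values shaped like the two responses above.
ok = parse_submit_response(200, '{"run_id": 123}')
bad = parse_submit_response(
    400,
    '{"error_code": "INVALID_PARAMETER_VALUE", "message": "Cluster validation error"}',
)
print(ok)
print(bad)
```

With something like this, the failing case would carry the Databricks message up to the task error instead of a link with a `nil` run id.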