# ask-the-community
r
Did I miss some prerequisite steps?
I guess the secret is not autogenerated for a sandbox cluster, and I need to create it manually, like:
$ kubectl create secret generic -n flyte flyte-secret-auth
Can anyone confirm?
but why do you need auth on the sandbox?
r
I guess the token could also be stored in an env variable, right?
I mean, in a local environment I don't need to strictly follow this method: https://docs.flyte.org/en/latest/deployment/plugins/webapi/databricks.html#get-an-api-token
I could just add a FLYTE_DATABRICKS_API_TOKEN env variable.
j
you can just create the secret in that case.
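For the manual route, here is a minimal Python sketch that builds the Secret manifest so the token does not have to be base64-encoded by hand. Assumptions: the secret name flyte-secret-auth and namespace flyte come from the kubectl command above. The data key FLYTE_DATABRICKS_API_TOKEN is assumed from the Databricks plugin docs linked in this thread and from the /etc/secrets/FLYTE_DATABRICKS_API_TOKEN path that propeller reports later on. The token value is a placeholder.

```python
import base64
import json


def databricks_token_secret(
    token: str,
    name: str = "flyte-secret-auth",  # assumed: the name used in the kubectl command in this thread
    namespace: str = "flyte",
) -> dict:
    """Build a Kubernetes Secret manifest holding the Databricks API token.

    The data key FLYTE_DATABRICKS_API_TOKEN is an assumption: it matches the
    file name propeller looks for under /etc/secrets.
    """
    # Values under a Secret's `data` field must be base64-encoded strings.
    encoded = base64.b64encode(token.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {"FLYTE_DATABRICKS_API_TOKEN": encoded},
    }


if __name__ == "__main__":
    # Placeholder token; replace it with a real Databricks personal access token.
    print(json.dumps(databricks_token_secret("dapi-placeholder"), indent=2))
```

Piping the script's output into kubectl apply -f - creates or updates the secret. kubectl accepts JSON manifests as well as YAML.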
r
ok
I'm also stuck on this last step: https://docs.flyte.org/en/latest/deployment/plugins/webapi/databricks.html#upgrade-the-flyte-helm-release. As I understand it, when I use a local sandbox cluster (started by flytectl demo start), I don't have a Helm release that could be upgraded. So this last step does not seem to be compatible with the sandbox cluster. Is that correct?
j
ah yea, alas these instructions don’t appear to be compatible with flyte-binary or the sandbox.
but it should be doable with some minor tweaks though.
r
ah, ok, thanks
j
@Kevin Su do we have a more current example of this? i feel like we did this recently on a sandbox.
e
Flagging that it is possible to pass a token to the Databricks config: flytekitplugins/spark/task.py#LL44C7-L44C17. It might work for testing purposes before @Kevin Su gets a chance to weigh in 🙂
Not sure it will work, but it's potentially worth a shot!
r
@Evan Sadler thanks for the hint. Nice, I didn't know I could pass a token directly to the config. However, I think I still need to configure the Databricks plugin, so the last step is probably still needed. Is that correct?
ah, ok, just realized I can pass the databricks_instance too
e
You still need to enable the plugin in the config, but hopefully not the secret part!
Or instance!
r
ok, thanks
e
Yeah! Let us know if that doesn’t work. The OSS team should be able to help later today when they are up. They are PST.
r
all right, will let you know, thanks a lot
@Evan Sadler I tried passing a token to the Databricks config, but the run failed with:
{"json":{"exec_id":"f5a4876209ef64132940","ns":"flyte-user-flytesnacks-development","res_ver":"5168558","routine":"worker-38","wf":"flytesnacks:development:dbx_example.my_databricks_job"},"level":"error","msg":"Error when trying to reconcile workflow. Error [failed at Node[n0]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [databricks]: secrets not found - file [/etc/secrets/FLYTE_DATABRICKS_API_TOKEN], Env [FLYTE_SECRET_FLYTE_DATABRICKS_API_TOKEN]].
so it seems the secret is still needed
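The error lists the two places propeller looked for the token: a file /etc/secrets/FLYTE_DATABRICKS_API_TOKEN and an env variable FLYTE_SECRET_FLYTE_DATABRICKS_API_TOKEN. The Python sketch below is a hypothetical illustration of that lookup, not propeller's code (propeller is written in Go). The file-first order and the whitespace stripping are assumptions of this sketch. It shows why an env variable named just FLYTE_DATABRICKS_API_TOKEN would not be found: the name needs the FLYTE_SECRET_ prefix.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_secret(
    key: str,
    mount_dir: str = "/etc/secrets",
    env_prefix: str = "FLYTE_SECRET_",
) -> Optional[str]:
    """Look up a secret in the two locations named by the propeller error.

    Checks for a file named `key` under `mount_dir` first, then an
    environment variable named `env_prefix + key`. Returns None if neither
    exists (the case the error reports as "secrets not found").
    """
    secret_file = Path(mount_dir) / key
    if secret_file.is_file():
        # Strip the trailing newline that secret files often carry (a choice of this sketch).
        return secret_file.read_text().strip()
    return os.environ.get(env_prefix + key)
```

With no file mounted, resolve_secret("FLYTE_DATABRICKS_API_TOKEN") only finds a value if FLYTE_SECRET_FLYTE_DATABRICKS_API_TOKEN is set in the process doing the lookup. In this thread, that process is propeller, not the task pod.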
e
Oh nooo
k
ahh, that's my bad. If you didn't set the default token, propeller returns an error right away. Could you set a dummy value for FLYTE_SECRET_FLYTE_DATABRICKS_API_TOKEN?
r
oh, I see, let me give it a try
k
Add an env variable to propeller and set it to a random value. I'll create a PR to fix it.
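One way to add that env variable is a strategic-merge patch on the Deployment that runs propeller. The thread does not say which Deployment or container hosts propeller in the sandbox, so both names below are hypothetical placeholders. The container name must match an existing container. Otherwise, the merge adds a new, incomplete container instead of updating propeller's.

```python
import json


def env_var_patch(container: str, name: str, value: str) -> dict:
    """Build a strategic-merge patch that sets one env var on a Deployment container.

    Strategic merge matches `containers` and `env` list entries by their
    `name` field, so other containers and existing env vars are kept.
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": container, "env": [{"name": name, "value": value}]}
                    ]
                }
            }
        }
    }


if __name__ == "__main__":
    # "flyte" is a hypothetical container name; look up the real one with
    # `kubectl get deployment <deployment> -n flyte -o yaml`.
    patch = env_var_patch("flyte", "FLYTE_SECRET_FLYTE_DATABRICKS_API_TOKEN", "dummy")
    print(json.dumps(patch))
```

The output can be passed to kubectl patch deployment <deployment> -n flyte --patch '<json>'. kubectl patch uses a strategic merge by default. kubectl set env deployment/<deployment> -n flyte FLYTE_SECRET_FLYTE_DATABRICKS_API_TOKEN=dummy achieves the same result in one step.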
r
cool, thanks
@Kevin Su Thanks for the tip, it worked like a charm. I have a follow-up question. Flyte is able to start a Databricks job, but it never updates the job status (the Flyte job keeps running forever), regardless of whether the Databricks job fails or succeeds. I suspect the follow-up request responsible for updating the Flyte job's status does not use the token from the task configuration. Does that make sense?
k
We do save the token, and propeller uses it when sending the GET/DELETE requests. Are you able to abort the workflow?
r
hmm, weird, ok, let me do some more testing in this area
will get back to you with the results
The Flyte job can be aborted, but that does not terminate the Databricks job.
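To check whether the token can cancel a run on its own, and so whether the problem sits on the Flyte side, the Databricks Jobs API cancel call can be sent by hand. The sketch below builds that request with Python's standard library. It assumes the plugin's DELETE path corresponds to the Jobs API runs/cancel endpoint (2.0 path assumed), which the thread does not confirm. The instance host, token, and run ID are placeholders.

```python
import json
import urllib.request


def build_runs_cancel_request(instance: str, token: str, run_id: int) -> urllib.request.Request:
    """Build (but do not send) a Jobs API request that cancels one run.

    `instance` is the workspace host name, e.g. the value passed as
    databricks_instance in the task config.
    """
    body = json.dumps({"run_id": run_id}).encode("utf-8")
    return urllib.request.Request(
        f"https://{instance}/api/2.0/jobs/runs/cancel",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def cancel_run(instance: str, token: str, run_id: int) -> int:
    """Send the cancel request and return the HTTP status code.

    A rejected token surfaces as urllib.error.HTTPError (e.g. 401 or 403).
    """
    req = build_runs_cancel_request(instance, token, run_id)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status
```

If cancel_run succeeds with the same token that Flyte uses, the token is probably not why the abort fails to reach Databricks.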
Hi @Kevin Su, I'm still struggling with the above issue (the Flyte job status does not get updated). I moved the token from the task config to a secret 👉 did not help.
I'm wondering whether we can monitor the HTTP requests sent from Flyte to Databricks; that would probably shed some light on the issue. Is there an option for debug logging? Please note that I have already added the --verbose flag to the pyflyte command, like this:
pyflyte --verbose run -i <docker_registry>.dkr.ecr.us-east-1.amazonaws.com/<prefix>/<image>:<version> --remote --destination-dir . dbx_example_job_cluster.py my_databricks_job
👉 did not help
It looks similar to this issue.
We are also on Flyte 1.5.0.
Could it be related?
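Independently of any Flyte-side logging, one way to see what Databricks itself reports is to call the Jobs API runs/get endpoint directly with the same instance and token. If it shows a terminal life_cycle_state while Flyte still shows the task as running, the problem is more likely in how the status is polled or interpreted on the Flyte side than in Databricks. This is a sketch only. The 2.0 endpoint path and the set of terminal states follow the public Databricks Jobs API docs as we understand them. The instance, token, and run ID are placeholders.

```python
import json
import urllib.request

# Life-cycle states after which a Databricks run no longer changes.
TERMINAL_LIFE_CYCLE_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def build_runs_get_request(instance: str, token: str, run_id: int) -> urllib.request.Request:
    """Build (but do not send) a Jobs API request for one run's status."""
    return urllib.request.Request(
        f"https://{instance}/api/2.0/jobs/runs/get?run_id={run_id}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def is_finished(run: dict) -> bool:
    """True if a runs/get response reports a terminal life-cycle state."""
    return run.get("state", {}).get("life_cycle_state") in TERMINAL_LIFE_CYCLE_STATES


def fetch_run(instance: str, token: str, run_id: int) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_runs_get_request(instance, token, run_id), timeout=30) as resp:
        return json.load(resp)
```

For example, fetch_run("<instance>", "<token>", <run_id>)["state"] shows life_cycle_state and, once the run has finished, result_state (such as SUCCESS or FAILED). The run ID is visible in the Databricks UI.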