aloof-painting-18735
07/10/2023, 3:57 PMthankful-minister-83577
thankful-minister-83577
salmon-refrigerator-32115
07/10/2023, 7:54 PMglamorous-carpet-83516
07/11/2023, 1:08 AMglamorous-carpet-83516
07/11/2023, 4:21 AMglamorous-carpet-83516
07/11/2023, 6:18 AM
curl --netrc --request GET --header "Authorization: Bearer $DATABRICKS_TOKEN" \
'https://dbc-32fcad04-13c2.cloud.databricks.com/api/2.0/jobs/runs/get?run_id=306'
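For anyone who wants to reproduce the same status check from Go rather than curl, here is a minimal sketch that only builds the request. The helper name `newRunsGetRequest`, the example host, and the dummy token are assumptions for illustration; only the endpoint path, the `run_id` query parameter, and the Bearer header come from the curl command above.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newRunsGetRequest is a hypothetical helper that builds the same GET request
// as the curl command: /api/2.0/jobs/runs/get?run_id=<id> with a Bearer token.
func newRunsGetRequest(host, token, runID string) (*http.Request, error) {
	u := url.URL{
		Scheme:   "https",
		Host:     host,
		Path:     "/api/2.0/jobs/runs/get",
		RawQuery: url.Values{"run_id": {runID}}.Encode(),
	}
	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	// Host and token are placeholders, not real values.
	req, err := newRunsGetRequest("example.cloud.databricks.com", "dummy-token", "306")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Sending it is then a matter of passing the request to an `http.Client`, for example `http.DefaultClient.Do(req)`.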
glamorous-carpet-83516
07/11/2023, 6:19 AMaloof-painting-18735
07/11/2023, 7:21 AMglamorous-carpet-83516
07/11/2023, 10:22 AM
loader must define exec_module() when running Databricks task
Which version of Python are you using?
glamorous-carpet-83516
07/11/2023, 10:22 AMglamorous-carpet-83516
07/11/2023, 10:25 AM
#3855 [BUG] Flyte task keeps running forever when running a Databricks job
btw, does the Databricks job succeed or fail?
aloof-painting-18735
07/11/2023, 11:10 AM
> btw, does the Databricks job succeed or fail?
The Databricks job succeeded.
aloof-painting-18735
07/11/2023, 11:12 AM
> which version of python are you using
I'm running this job on DBR 11.3 LTS (in both cases), which has Python 3.9.5 (I also added this info to the ticket).
aloof-painting-18735
07/11/2023, 11:47 AM
> btw, could you try to send a get request to dbx by using curl?
{
  "attempt_number": 0,
  "cleanup_duration": 0,
  "cluster_instance": {
    "cluster_id": "<my-cluster-id>",
    "spark_context_id": "<my-spark-context-id>"
  },
  "cluster_spec": {
    "existing_cluster_id": "<my-cluster-id>"
  },
  "creator_user_name": "<my-username>",
  "end_time": 1688987784820,
  "execution_duration": 223000,
  "format": "SINGLE_TASK",
  "job_id": 1060720031042619,
  "number_in_job": 574539,
  "run_id": 574539,
  "run_name": "dbx simplified example",
  "run_page_url": "<my-run-page-url>",
  "run_type": "SUBMIT_RUN",
  "setup_duration": 41000,
  "start_time": 1688987520036,
  "state": {
    "life_cycle_state": "TERMINATED",
    "result_state": "SUCCESS",
    "state_message": "",
    "user_cancelled_or_timedout": false
  },
  "task": {
    "spark_python_task": {
      "parameters": [
        "pyflyte-fast-execute",
        "--additional-distribution",
        "s3://<my-s3-bucket>/flytesnacks/development/UMZ6XPNM4L6KL4YALV56QDMSX4======/script_mode.tar.gz",
        "--dest-dir",
        ".",
        "--",
        "pyflyte-execute",
        "--inputs",
        "s3://<my-s3-bucket>/metadata/propeller/flytesnacks-development-ff83ea058624d44ddbe9/n0/data/inputs.pb",
        "--output-prefix",
        "s3://<my-s3-bucket>/metadata/propeller/flytesnacks-development-ff83ea058624d44ddbe9/n0/data/0",
        "--raw-output-data-prefix",
        "s3://<my-s3-bucket>/raw_data/sh/ff83ea058624d44ddbe9-n0-0",
        "--checkpoint-path",
        "s3://<my-s3-bucket>/raw_data/sh/ff83ea058624d44ddbe9-n0-0/_flytecheckpoints",
        "--prev-checkpoint",
        "\"\"",
        "--resolver",
        "flytekit.core.python_auto_container.default_task_resolver",
        "--",
        "task-module",
        "dbx_simplified_example",
        "task-name",
        "print_spark_config"
      ],
      "python_file": "dbfs:/tmp/flyte/entrypoint.py"
    }
  }
}
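The response above is what a poller has to interpret to decide that the run is finished. As a rough sketch (not the plugin's actual code), the `state` object could be decoded and classified like this. The type and function names are made up for illustration, and the set of terminal `life_cycle_state` values (`TERMINATED`, `SKIPPED`, `INTERNAL_ERROR`) is an assumption based on the Databricks Jobs API 2.0 documentation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// runState and runResponse are hypothetical types covering only the
// fields of the runs/get response that matter for status polling.
type runState struct {
	LifeCycleState string `json:"life_cycle_state"`
	ResultState    string `json:"result_state"`
	StateMessage   string `json:"state_message"`
}

type runResponse struct {
	RunID int64    `json:"run_id"`
	State runState `json:"state"`
}

// parseRun decodes a runs/get response body.
func parseRun(body []byte) (runResponse, error) {
	var r runResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

// isTerminal reports whether the run has stopped for good
// (assumed terminal states, see the lead-in).
func isTerminal(s runState) bool {
	switch s.LifeCycleState {
	case "TERMINATED", "SKIPPED", "INTERNAL_ERROR":
		return true
	}
	return false
}

// succeeded reports whether the run finished and its result was SUCCESS.
func succeeded(s runState) bool {
	return s.LifeCycleState == "TERMINATED" && s.ResultState == "SUCCESS"
}

func main() {
	body := []byte(`{"run_id": 574539, "state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS", "state_message": ""}}`)
	run, err := parseRun(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(run.RunID, isTerminal(run.State), succeeded(run.State))
}
```

With the response posted here, both checks come back true, which matches the observation that the Databricks side finished while the Flyte task kept running.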
(added to the ticket also)
aloof-painting-18735
07/11/2023, 11:49 AM
> Did you see any error in the propeller pod while running databricks task?
No, I didn't. It's pretty weird: I also expected some error logs but haven't seen any. Let me double-check.
aloof-painting-18735
07/11/2023, 3:40 PM
> Did you see any error in the propeller pod while running databricks task?
It's weird. I've just triggered a run (11/07/2023), but I can't see any new logs in flytepropeller. The latest logs are 4 days old.
aloof-painting-18735
07/11/2023, 3:41 PMglamorous-carpet-83516
07/11/2023, 4:02 PM
> I’ve just triggered a run (11/07/2023)
So the task is still running, and the Databricks job is already completed?
aloof-painting-18735
07/12/2023, 7:17 AMglamorous-carpet-83516
07/12/2023, 7:24 AMaloof-painting-18735
07/12/2023, 7:24 AM
flytepropeller is responsible for the task management. Is there a way to monitor the HTTP traffic between flytepropeller and Databricks?
aloof-painting-18735
07/12/2023, 7:25 AMglamorous-carpet-83516
07/12/2023, 7:28 AM
> is there a way to monitor the HTTP traffic between
We need to add more logs to the plugin.
glamorous-carpet-83516
07/12/2023, 7:29 AMaloof-painting-18735
07/12/2023, 7:30 AMglamorous-carpet-83516
07/12/2023, 7:30 AMaloof-painting-18735
07/12/2023, 7:32 AMaloof-painting-18735
07/12/2023, 7:32 AMaloof-painting-18735
07/12/2023, 7:33 AMaloof-painting-18735
07/12/2023, 7:33 AMaloof-painting-18735
07/12/2023, 7:33 AM
flyteadmin logs
aloof-painting-18735
07/12/2023, 7:34 AMglamorous-carpet-83516
07/12/2023, 7:34 AMaloof-painting-18735
07/12/2023, 7:34 AMaloof-painting-18735
07/12/2023, 7:35 AM
flyteadmin log
glamorous-carpet-83516
07/12/2023, 8:11 AM
pingsutw/flytepropeller:c04b9260a4f1fe17f30283b470525807357a01ec
aloof-painting-18735
07/12/2023, 8:13 AM
flytepropeller image reference in our setup with the one you shared
aloof-painting-18735
07/12/2023, 8:18 AM
flytepropeller:
  enabled: true
  manager: false
  # -- Whether to install the flyteworkflows CRD with helm
  createCRDs: true
  # -- Replicas count for Flytepropeller deployment
  replicaCount: 1
  image:
    # -- Docker image for Flytepropeller deployment
    repository: pingsutw/flytepropeller # FLYTEPROPELLER_IMAGE
    tag: c04b9260a4f1fe17f30283b470525807357a01ec # FLYTEPROPELLER_TAG
    pullPolicy: IfNotPresent
glamorous-carpet-83516
07/12/2023, 8:25 AMglamorous-carpet-83516
07/12/2023, 8:26 AMaloof-painting-18735
07/12/2023, 8:26 AMaloof-painting-18735
07/12/2023, 8:55 AM
taskCtx.ResourceMeta is initialized
The POST request that creates the job completes successfully, so we can probably presume that this part works:
resp, err := p.client.Do(req)
if err != nil {
	return nil, nil, err
}
Probably something goes wrong here, right?
data, err := buildResponse(resp)
if err != nil {
	return nil, nil, err
}
if data["run_id"] == "" {
	return nil, nil, pluginErrors.Wrapf(pluginErrors.RuntimeFailure, err,
		"Unable to fetch statementHandle from http response")
}
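One thing worth checking in the `run_id` test above, sketched here under an assumption the snippet does not confirm: if `buildResponse` returns a `map[string]interface{}` decoded with `encoding/json`, a numeric `run_id` arrives as `float64` and a missing key yields `nil`, so comparing against `""` is false in both cases and the error branch would not fire. `naiveCheck`, `extractRunID`, and `decode` are hypothetical helpers written only for this illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// naiveCheck mirrors the comparison from the snippet: it is true only when
// the stored value is the empty string.
func naiveCheck(data map[string]interface{}) bool {
	return data["run_id"] == ""
}

// extractRunID is a hypothetical stricter alternative: it checks presence
// and type explicitly instead of comparing against "".
func extractRunID(data map[string]interface{}) (string, error) {
	raw, ok := data["run_id"]
	if !ok {
		return "", errors.New("run_id missing from response")
	}
	num, ok := raw.(float64) // encoding/json decodes JSON numbers into float64
	if !ok {
		return "", fmt.Errorf("run_id has unexpected type %T", raw)
	}
	return fmt.Sprintf("%.0f", num), nil
}

// decode unmarshals a JSON object into a generic map, panicking on bad input.
func decode(body string) map[string]interface{} {
	var data map[string]interface{}
	if err := json.Unmarshal([]byte(body), &data); err != nil {
		panic(err)
	}
	return data
}

func main() {
	good := decode(`{"run_id": 574539}`)
	missing := decode(`{"message": "no run created"}`)

	// Neither case makes the naive comparison true.
	fmt.Println(naiveCheck(good), naiveCheck(missing))

	fmt.Println(extractRunID(good))
	fmt.Println(extractRunID(missing))
}
```

If the assumption holds, a response without `run_id` would slip past the check silently, which could be one reason nothing shows up in the logs. It does not explain the hang on its own.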
It is quite strange that we do not have any errors in the logs. My guess is that these errors should be propagated to the flytepropeller logs, right?
aloof-painting-18735
07/12/2023, 10:12 AMbillions-midnight-10687
07/12/2023, 10:14 AMbillions-midnight-10687
07/12/2023, 10:14 AMbillions-midnight-10687
07/12/2023, 10:14 AMflytepropeller-8574c869bb-d8bzv 0/1 Error 2 (23s ago) 33s
flytepropeller-8574c869bb-srwqd 0/1 CrashLoopBackOff 1 (16s ago) 24s
billions-midnight-10687
07/12/2023, 10:15 AMk logs flytepropeller-8574c869bb-srwqd -n flyte
exec /bin/flytepropeller: exec format error
billions-midnight-10687
07/12/2023, 10:15 AMbillions-midnight-10687
07/12/2023, 10:15 AMELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, Go BuildID=yrkx_WEsTHfXYES1W5qM/G0DQz0Khz26svL-2chDO/BgntAhD_JOMfoTcNhtoP/JaCUuM92wwK3w0KL3Pzo, with debug_info, not stripped
billions-midnight-10687
07/12/2023, 10:15 AMbillions-midnight-10687
07/12/2023, 10:15 AMglamorous-carpet-83516
07/12/2023, 10:16 AMbillions-midnight-10687
07/12/2023, 10:16 AMglamorous-carpet-83516
07/12/2023, 10:31 AMbillions-midnight-10687
07/12/2023, 11:11 AMaloof-painting-18735
07/12/2023, 11:33 AM
flytepropeller from the image, we got this in the logs again:
time="2023-07-12T11:24:12Z" level=info msg=------------------------------------------------------------------------
time="2023-07-12T11:24:12Z" level=info msg="App [flytepropeller], Version [unknown], BuildSHA [unknown], BuildTS [2023-07-12 11:24:12.690099108 +0000 UTC m=+0.023105387]"
time="2023-07-12T11:24:12Z" level=info msg=------------------------------------------------------------------------
time="2023-07-12T11:24:12Z" level=info msg="Detected: 8 CPU's\n"
{"json":{},"level":"warning","msg":"defaulting max ttl for workflows to 23 hours, since configured duration is larger than 23 [23]","ts":"2023-07-12T11:24:12Z"}
{"json":{},"level":"warning","msg":"stow configuration section missing, defaulting to legacy s3/minio connection config","ts":"2023-07-12T11:24:12Z"}
I0712 11:24:13.017134 1 leaderelection.go:248] attempting to acquire leader lease flyte/propeller-leader...
I0712 11:24:29.591775 1 leaderelection.go:258] successfully acquired lease flyte/propeller-leader
{"json":{"routine":"databricks-worker-1"},"level":"error","msg":"worker panic'd and is shutting down. Error: interface conversion: interface {} is databricks.ResourceMetaWrapper, not *databricks.ResourceMetaWrapper","ts":"2023-07-12T11:24:59Z"}
{"json":{"routine":"databricks-worker-1"},"level":"error","msg":"Failed to sync. Error: worker panic'd and is shutting down. Error: interface conversion: interface {} is databricks.ResourceMetaWrapper, not *databricks.ResourceMetaWrapper","ts":"2023-07-12T11:24:59Z"}
{"json":{"routine":"databricks-worker-2"},"level":"error","msg":"worker panic'd and is shutting down. Error: interface conversion: interface {} is databricks.ResourceMetaWrapper, not *databricks.ResourceMetaWrapper","ts":"2023-07-12T11:24:59Z"}
{"json":{"routine":"databricks-worker-2"},"level":"error","msg":"Failed to sync. Error: worker panic'd and is shutting down. Error: interface conversion: interface {} is databricks.ResourceMetaWrapper, not *databricks.ResourceMetaWrapper","ts":"2023-07-12T11:24:59Z"}
{"json":{"routine":"databricks-worker-4"},"level":"error","msg":"worker panic'd and is shutting down. Error: interface conversion: interface {} is databricks.ResourceMetaWrapper, not *databricks.ResourceMetaWrapper","ts":"2023-07-12T11:24:59Z"}
{"json":{"routine":"databricks-worker-4"},"level":"error","msg":"Failed to sync. Error: worker panic'd and is shutting down. Error: interface conversion: interface {} is databricks.ResourceMetaWrapper, not *databricks.ResourceMetaWrapper","ts":"2023-07-12T11:24:59Z"}
{"json":{"routine":"databricks-worker-0"},"level":"error","msg":"worker panic'd and is shutting down. Error: interface conversion: interface {} is databricks.ResourceMetaWrapper, not *databricks.ResourceMetaWrapper","ts":"2023-07-12T11:24:59Z"}
{"json":{"routine":"databricks-worker-0"},"level":"error","msg":"Failed to sync. Error: worker panic'd and is shutting down. Error: interface conversion: interface {} is databricks.ResourceMetaWrapper, not *databricks.ResourceMetaWrapper","ts":"2023-07-12T11:24:59Z"}
{"json":{"routine":"databricks-worker-6"},"level":"error","msg":"worker panic'd and is shutting down. Error: interface conversion: interface {} is databricks.ResourceMetaWrapper, not *databricks.ResourceMetaWrapper","ts":"2023-07-12T11:24:59Z"}
{"json":{"routine":"databricks-worker-6"},"level":"error","msg":"Failed to sync. Error: worker panic'd and is shutting down. Error: interface conversion: interface {} is databricks.ResourceMetaWrapper, not *databricks.ResourceMetaWrapper","ts":"2023-07-12T11:24:59Z"}
{"json":{"routine":"databricks-worker-8"},"level":"error","msg":"worker panic'd and is shutting down. Error: interface conversion: interface {} is databricks.ResourceMetaWrapper, not *databricks.ResourceMetaWrapper","ts":"2023-07-12T11:24:59Z"}
{"json":{"routine":"databricks-worker-8"},"level":"error","msg":"Failed to sync. Error: worker panic'd and is shutting down. Error: interface conversion: interface {} is databricks.ResourceMetaWrapper, not *databricks.ResourceMetaWrapper","ts":"2023-07-12T11:24:59Z"}
{"json":{"routine":"databricks-worker-5"},"level":"error","msg":"worker panic'd and is shutting down. Error: interface conversion: interface {} is databricks.ResourceMetaWrapper, not *databricks.ResourceMetaWrapper","ts":"2023-07-12T11:24:59Z"}
{"json":{"routine":"databricks-worker-5"},"level":"error","msg":"Failed to sync. Error: worker panic'd and is shutting down. Error: interface conversion: interface {} is databricks.ResourceMetaWrapper, not *databricks.ResourceMetaWrapper","ts":"2023-07-12T11:24:59Z"}
{"json":{"routine":"databricks-worker-9"},"level":"error","msg":"worker panic'd and is shutting down. Error: interface conversion: interface {} is databricks.ResourceMetaWrapper, not *databricks.ResourceMetaWrapper","ts":"2023-07-12T11:24:59Z"}
{"json":{"routine":"databricks-worker-9"},"level":"error","msg":"Failed to sync. Error: worker panic'd and is shutting down. Error: interface conversion: interface {} is databricks.ResourceMetaWrapper, not *databricks.ResourceMetaWrapper","ts":"2023-07-12T11:24:59Z"}
{"json":{"routine":"databricks-worker-3"},"level":"error","msg":"worker panic'd and is shutting down. Error: interface conversion: interface {} is databricks.ResourceMetaWrapper, not *databricks.ResourceMetaWrapper","ts":"2023-07-12T11:24:59Z"}
{"json":{"routine":"databricks-worker-3"},"level":"error","msg":"Failed to sync. Error: worker panic'd and is shutting down. Error: interface conversion: interface {} is databricks.ResourceMetaWrapper, not *databricks.ResourceMetaWrapper","ts":"2023-07-12T11:24:59Z"}
{"json":{"routine":"databricks-worker-7"},"level":"error","msg":"worker panic'd and is shutting down. Error: interface conversion: interface {} is databricks.ResourceMetaWrapper, not *databricks.ResourceMetaWrapper","ts":"2023-07-12T11:24:59Z"}
{"json":{"routine":"databricks-worker-7"},"level":"error","msg":"Failed to sync. Error: worker panic'd and is shutting down. Error: interface conversion: interface {} is databricks.ResourceMetaWrapper, not *databricks.ResourceMetaWrapper","ts":"2023-07-12T11:24:59Z"}
aloof-painting-18735
07/12/2023, 11:33 AMaloof-painting-18735
07/12/2023, 11:35 AMaloof-painting-18735
07/12/2023, 11:38 AM
flytepropeller logs
aloof-painting-18735
07/12/2023, 11:48 AMaloof-painting-18735
07/12/2023, 11:48 AM
flytepropeller is trying to refresh the status of these tasks
aloof-painting-18735
07/12/2023, 11:48 AMaloof-painting-18735
07/12/2023, 11:49 AMaloof-painting-18735
07/12/2023, 12:09 PMaloof-painting-18735
07/12/2023, 12:09 PMaloof-painting-18735
07/12/2023, 12:10 PMaloof-painting-18735
07/12/2023, 12:10 PMglamorous-carpet-83516
07/12/2023, 2:11 PMglamorous-carpet-83516
07/12/2023, 2:17 PMaloof-painting-18735
07/12/2023, 2:18 PMaloof-painting-18735
07/12/2023, 2:18 PMbillions-midnight-10687
07/13/2023, 9:56 AMbillions-midnight-10687
07/13/2023, 9:56 AMbillions-midnight-10687
07/13/2023, 9:57 AMbillions-midnight-10687
07/13/2023, 9:57 AM
{"json":{"routine":"databricks-worker-0"},"level":"error","msg":"Failed to sync. Error: worker panic'd and is shutting down. Error: interface conversion: interface {} is databricks.ResourceMetaWrapper, not *databricks.ResourceMetaWrapper","ts":"2023-07-13T09:51:46Z"}
{"json":{"routine":"databricks-worker-9"},"level":"error","msg":"worker panic'd and is shutting down. Error: interface conversion: interface {} is databricks.ResourceMetaWrapper, not *databricks.ResourceMetaWrapper","ts":"2023-07-13T09:51:46Z"}
{"json":{"routine":"databricks-worker-9"},"level":"error","msg":"Failed to sync. Error: worker panic'd and is shutting down. Error: interface conversion: interface {} is databricks.ResourceMetaWrapper, not *databricks.ResourceMetaWrapper","ts":"2023-07-13T09:51:46Z"}
billions-midnight-10687
07/13/2023, 9:57 AMaloof-painting-18735
07/25/2023, 3:24 PM
worker panic'd and is shutting down. Error: interface conversion: interface {} is databricks.ResourceMetaWrapper, not *databricks.ResourceMetaWrapper is happening?
glamorous-carpet-83516
09/26/2023, 8:07 PM