anantharaman janakiraman
01/17/2024, 4:01 PMKetan (kumare3)
Ketan (kumare3)
anantharaman janakiraman
01/17/2024, 4:21 PManantharaman janakiraman
01/17/2024, 4:21 PMKetan (kumare3)
Ketan (kumare3)
anantharaman janakiraman
01/17/2024, 4:56 PManantharaman janakiraman
01/17/2024, 4:57 PMKetan (kumare3)
Kevin Su
01/17/2024, 8:10 PManantharaman janakiraman
01/17/2024, 9:11 PMKevin Su
01/17/2024, 9:12 PManantharaman janakiraman
01/17/2024, 9:23 PManantharaman janakiraman
01/25/2024, 12:09 AMKevin Su
01/25/2024, 12:18 AMKevin Su
01/25/2024, 12:28 AMKetan (kumare3)
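from flytekit import task, workflow

@task
def delete_cluster(cluster_id: str):
    ...

# Pass the task itself as the failure handler, not the result of calling it.
@workflow(on_failure=delete_cluster)
def wf(cluster_id: str):
    create_cluster()
    do_job_1(cluster_id)
    do_job_2(cluster_id)
    delete_cluster(cluster_id)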
anantharaman janakiraman
01/25/2024, 12:33 AMKetan (kumare3)
Ketan (kumare3)
Ketan (kumare3)
project-domain-execution-id
anantharaman janakiraman
01/25/2024, 12:36 AManantharaman janakiraman
01/25/2024, 12:37 AMKetan (kumare3)
job_clusters
and then you can set the new_cluster
which allows setting autotermination_minutes
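A minimal sketch of what such a job_clusters entry could look like when built from Python. The helper name make_job_cluster and the sample values are hypothetical; the field names job_cluster_key, new_cluster, and autotermination_minutes come from this thread, and whether the Jobs API honors autotermination_minutes on a job cluster is an assumption to verify against the Databricks docs.

```python
import json


def make_job_cluster(key, spark_version, node_type_id, num_workers,
                     autotermination_minutes):
    """Build one entry for the `job_clusters` list of a job spec (hypothetical helper)."""
    return {
        "job_cluster_key": key,
        "new_cluster": {
            "spark_version": spark_version,
            "node_type_id": node_type_id,
            "num_workers": num_workers,
            # Assumption from the thread: new_cluster accepts this field.
            "autotermination_minutes": autotermination_minutes,
        },
    }


# Sample values only; tasks would point at the cluster via "job_cluster_key".
spec = {
    "job_clusters": [
        make_job_cluster("default_cluster", "7.3.x-scala2.12", "i3.xlarge", 2, 30)
    ]
}
print(json.dumps(spec, indent=2))
```

Tasks in the same job spec would then refer to the cluster through its job_cluster_key rather than repeating the cluster definition.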
anantharaman janakiraman
01/25/2024, 1:09 AManantharaman janakiraman
01/25/2024, 1:14 AMKevin Su
01/25/2024, 1:14 AManantharaman janakiraman
01/25/2024, 1:17 AManantharaman janakiraman
01/25/2024, 1:17 AM
{
  "job_id": 53,
  "settings": {
    "name": "A job with multiple tasks",
    "email_notifications": {},
    "timeout_seconds": 0,
    "max_concurrent_runs": 1,
    "job_clusters": [
      {
        "job_cluster_key": "default_cluster",
        "new_cluster": {
          "spark_version": "7.3.x-scala2.12",
          "node_type_id": "i3.xlarge",
          "spark_conf": {
            "spark.speculation": true
          },
          "aws_attributes": {
            "availability": "SPOT",
            "zone_id": "us-west-2a"
          },
          "autoscale": {
            "min_workers": 2,
            "max_workers": 8
          }
        }
      },
      {
        "job_cluster_key": "data_processing_cluster",
        "new_cluster": {
          "spark_version": "7.3.x-scala2.12",
          "node_type_id": "r4.2xlarge",
          "spark_conf": {
            "spark.speculation": true
          },
          "aws_attributes": {
            "availability": "SPOT",
            "zone_id": "us-west-2a"
          },
          "autoscale": {
            "min_workers": 8,
            "max_workers": 16
          }
        }
      }
    ],
    "tasks": [
      {
        "task_key": "ingest_orders",
        "description": "Ingest order data",
        "depends_on": [],
        "job_cluster_key": "auto_scaling_cluster",
        "spark_jar_task": {
          "main_class_name": "com.databricks.OrdersIngest",
          "parameters": [
            "--data",
            "dbfs:/path/to/order-data.json"
          ]
        },
        "libraries": [
          {
            "jar": "dbfs:/mnt/databricks/OrderIngest.jar"
          }
        ],
        "timeout_seconds": 86400,
        "max_retries": 3,
        "min_retry_interval_millis": 2000,
        "retry_on_timeout": false
      },
      {
        "task_key": "clean_orders",
        "description": "Clean and prepare the order data",
        "notebook_task": {
          "notebook_path": "/Users/user@databricks.com/clean-data"
        },
        "job_cluster_key": "default_cluster",
        "max_retries": 3,
        "min_retry_interval_millis": 0,
        "retry_on_timeout": true,
        "timeout_seconds": 3600,
        "email_notifications": {}
      },
      {
        "task_key": "analyze_orders",
        "description": "Perform an analysis of the order data",
        "notebook_task": {
          "notebook_path": "/Users/user@databricks.com/analyze-data"
        },
        "depends_on": [
          {
            "task_key": "clean_data"
          }
        ],
        "job_cluster_key": "data_processing_cluster",
        "max_retries": 3,
        "min_retry_interval_millis": 0,
        "retry_on_timeout": true,
        "timeout_seconds": 3600,
        "email_notifications": {}
      }
    ],
    "format": "MULTI_TASK"
  },
  "created_time": 1625841911296,
  "creator_user_name": "user@databricks.com",
  "run_as_user_name": "user@databricks.com"
}
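One thing worth noting in the pasted payload: ingest_orders points at a job_cluster_key of "auto_scaling_cluster" and analyze_orders depends on "clean_data", neither of which is defined in the payload itself. A rough sketch of a pre-submit check for that kind of mismatch follows. The function name find_dangling_references is hypothetical, the settings dict is a trimmed copy of the payload above, and whether Databricks itself rejects such a spec is not something this thread confirms.

```python
from typing import List


def find_dangling_references(settings: dict) -> List[str]:
    """Report tasks that reference undefined job clusters or undefined tasks."""
    cluster_keys = {c["job_cluster_key"] for c in settings.get("job_clusters", [])}
    task_keys = {t["task_key"] for t in settings.get("tasks", [])}
    problems = []
    for task in settings.get("tasks", []):
        name = task["task_key"]
        cluster = task.get("job_cluster_key")
        if cluster is not None and cluster not in cluster_keys:
            problems.append(f"{name}: unknown job_cluster_key {cluster!r}")
        for dep in task.get("depends_on", []):
            if dep["task_key"] not in task_keys:
                problems.append(f"{name}: unknown depends_on task_key {dep['task_key']!r}")
    return problems


# Trimmed copy of the payload above: only the keys the check looks at.
settings = {
    "job_clusters": [
        {"job_cluster_key": "default_cluster"},
        {"job_cluster_key": "data_processing_cluster"},
    ],
    "tasks": [
        {"task_key": "ingest_orders", "depends_on": [],
         "job_cluster_key": "auto_scaling_cluster"},
        {"task_key": "clean_orders", "job_cluster_key": "default_cluster"},
        {"task_key": "analyze_orders",
         "depends_on": [{"task_key": "clean_data"}],
         "job_cluster_key": "data_processing_cluster"},
    ],
}

for problem in find_dangling_references(settings):
    print(problem)
# ingest_orders: unknown job_cluster_key 'auto_scaling_cluster'
# analyze_orders: unknown depends_on task_key 'clean_data'
```

Running a check like this before sending a job spec keeps key typos from surfacing only at submit time.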
anantharaman janakiraman
01/25/2024, 1:19 AMKevin Su
01/25/2024, 1:20 AM
Shared job cluster for jobs/runs/submit API is not supported at the moment.
If that's the case, we need to add a task to create the Spark cluster first. Need to play around with it to know what they support right now.
Kevin Su
01/25/2024, 1:20 AManantharaman janakiraman
01/25/2024, 1:21 AManantharaman janakiraman
01/25/2024, 1:21 AMKevin Su
01/25/2024, 1:22 AM
dbx task -> python task on k8s -> dbx task
anantharaman janakiraman
01/25/2024, 1:24 AMKevin Su
01/25/2024, 1:26 AM
dbx -> dbx -> dbx
or
              dbx
            /
start-node -- dbx
            \
              dbx
anantharaman janakiraman
01/25/2024, 1:31 AMKevin Su
01/25/2024, 1:33 AManantharaman janakiraman
01/25/2024, 1:34 AManantharaman janakiraman
01/25/2024, 1:35 AManantharaman janakiraman
01/25/2024, 1:38 AManantharaman janakiraman
01/25/2024, 2:02 AMKetan (kumare3)
anantharaman janakiraman
01/25/2024, 3:17 AManantharaman janakiraman
01/25/2024, 3:19 AMKetan (kumare3)
anantharaman janakiraman
01/25/2024, 6:42 AManantharaman janakiraman
01/25/2024, 6:44 AMKetan (kumare3)
anantharaman janakiraman
01/25/2024, 7:13 AMKetan (kumare3)
anantharaman janakiraman
01/25/2024, 7:20 AMKetan (kumare3)
anantharaman janakiraman
01/25/2024, 6:07 PMKetan (kumare3)