salmon-refrigerator-32115
04/18/2024, 11:49 PM
# Current Databricks process: run the (complicated) SQL query with Spark
# and pull the result down as a pandas DataFrame.
input_data_query = f'SELECT ... FROM bolt_feature_engg_prod.gold.audience_segments_feature_set WHERE ...'
input_data = spark.sql(input_data_query)
feature_data = input_data.toPandas()
I need to convert this Databricks process to a Flyte workflow, and I would like to re-use the SQL query because it is a very complicated query.
Also, the upstream process that produces the feature data cannot be changed, and the data must stay in the same Databricks Catalog table.
How can I do that?
Thanks!
proud-answer-87162
04/19/2024, 1:18 AM
Do you want to run this as a Task? If so, you can use the Databricks SQL Connector for Python to run arbitrary queries that you have already defined:
"""
Doc's found here. <https://learn.microsoft.com/en-us/azure/databricks/dev-tools/python-sql-connector>
"""
from databricks import sql
stmt = "SELECT ..."
with sql.connect(server_hostname=server_hostname,
http_path=http_path,
access_token=access_token,
**kwargs) as connection:
with connection.cursor() as cursor:
cursor.execute(stmt)
return cursor.fetchall()
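To tie this back to the original question, here is a minimal sketch of wrapping that connector call in a Flyte task and workflow. It assumes flytekit and databricks-sql-connector are installed; the run_feature_query / feature_workflow names and the placeholder query string are illustrative (the real query in the thread is elided), and in practice the access token would come from a Flyte secret rather than a plain workflow input.

import pandas as pd
from databricks import sql
from flytekit import task, workflow

# Placeholder for the existing (elided) query from the Databricks job;
# substitute the real statement here so it is re-used verbatim.
INPUT_DATA_QUERY = "SELECT * FROM bolt_feature_engg_prod.gold.audience_segments_feature_set"


@task
def run_feature_query(server_hostname: str, http_path: str, access_token: str) -> pd.DataFrame:
    # Run the SQL against a Databricks SQL warehouse and return a pandas
    # DataFrame, mirroring the spark.sql(...).toPandas() step in the original job.
    with sql.connect(
        server_hostname=server_hostname,
        http_path=http_path,
        access_token=access_token,
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute(INPUT_DATA_QUERY)
            rows = cursor.fetchall()
            columns = [col[0] for col in cursor.description]
    return pd.DataFrame(rows, columns=columns)


@workflow
def feature_workflow(server_hostname: str, http_path: str, access_token: str) -> pd.DataFrame:
    return run_feature_query(
        server_hostname=server_hostname,
        http_path=http_path,
        access_token=access_token,
    )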
salmon-refrigerator-32115
04/19/2024, 3:34 AM
proud-answer-87162
04/19/2024, 1:01 PM
salmon-refrigerator-32115
04/21/2024, 8:52 PM
proud-answer-87162
04/22/2024, 2:23 PM
salmon-refrigerator-32115
04/22/2024, 3:45 PM
salmon-refrigerator-32115
05/04/2024, 12:07 AM
salmon-refrigerator-32115
05/04/2024, 12:08 AM
proud-answer-87162
05/06/2024, 3:29 PM
USING some_staging_table as S
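The "USING some_staging_table as S" line reads like a fragment of a Databricks MERGE INTO statement for upserting results into a catalog table. A minimal sketch of that pattern, executed through the same SQL connector, might look like the following; the target table name, join key, and staging table are assumptions, not details from the thread.

from databricks import sql

# Hypothetical upsert of computed features into a target catalog table,
# with a staging table as the MERGE source (the "USING ... AS S" part).
MERGE_STMT = """
MERGE INTO bolt_feature_engg_prod.gold.some_target_table AS T
USING some_staging_table AS S
  ON T.segment_id = S.segment_id  -- assumed join key
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
"""


def upsert_features(server_hostname: str, http_path: str, access_token: str) -> None:
    with sql.connect(server_hostname=server_hostname,
                     http_path=http_path,
                     access_token=access_token) as connection:
        with connection.cursor() as cursor:
            cursor.execute(MERGE_STMT)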
proud-answer-87162
05/06/2024, 3:30 PM
proud-answer-87162
05/06/2024, 3:30 PM
salmon-refrigerator-32115
05/06/2024, 5:07 PM
proud-answer-87162
05/06/2024, 5:08 PM
salmon-refrigerator-32115
05/06/2024, 5:10 PM
proud-answer-87162
05/06/2024, 5:20 PM
merge. If your source system guarantees write-once, then COPY INTO might work.
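For the write-once case mentioned here, a COPY INTO sketch over the same connector could look like the following; the staging path, file format, and target table are illustrative assumptions. COPY INTO skips files it has already loaded, which is what makes it a reasonable fit when the upstream only writes each file once.

from databricks import sql

# Hypothetical write-once load: COPY INTO tracks which files it has already
# ingested, so re-running this statement will not duplicate rows.
COPY_STMT = """
COPY INTO bolt_feature_engg_prod.gold.some_target_table
FROM 's3://some-bucket/feature-exports/'  -- assumed staging location
FILEFORMAT = PARQUET
"""


def load_features(server_hostname: str, http_path: str, access_token: str) -> None:
    with sql.connect(server_hostname=server_hostname,
                     http_path=http_path,
                     access_token=access_token) as connection:
        with connection.cursor() as cursor:
            cursor.execute(COPY_STMT)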