# flyte-support
h
Hi team, I am trying to integrate Databricks with Flyte. The data comes through and the UI works when the task and plugin code are in a single file, and it also connects when I run it on my local system. But when I build an ECR image and run pyflyte run --remote --image <ECR image name> myfly.py wf, it throws the following error:
[1/1] currentAttempt done. Last Error: UNKNOWN::Outputs not generated by task execution
Can anyone help me solve this issue?
t
@handsome-noon-32363, the error basically means that no outputs have been generated by the task. So I guess there’s a problem with your task.
h
Here are our task details. I didn't find any error in this task or workflow:
import pandas as pd
from flytekit import task, workflow

import flyte_db_plugin as fdp

# This runs at module level, i.e. at import time, outside any Flyte task.
result = fdp.DatabricksTask("", "").get_sql("select * from student")


@task
def compute_result(df: pd.DataFrame) -> pd.DataFrame:
    return df


@workflow
def wf1() -> pd.DataFrame:
    return compute_result(df=result)


if __name__ == "__main__":
    print(wf1())
t
Have you checked the pod logs? Are you able to retrieve the DataFrame successfully? It seems to me that the DataFrame generation is somehow failing.
I believe it’s working locally, right?
h
Yeah, it works locally. It also works if we create a separate task for the result, like this:
@task
def generate_normal_df():
    result = DatabricksTask("", "").get_sql(<any sql query>)
    print(result)
    l = []
    for i in result:
        l.append(i)  # append mutates the list in place; "l = l.append(i)" would set l to None
        print(l)
    return l


@task
def compute_stats(df: pd.DataFrame) -> pd.DataFrame:
    return df


@workflow
def wf():
    return generate_normal_df()


if __name__ == "__main__":
    print(wf())
But if we don't create a separate task for the result and instead run the retrieval as a standalone statement outside any task, it gives the error.
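The failing case above can be sketched in plain Python (no flytekit needed; fetch_rows is a hypothetical stand-in for the DatabricksTask("", "").get_sql(...) call). The point is that a module-level statement runs the moment the file is imported, e.g. when a remote executor loads the module to discover task definitions, not when any task actually executes:

```python
# Sketch of the import-time pitfall: module-level code runs on import.
# fetch_rows is a hypothetical stand-in for the Databricks call.

calls = []


def fetch_rows():
    calls.append("fetched")  # record that the call actually ran
    return [{"name": "alice"}, {"name": "bob"}]


# Module-level statement: executes as soon as this module is imported,
# in whatever environment does the importing (locally this is your
# machine with Databricks access; remotely it may be a container
# without it, and the result is not a Flyte-tracked task output).
result = fetch_rows()


def compute_result(df):
    return df
```

So locally the fetch succeeds at import time and the value happens to be available, while in a remote execution the same import-time call either fails or produces a value Flyte never registers as an output, matching the "Outputs not generated by task execution" error.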
t
Um, it should work when used as a separate task. Is there a way for you to look at the Databricks logs?
h
ok @tall-lock-23197
It works when used as a separate task, but it doesn't work if we don't create a task for retrieving the result.
t
Have you checked the Databricks logs?
h
@handsome-noon-32363, can you also share the code you're using for DatabricksTask? It feels a bit weird that we're calling get_sql and that itself is reaching out to Databricks. Ideally DatabricksTask should behave similarly to SQLite3Task. Note how it's used in a workflow: https://github.com/flyteorg/flytesnacks/blob/master/cookbook/integrations/flytekit_plugins/sql/sqlite3_integration.py