Tiansu Yu
04/07/2022, 9:39 AMYuvraj
04/07/2022, 10:11 AMTiansu Yu
04/07/2022, 12:29 PM@task
def get_file(url) -> FlyteFile:
return FlyteFile(url)
@task(task_config=Spark(...))
def spark_task(file: FlyteFile):
spark = flytekit.current_session().spark_session
df = spark.read.text(file)
@workflow
def pipeline(url="<gs://xxxx>"):
file = get_file(url)
spark_task(file)
Is this the right way to do it?Yuvraj
04/07/2022, 12:34 PMTiansu Yu
04/07/2022, 12:37 PMYuvraj
04/07/2022, 12:37 PMTiansu Yu
04/07/2022, 12:38 PM'FlyteFile' object has no attribute '_get_object_id'
Yuvraj
04/07/2022, 1:20 PMfile.download()
, @Samhita Alla Can you confirm the logic ?Samhita Alla
file.download()
is required.Tiansu Yu
04/07/2022, 1:46 PMfile.download()
should be feed into spark.read.text(..)
right?Yuvraj
04/07/2022, 1:54 PMKetan (kumare3)
Tiansu Yu
04/07/2022, 2:01 PMKetan (kumare3)
Tiansu Yu
04/07/2022, 2:08 PMKetan (kumare3)