# ask-the-community
e
I’m curious if there is a way to configure StructuredDatasets to use an s3 bucket for the remote directories during local executions of tasks / workflows? Maybe there is a setting that I can override and pass in a base s3 path?
n
I don’t think this is currently supported… @Kevin Su @Eduardo Apolinario (eapolinario)? To make sure I understand, you want to use s3 directly (instead of the local fs) when you’re running flyte tasks locally, correct?
I know FlyteFile has a `remote_path` argument, but even then I'm not sure whether it only applies when the task is executed on a Flyte cluster.
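For reference, a rough sketch of the `remote_path` idea (the bucket and paths are placeholders, and again I'm not sure this is honored during local execution):
```python
from flytekit import task
from flytekit.types.file import FlyteFile


@task
def write_report() -> FlyteFile:
    # Write something to the local filesystem first.
    local_path = "/tmp/report.csv"
    with open(local_path, "w") as f:
        f.write("col_a,col_b\n1,2\n")
    # remote_path asks flytekit to upload the file to this exact uri instead of
    # the default raw-output location. Bucket/key below are placeholders.
    return FlyteFile(path=local_path, remote_path="s3://my-bucket/reports/report.csv")


if __name__ == "__main__":
    # Calling the task directly runs it locally.
    print(write_report())
```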
e
This is a super weird case. So it’s really all good if this is not supported.
Basically, running Flyte tasks in a Databricks notebook fails because they don't let Spark write to the driver's local file system. They mount everything to a file store called DBFS, so if you write to /temp/, the file is actually stored at /dbfs/temp.
It’s just an odd choice on their end to do it like that
This has nothing to do with the plugin or with remote execution, @Kevin Su.
k
I think you can, but you have to export AWS credentials.
```
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export AWS_ENDPOINT=https://s3.amazonaws.com
export AWS_DEFAULT_REGION=us-east-2
```
and use something like `@task def t1(a: StructuredDataset): ...` and then call it with `t1(a=StructuredDataset(uri="s3://bucket/key"))`
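To spell that out, here's a minimal runnable sketch, assuming the AWS variables above are exported, pandas is installed, and the uri points at a parquet object your credentials can read (the bucket/key are placeholders):
```python
import pandas as pd
from flytekit import task, workflow
from flytekit.types.structured import StructuredDataset


@task
def row_count(sd: StructuredDataset) -> int:
    # Opening the dataset as a pandas DataFrame pulls the data from the s3 uri,
    # even during local execution, because the uri is explicit.
    df = sd.open(pd.DataFrame).all()
    return len(df)


@workflow
def wf() -> int:
    # Placeholder uri; point this at a parquet file you can actually read.
    return row_count(sd=StructuredDataset(uri="s3://my-bucket/my-prefix/data.parquet"))


if __name__ == "__main__":
    # Running this file executes the workflow locally.
    print(wf())
```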
e
I’ll give that a shot. Thank you!
k
no problem
n
we should definitely document this use case… @Evan Sadler would you mind creating a docs issue for this? [flyte-docs] 👇
e
I haven’t forgotten about this! Gonna do it tomorrow.