# flyte-support
a
I’m curious if there is a way to configure StructuredDatasets to use an s3 bucket for the remote directories during local executions of tasks / workflows? Maybe there is a setting that I can override and pass in a base s3 path?
b
I don’t think this is currently supported… @glamorous-carpet-83516 @high-accountant-32689? To make sure I understand, you want to use s3 directly (instead of the local fs) when you’re running flyte tasks locally, correct?
I know FlyteFile has a `remote_path` argument, but even for that I’m not sure whether it only applies when the task is executed on a Flyte cluster.
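For reference, here’s a minimal sketch of how that argument gets passed (the bucket/key, file name, and contents are placeholders I made up; whether the upload target is honoured during a purely local run is exactly the open question):
```python
# Minimal sketch, assuming flytekit is installed.
# "s3://my-bucket/reports/report.csv" is a placeholder, not a real location.
from flytekit import task
from flytekit.types.file import FlyteFile


@task
def write_report() -> FlyteFile:
    local_path = "/tmp/report.csv"
    with open(local_path, "w") as f:
        f.write("col_a,col_b\n1,2\n")
    # remote_path asks flytekit to upload the file to this exact URI
    # instead of an auto-generated location under the raw output prefix.
    return FlyteFile(local_path, remote_path="s3://my-bucket/reports/report.csv")
```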
a
This is a super weird case. So it’s really all good if this is not supported.
Basically, running Flyte tasks in a Databricks notebook fails because they don’t let Spark write to the driver’s local file system. They mount everything to a file store called DBFS. So if you write to /temp/, the file is actually stored at /dbfs/temp.
It’s just an odd choice on their end to do it like that
This has nothing to do with the plug-in / working with a remote execution @glamorous-carpet-83516.
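(For anyone following along, a rough sketch of the DBFS behaviour described above; it assumes a classic Databricks notebook where `spark` is predefined, and the paths are illustrative.)
```python
import os

# Rough sketch of the DBFS path mapping described above.
# Assumes a Databricks cluster where `spark` is predefined; paths are illustrative.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

# Spark resolves "/temp/out" against DBFS, i.e. the data lands in dbfs:/temp/out ...
df.write.mode("overwrite").parquet("/temp/out")

# ... which plain Python on the driver sees under the /dbfs FUSE mount,
# not under the literal local path /temp/out.
print(os.listdir("/dbfs/temp/out"))
```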
g
I think you can, but you have to export AWS credentials.
```
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export AWS_ENDPOINT=https://s3.amazonaws.com
export AWS_DEFAULT_REGION=us-east-2
```
and use something like `@task def t1(a: StructuredDataset): ...` and then call it with `t1(a=StructuredDataset(uri="s3://bucket/key"))`
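For example, a minimal runnable sketch (assumes pandas and flytekit are installed, the credentials above are exported, and the bucket/key are placeholders):
```python
import pandas as pd

from flytekit import task
from flytekit.types.structured import StructuredDataset


@task
def t1(a: StructuredDataset) -> pd.DataFrame:
    # Even in a local execution, flytekit fetches the data from s3 here
    # because the dataset's uri points at an s3 object.
    return a.open(pd.DataFrame).all()


if __name__ == "__main__":
    # Calling the task function directly runs it locally.
    df = t1(a=StructuredDataset(uri="s3://bucket/key"))
    print(df.head())
```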
🙌 1
a
I’ll give that a shot. Thank you!
g
no problem
b
we should definitely document this use case… @abundant-hamburger-66584 would you mind creating a docs issue for this? [flyte-docs] 👇
👍 2
a
I haven’t forgotten about this! Gonna do it tomorrow.
💯 1