# ask-the-community
e
I’m curious if there is a way to configure StructuredDatasets to use an s3 bucket for the remote directories during local executions of tasks / workflows? Maybe there is a setting that I can override and pass in a base s3 path?
n
I don’t think this is currently supported… @Kevin Su @Eduardo Apolinario (eapolinario)? To make sure I understand, you want to use s3 directly (instead of the local fs) when you’re running flyte tasks locally, correct?
I know FlyteFile has a `remote_path` argument, but even then I'm not sure whether it only applies when the task is executed on a Flyte cluster.
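For reference, a rough sketch of the `remote_path` idea (the bucket and paths are placeholders, and again I'm not sure this is honored during local execution):
```python
from flytekit import task
from flytekit.types.file import FlyteFile


@task
def write_report() -> FlyteFile:
    # Write something to the local filesystem first.
    local_path = "/tmp/report.csv"
    with open(local_path, "w") as f:
        f.write("col_a,col_b\n1,2\n")
    # remote_path asks flytekit to upload the file to this exact uri instead of
    # the default raw-output location. Bucket/key below are placeholders.
    return FlyteFile(path=local_path, remote_path="s3://my-bucket/reports/report.csv")


if __name__ == "__main__":
    # Calling the task directly runs it locally.
    print(write_report())
```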
e
This is a super weird case. So it’s really all good if this is not supported.
Basically, running Flyte tasks in a Databricks notebook fails because they don't let Spark write to the driver's local file system. They mount everything to a file store called DBFS, so if you write to /temp/, the file is actually stored at /dbfs/temp.
It’s just an odd choice on their end to do it like that
This has nothing to do with the plugin or with remote execution, @Kevin Su.
k
I think you can, but you have to export AWS credentials.
```
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export AWS_ENDPOINT=https://s3.amazonaws.com
export AWS_DEFAULT_REGION=us-east-2
```
and use something like `@task def t1(a: StructuredDataset): ...` and then call it with `t1(a=StructuredDataset(uri="s3://bucket/key"))`
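To spell that out, here's a minimal runnable sketch, assuming the AWS variables above are exported, pandas is installed, and the uri points at a parquet object your credentials can read (the bucket/key are placeholders):
```python
import pandas as pd
from flytekit import task, workflow
from flytekit.types.structured import StructuredDataset


@task
def row_count(sd: StructuredDataset) -> int:
    # Opening the dataset as a pandas DataFrame pulls the data from the s3 uri,
    # even during local execution, because the uri is explicit.
    df = sd.open(pd.DataFrame).all()
    return len(df)


@workflow
def wf() -> int:
    # Placeholder uri; point this at a parquet file you can actually read.
    return row_count(sd=StructuredDataset(uri="s3://my-bucket/my-prefix/data.parquet"))


if __name__ == "__main__":
    # Running this file executes the workflow locally.
    print(wf())
```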
e
I’ll give that a shot. Thank you!
k
no problem
n
we should definitely document this use case… @Evan Sadler would you mind creating a docs issue for this? [flyte-docs] 👇
e
I haven’t forgotten about this! Gonna do it tomorrow.