Howdy I am working to bring Azure AD authentication in stow Flyte #flyte-support

Howdy I am working to bring Azure AD authenticatio...

proud-answer-87162

10/25/2023, 4:08 PM

Howdy I am working to bring Azure AD authentication in stow (here) into flyte. As part of that, I'm adding a

RemoteURLInterface

for Azure (example). It seems the interface is only called when

remoteDataConfig.SignedURL.Enabled == true

, is that right? Also, I noticed that the aws implementation uses the s3 client directly to fetch a signed URL from a created Request object which does not appear to be used (beyond fetching the URL). I found that surprising; does anyone know why

RawStore.CreateSignedURL

isn't used for that purpose? It feels like offloading that responsibility to the store implementation (likely Stow) would work, while avoiding having to re-implement the logic in each concrete RemoteURL type.

thankful-minister-83577

10/28/2023, 1:49 AM

hey are you still working on this? I can reach out internally to find out more of the history of the code there.

thankful-minister-83577

10/28/2023, 1:50 AM

you are correct that this appears to use the s3 client directly and not go through stow. can’t speak to why that is.

thankful-minister-83577

10/28/2023, 1:51 AM

as for the url that’s returned however, that is used. this is part of the fast-registration/upload of offloaded data types flow used in flytekit.

thankful-minister-83577

10/28/2023, 1:52 AM

this is how your updated code gets sent to blob store.

👍 1

proud-answer-87162

10/30/2023, 2:40 PM

as for the url that’s returned however, that is used. this is part of the fast-registration/upload of offloaded data types flow used in flytekit.

ok, that makes sense. but the URL returned from s3 client and the URL returned from stow should be the same, right? i think stow has an s3 implementation, which would likely use the s3 client in the same manner as the flyte code

proud-answer-87162

10/30/2023, 6:04 PM

as for the url that’s returned however, that is used. this is part of the fast-registration/upload of offloaded data types flow used in flytekit.

right, the URL is used to fetch data from the store. but only when

remoteDataConfig.SignedURL.Enabled == true

, correct?

proud-answer-87162

11/29/2023, 5:36 PM

@thankful-minister-83577 - what are your thoughts on me adding presigned url support for the azure use case? is that a feature that the community wants to continue to support, or is relying on a workload identity reasonable for the azure integration?

thankful-minister-83577

11/30/2023, 2:02 AM

that would be awesome, but i don’t think we have the resources to do/test that.

thankful-minister-83577

11/30/2023, 2:02 AM

is that something you could help with?

proud-answer-87162

11/30/2023, 2:55 PM

yeah, i can work on that and try to put together some tests

🙇 1

proud-answer-87162

12/14/2023, 10:11 PM

> as for the url that’s returned however, that is used. this is part of the fast-registration/upload of offloaded data types flow used in flytekit. i did a little digging and I think the SAS url used by flytekit is actually generated from the

CreateUploadLocation

flow. flytekit has a method

get_upload_signed_url

which seems to be responsible for fetching the SAS and using it to upload the blob. i do see that the

WorkflowExecutionGetDataResponse

gets returned ~~but I don't think the urlblob (sas url) is actively used. i see it marked as deprecated and returning an empty string does not seem to affect the normal registration/run behavior.~~ if above is true, it seems that the
UrlBlob.bytes
is actively used, and presumably that could be handled differently. i think calling
remoteDataStoreClient.CreateSignedURL
is unnecessary. ~~thoughts on the above? i see you're on vacation, so no rush on a response~~

proud-answer-87162

12/14/2023, 10:19 PM

(disclaimer with everything above - i'm an amateur dev when it comes to python and go, so i easily could have misread or missed something)

proud-answer-87162

12/21/2023, 12:23 AM

@thankful-minister-83577 after a little more research I see some scenarios where the code attempts to use the SAS (from UrlBlob) but it doesn't appear to work. the URL is included in various responses to flytekit (and possibly other clients). If the blob size is larger than the configured

maxSizeInBytes

, and flyteremote tries to access an flyte resource (e.g.,

remote.fetch_execution()

_get_input_literal_map

attempts to use the SAS token. However, in both the existing AWS case and in the new Azure implementation, flytekit throws an error when trying to use the SAS token. In the existing AWS case flytekit reports the "file name is too long"; the service seems to use the SAS token and other query params when copying data to the local dir. For the new Azure implementation, the error is "IsADirectoryError: [Errno 21] Is a directory: '/var/folders/2m/lj3gvn411sbff510w8lgnv480000gv/T/flytemcf_achx/control_plane_me tadata/local_flytekit/outputs.pb'"

proud-answer-87162

12/21/2023, 12:28 AM

it's entirely possible i am misunderstanding the intended use case for

RemoteURLInterface

, and i just stumbled upon a bug or an unexpected configuration. but it doesn't seem like the SAS produced by the remote-url feature is successfully used by either flyte or the clients i have tested with. do you have any insight into my findings above? if this is a deprecated feature as some comments suggest, does it need to be supported in the az implementation?

thankful-minister-83577

12/21/2023, 1:16 AM

sorry what’s SAS?

thankful-minister-83577

12/21/2023, 1:21 AM

each signed url is used to upload one file, not a directory. if you want to upload a whole directory, you’d have to call it once per file unf.

thankful-minister-83577

12/21/2023, 1:35 AM

if i were to guess, the isadirectoryerror is just azure saying that the request should only have

outputs.pb

not the whole path but that’s just a guess. do you have a draft pr that we could take a look at? copy here the request payload that is being sent?

proud-answer-87162

12/21/2023, 3:10 AM

sorry, SAS = Shared Access Signature (pre-signed URL)

proud-answer-87162

12/21/2023, 3:11 AM

here is the draft PR with azure impl

proud-answer-87162

12/21/2023, 3:15 AM

i didn't think

IsADirectoryError

comes from the python code rather than azure, but i'll track that down tomorrow morning.

proud-answer-87162

12/21/2023, 3:16 AM

but it seemed strange to me that the AWS flow had a different error in the same

_assign_inputs_and_outputs

flow. is that expected to work with presigned urls and the flyteremote api?

thankful-minister-83577

12/21/2023, 4:56 AM

can you take a look at https://github.com/flyteorg/flyte/blob/master/flyteadmin/pkg/data/implementations/aws_remote_url.go

thankful-minister-83577

12/21/2023, 4:56 AM

that’s the one for aws

thankful-minister-83577

12/21/2023, 4:57 AM

i don’t see the azure client anywhere in the pr

proud-answer-87162

12/21/2023, 2:43 PM

right, i'm relying on the stow client to construct the signed url

proud-answer-87162

12/21/2023, 2:43 PM

i can confirm that the URL is generated and can be used to access the file

proud-answer-87162

12/21/2023, 3:47 PM

and maybe before diving into the azure case - is AWS supposed to work? because using

remoteData

inline config with flyte-binary does not work for me locally with my configuration

thankful-minister-83577

12/22/2023, 2:04 AM

aws definitely works

thankful-minister-83577

12/22/2023, 2:04 AM

that’s the default one.

thankful-minister-83577

12/22/2023, 7:26 AM

i don’t understand the setup exactly though in your case.

thankful-minister-83577

12/22/2023, 7:27 AM

the default production deployment with single binary definitely works.

thankful-minister-83577

12/22/2023, 7:28 AM

it’s been a while since i’ve tried running the single binary locally, using real AWS instead of minio, but that also works (you just need to add in creds). and certainly using minio as s3 standin definitely works.

proud-answer-87162

12/22/2023, 3:07 PM

to be clear, AWS works for me in 90% of use cases. it works for running launchplans and passing data between tasks, it works when triggered from

pyflyte

, and it works when loading data in the UI. i only see a failure when setting

signedUrls: true

and

maxByteSize

to a low limit, then fetching a resource using flyteremote. this is the config i added to flyte-binary values.yaml:

Copy code

inline:
    remoteData:
      region: us-west-2
      scheme: aws
      maxSizeInBytes: 1
      signedUrls:
        enabled: true
        durationMinutes: 5

proud-answer-87162

12/22/2023, 11:21 PM

i figured out the s3 failure. s3 presigned urls can be quite long, and the call in

data_persistence

dst = file_system.get(from_path, to_path, recursive=recursive, **kwargs)

will fail if

from_path

is over some char limit (at least on macOS). fsspec does not appear to have a configuration to use the full

from_path

to fetch the data but only a portion of it for the

dst

result. it's simple to repro this with a local test, assuming the url is long enough to trigger the failure.

proud-answer-87162

12/22/2023, 11:24 PM

i'm still tracking down the underlying az issue

proud-answer-87162

01/05/2024, 5:31 PM

@thankful-minister-83577 i opened a question on the fsspec board about the presigned URL not working. they confirmed that it's an issue but they aren't sure how they would resolve. i also found the issue with the az implementation. it gets past the long URL issue because az creates a smaller URL but then fails when calling

_get_output_literal_map

. this is due to another fsspec quirk: calling

get

with a URL for

rpath

and a filename for

lpath

results in

lpath

being used as a path dir. e.g., if

rpath

http://mydomain/outpub.pb

and

lpath

/var/output.pb

a file gets written at

/var/outpub.pb/output.pb

. this causes an issue in the flytekit code because

remote._get_output_literal_map()

creates

tmp_name

with the filename ("output.pb") and then uses

tmp_name

as the location for the resulting file. in reality, that path is a dir that houses the actual file. to demonstrate what works with fsspec i put together a hacky update to

remote._get_output_literal_map()

for my particular use case:

Copy code

elif execution_data.outputs.bytes > 0:
            with self.remote_context() as ctx:
                tmp_name = os.path.join(ctx.file_access.local_sandbox_dir)
                file_name = execution_data.outputs.url.rsplit("/", 1)[1]
                ctx.file_access.get_data(execution_data.outputs.url, tmp_name)
                return literal_models.LiteralMap.from_flyte_idl(
                    utils.load_proto_from_file(literals_pb2.LiteralMap, tmp_name + "/" + file_name)
                )

👀 1

proud-answer-87162

01/05/2024, 5:32 PM

two changes were necessary: 1.

tmp_name

represents the dir the local file is written to (as opposed to a filename) 2. I construct the resulting file location from

tmp_name

and the file from remote url

proud-answer-87162

01/05/2024, 5:33 PM

this is obviously a bad prod solution, but i think it demonstrates the issue and gives some hints at a real fix

proud-answer-87162

01/08/2024, 9:46 PM

@thankful-minister-83577 here is the PR with an explanation of my findings and links for additional context.

excellent 1

thankful-minister-83577

01/09/2024, 6:37 AM

clarification, when you say “`rpath` is

<http://mydomain/outpub.pb>

and

lpath

/var/output.pb

” do you mean that the other way around? that is, in this call here in the function of interest,

execution_data.outputs.url

<http://mydomain/outpub.pb>

and

tmp_name

/var/outpubs.pb

thankful-minister-83577

01/09/2024, 6:38 AM

if that’s the case that really shouldn’t be happening. treating the right argument (the destination argument) as a directory should only happen if recursive is on (the

is_multipart

arg).

proud-answer-87162

01/09/2024, 2:51 PM

execution_data.outputs.url

<http://mydomain/output.pb>

and

tmp_name

/var/output.pb/output.pb

, which results in

rpath

(remote) ==

<http://mydomain/output.pb>

and

lpath

(local) ==

/var/output.pb

proud-answer-87162

01/09/2024, 2:54 PM

> that really shouldn’t be happening. treating the right argument (the destination argument) as a directory should only happen if recursive is on can you point me to the flytekit code that enforces that behavior? I think it's the fsspec code (

utils.py:other_paths()

) that does the concatenation

proud-answer-87162

01/09/2024, 3:03 PM

data_persistence

get_data

and

get

are just passthroughs if

recursive == false

thankful-minister-83577

01/09/2024, 4:45 PM

why is

tmp_name

getting set to

/var/output.pb/output.pb

thankful-minister-83577

01/09/2024, 4:46 PM

it should be getting set in this line https://github.com/flyteorg/flytekit/blob/b5dbd857a5b81757d13d03116574ba705ea8798c/flytekit/remote/remote.py#L1893

thankful-minister-83577

01/09/2024, 4:46 PM

so it should be the local sandbox dir.

proud-answer-87162

01/09/2024, 4:46 PM

tmp_name

is set to

/var/output.pb

thankful-minister-83577

01/09/2024, 4:46 PM

this one rather: https://github.com/flyteorg/flytekit/blob/b5dbd857a5b81757d13d03116574ba705ea8798c/flytekit/remote/remote.py#L1906

proud-answer-87162

01/09/2024, 4:47 PM

that gets used as input into

other_paths

thankful-minister-83577

01/09/2024, 4:47 PM

it sounds like the setting for the local sandbox dir is getting messed up?

proud-answer-87162

01/09/2024, 4:51 PM

i don't think so - the flytekit code works correctly if

other_paths

behaves as you expect. but if

exists == true

, instead of using

lpath

with a filename as the destination, it uses it to form the final path. you can see this by looking at a test in fsspec

test_utils.py

(["/path1"], "/path2", True, ["/path2/path1"])

proud-answer-87162

01/09/2024, 4:52 PM

and

exists == true

in this use case

thankful-minister-83577

01/09/2024, 4:59 PM

thankful-minister-83577

01/09/2024, 5:00 PM

so if you remove

/var/outputs.pb

then the problem doesn’t happen?

proud-answer-87162

01/09/2024, 5:02 PM

what do you mean by remove?

thankful-minister-83577

01/09/2024, 5:02 PM

rm /var/outputs.pb

proud-answer-87162

01/09/2024, 5:02 PM

it gets recreated on the next run

proud-answer-87162

01/09/2024, 5:03 PM

and

/var/outputs.pb

is a dir. the file is at

/var/outputs.pb/outputs.pb

thankful-minister-83577

01/09/2024, 5:07 PM

but if you

rm

then exists will be false the first time right?

proud-answer-87162

01/09/2024, 5:15 PM

exists is based on `rpath`:

Copy code

exists = source_is_str and (
                (has_magic(rpath) and source_is_file)
                or (not has_magic(rpath) and dest_is_dir and source_not_trailing_sep)

proud-answer-87162

01/09/2024, 5:15 PM

for the

get

thankful-minister-83577

01/09/2024, 5:17 PM

but rpath is the destination isn’t it? rpath is

/var/outputs.pb

thankful-minister-83577

01/09/2024, 5:17 PM

if this is a get… the lpath, which is the origin, should always exist

proud-answer-87162

01/09/2024, 5:18 PM

rpath

is the remote file/source in the

_get

operation

proud-answer-87162

01/09/2024, 5:18 PM

lpath is the dest

thankful-minister-83577

01/09/2024, 5:52 PM

oh got it, sorry, see their naming convention now

thankful-minister-83577

01/09/2024, 5:52 PM

but still curious, if you

rm

the local file, make sure it doesn’t exist either as a directory or a file, does the problem persist?

proud-answer-87162

01/09/2024, 5:56 PM

it does because

exists

is a test for the

rpath

(source)

proud-answer-87162

01/09/2024, 5:57 PM

the file gets written to

/var/outputs.pb/output.pb

but flytekit expects the file to be at

/var/outputs.pb

proud-answer-87162

01/09/2024, 5:58 PM

so when it tries to open

/var/outputs.pb

IsADirectoryError

gets thrown

proud-answer-87162

01/09/2024, 5:59 PM

Copy code

def load_proto_from_file(pb2_type, path):
    with open(path, "rb") as reader:
        out = pb2_type()
        out.ParseFromString(reader.read())
        return out

thankful-minister-83577

01/09/2024, 6:04 PM

not sure if it matters but what fsspec filesystem are you using? http?

proud-answer-87162

01/09/2024, 6:04 PM

yes, http

thankful-minister-83577

01/09/2024, 6:04 PM

the example rpath you gave had http

thankful-minister-83577

01/09/2024, 6:04 PM

thankful-minister-83577

01/09/2024, 6:05 PM

let me try to repro this without any flytekit code

👍 1

proud-answer-87162

01/09/2024, 6:05 PM

fsspec has fairly good unit tests to demonstrate the behavior

thankful-minister-83577

01/09/2024, 6:05 PM

i guess…

thankful-minister-83577

01/09/2024, 6:06 PM

but like, what you’re describing is a pretty big problem…

thankful-minister-83577

01/09/2024, 6:06 PM

basically it renders flyte remote unable to download inputs/outputs

thankful-minister-83577

01/09/2024, 6:06 PM

i feel like we would’ve seen it come up before

proud-answer-87162

01/09/2024, 6:08 PM

only under certain conditions: a)

remoteData

is configured to set

signedUrls:true

b) ~~the presigned URL is larger than n (file system limit) characters (common for S3)~~ this is only needed for the lpath size issue. not necessary for the incorrect path bug c) the payload is larger than

maxSizeInBytes

FlyteRemote

is used to interact with the execution - I am not 100% confident in this, but it's the only use case I could find

proud-answer-87162

01/09/2024, 6:09 PM

but i could also have something configured incorrectly that is exposing improper behavior. i'd love that to be true!

proud-answer-87162

01/09/2024, 6:14 PM

and this is a simple test to demonstrate the fsspec behavior:

Copy code

def test_get_output():
    kwargs = {}
    spec = fsspec.implementations.http.HTTPFileSystem(fsspec.filesystem("https", **kwargs))
    frompath = linkToUrlFileile
    topath = "/var/inputs10.txt"
    response = spec.get(frompath, topath, **kwargs)
    print(response)

proud-answer-87162

01/09/2024, 6:15 PM

note that this behavior also depends on from/to paths being strings instead of lists. if they are both lists the lists are not mutated so the dest location remains valid

thankful-minister-83577

01/09/2024, 6:15 PM

trying…

thankful-minister-83577

01/09/2024, 6:15 PM

but i still don’t get it.

thankful-minister-83577

01/09/2024, 6:16 PM

condition c above: payload larger than max size, how does that affect the fsspec function call inputs?

proud-answer-87162

01/09/2024, 6:16 PM

it doesn't, but it affects how flytekit interacts with fsspec. if the payload is inline then flytekit doesn't try to fetch the remote file

thankful-minister-83577

01/09/2024, 6:17 PM

oh got it

proud-answer-87162

01/09/2024, 6:20 PM

and i haven't tested this theory, but it might also require tasks to pass raw data rather than a FlyteFile

proud-answer-87162

01/09/2024, 6:23 PM

and as i reminder, this issue only pops if you get past the URL issue discussed here. i suspect no one is using the above configuration with S3 or that bug would have been reported already.

thankful-minister-83577

01/09/2024, 6:25 PM

well this is the test i’m using.

thankful-minister-83577

01/09/2024, 6:25 PM

Copy code

import fsspec
import shutil
import os


def test_file_handling():
    s3kwargs = {'cache_regions': False, 'key': 'minio', 'secret': 'miniostorage',
                'client_kwargs': {'endpoint_url': '<http://localhost:30002>'}}

    s3_file = "<s3://my-s3-bucket/data/5w/acwl6plp7ps4fwpsgrzb-n1-0/264513408320143fe0a02bf95c023505/00000>"
    sss = fsspec.filesystem("s3", **s3kwargs)

    local_dir = "/Users/ytong/temp/fss_output"
    try:
        shutil.rmtree(local_dir)
    except FileNotFoundError:
        ...

    sss.get(s3_file, os.path.join(local_dir, "from_s3"))


def test_file_handling_http():
    local_dir = "/Users/ytong/temp/fss_output"
    try:
        shutil.rmtree(local_dir)
    except FileNotFoundError:
        ...

    http_fs = fsspec.filesystem("http")

    http_signed_url = "<http://localhost:9000/my-s3-bucket/data/5w/acwl6plp7ps4fwpsgrzb-n1-0/264513408320143fe0a02bf95c023505/00000?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AIY0FO8J1300DN5XZSXF%2F20240109%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240109T181127Z&X-Amz-Expires=604800&X-Amz-Security-Token=eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJhY2Nlc3NLZXkiOiJBSVkwRk84SjEzMDBETjVYWlNYRiIsImV4cCI6MTcwNDg2NzAxNSwicGFyZW50IjoibWluaW8ifQ.MmX5eqMsw2e77Pd90EiYaQtyFZRDg0jZJRWo4QEefxV8PeLvszALvCiXW8HRenpamY-Y-roSwGhoYDKRKGcCHQ&X-Amz-SignedHeaders=host&versionId=null&X-Amz-Signature=72d2bae5a221adb9d7044a38210e365dbf969939b7ba9b8fa9d44117cc041f8e>"
    http_fs.get(http_signed_url, os.path.join(local_dir, "from_http"))

thankful-minister-83577

01/09/2024, 6:25 PM

this is using the local setup.

thankful-minister-83577

01/09/2024, 6:26 PM

and i’m running

kf port-forward svc/flyte-sandbox-minio 9000:9000

to get the port 9000 to show up

thankful-minister-83577

01/09/2024, 6:26 PM

and yeah, can confirm i’m running into the lpath too long issue, because it’s incorrectly computing lpath. this doesn’t happen in the s3 case.

👍 1

thankful-minister-83577

01/09/2024, 6:26 PM

we do use signed urls though.

proud-answer-87162

01/09/2024, 6:27 PM

yeah, for the

put

case. i haven't debugged that because i know it works, but i suspect it's related to how fsspec uses

lpath

to construct

rpath

in a different manner

proud-answer-87162

01/09/2024, 6:28 PM

fsspec can fetch the file, it just can't write it locally (

_get

use case)

thankful-minister-83577

01/09/2024, 6:52 PM

i’ve had issues understanding other_paths before

thankful-minister-83577

01/09/2024, 6:53 PM

thankful-minister-83577

01/09/2024, 6:53 PM

did you understand this?

thankful-minister-83577

01/09/2024, 6:54 PM

cuz in my case, the http doesn’t end in a

thankful-minister-83577

01/09/2024, 6:54 PM

Copy code

"<http://localhost:9000/my-s3-bucket/data/5w/acwl6plp7ps4fwpsgrzb-n1-0/264513408320143fe0a02bf95c023505/00000?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AIY0FO8J1300DN5XZSXF%2F20240109%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240109T181127Z&X-Amz-Expires=604800&X-Amz-Security-Token=eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJhY2Nlc3NLZXkiOiJBSVkwRk84SjEzMDBETjVYWlNYRiIsImV4cCI6MTcwNDg2NzAxNSwicGFyZW50IjoibWluaW8ifQ.MmX5eqMsw2e77Pd90EiYaQtyFZRDg0jZJRWo4QEefxV8PeLvszALvCiXW8HRenpamY-Y-roSwGhoYDKRKGcCHQ&X-Amz-SignedHeaders=host&versionId=null&X-Amz-Signature=72d2bae5a221adb9d7044a38210e365dbf969939b7ba9b8fa9d44117cc041f8e>"

proud-answer-87162

01/09/2024, 7:00 PM

yeah, some of their documentation is wrong, or at least misleading

proud-answer-87162

01/09/2024, 7:01 PM

but he was also incorrect in describing my problem initially. even without a trailing

the behavior is the same if

exists == true

proud-answer-87162

01/09/2024, 7:01 PM

i think he was originally talking about the

put

use case, where

exists == false

proud-answer-87162

01/09/2024, 7:02 PM

the first thing

other_paths

does is strip trailing

, heh

thankful-minister-83577

01/09/2024, 7:08 PM

i don’t think exists is a good variable name…

proud-answer-87162

01/09/2024, 7:08 PM

it is not

proud-answer-87162

01/09/2024, 7:09 PM

i think they need to break apart that method - it's doing too much and handles too many use cases

thankful-minister-83577

01/09/2024, 7:09 PM

though i guess i’m not sure what has magic is doing

thankful-minister-83577

01/09/2024, 7:09 PM

i guess in the meantime, we can them these single lists to work around the issue.

thankful-minister-83577

01/09/2024, 7:09 PM

but i really don’t like that

proud-answer-87162

01/09/2024, 7:09 PM

> though i guess i’m not sure what has magic is doing that confused me as well - seems to be looking for specific characters and comes back

True

for URLs

thankful-minister-83577

01/09/2024, 7:11 PM

http_fs.get([http_signed_url], [os.path.join(local_dir, "from_http")])

works, not sure if you mentioned that already above

proud-answer-87162

01/09/2024, 7:11 PM

yeah, i did test that. but agree that it's a kinda ugly sln

proud-answer-87162

01/09/2024, 7:11 PM

but might be the only path until fsspec fixes their behavior

thankful-minister-83577

01/09/2024, 7:12 PM

yeah

proud-answer-87162

01/09/2024, 7:13 PM

alternatively... does flyte{kit} need to support signed URLs for these use cases? no one appears to be using it and the api response model suggests the property (url) is deprecated

thankful-minister-83577

01/09/2024, 7:17 PM

it should still work for sure. we use signed urls a fair bit and it’s the correct thing to use if the literal map is too big

👍 1

thankful-minister-83577

01/09/2024, 7:18 PM

would you mind/did you already put in a flyte org issue?

proud-answer-87162

01/09/2024, 7:19 PM

i have not yet but will file something today

thankful-minister-83577

01/09/2024, 7:20 PM

tyty

thankful-minister-83577

01/09/2024, 7:20 PM

sorry for my confusion earlier

thankful-minister-83577

01/09/2024, 7:20 PM

i thought rpath was right path, not remote path, and lpath left, rather than local

proud-answer-87162

01/09/2024, 7:22 PM

yeah, that naming convention took me a while to wrap my head around, hah

proud-answer-87162

01/09/2024, 7:22 PM

so i totally understand the confusion

🙇 1

proud-answer-87162

01/09/2024, 7:49 PM

i created a bug for each issue: file too long and misuse of lpath

thankful-minister-83577

01/09/2024, 7:50 PM

if fsspec fixes the behavior, that will resolve both of these right?

thankful-minister-83577

01/09/2024, 7:51 PM

thank you

proud-answer-87162

01/09/2024, 7:54 PM

if fsspec fixes the behavior, that will resolve both of these right?

4701 will be fixed if fsspec fixes the bug. but 4700 will still exist because flytekit expects lpath to be

/var/output.pb

but instead it has been updated to

/var/output.pb/output.pb

proud-answer-87162

01/09/2024, 7:58 PM

i think there are a few reasonable solutions for 4701. a small change to how

_get_output_literal_map

constructs/uses

tmp_name

is probably the simplest

thankful-minister-83577

01/09/2024, 8:07 PM

but if i

cp /file/a /other/b

linux doesn’t put it under /other/b/a

thankful-minister-83577

01/09/2024, 8:07 PM

let me think about it, i think the [] hack also works

👍 1

proud-answer-87162

01/09/2024, 8:09 PM

yeah, i think fsspec can/should clean up their API. but it will be a breaking change so my guess is they'll add a different interface or throw on another bool flag, and that's if they agree with our position...

👌 1

proud-answer-87162

01/10/2024, 8:05 PM

I added another comment to the fsspec question to make it clear that we think there are two bugs (or at least unexpected behaviors) in their lib. let me know if you think it represents our position accurately

👌 1

40 Views

Open in Slack

Previous Next