# flytekit
m
Hi, everyone! I was wondering: is there a simple way to persist files in Flyte? My use case: I have a workflow that generates a dataset. I want to save that dataset somewhere so I can use it in my training workflow without regenerating it. Is it possible? When I return a FlyteFile, the path is not really obvious/human-friendly.
j
you can explicitly set the remote_path of the FlyteFile when you create it, like
return FlyteFile(path=local_path, remote_path=somewhere_remote)
it should put your data there, and you can configure paths and names to be human-readable
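As a minimal sketch of what "human-readable" could mean here, the helper below builds a predictable object URI that could then be passed as remote_path. The function name build_remote_path, the date-based folder layout, and the bucket and project names are all hypothetical and only for illustration; nothing in it is part of flytekit.

```python
from datetime import date
from posixpath import join as url_join


def build_remote_path(bucket: str, project: str, name: str, day: date) -> str:
    """Return a readable URI of the form s3://<bucket>/<project>/<YYYY-MM-DD>/<name>.

    Hypothetical helper: the layout is an assumption, not a Flyte convention.
    """
    # posixpath.join always uses forward slashes, which is what object keys need.
    key = url_join(project.strip("/"), day.isoformat(), name)
    return f"s3://{bucket}/{key}"


# Placeholder bucket, project, and file names.
remote = build_remote_path(
    "my-s3-bucket", "flyte-example", "dataset.pickle", date(2022, 3, 10)
)
print(remote)
```

The resulting string would take the place of somewhere_remote in the FlyteFile call above, assuming the task has write access to that location.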
m
Can the remote path be in the S3 bucket that Flyte uses? Can I do, for instance,
return FlyteFile(path='test.pickle', remote_path='s3://my-s3-bucket/flyte-example/say_hello.pickle')
and then call another function to retrieve the file from the bucket?
I'm asking that because I tried just that, with the function:
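import pickle
from flytekit import task
from flytekit.types.file import PythonPickledFile

@task
def retrieve_from_s3(uri: PythonPickledFile) -> str:
    uri.download()
    # Pickle data is binary, so the file must be opened in 'rb' mode.
    with open(uri.path, 'rb') as handle:
        x = pickle.load(handle)
    return x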
and it doesn't even run; in fact, I'm receiving a 500 error from the server.
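Separately from the 500 error, pickle files are binary, so they have to be written with 'wb' and read with 'rb'; opening one in text mode ('r') makes pickle.load fail because it expects bytes. A minimal local round trip, independent of Flyte and S3, might look like this (save_pickle and load_pickle are hypothetical helper names):

```python
import os
import pickle
import tempfile


def save_pickle(obj, path):
    """Serialize obj to path; pickle output is bytes, hence 'wb'."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_pickle(path):
    """Read a pickled object back; the file must be opened in 'rb' mode."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Write and read a value inside a temporary directory that is cleaned up afterwards.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "say_hello.pickle")
    save_pickle("hello", path)
    restored = load_pickle(path)

print(restored)
```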
j
does PythonPickledFile extend FlyteFile?
if so, you can just open it like a file
with open(uri, 'rb') as handle:
Flyte should be able to automatically download it for you
you should see a log saying Flyte is downloading the file
2022-03-10 08:25:02,782 [INFO] Entering timed context: Copying (gs://remote_file -> /tmp/flytequjfq95e/local_flytekit/local_file)
something like above
m
Message:

    Failed to get data from s3://my-s3-bucket/flyte-example/say_hello.pkl to /tmp/flytek5pbstuv/local_flytekit/5186be74bee2a9b806fa2c15e358371b/say_hello.pkl (recursive=False).

Original exception: Called process exited with error code: 1.  Stderr dump:

b'fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden\n'

User error.
The 500 error happened because I wasn't using fast serialization. Now I am, and the error above is what I get instead.
j
oh 403, that is a permission error right?
maybe flyte does not have write or read access to your path
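One thing that may be worth ruling out alongside permissions: the FlyteFile call earlier in the thread uses a key ending in .pickle, while the error message requests one ending in .pkl. S3 can answer a HeadObject request for a missing key with 403 instead of 404 when the caller lacks s3:ListBucket, so a mismatched key can look like a permission problem. The sketch below only compares the two URIs quoted in this thread; split_s3_uri is a hypothetical helper, and whether this is the actual cause here is an assumption.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# URI passed as remote_path earlier in the thread.
written = split_s3_uri("s3://my-s3-bucket/flyte-example/say_hello.pickle")
# URI that appears in the 403 error message.
requested = split_s3_uri("s3://my-s3-bucket/flyte-example/say_hello.pkl")

# True only if both refer to the same bucket and key.
same_object = written == requested
print(written, requested, same_object)
```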
m
I thought so... It wouldn't make sense to be able to mess around on the S3 bucket like that.
But, at the same time, it's weird, because I think I was able to write, but I can't read files.
j
are there any bucket-specific permissions on your buckets?
m
I did not change any permissions. I'm trying to execute this inside the sandbox bucket
s
Hi, @Matheus Moreno! Are you still seeing this error?
m
Hey @Samhita Alla! I actually gave up trying to use this method 😅 it was for our hackathon project, and since we are on a tight schedule, I ended up deciding to save the desired file (a generated dataset) on a public GCS bucket for now
But I'll try again soon