• Matheus Moreno

    4 months ago
    Hi, everyone! I was wondering: is there a simple way to persist files in Flyte? My use case: I have a workflow that generates a dataset. I want to save that dataset somewhere so I can load it in my training workflow without regenerating it. Is that possible? When I return a FlyteFile, the path is not really obvious/human-friendly.
  • Jay Ganbat

    4 months ago
    you can explicitly set the remote_path of the FlyteFile when you create it, like
    return FlyteFile(path=local_path, remote_path=somewhere_remote)
    it should upload your data there, and you can choose paths and names that are human-readable
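To keep remote_path predictable and human-readable, one option is to build it from a few meaningful parts (project, dataset name, version). A minimal sketch; dataset_uri and its bucket/project/name/version layout are hypothetical, not part of flytekit:

```python
def dataset_uri(bucket_uri: str, project: str, name: str, version: str, ext: str = "pkl") -> str:
    """Build a predictable object URI like s3://bucket/project/name/version/name.ext.

    Hypothetical helper: the layout is just one readable convention.
    """
    scheme, sep, _ = bucket_uri.partition("://")
    if not sep or scheme not in {"s3", "gs"}:
        raise ValueError(f"expected an s3:// or gs:// URI, got {bucket_uri!r}")
    # Strip stray slashes so the parts join with exactly one "/" between them.
    parts = [bucket_uri.rstrip("/")] + [p.strip("/") for p in (project, name, version)]
    return "/".join(parts) + f"/{name.strip('/')}.{ext}"


if __name__ == "__main__":
    print(dataset_uri("s3://my-s3-bucket", "flyte-example", "say_hello", "v1"))
    # s3://my-s3-bucket/flyte-example/say_hello/v1/say_hello.pkl
```

Inside a task this would be used like return FlyteFile(path=local_path, remote_path=dataset_uri(...)), and the training workflow should then be able to take that same URI as its FlyteFile input instead of regenerating the data.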
  • Matheus Moreno

    4 months ago
    Can the remote path be the S3 bucket Flyte uses? Can I do, for instance,
    return FlyteFile(path='test.pickle', remote_path='s3://my-s3-bucket/flyte-example/say_hello.pickle')
    and then call another function to retrieve the file from the bucket?
  • I'm asking that because I tried just that, with the function:
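    import pickle

    from flytekit import task
    from flytekit.types.file import PythonPickledFile

    @task
    def retrieve_from_s3(uri: PythonPickledFile) -> str:
        uri.download()  # copy the remote object to a local temp path
        with open(uri.path, 'rb') as handle:  # pickle data is binary, so open in 'rb'
            x = pickle.load(handle)
        return x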
    and it doesn't even run; in fact, I'm receiving a 500 error from the server.
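Separately from the server error: pickle reads and writes bytes, so the file has to be opened in binary mode on both sides ('wb' to save, 'rb' to load). A minimal local sketch of the save-then-load pair, with a temp directory standing in for the bucket; save_dataset and load_dataset are hypothetical names, not Flyte APIs:

```python
import os
import pickle
import tempfile
from typing import Any


def save_dataset(obj: Any, path: str) -> str:
    """Serialize obj to path; 'wb' because pickle produces bytes."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)
    return path


def load_dataset(path: str) -> Any:
    """Deserialize from path; 'rb' because pickle.load expects a binary file."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    # A temp directory stands in for the bucket in this local sketch.
    with tempfile.TemporaryDirectory() as tmp:
        path = save_dataset({"rows": [1, 2, 3]}, os.path.join(tmp, "say_hello.pkl"))
        print(load_dataset(path))  # {'rows': [1, 2, 3]}
```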
  • Jay Ganbat

    4 months ago
    does PythonPickledFile extend FlyteFile?
  • if so, you can just open it like a file
    with open(uri, 'rb') as handle:
    Flyte should be able to download it automatically for you
  • you should see a log saying Flyte is downloading the file
    2022-03-10 08:25:02,782 [INFO] Entering timed context: Copying (gs://remote_file -> /tmp/flytequjfq95e/local_flytekit/local_file)
    something like the above
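Why open(uri, ...) works: as far as I know, FlyteFile implements os.PathLike and defers the actual download until its __fspath__ hook is first called, which is what open() does with a path-like argument. The toy class below only illustrates that lazy-download pattern; LazyFile and its fetch callback are hypothetical, not flytekit code, and a local file copy stands in for the remote download:

```python
import os
import shutil
import tempfile
from typing import Callable


class LazyFile(os.PathLike):
    """Toy remote-file handle: the copy to local disk happens on first use."""

    def __init__(self, remote: str, local: str, fetch: Callable[[str, str], object]):
        self.remote = remote
        self.local = local
        self._fetch = fetch
        self._downloaded = False

    def __fspath__(self) -> str:
        # open(), os.fspath() and os.path functions call this hook; fetch only once.
        if not self._downloaded:
            self._fetch(self.remote, self.local)
            self._downloaded = True
        return self.local


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        remote = os.path.join(tmp, "remote_file")  # local file standing in for a bucket object
        with open(remote, "w") as fh:
            fh.write("hello")
        f = LazyFile(remote, os.path.join(tmp, "local_file"), shutil.copyfile)
        with open(f) as handle:  # first use triggers the "download"
            print(handle.read())  # hello
```

The design point is that callers never call a download method explicitly; any API that accepts a path triggers the fetch.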
  • Matheus Moreno

    4 months ago
    Message:
    
        Failed to get data from s3://my-s3-bucket/flyte-example/say_hello.pkl to /tmp/flytek5pbstuv/local_flytekit/5186be74bee2a9b806fa2c15e358371b/say_hello.pkl (recursive=False).
    
    Original exception: Called process exited with error code: 1.  Stderr dump:
    
    b'fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden\n'
    
    User error.
  • the 500 error happened because I wasn't using fast serialization. Now I am, and this is what I got instead
  • Jay Ganbat

    4 months ago
    oh, 403, that's a permission error, right?
  • maybe Flyte does not have read or write access to your path
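For reference: S3's HeadObject call (the one failing in the error above) needs s3:GetObject on the object, and if the caller lacks s3:ListBucket on the bucket, a missing key is reported as 403 rather than 404, so a 403 can also mean the object isn't there. Assuming the bucket is real AWS S3 and the task's credentials come from an IAM role you manage (this won't help if the data lives in a different store, such as the sandbox's bundled MinIO), a minimal read/write policy scoped to one prefix might look like this; s3_prefix_policy is a hypothetical helper, and the bucket and prefix are just the ones from this thread:

```python
import json


def s3_prefix_policy(bucket: str, prefix: str) -> dict:
    """IAM policy allowing read/write of objects under one prefix (hypothetical helper)."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # GetObject also covers HeadObject; PutObject covers uploads.
                "Sid": "ReadWriteObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                # ListBucket lets missing keys come back as 404 instead of 403.
                "Sid": "ListPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(s3_prefix_policy("my-s3-bucket", "flyte-example"), indent=2))
```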
  • Matheus Moreno

    4 months ago
    I thought so... It wouldn't make sense to be able to mess around on the S3 bucket like that.
  • But at the same time it's weird, because I think I was able to write files but not read them.
  • Jay Ganbat

    4 months ago
    are there bucket-specific permissions on your buckets?
  • Matheus Moreno

    4 months ago
    I did not change any permissions. I'm running this against the sandbox bucket.
  • Samhita Alla

    4 months ago
    Hi, @Matheus Moreno! Are you still seeing this error?
  • Matheus Moreno

    4 months ago
    Hey @Samhita Alla! I actually gave up on this approach 😅 It was for our hackathon project, and since we're on a tight schedule, I decided to save the generated dataset to a public GCS bucket for now.
  • But I'll try again soon