# flytekit
p
Hello. I'm new to Flyte and am trying to test a setup on the local sandbox before attempting an AWS deployment. I'm running WSL2 Ubuntu on Windows with Docker Desktop to provide a localhost "remote" cluster. What I don't quite understand is how files are stored. For example, running this inside a task in a workflow:
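```python
import os
import flytekit
from flytekit import task

@task
def put_file() -> str:
    # write to local path (the task's scratch working directory)
    out_path = flytekit.current_context().working_directory
    with open(os.path.join(out_path, 'test2.txt'), 'w') as f:
        f.write('blah')
        pth = f.name
    return pth
```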
If I run this locally (in a Windows terminal), I get something like:
C:\Users\<user>\AppData\Local\Temp\flytel123\user_space123\test2.txt
If I run it with --remote, I get, e.g.:
/tmp/flyte-jp9gzohn/sandbox/local_flytekit/4bae375f282029a45f14f397429df2ff/test2.txt
I'd like to set things up to use specific folders (e.g. C:\Users\<user>\projectA\data and C:\Users\<user>\projectA\output), run locally, and then run with --remote against the WSL2-hosted flytectl sandbox. Eventually I will move to S3, but it's useful to have the local option for testing. C: is available in WSL under /mnt/c. Ideas? Help?
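Because C: is mounted under /mnt/c inside WSL, the same Windows folder has two spellings depending on which side the code runs on. A minimal standard-library sketch of that mapping follows; `to_wsl_path` is a hypothetical helper (not part of Flyte), and it assumes WSL's default `/mnt` automount root:

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_wsl_path(win_path: str) -> str:
    # Hypothetical helper: map an absolute drive-letter Windows path
    # to its WSL mount point, assuming the default /mnt automount root.
    p = PureWindowsPath(win_path)
    if len(p.drive) != 2 or p.drive[1] != ":" or not p.root:
        raise ValueError(f"not an absolute drive-letter path: {win_path!r}")
    drive_letter = p.drive[0].lower()
    # p.parts[0] is the anchor (drive plus root); the rest are components
    return str(PurePosixPath("/mnt", drive_letter, *p.parts[1:]))
```

For example, `to_wsl_path(r"C:\Users\<user>\projectA\output")` gives `/mnt/c/Users/<user>/projectA/output`. Note that this only helps code running directly in WSL; containers in the sandbox cluster do not necessarily see that mount.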
k
Hi @Peter Davidson, first of all, a warm welcome to the community. Thank you for joining!
When running locally, files are stored in a temp directory.
But I don't quite understand the question; maybe a more complete working example would help.
p
Running the code locally from Windows returns:
C:\Users\<user>\test.txt
Running it with --remote in the sandbox returns:
"\\wsl.localhost\docker-desktop-data\data\docker\volumes\aeb5ab0e1bf3f4e829dc928ef3e407044f1b11ab1978dbcda4689a2bcacae606_data\overlay2\6b40fb222b5a39e3c0e167965f289f398522078b4374c427d5b271b2493426f0\diff\root\test.txt"
The remote path is not very useful. How do you direct Flyte's output to a specific folder? Is this a flyteconfig setting, or should I use FlyteFile, or something else? Similarly, I would like to be able to point it at input files. Code:
```python
import os
import flytekit
from flytekit import task, workflow


@task
def put_file() -> str:
    # write to the user's home directory inside whatever environment runs the task
    out_path = os.path.expanduser('~')
    # out_path = flytekit.current_context().working_directory
    # out_path = '//root'
    with open(os.path.join(out_path, 'test.txt'), 'w') as f:
        f.write('blah')
        pth = f.name
    # with open(os.path.join(out_path, 'test.txt'), 'r') as f:
    #     pth = f.read()
    return pth


@workflow
def wf() -> str:
    return put_file()
```
s
You can pass `--raw-data-prefix` to the `pyflyte register` command to set where the offloaded data is stored.
p
Thanks. I need a bit more hand-holding. Say I run this:
pyflyte register flytest.py --raw-data-prefix C:\Users\<user>\flytest
What does this enable?
s
It’ll store the Flyte offloaded data, i.e., the data returned from the Flyte tasks, in the given path.
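Conceptually, offloading means each task output gets its own unique location under the prefix, which is consistent with the long random directory names in the paths above. The sketch below illustrates only that idea with the standard library; `offload` is a hypothetical function and is not how flytekit actually implements data persistence:

```python
import os
import shutil
import uuid


def offload(local_file: str, raw_data_prefix: str) -> str:
    # Hypothetical illustration: copy a task output to
    # <raw_data_prefix>/<random hex>/<filename> and return the new path.
    # A fresh random directory per output means two tasks that both
    # produce "test.txt" never overwrite each other.
    dest_dir = os.path.join(raw_data_prefix, uuid.uuid4().hex)
    os.makedirs(dest_dir)
    dest = os.path.join(dest_dir, os.path.basename(local_file))
    shutil.copyfile(local_file, dest)
    return dest
```

The caller only keeps the returned location; it never has to choose or create the directory itself.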
p
Thanks. So how do I access this folder from within a workflow?
k
So when you pass a FlyteFile, a dataframe, etc., it is automatically stored in and retrieved from that location in the background.
This is done transparently, so data can be stored efficiently and you are relieved of the burden of creating new locations yourself. It also avoids corruption and makes the entire system portable.