Hi all I am investigating Flyte and are stuck on a...
# ask-the-community
e
Hi all, I am investigating Flyte and am stuck on a problem. I uploaded a file to the MinIO S3 hosted by flytectl demo. I then supply the URI s3://my-s3-bucket/dataset/file.asf as FlyteFile("s3://my-s3-bucket/dataset/file.asf") and I get: Protocol not found. Did you mean file:s3://my-s3-bucket/dataset/file.asf? I am new to S3, so maybe I am doing something wrong here. Edit: added task snippet:
@task(cache=True, cache_version="2.0")
def process_video() -> FlyteFile:
    """
    Run clip_extract on video
    """
    input_path = FlyteFile("s3://my-s3-bucket/dataset/test.asf")
    output_file = FlyteFile("output.mjpg")
    command = ['ffmpeg', '-i', str(input_path), '-c:v', 'mjpeg', '-q:v', '3', '-an', str(output_file)]
    subprocess.check_call(command)
    return output_file
k
Can you elaborate a bit more on the error message? Where do you see it?
e
In the log of the failing node in the Kubernetes dashboard.
k
You want to download input_path first, rather than passing the S3 path straight to ffmpeg.
Can you do input_path.download()?
Aah, I guess it won’t know how to download.
Why don’t you make it an input to the task?
e
Is there an example of how to work with FlyteFiles and S3?
k
Ya, FlyteFiles will automatically be backed by S3.
They are persistent.
But if you want to use it explicitly without passing it as an input (cc @Yee), we need a downloader, right?
s
I don't think we'd want to construct a `FlyteFile` from a remote path within a task. It's usually there to handle the communication between tasks, right? `download()` won't work here because it'd be a no-op in this case. So if there's a file in MinIO and you aren't passing it as an input to a Flyte task, I believe it needs to be downloaded manually using, say, `get_data`. https://github.com/flyteorg/flytekit/blob/34f80ba12eda64431be4c21c78df81b7afbe2758/flytekit/types/file/file.py#L361
Also, `output_file = FlyteFile("output.mjpg")` needs to be `output_file = "output.mjpg"` and you'll need to return `FlyteFile(output_file)`.