# events
h
@channel It’s time for our weekly Office Hours! I’ll be on the Zoom link for the next 30 mins. (Zoom link | Add it to your calendar)
m
I’ll take you up on it. I’m a bit of a Flyte noob and I’m getting stuck on some core concepts.
f
@Mike Seid did you end up joining? Curious to know what concepts you're struggling with because I've been struggling too! Lol
h
Have you had an issue joining, @Mike Seid? Happy to chat separately if you are still around.
m
I joined at 10am but nobody was there, so I left. I can share the issue I need help with. I’ve been struggling with FlyteFiles. I’m using the flytekit papermill plugin (flytekitplugins-papermill) and I’m trying to move the output files to my cloud storage location (GCS). The issue is that when I use path, the run says the file isn’t there, and when I use remote_path, it fails saying remote_path is None. Here is my code:
import os
import pathlib

import gcsfs
from flytekit import kwtypes, task
from flytekit.core.context_manager import FlyteContextManager
from flytekit.types.file import HTMLPage, PythonNotebook
from flytekitplugins.papermill import NotebookTask

notebook = NotebookTask(
    name="run-notebook",
    notebook_path=os.path.join(pathlib.Path(__file__).parent.absolute(), "train.ipynb"),
    inputs=kwtypes(a=int, b=int, uri=str),
    outputs=kwtypes(sum=int),
)


@task(cache_version="1", cache=True)
def store_notebook_run(
    notebook: PythonNotebook, html: HTMLPage, output_uri: str
) -> str:
    context = FlyteContextManager.current_context()

    fs = gcsfs.GCSFileSystem()

    fs.put(notebook.path, output_uri + "/notebook.ipynb")
    fs.put(html.path, output_uri + "/notebook.html")

    context.user_space_params.logging.info("Writing files")

    return output_uri
h
If you want the local path, you should do:
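from flytekit import task
from flytekit.types.file import HTMLPage, PythonNotebook


@task(cache_version="1", cache=True)
def store_notebook_run(
    notebook: PythonNotebook, html: HTMLPage, output_uri: str
) -> str:
    html.download()  # fetch the file so that html.path exists locally
    # use html.path
    return output_uri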
If you want the original remote path (“gcs:/….“) then you can use remote_source: https://docs.flyte.org/projects/flytekit/en/latest/generated/flytekit.types.file.FlyteFile.html#flytekit.types.file.FlyteFile.remote_source
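Why path can point at a missing file: as we understand flytekit, a file input in a remote run arrives as a local path that is only populated once download() is called, while remote_source keeps the original URI; a purely local file (as in a unit test) has nothing to download. The sketch below is a hypothetical, stdlib-only model of that behavior, not flytekit's FlyteFile. LazyFile, fake_fetch and the gs://example-bucket URI are illustrative names only.

```python
import os
import tempfile
from typing import Callable, Optional


class LazyFile:
    """Hypothetical, simplified model of a lazily fetched file input.

    Not flytekit's FlyteFile: it only mirrors the attributes discussed in
    this thread (path, download(), remote_source).
    """

    def __init__(
        self,
        path: str,
        downloader: Optional[Callable[[str], None]] = None,
        remote_source: Optional[str] = None,
    ) -> None:
        self.path = path                    # local destination
        self.remote_source = remote_source  # original URI, None for local files
        self._downloader = downloader       # None means "nothing to fetch"
        self._downloaded = False

    def download(self) -> str:
        # A purely local file has no downloader, so there is nothing to trigger.
        if self._downloader is None:
            raise ValueError(
                f"Attempting to trigger download on non-downloadable file {self.path}"
            )
        if not self._downloaded:
            self._downloader(self.path)
            self._downloaded = True
        return self.path


def fake_fetch(dest: str) -> None:
    # Stand-in for copying the object from cloud storage to `dest`.
    with open(dest, "w") as fh:
        fh.write("<html></html>")


workdir = tempfile.mkdtemp()

# Remote-style input: the local path is allocated but empty until download().
remote_file = LazyFile(
    os.path.join(workdir, "notebook.html"),
    downloader=fake_fetch,
    remote_source="gs://example-bucket/run/notebook.html",
)
exists_before = os.path.exists(remote_file.path)  # not fetched yet
remote_file.download()
exists_after = os.path.exists(remote_file.path)   # fetched

# Local-style input (e.g. a unit test): the file exists, no remote source.
local_path = os.path.join(workdir, "train-out.ipynb")
with open(local_path, "w") as fh:
    fh.write("{}")
local_file = LazyFile(local_path)
```

In this model, reading remote_file.path before download() gives a path with nothing behind it, and calling download() on local_file raises, which matches the two symptoms reported in this thread.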
m
Hi @Haytham Abuelfutuh, thanks for the response. I just tried remote_source and it causes the same issue: it’s None. I also tried the download function, but got this error in tests:
ValueError: Attempting to trigger download on non-downloadable file /tmp/flyte/20220302_145646/raw/a0bfeb4e5acae41e9c7b531ab4e295e5/train-out.ipynb
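Since a local test run and a remote run hand the task different kinds of file inputs, one option is to branch on whether a remote source exists instead of assuming either case. A minimal sketch, assuming an input object with path and remote_source attributes; plan_upload is a hypothetical helper, and mapping "copy"/"put" onto an fsspec filesystem such as gcsfs.GCSFileSystem (its copy() and put() methods) is our assumption, not something established in the thread.

```python
from types import SimpleNamespace
from typing import Any, Tuple


def plan_upload(file_input: Any, output_uri: str, name: str) -> Tuple[str, str, str]:
    """Hypothetical helper: decide how to copy a file input to output_uri/name.

    Returns (operation, source, destination). "copy" means bucket-to-bucket,
    "put" means uploading a file that already lives on local disk.
    """
    destination = output_uri.rstrip("/") + "/" + name
    remote = getattr(file_input, "remote_source", None)
    if remote:
        # Remote run: the original object is still in the bucket, no download needed.
        return ("copy", remote, destination)
    # Local run or unit test: the file is already on local disk.
    return ("put", file_input.path, destination)


# Stand-ins for the task input in the two situations from the thread.
in_test = SimpleNamespace(path="/tmp/flyte/raw/train-out.ipynb", remote_source=None)
in_cluster = SimpleNamespace(
    path="/tmp/x/nb.ipynb", remote_source="gs://example-bucket/nb.ipynb"
)

test_plan = plan_upload(in_test, "gs://example-bucket/out/", "notebook.ipynb")
cluster_plan = plan_upload(in_cluster, "gs://example-bucket/out", "notebook.ipynb")
```

Inside the task, the result could then drive something like getattr(fs, op)(src, dst); verify the bucket-to-bucket branch against your own deployment before relying on it.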
k
cc @Yee
y
How are you calling store_notebook_run, @Mike Seid? Can you paste the workflow?
m
import datetime

from flytekit import LaunchPlan, workflow
...

from mseid_schedule_notebook.tasks import notebook, store_notebook_run


@workflow
def train_notebook_workflow(
    styx_parameter: datetime.datetime, hades_overwrite: bool, uri_prefix: str
) -> int:

    ....

    # Use remote task to generate the storage uri
    generated_uri = ...

    result = notebook(a=100, b=200, uri=generated_uri)

    stored_uri = store_notebook_run(
        notebook=result.out_nb, html=result.out_rendered_nb, output_uri=generated_uri
    )

    return result.sum


lp_train_notebook = LaunchPlan.create(
    "train_notebook_workflow",
    train_notebook_workflow,
    fixed_inputs={
        "uri_prefix": "gs://mseid-schedule-notebook-storage",
    },
    default_inputs={
        "hades_overwrite": False,
    },
)
Thanks @Yee, I omitted some company-specific code.
y
do you want to hop on a call at 11:30?
h
i.e. in 25mins ^
m
Yeah. Sounds good. Let’s hop on a call @ 11:30. Mind sharing a link?
y
Sorry for the delay, @Mike Seid, I got busy yesterday. https://github.com/flyteorg/flyte/issues/2070 was the issue we were talking about, and https://github.com/flyteorg/flytekit/pull/877/files will clean up the code comment a bit.
I also ran the stock papermill example on my local cluster using flytekit v0.31.0b4 (the latest beta) and couldn’t replicate the issue. But I heard you had it fixed?
m
Yeah, we got it fixed. The root issue was that my launch plan had no output location set up, and our deployment had no default configured.
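The fix came down to where raw outputs are written. The thread does not show the exact failure mode, so this hypothetical sketch only models the precedence described above: a location set on the launch plan is used, otherwise the deployment default, otherwise nothing. In flytekit the launch-plan setting is, to our understanding, the raw_output_data_config argument of LaunchPlan.create, which takes RawOutputDataConfig(output_location_prefix=...) from flytekit.models.common; confirm against the docs for your flytekit version. resolve_output_prefix and gs://example-bucket are illustrative names.

```python
from typing import Optional


def resolve_output_prefix(
    launch_plan_prefix: Optional[str], deployment_default: Optional[str]
) -> Optional[str]:
    """Hypothetical, simplified precedence for the raw output location.

    The launch plan's setting wins; otherwise fall back to the deployment-wide
    default; with neither configured there is no location (None).
    """
    for candidate in (launch_plan_prefix, deployment_default):
        if candidate:
            return candidate.rstrip("/")
    return None


# The situation in the thread: nothing on the launch plan, no deployment default.
before_fix = resolve_output_prefix(None, None)

# After the fix: a placeholder bucket configured on the launch plan.
after_fix = resolve_output_prefix("gs://example-bucket/raw/", None)
```

Real Flyte deployments may consult more levels (for example per-execution overrides), so treat this as a two-level simplification.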