# ask-the-community
f
I’m encountering a problem during workflow registration when task arguments (e.g. an `nn.Module`) need to be serialized into blob storage. Instead of a bucket URI, a `/tmp/flyte-xyz` path is shown in the console. I suspect that my flytekit/remote config is not correct and would appreciate some help figuring out where exactly. Thanks 🙏 More in the 🧵
I hadn’t noticed this problem so far because we typically pass dataclass_jsons with references to models in a bucket to our workflows/tasks instead of actual files (a sketch of that pattern follows right after the example below). But this minimal working example results in the screenshot shown above:
```
from flytekit import task, workflow
from torch import nn

class Config:  # enforce pickle transport
    def __init__(self):
        self.a = 1

@task
def train(cfg: Config, model: nn.Module):
    print(cfg)

@workflow
def wf():
    train(cfg=Config(), model=nn.Linear(10, 10))
```
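For context, the pattern we normally use looks roughly like this (a sketch; the field name and bucket path are made up, only the general shape matters):
```
from dataclasses import dataclass
from dataclasses_json import dataclass_json
from flytekit import task

@dataclass_json
@dataclass
class ModelRef:
    # URI of a checkpoint that already lives in blob storage
    checkpoint_uri: str = "gs://<bucket>/checkpoints/latest.pt"

@task
def train_from_ref(ref: ModelRef):
    # the task itself downloads/loads the model from the referenced URI
    print(ref.checkpoint_uri)
```
With that pattern nothing local has to be uploaded at registration time, which is presumably why we never ran into the `/tmp/flyte-…` paths before.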
I put a print statement into the `to_literal` method of the pickle transformer (here). It results in:
```
execution state: ExecutionState(mode=None, working_dir=PosixPath('/tmp/flyte-z15pwqet/sandbox/local_flytekit'), engine_dir='/tmp/flyte-z15pwqet/sandbox/local_flytekit/engine_dir', branch_eval_mode=None, user_space_params=<flytekit.core.context_manager.ExecutionParameters object at 0x7f7a445d6520>)
```
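For reference, the debug print is roughly this (a sketch; the method body is abridged, only the added line and the `TypeTransformer.to_literal` signature are shown):
```
# inside FlytePickleTransformer.to_literal (flytekit.types.pickle.pickle)
def to_literal(self, ctx, python_val, python_type, expected):
    print(f"execution state: {ctx.execution_state}")  # <- added for debugging
    ...  # original pickling + upload logic unchanged
```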
So the `working_dir` and the `engine_dir` of the `ExecutionState` are the same as in the Flyte console screenshot.
I’m registering with this command:
```
pyflyte register --project sandbox \
    --version ... \
    --destination-dir /home/flyte \
    --image ... \
    test_pickle.py
```
My `~/.flyte/config.yaml` contains this:
```
admin:
  ...
logger:
  show-source: true
  level: 0
storage:
  type: stow
  stow:
    kind: google
    config:
      json: ""
      project_id: <project_id>
      scopes: https://www.googleapis.com/auth/devstorage.read_write
```
Here and here it looks like the `default_local_file_access_provider` always uses a `/tmp/flyte-` path, without taking any config into consideration 🤔
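One way to see what prefix is actually being used at serialization time (a sketch, assuming the `FileAccessProvider` exposes the prefix it was constructed with via `raw_output_prefix`):
```
from flytekit.core.context_manager import FlyteContextManager

ctx = FlyteContextManager.current_context()
# Prints the raw output prefix of the current file access provider;
# during registration this appears to be a /tmp/flyte-... sandbox path.
print(ctx.file_access.raw_output_prefix)
```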
s
Have you tried setting `--raw-data-prefix` in your `pyflyte register` command to the s3 bucket?
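For example, appended to the register command above (a sketch; the bucket path is a placeholder and the elided values stay the same):
```
pyflyte register --project sandbox \
    --version ... \
    --destination-dir /home/flyte \
    --image ... \
    --raw-data-prefix gs://<bucket>/raw-data \
    test_pickle.py
```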
f
Yes, `--raw-data-prefix` (with a `gs://` URI) doesn’t have an effect on this, the console still shows `/tmp/flyte-` inputs 🤔
Should `--raw-data-prefix` have changed this? In the links I pasted here, it looks like the `/tmp/flyte-` paths are hardcoded.
s
Yeah, `--raw-data-prefix` should have changed this. What's the raw output prefix in the propeller config? cc @Eduardo Apolinario (eapolinario)
f
It is set to the same bucket as `userSettings.bucketName`:
`rawoutput-prefix: gs://...`
s
Eduardo / @Kevin Su, any idea what might be causing this issue?
Hey @Eduardo Apolinario (eapolinario) / @Yee, can you help Fabio please?
e
@Fabio Grätz, can you move the default values for those inputs into separate tasks? The thing is that at serialization time we have to evaluate those values (`Config` and `nn.Module` in the definition of the workflow in your example) locally. Essentially I'm saying:
```
from flytekit import task, workflow
from torch import nn

class Config:  # enforce pickle transport
    def __init__(self):
        self.a = 1

@task
def get_default_config() -> Config:
    return Config()

@task
def get_default_model() -> nn.Module:
    return nn.Linear(10, 10)

@task
def train(cfg: Config, model: nn.Module):
    print(cfg)

@workflow
def wf():
    train(cfg=get_default_config(), model=get_default_model())
```
f
Hey @Eduardo Apolinario (eapolinario), yes, that would of course circumvent the problem, but it is not what the engineer is trying to do 🤔 Their goal was to take a file that exists locally (e.g. a config file or a model checkpoint) and, during registration, have the type transformer serialize it into blob storage so that it is available during remote execution. Especially for experimentation this can be helpful. My example with the config was just to force the pickle transformer. I would also find it totally valid if this behaviour weren’t supported, but in that case the registration should fail, since any `/tmp/…` dir will not be available in the cluster, and it takes some amount of understanding of what is happening to figure out why this fails. Is there any way in the type transformer to identify from the `ctx` that we are in registration mode and not in a local execution?
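Something along these lines inside a transformer's `to_literal` is what I have in mind (a sketch, assuming `execution_state.mode` being `None` is a reliable signal for serialization/registration time, as in the log above):
```
from flytekit import FlyteContext

def to_literal(self, ctx: FlyteContext, python_val, python_type, expected):
    state = ctx.execution_state
    if state is None or state.mode is None:
        # Fail loudly instead of silently writing a /tmp/flyte-... path
        # that will not exist in the cluster.
        raise ValueError(
            "Cannot pickle local values at registration time; "
            "pass a reference to an object in blob storage instead."
        )
    ...  # normal pickling + upload logic
```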
e
yeah, we could certainly detect that, @Fabio Grätz. Mind filing an issue?
f
Will do 👍