# ask-the-community
s
Hi, I have a quick question regarding one-hot-encoding. During training, we fit a OneHotEncoder in the transform task, but we need the encoder saved somewhere for the inference step. How can I do that in Flyte?
e
For sklearn, Flyte should automatically pickle the transformer if you return it from a task and pass it into another task. If that doesn’t work for sklearn or the library you are using, it should be possible to add a specific type transformer. There is a type transformer for Spark ML pipelines in the Spark integration, and that pattern can be copied. I am happy to work on your specific use case with you!
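A minimal sketch of that pattern, assuming flytekit’s default pickle fallback for types it doesn’t natively recognize (the task names and DataFrame columns here are illustrative, not from the thread):
```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from flytekit import task, workflow


@task
def fit_encoder(df: pd.DataFrame) -> OneHotEncoder:
    enc = OneHotEncoder(handle_unknown="ignore")
    enc.fit(df)
    return enc  # flytekit pickles unrecognized types and passes them downstream


@task
def transform(df: pd.DataFrame, enc: OneHotEncoder) -> pd.DataFrame:
    # apply the fitted encoder in a separate task
    return pd.DataFrame(enc.transform(df).toarray())


@workflow
def train_prep(df: pd.DataFrame) -> pd.DataFrame:
    enc = fit_encoder(df=df)
    return transform(df=df, enc=enc)
```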
s
@Evan Sadler The problem is that I don’t run inference on Flyte (only training runs via Flyte). I use a serving platform (BentoML), so I’m not sure how to get the original transform encoder to BentoML, which runs in a completely separate environment (i.e. a different cluster) from Flyte.
e
Ah okay! Then using the type system doesn’t make sense. If you use a FlyteFile then the sklearn transformer will get saved to s3. You can reference it in Flyte and then get the s3 path out and send it to BentoML with your model. I believe you are supposed to use the remote_source attribute.
```python
# preprocess task: return the pickled encoder as a FlyteFile
return FlyteFile(path="./saved_model.pkl")
```
```python
from flytekit import task
from flytekit.types.file import FlyteFile


@task
def get_path(ff: FlyteFile) -> str:
    # the remote (s3) location of the file; pass this to BentoML
    return ff.remote_source
```
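For completeness, a rough sketch of what the preprocess task might look like end to end; the file name, encoder settings, and column handling are placeholders, not from the thread:
```python
import pickle

import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from flytekit import task
from flytekit.types.file import FlyteFile


@task
def preprocess(df: pd.DataFrame) -> FlyteFile:
    enc = OneHotEncoder(handle_unknown="ignore")
    enc.fit(df)
    local_path = "./saved_model.pkl"
    with open(local_path, "wb") as f:
        pickle.dump(enc, f)
    # Flyte uploads the local file to its blob store (e.g. s3) when the task completes
    return FlyteFile(path=local_path)
```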
s
OK thanks - so the best practice would be to create a task that outputs the serialized encoder and then use that from BentoML. So it’s possible to get the output of intermediate tasks? And if so, is there an API endpoint I can use to get it rather than manually copying and pasting?
Also one more question - can I save things to a non-Flyte s3 bucket from inside a Flyte task? I don’t see a reason why not, right?
e
Flyte has a way to access workflow outputs from Python. You can reference workflows by name and look at specific executions. I will send something over tomorrow (or maybe someone else can help now). To your second comment - yes, you can always save things to s3 through boto as long as the IAM role that Flyte uses has access to the other bucket. Also, FlyteFile has a remote_path that can be used to specify any destination on s3, which can be convenient in some cases.
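A quick sketch of both options; the bucket and key names are made up, and this assumes the task’s IAM role can write to that bucket:
```python
import boto3
from flytekit import task
from flytekit.types.file import FlyteFile


@task
def export_with_boto(encoder_file: FlyteFile) -> str:
    # Option 1: download the Flyte-managed file and copy it to any bucket you control
    local = encoder_file.download()
    boto3.client("s3").upload_file(local, "my-bento-bucket", "models/encoder.pkl")
    return "s3://my-bento-bucket/models/encoder.pkl"


@task
def export_with_remote_path() -> FlyteFile:
    # Option 2: let Flyte upload to an explicit destination via remote_path
    # (assumes the encoder was already pickled to ./saved_model.pkl in this task)
    return FlyteFile(path="./saved_model.pkl", remote_path="s3://my-bento-bucket/models/encoder.pkl")
```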
s
Good point on the s3 access from Flyte - yes, it’d be really helpful if you could tell me how to reference workflows by name and access the outputs of specific executions via Python. Thank you!
e
Here is the example that I promised! You can use FlyteRemote to interact with the inputs and outputs of an execution. This shows how to grab an output of a workflow, but you can get task outputs and more.
```python
from flytekit.configuration import Config
from flytekit.remote import FlyteRemote

remote = FlyteRemote(config=Config.auto(config_file="/Users/esad/.uctl/config.yaml"))
execution = remote.fetch_execution(project="flytesnacks",
                                   domain="development",
                                   name="f310698c1752b49759b9")

outputs = execution.outputs  # this is basically a dict
outputs.get("o0")  # o0 is the name of the first positional output
```