Hi! I'm running into an issue with returning a Pyto...
# ask-the-community
g
Hi! I'm running into an issue with returning a PyTorch model from a Task/Workflow (in the demo environment). What am I doing wrong? The Pod gets stuck in `Running` with logs `Failed to put data from /home/user/model.pt to s3://my-s3-bucket/data/.../model.pt; Original exception: Could not connect to the endpoint URL: \"http://flyte-sandbox-minio.flyte:9000/my-s3-bucket?list-type=2&prefix=data...`:
```python
from flytekit import task, workflow
from flytekit.types.file import FlyteFile


@task()
def train_model() -> FlyteFile:
    train()  # model training, generates a .pt file in the container
    return FlyteFile(path="/home/user/model.pt")


@workflow
def wf() -> FlyteFile:
    return train_model()
```
By contrast, returning another arbitrary file from the same container completes without issues:
```python
return FlyteFile(path="/home/user/.profile")
```
When I `exec` into the container, the `model.pt` file is there.
k
That URL is wrong. Internally it should be the MinIO k8s URL.
Cc @jeev, do you spot something?
j
That URL looks like the internal MinIO k8s one. And another file works?
g
Hi, I was able to fix the issue. It took me some hours because it failed more or less silently: all statements in the task executed properly and I had logging all the way to the end, but the function did not exit. It turned out that a database connection was not being cleaned up properly (its class did not implement a destructor method correctly), which kept the Pod in the `Running` state. I think (?) this is not a Flyte issue.
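For anyone hitting the same symptom, here is a minimal sketch of one way a connection can keep a process alive and how closing it explicitly avoids that. The thread does not say which database client was involved, so `HypotheticalDbClient`, `db_session`, and `train_and_save` are invented for illustration, and the non-daemon heartbeat thread is only an assumption about the mechanism, not a description of the real client.

```python
import threading
from contextlib import contextmanager


class HypotheticalDbClient:
    """Stand-in for a database client (assumed, not from the thread)."""

    def __init__(self) -> None:
        self._stop = threading.Event()
        # A non-daemon thread keeps the interpreter alive until it finishes,
        # even after the task function itself has returned.
        self._heartbeat = threading.Thread(target=self._run, daemon=False)
        self._heartbeat.start()

    def _run(self) -> None:
        # Wake up periodically until close() sets the stop event.
        while not self._stop.wait(timeout=0.05):
            pass

    def close(self) -> None:
        # Signal the background thread to stop and wait for it to exit.
        self._stop.set()
        self._heartbeat.join()

    @property
    def alive(self) -> bool:
        return self._heartbeat.is_alive()


@contextmanager
def db_session():
    """Open the hypothetical client and always close it, even on error."""
    client = HypotheticalDbClient()
    try:
        yield client
    finally:
        client.close()


def train_and_save() -> str:
    """Placeholder for the task body; returns the path of the saved model."""
    with db_session() as client:
        assert client.alive
        # Training code that reads from the database would go here.
        model_path = "/home/user/model.pt"
    return model_path
```

Closing the connection in a `finally` block (or a context manager, as here) does not rely on a destructor being called, so the process can exit as soon as the task body finishes.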