# flyte-support
h
I have been having issues running the llama tutorials referenced in the documentation https://github.com/unionai-oss/llm-fine-tuning/issues/9
πŸ‘€ 1
f
Cc @broad-monitor-993
b
Hi @hallowed-dog-74273 I’m unable to repro this on my side.
Can you share more info on your environment? The output of
pip list
would be useful.
Also, can you share the image being used in your execution?
h
here is my pip list output
The image currently deployed to my cluster is 1.10.6: cr.flyte.org/flyteorg/flytepropeller-release:v1.10.6
b
Can you download that image locally, do
docker run …
and take a look at /root to see if the flyte_llama directory is inside it?
I suspect the issue is with the built image
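A minimal sketch of that check, assuming a hypothetical image reference (the name/tag below is a placeholder; substitute whatever the ImageSpec build actually pushed to your registry):
```
# placeholder image reference -- replace with the tag ImageSpec built/pushed
IMAGE=us-central1-docker.pkg.dev/my-project/my-repo/flyte-llama:latest
docker pull "$IMAGE"
# override the entrypoint so the container just lists /root and exits
docker run --rm --entrypoint ls "$IMAGE" -la /root
```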
h
I don't see any flyte_llama directories under /root
I only see a micromamba directory
I had a globally installed version of pyflyte, which was the latest version. Maybe that could be it? The version pinned in the flyte_llama repo is not the latest
that repo has everything pinned to 1.10.0, which is really old
b
The version pinning shouldn’t be the problem (feel free to try updating those deps, but I doubt it’ll fix the issue).
β€’ How are you building/pushing the images? What registry are you using?
β€’ Are you using the ImageSpec as specified in the workflow.py file, or are you modifying it?
h
I have not modified the code in the repo, aside from updating the registry value
I'm just running pyflyte run --remote and letting it build and push the images
I'm using GCP's registry
I am using the ImageSpec and following all the steps in the README
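For context, a sketch of roughly what that invocation looks like; the file path and any extra arguments are assumptions about the repo layout, not copied from it:
```
# pyflyte builds the ImageSpec image declared in workflow.py, pushes it to the
# configured registry, then registers and launches the workflow remotely.
# "flyte_llama/workflows.py" and "train_workflow" are assumed names here.
pyflyte run --remote flyte_llama/workflows.py train_workflow
```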
f
this is really odd
h
OK, so I am fairly certain it was because my pyflyte version was higher than 1.10.0
I had globally installed the latest pyflyte version
After uninstalling it completely and using the binary in the virtual env, the task succeeded
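A sketch of that setup, assuming the repo's requirements file pins flytekit to 1.10.0 as mentioned above:
```
# create an isolated env so the globally installed pyflyte is not picked up
python -m venv .venv
source .venv/bin/activate
# assumption: the repo's requirements.txt pins flytekit==1.10.0
pip install -r requirements.txt
# confirm the venv binary is the one on PATH and which version it reports
which pyflyte
pyflyte --version
```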
I do have a new issue though... the second task of the workflow is stuck in a Queued state.
I think I found a bug with flytepropeller. From the flytepropeller logs:
```
{"json":{"exec_id":"fb3bf107c6ddd4889a74","ns":"flytesnacks-development","res_ver":"8113362","routine":"worker-30","wf":"flytesnacks:development:flyte_llama.workflows.train_workflow"},"level":"error","msg":"Error when trying to reconcile workflow. Error [failed at Node[n1]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [container]: [Invalid] failed to create resource, caused by: Pod \"fb3bf107c6ddd4889a74-n1-0\" is invalid: [spec.volumes[1].name: Invalid value: \"mfzg3otbo4ztu32fmnzgk4dtnvqw3ylhmvzdu4ltfvswc32ufuzdumzvgy2dgmzqgyzdanryhjzwky2smv1duxy\": must be no more than 63 characters, spec.containers[0].volumeMounts[1].name: Not found: \"mfzg3otbo4ztu32fmnzgk4dtnvqw3ylhmvzdu4ltfvswc32ufuzdumzvgy2dgmzqgyzdanryhjzwky2smv1duxy\"]]. Error Type[*errors.NodeErrorWithCause]","ts":"2024-02-02T00:07:55Z"}
E0202 00:07:55.870691       1 workers.go:103] error syncing 'flytesnacks-development/fb3bf107c6ddd4889a74': failed at Node[n1]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [container]: [Invalid] failed to create resource, caused by: Pod "fb3bf107c6ddd4889a74-n1-0" is invalid: [spec.volumes[1].name: Invalid value: "mfzg3otbo4ztu32fmnzgk4dtnvqw3ylhmvzdu4ltfvswc32ufuzdumzvgy2dgmzqgyzdanryhjzwky2smv1duxy": must be no more than 63 characters, spec.containers[0].volumeMounts[1].name: Not found: "mfzg3otbo4ztu32fmnzgk4dtnvqw3ylhmvzdu4ltfvswc32ufuzdumzvgy2dgmzqgyzdanryhjzwky2smv1duxy"]
```
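The rejected value is the generated volume/volumeMount name, and the error itself states the constraint: the name must be no more than 63 characters. A quick check of the name quoted in the log:
```
# the name quoted in the error is 87 characters, well over the 63-character limit
echo -n "mfzg3otbo4ztu32fmnzgk4dtnvqw3ylhmvzdu4ltfvswc32ufuzdumzvgy2dgmzqgyzdanryhjzwky2smv1duxy" | wc -c
```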
f
Ohh man, this seems like a weird regression. Did it show this in the UI?
Cc @high-park-82026 / @hallowed-mouse-14616 / @flat-area-42876. This should be bubbled up.
πŸ‘ 1
f
@hallowed-dog-74273 thank you for pointing this out
@hallowed-mouse-14616 I'll handle this as part of this sprint
πŸ™Œ 1