# ask-the-community
I have been having issues running the llama tutorials referenced in the documentation: https://github.com/unionai-oss/llm-fine-tuning/issues/9
Cc @Niels Bantilan
Hi @Alex Beach I’m unable to repro this on my side:
Can you share more info on your environment?
pip list would be useful
Also, can you share the image being used in your execution?
Screenshot 2024-02-01 at 11.57.32 AM.png
this is pip list
Screenshot 2024-02-01 at 12.16.26 PM.png
The images currently deployed to my cluster are 1.10.6: cr.flyte.org/flyteorg/flytepropeller-release:v1.10.6
Can you download that image locally and do
docker run …
and take a look at
to see if the
directory is inside it?
I suspect the issue is with the built image
I don't see any flyte_llama directories under root
I only see a micromamba directory
I had a globally installed version of pyflyte, which was the latest version. Maybe that could be it? The version pinned in flyte_llama is not the latest
that repo has everything pinned to 1.10.0 which is really old
The version pinning shouldn’t be the problem (feel free to try updating those deps, but I doubt it’ll fix the issue).
• How are you building/pushing the images? What registry are you using?
• Are you using the
as specified in the workflow.py file, or are you modifying it?
I have not modified the code in the repo, aside from updating the registry value
I'm just running the pyflyte --remote and letting it build and push the images
I'm using GCP's registry
I am using the ImageSpec and following all the steps in the README
this is really odd
OK, so I am fairly certain it was because my pyflyte version was higher than 1.10.0
I had globally installed the latest pyflyte version
After uninstalling it completely and using the binary in the virtual env, the task succeeded
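For anyone hitting the same mismatch, a quick way to see which pyflyte binary and which package version are actually being picked up is something like the sketch below. It only uses the Python standard library. The function name describe_flyte_env is made up for illustration, and it assumes the package is installed under the distribution name flytekit.

```python
import shutil
import sys
from importlib import metadata


def describe_flyte_env(dist_name: str = "flytekit") -> dict:
    """Report the active interpreter, the pyflyte binary on PATH, and the package version.

    `dist_name` is an assumption: the distribution that ships the pyflyte CLI.
    """
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # not installed for this interpreter
    return {
        "python": sys.executable,
        # First pyflyte found on PATH; a global install can shadow the venv one.
        "pyflyte_on_path": shutil.which("pyflyte"),
        "version": version,
        # True when running inside a virtual environment created with venv/virtualenv.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in describe_flyte_env().items():
        print(f"{key}: {value}")
```

If pyflyte_on_path points outside the virtual env, or the version differs from what the repo pins, that would be consistent with the behavior described above.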
I do have a new issue though... The second task of the workflow is stuck in a Queued state...
So I think I found a bug in flytepropeller though. From the flytepropeller logs:
{"json":{"exec_id":"fb3bf107c6ddd4889a74","ns":"flytesnacks-development","res_ver":"8113362","routine":"worker-30","wf":"flytesnacks:development:flyte_llama.workflows.train_workflow"},"level":"error","msg":"Error when trying to reconcile workflow. Error [failed at Node[n1]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [container]: [Invalid] failed to create resource, caused by: Pod \"fb3bf107c6ddd4889a74-n1-0\" is invalid: [spec.volumes[1].name: Invalid value: \"mfzg3otbo4ztu32fmnzgk4dtnvqw3ylhmvzdu4ltfvswc32ufuzdumzvgy2dgmzqgyzdanryhjzwky2smv1duxy\": must be no more than 63 characters, spec.containers[0].volumeMounts[1].name: Not found: \"mfzg3otbo4ztu32fmnzgk4dtnvqw3ylhmvzdu4ltfvswc32ufuzdumzvgy2dgmzqgyzdanryhjzwky2smv1duxy\"]]. Error Type[*errors.NodeErrorWithCause]","ts":"2024-02-02T00:07:55Z"}
E0202 00:07:55.870691       1 workers.go:103] error syncing 'flytesnacks-development/fb3bf107c6ddd4889a74': failed at Node[n1]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [container]: [Invalid] failed to create resource, caused by: Pod "fb3bf107c6ddd4889a74-n1-0" is invalid: [spec.volumes[1].name: Invalid value: "mfzg3otbo4ztu32fmnzgk4dtnvqw3ylhmvzdu4ltfvswc32ufuzdumzvgy2dgmzqgyzdanryhjzwky2smv1duxy": must be no more than 63 characters, spec.containers[0].volumeMounts[1].name: Not found: "mfzg3otbo4ztu32fmnzgk4dtnvqw3ylhmvzdu4ltfvswc32ufuzdumzvgy2dgmzqgyzdanryhjzwky2smv1duxy"]
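For context, the rejection in the log comes from the Kubernetes API server: a Pod volume name has to be a valid DNS-1123 label, meaning at most 63 characters of lowercase alphanumerics or '-', starting and ending with an alphanumeric. A rough sketch of that rule, run against the generated name from the log, is below. The helper name is made up for illustration, and this is not how flytepropeller builds or validates the name.

```python
import re
from typing import List

# Kubernetes DNS-1123 label rules: max 63 chars, lowercase alphanumerics or '-',
# must start and end with an alphanumeric character.
DNS1123_LABEL_MAX_LEN = 63
DNS1123_LABEL_RE = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def dns1123_label_problems(name: str) -> List[str]:
    """Return a list of reasons why `name` is not a valid DNS-1123 label (hypothetical helper)."""
    problems = []
    if len(name) > DNS1123_LABEL_MAX_LEN:
        problems.append(
            f"must be no more than {DNS1123_LABEL_MAX_LEN} characters (got {len(name)})"
        )
    if not DNS1123_LABEL_RE.fullmatch(name):
        problems.append(
            "must consist of lowercase alphanumerics or '-', "
            "and start and end with an alphanumeric"
        )
    return problems


# Volume name copied from the propeller log above.
VOLUME_NAME_FROM_LOG = (
    "mfzg3otbo4ztu32fmnzgk4dtnvqw3ylhmvzdu4ltfvswc32ufuzdumzvgy2dgmzqgyzdanryhjzwky2smv1duxy"
)

if __name__ == "__main__":
    for problem in dns1123_label_problems(VOLUME_NAME_FROM_LOG):
        print(problem)
```

The second error in the log (volumeMounts[1].name: Not found) looks like a follow-on: the mount refers to a volume whose name was itself rejected.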
Oh man, this seems like a weird regression. Did it show this in the UI?
Cc @Haytham Abuelfutuh / @Dan Rammer (hamersaw) / @Paul Dittamo . This should be bubbled up
@Alex Beach thank you for pointing this out
@Dan Rammer (hamersaw) I'll handle this as part of this sprint