# ask-the-community
a
I have been having issues running the llama tutorials referenced in the documentation https://github.com/unionai-oss/llm-fine-tuning/issues/9
k
Cc @Niels Bantilan
n
Hi @Alex Beach I’m unable to repro this on my side:
Can you share more info on your environment?
pip list
would be useful
also, can you share the image being used in your execution?
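Even just the flyte-related bits would help, something like:
pip list | grep -iE 'flyte'    # flytekit / flytekitplugins versions
python --version               # interpreter used in the environment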
a
Screenshot 2024-02-01 at 11.57.32 AM.png
this is pip list
Screenshot 2024-02-01 at 12.16.26 PM.png
The images currently deployed to my cluster are 1.10.6: cr.flyte.org/flyteorg/flytepropeller-release:v1.10.6
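One way to confirm which propeller image a cluster is actually running (assuming a standard install, with the flytepropeller deployment in the flyte namespace):
kubectl -n flyte get deployment flytepropeller \
  -o jsonpath='{.spec.template.spec.containers[0].image}'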
n
Can you download that image locally and do
docker run …
and take a look at
/root
to see if the
flyte_llama
directory is inside it?
I suspect the issue is with the built image
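Something along these lines should work (the image reference below is just a placeholder for whatever ImageSpec pushed to your registry):
docker pull <registry>/<repo>:<tag>
docker run --rm -it <registry>/<repo>:<tag> ls -la /root   # look for a flyte_llama directory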
a
I don't see any flyte_llama directories under /root
I only see a micromamba directory
I had a globally installed version of pyflyte, which was the latest version. Maybe that could be it? The version pinned in the flyte_llama repo is not the latest
that repo has everything pinned to 1.10.0, which is really old
n
The version pinning shouldn’t be the problem (feel free to try updating those deps, but I doubt it’ll fix the issue).
• How are you building/pushing the images? What registry are you using?
• Are you using the ImageSpec as specified in the workflow.py file, or are you modifying it?
a
I have not modified the code in the repo, aside from updating the registry value
I'm just running pyflyte --remote and letting it build and push the images
I'm using GCP's registry
I am using the ImageSpec and following all the steps in the README
k
this is really odd
a
OK, so I am fairly certain it was because my pyflyte version was higher than 1.10.0
I had globally installed the latest pyflyte version
After uninstalling it completely and using the binary in the virtual env, the task succeeded
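Roughly what that looked like (the requirements file name is an assumption; use whatever the repo pins its dependencies in):
pip uninstall -y flytekit           # drop the globally installed CLI
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt     # installs the repo's pinned flytekit (1.10.0)
which pyflyte                       # should now resolve inside .venv/bin
pip show flytekit                   # confirm the version the venv sees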
I do have a new issue though... The second task of the workflow is stuck in a Queued state...
So I think I found a bug with flytepropeller though. From the flytepropeller logs:
│ {"json":{"exec_id":"fb3bf107c6ddd4889a74","ns":"flytesnacks-development","res_ver":"8113362","routine":"worker-30","wf":"flytesnacks:development:flyte_llama.workflows.train_workflow"},"level":"error","msg":"Error when trying to reco │
│ ncile workflow. Error [failed at Node[n1]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [container]: [Invalid] failed to create resource, caused by: Pod \"fb3bf107c6ddd4889a74 │
│ -n1-0\" is invalid: [spec.volumes[1].name: Invalid value: \"mfzg3otbo4ztu32fmnzgk4dtnvqw3ylhmvzdu4ltfvswc32ufuzdumzvgy2dgmzqgyzdanryhjzwky2smv1duxy\": must be no more than 63 characters, spec.containers[0].volumeMounts[1].name: Not  │
│ found: \"mfzg3otbo4ztu32fmnzgk4dtnvqw3ylhmvzdu4ltfvswc32ufuzdumzvgy2dgmzqgyzdanryhjzwky2smv1duxy\"]]. Error Type[*errors.NodeErrorWithCause]","ts":"2024-02-02T00:07:55Z"}                                                               │
│ E0202 00:07:55.870691       1 workers.go:103] error syncing 'flytesnacks-development/fb3bf107c6ddd4889a74': failed at Node[n1]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [c │
│ ontainer]: [Invalid] failed to create resource, caused by: Pod "fb3bf107c6ddd4889a74-n1-0" is invalid: [spec.volumes[1].name: Invalid value: "mfzg3otbo4ztu32fmnzgk4dtnvqw3ylhmvzdu4ltfvswc32ufuzdumzvgy2dgmzqgyzdanryhjzwky2smv1duxy":  │
│ must be no more than 63 characters, spec.containers[0].volumeMounts[1].name: Not found: "mfzg3otbo4ztu32fmnzgk4dtnvqw3ylhmvzdu4ltfvswc32ufuzdumzvgy2dgmzqgyzdanryhjzwky2smv1duxy"]
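The generated volume name there is longer than Kubernetes allows; a quick way to see it (the name is copied straight from the error above):
echo -n "mfzg3otbo4ztu32fmnzgk4dtnvqw3ylhmvzdu4ltfvswc32ufuzdumzvgy2dgmzqgyzdanryhjzwky2smv1duxy" | wc -c   # prints a length well over 63
Kubernetes rejects Pod specs whose volume names exceed 63 characters, so the pod never gets created and the task stays queued.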
k
Ohh man, this seems like a weird regression. Did it show this in the UI?
Cc @Haytham Abuelfutuh / @Dan Rammer (hamersaw) / @Paul Dittamo. This should be bubbled up.
p
@Alex Beach thank you for pointing this out
@Dan Rammer (hamersaw) I'll handle this as part of this sprint