How can ensure if a particular plugin is actually ...
# ask-the-community
m
How can ensure if a particular plugin is actually installed properly. I followed this doc to install tensorflow-operator but after installation when I tried to launch the example workflow mentioned in the doc, I see below error in my flyteworkflow deployment.
Copy code
message: >-
    failed at Node[n0]. RuntimeExecutionError: failed during plugin execution,
    caused by: failed to execute handle for plugin [tensorflow]: found no
    current condition. Conditions: []
d
@Byron Hsu this is what you were seeing in MPI as well right? Is updating the task status to
Queued
rather than failing the task the correct functionality here?
m
For me it ramained in running state for 1h 40m and then aborted eventually with below error
Copy code
Workflow[flytesnacks:development:kftensorflow.tf_mnist.mnist_tensorflow_workflow] failed. RuntimeExecutionError: max number of system retry attempts [51/50] exhausted. Last known status message: failed at Node[n0]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [tensorflow]: found no current condition. Conditions: []
@Samhita Alla
b
@Dan Rammer (hamersaw) yes. But I think he ran into other errors. No current conditions will not abort the task
Would you mind providing more context @Mohd Shahid Khan Afridi
m
@Byron Hsu I have flyte cluster setup on GCP, I tried to upgrade it to enable tensorflow-operator plugin. I followed this doc for the same. Once upgrade is successful I tried to run one tensorflow workflow which is again mentioned in the the same doc. When I launch this workflow it gives me this error
Copy code
Workflow[flytesnacks:development:kftensorflow.tf_mnist.mnist_tensorflow_workflow] failed. RuntimeExecutionError: max number of system retry attempts [51/50] exhausted. Last known status message: failed at Node[n0]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [tensorflow]: found no current condition. Conditions: []
How can I debug this? How can I make sure if my plugin is properly enabled?
b
1. Describe tfjob (
kubectl describe <your tfjob>
) 2. Check tfjob’s pod logs (
kubectl logs <your tfjob pod
) 3. Check propeller pod logs Here is the several methods i will use to debug
116 Views