I see this error in my flyte logs: ```W0422 18:31:...
# flyte-support
c
I see this error in my flyte logs:
W0422 18:31:31.977765       7 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.2/tools/cache/reflector.go:229: failed to list *v1.PyTorchJob: pytorchjobs.kubeflow.org is forbidden: User "system:serviceaccount:flyte:flyte-backend-flyte-binary" cannot list resource "pytorchjobs" in API group "kubeflow.org" at the cluster scope
E0422 18:31:31.977797       7 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.2/tools/cache/reflector.go:229: Failed to watch *v1.PyTorchJob: failed to list *v1.PyTorchJob: pytorchjobs.kubeflow.org is forbidden: User "system:serviceaccount:flyte:flyte-backend-flyte-binary" cannot list resource "pytorchjobs" in API group "kubeflow.org" at the cluster scope
Am I missing any steps to give flyte permissions after installing KF Training Operator?
weird, I see it in the UI now as well:
pytorchjobs.kubeflow.org is forbidden: User "system:serviceaccount:flyte:flyte-backend-flyte-binary" cannot create resource "pytorchjobs" in API group "kubeflow.org" in the namespace "flytesnacks-development"
I don't see anything mentioned in the docs here about granting flyte permissions to the kubeflow api group 🤔 https://www.union.ai/docs/flyte/deployment/flyte-plugins/kubernetes-plugins/
fyi I figured this out — will post solution when I’m online later
e
what was it? I am having the same issue
@curved-whale-1505
c
ah sorry, I forgot to follow up here. here is my CDK code, you can translate back into yaml if needed:
// Create Cluster Role and ClusterRoleBinding for PyTorchJob access
     const flyteKubeflowRole = new eks.KubernetesManifest(this, 'flyte-kubeflow-role', {
       cluster: props.cluster,
       manifest: [
         {
           apiVersion: 'rbac.authorization.k8s.io/v1',
           kind: 'ClusterRole',
           metadata: {
             name: 'flyte-pytorch-job-role',
           },
           rules: [
             {
               apiGroups: ['kubeflow.org'],
               resources: ['pytorchjobs', 'pytorchjobs/status'],
               verbs: ['get', 'list', 'watch', 'create', 'update', 'patch', 'delete'],
             },
           ],
         },
       ],
     });
 
     const flyteKubeflowRoleBinding = new eks.KubernetesManifest(this, 'flyte-kubeflow-rolebinding', {
       cluster: props.cluster,
       manifest: [
         {
           apiVersion: 'rbac.authorization.k8s.io/v1',
           kind: 'ClusterRoleBinding',
           metadata: {
             name: 'flyte-pytorch-job-rolebinding',
           },
           roleRef: {
             apiGroup: 'rbac.authorization.k8s.io',
             kind: 'ClusterRole',
             name: 'flyte-pytorch-job-role',
           },
           subjects: [
             {
               kind: 'ServiceAccount',
               name: flyteServiceAccountName,
               namespace: flyteNamespaceName,
             },
           ],
         },
       ],
     });
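For anyone who wants plain YAML instead of CDK, a rough translation of the two manifests above might look like the sketch below. The service account name and namespace are assumptions taken from the error message (`system:serviceaccount:flyte:flyte-backend-flyte-binary`), since the CDK snippet passes them in as variables; swap in the values for your own install.

```yaml
# ClusterRole granting access to Kubeflow PyTorchJob resources.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: flyte-pytorch-job-role
rules:
  - apiGroups: ["kubeflow.org"]
    resources: ["pytorchjobs", "pytorchjobs/status"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# ClusterRoleBinding attaching the role above to the Flyte service account.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: flyte-pytorch-job-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flyte-pytorch-job-role
subjects:
  - kind: ServiceAccount
    # Assumed from the error message in this thread; adjust to your install.
    name: flyte-backend-flyte-binary
    namespace: flyte
```

After applying it with `kubectl apply -f`, one way to check the grant is `kubectl auth can-i list pytorchjobs.kubeflow.org --as=system:serviceaccount:flyte:flyte-backend-flyte-binary --all-namespaces`, which should answer `yes` if the binding took effect.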
e
ty
c
@powerful-gold-59386 happy to help update the docs if you can point me to where this makes the most sense
e
I think it's better to put this in the Helm chart; you can edit the code there and create a PR
c
I don’t think it can be added there because it relies on the pytorchjob crd being installed which isn’t a requirement for installing flyte? but correct me if I’m wrong
a
ah @curved-whale-1505 great find and thanks for sharing. I think this should be in the docs, unless there were a flag that disables the PyTorchJob CRD by default (not yet)
it should land here
@curved-whale-1505 I'm a bit confused (sorry in advance). I see the kubeflow-operator chart configures a few RBAC resources, but what you shared seems to refer to different API groups and resource types (e.g. kubeflow.org vs trainer.kubeflow.org). We can always instruct the clusterresources component of Flyte to create additional K8s resources like the ClusterRole, using something like this in the values:
clusterResourceTemplates:
  inline:
    001_clusterrole.yaml: |
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: 'flyte-pytorch-role'
...