Is it possible to test the single cluster deployment in AWS Flyte #flyte-deployment

echoing-carpenter-92090

04/26/2023, 10:34 PM

Is it possible to test the single cluster deployment in AWS with kubectl port-forwarding? Following this it seems like the helm deployment has changed a bit from the docs. There are 3 services provisioned.

NAME                                              READY   STATUS    RESTARTS   AGE
pod/flyte-backend-flyte-binary-84949bf97b-dm746   1/1     Running   0          72m

NAME                                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/flyte-backend-flyte-binary-grpc      ClusterIP   10.100.94.188   <none>        8089/TCP   73m
service/flyte-backend-flyte-binary-http      ClusterIP   10.100.61.78    <none>        8088/TCP   73m
service/flyte-backend-flyte-binary-webhook   ClusterIP   10.100.80.56    <none>        443/TCP    73m

NAME                                         READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/flyte-backend-flyte-binary   1/1     1            1           72m

NAME                                                    DESIRED   CURRENT   READY   AGE
replicaset.apps/flyte-backend-flyte-binary-84949bf97b   1         1         1       72m

I can port-forward to http and grpc independently, but my pyflyte run --remote... call errors out with:

Failed with Exception: Reason: USER:ValueError
Value error!  Received: 403. Request to send data <https://indupro-flyte-metadata.s3.us-west-2.amazonaws.com/flytesnacks/><long url>... failed.

faint-smartphone-23356

04/26/2023, 11:38 PM

@echoing-carpenter-92090 I'm not sure about your full setup, but I read that as the host you're running pyflyte from not having access to your s3 bucket.

faint-smartphone-23356

04/26/2023, 11:40 PM

what happens if you run

aws s3 ls s3://indupro-flyte-metadata

from the same host you're running pyflyte on?

✅ 1

echoing-carpenter-92090

04/27/2023, 3:40 AM

Thanks Mike, it was exactly that - S3 permission was not attached to the node security group.

freezing-airport-6809

04/27/2023, 4:20 AM

but pyflyte should not need access to s3

👀 1

freezing-airport-6809

04/27/2023, 4:20 AM

we use signed urls and they should just float

faint-smartphone-23356

04/27/2023, 3:53 PM

In our production cluster (using flyte-binary) I've certainly observed that the nodes that run our tasks need the s3 permissions otherwise they get the errors above. I've not seen the URLs that they receive. I'll keep an eye out for this over the next couple of weeks. I'd love to be able to drop the requirement.
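
One way to see what kind of URL a failing request received is to inspect its query string. The sketch below is a hypothetical helper, not part of Flyte or flytekit: `inspectPresignedUrl`, `PresignReport`, and the example bucket name are all made up for illustration. It assumes the URL would be an AWS Signature Version 4 presigned URL, which carries `X-Amz-*` query parameters, and it only reports whether those parameters are present and when the signature would expire. It does not validate the signature, and a well-formed signed URL can still return 403 if the identity that signed it lacks access to the bucket.

```typescript
// Query parameters that AWS Signature Version 4 adds to a presigned URL.
const SIGV4_PRESIGN_PARAMS: string[] = [
  "X-Amz-Algorithm",
  "X-Amz-Credential",
  "X-Amz-Date",
  "X-Amz-Expires",
  "X-Amz-SignedHeaders",
  "X-Amz-Signature",
];

// Hypothetical report shape, introduced for this sketch only.
interface PresignReport {
  presigned: boolean;     // true when every SigV4 presign parameter is present
  missing: string[];      // presign parameters that were not found
  expiresAt: Date | null; // X-Amz-Date plus X-Amz-Expires, when both parse
}

// Hypothetical helper: inspect a URL copied out of an error message.
function inspectPresignedUrl(url: string): PresignReport {
  // Take the text between "?" and an optional "#" as the query string.
  const queryStart = url.indexOf("?");
  const query = queryStart === -1 ? "" : url.slice(queryStart + 1).split("#")[0];

  // Decode "key=value" pairs into a plain object.
  const params: Record<string, string> = {};
  query.split("&").forEach((pair) => {
    if (pair === "") return;
    const eq = pair.indexOf("=");
    const key = decodeURIComponent(eq === -1 ? pair : pair.slice(0, eq));
    params[key] = eq === -1 ? "" : decodeURIComponent(pair.slice(eq + 1));
  });

  const has = (name: string): boolean =>
    Object.prototype.hasOwnProperty.call(params, name);
  const missing = SIGV4_PRESIGN_PARAMS.filter((name) => !has(name));

  // X-Amz-Date looks like 20230426T223400Z (UTC); X-Amz-Expires is a lifetime in seconds.
  let expiresAt: Date | null = null;
  const stamp = has("X-Amz-Date")
    ? /^(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})Z$/.exec(params["X-Amz-Date"])
    : null;
  const lifetime = has("X-Amz-Expires") ? params["X-Amz-Expires"] : "";
  if (stamp !== null && /^\d+$/.test(lifetime)) {
    const signedAt = Date.UTC(
      Number(stamp[1]), Number(stamp[2]) - 1, Number(stamp[3]),
      Number(stamp[4]), Number(stamp[5]), Number(stamp[6])
    );
    expiresAt = new Date(signedAt + Number(lifetime) * 1000);
  }

  return { presigned: missing.length === 0, missing, expiresAt };
}
```

If `presigned` comes back false for the URL in the error, the request was probably made with the caller's own credentials rather than a signed URL, which would be consistent with the host or service account needing its own S3 permissions.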

faint-smartphone-23356

04/27/2023, 3:55 PM

This'll be a bit opaque because I'm copying it out of our IaaC:

const flyteSnacksServiceAccounts = flyteProjectDomains.map((env) => {
  const namespace = new k8s.core.v1.Namespace(
    `${stack}-${env}-namespace`,
    {
      metadata: {
        name: `${env}`,
      },
    },
    { provider: k8sProvider }
  );

  // Since creating a namespace creates the service account; we're just
  // patching them.
  return new k8s.core.v1.ServiceAccountPatch(
    `${stack}-${env}-serviceAccount`,
    {
      metadata: {
        name: "default",
        namespace: namespace.metadata.name,
        annotations: {
          "eks.amazonaws.com/role-arn": s3Role.arn,
        },
      },
    },
    { provider: k8sProvider }
  );
});

• for each namespace in flytesnacks-[development,staging,production]
  ◦ patch the default service account with the ARN of the s3Role

Without that we get permission failures.

faint-smartphone-23356

04/27/2023, 3:58 PM

flyte v1.3.0 is installed fwiw

faint-smartphone-23356

05/10/2023, 3:55 PM

With flytebackend 1.5.0 I can confirm that: a) the error is still encountered, and b) the service account the workload runs under needs s3 permissions to the data bucket, otherwise it gets auth issues.

faint-smartphone-23356

05/10/2023, 3:57 PM

I’ll review the (seemingly rather wonderful) deployment docs https://flyte-org.slack.com/archives/C01P3B761A6/p1683574336171479 and see if there’s something obvious, or not so obvious, that is causing us to need to set s3 permissions for the default service account, i.e. why signed urls aren’t happening.
