Hello guys, can someone please show me a template...
# ask-the-community
f
Hello guys, can someone please show me a template of a working AWS S3 config, and how to connect it to FlyteRemote? Flyte doesn't seem to pick up my configuration.
s
Could you share your FlyteRemote config?
f
"""Flyte Remote"""

from flytekit.remote import FlyteRemote # type: ignore
from flytekit.configuration import ( # type: ignore
    Config,
    SerializationSettings,
    ImageConfig,
    PlatformConfig,
    S3Config,
    DataConfig,
)

IMAGE_STR = "docker_user/repo_name:tag"

image_config = ImageConfig.auto(img_name=IMAGE_STR)

s3_config = S3Config(
    endpoint="localhost:8089",  # I used the Kubernetes endpoint
    retries=3,
    access_key_id="AKIA0123458DKF",  # fake ID
    secret_access_key="0dhjheedh38ahdjaleurebad938ud",  # fake key
)

remote = FlyteRemote(
    config=Config(
        platform=PlatformConfig(
            endpoint="localhost:8089",  # cluster DNS
            insecure=True,
            insecure_skip_verify=True,
        ),
        data_config=DataConfig(s3=s3_config),
    )
)
@Samhita Alla
s
Are you seeing any error?
f
Yes. This is the error I keep getting during a workflow run. https://flyte-org.slack.com/archives/CP2HDHKE1/p1677681711120149
Is the endpoint I used in the S3Config correct? Or should it be the ARN of the S3 bucket?
I tried that too, though, and the same error occurred.
s
I'm not sure why that error's cropping up. Could you please go through this thread? https://flyte-org.slack.com/archives/CP2HDHKE1/p1674091420479589?thread_ts=1674085228.291519&cid=CP2HDHKE1
How have you deployed Flyte? What are your admin and propeller versions?
f
OK, I will check the thread out. How do I check the admin and propeller versions? I deployed Flyte on the cluster with Helm, following this guide: https://docs.flyte.org/en/latest/deployment/deployment/cloud_simple.html
s
Oh nm. Shouldn't the endpoint be 8088?
Also, I don't think you need to provide an S3 config in FlyteRemote if you configure it correctly in the eks-starter.yaml file.
f
I think so too; I suspect there is a problem with my eks-starter. What type of ARN role do I pass to this key: eks.amazonaws.com/role-arn? Should I create a role that has a policy to manage any resource in the cluster and can also PassRole?
8088 throws a different error. It seems that's for the GUI and 8089 is for the backend. Or am I missing something?
s
"What type of ARN role do I pass to this key: eks.amazonaws.com/role-arn? Should I create a role that has a policy to manage any resource in the cluster and can also PassRole?"
That should do.
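Separately from which permission policies the role carries, the role referenced by the eks.amazonaws.com/role-arn service-account annotation is assumed through IAM Roles for Service Accounts (IRSA), so it also needs a trust policy that names the cluster's OIDC provider and the service account. The sketch below builds such a trust policy with the standard library. Every value in the example call (account ID, region, OIDC ID, service-account name) is a hypothetical placeholder; only the flyte namespace comes from this thread.

```python
import json


def irsa_trust_policy(account_id: str, region: str, oidc_id: str,
                      namespace: str, service_account: str) -> dict:
    """Build an IAM trust policy that lets one Kubernetes service account
    assume the role through the EKS cluster's OIDC provider (IRSA)."""
    issuer = f"oidc.eks.{region}.amazonaws.com/id/{oidc_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Only this namespace/service account may assume the role.
                        f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{issuer}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical placeholders: substitute your own account, region,
    # OIDC provider ID and the service account the Flyte pod runs as.
    policy = irsa_trust_policy("123456789012", "us-east-1", "EXAMPLEOIDCID",
                               "flyte", "my-flyte-sa")
    print(json.dumps(policy, indent=2))
```

The StringEquals condition on the :sub claim is what restricts the role to a single service account; leaving it out would let any service account in the cluster assume the role.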
f
Ok, I will try that now. Thank you very much.
s
Let me know if you're seeing any issues!
f
Ok, I will.
Hello @Samhita Alla, I am still getting the same error despite creating the role as described above.
s
So you aren't using s3config in FlyteRemote?
f
So I added a full-access EKS policy and a full-access S3 policy to the role I created for flyte-admin. The S3Config is still in FlyteRemote, though; let me remove it and try again.
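A full-access S3 policy is fine for debugging; if you later want something narrower, a policy scoped to the Flyte bucket could look like the sketch below. This is an assumption, not the policy from the Flyte docs: the bucket name is hypothetical and the action list is a common minimal set that you may need to extend. It includes s3:ListBucket on purpose: without it, S3 answers a request for a key that does not exist with 403 Forbidden rather than 404 Not Found.

```python
import json


def flyte_bucket_policy(bucket: str) -> dict:
    """Sketch of an IAM permission policy limited to a single S3 bucket.

    Bucket-level actions (ListBucket, GetBucketLocation) apply to the bucket
    ARN itself; object-level actions apply to the keys under it ("/*").
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "my-flyte-bucket" is hypothetical; use the bucket set in eks-starter.yaml.
    print(json.dumps(flyte_bucket_policy("my-flyte-bucket"), indent=2))
```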
What is the meaning of <USER_DATA_BUCKET_NAME> in the eks-starter.yaml template? What should its value be?
s
It's the S3 bucket.
f
OK, I used the same S3 bucket as the value for that and for metadataContainer.
s
As per my understanding, metadataContainer is for storing Flyte metadata and userDataContainer is for storing the actual data.
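The rendered configmap posted later in this thread (003-storage.yaml) suggests how the two values are used. The sketch below assumes the chart maps metadataContainer to storage.container and userDataContainer to propeller's rawoutput-prefix; that is consistent with 003-storage.yaml but cannot be confirmed from it, since both values point at the same bucket there. The returned dict is abridged (it omits disable_ssl and v2_signing) and is illustrative only.

```python
def render_storage_section(metadata_container: str, user_data_container: str,
                           region: str) -> dict:
    """Hypothetical mapping of the chart's two bucket values onto the rendered
    Flyte storage config (abridged; compare 003-storage.yaml)."""
    return {
        "propeller": {
            # Offloaded task data (the "actual data") is written under this prefix.
            "rawoutput-prefix": f"s3://{user_data_container}/data",
        },
        "storage": {
            "type": "stow",
            "stow": {
                "kind": "s3",
                "config": {"region": region, "auth_type": "iam"},
            },
            # Flyte metadata, e.g. the metadata/propeller/... keys, lives in this bucket.
            "container": metadata_container,
        },
    }


# Using one bucket for both values, as in this thread, reproduces 003-storage.yaml.
rendered = render_storage_section("bucket_name", "bucket_name", "aws-region")
print(rendered["propeller"]["rawoutput-prefix"])  # s3://bucket_name/data
print(rendered["storage"]["container"])           # bucket_name
```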
f
OK, I think it's doing that; I can see these outputs in S3. Looking further at the error output, it seems to be looking for a file in storage that it can't find. It says:
caused by: path:metadata/propeller/flexs-app-development-f72e5a548840e46ef8f1/n0/data/0/futures.pb: getItem, getting the object: Forbidden: Forbidden
Looking at the file path on S3, the file is called metadata/propeller/flexs-app-development-f72e5a548840e46ef8f1/n0/data/0/outputs.pb and not metadata/propeller/flexs-app-development-f72e5a548840e46ef8f1/n0/data/0/futures.pb. Is there a mistake somewhere?
s
cc @Yee @Dan Rammer (hamersaw)
Hey @Yee! Do you have any idea how to fix this?
k
This looks like an error in the deployment.
Hard to know
y
@Fhuad Balogun, can you paste the full error message for the "caused by" error?
This looks like a storage misconfiguration issue.
f
Thanks guys, I really need to get past this stage. It's been a frustrating 2 weeks. This is the full output of the error:
Workflow[flexs-app:development:flexs_app.workflows.train_wf] failed. RuntimeExecutionError: max number of system retry attempts [11/10] exhausted. Last known status message: failed at Node[n0]. RuntimeExecutionError: failed during plugin execution, caused by: failed to check existence of futures file: [User] Failed to do HEAD on futures file., caused by: path:metadata/propeller/flexs-app-development-faa05ff0ba4174952b2f/n0/data/0/futures.pb: getItem, getting the object: Forbidden: Forbidden
	status code: 403, request id: ZCJRYJ394HERSEE, host id: dyfcQi1d00vsYsCm
y
Can you paste us:
1. the Helm chart name and version
2. kubectl -n flyte get configmap (or whatever your namespace for Flyte is)
   a. a YAML dump of each configmap (redact any sensitive information, of course)
Also, remind me how this is deployed? This is just on an EKS cluster, right?
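Helm stamps a helm.sh/chart label of the form <name>-<version> on the resources it manages, so the chart name and version in item 1 can be read off any Flyte configmap; helm list -n flyte also shows them directly in its CHART and APP VERSION columns. The helpers below are a hypothetical sketch that parses the Labels: block of kubectl describe output, using labels copied from the configmap dump in this thread.

```python
def parse_labels(describe_output: str) -> dict:
    """Collect key=value pairs from the Labels: block of `kubectl describe` output."""
    labels = {}
    in_labels = False
    for line in describe_output.splitlines():
        if line.startswith("Labels:"):
            in_labels = True
            line = line[len("Labels:"):]
        elif in_labels and line and not line[0].isspace():
            break  # the next top-level field (e.g. Annotations:) ends the block
        if in_labels:
            entry = line.strip()
            if "=" in entry:
                key, value = entry.split("=", 1)
                labels[key] = value
    return labels


def split_chart_label(chart: str) -> tuple:
    """Split a Helm chart label of the form <name>-<version> at its last hyphen."""
    name, version = chart.rsplit("-", 1)
    return name, version


SAMPLE = """Name:         flyte-backend-flyte-binary-config
Namespace:    flyte
Labels:       app.kubernetes.io/instance=flyte-backend
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=flyte-binary
              app.kubernetes.io/version=1.16.0
              helm.sh/chart=flyte-binary-v1.3.0
Annotations:  meta.helm.sh/release-name: flyte-backend
"""

labels = parse_labels(SAMPLE)
print(split_chart_label(labels["helm.sh/chart"]))  # ('flyte-binary', 'v1.3.0')
```

Splitting at the last hyphen assumes the version itself contains no hyphen; a pre-release version such as 1.3.0-beta would be split incorrectly.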
f
Sorry, I've been busy with something else today. These are the contents of the three configmaps found in the flyte namespace:
1. flyte-backend-flyte-binary-cluster-resource-templates.yaml
Name:         flyte-backend-flyte-binary-cluster-resource-templates
Namespace:    flyte
Labels:       app.kubernetes.io/instance=flyte-backend
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=flyte-binary
              app.kubernetes.io/version=1.16.0
              helm.sh/chart=flyte-binary-v1.3.0
Annotations:  meta.helm.sh/release-name: flyte-backend
              meta.helm.sh/release-namespace: flyte

Data
====
namespace.yaml:
----
apiVersion: v1
kind: Namespace
metadata:
  name: '{{ namespace }}'


BinaryData
====

Events:  <none>
2. flyte-backend-flyte-binary-config.yaml
Name:         flyte-backend-flyte-binary-config
Namespace:    flyte
Labels:       app.kubernetes.io/instance=flyte-backend
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=flyte-binary
              app.kubernetes.io/version=1.16.0
              helm.sh/chart=flyte-binary-v1.3.0
Annotations:  meta.helm.sh/release-name: flyte-backend
              meta.helm.sh/release-namespace: flyte

Data
====
001-plugins.yaml:
----
tasks:
  task-plugins:
    enabled-plugins:
      - container
      - sidecar
      - K8S-ARRAY
    default-for-task-types:
      - container: container
      - container_array: K8S-ARRAY
plugins:
  logs:
    kubernetes-enabled: false
    cloudwatch-enabled: false
    stackdriver-enabled: false
  k8s-array:
    logs:
      config:
        kubernetes-enabled: false
        cloudwatch-enabled: false
        stackdriver-enabled: false

002-database.yaml:
----
database:
  postgres:
    username: dbname
    passwordPath: /var/run/secrets/flyte/db-pass
    host: aurora-rds-host-dns
    port: 5432
    dbname: flyteadmin
    options: "sslmode=disable"

003-storage.yaml:
----
propeller:
  rawoutput-prefix: s3://bucket_name/data
storage:
  type: stow
  stow:
    kind: s3
    config:
      region: aws-region
      disable_ssl: false
      v2_signing: false
      auth_type: iam
  container: bucket_name

010-inline-config.yaml:
----
plugins:
  k8s:
    default-env-vars:
    - AWS_METADATA_SERVICE_TIMEOUT: 5
    - AWS_METADATA_SERVICE_NUM_ATTEMPTS: 20
    inject-finalizer: true
storage:
  cache:
    max_size_mbs: 100
    target_gc_percent: 100

000-core.yaml:
----
admin:
  endpoint: localhost:8089
  insecure: true
catalog-cache:
  endpoint: localhost:8081
  insecure: true
  type: datacatalog
cluster_resources:
  standaloneDeployment: false
  templatePath: /etc/flyte/cluster-resource-templates
logger:
  show-source: true
  level: 1
propeller:
  create-flyteworkflow-crd: true
webhook:
  certDir: /var/run/flyte/certs
  localCert: true
  secretName: flyte-backend-flyte-binary-webhook-secret
  serviceName: flyte-backend-flyte-binary-webhook
  servicePort: 443


BinaryData
====

Events:  <none>
3. kube-root-ca.crt.yaml:
Name:         kube-root-ca.crt
Namespace:    flyte
Labels:       <none>
Annotations:  kubernetes.io/description:
                Contains a CA bundle that can be used to verify the kube-apiserver when using internal endpoints such as the internal service IP or kubern...

Data
====
ca.crt:
----
-----BEGIN CERTIFICATE-----
redacted
-----END CERTIFICATE-----


BinaryData
====

Events:  <none>
NB: sensitive info has been redacted as advised. Thank you @Yee
y
Yeah, looking at this I don't see anything wrong with the storage sections. To answer your question about FlyteRemote, though, it doesn't take much configuration. You just need a configuration file like:
$ cat ~/.flyte/dev.yaml
admin:
  endpoint: dns:///flytedev.company.net
  authType: Pkce
  insecure: false
and in Python you can do:
from flytekit.remote.remote import FlyteRemote
from flytekit.configuration import Config
r = FlyteRemote(
    Config.auto(config_file="/Users/ytong/.flyte/dev.yaml"),
    default_project="flytesnacks",
    default_domain="development",
)
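For the setup in this thread, where the backend is reached on localhost:8089 without TLS (the same values appear under admin: in 000-core.yaml), an equivalent config file could be generated as below. This is a sketch: writing to a temporary directory is only for illustration, and with flytekit installed you would pass the resulting path to Config.auto(config_file=...) as in the snippet above.

```python
import tempfile
from pathlib import Path


def render_flyte_config(endpoint: str, insecure: bool, auth_type: str = "") -> str:
    """Render a minimal admin: section shaped like the ~/.flyte/dev.yaml example."""
    lines = [
        "admin:",
        f"  endpoint: dns:///{endpoint}",
    ]
    if auth_type:  # e.g. "Pkce"; omitted for an unauthenticated local backend
        lines.append(f"  authType: {auth_type}")
    lines.append(f"  insecure: {'true' if insecure else 'false'}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Written to a temp dir here; normally this file would live under ~/.flyte/.
    out = Path(tempfile.mkdtemp()) / "config.yaml"
    out.write_text(render_flyte_config("localhost:8089", insecure=True))
    print(out.read_text())
```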
I will point out, though, that authentication to S3 is a separate thing; this will not give you S3 permissions.
You will need to set that up out of band (aws sso login or something).
If you have explicit S3 permissions, you can pass those in, yes.
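If you do have explicit S3 credentials, one way to avoid hard-coding keys as in the FlyteRemote snippet at the top of the thread is to read them from the standard AWS environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY). The helper below is a hypothetical sketch: it only assembles keyword arguments named after flytekit's S3Config fields and leaves endpoint unset unless you pass one. The endpoint field appears to be intended for an S3-compatible API URL such as a MinIO address, not a bucket ARN or the Flyte admin address.

```python
import os
from typing import Mapping, Optional


def s3_config_kwargs(endpoint: Optional[str] = None,
                     env: Mapping[str, str] = os.environ) -> dict:
    """Collect keyword arguments for flytekit's S3Config from the standard
    AWS credential environment variables, skipping any that are unset."""
    kwargs = {}
    if endpoint:
        kwargs["endpoint"] = endpoint
    if env.get("AWS_ACCESS_KEY_ID"):
        kwargs["access_key_id"] = env["AWS_ACCESS_KEY_ID"]
    if env.get("AWS_SECRET_ACCESS_KEY"):
        kwargs["secret_access_key"] = env["AWS_SECRET_ACCESS_KEY"]
    return kwargs


# Usage with flytekit installed (not executed here):
#   from flytekit.configuration import DataConfig, S3Config
#   data_config = DataConfig(s3=S3Config(**s3_config_kwargs()))
```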
However, I think this might be a red herring. Your initial error had something to do with futures files?
Can you show the dynamic task that generated them?
Inputs and outputs from normal (non-dynamic) tasks and the futures files for dynamic tasks end up in the same place,
so if other tasks are running properly, S3 access is likely not the issue.
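To make "the same place" concrete with the path from the error earlier in the thread: outputs.pb and futures.pb are siblings in the same node data directory, so the bucket and prefix are identical for both. The helper below only derives both keys from the failing key with posixpath; it is a sketch and says nothing about when Propeller writes or reads each file.

```python
import posixpath


def node_data_files(key: str) -> dict:
    """Given one object key inside a node's data directory, return the keys of
    the outputs and futures files that live alongside it."""
    data_dir = posixpath.dirname(key)
    return {
        "outputs": posixpath.join(data_dir, "outputs.pb"),
        "futures": posixpath.join(data_dir, "futures.pb"),
    }


failing = ("metadata/propeller/flexs-app-development-f72e5a548840e46ef8f1"
           "/n0/data/0/futures.pb")
for name, path in node_data_files(failing).items():
    print(f"{name}: {path}")
```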
f
Thank you @Yee, none of the tasks have run successfully remotely. They've all been aborted with the same issue. Below is the task that generated the error:
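from typing import List
import numpy as np
from flytekit import Resources, task
# Landscape comes from the user's own project; its import is not shown in the thread.

@task(requests=Resources(cpu="100m", mem="100Mi"), limits=Resources(cpu="200m", mem="500Mi"))
def get_scores(seqs: List[str]) -> np.ndarray:
    landscape = Landscape("landscape")
    scores = landscape.get_fitness(seqs=seqs)
    return scores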
One thing I have been looking at: could there be a mistake in how propeller names the file? It is named outputs.pb in S3, but propeller seems to be looking for futures.pb. Are both files meant to be created in that folder, or is only one created, and the error comes from the name given to the file?
y
These are different files.
futures.pb is very specific.
Can you paste more of the code, please?
A futures file is generated only by a @dynamic task.
If you don't have those, nothing should be looking for them and nothing should be writing them.
f
@Samhita Alla @Yee and @David Espejo (he/him), thank you guys so much for your help and concern. Everything works now. I had to completely destroy the cluster and other resources and start all over again, following the instructions in this archived link:
https://web.archive.org/web/20220926215904/https://docs.flyte.org/en/latest/deployment/aws/manual.html
which I got from this GitHub repo:
https://github.com/alexifm/flyte-eks-deployment
The propeller issue no longer comes up. Thank you guys once again, and have a great weekend.
d
Thanks for your feedback, @Fhuad Balogun, and sorry you had to go through so much to get it working. I'm also going through the process and hopefully will have it documented soon.
y
@Niels Bantilan food for thought for the old instructions.
Thanks for your patience, @Fhuad Balogun.