Hi I m trying to deploy flyte via k3s and was wondering if a Flyte #flyte-deployment

purple-father-70173

11/26/2024, 5:45 AM

Hi, I'm trying to deploy flyte via k3s and was wondering if anyone knows the correct configuration for the ingress of flyte-binary

purple-father-70173

11/26/2024, 5:47 AM

K3s uses traefik, and so far when I try to submit workflows all I get is 404 and 500 with "received http2 header with status". I can access the webui but still can't do anything with pyflyte remotely

purple-father-70173

11/26/2024, 5:48 AM

I'm following https://github.com/davidmirror-ops/flyte-the-hard-way/blob/main/docs/on-premises/microk8s/05-add-ingress-and-tls.md and so far all I have done is replace the ingress class with traefik

average-finland-92144

11/26/2024, 4:12 PM

@purple-father-70173 there are some previous experiences with Traefik, like this one. I think @kind-night-79286 also uses Traefik with on-prem K8s.

average-finland-92144

11/26/2024, 4:12 PM

It seems the most challenging part is the gRPC config, which is what the CLI uses.

purple-father-70173

11/26/2024, 4:13 PM

This has been my experience thus far, grpc config is not really an ootb experience

average-finland-92144

11/26/2024, 4:13 PM

can you share the ingress config you're using?

purple-father-70173

11/26/2024, 4:18 PM

ingress:
  create: true
  host: "{{ .Values.userSettings.hostName }}"
  separateGrpcIngress: true
  ingressClassName: traefik
  tls:
    - hosts:
        - "{{ .Values.userSettings.hostName }}"
      secretName: flytetls
  commonAnnotations:
    kubernetes.io/ingress.class: traefik
    external-dns.alpha.kubernetes.io/hostname: flyte.local
  httpAnnotations:
    nginx.ingress.kubernetes.io/app-root: /console
  grpcAnnotations:
    nginx.ingress.kubernetes.io/backend-protocol: GRPC

purple-father-70173

11/26/2024, 4:21 PM

`external-dns.alpha.kubernetes.io/hostname: flyte.local` is just me trying something

average-finland-92144

11/26/2024, 4:33 PM

I don't think nginx annotations will have any effect on Traefik's behavior. I faced a similar problem in the past and ended up installing k3s without Traefik and manually deploying nginx as the ingress controller. A good reference is: https://www.suse.com/support/kb/doc/?id=000020082

purple-father-70173

11/26/2024, 4:47 PM

Interesting, is there a way to replace traefik with nginx without reinstalling k3s?

purple-father-70173

11/26/2024, 4:58 PM

I tried the curl command in that issue you linked: `curl -v -k -X POST --http2 'https://flyte.local/grpc.health.v1.Health' -d "" -H 'Content-Type: application/grpc' -H 'Accept: application/grpc'` and I don't get the expected response; I instead get an `Internal Server Error`

average-finland-92144

11/26/2024, 5:00 PM

Is `flyte.local` being served by your ingress controller? If that's the case and you haven't changed anything, it's not going to work. To test without ingress, you can port-forward the flyteadmin service and then curl that endpoint.

purple-father-70173

11/26/2024, 5:12 PM

NAME                      CLASS     HOSTS         ADDRESS        PORTS     AGE
flyte-flyte-binary-grpc   traefik   flyte.local   192.168.1.80   80, 443   11h
flyte-flyte-binary-http   traefik   flyte.local   192.168.1.80   80, 443   15h

When port-forwarding the grpc endpoint I get:

$ curl -v -X POST --http2 'https://localhost:8089/grpc.health.v1.Health' -d "" -H 'Content-Type: application/grpc' -H 'Accept: application/grpc'
* Host localhost:8089 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8089...
* connect to ::1 port 8089 from ::1 port 58848 failed: Connection refused
*   Trying 127.0.0.1:8089...
* Connected to localhost (127.0.0.1) port 8089
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* OpenSSL/3.0.13: error:0A00010B:SSL routines::wrong version number
* Closing connection
curl: (35) OpenSSL/3.0.13: error:0A00010B:SSL routines::wrong version number

purple-father-70173

11/26/2024, 5:20 PM

I can curl the http endpoint just fine fyi

average-finland-92144

11/26/2024, 5:29 PM

but are you port-forwarding the ingress resource or the corresponding K8s service?

purple-father-70173

11/26/2024, 5:53 PM

Oh sorry, I misread. I was port-forwarding the svc.

purple-father-70173

11/26/2024, 5:56 PM

I'm port-forwarding the grpc service with `kubectl port-forward -n flyte svc/flyte-flyte-binary-grpc 8089:8089`

average-finland-92144

11/26/2024, 6:07 PM

OK. Well, it's failing on the SSL check, which I guess is expected. Not sure if you'd get a different result using `grpcurl`, but the point is that unless you have a well-defined set of routes and annotations for Traefik gRPC to work, it may be better to install nginx and use `ingressClassName: nginx` to make sure Ingress requests are fulfilled by that controller and not Traefik.
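
For anyone reproducing the port-forward test: OpenSSL's `wrong version number` usually means the peer answered the TLS handshake with bytes that are not TLS. That would be consistent with the port-forwarded Service speaking plaintext gRPC, although the thread does not confirm this. A minimal Python sketch to tell the two cases apart, using only the standard library; the function name `speaks_tls` is hypothetical, and certificate validation is deliberately switched off because only the protocol is being probed:

```python
import socket
import ssl


def speaks_tls(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TLS handshake completes on host:port.

    Returns False when the TCP connection is accepted but the handshake
    fails, which is what a plaintext (non-TLS) port typically produces.
    Raises OSError if the TCP connection itself cannot be established.
    """
    context = ssl.create_default_context()
    # Probe only: do not validate the certificate or the hostname.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    with socket.create_connection((host, port), timeout=timeout) as raw:
        try:
            with context.wrap_socket(raw, server_hostname=host):
                return True
        except OSError:
            # ssl.SSLError is a subclass of OSError, so handshake failures,
            # resets, and timeouts during the handshake all land here.
            return False
```

With the port-forward from this thread running, `speaks_tls("localhost", 8089)` returning `False` would suggest retrying the curl test with `http://` instead of `https://`.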

purple-father-70173

11/26/2024, 6:17 PM

Good to know, thanks David!

purple-father-70173

11/26/2024, 6:28 PM

Okay, I've deployed the ingress controller and now I'm using that class for flyte:

NAME                      CLASS   HOSTS         ADDRESS        PORTS     AGE
flyte-flyte-binary-grpc   nginx   flyte.local   192.168.1.83   80, 443   13h
flyte-flyte-binary-http   nginx   flyte.local   192.168.1.83   80, 443   16h

purple-father-70173

11/26/2024, 6:30 PM

Okay, now that curl command shows up as a

average-finland-92144

11/26/2024, 6:31 PM

cool, what about `pyflyte`?

purple-father-70173

11/26/2024, 6:33 PM

running the demo, I'm getting:

MaxRetryError: HTTPConnectionPool(host='flyte.local', port=30084): Max retries exceeded with url: /flyte/flytesnacks/development/XKZHA3QUW6N2XSH5PLMKL22K5U%3D%3D%3D%3D%3D%3D/fast24e3400da62c0949fed85ce3cb777998.tar.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=minio%2F20241126%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20241126T183153Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=content-md5%3Bhost%3Bx-amz-meta-flytecontentmd5&X-Amz-Signature=a147ae265522fb16b2d4992d75fdcd0b06596547158f19fe546ae3ce0b260e83 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f371c13e660>: Failed to establish a new connection: [Errno 113] No route to host'))

This is my config:

admin:
  endpoint: dns:///flyte.local
  insecure: false
  caCertFilePath: /home/titsw/.flyte/ca.crt

average-finland-92144

11/26/2024, 7:02 PM

looks like `flyte.local` is not resolving on the DNS server

purple-father-70173

11/26/2024, 7:06 PM

I wonder if this has something to do with me not having a dedicated DNS server on my network. I'm just modifying coredns on k3s:

Corefile: |
    .:53 {
        errors
        health
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        hosts /etc/coredns/NodeHosts {
          ttl 60
          reload 15s
          192.168.1.83 flyte.local
          fallthrough
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
        import /etc/coredns/custom/*.override
    }
    import /etc/coredns/custom/*.server

purple-father-70173

11/26/2024, 7:14 PM

I moved Flyte to its own route, looks like I did this step incorrectly:

Corefile: |-
    .:53 {
        errors
        health
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        hosts /etc/coredns/NodeHosts {
          ttl 60
          reload 15s
          fallthrough
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
        import /etc/coredns/custom/*.override
    }
    import /etc/coredns/custom/*.server

    flyte.local:53 {
        errors
        hosts {
          192.168.1.83 flyte.local
        }
    }

purple-father-70173

11/26/2024, 7:44 PM

Okay, so I can `nslookup` flyte.local no problem, but when I try to ping flyte.local I get:

From <k3s-agent-node-ip-address> icmp_seq=1 Destination Host Unreachable
...
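
A successful `nslookup` only shows that the name resolves, and the `Destination Host Unreachable` reply suggests the resolved address itself is not reachable from this machine. A minimal Python sketch that separates the two questions, using only the standard library. The helper names `resolve`, `can_connect`, and `report` are hypothetical, and ports 80 and 443 are taken from the ingress listing earlier in the thread:

```python
import socket


def resolve(host: str) -> list[str]:
    """Return the addresses the OS resolver gives for host, sorted and de-duplicated."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and "No route to host".
        return False


def report(host: str, ports: tuple[int, ...] = (80, 443)) -> None:
    """Print what host resolves to and whether each port accepts TCP connections."""
    print(f"{host} resolves to {resolve(host)}")
    for port in ports:
        state = "reachable" if can_connect(host, port) else "unreachable"
        print(f"  tcp/{port}: {state}")
```

Calling `report("flyte.local")` from the machine that runs `pyflyte` shows whether the problem is name resolution or routing to the resolved address.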

average-finland-92144

11/26/2024, 7:51 PM

hm, maybe your OS is trying to resolve from another server first?

purple-father-70173

11/26/2024, 7:52 PM

yeah, this is definitely some routing issue in my network. I'm looking into it

purple-father-70173

11/26/2024, 8:31 PM

Is there a reason why it's using port 30084?

average-finland-92144

11/26/2024, 8:40 PM

uh, have you used `flytectl demo` before?

purple-father-70173

11/26/2024, 8:41 PM

I have not

average-finland-92144

11/26/2024, 8:42 PM

`30084` is the port used by that type of instance. In any case, you can instruct the CLI to use your config file:

export FLYTECTL_CONFIG=$HOME/.flyte/config.yaml

and validate that your `config.yaml` points to your Flyte cluster, like

cat $HOME/.flyte/config.yaml
admin:
  endpoint: dns:///flyte.local
  authType: Pkce
  insecure: false

purple-father-70173

11/26/2024, 8:44 PM

I've set the config file to point to the correct endpoint, here's a debug output:

12:43:42.369408 DEBUG    plugin.py:68 - Creating remote with config Config(platform=PlatformConfig(endpoint='flyte.local', insecure=False, insecure_skip_verify=False, ca_cert_file_path='/home/titsw/.flyte/ca.crt', console_endpoint=None, command=None, proxy_command=None, client_id=None,                        
                         client_credentials_secret=None, scopes=[], auth_mode='Pkce', audience=None, rpc_retries=3, http_proxy_url=None), secrets=SecretsConfig(env_prefix='_FSEC_', default_dir='/etc/secrets', file_prefix=''), stats=StatsConfig(host='localhost', port=8125, disabled=False, disabled_tags=False),
                         data_config=DataConfig(s3=S3Config(enable_debug=False, endpoint=None, retries=3, backoff=datetime.timedelta(seconds=5), access_key_id=None, secret_access_key=None), gcs=GCSConfig(gsutil_parallelism=False), azure=AzureBlobStorageConfig(account_name=None, account_key=None,              
                         tenant_id=None, client_id=None, client_secret=None), generic=GenericPersistenceConfig(attach_execution_metadata=True)), local_sandbox_path='/tmp/flytestlcs8kp')                                                                                                                             
12:43:42.372883 DEBUG    run.py:588 - Running workflow demo.wf with input {'inputs_file': None}

purple-father-70173

11/26/2024, 8:44 PM

The issue I'm seeing is that the only ports `flyte.local` allows are `80` and `443`

purple-father-70173

11/26/2024, 9:07 PM

`30084` is also the nodeport for minio on this cluster. Is that related?

purple-father-70173

11/26/2024, 9:18 PM

Okay, after observing the error more carefully, the issue seems to be that pyflyte is wrongly trying to use `flyte.local` to connect to minio, which is not under my ingress
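
One way to confirm where the upload is being sent is to parse the signed URL from the error message. The sketch below uses only the Python standard library. The function name `upload_target` is hypothetical, and the URL is a shortened, illustrative version of the one in the error above; the `http` scheme is assumed from `HTTPConnectionPool`.

```python
from urllib.parse import parse_qs, urlsplit


def upload_target(signed_url: str) -> dict:
    """Extract the scheme, host, port, and credential scope from a signed URL."""
    parts = urlsplit(signed_url)
    query = parse_qs(parts.query)  # also percent-decodes the values
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "credential": query.get("X-Amz-Credential", [None])[0],
    }


# Shortened stand-in for the URL in the MaxRetryError above (path abbreviated).
example = (
    "http://flyte.local:30084/flyte/flytesnacks/development/fast.tar.gz"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Credential=minio%2F20241126%2Fus-east-1%2Fs3%2Faws4_request"
)

print(upload_target(example))
```

The host and port in the result match the `endpoint` under `providerConfig.s3` in the Helm values shared later in the thread. That would be consistent with the signed URL being built on the server side rather than from the client's `config.yaml`.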

purple-father-70173

11/26/2024, 9:20 PM

I added a storage configuration to my config.yaml file:

storage:
  connection:
    endpoint: http://192.168.1.41:30084
    access-key: minio
    secret-key: miniostorage

However, I still get this error, `HTTPConnectionPool(host='flyte.local', port=30084): Max retries exceeded with url`, so something in my configuration is wrong

average-finland-92144

11/26/2024, 9:34 PM

Can you share your Helm values?

purple-father-70173

11/26/2024, 9:35 PM

Sure, these are my values in argocd:

userSettings:
  hostName: flyte.local
configuration:
  logging:
    level: 1
  database:
    username: postgres
    password: postgres
    host: postgres.flyte
    dbname: flyte
  storage:
    type: minio
    metadataContainer: flyte
    userDataContainer: flyte
    provider: s3
    providerConfig:
      s3:
        region: "us-east-1" #Irrelevant for local but still needed
        authType: "accesskey"
        endpoint: "http://{{ .Values.userSettings.hostName }}:30084"
        accessKey: "minio"
        secretKey: "miniostorage"
        disableSSL: "true"
        secure: "false"  
  inline:
    plugins:
      k8s:
        inject-finalizer: true
        default-env-vars:
          - FLYTE_AWS_ENDPOINT: "http://{{ .Values.userSettings.hostName }}:30084"
          - FLYTE_AWS_ACCESS_KEY_ID: "minio"
          - FLYTE_AWS_SECRET_ACCESS_KEY: "miniostorage"
    task_resources:
      defaults: 
        cpu: 1000m
        memory: 500Mi #change default requested resources and limits to fit your needs
      limits:
        cpu: 2000m
        memory: 2000Mi
    storage:
      cache:
        max_size_mbs: 100
        target_gc_percent: 100
ingress:
  create: true
  host: "{{ .Values.userSettings.hostName }}"
  separateGrpcIngress: true
  ingressClassName: nginx
  tls:
    - hosts:
        - "{{ .Values.userSettings.hostName }}"
      secretName: flytetls
  commonAnnotations:
    kubernetes.io/ingress.class: public
  httpAnnotations:
    nginx.ingress.kubernetes.io/app-root: /console
  grpcAnnotations:
    nginx.ingress.kubernetes.io/backend-protocol: GRPC

average-finland-92144

11/26/2024, 9:36 PM

So there you have an endpoint pointing to 30084

purple-father-70173

11/26/2024, 9:36 PM

according to my values file I should

purple-father-70173

11/26/2024, 9:37 PM

I'm following the main branch of https://github.com/davidmirror-ops/flyte-the-hard-way/tree/main/docs/on-premises/microk8s, so I deployed minio/postgres via https://github.com/davidmirror-ops/flyte-the-hard-way/blob/main/docs/on-premises/microk8s/manifests/local-flyte-resources.yaml

average-finland-92144

11/26/2024, 9:38 PM

Ah right, but you should point to 9000 instead of 30084

purple-father-70173

11/26/2024, 9:41 PM

So to clarify, replace all instances of 30084 with 9000 in this values file?

purple-father-70173

11/26/2024, 9:45 PM

Well, I tried that, and I'm getting the same error, different port:

HTTPConnectionPool(host='flyte.local', port=9000): Max retries exceeded with url

purple-father-70173

11/26/2024, 9:46 PM

let me change the values to reflect the actual endpoint...

purple-father-70173

11/26/2024, 9:50 PM

Changing the s3 endpoint values to the actual nodeport endpoint worked!

purple-father-70173

11/26/2024, 9:51 PM

So two weird things:
1. How do I change the default storage configuration in my config? What I posted above was reflected in the `Config` object, but didn't actually change execution.
2. What is the reason for `flyte-the-hard-way` to map minio to `flyte.local`?

purple-father-70173

11/26/2024, 9:56 PM

Also, the demo.py failed with `ModuleNotFoundError: No module named 'pandas'`
