Hi, I'm trying to deploy flyte via k3s and was won...
# flyte-deployment
p
Hi, I'm trying to deploy flyte via k3s and was wondering if anyone knows the correct configuration for the ingress of flyte-binary
K3s uses traefik, and so far when I try to submit workflows all I get is 404 and 500 with "received http2 header with status". I can access the webui but still can't do anything with pyflyte remotely
I'm following https://github.com/davidmirror-ops/flyte-the-hard-way/blob/main/docs/on-premises/microk8s/05-add-ingress-and-tls.md and so far all I have done is replace the ingress class with traefik
a
@purple-father-70173 there are some previous experiences with Traefik, like this. I think @kind-night-79286 also uses Traefik with on-prem K8s
it seems the most challenging part is the gRPC config, which is what the CLI uses
p
This has been my experience thus far, grpc config is not really an ootb experience
a
can you share the ingress config you're using?
p
ingress:
  create: true
  host: "{{ .Values.userSettings.hostName }}"
  separateGrpcIngress: true
  ingressClassName: traefik
  tls:
    - hosts:
        - "{{ .Values.userSettings.hostName }}"
      secretName: flytetls
  commonAnnotations:
    kubernetes.io/ingress.class: traefik
    external-dns.alpha.kubernetes.io/hostname: flyte.local
  httpAnnotations:
    nginx.ingress.kubernetes.io/app-root: /console
  grpcAnnotations:
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
`external-dns.alpha.kubernetes.io/hostname: flyte.local` is just me trying something
a
I don't think nginx annotations will have any effect on Traefik behavior. I faced a similar problem in the past and ended up installing k3s without Traefik and manually deploying nginx as the ingress controller. A good reference is: https://www.suse.com/support/kb/doc/?id=000020082
p
Interesting, is there a way to replace traefik with nginx without reinstalling k3s?
I tried the curl command in that issue you linked: `curl -v -k -X POST --http2 'https://flyte.local/grpc.health.v1.Health' -d "" -H 'Content-Type: application/grpc' -H 'Accept: application/grpc'` and I don't get a 200; I get a 500 Internal Server Error instead
a
is flyte.local being served by your ingress controller? If that's the case and you haven't changed anything, it's not going to work. To test without ingress you can do a port-forward of the flyteadmin service and then curl that endpoint
p
NAME                      CLASS     HOSTS         ADDRESS        PORTS     AGE
flyte-flyte-binary-grpc   traefik   flyte.local   192.168.1.80   80, 443   11h
flyte-flyte-binary-http   traefik   flyte.local   192.168.1.80   80, 443   15h
When port-forwarding the grpc endpoint I get:
$ curl -v -X POST --http2 'https://localhost:8089/grpc.health.v1.Health' -d "" -H 'Content-Type: application/grpc' -H 'Accept: application/grpc'
* Host localhost:8089 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8089...
* connect to ::1 port 8089 from ::1 port 58848 failed: Connection refused
*   Trying 127.0.0.1:8089...
* Connected to localhost (127.0.0.1) port 8089
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* OpenSSL/3.0.13: error:0A00010B:SSL routines::wrong version number
* Closing connection
curl: (35) OpenSSL/3.0.13: error:0A00010B:SSL routines::wrong version number
I can curl the http endpoint just fine fyi
a
but are you port-forwarding the ingress resource or the corresponding K8s service?
p
Oh sorry, I misread. I was port-forwarding the svc
I'm port-forwarding the grpc service with
kubectl port-forward -n flyte svc/flyte-flyte-binary-grpc 8089:8089
a
ok. well, it's failing on the SSL check, which I guess is expected. Not sure if you'd get a different result using grpcurl, but the point is that unless you have a well-defined set of routes and annotations for Traefik gRPC to work, it may be better to install nginx and use `ingressClassName: nginx` to make sure Ingress requests are fulfilled by that controller and not Traefik
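A note on the `wrong version number` error above: OpenSSL typically reports it when a TLS client talks to a port that answers in plaintext. That would be consistent with the port-forwarded gRPC service not terminating TLS itself, though the thread doesn't confirm how the service is configured. Here is a minimal stdlib sketch to tell the two cases apart; `speaks_tls` is a hypothetical helper, not part of any Flyte tooling.

```python
import socket
import ssl


def speaks_tls(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TLS handshake succeeds on host:port, False otherwise.

    Raises OSError (e.g. ConnectionRefusedError) if the TCP connection itself fails.
    """
    # Certificate checks are disabled on purpose: only the protocol is probed here.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    # A failure here propagates to the caller: nothing is listening or reachable.
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        with context.wrap_socket(raw, server_hostname=host):
            return True
    except (ssl.SSLError, OSError):
        # Handshake failed: the peer sent non-TLS bytes, hung up, or timed out.
        return False
    finally:
        raw.close()
```

If `speaks_tls("localhost", 8089)` returns `False` while the port-forward is active, the service is probably serving plaintext on that port, and a plaintext client (for example grpcurl with its `-plaintext` flag) may be a better test than `https://`.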
p
Good to know, thanks David!
Okay, I've deployed the ingress controller and now I'm using that class for flyte:
NAME                      CLASS   HOSTS         ADDRESS        PORTS     AGE
flyte-flyte-binary-grpc   nginx   flyte.local   192.168.1.83   80, 443   13h
flyte-flyte-binary-http   nginx   flyte.local   192.168.1.83   80, 443   16h
Okay, now that curl command shows up as a 200
a
cool, what about pyflyte?
p
running the demo, I'm getting:
MaxRetryError: HTTPConnectionPool(host='flyte.local', port=30084): Max retries exceeded with url: /flyte/flytesnacks/development/XKZHA3QUW6N2XSH5PLMKL22K5U%3D%3D%3D%3D%3D%3D/fast24e3400da62c0949fed85ce3cb777998.tar.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=minio%2F20241126%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20241126T183153Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=content-md5%3Bhost%3Bx-amz-meta-flytecontentmd5&X-Amz-Signature=a147ae265522fb16b2d4992d75fdcd0b06596547158f19fe546ae3ce0b260e83 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f371c13e660>: Failed to establish a new connection: [Errno 113] No route to host'))
This is my config:
admin:
  endpoint: dns:///flyte.local
  insecure: false
  caCertFilePath: /home/titsw/.flyte/ca.crt
a
looks like flyte.local is not resolving on the DNS server
p
I wonder if this has something to do with me not having a dedicated DNS server on my network. I'm just modifying coredns on k3s:
Corefile: |
    .:53 {
        errors
        health
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        hosts /etc/coredns/NodeHosts {
          ttl 60
          reload 15s
          192.168.1.83 flyte.local
          fallthrough
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
        import /etc/coredns/custom/*.override
    }
    import /etc/coredns/custom/*.server
I moved Flyte to its own route, looks like I did this step incorrectly:
Corefile: |-
    .:53 {
        errors
        health
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        hosts /etc/coredns/NodeHosts {
          ttl 60
          reload 15s
          fallthrough
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
        import /etc/coredns/custom/*.override
    }
    import /etc/coredns/custom/*.server

    flyte.local:53 {
        errors
        hosts {
          192.168.1.83 flyte.local
        }
    }
Okay, so I can nslookup flyte.local no problem, but when I try to ping flyte.local I get:
From <k3s-agent-node-ip-address> icmp_seq=1 Destination Host Unreachable
...
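A note on the two symptoms: `nslookup` talks to a DNS server directly, while `ping` and pyflyte's HTTP client first resolve the name through the system resolver and then need a working route to the resulting IP. `Destination Host Unreachable` and `[Errno 113] No route to host` both suggest the name resolved but the address could not be reached. The stdlib sketch below checks the two steps separately; `resolve` and `can_connect` are hypothetical helper names.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the unique IP addresses the system resolver gives for hostname.

    Raises socket.gaierror if the name does not resolve at all.
    """
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def can_connect(host: str, port: int, timeout: float = 3.0) -> tuple[bool, str]:
    """Try a TCP connection to host:port and report (success, reason)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        # Covers "no route to host", "connection refused", timeouts and,
        # if resolve() was not called first, name-resolution failures too.
        return False, f"{type(exc).__name__}: {exc}"
```

Calling `resolve("flyte.local")` first and then `can_connect` on each returned address with ports 443 and 30084 would show whether the failure is in name resolution, in routing, or only on one port.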
a
hm, maybe your OS is trying to resolve from another server first?
p
yeah, this is definitely some routing issue in my network. I'm looking into it
Is there a reason why it's using port 30084?
a
uh, have you used flytectl demo before?
p
I have not
a
30084 is the port used by that type of instance. In any case you can instruct the CLI to use your config file:
export FLYTECTL_CONFIG=$HOME/.flyte/config.yaml
and validate that your config.yaml points to your Flyte cluster, like:
cat $HOME/.flyte/config.yaml
admin:
  endpoint: dns:///flyte.local
  authType: Pkce
  insecure: false
p
I've set the config file to point to the correct endpoint, here's a debug output:
12:43:42.369408 DEBUG    plugin.py:68 - Creating remote with config Config(platform=PlatformConfig(endpoint='flyte.local', insecure=False, insecure_skip_verify=False, ca_cert_file_path='/home/titsw/.flyte/ca.crt', console_endpoint=None, command=None, proxy_command=None, client_id=None,                        
                         client_credentials_secret=None, scopes=[], auth_mode='Pkce', audience=None, rpc_retries=3, http_proxy_url=None), secrets=SecretsConfig(env_prefix='_FSEC_', default_dir='/etc/secrets', file_prefix=''), stats=StatsConfig(host='localhost', port=8125, disabled=False, disabled_tags=False),
                         data_config=DataConfig(s3=S3Config(enable_debug=False, endpoint=None, retries=3, backoff=datetime.timedelta(seconds=5), access_key_id=None, secret_access_key=None), gcs=GCSConfig(gsutil_parallelism=False), azure=AzureBlobStorageConfig(account_name=None, account_key=None,              
                         tenant_id=None, client_id=None, client_secret=None), generic=GenericPersistenceConfig(attach_execution_metadata=True)), local_sandbox_path='/tmp/flytestlcs8kp')                                                                                                                             
12:43:42.372883 DEBUG    run.py:588 - Running workflow demo.wf with input {'inputs_file': None}
The issue I'm seeing is that the only ports flyte.local allows are 80 and 443
30084 is also the nodeport for minio on this cluster, is that related?
Okay, after looking at the error more carefully, the issue seems to be that pyflyte is wrongly trying to use flyte.local to connect to minio, which is not under my ingress
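The URL in the `MaxRetryError` carries enough information to confirm this: it is an S3-style presigned upload URL, so its host and port are the storage endpoint, not the admin endpoint. The stdlib sketch below pulls those fields out. `describe_upload_target` is a hypothetical helper, and it assumes path-style addressing (`/<bucket>/<key>`), which is what the URL in the error looks like.

```python
from urllib.parse import parse_qs, urlsplit


def describe_upload_target(url: str) -> dict:
    """Extract host, port, bucket and signing credential from a presigned S3-style URL."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)  # also percent-decodes the values
    bucket = parts.path.lstrip("/").split("/", 1)[0]
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "bucket": bucket,  # assumes path-style addressing: /<bucket>/<key>
        "credential": query.get("X-Amz-Credential", [""])[0],
    }
```

Fed the failing URL with `http://flyte.local:30084` prepended, this reports host `flyte.local`, port `30084` and bucket `flyte`, with a credential that starts with `minio`. That points at the storage endpoint handed to the client for the upload rather than at the `admin.endpoint` in config.yaml.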
I added a storage configuration to my config.yaml file:
storage:
  connection:
    endpoint: http://192.168.1.41:30084
    access-key: minio
    secret-key: miniostorage
However, I still get this error: `HTTPConnectionPool(host='flyte.local', port=30084): Max retries exceeded with url`, so something in my configuration is wrong
a
Can you share your Helm values?
p
Sure, these are my values in argocd:
userSettings:
  hostName: flyte.local
configuration:
  logging:
    level: 1
  database:
    username: postgres
    password: postgres
    host: postgres.flyte
    dbname: flyte
  storage:
    type: minio
    metadataContainer: flyte
    userDataContainer: flyte
    provider: s3
    providerConfig:
      s3:
        region: "us-east-1" #Irrelevant for local but still needed
        authType: "accesskey"
        endpoint: "http://{{ .Values.userSettings.hostName }}:30084"
        accessKey: "minio"
        secretKey: "miniostorage"
        disableSSL: "true"
        secure: "false"  
  inline:
    plugins:
      k8s:
        inject-finalizer: true
        default-env-vars:
          - FLYTE_AWS_ENDPOINT: "http://{{ .Values.userSettings.hostName }}:30084"
          - FLYTE_AWS_ACCESS_KEY_ID: "minio"
          - FLYTE_AWS_SECRET_ACCESS_KEY: "miniostorage"
    task_resources:
      defaults: 
        cpu: 1000m
        memory: 500Mi #change default requested resources and limits to fit your needs
      limits:
        cpu: 2000m
        memory: 2000Mi
    storage:
      cache:
        max_size_mbs: 100
        target_gc_percent: 100
ingress:
  create: true
  host: "{{ .Values.userSettings.hostName }}"
  separateGrpcIngress: true
  ingressClassName: nginx
  tls:
    - hosts:
        - "{{ .Values.userSettings.hostName }}"
      secretName: flytetls
  commonAnnotations:
    kubernetes.io/ingress.class: public
  httpAnnotations:
    nginx.ingress.kubernetes.io/app-root: /console
  grpcAnnotations:
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
a
So there you have an endpoint pointing to 30084
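For reference, the port appears in two places in those values: the S3 provider config and the env vars injected into task pods. A small sketch that walks a nested structure and lists every string containing a given fragment makes both visible. `find_strings` is a hypothetical helper, and the `values` dict is a trimmed copy of the values above with the `{{ .Values.userSettings.hostName }}` template rendered by hand.

```python
def find_strings(node, needle, path=""):
    """Yield (dotted_path, value) for every string in a nested dict/list containing needle."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_strings(value, needle, f"{path}.{key}" if path else str(key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_strings(value, needle, f"{path}[{index}]")
    elif isinstance(node, str) and needle in node:
        yield path, node


# Trimmed, hand-rendered copy of the Helm values (illustrative only).
values = {
    "configuration": {
        "storage": {
            "providerConfig": {"s3": {"endpoint": "http://flyte.local:30084"}},
        },
        "inline": {
            "plugins": {
                "k8s": {
                    "default-env-vars": [
                        {"FLYTE_AWS_ENDPOINT": "http://flyte.local:30084"},
                    ]
                }
            }
        },
    }
}

hits = list(find_strings(values, ":30084"))
for dotted_path, value in hits:
    print(f"{dotted_path} = {value}")
```

Both hits use `flyte.local` as the host, which is the hostname served by the ingress on ports 80 and 443 only.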
p
according to my values file I should
a
Ah right, but you should point to 9000 instead of 30084
p
So to clarify, replace all instances of 30084 with 9000 in this values file?
Well, I tried that, and I'm getting the same error, different port:
HTTPConnectionPool(host='flyte.local', port=9000): Max retries exceeded with url
let me change the values to reflect the actual endpoint...
Changing the s3 endpoint values to the actual nodeport endpoint worked!
So two weird things:
1. How do I change the default storage configuration in my config? What I posted above was reflected in the `Config` object, but didn't actually change execution.
2. What is the reason for flyte-the-hard-way to map minio to flyte.local?
Also, the demo.py failed with `ModuleNotFoundError: No module named 'pandas'`