# flyte-deployment
j
I'm having some trouble configuring a new installation of flyte, but I think I'm close. I installed flyte with opta using the documentation here https://docs.flyte.org/en/latest/deployment/aws/opta.html#deployment-aws-opta I'm using an external-ssl-cert and am able to access the flyte console. My two problems: 1. I don't think I am using the correct endpoint for flytectl. I thought it should be the subdomain I access the console through but that didn't work. After trying a few things I was able to get it working by pointing flytectl directly to the flyteadmin service's load balancer on port 81. I used 
kubectl -n flyte get services flyteadmin
 to find it. Is this the correct way to do it? 2. I am having trouble configuring authentication with google cloud. Using https://docs.flyte.org/en/latest/deployment/cluster_config/auth_setup.html#deployment-cluster-config-auth-setup I did the following + Setup my google cloud OAuth2 Client Credential + Ran 
kubectl edit secret -n flyte flyte-admin-secrets
 and added the client secret + Ran 
kubectl edit configmap -n flyte flyte-admin-config
 updated the config according to the docs + Restarted flyteadmin with 
kubectl rollout restart deployment/flyteadmin -n flyte
I didn't get everything wrong because when I visited the flyte console it redirected me to google to login before going to the dashboard. However when I tried to run a workflow the new execution just hung with status unknown. I also was unable to connect with flytectl no matter what I tried. I'm not sure what I'm doing wrong here. Any help is much appreciated.
k
hi JP, firstly welcome to the community. Great to have you here
Great progress, and thank you for reaching out for help instead of quietly walking away. We are here to help.
I don’t think I am using the correct endpoint for flytectl. I thought it should be the subdomain I access the console through but that didn’t work.
After trying a few things I was able to get it working by pointing flytectl directly to the flyteadmin service’s load balancer on port 81.
I used 
kubectl -n flyte get services flyteadmin
 to find it. Is this the correct way to do it?
Ideally you should use flyteadmin and flyteconsole behind the same domain. If flyte console is working and it is autodiscovering flyteadmin, it is because they are running on the same domain. Are you using Ingress? If so, can you try this on your domain, in your browser:
https://your-domain/api/v1/projects
If this works, then your admin service port 80 is correctly configured. But I will also let @Prafulla Mahindrakar chime in here.
For
2. I am having trouble configuring authentication with google cloud.
Using https://docs.flyte.org/en/latest/deployment/cluster_config/auth_setup.html#deployment-cluster-config-auth-setup I did the following
+ Setup my google cloud OAuth2 Client Credential
+ Ran 
kubectl edit secret -n flyte flyte-admin-secrets
 and added the client secret
+ Ran 
kubectl edit configmap -n flyte flyte-admin-config
 updated the config according to the docs
+ Restarted flyteadmin with 
kubectl rollout restart deployment/flyteadmin -n flyte
I didn’t get everything wrong because when I visited the flyte console it redirected me to google to login before going to the dashboard.
However when I tried to run a workflow the new execution just hung with status unknown.
I also was unable to connect with flytectl no matter what I tried.
It seems you have not added the client secret for flytepropeller, which is the actual engine that progresses the Flyte workflows and talks to FlyteAdmin. If you run
kubectl get pods -n flyte | grep flytepropeller
and then
kubectl logs <pod>
you should see some errors - cc @Haytham Abuelfutuh?
j
Hey Ketan, thanks for the help. I am using the ingress set up by the opta install. When I run
kubectl -n flyte get ingress
I see two results one named
flyte-core
and the other
flyte-core-grpc
both have the same host, port, and address. When I visit https://your-domain/api/v1/projects I get a JSON response with my projects listed. In
.flyte/config.yaml
I've updated the endpoint line to
endpoint: dns:///your-domain:80
from the working
endpoint: dns:///aws-load-balancer:81
I'm getting the following result with flytectl
flytectl get projects
{"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [storage] updated. No update handler registered.","ts":"2022-01-05T22:49:12-05:00"}
{"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [root] updated. No update handler registered.","ts":"2022-01-05T22:49:12-05:00"}
{"json":{"src":"viper.go:400"},"level":"debug","msg":"Config section [admin] updated. Firing updated event.","ts":"2022-01-05T22:49:12-05:00"}
{"json":{"src":"client.go:180"},"level":"error","msg":"failed to initialize token source provider. Err: rpc error: code = Unavailable desc = connection closed","ts":"2022-01-05T22:49:12-05:00"}
{"json":{"src":"client.go:185"},"level":"warning","msg":"Starting an unauthenticated client because: can't create authenticated channel without a TokenSourceProvider","ts":"2022-01-05T22:49:12-05:00"}
{"json":{"src":"client.go:58"},"level":"info","msg":"Initialized Admin client","ts":"2022-01-05T22:49:12-05:00"}
Error: rpc error: code = Unavailable desc = connection closed
{"json":{"src":"main.go:13"},"level":"error","msg":"rpc error: code = Unavailable desc = connection closed","ts":"2022-01-05T22:49:13-05:00"}
For the auth, I definitely did not add a client secret for flytepropeller, and I see error messages like this in the logs
{"json":{"exec_id":"bsqhtfi4mr","ns":"flytesnacks-development","res_ver":"69106","routine":"worker-11","src":"workflow_event_recorder.go:69","wf":"flytesnacks:development:flyte.workflows.example.my_wf"},"level":"info","msg":"Failed to record workflow event [execution_id:\u003cproject:\"flytesnacks\" domain:\"development\" name:\"bsqhtfi4mr\" \u003e phase:RUNNING occurred_at:\u003cseconds:1641425632 nanos:476363641 \u003e ] with err: EventSinkError: Error sending event, caused by [rpc error: code = Unauthenticated desc = token parse error [JWT_VERIFICATION_FAILED] Could not retrieve id token from metadata, caused by: rpc error: code = Unauthenticated desc = Request unauthenticated with IDToken]","ts":"2022-01-05T23:33:52Z"}
{"json":{"exec_id":"bsqhtfi4mr","ns":"flytesnacks-development","res_ver":"69106","routine":"worker-11","src":"executor.go:342","wf":"flytesnacks:development:flyte.workflows.example.my_wf"},"level":"warning","msg":"Event recording failed. Error [EventSinkError: Error sending event, caused by [rpc error: code = Unauthenticated desc = token parse error [JWT_VERIFICATION_FAILED] Could not retrieve id token from metadata, caused by: rpc error: code = Unauthenticated desc = Request unauthenticated with IDToken]]","ts":"2022-01-05T23:33:52Z"}
{"json":{"exec_id":"bsqhtfi4mr","ns":"flytesnacks-development","res_ver":"69106","routine":"worker-11","src":"handler.go:134","wf":"flytesnacks:development:flyte.workflows.example.my_wf"},"level":"error","msg":"Error when trying to reconcile workflow. Error [[]]. Error Type[*errors.WorkflowErrorWithCause]. Is nill [%!v(MISSING)]","ts":"2022-01-05T23:33:52Z"}
E0105 23:33:52.487513       1 workers.go:102] error syncing 'flytesnacks-development/bsqhtfi4mr': Workflow[] failed. ErrorRecordingError: failed to publish event, caused by: EventSinkError: Error sending event, caused by [rpc error: code = Unauthenticated desc = token parse error [JWT_VERIFICATION_FAILED] Could not retrieve id token from metadata, caused by: rpc error: code = Unauthenticated desc = Request unauthenticated with IDToken]
I'm finding the docs for auth a little confusing. Do I need to follow both the OpenID Connect and OAuth2 Authorization Server sections? I was only interested in configuring authentication through google if possible, not authorization.
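For anyone triaging similar output: flytectl and flytepropeller both emit one JSON object per line, so the lines that matter can be filtered mechanically. Below is a minimal Python sketch, assuming the log shape shown in this thread (`level`, `msg`, and `json.src` keys). The helper name `grpc_errors` is made up for illustration, and the sample lines are copied from the flytectl output earlier in the thread.

```python
import json
import re

# Sample lines copied from the flytectl output earlier in this thread.
SAMPLE = r'''
{"json":{"src":"client.go:180"},"level":"error","msg":"failed to initialize token source provider. Err: rpc error: code = Unavailable desc = connection closed","ts":"2022-01-05T22:49:12-05:00"}
{"json":{"src":"client.go:185"},"level":"warning","msg":"Starting an unauthenticated client because: can't create authenticated channel without a TokenSourceProvider","ts":"2022-01-05T22:49:12-05:00"}
{"json":{"src":"client.go:58"},"level":"info","msg":"Initialized Admin client","ts":"2022-01-05T22:49:12-05:00"}
Error: rpc error: code = Unavailable desc = connection closed
'''

GRPC_CODE = re.compile(r"rpc error: code = (\w+)")


def grpc_errors(text):
    """Return (source, grpc_code) for every error-level JSON log line."""
    found = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip plain-text lines such as "Error: ..." or klog output
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate truncated or wrapped lines
        if record.get("level") != "error":
            continue
        match = GRPC_CODE.search(record.get("msg", ""))
        code = match.group(1) if match else None
        found.append((record.get("json", {}).get("src"), code))
    return found


print(grpc_errors(SAMPLE))
```

The gRPC status code is usually the quickest clue: `Unavailable` points at the connection path, while `Unauthenticated` points at tokens and client secrets.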
k
Oh, thank you for the feedback, we will improve the docs. You need both, because all Flyte services are authenticated.
OpenID Connect (OIDC) is used for UI authn.
p
@JP Kosymna if you are using SSL, do you have the insecure key set to false in config.yaml? The same domain you use for flyteconsole should work for flytectl without any port suffix. I also see that you were able to reach https://your-domain/api/v1/projects successfully, so the REST endpoint is working fine too. The AWS ALB of flyteadmin can be used for testing purposes, but it is not recommended; you should ideally go through ingress with your domain name. endpoint: dns:///aws-load-balancer:81 is the correct way to access it if you are using admin's ALB, not endpoint: dns:///aws-load-balancer:80. Can you share a dump of your config.yaml?
k
But since we also have a CLI tool that connects, we need to use client secrets.
p
I also see you were able to get flytectl working with
dns:///aws-load-balancer:81
so the difference I see now, if you use the domain, is that it has SSL termination. Hence I suspect that insecure is still set to true in your config.yaml.
j
here is my conf.yaml
admin:
  endpoint: dns:///aws-load-balancer:81 # works
#   endpoint: dns:///your-domain # doesn't work
  authType: Pkce
  insecure: true
logger:
  show-source: true
  level: 9
storage:
  type: stow	
  stow:
    kind: s3
    config:
      auth_type: iam
      region: us-west-2
  container: flyte-prod-service-flyte
p
change
insecure: true
to
insecure: false
and use
endpoint: dns:///your-domain
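The rule of thumb in this exchange (a TLS-terminated domain goes with insecure: false, the direct flyteadmin load balancer on port 81 goes with insecure: true) can be sketched as a small check. This is only an illustration in Python: `check_admin` and its heuristic of treating an explicit `:81` port as the plaintext case are assumptions drawn from this thread, not flytectl behavior.

```python
def check_admin(admin):
    """Return a list of warnings for a flytectl `admin` config section.

    Heuristic taken from this thread only: an endpoint with an explicit
    :81 port is the plaintext flyteadmin gRPC port; anything else is
    assumed to sit behind TLS termination.
    """
    endpoint = admin.get("endpoint", "")
    insecure = bool(admin.get("insecure", False))
    host_port = endpoint.split("///", 1)[-1]  # strip the dns:/// scheme
    plaintext = host_port.endswith(":81")

    warnings = []
    if plaintext and not insecure:
        warnings.append("port 81 is plaintext gRPC; expected insecure: true")
    if not plaintext and insecure:
        warnings.append("endpoint looks TLS-terminated; expected insecure: false")
    return warnings


print(check_admin({"endpoint": "dns:///aws-load-balancer:81", "insecure": True}))
print(check_admin({"endpoint": "dns:///your-domain", "insecure": True}))
```

The dictionaries mirror the `admin:` section of config.yaml as posted above; loading the real file would need a YAML parser, which is left out to keep the sketch dependency-free.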
j
I get this error
{"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [storage] updated. No update handler registered.","ts":"2022-01-06T00:22:02-05:00"}
{"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [root] updated. No update handler registered.","ts":"2022-01-06T00:22:02-05:00"}
{"json":{"src":"viper.go:400"},"level":"debug","msg":"Config section [admin] updated. Firing updated event.","ts":"2022-01-06T00:22:02-05:00"}
{"json":{"src":"client.go:180"},"level":"error","msg":"failed to initialize token source provider. Err: rpc error: code = PermissionDenied desc = Forbidden: HTTP status code 403; transport: missing content-type field","ts":"2022-01-06T00:22:02-05:00"}
{"json":{"src":"client.go:185"},"level":"warning","msg":"Starting an unauthenticated client because: can't create authenticated channel without a TokenSourceProvider","ts":"2022-01-06T00:22:02-05:00"}
{"json":{"src":"client.go:58"},"level":"info","msg":"Initialized Admin client","ts":"2022-01-06T00:22:02-05:00"}
Error: rpc error: code = PermissionDenied desc = Forbidden: HTTP status code 403; transport: missing content-type field
{"json":{"src":"main.go:13"},"level":"error","msg":"rpc error: code = PermissionDenied desc = Forbidden: HTTP status code 403; transport: missing content-type field","ts":"2022-01-06T00:22:02-05:00"}
p
Strange. Are you using flytectl in an environment that has a proxy set up?
j
No proxy, I can share the domain if that would help. I'm going to have to add auth anyway.
p
Sure
j
p
flytectl uses gRPC for communication. We will have to debug this traffic. It's throwing a 403 from ingress, I suppose, and not reaching flyteadmin at all. Can you share the access logs for your ingress?
j
Sure. I didn't change much from the basic opta install. My env.yaml is
name: flyte-prod
org_name: abovedata
providers:
  aws:
    region: <region>
    account_id: <account_id>
modules:
  - type: base
  - type: external-ssl-cert
    domain: "flyte.abovedata.io"
    private_key_file: "./cert/privkey.pem"
    certificate_body_file: "./cert/cert.pem"
    certificate_chain_file: "./cert/chain.pem"
  - type: k8s-cluster
    max_nodes: 15
  - type: k8s-base
I set up a CNAME to the
load_balancer_raw_dns
that the opta apply command output.
I'm not familiar with exactly what the command set up, but I can pull any logs you need.
p
That seems correct. We can check the ingress access logs to see what's happening. Let me check the command needed for it.
y
just to check real quick… if you
k -n flyte get ingress
, then copy that elb address that shows up, and search for it in the aws console, the load balancer that comes up is of type ‘network’ right?
and if you click on the listeners tab, the tls : 443 entry has http2preferred as alpn?
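gRPC runs over HTTP/2, which a TLS endpoint advertises through ALPN, so the question above can also be checked from a client without the AWS console. Here is a hedged Python sketch using the standard `ssl` module. The function names are illustrative, the host is whatever domain fronts your ingress, and the result only describes the first TLS hop (a proxy sitting in front of the load balancer would answer instead).

```python
import socket
import ssl


def negotiated_alpn(host, port=443, timeout=5.0):
    """Offer h2 and http/1.1 via ALPN and return what the server picked."""
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            # None means the server did not take part in ALPN at all.
            return tls.selected_alpn_protocol()


def can_carry_grpc(alpn):
    """gRPC needs HTTP/2, so only an 'h2' result is usable."""
    return alpn == "h2"


# Example (needs network access, so it is left commented out):
# print(can_carry_grpc(negotiated_alpn("your-domain")))
```

A result of `http/1.1` or `None` would be consistent with a listener whose ALPN policy is not set to prefer HTTP/2.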
j
The type is 'network' but the listeners table has 'N/A' for alpn
y
should definitely be http2preferred
logs for ingress can be gotten via
k -n ingress-nginx logs service/ingress-nginx-controller controller
btw
p
Thanks @Yee. @JP Kosymna can you manually change the type and retry the request? I can retry from my side too to confirm.
this seems to be a bug in opta deployment
j
Do you know how to manually change the type? I don't see any option in the ui.
j
Thank you. I've updated the load balancer
p
Hmm, that doesn't seem to have helped. I'm getting the same error.
Can we check the ingress logs to see if it reached there? Additionally, we can enable access logs on the network LB: https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/enable-access-logs.html
j
Yeah and the flyte console is no longer working either.
this is from the ingress logs
127.0.0.1 - - [06/Jan/2022:06:00:15 +0000] "GET /me HTTP/2.0" 501 131 "https://flyte.abovedata.io/console/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.9 Safari/537.36" 2906 0.002 [flyte-flyteadmin-80] [] 10.0.137.42:8088 131 0.000 501 6a08211986bbd36bdd22fa2f38fb9ee3
127.0.0.1 - - [06/Jan/2022:06:01:29 +0000] "H\x00\x00\x00tj\xA8\x9E#D\x98+\xCA\xF0\xA7\xBBl\xC5\x19\xD7\x8D\xB6\x18\xEDJ\x1En\xC1\xF9xu[l\xF0E\x1D-j\xEC\xD4xL\xC9r\xC9\x15\x10u\xE0%\x86Rtg\x05fv\x86]%\xCC\x80\x0C\xE8\xCF\xAE\x00\xB5\xC0f\xC8\x8DD\xC5\x09\xF4" 400 150 "-" "-" 0 0.032 [] [] - - - - 6ce31975ecfa63d344918a94299911f7
2022/01/06 06:01:50 [crit] 130#130: *270355 SSL_do_handshake() failed (SSL: error:141CF06C:SSL routines:tls_parse_ctos_key_share:bad key share) while SSL handshaking, client: 127.0.0.1, server: 0.0.0.0:443
127.0.0.1 - - [06/Jan/2022:06:30:01 +0000] "PRI * HTTP/2.0" 400 150 "-" "-" 0 0.001 [] [] - - - - 10aab187ee0fc030cfffb9f683a1f124
127.0.0.1 - - [06/Jan/2022:06:30:03 +0000] "PRI * HTTP/2.0" 400 150 "-" "-" 0 0.001 [] [] - - - - 10a592ec76400045899b6582b4a491cc
127.0.0.1 - - [06/Jan/2022:06:30:05 +0000] "PRI * HTTP/2.0" 400 150 "-" "-" 0 0.000 [] [] - - - - c3c1d409665b23488d499744c09866f1
p
The current time is 6:37 GMT. Can you try hitting it again and see if there is any activity?
Also check the network load balancer health in the AWS console.
j
Here are some of the load balancer logs from s3
tls 2.0 2022-01-06T06:33:06 net/opta-flyte-prod-lb/5181b1de81b2e8a5 5128fa559e0e0e55 172.70.175.129:18738 10.0.10.27:443 194 96 3153 280 - arn:aws:acm:us-west-2:518673686532:certificate/a96b5b0f-c46f-4d68-ad84-7e6ccb064b03 - TLS_AES_128_GCM_SHA256 tlsv13 - flyte.abovedata.io h2 - "h2","http/1.1"
tls 2.0 2022-01-06T06:33:03 net/opta-flyte-prod-lb/5181b1de81b2e8a5 5128fa559e0e0e55 172.70.175.129:13922 10.0.10.27:443 182 91 3152 280 - arn:aws:acm:us-west-2:518673686532:certificate/a96b5b0f-c46f-4d68-ad84-7e6ccb064b03 - TLS_AES_128_GCM_SHA256 tlsv13 - flyte.abovedata.io h2 - "h2","http/1.1"
tls 2.0 2022-01-06T06:33:05 net/opta-flyte-prod-lb/5181b1de81b2e8a5 5128fa559e0e0e55 172.70.175.129:16288 10.0.10.27:443 210 104 3152 280 - arn:aws:acm:us-west-2:518673686532:certificate/a96b5b0f-c46f-4d68-ad84-7e6ccb064b03 - TLS_AES_128_GCM_SHA256 tlsv13 - flyte.abovedata.io h2 - "h2","http/1.1"
tls 2.0 2022-01-06T06:34:40 net/opta-flyte-prod-lb/5181b1de81b2e8a5 5128fa559e0e0e55 172.70.175.9:38220 10.0.10.27:443 183 91 3116 280 - arn:aws:acm:us-west-2:518673686532:certificate/a96b5b0f-c46f-4d68-ad84-7e6ccb064b03 - TLS_AES_128_GCM_SHA256 tlsv13 - flyte.abovedata.io h2 - "h2","http/1.1"
tls 2.0 2022-01-06T06:34:35 net/opta-flyte-prod-lb/5181b1de81b2e8a5 5128fa559e0e0e55 172.70.143.75:61152 10.0.10.27:443 379 190 685 280 - arn:aws:acm:us-west-2:518673686532:certificate/a96b5b0f-c46f-4d68-ad84-7e6ccb064b03 - TLS_AES_128_GCM_SHA256 tlsv13 - flyte.abovedata.io h2 - "h2","http/1.1"
[jp:~/code/flyte/opta/aws/logs] master(+34/-11)* + cat 2022-01-06-04-19-55-50D57302AA03F02D.txt
5bc8cf1d4425a49bdda561d1cb646f612e01e61624891a83a121d15bb937416b flyte-prod-service-flyte [06/Jan/2022:03:34:55 +0000] 44.237.129.175 - VRQG12Z8D8MV0V1E REST.HEAD.BUCKET - "HEAD /flyte-prod-service-flyte HTTP/1.1" 403 AccessDenied 243 - 12 - "-" "aws-sdk-go/1.37.31 (go1.17.1; linux; amd64)" - xERUwvdicSIM15Eq1k50tRWiDWYwbT3nijd6syk0QJhjlh0sNF/OIIIrd8eITKb4uuPQB4SeKXw= - ECDHE-RSA-AES128-GCM-SHA256 - s3.us-west-2.amazonaws.com TLSv1.2 -
Newer Ingress logs just seem to have some sort of heartbeat
127.0.0.1 - - [06/Jan/2022:06:38:56 +0000] "PRI * HTTP/2.0" 400 150 "-" "-" 0 0.000 [] [] - - - - 6a24fd39725c087c20644b825d2cabdc
127.0.0.1 - - [06/Jan/2022:06:38:58 +0000] "PRI * HTTP/2.0" 400 150 "-" "-" 0 0.001 [] [] - - - - 4e606fca6e256b06c1749869de0f8b01
127.0.0.1 - - [06/Jan/2022:06:39:01 +0000] "PRI * HTTP/2.0" 400 150 "-" "-" 0 0.001 [] [] - - - - 2b608725dcf37090bef2d3f40630fe1d
127.0.0.1 - - [06/Jan/2022:06:42:50 +0000] "PRI * HTTP/2.0" 400 150 "-" "-" 0 0.001 [] [] - - - - 4873377819415ca0434fd819f237f38e
127.0.0.1 - - [06/Jan/2022:06:42:52 +0000] "PRI * HTTP/2.0" 400 150 "-" "-" 0 0.001 [] [] - - - - 838df2b70e932f9513a5440e13c907ef
p
It seems it is not reaching ingress and is terminating with a 403 failure at the load balancer.
If http2 has been set as the ALPN policy, then I'm not sure why the logs mention HTTP/1.1.
j
It's late here, so I'll pick this up in the morning. Is there something I can do to help you debug this? I can start from scratch and document my exact steps to reproduce this cluster setup if that helps. I can also try the manual install and see if I get the same errors. I'm not sure if there is a way to dump the current AWS config, but I saw opta apply had a "show full details" option or something like that. Maybe reinstall with that option and keep all the details? Let me know if any of the above sounds useful.
p
Example log from one of our internal LBs, where you can see it shows HTTP/2.0 negotiated:
h2 2021-01-01T01:11:19.448563Z app/k8s-flyte-flytesys-f5f79a76fb/0f5c7efe7163f712 50.46.126.122:57853 192.168.156.35:31448 0.000 0.002 0.000 200 200 229 5386 "GET https://demo.nuclyde.io:443/console/projects/flytetester/domains/development/executions/ahf8lvf2ky HTTP/2.0" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:us-east-2:590375264460:targetgroup/k8s-flyte-flytecon-f4d140aa81/4e985ce8467816a6 "Root=1-5fee76b7-093e67ed0f08306856fd1714" "demo.nuclyde.io" "arn:aws:acm:us-east-2:590375264460:certificate/e2f04275-2dff-4118-a493-ed3ec8f41605" 2 2021-01-01T01:11:19.445000Z "forward" "-" "-" "192.168.156.35:31448" "200" "-" "-"
One thing I would suggest is to upgrade to the latest opta binary and see if you get the same issue. This would be faster as well. If this doesn't work, we will involve the opta team, and in parallel we can try a manual install.
Yeah, and also do dump the opta config.
j
Sounds like a plan
Thank you for all the help
p
sure np.
y
@JD Palomino any thoughts/ideas?
j
this is a lot to take in lol-- also I see that there’s been some manual changes so I am not certain about the state of the system. When I find myself in such a situation I usually like to start with a clean slate, so I would recommend destroying the current service and k8s-base (no need to destroy all env) and re-applying. From there, if you are still facing issues I am happy to jump on a quick call to help out
j
I decided to blow everything out to start from scratch. These are the basic steps to reproduce my issue with flytectl not being able to establish a connection to the cluster.
> opta version
v0.23.0

> terraform -version
Terraform v1.0.11
on linux_amd64

> git rev-parse HEAD
8c43b3a564f2637b09c5a6a34f1a4bdc9545e68b

> git diff
diff --git a/opta/aws/env.yaml b/opta/aws/env.yaml
index 244f4494..3cd5e235 100644
--- a/opta/aws/env.yaml
+++ b/opta/aws/env.yaml
@@ -1,14 +1,16 @@
-name: <env_name>
-org_name: <your_company>
+name: flyte-prod
+org_name: abovedata
 providers:
   aws:
-    region: <region>
-    account_id: <account_id>
+    region: us-west-2
+    account_id: 518673686532
 modules:
   - type: base
-  - type: dns
-    domain: <domain>
-    delegated: false # set to true once ready https://docs.opta.dev/miscellaneous/ingress/
+  - type: external-ssl-cert
+    domain: "flyte.abovedata.io"
+    private_key_file: "./cert/live/abovedata.io/privkey.pem"
+    certificate_body_file: "./cert/live/abovedata.io/cert.pem"
+    certificate_chain_file: "./cert/live/abovedata.io/chain.pem"
   - type: k8s-cluster
     max_nodes: 15
   - type: k8s-base
diff --git a/opta/aws/flyte.yaml b/opta/aws/flyte.yaml
index 69aa32fd..41fd99a4 100644
--- a/opta/aws/flyte.yaml
+++ b/opta/aws/flyte.yaml
@@ -2,8 +2,8 @@ environments:
   - name: default
     path: "./env.yaml" # NOTE: relative path to environment
     variables:
-      region: <region>
-      account_id: <account_id>
+      region: us-west-2
+      account_id: 518673686532
 name: service-flyte
 modules:
   - name: postgres
I installed the env with
opta apply -c env.yaml --auto-approve --detailed-plan
It completed in ~25 mins. I updated my CNAME flyte.abovedata.io to point to opta-flyte-prod-lb-500005f6805177fc.elb.us-west-2.amazonaws.com. I then installed flyte with
opta apply -c flyte.yaml --auto-approve --detailed-plan
It completed in ~9 mins. This is the flytectl config I'm using
> cat ~/.flyte/config.yaml
admin:
  # works
  endpoint: dns:///a0c0be52128614dd6a2d9f1bc17a3a15-637195292.us-west-2.elb.amazonaws.com:81
  insecure: true
#   # doesn't work
#   endpoint: dns:///flyte.abovedata.io 
#   insecure: false
  authType: Pkce
logger:
  show-source: true
  level: 9
storage:
  type: stow    
  stow:
    kind: s3
    config:
      auth_type: iam
      region: us-west-2
  container: flyte-prod-service-flyte
And here is the output of the two opta commands
j
awesome! As promised, would you like to jump on a call?
j
sure thing can we do it in about 10 minutes?
j
👍 send a link/invite when ready
y
hey sorry did this end up happening?
what was the resolution?
j
@Yee @JP Kosymna I reproduced the environment, but it’s actually working for me
I made sure to change as little as possible
The only difference is that I am not using Cloudflare for DNS
@JP Kosymna if you could sit down once more to go over your setup we would greatly appreciate it
j
@JD Palomino @Yee I started from scratch again, this time I configured my cname in cloudflare without checking the proxy setting and now flytectl works! I'll finish up configuring auth today and post if I have any other issues. Thanks for the help
k
Hurrah! But @JD Palomino / @JP Kosymna, it would be great to capture the reasons in a discussion so that others can benefit from the info.
j
The issue was using Cloudflare's proxy on the DNS CNAME (it was a little button that defaulted to on) for my flyte subdomain. Removing the Cloudflare proxy fixed the flytectl connection issue.
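One way to spot this situation from the load balancer logs: the client addresses in the NLB entries earlier in the thread (172.70.175.129, for example) fall inside an address block that Cloudflare publishes for its proxy, which suggests traffic was arriving via Cloudflare rather than directly from the client. Here is a small Python sketch of that check. The range list is a partial, assumed copy of Cloudflare's published IPv4 ranges and should be verified against the current list before relying on it; `via_cloudflare` is a made-up helper name.

```python
import ipaddress

# ASSUMPTION: partial copy of Cloudflare's published IPv4 ranges;
# verify against the current published list before relying on it.
CLOUDFLARE_RANGES = [
    ipaddress.ip_network("172.64.0.0/13"),
    ipaddress.ip_network("104.16.0.0/13"),
    ipaddress.ip_network("162.158.0.0/15"),
]


def via_cloudflare(ip):
    """True if the address falls inside one of the listed proxy ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CLOUDFLARE_RANGES)


print(via_cloudflare("172.70.175.129"))  # client address from the NLB log
print(via_cloudflare("10.0.10.27"))      # destination address from the same log line
```

If every client address in the access log lands in these ranges, the DNS record is almost certainly being proxied.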
j
I bet it’s something to do with http2/grpc. That’s why the web UI was working.
y
So if you had pointed flytectl to the ELB ingress (not the LB for admin itself, but the one for ingress), it would’ve worked?
j
When I pointed to the elb ingress I would get a 404 from nginx
j
that’s because it’s looking for requests sent to your domain as host
h
This is great to hear! @JP Kosymna What I’ve done with Cloudflare before (sorry for being late to the party):
1. Modify env.yaml and set delegated: false
2. Run opta apply -c env.yaml --auto-approve
3. Run opta output -c env.yaml and note the name servers
4. Go to Cloudflare -> DNS settings and add records for these name servers. If it’s the root domain, use @ as the name of the record. You are going to have to add 4 records, corresponding to the 4 NSs returned from opta output
5. Note that when you add NS records, Cloudflare will automatically disable proxying traffic. I don’t think it’ll even allow you to turn it on for these records.
6. Modify env.yaml and set delegated: true
7. Run opta apply -c env.yaml --auto-approve to finish the rest of the domain delegation steps.
@JP Kosymna did you manage to configure authN to work too?
y
oh nice. should we add this to docs?
j
@Haytham Abuelfutuh I just started working on authN. I want to use Google just for authentication, with the built-in authorization server. These are the steps I've taken:
1. Set the oidc_client_secret in flyte-admin-secrets
2. In flyte-admin-config, added the host to authorizedUris, set the clientId, and set useAuth to true
3. kubectl rollout restart deployment/flytepropeller -n flyte
4. Set client_secret in flyte-secret-auth
5. kubectl rollout restart deployment/flytepropeller -n flyte
I can log in to the console. I tried to run a workflow, but I don't think flytepropeller is configured correctly. I see this in the logs
{"json":{"exec_id":"x1smkw4dvh","ns":"flytesnacks-development","res_ver":"118138","routine":"worker-6","src":"executor.go:269","wf":"flytesnacks:development:flyte.workflows.example.my_wf"},"level":"debug","msg":"Transitioning/Recording event for workflow state transition [Ready] -\u003e [Running]","ts":"2022-01-07T20:32:15Z"}
{"json":{"exec_id":"x1smkw4dvh","ns":"flytesnacks-development","res_ver":"118138","routine":"worker-6","src":"admin_eventsink.go:44","wf":"flytesnacks:development:flyte.workflows.example.my_wf"},"level":"debug","msg":"AdminEventSink received a new event execution_id:\u003cproject:\"flytesnacks\" domain:\"development\" name:\"x1smkw4dvh\" \u003e phase:RUNNING occurred_at:\u003cseconds:1641587535 nanos:884137893 \u003e ","ts":"2022-01-07T20:32:15Z"}
{"json":{"exec_id":"x1smkw4dvh","ns":"flytesnacks-development","res_ver":"118138","routine":"worker-6","src":"workflow_event_recorder.go:69","wf":"flytesnacks:development:flyte.workflows.example.my_wf"},"level":"info","msg":"Failed to record workflow event [execution_id:\u003cproject:\"flytesnacks\" domain:\"development\" name:\"x1smkw4dvh\" \u003e phase:RUNNING occurred_at:\u003cseconds:1641587535 nanos:884137893 \u003e ] with err: EventSinkError: Error sending event, caused by [rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: 401 Unauthorized\nResponse: {\"error\":\"invalid_client\",\"error_description\":\"Client authentication failed (e.g., unknown client, no client authentication included, or unsupported authentication method).\"}]","ts":"2022-01-07T20:32:15Z"}
{"json":{"exec_id":"x1smkw4dvh","ns":"flytesnacks-development","res_ver":"118138","routine":"worker-6","src":"executor.go:342","wf":"flytesnacks:development:flyte.workflows.example.my_wf"},"level":"warning","msg":"Event recording failed. Error [EventSinkError: Error sending event, caused by [rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: 401 Unauthorized\nResponse: {\"error\":\"invalid_client\",\"error_description\":\"Client authentication failed (e.g., unknown client, no client authentication included, or unsupported authentication method).\"}]]","ts":"2022-01-07T20:32:15Z"}
{"json":{"exec_id":"x1smkw4dvh","ns":"flytesnacks-development","res_ver":"118138","routine":"worker-6","src":"executor.go:370","wf":"flytesnacks:development:flyte.workflows.example.my_wf"},"level":"info","msg":"Handling Workflow [x1smkw4dvh] Done","ts":"2022-01-07T20:32:15Z"}
{"json":{"exec_id":"x1smkw4dvh","ns":"flytesnacks-development","res_ver":"118138","routine":"worker-6","src":"handler.go:134","wf":"flytesnacks:development:flyte.workflows.example.my_wf"},"level":"error","msg":"Error when trying to reconcile workflow. Error [[]]. Error Type[*errors.WorkflowErrorWithCause]. Is nill [%!v(MISSING)]","ts":"2022-01-07T20:32:15Z"}
{"json":{"exec_id":"x1smkw4dvh","ns":"flytesnacks-development","routine":"worker-6","src":"passthrough.go:80"},"level":"debug","msg":"Observed FlyteWorkflow Update (maybe finalizer)","ts":"2022-01-07T20:32:15Z"}
{"json":{"exec_id":"x1smkw4dvh","ns":"flytesnacks-development","routine":"worker-6","src":"passthrough.go:100"},"level":"debug","msg":"Updated workflow.","ts":"2022-01-07T20:32:15Z"}
{"json":{"exec_id":"x1smkw4dvh","ns":"flytesnacks-development","routine":"worker-6","src":"handler.go:284"},"level":"info","msg":"Completed processing workflow.","ts":"2022-01-07T20:32:15Z"}
E0107 20:32:15.909422       1 workers.go:102] error syncing 'flytesnacks-development/x1smkw4dvh': Workflow[] failed. ErrorRecordingError: failed to publish event, caused by: EventSinkError: Error sending event, caused by [rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: 401 Unauthorized
Response: {"error":"invalid_client","error_description":"Client authentication failed (e.g., unknown client, no client authentication included, or unsupported authentication method)."}
I missed the step
kubectl edit configmap -n flyte flyte-propeller-config
and set the clientId there. But after doing that and restarting flytepropeller, I still get the errors.
h
Can you share what flyte-propeller-config
admin
section looks like now? And what flyte-admin-config
auth
section looks like?
Do you want to jump on a call?
y
kf get secret flyte-secret-auth -o jsonpath="{.data.client_secret}" | base64 --decode
foobar%
j
{"json":{"src":"auth_flow_orchestrator.go:128"},"level":"info","msg":"Opening the browser at https://flyte.abovedata.io/oauth2/authorize?client_id=flytectl\u0026redirect_uri=http%!A(MISSING)%!F(MISSING)%!F(MISSING)localhost%!A(MISSING)53593%!F(MISSING)callback\u0026response_type=code\u0026scope=offline+all\u0026state=bHc1djh3N3FudnZtYndkNW1mcHpoemdqdGd3dHo2ajI\u0026nonce=amhqOGZ4dnZjeHp6dndjc2ZqYmJ0NGh4dHgybjVmNXQ\u0026code_challenge=QC6hFFungKtyGDx-5fXM9-dWT6f2s2iG-3S-usVIXbk\u0026code_challenge_method=S256","ts":"2022-01-07T16:49:31-05:00"}
{"json":{"src":"auth_flow_orchestrator.go:123"},"level":"fatal","msg":"Couldn't start the callback http server on host %v due to %vlocalhost:53593listen tcp: lookup localhost on 68.105.28.11:53: no such host","ts":"2022-01-07T16:49:31-05:00"}
h
Thank you, @JP Kosymna, for the walkthrough, and glad it worked out! Notes I took:
1. The OpenID Connect docs section should clearly state that this could be all you need to do if you are only interested in user auth within the browser.
2. After restarting flyteadmin in the OIDC config section, we also need to restart flytepropeller to re-establish an authenticated gRPC connection to flyteadmin.
3. It’s unclear/confusing what the difference is between OIDC and the Authorization Server: when do you need one, both, or either?
4. It looks like
/etc/hosts
was messed up and was missing the
127.0.0.1 localhost
entry, causing flytectl to fail in weird ways. We should experiment with using 127.0.0.1 directly as the callback URL (maybe), or at the very least document the error @JP Kosymna posted and the solution.
cc @Ketan (kumare3)
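The failure in point 4 can be checked before running flytectl. Here is a minimal Python sketch, assuming the standard hosts-file format (address first, then one or more names, `#` for comments). `maps_localhost` is a made-up helper name, and it only inspects the hosts-file text it is given.

```python
def maps_localhost(hosts_text):
    """True if any non-comment line maps an address to the name 'localhost'."""
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        if "localhost" in names:
            return True
    return False


# Illustrative hosts-file contents, not taken from the thread.
broken = "# hosts file with the loopback entry missing\n192.168.1.10 workstation\n"
fixed = broken + "127.0.0.1 localhost\n"

print(maps_localhost(broken))
print(maps_localhost(fixed))
```

On a real machine, pass in the contents of /etc/hosts; calling `socket.gethostbyname("localhost")` is a quick end-to-end check of the same thing.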
j
@Haytham Abuelfutuh any issues with opta?
j
I'm trying once more from scratch just to make sure I understand everything but I don't believe any of the issues were with opta.
Thank you all for the help
h
@JD Palomino thank you for your help… I think we need to update our opta docs to elaborate on the steps needed if you are using an external DNS provider… otherwise all good it seems
j
I'm up and running now. I used opta to install on AWS and in general it went pretty smoothly. I run my DNS on Cloudflare and used the
external-ssl-cert
option in env.yaml. So, just a recap of my issues.
Issue 1: When I set up the CNAME in my Cloudflare instance, I didn't realize that by default Cloudflare would proxy the requests and filter out gRPC. The console worked, but flytectl didn't.
Solution: You have two options: either enable gRPC in the Network tab or turn off the proxy. Both worked for me.
Issue 2: Following the auth guide to enable Google OIDC didn't fully work. I could log in via Google, but running a workflow would hang.
Solution: The docs left out that I needed to restart the flytepropeller deployment. (I went down a lot of rabbit holes here, as I thought I needed to continue with the OAuth2 configuration to make it all work.)
Issue 3: When I tried to run
flytectl get projects
with auth enabled, my browser just opened with a long localhost URL. What was happening was that flytectl failed to start its server but was still opening the browser.
Solution: I made a correction to my /etc/hosts file so flytectl could start its server.
Thanks again for everyone's help