Shahwar Saleem
10/13/2022, 3:11 PM
userAuth
in config.
2. Flyte CTL uses the Device Authentication Flow, which should be configured as thirdPartyConfig.flyteClient
3. Flytepropeller and Flyte Scheduler internally use the Client Credentials flow, which also needs to be configured via thirdPartyConfig.flyteClient
My question is: with Auth enabled, how is it possible to have all of the above configured simultaneously?
I have 3 clientIds, one corresponding to each of the above cases. Is it possible to have multiple `flyteClient`s configured using thirdPartyConfig?
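For reference, in the flyte-core helm values the three client IDs usually live in separate sections rather than as multiple `flyteClient` entries — a hedged sketch with placeholder values (only the public flytectl/pyflyte client goes under thirdPartyConfig; the propeller/scheduler credentials are configured elsewhere):

```yaml
# Hedged sketch: three separate config locations, placeholder client IDs.
configmap:
  adminServer:
    auth:
      userAuth:
        openId:
          clientId: <console-client-id>      # 1. Flyte Console (userAuth)
      appAuth:
        thirdPartyConfig:
          flyteClient:
            clientId: <flytectl-client-id>   # 2. flytectl device-auth flow
secrets:
  adminOauthClientCredentials:
    clientId: <propeller-client-id>          # 3. propeller/scheduler client-credentials flow
    clientSecret: <propeller-client-secret>
```

So a single `flyteClient` entry is typically enough; the other two clients are not configured through thirdPartyConfig at all.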
Fredrick
10/14/2022, 4:54 PM
scratch-volume
does not get added to the flyte containers.
apiVersion: v1
kind: PodTemplate
metadata:
  name: flyte-pod-template
  namespace: flyte
template:
  metadata:
    labels:
      app: flyte
  spec:
    containers:
      - name: default
        image: docker.io/rwgrim/docker-noop
        volumeMounts:
          - mountPath: /scratch
            name: scratch-volume
    volumes:
      - name: scratch-volume
        emptyDir: {}
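If the scratch-volume never shows up in the task containers, one thing worth checking is whether propeller's k8s plugin actually references the template by name — a hedged sketch against flyte-core values (the exact key path may vary by chart version):

```yaml
configmap:
  k8s:
    plugins:
      k8s:
        # Assumption: propeller only applies the PodTemplate when told its name.
        default-pod-template-name: flyte-pod-template
```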
Felix Ruess
10/14/2022, 5:11 PM
flytectl
, which seems to try to connect via gRPC?
Any pointers?
Felix Ruess
10/17/2022, 3:30 PM
configmap.core.propeller.rawoutput-prefix
set to the bucket name that is already set in the storage settings... took me a while to find this...
Felix Ruess
10/18/2022, 3:21 PM
Laura Lin
10/18/2022, 3:36 PM
UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: Peer name k8s-flyte-XXXXX.us-west-1.elb.amazonaws.com is not in peer certificate
Laura Lin
10/18/2022, 11:30 PM
flytekit.exceptions.user.FlyteAssertion: Failed to put data from /tmp/flyte-gnaguqof/sandbox/local_flytekit/engine_dir to s3://flyte-bucket/metadata/propeller/flytetester-development-a9xg68jktjbhd7nz7sc7/n0/data/0 (recursive=True).
Original exception: Called process exited with error code: 1. Stderr dump:
b'upload failed: ../../tmp/flyte-gnaguqof/sandbox/local_flytekit/engine_dir/error.pb to s3://flyte-bucket/metadata/propeller/flytetester-development-a9xg68jktjbhd7nz7sc7/n0/data/0/error.pb An error occurred (AccessDenied) when calling the PutObject operation: Access Denied\n'
the flyte-user-role has AmazonS3FullAccess, and I verified that the failing pod has the env var AWS_ROLE_ARN
set to the flyte-user-role. And when I look inside the bucket, I can see that there's a s3://flyte-bucket/metadata/propeller/flytetester-development-a9xg68jktjbhd7nz7sc7/n0/data/inputs.pb
So something is able to put objects into the bucket, but then this upload fails? Not using minio either, and the docker image has awscli==1.25.94
Yash Kalode
10/20/2022, 4:04 PM
Aleksander Lempinen
10/26/2022, 8:33 AM
Andrew Achkar
10/26/2022, 8:06 PM
Hanno Küpers
11/01/2022, 8:27 AM
accesskey
for storage. What is the recommended alternative to use the user-role for workflows? Inject a secret as a default environment variable in the k8s plugin?
Hanno Küpers
11/08/2022, 3:05 PM
flytectl get projects
Error: Connection Info: [Endpoint: dns:///flyte.internal.n0q.eu, InsecureConnection?: true, AuthMode: ClientSecret]: rpc error: code = Unavailable desc = timed out waiting for server handshake
{"json":{},"level":"error","msg":"Connection Info: [Endpoint: dns:///flyte.internal.n0q.eu, InsecureConnection?: true, AuthMode: ClientSecret]: rpc error: code = Unavailable desc = timed out waiting for server handshake","ts":"2022-11-08T16:12:04+01:00"}
# ~/.flyte/config.yaml
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: dns:///flyte.internal.eu
  insecure: true
  insecureSkipVerify: true
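One hedged guess for the `timed out waiting for server handshake` above: `insecure: true` makes flytectl speak plaintext gRPC, so if the endpoint actually terminates TLS the handshake never completes. A sketch of the TLS-matched variant (assuming the endpoint serves gRPC over TLS on 443):

```yaml
admin:
  endpoint: dns:///flyte.internal.eu:443
  insecure: false           # speak TLS instead of plaintext h2c
  insecureSkipVerify: true  # only needed if the certificate is self-signed
```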
Edgar Trujillo
11/09/2022, 9:16 PM
22/11/09 20:41:34 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
22/11/09 20:41:34 INFO SharedState: Warehouse path is 'file:/root/spark-warehouse'.
{"asctime": "2022-11-09 20:41:35,835", "name": "flytekit.entrypoint", "levelname": "ERROR", "message": "!! Begin System Error Captured by Flyte !!"}
{"asctime": "2022-11-09 20:41:35,835", "name": "flytekit.entrypoint", "levelname": "ERROR", "message": "Traceback (most recent call last):\n\n File \"/usr/local/lib/python3.10/site-packages/flytekit/exceptions/scopes.py\", line 165, in system_entry_point\n return wrapped(*args, **kwargs)\n File \"/usr/local/lib/python3.10/site-packages/flytekit/core/base_task.py\", line 472, in dispatch_execute\n native_inputs = TypeEngine.literal_map_to_kwargs(exec_ctx, input_literal_map, self.python_interface.inputs)\n File \"/usr/local/lib/python3.10/site-packages/flytekit/core/type_engine.py\", line 800, in literal_map_to_kwargs\n return {k: TypeEngine.to_python_value(ctx, lm.literals[k], python_types[k]) for k, v in lm.literals.items()}\n File \"/usr/local/lib/python3.10/site-packages/flytekit/core/type_engine.py\", line 800, in <dictcomp>\n return {k: TypeEngine.to_python_value(ctx, lm.literals[k], python_types[k]) for k, v in lm.literals.items()}\n File \"/usr/local/lib/python3.10/site-packages/flytekit/core/type_engine.py\", line 764, in to_python_value\n return transformer.to_python_value(ctx, lv, expected_python_type)\n File \"/usr/local/lib/python3.10/site-packages/flytekit/types/pickle/pickle.py\", line 59, in to_python_value\n with open(uri, \"rb\") as infile:\n\nMessage:\n\n [Errno 2] No such file or directory: '/var/folders/n9/l_t6ghp503g_c3btld_vtrj80000gq/T/flyte-0u_yn2wu/raw/f307bbecb7ed78ceadfe26e17ce70f15/a9e378697a49254e4539ea4f69d37f2c'\n\nSYSTEM ERROR! Contact platform administrators."}
{"asctime": "2022-11-09 20:41:35,835", "name": "flytekit.entrypoint", "levelname": "ERROR", "message": "!! End Error Captured by Flyte !!"}
Any pointers on what is causing the error?
Arshak Ulubabyan
11/10/2022, 10:51 AM
flytectl config init
command, it seems to ignore all the options “inherited from parent commands”. For example, from the documentation ( https://docs.flyte.org/projects/flytectl/en/latest/gen/flytectl_config_init.html ) I would assume calling flytectl config init --host=some.host --admin.clientSecretLocation=~/path/to/client/secret
would set the admin.clientSecretLocation
in the config file, but it only sets the host. Or am I doing something wrong? How should I initialise or update the flytectl config file?
Jonathan Lamiel
11/11/2022, 7:05 PM
Xuan Hu
11/15/2022, 1:21 PM
helm template
and kubectl apply
on a self-hosted cluster since there are some restrictions that I can not use helm install
directly. Everything works well but I encountered a Failed to fetch data
error when accessing /console
. I found a related issue but it seems outdated and mostly for the sandbox. Any comments or suggestions are welcome! BTW, the screenshot of the error message is like the following:
Hanno Küpers
11/17/2022, 1:28 PM
pyflyte run --remote
is in the Running
state in the console, a workflow
is created but in propeller there is an error logged:
E1117 09:51:41.531924 1 workers.go:102] error syncing 'flytesnacks-development/fdc00f514c7f24074ba8': failed to update object (Update for flyte.lyft.com/v1alpha1, Kind=FlyteWorkflow) managed fields: failed to update ManagedFields: failed to convert old object: flyte.lyft.com/v1alpha1, Kind=FlyteWorkflow is unstructured and is not suitable for converting to ""
Do you know where that error is coming from? Sounds like the controller wants to update the workflow but something breaks. Could it be related to the cluster version?
Mohit Talele
11/28/2022, 10:35 AM
Fredrik Lyford
11/30/2022, 12:12 PM
karthikraj
12/05/2022, 4:54 AM
Xuan Hu
12/14/2022, 1:12 PM
flyte-core
helm chart on a self-hosted kubernetes cluster but encounter a certificate problem when trying to register a workflow remotely. The service is deployed with a "Kubernetes Ingress Controller Fake Certificate" and all the ssl/tls related settings should be at the template's default values. I roughly looked through them, but did not find any obvious problem. BTW, the flyte console seems to work fine.
When I try to flytectl register
with client config admin.insecure: false
(the default value by flytectl config init
), it complains about
$ flytectl register files --project flytesnacks --domain development --archive flyte-package.tgz --version latest
------------------------------------------------------------------ -------- ----------------------------------------------------
| NAME | STATUS | ADDITIONAL INFO |
------------------------------------------------------------------ -------- ----------------------------------------------------
| /tmp/register2617257857/0_flyte.workflows.example.say_hello_1.pb | Failed | Error registering file due to rpc error: code = |
| | | Unavailable desc = connection error: desc = |
| | | "transport: authentication handshake failed: x509: |
| | | "Kubernetes Ingress Controller Fake Certificate" |
| | | certificate is not trusted" |
------------------------------------------------------------------ -------- ----------------------------------------------------
1 rows
Error: Connection Info: [Endpoint: dns:///flyte.XXX.com, InsecureConnection?: false, AuthMode: Pkce]: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: x509: "Kubernetes Ingress Controller Fake Certificate" certificate is not trusted"
After changing the insecure
config to true
, the error message becomes
$ flytectl register files --project flytesnacks --domain development --archive flyte-package.tgz --version latest
------------------------------------------------------------------ -------- ----------------------------------------------------
| NAME | STATUS | ADDITIONAL INFO |
------------------------------------------------------------------ -------- ----------------------------------------------------
| /tmp/register3222452968/0_flyte.workflows.example.say_hello_1.pb | Failed | Error registering file due to rpc error: code = |
| | | Unavailable desc = connection closed before server |
| | | preface received |
------------------------------------------------------------------ -------- ----------------------------------------------------
1 rows
Error: Connection Info: [Endpoint: dns:///flyte.XXX.com, InsecureConnection?: true, AuthMode: Pkce]: rpc error: code = Unavailable desc = connection closed before server preface received
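For the first failure (the untrusted fake certificate with insecure: false), a hedged client-side workaround is to keep TLS on but skip verification until a real certificate is installed:

```yaml
# ~/.flyte/config.yaml -- sketch, not a production setup
admin:
  endpoint: dns:///flyte.XXX.com
  insecure: false           # keep TLS; 'true' gives the "server preface" error instead
  insecureSkipVerify: true  # tolerate the self-signed ingress certificate
```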
Actually, I am not sure whether the problem is caused by an inappropriate client config or by the server settings. So I suppose the first step is to check the gRPC service of flyteadmin.
Just let me know if you have any comments. Thanks in advance.
Nada Saiyed
12/14/2022, 4:28 PM
Lawrence Lee
12/16/2022, 6:36 PM$ kubectl get ingress -n flyte
NAME CLASS HOSTS ADDRESS PORTS AGE
flyte-core <none> * 80 26m
flyte-core-grpc <none> * 80 26m
Describing the ingress shows this
kubectl describe ingress -n flyte
Name: flyte-core
Namespace: flyte
Address:
Default backend: default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
Rules:
Host Path Backends
---- ---- --------
*
/* ssl-redirect:use-annotation (<error: endpoints "ssl-redirect" not found>)
/console flyteconsole:80 (192.168.103.57:8080,192.168.128.109:8080)
/console/* flyteconsole:80 (192.168.103.57:8080,192.168.128.109:8080)
/api flyteadmin:80 (192.168.110.232:8088,192.168.152.242:8088)
/api/* flyteadmin:80 (192.168.110.232:8088,192.168.152.242:8088)
/healthcheck flyteadmin:80 (192.168.110.232:8088,192.168.152.242:8088)
/v1/* flyteadmin:80 (192.168.110.232:8088,192.168.152.242:8088)
/.well-known flyteadmin:80 (192.168.110.232:8088,192.168.152.242:8088)
/.well-known/* flyteadmin:80 (192.168.110.232:8088,192.168.152.242:8088)
/login flyteadmin:80 (192.168.110.232:8088,192.168.152.242:8088)
/login/* flyteadmin:80 (192.168.110.232:8088,192.168.152.242:8088)
/logout flyteadmin:80 (192.168.110.232:8088,192.168.152.242:8088)
/logout/* flyteadmin:80 (192.168.110.232:8088,192.168.152.242:8088)
/callback flyteadmin:80 (192.168.110.232:8088,192.168.152.242:8088)
/callback/* flyteadmin:80 (192.168.110.232:8088,192.168.152.242:8088)
/me flyteadmin:80 (192.168.110.232:8088,192.168.152.242:8088)
/config flyteadmin:80 (192.168.110.232:8088,192.168.152.242:8088)
/config/* flyteadmin:80 (192.168.110.232:8088,192.168.152.242:8088)
/oauth2 flyteadmin:80 (192.168.110.232:8088,192.168.152.242:8088)
/oauth2/* flyteadmin:80 (192.168.110.232:8088,192.168.152.242:8088)
Annotations: alb.ingress.kubernetes.io/actions.ssl-redirect:
{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}
alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-west-2:582526512915:certificate/6c75c8f4-04a1-4aa7-81fa-59c7241e52ba
alb.ingress.kubernetes.io/group.name: flyte
alb.ingress.kubernetes.io/listen-ports: [{"HTTP": 80}, {"HTTPS":443}]
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/tags: service_instance=production
kubernetes.io/ingress.class: alb
meta.helm.sh/release-name: flyte
meta.helm.sh/release-namespace: flyte
nginx.ingress.kubernetes.io/app-root: /console
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedDeployModel 5m1s (x18 over 26m) ingress Failed deploy model due to InvalidParameter: 1 validation error(s) found.
- minimum field value of 1, CreateTargetGroupInput.Port.
Name: flyte-core-grpc
Namespace: flyte
Address:
Default backend: default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
Rules:
Host Path Backends
---- ---- --------
*
/flyteidl.service.AdminService flyteadmin:81 (192.168.110.232:8089,192.168.152.242:8089)
/flyteidl.service.AdminService/* flyteadmin:81 (192.168.110.232:8089,192.168.152.242:8089)
/flyteidl.service.DataProxyService flyteadmin:81 (192.168.110.232:8089,192.168.152.242:8089)
/flyteidl.service.DataProxyService/* flyteadmin:81 (192.168.110.232:8089,192.168.152.242:8089)
/flyteidl.service.AuthMetadataService flyteadmin:81 (192.168.110.232:8089,192.168.152.242:8089)
/flyteidl.service.AuthMetadataService/* flyteadmin:81 (192.168.110.232:8089,192.168.152.242:8089)
/flyteidl.service.IdentityService flyteadmin:81 (192.168.110.232:8089,192.168.152.242:8089)
/flyteidl.service.IdentityService/* flyteadmin:81 (192.168.110.232:8089,192.168.152.242:8089)
/grpc.health.v1.Health flyteadmin:81 (192.168.110.232:8089,192.168.152.242:8089)
/grpc.health.v1.Health/* flyteadmin:81 (192.168.110.232:8089,192.168.152.242:8089)
Annotations: alb.ingress.kubernetes.io/actions.ssl-redirect:
{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}
alb.ingress.kubernetes.io/backend-protocol-version: GRPC
alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-west-2:582526512915:certificate/6c75c8f4-04a1-4aa7-81fa-59c7241e52ba
alb.ingress.kubernetes.io/group.name: flyte
alb.ingress.kubernetes.io/listen-ports: [{"HTTP": 80}, {"HTTPS":443}]
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/tags: service_instance=production
kubernetes.io/ingress.class: alb
meta.helm.sh/release-name: flyte
meta.helm.sh/release-namespace: flyte
nginx.ingress.kubernetes.io/app-root: /console
nginx.ingress.kubernetes.io/backend-protocol: GRPC
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedDeployModel 5m4s (x19 over 27m) ingress Failed deploy model due to InvalidParameter: 1 validation error(s) found.
- minimum field value of 1, CreateTargetGroupInput.Port.
Any ideas how to best debug?Matt Dupree
12/26/2022, 9:31 PMflyte-deps-contour-envoy
pod is stuck in a pending state when I try to deploy the sandbox env to a cloud kubernetes cluster w/ 4 nodes. I’m just following the docs.
I see that this has come up before here and here, but neither of the suggested solutions makes sense for me (viz., I don’t want to deploy on kind, and I’m not running an nginx pod that would conflict with contour/envoy). Could I get some help?
Here’s the output of `k get pods -n flyte`:
○ → kubectl get pods -n flyte
NAME READY STATUS RESTARTS AGE
flyte-deps-contour-envoy-xp6x2 0/2 Pending 0 51m
flyte-deps-contour-envoy-v2tnd 0/2 Pending 0 51m
flyte-deps-contour-envoy-qjfp5 0/2 Pending 0 51m
flyte-deps-contour-envoy-bz2xj 0/2 Pending 0 51m
flyte-deps-kubernetes-dashboard-8b7d858b7-2gnk2 1/1 Running 0 51m
minio-7c99cbb7bd-bczp4 1/1 Running 0 51m
postgres-7b7dd4b66-n2w8g 1/1 Running 0 51m
flyte-deps-contour-contour-cd4d956d9-tz82c 1/1 Running 0 51m
syncresources-6fb7586cb-szrjx 1/1 Running 0 49m
flytepropeller-585fb99968-7bc9c 1/1 Running 0 49m
datacatalog-7875898bf8-zdd6n 1/1 Running 0 49m
flyteconsole-5667f8f975-q5j7b 1/1 Running 0 49m
flyte-pod-webhook-8669764d6-8xsjx 1/1 Running 0 49m
flyteadmin-649d4df4b-sk9px 1/1 Running 0 49m
flytescheduler-9bdf8bf84-frn9r 1/1 Running 0 49m
And here’s the logs for one of the pending pods:
○ → kubectl describe pods flyte-deps-contour-envoy-xp6x2 -n flyte
Name: flyte-deps-contour-envoy-xp6x2
Namespace: flyte
Priority: 0
Node: <none>
Labels: app.kubernetes.io/component=envoy
app.kubernetes.io/instance=flyte-deps
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=contour
controller-revision-hash=67bdb7bd55
helm.sh/chart=contour-7.10.1
pod-template-generation=1
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: DaemonSet/flyte-deps-contour-envoy
Init Containers:
envoy-initconfig:
Image: docker.io/bitnami/contour:1.20.1-debian-10-r53
Port: <none>
Host Port: <none>
Command:
contour
Args:
bootstrap
/config/envoy.json
--xds-address=flyte-deps-contour
--xds-port=8001
--resources-dir=/config/resources
--envoy-cafile=/certs/ca.crt
--envoy-cert-file=/certs/tls.crt
--envoy-key-file=/certs/tls.key
Limits:
cpu: 100m
memory: 100Mi
Requests:
cpu: 10m
memory: 50Mi
Environment:
CONTOUR_NAMESPACE: flyte (v1:metadata.namespace)
Mounts:
/admin from envoy-admin (rw)
/certs from envoycert (ro)
/config from envoy-config (rw)
Containers:
shutdown-manager:
Image: docker.io/bitnami/contour:1.20.1-debian-10-r53
Port: <none>
Host Port: <none>
Command:
contour
Args:
envoy
shutdown-manager
Liveness: http-get http://:8090/healthz delay=120s timeout=5s period=20s #success=1 #failure=6
Environment: <none>
Mounts:
/admin from envoy-admin (rw)
envoy:
Image: docker.io/bitnami/envoy:1.21.1-debian-10-r55
Ports: 8080/TCP, 8443/TCP, 8002/TCP
Host Ports: 80/TCP, 443/TCP, 0/TCP
Command:
envoy
Args:
-c
/config/envoy.json
--service-cluster $(CONTOUR_NAMESPACE)
--service-node $(ENVOY_POD_NAME)
--log-level info
Limits:
cpu: 100m
memory: 100Mi
Requests:
cpu: 10m
memory: 50Mi
Liveness: http-get http://:8002/ready delay=120s timeout=5s period=20s #success=1 #failure=6
Readiness: http-get http://:8002/ready delay=10s timeout=1s period=3s #success=1 #failure=3
Environment:
CONTOUR_NAMESPACE: flyte (v1:metadata.namespace)
ENVOY_POD_NAME: flyte-deps-contour-envoy-xp6x2 (v1:metadata.name)
Mounts:
/admin from envoy-admin (rw)
/certs from envoycert (rw)
/config from envoy-config (rw)
Conditions:
Type Status
PodScheduled False
Volumes:
envoy-admin:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
envoy-config:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
envoycert:
Type: Secret (a volume populated by a Secret)
SecretName: envoycert
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 56m default-scheduler 0/4 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 3 node(s) didn't match Pod's node affinity/selector.
Warning FailedScheduling 54m default-scheduler 0/4 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 3 node(s) didn't match Pod's node affinity/selector.
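The `1 node(s) didn't have free ports` event matches the envoy DaemonSet's host ports 80/443 shown above. A hedged workaround is to disable host ports on the bundled contour subchart (assuming the bitnami contour chart's `envoy.useHostPort` flag is exposed through the flyte-deps values):

```yaml
contour:
  envoy:
    # Assumption: stop binding 80/443 directly on the node; traffic then
    # reaches envoy via its Service instead of host ports.
    useHostPort: false
```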
Alex Beatson
12/28/2022, 2:00 AM
Felix Ruess
01/09/2023, 11:12 AM
Andrew Achkar
01/13/2023, 9:06 PM
Xuan Hu
01/29/2023, 7:53 AM
CSRF_TOKEN_VALIDATION_FAILED
, any clues on how to debug the problem?
Flyte is deployed on a self-hosted k8s cluster with an internal domain name and HTTP access (not HTTPS), using a template generated by the flyte-core
helm chart (helm template flyte-core -f values.yaml
). The OIDC part seems to work (after a fresh deployment, when opening the flyte console, it redirects to the gitlab authorization page, and after approval it redirects back to the flyte console page with the username shown in the top-right corner), but when I try to register a workflow according to the tutorial [1] with the command flytectl register files --project flytesnacks --domain development --archive flyte-package.tgz --version v1
, the gitlab authorization page is prompted successfully, but when redirecting back to the flyte.example.com/callback
page, it complains about a 401
error. And the log of flyteadmin
shows something like
{
"json": {},
"level": "error",
"msg": "Invalid CSRF token cookie [CSRF_TOKEN_VALIDATION_FAILED] CSRF token does not match state 2r4rcd3npg, 3237e1083ec0ae2bd20acbe8a5817d18475faaee5a060d2184ab7ffddd151290 vs OXpoczQyanRxcW43c3hnZ3RjbnBnZjZrNnptMnA2dDY",
"ts": "2023-01-29T06:43:34Z"
}
There are several small questions that might be relevant:
1. In the auth doc [2], there is a comment for redirectUri: http://localhost:53593/callback
saying that This should not change
, but I suppose that is for the sandbox deployment, so I changed it to the domain name, something like http://flyte.example.com/callback
. Otherwise, it complains about an invalid redirect uri when authorizing on the gitlab page.
2. For the scopes
, according to the gitlab doc [3], I only set it to read_user
.
3. Any approach to show more logs for flytectl
CLI? I tried to set the log level to 0
, but it does not show anything.
4. Any approach to show more logs for the flyteadmin
service? I found there is a flyteadmin.extraArgs
in the template but I don't know how to inject --logger.level 0
.
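On questions 3 and 4: if flyteadmin and flytectl follow flytestdlib's logrus-style numeric levels (0 = panic ... 5 = debug — an assumption worth verifying), then `--logger.level 0` is the least verbose setting, which would explain why it showed nothing. A hedged sketch for the helm values:

```yaml
flyteadmin:
  extraArgs:
    - --logger.level=5  # assumption: 5 = debug on flytestdlib's level scale
```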
The relevant values.yaml
for auth is shown as following (with some sensitive info masked):
configmap:
  adminServer:
    auth:
      appAuth:
        authServerType: External
        externalAuthServer:
          metadataUrl: .well-known/openid-configuration
        thirdPartyConfig:
          flyteClient:
            clientId: <client_id generated by gitlab>
            redirectUri: http://flyte.example.com/callback
            scopes:
              - read_user
      authorizedUris:
        - http://flyte.example.com
        - http://flyteadmin:80
        - http://flyteadmin.flyte.svc.cluster.local:80
      userAuth:
        openId:
          baseUrl: https://git.example.com
          clientId: <client_id generated by gitlab>
          scopes:
            - openid
    server:
      security:
        useAuth: true
flyteadmin:
  secrets:
    oidc_client_secret: <client_secret generated by gitlab>
secrets:
  adminOauthClientCredentials:
    clientId: <client_id generated by gitlab>
    clientSecret: <client_secret generated by gitlab>
[1] https://docs.flyte.org/projects/cookbook/en/latest/auto/larger_apps/larger_apps_deploy.html#build-deploy-your-application-to-the-cluster
[2] https://docs.flyte.org/en/latest/deployment/cluster_config/auth_setup.html
[3] https://docs.gitlab.com/ee/integration/oauth_provider.html#view-all-authorized-applications
Hampus Rosvall
02/15/2023, 8:38 AM
Ferdinand von den Eichen
02/16/2023, 2:46 PM
/var/run/credentials
in the sync-cluster-resources
init container.
• syncresources also became unhealthy, because the secret was not being mounted to the service at all (as far as we could tell)
That being said, after solving both steps manually we eventually saw some signs of life on our data plane cluster. However, the flytepropeller
service on the data plane cluster didn’t seem to be able to reconcile the workflow correctly, instead referencing a non-existent flyteadmin service.
E0216 14:41:48.783203 1 workers.go:102] error syncing 'flytesnacks-development/f9f0589cf97c949c892f': Workflow[] failed. ErrorRecordingError: failed to publish event, caused by: EventSinkError: Error sending event, caused by [rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup flyteadmin on 10.100.0.10:53: no such host"]
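That dial error suggests the data-plane propeller is still using the in-cluster default `flyteadmin` hostname. In flyte-core, the admin address propeller reports events to is typically set under `configmap.admin` — a hedged sketch with a placeholder control-plane endpoint:

```yaml
configmap:
  admin:
    admin:
      # Placeholder: the control-plane flyteadmin address reachable from this cluster
      endpoint: dns:///flyteadmin.control-plane.example.com:443
      insecure: false
```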
Is there a full working example? What are we missing? Or would anyone from the community be up for an exchange, if you have succeeded with the multi-cluster setup?
For final background:
• We aim for the multi cluster setup for operational reasons, but mostly for data isolation. That way we can isolate customer data on a cluster and AWS account level.