Hi Everyone. I'm new to Flyte so please bear with ...
# ask-the-community
g
Hi everyone. I'm new to Flyte, so please bear with me. I started with this tutorial: https://docs.flyte.org/en/v1.0.0/getting_started/index.html. All fine so far: I can run the example with the --remote flag. Next, I would like to run the example on a remote Kubernetes cluster hosted on AWS. I ran through all the steps here: https://docs.flyte.org/en/v1.0.0/deployment/aws/manual.html, which all seemed to succeed:

kubectl -n flyte get ingress
NAME              CLASS    HOSTS   ADDRESS                                                     PORTS   AGE
flyte-core        <none>   *       k8s-flyte-123456789.eu-west-1.elb.amazonaws.com             80      17h
flyte-core-grpc   <none>   *       k8s-flyte-123456789-721009295.eu-west-1.elb.amazonaws.com   80      17h

I updated the config.yaml file to point to the ingress endpoints:

cat ~/.flyte/config.yaml
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: dns:///k8s-flyte-123456789.eu-west-1.elb.amazonaws.com
  insecureSkipVerify: true
  authType: Pkce
  insecure: true
logger:
  show-source: true
  level: 6

I also updated my environment variables to point to the config.yaml:

echo $FLYTECTL_CONFIG
/home/my-username/.flyte/config.yaml
echo $KUBECONFIG
/home/my username/.kube/config/home/gajus/.flyte/k3s/k3s.yaml

Here's the problem. When I run:

FLYTE_SDK_LOGGING_LEVEL=20 pyflyte run --remote example.py wf --n 500 --mean 42 --sigma 2

I get the following error:

{"asctime": "2023-09-15 08:07:54,296", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/my-username/.flyte/config.yaml"}
{"asctime": "2023-09-15 08:07:54,298", "name": "flytekit", "levelname": "INFO", "message": "Setting protocol to file"}
{"asctime": "2023-09-15 08:07:54,303", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/my-username/.flyte/config.yaml"}
{"asctime": "2023-09-15 08:07:54,304", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/my-username/.flyte/config.yaml"}
{"asctime": "2023-09-15 08:07:54,630", "name": "flytekit", "levelname": "INFO", "message": "We won't register PyTorchCheckpointTransformer, PyTorchTensorTransformer, and PyTorchModuleTransformer because torch is not installed."}
{"asctime": "2023-09-15 08:07:54,641", "name": "flytekit", "levelname": "INFO", "message": "Setting protocol to file"}
{"asctime": "2023-09-15 08:07:54,641", "name": "flytekit", "levelname": "INFO", "message": "Setting protocol to file"}
{"asctime": "2023-09-15 08:07:54,642", "name": "flytekit", "levelname": "INFO", "message": "Setting protocol to file"}
{"asctime": "2023-09-15 08:07:54,642", "name": "flytekit", "levelname": "INFO", "message": "Setting protocol to file"}
{"asctime": "2023-09-15 08:07:54,643", "name": "flytekit", "levelname": "INFO", "message": "We won't register bigquery handler for structured dataset because we can't find the packages google-cloud-bigquery-storage and google-cloud-bigquery"}
{"asctime": "2023-09-15 08:07:54,684", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/my-username/.flyte/config.yaml"}
{"asctime": "2023-09-15 08:07:54,685", "name": "flytekit.cli", "levelname": "INFO", "message": "Creating remote with config Config(platform=PlatformConfig(endpoint='k8s-flyte-123456789.eu-west-1.elb.amazonaws.com', insecure=True, insecure_skip_verify=True, console_endpoint=None, command=None, client_id=None, client_credentials_secret=None, scopes=[], auth_mode='Pkce'), secrets=SecretsConfig(env_prefix='_FSEC_', default_dir='/etc/secrets', file_prefix=''), stats=StatsConfig(host='localhost', port=8125, disabled=False, disabled_tags=False), data_config=DataConfig(s3=S3Config(enable_debug=False, endpoint=None, retries=3, backoff=datetime.timedelta(seconds=5), access_key_id=None, secret_access_key=None), gcs=GCSConfig(gsutil_parallelism=False)), local_sandbox_path='/tmp/flyteysnmzvj4')"}
{"asctime": "2023-09-15 08:07:54,884", "name": "flytekit.cli", "levelname": "INFO", "message": "Flyte Client configured -> k8s-flyte-123456789.eu-west-1.elb.amazonaws.com in insecure mode."}
{"asctime": "2023-09-15 08:07:54,886", "name": "flytekit.cli", "levelname": "ERROR", "message": "Non-auth RPC error <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"failed to connect to all addresses; last error: INTERNAL: Trying to connect an http1.x server\"\n\tdebug_error_string = \"UNKNOWN:Failed to pick subchannel {created_time:\"2023-09-15T08:07:54.8865895+02:00\", children:[UNKNOWN:failed to connect to all addresses; last error: INTERNAL: Trying to connect an http1.x server {grpc_status:14, created_time:\"2023-09-15T08:07:54.8865848+02:00\"}]}\"\n>, sleeping 200ms and retrying"}
{"asctime": "2023-09-15 08:07:55,087", "name": "flytekit.cli", "levelname": "ERROR", "message": "Non-auth RPC error <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"failed to connect to all addresses; last error: INTERNAL: Trying to connect an http1.x server\"\n\tdebug_error_string = \"UNKNOWN:Failed to pick subchannel {created_time:\"2023-09-15T08:07:55.0874841+02:00\", children:[UNKNOWN:failed to connect to all addresses; last error: INTERNAL: Trying to connect an http1.x server {grpc_status:14, created_time:\"2023-09-15T08:07:55.0874779+02:00\"}]}\"\n>, sleeping 400ms and retrying"}
Traceback (most recent call last):
  File "/home/my-username/miniconda3/envs/flyte/bin/pyflyte", line 8, in <module>
    sys.exit(main())
  File "/home/my-username/miniconda3/envs/flyte/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/my-username/miniconda3/envs/flyte/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/my-username/miniconda3/envs/flyte/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/my-username/miniconda3/envs/flyte/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/my-username/miniconda3/envs/flyte/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/my-username/miniconda3/envs/flyte/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/my-username/miniconda3/envs/flyte/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return _callback(*args, **kwargs)
  File "/home/my-username/miniconda3/envs/flyte/lib/python3.10/site-packages/flytekit/clis/sdk_in_container/run.py", line 529, in run
    remote_entity = remote.register_script(
  File "/home/my-username/miniconda3/envs/flyte/lib/python3.10/site-packages/flytekit/remote/remote.py", line 671, in register_script
    upload_location, md5_bytes = fast_register_single_script(
  File "/home/my-username/miniconda3/envs/flyte/lib/python3.10/site-packages/flytekit/tools/script_mode.py", line 111, in fast_register_single_script
    upload_location = create_upload_location_fn(content_md5=md5)
  File "/home/my-username/miniconda3/envs/flyte/lib/python3.10/site-packages/flytekit/clients/friendly.py", line 998, in get_upload_signed_url
    return super(SynchronousFlyteClient, self).create_upload_location(
  File "/home/my-username/miniconda3/envs/flyte/lib/python3.10/site-packages/flytekit/clients/raw.py", line 41, in handler
    return fn(*args, **kwargs)
  File "/home/my-username/miniconda3/envs/flyte/lib/python3.10/site-packages/flytekit/clients/raw.py", line 856, in create_upload_location
    return self._dataproxy_stub.CreateUploadLocation(create_upload_location_request, metadata=self.metadata)
  File "/home/my-username/miniconda3/envs/flyte/lib/python3.10/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/my-username/miniconda3/envs/flyte/lib/python3.10/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
  status = StatusCode.UNAVAILABLE
  details = "failed to connect to all addresses; last error: INTERNAL: Trying to connect an http1.x server"
  debug_error_string = "UNKNOWN:Failed to pick subchannel {created_time:"2023-09-15T08:07:55.488686825+02:00", children:[UNKNOWN:failed to connect to all addresses; last error: INTERNAL: Trying to connect an http1.x server {created_time:"2023-09-15T08:07:55.4886694+02:00", grpc_status:14}]}"

This seems to have something to do with gRPC not liking my self-signed certificate. Does anyone have any ideas on how to fix this? Any help would be greatly appreciated. (P.S. I am running pyflyte on WSL Ubuntu 20.04.)
g
Hey, thanks for your reply. I think I read that (or something similar). I have that flag set to true:
insecureSkipVerify
and still run into the issue
any other suggestions?
d
Try removing that flag. With
insecure: true
it doesn't do much.
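A minimal sketch of the point above, assuming the admin section of ~/.flyte/config.yaml has already been loaded into a dict (for example with a YAML parser). The helper name check_admin_flags is hypothetical and is not part of flytekit or flytectl. It only flags the combination discussed here: with insecure set to true the client does not use TLS, so insecureSkipVerify has nothing to skip.

```python
from typing import Dict, List


def check_admin_flags(admin: Dict[str, object]) -> List[str]:
    """Return warnings for the flag combination discussed in this thread.

    Hypothetical helper; the key names mirror the config.yaml shown above.
    """
    warnings: List[str] = []
    insecure = bool(admin.get("insecure", False))
    skip_verify = bool(admin.get("insecureSkipVerify", False))

    # insecureSkipVerify only matters when a TLS handshake actually happens.
    if insecure and skip_verify:
        warnings.append(
            "insecure is true (no TLS), so insecureSkipVerify has no effect"
        )
    if not str(admin.get("endpoint", "")):
        warnings.append("admin.endpoint is not set")
    return warnings


if __name__ == "__main__":
    admin_section = {
        "endpoint": "dns:///k8s-flyte-123456789.eu-west-1.elb.amazonaws.com",
        "insecureSkipVerify": True,
        "authType": "Pkce",
        "insecure": True,
    }
    for message in check_admin_flags(admin_section):
        print(message)
```

The check is deliberately narrow: it does not decide which flag is right for a given deployment, it only points out that the two flags set together are redundant.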
g
Hi I have removed that flag. In addition I think the company vpn / proxy was messing with the certificate. I have disabled the service. Now I am getting the following error:
debug_error_string = "UNKNOWN:Error received from peer ipv4:99.81.150.83:443 {grpc_message:"failed to create a signed url. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: ValidationError: Request ARN is invalid\n\tstatus code: 400, request id: 9d075706-2881-4b4b-9554-53eb3f30a8e6", grpc_status:2, created_time:"2023-10-27T15:33:15.9558153+02:00"}"
Any suggestions?
d
@gajus is this certificate generated by ACM or is it imported?
g
Hi David, I imported the certificate as per the instructions here: https://docs.flyte.org/en/v1.0.0/deployment/aws/manual.html
So I used this req file:
[req]
distinguished_name = req_distinguished_name
x509_extensions = v3_req
prompt = no
[req_distinguished_name]
C = US
ST = WA
L = Seattle
O = Flyte
OU = IT
CN = flyte.example.org
emailAddress = dummyuser@flyte.org
[v3_req]
keyUsage = keyEncipherment, dataEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names
[alt_names]
DNS.1 = flyte.example.org
I changed the CN and DNS.1 values to the Flyte endpoint I get from: kubectl -n flyte get ingress
Before I changed the CN, I was getting: Peer name is not in peer certificate
My config is as follows:
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: dns:///k8s-flyte-1234567890-1234567890.eu-west-1.elb.amazonaws.com
  insecureSkipVerify: true
  authType: Pkce
logger:
  show-source: true
  level: 6
storage:
  kind: s3
  config:
    auth_type: iam
    region: eu-west-1
  container: my_bucket_name
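The CN/DNS.1 substitution described above can be sketched as a small template fill. This is only an illustration: render_req_conf and hostname_from_endpoint are hypothetical helper names, and the template simply repeats the req file quoted in this message. The 64-character limit is the X.509 commonName upper bound, which is worth checking because load balancer hostnames can be long.

```python
MAX_CN_LENGTH = 64  # X.509 commonName upper bound (ub-common-name, RFC 5280)

REQ_TEMPLATE = """[req]
distinguished_name = req_distinguished_name
x509_extensions = v3_req
prompt = no
[req_distinguished_name]
C = US
ST = WA
L = Seattle
O = Flyte
OU = IT
CN = {hostname}
emailAddress = dummyuser@flyte.org
[v3_req]
keyUsage = keyEncipherment, dataEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names
[alt_names]
DNS.1 = {hostname}
"""


def hostname_from_endpoint(endpoint: str) -> str:
    """Strip the dns:/// scheme used in the flytectl config, if present."""
    prefix = "dns:///"
    return endpoint[len(prefix):] if endpoint.startswith(prefix) else endpoint


def render_req_conf(hostname: str) -> str:
    """Fill CN and DNS.1 with the same hostname so the peer name matches."""
    if not hostname:
        raise ValueError("hostname must not be empty")
    if len(hostname) > MAX_CN_LENGTH:
        raise ValueError(f"hostname is longer than {MAX_CN_LENGTH} characters")
    return REQ_TEMPLATE.format(hostname=hostname)


if __name__ == "__main__":
    endpoint = "dns:///k8s-flyte-1234567890-1234567890.eu-west-1.elb.amazonaws.com"
    print(render_req_conf(hostname_from_endpoint(endpoint)))
```

Keeping CN and DNS.1 identical to the hostname the client dials is what avoids the "Peer name is not in peer certificate" error mentioned above.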
d
Sorry, I think I made a mistake in my previous advice. Try:
insecure: True
insecureSkipVerify: false
Trying to connect an http1.x server
is typically tied to an SSL mismatch
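One way to investigate such a mismatch is to check what the endpoint actually speaks. The sketch below uses only the Python standard library; probe_alpn and describe_alpn are hypothetical helper names, and port 443 is an assumption. It opens a TLS connection without verifying the certificate (so a self-signed certificate does not stop the probe) and reports the ALPN protocol the server selects. gRPC needs HTTP/2 ("h2"). If the handshake itself raises ssl.SSLError, the port is probably not serving TLS at all.

```python
import socket
import ssl
from typing import Optional


def probe_alpn(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Open a TLS connection and return the ALPN protocol the server selects.

    Verification is disabled on purpose for diagnostics only; do not reuse
    this context for real traffic.
    """
    context = ssl.create_default_context()
    context.check_hostname = False  # must be disabled before CERT_NONE
    context.verify_mode = ssl.CERT_NONE
    context.set_alpn_protocols(["h2", "http/1.1"])
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()


def describe_alpn(selected: Optional[str]) -> str:
    """Translate the probe result into a hint for a gRPC client."""
    if selected == "h2":
        return "server negotiates HTTP/2 over TLS, which gRPC needs"
    if selected == "http/1.1":
        return "server only offers HTTP/1.1 over TLS; gRPC calls will fail"
    return "server did not select an ALPN protocol"


if __name__ == "__main__":
    # Hostname taken from the thread; replace it with your own ingress address.
    host = "k8s-flyte-123456789.eu-west-1.elb.amazonaws.com"
    try:
        print(describe_alpn(probe_alpn(host)))
    except (ssl.SSLError, OSError) as exc:
        print(f"TLS probe failed: {exc}")
```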
g
Hi David, thanks for the response. I think I'm just going to issue an official certificate. The self-signed cert is giving me a headache.
I'll report back on the outcome.
have a nice weekend
d
sure, let us know if you need anything else.
g
just a quick question regarding this how-to: https://docs.flyte.org/en/v1.0.0/deployment/aws/manual.html
I've started from scratch again, and every time I do, I end up with this command:
kubectl -n flyte get ingress
showing:
NAME              CLASS    HOSTS   ADDRESS   PORTS   AGE
flyte-core        <none>   *                 80      23m
flyte-core-grpc   <none>   *                 80      23m
Also, there are no load balancers in AWS. Last time I managed to fix it, but I'm not really sure how. What could be the issue?
d
@gajus is the AWS LB controller running? that's the component that should reconcile your config into an actual LB. The guide you mention is old, we have 2 resources for AWS deployment: manual and automated with Terraform. Both of them use the ALB Ingress Controller
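A quick way to spot ingresses that were never reconciled into a load balancer is to look at their status. The sketch below is illustrative and assumes the output of kubectl -n flyte get ingress -o json is passed in as text; ingresses_without_address is a hypothetical helper name. An ingress with no entries under status.loadBalancer.ingress corresponds to the empty ADDRESS column shown above.

```python
import json
from typing import List


def ingresses_without_address(ingress_list_json: str) -> List[str]:
    """Return names of ingresses that have no load balancer address yet."""
    data = json.loads(ingress_list_json)
    pending: List[str] = []
    for item in data.get("items", []):
        load_balancer = item.get("status", {}).get("loadBalancer", {})
        # An empty or missing list means no controller has assigned an address.
        if not load_balancer.get("ingress"):
            pending.append(item["metadata"]["name"])
    return pending


if __name__ == "__main__":
    sample = json.dumps({
        "items": [
            {"metadata": {"name": "flyte-core"},
             "status": {"loadBalancer": {}}},
            {"metadata": {"name": "flyte-core-grpc"},
             "status": {"loadBalancer": {"ingress": [
                 {"hostname": "k8s-flyte-123456789.eu-west-1.elb.amazonaws.com"}
             ]}}},
        ]
    })
    print(ingresses_without_address(sample))
```

If every ingress shows up in the result, checking whether the load balancer controller pods are running and reading their logs is the natural next step.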
g
Thanks David, I'm indeed struggling with the outdated docs. They also mention Opta, which is no longer supported either. Let me review your documents.
d
Sorry for that
g
your document already looks promising
Hi David, just an update. Your documents are really useful. Thanks. Ran into the same issue as here: https://discuss.flyte.org/t/14151589/hi-u04h6uue78b-i-ve-followed-the-eks-deployment-guide-https-
I changed the port from 8088 to 8089 which worked. May I ask why that fixed the issue?
d
Hi @gajus I think the default config.yaml requires an update. These days,
flyte-binary
exposes http(UI) and grpc(backend) using two different services.
8089
is for the backend so it should be the default port in the
config
file
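The port split described above can be captured in a small check. This is a sketch based only on what this thread states: 8089 is the gRPC backend and 8088, the previous default, is taken here to be the HTTP (UI) port. The names FLYTE_BINARY_PORTS and check_endpoint_port are hypothetical.

```python
from typing import Optional

# Ports as described in this thread for flyte-binary (assumption for 8088).
FLYTE_BINARY_PORTS = {"http": 8088, "grpc": 8089}


def endpoint_port(endpoint: str) -> Optional[int]:
    """Extract the trailing port from an endpoint such as dns:///localhost:8089."""
    tail = endpoint.rsplit(":", 1)[-1]
    return int(tail) if tail.isdigit() else None


def check_endpoint_port(endpoint: str) -> str:
    """Explain whether the configured port matches the gRPC service."""
    port = endpoint_port(endpoint)
    if port is None:
        return "no explicit port in endpoint"
    if port == FLYTE_BINARY_PORTS["grpc"]:
        return "endpoint targets the gRPC backend port"
    if port == FLYTE_BINARY_PORTS["http"]:
        return "endpoint targets the HTTP (UI) port; gRPC clients need 8089"
    return f"endpoint uses port {port}, which this sketch does not know about"


if __name__ == "__main__":
    for candidate in ("dns:///localhost:8088", "dns:///localhost:8089"):
        print(candidate, "->", check_endpoint_port(candidate))
```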
g
Hi David, the docs were great, thanks. Trying to change the Auth to Auth0 now but everything else went like a charm.
I did find a few minor typos, created a branch, and am happy to create a PR; however, I'm not allowed to push.
my GH handle is gdirkzwager
d
@gajus thank you!
im not allowed to push
This is strange. In the meantime, just added you to the repo. Feel free to improve
g
I sure will
just one more question (i hope)
image.png
any idea why
the signed urls are invalid?
i didnt get the issue while port forwarding
d
> any idea why
Not sure. So it only happens if you connect through Ingress?
g
Let me try through port forwarding. Note that the task never finishes, though I haven't looked into that yet.
No, it now also happens through port forwarding. I did end up doing the Okta config. Let me run through that to see if I made any mistakes.
Hi @David Espejo (he/him), I started playing around with it again. To recap, I'm getting this error:
pyflyte run --remote example.py wf --n 500 --mean 42 --sigma 2
Running Execution on Remote.
127.0.0.1 - - [10/Nov/2023 18:43:13] "GET /callback?code=o3---------&state=NMNX1wRdsYo-1TLaZKMIBUkghfDYy7I6dE4JtIRX0NN6JG2TsgSliQ HTTP/1.1" 200 -
Failed with Exception Code: USER:ValueError
Value error! Received: 403. Request to send data: my-signed-url-for-s3
I ended up fixing this by adding the S3 full access permission to the EKS node role: https://github.com/davidmirror-ops/flyte-the-hard-way/blob/main/docs/01-eks-permissions.md#eks-node-role
Should this be added? The docs don't include it, and I don't remember seeing it in the official Flyte docs either.
d
Hey @gajus So, the policy for the NodeGroup shouldn't be that permissive. Here we share the minimum permissions needed: https://docs.flyte.org/en/latest/deployment/deployment/multicluster.html#prerequisites
but it's true that this should be better placed in the docs. Let me know if that config works for you
g
yeah great, that works fine
Sorry to bother you so much, I really appreciate the help.
Something I have been ignoring so far is that the job status doesn't actually change:
I haven't really delved into it, so I'm not sure whether the jobs finish and the status just isn't read from the DB, or whether the jobs don't even start because of a problem with the scheduler. Do you have any pointers here?
d
can you get logs from the flyte-binary pod? like
kubectl logs -n <your-namespace> <flyte-pod-name>
g
{"json":{"src":"controller.go:159"},"level":"info","msg":"==\u003e Enqueueing workflow [flytesnacks-development/f4a3d9e2ee74b40bd907]","ts":"2023-11-10T18:11:56Z"}
{"json":{"exec_id":"f4a3d9e2ee74b40bd907","ns":"flytesnacks-development","routine":"worker-18","src":"passthrough.go:101"},"level":"debug","msg":"Updated workflow.","ts":"2023-11-10T18:11:56Z"}
{"json":{"exec_id":"f4a3d9e2ee74b40bd907","ns":"flytesnacks-development","routine":"worker-18","src":"handler.go:366"},"level":"info","msg":"Completed processing workflow.","ts":"2023-11-10T18:11:56Z"}
E1110 18:11:56.970252 1 workers.go:102] error syncing 'flytesnacks-development/f4a3d9e2ee74b40bd907': Workflow[] failed. ErrorRecordingError: failed to publish event, caused by: EventSinkError: Error sending event, caused by [rpc error: code = Unauthenticated desc = transport: per-RPC creds failed due to error: oauth2: cannot fetch token: 401 Unauthorized Response: {"errorCode":"invalid_client","errorSummary":"Invalid value for 'client_id' parameter.","errorLink":"invalid_client","errorId":"oaewLxuDQlSTlqBfLY5AaYGAA","errorCauses":[]}]
{"json":{"src":"cookie.go:88"},"level":"debug","msg":"Existing [flyte_idt] cookie found","ts":"2023-11-10T18:11:57Z"}
{"json":{"src":"cookie.go:88"},"level":"debug","msg":"Existing [flyte_at] cookie found","ts":"2023-11-10T18:11:57Z"}
{"json":{"src":"cookie.go:79"},"level":"info","msg":"Could not detect existing cookie [flyte_rt]. Error: http: named cookie not present","ts":"2023-11-10T18:11:57Z"}
{"json":{"src":"cookie_manager.go:76"},"level":"info","msg":"Refresh token doesn't exist or failed to read it. Ignoring this error. Error: [EMPTY_OAUTH_TOKEN] Failure to retrieve cookie [flyte_rt], caused by: http: named cookie not present","ts":"2023-11-10T18:11:57Z"}
Seems to be a permission issue.
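When the pod log is long, the failing line is easy to miss among the info and debug records. The sketch below is an illustration only, with the hypothetical helper name find_errors: it scans log text that mixes JSON records with klog-style lines (those starting with E followed by the month and day, as in the output above) and returns the error messages. For klog error lines it prefers the embedded errorSummary field when one is present.

```python
import json
import re
from typing import List

KLOG_ERROR = re.compile(r"^E\d{4} ")  # klog error lines start with E<mmdd>
ERROR_SUMMARY = re.compile(r'"errorSummary":"([^"]*)"')


def find_errors(log_text: str) -> List[str]:
    """Collect error messages from mixed JSON and klog-style log lines."""
    findings: List[str] = []
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if KLOG_ERROR.match(line):
            summary = ERROR_SUMMARY.search(line)
            findings.append(summary.group(1) if summary else line)
            continue
        if line.startswith("{"):
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip truncated or non-JSON lines
            if record.get("level") in ("error", "fatal"):
                findings.append(str(record.get("msg", "")))
    return findings


if __name__ == "__main__":
    import sys

    # Example: kubectl logs -n <your-namespace> <flyte-pod-name> | python scan.py
    for finding in find_errors(sys.stdin.read()):
        print(finding)
```

Run against the log above, this would surface the invalid client_id message from the identity provider rather than the surrounding cookie and workflow records.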
d
so are you enabling
auth
? it's complaining about an invalid
client_id
sorry to to bother you so much, really appreciate the help
No need to be sorry, happy to help where I can!
g
Yeah, I'm enabling auth. I've double-checked the Okta settings. Let me triple-check.
d
ok, but are you using the internal auth server or the External from Okta?
d
so, I guess when you open the UI it prompts you for Okta authentication right?
g
yes that all goes well
d
ok what about CLI. Does
flytectl get projects
work?
I don't think so
g
flytectl get projects
{"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [storage] updated. No update handler registered.","ts":"2023-11-10T19:46:24+01:00"}
{"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [root] updated. No update handler registered.","ts":"2023-11-10T19:46:24+01:00"}
{"json":{"src":"viper.go:400"},"level":"debug","msg":"Config section [admin] updated. Firing updated event.","ts":"2023-11-10T19:46:24+01:00"}
{"json":{"src":"client.go:63"},"level":"info","msg":"Initialized Admin client","ts":"2023-11-10T19:46:24+01:00"}
{"json":{"src":"auth_interceptor.go:86"},"level":"debug","msg":"Request failed due to [rpc error: code = Unauthenticated desc = token parse error [JWT_VERIFICATION_FAILED] Could not retrieve id token from metadata, caused by: rpc error: code = Unauthenticated desc = Request unauthenticated with IDToken]. If it's an unauthenticated error, we will attempt to establish an authenticated context.","ts":"2023-11-10T19:46:24+01:00"}
{"json":{"src":"auth_interceptor.go:91"},"level":"debug","msg":"Request failed due to [Unauthenticated]. Attempting to establish an authenticated connection and trying again.","ts":"2023-11-10T19:46:24+01:00"}
{"json":{"src":"token_source_provider.go:151"},"level":"warning","msg":"Failed fetching from cache. Will restart the flow. Error: no token found in the cache","ts":"2023-11-10T19:46:24+01:00"}
{"json":{"src":"auth_flow_orchestrator.go:77"},"level":"info","msg":"Opening the browser at <okta login url>
{"json":{"src":"auth_flow_orchestrator.go:93"},"level":"error","msg":"unable to save the new token due to. Will ignore the error and use the issued token. Error: unable to save token. Error: The name org.freedesktop.secrets was not provided by any .service files","ts":"2023-11-10T19:46:27+01:00"}
{"json":{"src":"project.go:102"},"level":"debug","msg":"Retrieved 1 projects","ts":"2023-11-10T19:46:27+01:00"}
 ------------- ------------- -------------------------
| ID          | NAME        | DESCRIPTION             |
 ------------- ------------- -------------------------
| flytesnacks | flytesnacks | flytesnacks description |
 ------------- ------------- -------------------------
image.png
So it's successful, but it's not 😄
I'm running it on WSL, so there might be an issue there: https://stackoverflow.com/questions/72528100/how-to-unlock-gnome-keyring-on-debian-headless-wsl-2-and-make-it-work-in-pytho Let me have a look. I'll let you know.