#ask-the-community

Matt Coan

07/24/2023, 4:24 PM
Somewhat new to the ins and outs of implementing Flyte and have been working through the "Flyte-the-hard-way" binary setup. I've gotten to the point of trying the hello_world workflow, but I get the errors below when I run it per the instructions:
~/.flyte/flytesnacks/examples/basics/basics (master ✔) ᐅ pyflyte run --remote hello_world.py my_wf
E0724 11:27:28.869290000 4558732800 hpack_parser.cc:833]               Error parsing 'content-type' metadata: error=invalid value key=content-type
E0724 11:27:28.891411000 4558732800 hpack_parser.cc:833]               Error parsing 'content-type' metadata: error=invalid value key=content-type
E0724 11:27:28.914052000 4558732800 hpack_parser.cc:833]               Error parsing 'content-type' metadata: error=invalid value key=content-type
E0724 11:27:28.938491000 4558732800 hpack_parser.cc:833]               Error parsing 'content-type' metadata: error=invalid value key=content-type
Failed with Exception Code: SYSTEM:Unknown
RPC Failed, with Status: StatusCode.UNKNOWN
	details: Stream removed
	Debug string UNKNOWN:Error received from peer  {grpc_message:"Stream removed", grpc_status:2, created_time:"2023-07-24T11:27:28.938614-04:00"}
Didn't see anything upthread; has anyone run into this before? The pod and ingress seem fine. Here's my basic config.yaml:
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: dns:///<my flyte-backend-flyte-binary-grpc address> #Replace with your domain name
  authType: Pkce
  insecure: false
  insecureSkipVerify: true
logger:
  show-source: true
  level: 6

Yee

07/24/2023, 4:27 PM
can you pip show grpcio?

Matt Coan

07/24/2023, 4:28 PM
Sure thing . . .
Name: grpcio
Version: 1.56.0
Summary: HTTP/2-based RPC framework
Home-page: https://grpc.io
Author: The gRPC Authors
Author-email: grpc-io@googlegroups.com
License: Apache License 2.0
Location: /usr/local/lib/python3.10/site-packages
Requires:
Required-by: flytekit, grpcio-status

Yee

07/24/2023, 4:33 PM
can you try downgrading to 1.54.x?
still trying to fully understand this particular hpack_parser issue, but we’ve been seeing it come up randomly.
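Since pyflyte imports whatever grpcio is visible to its own interpreter, it can help to confirm the version from Python itself rather than from pip, which may point at a different environment. A minimal sketch using only the standard library; `installed_version` and `in_series` are hypothetical helpers, and 1.54 is just the series suggested above:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the version of `package` installed for this interpreter, or None."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def in_series(ver: str, major: int, minor: int) -> bool:
    """True if a version string such as '1.54.0' belongs to the major.minor series."""
    parts = ver.split(".")
    return len(parts) >= 2 and parts[0] == str(major) and parts[1] == str(minor)


if __name__ == "__main__":
    grpcio = installed_version("grpcio")
    if grpcio is None:
        print("grpcio is not installed for this interpreter")
    elif in_series(grpcio, 1, 54):
        print(f"grpcio {grpcio} is in the 1.54.x series")
    else:
        # One way to switch series: pip install "grpcio==1.54.*"
        print(f"grpcio {grpcio} is not 1.54.x")
```

Run it with the same python that pyflyte uses, so the answer reflects what the CLI actually loads.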

Matt Coan

07/24/2023, 4:34 PM
Sure, lemme give that a shot.
Downgraded to:
Name: grpcio
Version: 1.54.0
Summary: HTTP/2-based RPC framework
Home-page: https://grpc.io
Author: The gRPC Authors
Author-email: grpc-io@googlegroups.com
License: Apache License 2.0
Location: /usr/local/lib/python3.10/site-packages
Requires:
Required-by: flytekit, grpcio-status
I still get the errors, but instead of "Stream removed" it's now a 404:
~/.flyte/flytesnacks/examples/basics/basics (master ✔) ᐅ pyflyte run --remote hello_world.py my_wf
E0724 12:47:06.765915000 4445138432 hpack_parser.cc:866]               Error parsing metadata: error=invalid value key=content-type value=text/plain; charset=utf-8
E0724 12:47:06.792559000 4445138432 hpack_parser.cc:866]               Error parsing metadata: error=invalid value key=content-type value=text/plain; charset=utf-8
E0724 12:47:06.815735000 4445138432 hpack_parser.cc:866]               Error parsing metadata: error=invalid value key=content-type value=text/plain; charset=utf-8
E0724 12:47:06.840782000 4445138432 hpack_parser.cc:866]               Error parsing metadata: error=invalid value key=content-type value=text/plain; charset=utf-8
Failed with Exception Code: SYSTEM:Unknown
RPC Failed, with Status: StatusCode.UNIMPLEMENTED
	details: Received http2 header with status: 404
	Debug string UNKNOWN:Error received from peer  {grpc_message:"Received http2 header with status: 404", grpc_status:12, created_time:"2023-07-24T12:47:06.840918-04:00"}
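With grpcio 1.54 the hpack_parser lines include the rejected value: the peer answered with content-type text/plain; charset=utf-8, whereas a gRPC server replies with application/grpc (optionally followed by +format or ;params). That suggests the request reached something other than the gRPC server, for example a plain HTTP route behind the ingress. A small sketch that pulls such values out of log lines; the regex is written against the lines shown here and is an assumption about the format in general:

```python
import re
from typing import List

# Captures what follows "key=content-type value=" in grpcio hpack_parser error lines.
_CONTENT_TYPE = re.compile(r"key=content-type value=(?P<value>.+?)\s*$")


def is_grpc_content_type(value: str) -> bool:
    """gRPC uses application/grpc, optionally followed by '+format' or ';params'."""
    return value == "application/grpc" or value.startswith(
        ("application/grpc+", "application/grpc;")
    )


def non_grpc_content_types(log_lines: List[str]) -> List[str]:
    """Return the content-type values in `log_lines` that are not gRPC content types."""
    found = []
    for line in log_lines:
        match = _CONTENT_TYPE.search(line)
        if match and not is_grpc_content_type(match.group("value")):
            found.append(match.group("value"))
    return found


if __name__ == "__main__":
    sample = [
        "E0724 12:47:06.765915000 4445138432 hpack_parser.cc:866]"
        "               Error parsing metadata: error=invalid value"
        " key=content-type value=text/plain; charset=utf-8",
    ]
    print(non_grpc_content_types(sample))
```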

Yee

07/24/2023, 5:46 PM
what backend version are you running?
unimplemented is usually a pretty simple thing to fix.
usually that means the backend needs to be upgraded.
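One detail that may matter for this thread: when a gRPC client gets an HTTP response that carries no grpc-status, it derives a status from the HTTP code, and HTTP 404 maps to UNIMPLEMENTED. So the UNIMPLEMENTED here could come from a 404 at a proxy or ingress rather than from a missing RPC on an older backend. A sketch of that table as described in gRPC's "HTTP to gRPC Status Code Mapping" document; `grpc_status_for_http` is a hypothetical helper, not part of grpcio:

```python
from enum import IntEnum


class GrpcStatus(IntEnum):
    """Subset of gRPC status codes with their canonical numeric values."""
    UNKNOWN = 2
    PERMISSION_DENIED = 7
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    UNAUTHENTICATED = 16


# Only used by clients when the response did not include a grpc-status header.
_HTTP_TO_GRPC = {
    400: GrpcStatus.INTERNAL,
    401: GrpcStatus.UNAUTHENTICATED,
    403: GrpcStatus.PERMISSION_DENIED,
    404: GrpcStatus.UNIMPLEMENTED,
    429: GrpcStatus.UNAVAILABLE,
    502: GrpcStatus.UNAVAILABLE,
    503: GrpcStatus.UNAVAILABLE,
    504: GrpcStatus.UNAVAILABLE,
}


def grpc_status_for_http(http_status: int) -> GrpcStatus:
    """Map an HTTP status to the gRPC status a client reports; all others are UNKNOWN."""
    return _HTTP_TO_GRPC.get(http_status, GrpcStatus.UNKNOWN)


if __name__ == "__main__":
    # The 404 in the log above surfaced as grpc_status:12.
    status = grpc_status_for_http(404)
    print(status.name, int(status))
```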

Matt Coan

07/24/2023, 5:54 PM
Ah, that would be great if I were just installing something stale and could upgrade. Here's what's deployed:
NAME         	NAMESPACE	REVISION	UPDATED                             	STATUS  	CHART              	APP VERSION
flyte-backend	flyte    	2       	2023-07-21 15:45:37.537995 -0400 EDT	deployed	flyte-binary-v1.7.0	1.16.0
In my case, I just took what this command gave me:
helm install flyte-backend flyteorg/flyte-binary --namespace flyte --values eks-starter.yaml --create-namespace
Then:
helm upgrade flyte-backend flyteorg/flyte-binary -n flyte --values eks-production.yaml --install

Yee

07/24/2023, 5:57 PM
are there logs on the flyte pod side that look relevant?

Matt Coan

07/24/2023, 6:46 PM
I don't see anything that jumps out
Sorry, had to jump into meetings for a bit.

Yee

07/24/2023, 6:49 PM
we’re going to look into the parser issue later, but it still makes me uneasy to see it there.
could you try a couple of other grpcio versions (https://pypi.org/project/grpcio/#history) and see if it persists?
i mean the error logs, not the unimplemented bit.

Matt Coan

07/24/2023, 6:55 PM
Sure thing.

Yee

07/24/2023, 6:56 PM
for the unimplemented error, i wonder what it’s hitting.
run your command with
GRPC_VERBOSITY=debug GRPC_TRACE=all
and pipe the output through grep tcp_posix
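The same idea can be wrapped in a few lines of Python. gRPC's core library writes its debug and trace output to stderr, so a bare `| grep` in a shell would only see stdout unless stderr is redirected first (2>&1); this sketch folds stderr into the captured output before filtering. `traced_lines` is a hypothetical helper, and the pyflyte command is the one from this thread:

```python
import os
import subprocess
import sys
from typing import List, Sequence


def traced_lines(cmd: Sequence[str], needle: str = "tcp_posix") -> List[str]:
    """Run `cmd` with gRPC debug tracing enabled; return output lines containing `needle`."""
    env = dict(os.environ, GRPC_VERBOSITY="debug", GRPC_TRACE="all")
    result = subprocess.run(
        list(cmd),
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # gRPC trace goes to stderr; merge it into stdout
        text=True,
        check=False,  # the command is expected to fail while debugging
    )
    return [line for line in result.stdout.splitlines() if needle in line]


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python trace_grep.py pyflyte run --remote hello_world.py my_wf
    for line in traced_lines(sys.argv[1:]):
        print(line)
```

The tcp_posix lines should show which address the client actually connected to, which is what you want to compare against the ingress.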

Matt Coan

07/24/2023, 6:56 PM
Yeah just found / set those
So, on grpcio 1.53.0 with debug on, I get a lot more detail; however, I'm not sure what to make of it.

Yee

07/24/2023, 7:11 PM
this is with the grep?
m

Matt Coan

07/24/2023, 7:13 PM
Ah, no, I missed that. Running it with the grep now.

Yee

07/24/2023, 7:52 PM
can you confirm with a
flytectl get projects
command that you have access?
if you can list projects, then somehow your data proxy service isn’t being installed

Matt Coan

07/24/2023, 7:56 PM
Yeah, that's actually throwing errors too. Here's the output of flytectl get projects:
{"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [storage] updated. No update handler registered.","ts":"2023-07-24T15:53:54-04:00"}
{"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [root] updated. No update handler registered.","ts":"2023-07-24T15:53:54-04:00"}
{"json":{"src":"viper.go:400"},"level":"debug","msg":"Config section [admin] updated. Firing updated event.","ts":"2023-07-24T15:53:54-04:00"}
{"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [files] updated. No update handler registered.","ts":"2023-07-24T15:53:54-04:00"}
{"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [console] updated. No update handler registered.","ts":"2023-07-24T15:53:54-04:00"}
{"json":{"src":"client.go:145"},"level":"warning","msg":"using insecureSkipVerify. Server's certificate chain and host name wont be verified. Caution : shouldn't be used for production usecases","ts":"2023-07-24T15:53:54-04:00"}
{"json":{"src":"client.go:63"},"level":"info","msg":"Initialized Admin client","ts":"2023-07-24T15:53:54-04:00"}
{"json":{"src":"auth_interceptor.go:86"},"level":"debug","msg":"Request failed due to [rpc error: code = Unimplemented desc = unexpected HTTP status code received from server: 404 (Not Found); transport: received unexpected content-type \"text/plain; charset=utf-8\"]. If it's an unauthenticated error, we will attempt to establish an authenticated context.","ts":"2023-07-24T15:53:54-04:00"}
Error: rpc error: code = Unimplemented desc = unexpected HTTP status code received from server: 404 (Not Found); transport: received unexpected content-type "text/plain; charset=utf-8"
{"json":{"src":"main.go:13"},"level":"error","msg":"rpc error: code = Unimplemented desc = unexpected HTTP status code received from server: 404 (Not Found); transport: received unexpected content-type \"text/plain; charset=utf-8\"","ts":"2023-07-24T15:53:54-04:00"}

Yee

07/24/2023, 7:56 PM
can you port forward the grpc port from admin to localhost and then try
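A sketch of that step. It assumes the flyte-binary chart's naming for this install (release flyte-backend in namespace flyte, giving a service called flyte-backend-flyte-binary-grpc, as in the endpoint placeholder earlier) and gRPC on port 8089 as in the config that follows; treat both as assumptions and check kubectl -n flyte get svc first. It builds the kubectl port-forward command and adds a small TCP probe to confirm something is listening locally:

```python
import socket
from typing import List


def port_forward_cmd(namespace: str, service: str, local_port: int, remote_port: int) -> List[str]:
    """Build a `kubectl port-forward` argv that forwards a local port to a Service port."""
    return [
        "kubectl", "-n", namespace, "port-forward",
        f"svc/{service}", f"{local_port}:{remote_port}",
    ]


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False


if __name__ == "__main__":
    # Hypothetical service name: adjust to what `kubectl -n flyte get svc` reports.
    cmd = port_forward_cmd("flyte", "flyte-backend-flyte-binary-grpc", 8089, 8089)
    print("run in another terminal:", " ".join(cmd))
    print("127.0.0.1:8089 listening:", is_listening("127.0.0.1", 8089))
```

Going through the port-forward bypasses the ingress entirely, so if calls succeed this way the 404 most likely comes from the ingress routing.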

Matt Coan

07/24/2023, 8:06 PM
The port-forward works; it sits there listening. I pointed my ~/.flyte config at it:
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: dns:///127.0.0.1:8089 #Replace with your domain name
  authType: Pkce
  insecure: true
  insecureSkipVerify: true
logger:
  show-source: true
  level: 10
And when I try to run the hello world workflow, I get:
Failed with Exception Code: SYSTEM:Unknown
RPC Failed, with Status: StatusCode.INTERNAL
	details: failed to create a signed url. Error: WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
	status code: 403, request id: 7b26a76e-215c-4492-80d7-62e458acc320
	Debug string UNKNOWN:Error received from peer ipv4:127.0.0.1:8089 {grpc_message:"failed to create a signed url. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: 7b26a76e-215c-4492-80d7-62e458acc320", grpc_status:13, created_time:"2023-07-24T16:06:24.281082-04:00"}
Interestingly, though, flytectl get projects works, and the port-forward is definitely handling the connection attempts locally.

Yee

07/25/2023, 12:22 AM
can you grant the admin role more perms?
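Worth checking alongside permissions: an AccessDenied on sts:AssumeRoleWithWebIdentity is decided by the role's trust policy, not by the permission policies attached to the role, so granting broader permissions alone may leave this error unchanged. With EKS IAM Roles for Service Accounts, the trust policy has to name the cluster's OIDC provider and the Kubernetes service account the Flyte pod runs as. A sketch that builds such a policy document; the account ID, OIDC provider path, and service-account name are placeholders, not values from this thread:

```python
import json
from typing import Any, Dict


def irsa_trust_policy(account_id: str, oidc_provider: str, namespace: str, service_account: str) -> Dict[str, Any]:
    """Build an IAM trust policy letting one Kubernetes service account assume a role via IRSA.

    `oidc_provider` is the cluster's OIDC issuer URL without the https:// prefix.
    """
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # The token's subject must be exactly this service account.
                        f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        # The audience on EKS-projected tokens used with STS.
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = irsa_trust_policy(
        account_id="111122223333",  # placeholder
        oidc_provider="oidc.eks.us-east-2.amazonaws.com/id/EXAMPLE",  # placeholder
        namespace="flyte",
        service_account="flyte-backend-flyte-binary",  # hypothetical service-account name
    )
    print(json.dumps(policy, indent=2))
```

The namespace and service-account name in the sub condition must match what the pod actually uses (kubectl -n flyte get pod -o yaml shows serviceAccountName).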
m

Matt Coan

07/25/2023, 4:18 PM
Yeah, I did try that originally, temporarily giving it full-blown admin access, to no avail. I've made a bit of progress this morning, though. I stopped using the original cert, generated a new one, and uploaded it to ACM. I grabbed the cert.out and updated my ~/.flyte/config to point to it:
caCertFilePath: /Users/mattcoan/.flyte/crt.out
If I run the hello_world workflow with this cert in the config, I get:
SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)')))
However, I'm now able to skip SSL verification in my config (which I know isn't great):
insecureSkipVerify: true
And with that set, I can get the hello_world workflow to run:
/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py:1045: InsecureRequestWarning: Unverified HTTPS request is being made to host 'at-scale-dev-eks-default.s3.us-east-2.amazonaws.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
Go to https://at-scale-flyte-3b5b709126cf94935c4ba9ddb43b01.dev.embarkvet.com/console/projects/flytesnacks/domains/development/executions/f57456d6bba044e85a09 to see execution in the console.
So, at least I know that once I'm beyond auth, the mechanics behind everything else work. Just a matter of figuring out why it can't find this cert locally (I think). If you have any pointers in that direction, I'd love to hear them.
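One way to narrow down the "unable to get local issuer certificate" failure outside of Flyte is to repeat the same verification with Python's ssl module and the same CA file. That error generally means the verifier could not find the certificate that issued the server's certificate among the trusted ones, so the file likely needs the issuing CA (and any intermediates), not only the server's own certificate. A minimal sketch; the helper names are hypothetical and the host is whatever your ingress serves:

```python
import socket
import ssl
import sys
from typing import Optional


def load_ca_error(cafile: str) -> Optional[str]:
    """Return None if `cafile` loads as a CA bundle, else a description of the problem."""
    try:
        ssl.create_default_context(cafile=cafile)
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return f"{type(exc).__name__}: {exc}"
    return None


def handshake_error(host: str, port: int, cafile: str, timeout: float = 5.0) -> Optional[str]:
    """Try a verified TLS handshake trusting only `cafile`; return the error text, if any."""
    # With cafile given, create_default_context does not load the system CA store.
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return None  # handshake and certificate verification succeeded
    except ssl.SSLCertVerificationError as exc:
        return exc.verify_message
    except OSError as exc:
        return f"{type(exc).__name__}: {exc}"


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python check_ca.py <your-flyte-host> ~/.flyte/crt.out
    host, cafile = sys.argv[1], sys.argv[2]
    print("load:", load_ca_error(cafile) or "ok")
    print("handshake:", handshake_error(host, 443, cafile) or "ok")
```

If the handshake fails here with the same message, the problem is the contents of the CA file rather than how Flyte reads it.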