I'm working to add some RBAC to flyte admin in our...
# contribute
c
I'm working to add some RBAC to flyte admin in our fork and I couldn't help but notice all the API handlers with
defer m.interceptPanic(ctx, request)
. Seems like this should be middleware, any reason why it isn't?
Also, I don't think the requests can be nil? I thought they were pointers just because of gRPC and some locking inside the struct
And shouldn't these be OTEL RPC server metrics?
Copy code
m.Metrics.projectEndpointMetrics.register.Time(func() {
		response, err = m.ProjectManager.CreateProject(ctx, *request)
	})
Ah I see, the gRPC prometheus metrics don't record durations.. But OTEL metrics would here..
There is also error handling here that should be in middleware. Almost all of the following code is boilerplate that should be in middleware..
Copy code
func (m *AdminService) RegisterProject(ctx context.Context, request *admin.ProjectRegisterRequest) (
	*admin.ProjectRegisterResponse, error) {
	defer m.interceptPanic(ctx, request)
	if request == nil {
		return nil, status.Errorf(codes.InvalidArgument, "Incorrect request, nil requests not allowed")
	}
	var response *admin.ProjectRegisterResponse
	var err error
	m.Metrics.projectEndpointMetrics.register.Time(func() {
		response, err = m.ProjectManager.CreateProject(ctx, *request)
	})
	if err != nil {
		return nil, util.TransformAndRecordError(err, &m.Metrics.projectEndpointMetrics.register)
	}

	return response, nil
}
f
so Flyte started before OTEL existed šŸ˜„
changing all of this might be dangerous, as a lot of folks have built dashboards
c
Yeah, I think the panic can move to middleware pretty safely tho
@freezing-airport-6809 Are you open to running OTEL middleware on the servers as part of a potential migration path? By default is will noop like it does for tracing since no exporters are configured. Happy to add/migrate the dashboards too. I understand that changing this stuff is obviously non trivial since folks build alerting infrastructure off of it and these might not be where you want to invest engineering resources.
f
@clean-glass-36808 honestly this is not just my decisions. There are 3000+ companies using Flyte, not sure, if this will work for all of them
šŸ‘ 1
a
@clean-glass-36808 a lot of the custom service method handling indeed ought to be cleaned up and move to middleware - would you have any interest in doing that (or maybe even just filing a housekeeping issue šŸ˜„)
f
@acceptable-policeman-57188 / @clean-glass-36808 wanted to help with a lot of improvements in the cleaning part - some places I had to ask him to not do so, just to ensure backwards compatibility, but I think you folks should discuss
a
I think we should be careful when it comes to backwards compatibility for metrics, like you said, but wrapping common patterns (the defer call to handle panics) can definitely be introduced as middleware rather than copy/paste everywhere
c
@acceptable-policeman-57188 Yeah I'm happy to make changes. I'm a platform engineer so middleware is part of my jam. I can look at each item separately. • Panic handling: tied to a global metric so should be do-able, just need to figure out the plumbing. • server RPC handler duration histograms: would be nice to auto instrument this with OTEL but that would require a deeper discussion with the community regarding migration path • transformAndRecordError: tied to a global metric so should be do-able, just need to figure out the plumbing. • nil request handling: I can possibly clean that up separately and fix the non-pointer warnings, the code is quite inconsistent about this
ā¤ļø 1