I m working to add some RBAC to flyte admin in our fork and Flyte #contribute

I'm working to add some RBAC to flyte admin in our...

clean-glass-36808

06/24/2024, 9:39 PM

I'm working to add some RBAC to flyte admin in our fork and I couldn't help but notice all the API handlers with

defer m.interceptPanic(ctx, request)

. Seems like this should be middleware, any reason why it isn't?

clean-glass-36808

06/24/2024, 9:42 PM

Also, I don't think the requests can be nil? I thought they were pointers just because of gRPC and some locking inside the struct

clean-glass-36808

06/24/2024, 9:46 PM

And shouldn't these be OTEL RPC server metrics?

Copy code

m.Metrics.projectEndpointMetrics.register.Time(func() {
		response, err = m.ProjectManager.CreateProject(ctx, *request)
	})

clean-glass-36808

06/24/2024, 9:54 PM

Ah I see, the gRPC prometheus metrics don't record durations.. But OTEL metrics would here..

clean-glass-36808

06/24/2024, 9:56 PM

There is also error handling here that should be in middleware. Almost all of the following code is boilerplate that should be in middleware..

Copy code

func (m *AdminService) RegisterProject(ctx context.Context, request *admin.ProjectRegisterRequest) (
	*admin.ProjectRegisterResponse, error) {
	defer m.interceptPanic(ctx, request)
	if request == nil {
		return nil, status.Errorf(codes.InvalidArgument, "Incorrect request, nil requests not allowed")
	}
	var response *admin.ProjectRegisterResponse
	var err error
	m.Metrics.projectEndpointMetrics.register.Time(func() {
		response, err = m.ProjectManager.CreateProject(ctx, *request)
	})
	if err != nil {
		return nil, util.TransformAndRecordError(err, &m.Metrics.projectEndpointMetrics.register)
	}

	return response, nil
}

freezing-airport-6809

06/25/2024, 1:12 AM

so Flyte started before OTEL existed 😄

freezing-airport-6809

06/25/2024, 1:13 AM

changing all of this might be dangerous, as a lot of folks have built dashboards

clean-glass-36808

06/25/2024, 2:22 AM

Yeah, I think the panic can move to middleware pretty safely tho

clean-glass-36808

06/26/2024, 7:54 PM

@freezing-airport-6809 Are you open to running OTEL middleware on the servers as part of a potential migration path? By default is will noop like it does for tracing since no exporters are configured. Happy to add/migrate the dashboards too. I understand that changing this stuff is obviously non trivial since folks build alerting infrastructure off of it and these might not be where you want to invest engineering resources.

freezing-airport-6809

06/27/2024, 12:18 AM

@clean-glass-36808 honestly this is not just my decisions. There are 3000+ companies using Flyte, not sure, if this will work for all of them

👍 1

acceptable-policeman-57188

07/05/2024, 9:12 AM

@clean-glass-36808 a lot of the custom service method handling indeed ought to be cleaned up and move to middleware - would you have any interest in doing that (or maybe even just filing a housekeeping issue 😄)

freezing-airport-6809

07/06/2024, 6:41 PM

@acceptable-policeman-57188 / @clean-glass-36808 wanted to help with a lot of improvements in the cleaning part - some places I had to ask him to not do so, just to ensure backwards compatibility, but I think you folks should discuss

acceptable-policeman-57188

07/08/2024, 10:25 AM

I think we should be careful when it comes to backwards compatibility for metrics, like you said, but wrapping common patterns (the defer call to handle panics) can definitely be introduced as middleware rather than copy/paste everywhere

clean-glass-36808

07/08/2024, 4:20 PM

@acceptable-policeman-57188 Yeah I'm happy to make changes. I'm a platform engineer so middleware is part of my jam. I can look at each item separately. • Panic handling: tied to a global metric so should be do-able, just need to figure out the plumbing. • server RPC handler duration histograms: would be nice to auto instrument this with OTEL but that would require a deeper discussion with the community regarding migration path • transformAndRecordError: tied to a global metric so should be do-able, just need to figure out the plumbing. • nil request handling: I can possibly clean that up separately and fix the non-pointer warnings, the code is quite inconsistent about this

❤️ 1

21 Views

Open in Slack

Previous Next