Performance of function in protocol extension vs in conforming types

I have a function a, implemented in a protocol extension, that is called millions of times per second. It in turn calls a function b, which is a protocol requirement with no implementation in the extension. According to the Time Profiler in Instruments, a spends a lot of time in __swift_instantiateGenericMetadata.
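To make the setup concrete, here is a minimal sketch of the shape described above. The real code is not shown in this post, so every name here (Reader, ArrayReader, the parameter types) is a hypothetical stand-in; the only details taken from the post are that a lives in the extension, b is a requirement, and both are overloads named read(at:).

```swift
// Hypothetical stand-ins; the post does not include the real types.
protocol Reader {
    // "b": a protocol requirement with no default implementation.
    func read(at index: Int) -> UInt8
}

extension Reader {
    // "a": implemented once in the extension; it calls the requirement "b".
    func read(at range: Range<Int>) -> [UInt8] {
        var result: [UInt8] = []
        result.reserveCapacity(range.count)
        for i in range {
            // Resolves to the Int overload, i.e. the requirement.
            result.append(read(at: i))
        }
        return result
    }
}

struct ArrayReader: Reader {
    let bytes: [UInt8]
    func read(at index: Int) -> UInt8 { bytes[index] }
}

let reader = ArrayReader(bytes: [10, 20, 30, 40])
let slice = reader.read(at: 1..<3)  // uses the extension's implementation
```

In this shape the body of a is written once against Self, and each call to b inside it goes through the protocol requirement unless the compiler specializes the extension method for the concrete type.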

I get a big performance bump by moving a out of the extension and re-implementing it identically in each type that conforms to the protocol. Is there any way to get the compiler to do this itself? Do I need to write a macro to do it for me? There are screenshots from Instruments illustrating the issue below.
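For comparison, the hand-duplicated workaround might look like the sketch below. Again, all names are hypothetical, and it is an assumption that a stays off the protocol and simply becomes an ordinary method on each conforming type.

```swift
// Hypothetical stand-ins; the post does not include the real types.
protocol Reader {
    // "b": still the only requirement.
    func read(at index: Int) -> UInt8
}

struct ArrayReader: Reader {
    let bytes: [UInt8]
    func read(at index: Int) -> UInt8 { bytes[index] }

    // "a", copied into this type: the call to "b" is now on a concrete type.
    func read(at range: Range<Int>) -> [UInt8] {
        var result: [UInt8] = []
        result.reserveCapacity(range.count)
        for i in range { result.append(read(at: i)) }
        return result
    }
}

struct ConstantReader: Reader {
    let value: UInt8
    func read(at index: Int) -> UInt8 { value }

    // The same body again, duplicated by hand for this type.
    func read(at range: Range<Int>) -> [UInt8] {
        var result: [UInt8] = []
        result.reserveCapacity(range.count)
        for i in range { result.append(read(at: i)) }
        return result
    }
}

let arrayReader = ArrayReader(bytes: [10, 20, 30, 40])
let constantReader = ConstantReader(value: 7)
let fromArray = arrayReader.read(at: 1..<3)
let fromConstant = constantReader.read(at: 0..<2)
```

The two copies of a are textually identical, which is the duplication the question asks about having the compiler (or a macro) generate.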

Thanks!

These traces are made a tiny bit more confusing because the real names of a and b are the same: read(at:). (They take different types as their parameters.)

Here's the Time Profiler trace of the protocol extension implementation:

And here's the trace for the duplicated-in-each-type version:
