OK, that's interesting. I only want to clarify that I'm using an ASMEDIA ASM2464PDX, which is mentioned as a USB4 device (no Thunderbolt Certification Logo).
Assuming you're talking about this product:
https://developer.apple.com/forums/thread/831333?page=1
"ASM2464PDX is a new generation of USB4/Thunderbolt to PCIe/NVMe Accessory controller based on ASMedia in-house designed PHYs."
...then I believe it's capable of functioning in either mode. More to the point, I believe you're building on IOPCIDevice, which means you're using Thunderbolt, not USB.
LPM policies in the USB4 controller
So, as a general comment, I'm not a huge fan of "ioreg", particularly when you start poking at its contents with tools like "grep". The core problem here is that the IORegistry is a tree of objects communicating with each other, which means the structure of the hierarchy is at least as important as the individual objects, if not more so. Interacting with it as pure text tends to obscure that structure, making it easy to misunderstand or confuse what's going on.
I'll sometimes extract the full structure as XML using:
ioreg -la > <output file>
...but my actual preference is to use IORegistryExplorer.app, as I think it does a much better job of conveying what's actually going on. See this forum post for download instructions and general guidance on using it.
In terms of the two specific snippets you posted, there isn't enough context to know what you're looking at, but I don't think it's relevant.
First, the specs that you mentioned are well exceeded by modern audio cards. Many high-end brands achieve <2ms roundtrip latency, so the problem solution exists.
Sure. The number I posted was for "full" latency ("my mouth to your ear"), so the intermediate hardware needs to have a latency below that.
The real-time scheduled driver thread is a thing and HAL thread is another, but I can guarantee after many years of experience in the field, that CoreAudio and upper level pro audio applications succeed in sustaining low latencies even in the range of ~1.5ms (that is e.g. 64 samples at 48kHz) without glitches.
The problem here is that you didn't say "1ms", you said "(~30 to 50 μs)". 1ms is 1000μs. Similarly, the "spike" you're describing here:
Deadline misses cause not predictable high read time spikes (>350us).
...is ~1/4 of the 1.5ms CoreAudio latency number you just quoted. Framing all of this in a different way, what's the actual maximum acceptable latency of your entire "system"? My concern here is that you seem to be trying to run at a frequency that's far faster than the larger system, which is going to unnecessarily make the entire system less reliable.
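To put the numbers quoted above side by side, here's a quick sketch of the arithmetic. It's plain Python used as a calculator; the helper name `buffer_duration_us` is made up for illustration and isn't part of any audio API, and the 48kHz rate and the 32/64 frame sizes are simply the figures mentioned in this thread.

```python
def buffer_duration_us(frames: int, sample_rate_hz: float) -> float:
    """Time covered by one buffer of `frames` samples, in microseconds."""
    return frames / sample_rate_hz * 1_000_000


SAMPLE_RATE_HZ = 48_000  # rate quoted in the thread

# Buffer sizes mentioned in the thread (hypothetical choices otherwise).
for frames in (32, 64):
    duration = buffer_duration_us(frames, SAMPLE_RATE_HZ)
    print(f"{frames} frames at 48kHz = {duration:.1f} us")

# Compare the reported spike against the ~1.5ms figure quoted above.
spike_us = 350
quoted_latency_us = 1500
print(f"spike / quoted latency = {spike_us / quoted_latency_us:.2f}")
```

In other words, a 64-frame buffer at 48kHz covers roughly 1.3ms, and a 350μs spike is a bit under a quarter of the 1.5ms figure, while being many times larger than a 30 to 50μs budget.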
I can maybe go up to 32, to try matching the minimum HAL buffer, but not higher than that.
Why would you want to be smaller than the minimum HAL buffer?
I mean, if this system works correctly for the 99.999% of the time, there is for sure an Apple Engineer which can tell me why in that 0.001% my read takes 10 times the usual time. I'm sure the cause can be found and tackled.
Let me ask another question first. What else was your system doing during your experiment(s)? My concern is that your answer is going to be "not very much", which means there's a big issue you haven't really considered.
The fundamental problem here is that the larger system doesn't actually offer much in the way of "strong" guarantees around scheduling. The very lowest-level hardware interface does allow relatively "narrow" timing, and the real-time thread does offer the strongest guarantee the system offers, but across ALL levels, the system's basic goal is to "do its best to do as much as possible".
On an idle system, that makes this:
I mean, if this system works correctly for the 99.999% of the time
...pretty trivial. That is, the system isn't "doing anything", so as soon as you give it "something" to do, it immediately does it. That works really great until "something" delays things, causing things like this:
...in that 0.001% my read takes 10 times the usual time
It's possible this is true:
there is for sure an Apple Engineer which can tell me why
...but it isn't me and it's also much harder than you might think. There's an enormous amount happening within the system and, as I noted above, the system isn't really trying to organize its work in a way that provides any kind of strong scheduling guarantee. Adding to the fun, conventional debugging tools like logging and even dtrace can be counterproductive, due to the disruption they introduce.
However, the much bigger issue is overall system load, because whatever issues you're having are VERY likely to become MUCH more common once you start to load the system. The specific cause here doesn't matter all that much if this is going to happen 20x more often once you load the system.
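As a rough way to see the idle-versus-loaded effect for yourself, the sketch below measures how late a periodic wakeup arrives and reports the tail of the distribution. It's only an illustration under loud assumptions: it runs as an ordinary user-space Python thread (not a real-time thread and nothing like a driver), the function names are invented for this example, and the absolute numbers say nothing about your hardware. The point is just to compare the median against the worst case, first on an idle machine and then with other work running.

```python
import statistics
import time


def measure_overshoot_us(iterations: int = 1000, period_s: float = 0.001) -> list[float]:
    """Sleep `period_s` repeatedly; return how late each wakeup was, in microseconds."""
    overshoots = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        time.sleep(period_s)
        elapsed_ns = time.perf_counter_ns() - start
        overshoots.append((elapsed_ns - period_s * 1e9) / 1000)
    return overshoots


def summarize(samples: list[float]) -> dict[str, float]:
    """Median, 99.9th percentile (nearest rank), and maximum of the samples."""
    ordered = sorted(samples)
    p999_index = min(len(ordered) - 1, int(len(ordered) * 0.999))
    return {
        "median_us": statistics.median(ordered),
        "p99.9_us": ordered[p999_index],
        "max_us": ordered[-1],
    }


if __name__ == "__main__":
    # Run once on an idle machine, then again while something else is busy.
    print(summarize(measure_overshoot_us()))
```

If the gap between the median and the maximum grows noticeably once the machine is busy, that's the same pattern described above: the rare outlier on an idle system becomes a regular occurrence under load.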
__
Kevin Elliott
DTS Engineer, CoreOS/Hardware