CoreML GPU NaN bug with fused QKV attention on macOS Tahoe

Problem: CoreML produces NaNs on the GPU (the same model runs fine on CPU) when running transformer attention with a fused QKV projection on macOS 26.2.

Root cause: The common::fuse_transpose_matmul optimization pass triggers a Metal kernel bug when sliced tensors feed into matmul(transpose_y=True).
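For context, here's the shape of the pattern in PyTorch terms. This is a minimal sketch, not code from the repro — the module, head count, and dimensions are illustrative. A fused QKV Linear output gets chunked into Q/K/V slices, and K is transposed into the attention matmul; the converter's fuse_transpose_matmul pass folds that transpose into matmul(transpose_y=True), which is the op the GPU path mishandles:

    import torch
    import torch.nn as nn

    class FusedQKVAttention(nn.Module):
        """Toy single-head attention with a fused QKV projection.

        After tracing, the chunk() slices feed a matmul whose second
        operand is transposed -- the pattern that fuse_transpose_matmul
        rewrites into matmul(transpose_y=True).
        """

        def __init__(self, dim: int):
            super().__init__()
            self.qkv = nn.Linear(dim, 3 * dim)  # fused Q, K, V projection
            self.scale = dim ** -0.5

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            q, k, v = self.qkv(x).chunk(3, dim=-1)         # slices of one tensor
            attn = (q @ k.transpose(-2, -1)) * self.scale  # transpose feeding matmul
            return attn.softmax(dim=-1) @ v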

Workaround: strip the offending pass from the default pipeline before converting:

    pipeline = ct.PassPipeline.DEFAULT
    pipeline.remove_passes(['common::fuse_transpose_matmul'])
    mlmodel = ct.convert(model, ..., pass_pipeline=pipeline)
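Here's a hedged end-to-end sketch of the workaround applied to the toy module above. The input shape, mlprogram target, and CPU_AND_GPU compute unit are my assumptions, chosen to exercise the GPU path:

    import coremltools as ct
    import torch

    model = FusedQKVAttention(dim=64).eval()  # toy module from the sketch above
    example = torch.randn(1, 16, 64)
    traced = torch.jit.trace(model, example)

    # Drop the pass that rewrites transpose+matmul into matmul(transpose_y=True)
    pipeline = ct.PassPipeline.DEFAULT
    pipeline.remove_passes(['common::fuse_transpose_matmul'])

    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(name='x', shape=example.shape)],
        convert_to='mlprogram',
        compute_units=ct.ComputeUnit.CPU_AND_GPU,  # force the GPU path that NaNs
        pass_pipeline=pipeline,
    )
    print(mlmodel.predict({'x': example.numpy()}))  # finite values expected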

Minimal repro: https://github.com/imperatormk/coreml-birefnet/blob/main/apple_bug_repro.py

Affected: Any ViT/Swin/transformer with fused QKV attention (BiRefNet, etc.)

Has anyone else hit this? Filed FB report too.
