Metal

Render advanced 3D graphics and perform data-parallel computations on graphics processors using Metal.


NSScreen's maximumExtendedDynamicRangeColorComponentValue does not seem to provide the proper value after sleep/wake on third-party HDR displays, even when there is EDR content on screen, in macOS Tahoe

maximumExtendedDynamicRangeColorComponentValue should report a value between 1.0 and maximumPotentialExtendedDynamicRangeColorComponentValue, depending on the available EDR headroom, whenever content that uses EDR is on screen. This works in most scenarios, but in macOS 26 Tahoe (including 26.2) it seemingly breaks down when a third-party external display is in HDR mode and the Mac goes to sleep and wakes up. After wake, the external display's NSScreen object reports only 1.0, no matter what. This happens even though didChangeScreenParametersNotification fires when the SDR peak brightness is changed with the brightness slider, at which point the system should provide an updated headroom value. As a result, dynamic tone mapping that adapts to the actual screen brightness is impossible. Everything works fine in Sequoia. In Tahoe, the user has to turn off HDR, go through another sleep/wake cycle, and turn HDR back on to fix it, which is obviously not a sustainable workaround.
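To reproduce this, a small logger that re-reads both properties whenever the system posts NSApplication.didChangeScreenParametersNotification or NSWorkspace.didWakeNotification makes the stuck value easy to spot. This is a minimal diagnostic sketch, not a fix. `headroomLooksStuck`, `logEDRHeadroom`, and `startEDRHeadroomMonitoring` are hypothetical helper names. The "stuck" check is only meaningful while EDR content is actually on screen, because 1.0 is also the legitimate value when nothing uses EDR.

```swift
import AppKit

/// Hypothetical check: the screen can do EDR (potential > 1.0) but currently reports no headroom.
/// Only meaningful while EDR content is on screen; otherwise 1.0 is the expected value.
func headroomLooksStuck(current: CGFloat, potential: CGFloat) -> Bool {
    potential > 1.0 && current <= 1.0
}

/// Logs the current and potential EDR headroom of every attached screen.
@MainActor
func logEDRHeadroom(reason: String) {
    for screen in NSScreen.screens {
        let current = screen.maximumExtendedDynamicRangeColorComponentValue
        let potential = screen.maximumPotentialExtendedDynamicRangeColorComponentValue
        let flag = headroomLooksStuck(current: current, potential: potential) ? " <- stuck?" : ""
        print("[\(reason)] \(screen.localizedName): current=\(current) potential=\(potential)\(flag)")
    }
}

/// Re-reads the headroom after screen-parameter changes and after wake.
/// Keep the returned tokens alive while logging is wanted. The first token belongs to
/// NotificationCenter.default, the second to NSWorkspace.shared.notificationCenter.
@MainActor
func startEDRHeadroomMonitoring() -> [NSObjectProtocol] {
    let screenToken = NotificationCenter.default.addObserver(
        forName: NSApplication.didChangeScreenParametersNotification,
        object: nil, queue: .main
    ) { _ in
        Task { @MainActor in logEDRHeadroom(reason: "screen parameters changed") }
    }
    let wakeToken = NSWorkspace.shared.notificationCenter.addObserver(
        forName: NSWorkspace.didWakeNotification,
        object: nil, queue: .main
    ) { _ in
        Task { @MainActor in logEDRHeadroom(reason: "did wake") }
    }
    return [screenToken, wakeToken]
}
```

Logging on wake as well as on parameter changes separates two failure modes: no notification arriving at all, versus a notification arriving while the property still reads 1.0.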

Graphics & Games / Metal · Tags: Metal, EDR · Views: 338 · Activity: Dec ’25

Metal: Intersection results unstable when reusing Instance Acceleration Structures

Hi all, I'm encountering an issue with Metal ray tracing on my M5 MacBook Pro involving an instance acceleration structure (IAS): intersection tests suddenly stop working after a certain point in the sampling loop.

Situation

I implemented an offline GPU path tracer that uses metal::raytracing and runs the same kernel multiple times per pixel (sampleCount). Intersection tests are performed against an IAS. Because this is an offline path tracer, the geometry inside the IAS never changes across samples (no transform changes or updates). As sampleCount increases, there comes a point where the number of intersections drops to zero, and it stays at zero for all subsequent samples.

Here's a code sketch:

    let sampleCount: UInt16 = 1024
    for sampleIndex: UInt16 in 0..<sampleCount {
        // ...
        do {
            let commandBuffer = commandQueue.makeCommandBuffer()!
            // Dispatch the intersection kernel, then commit.
            commandBuffer.commit()
            await commandBuffer.completed()
        }
        do {
            let commandBuffer = commandQueue.makeCommandBuffer()!
            // Use the intersection test results from the previous command buffer, then commit.
            commandBuffer.commit()
            await commandBuffer.completed()
        }
        // ...
    }

    kernel void intersectAlongRay(
        const metal::uint32_t threadIndex [[thread_position_in_grid]],
        // ...
        const metal::raytracing::instance_acceleration_structure accelerationStructure [[buffer(2)]],
        // ...
    ) {
        // ...
        const auto result = intersector.intersect(ray, accelerationStructure);
        switch (result.type) {
            case metal::raytracing::intersection_type::triangle: {
                // Write intersection result to device buffers.
                break;
            }
            default:
                break;
        }
        // ...
    }

Observations

• Encoding both the intersection kernel and the subsequent use of its results in the same command buffer does not resolve the problem.
• Switching from the IAS to a primitive acceleration structure (PAS) fixes the problem.
• Rebuilding the IAS for each sample also resolves the issue.
• Intersections produce inconsistent results even though the IAS and rays are identical: Image 1 shows a hit, while Image 2 shows a miss.

Questions

• Am I misusing the IAS in some way?
• Could this be a Metal bug?

Any guidance or confirmation would be greatly appreciated.
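One thing worth ruling out, since the post doesn't show how the kernel is encoded: as far as I understand, an instance acceleration structure refers to its primitive acceleration structures only indirectly, so the encoder has to be told about them (for example with useResource(_:usage:), or useHeap(_:) if they were allocated from a heap) for them to be resident during the dispatch. Whether that explains the behaviour here is speculation. The sketch assumes the PASes referenced by the IAS are available as an array. `encodeIntersection` and `threadgroupCount` are hypothetical names, and buffer index 2 matches [[buffer(2)]] in the kernel above.

```swift
import Metal

/// Number of threadgroups needed to cover `count` threads (ceiling division).
func threadgroupCount(for count: Int, groupWidth: Int) -> Int {
    (count + groupWidth - 1) / groupWidth
}

/// Encodes the intersection kernel against an instance acceleration structure.
/// `primitiveStructures` are the PASes that the IAS's instance descriptors point to.
func encodeIntersection(into commandBuffer: MTLCommandBuffer,
                        pipeline: MTLComputePipelineState,
                        instanceStructure: MTLAccelerationStructure,
                        primitiveStructures: [MTLAccelerationStructure],
                        rayCount: Int) {
    guard let encoder = commandBuffer.makeComputeCommandEncoder() else { return }
    encoder.setComputePipelineState(pipeline)
    // Bound directly, matching [[buffer(2)]] in intersectAlongRay.
    encoder.setAccelerationStructure(instanceStructure, bufferIndex: 2)
    // Referenced only indirectly through the IAS, so declare each one as used by this pass.
    for primitive in primitiveStructures {
        encoder.useResource(primitive, usage: .read)
    }
    // The last threadgroup may contain threads past rayCount; the kernel should
    // return early when threadIndex >= rayCount.
    let width = pipeline.threadExecutionWidth
    encoder.dispatchThreadgroups(
        MTLSize(width: threadgroupCount(for: rayCount, groupWidth: width), height: 1, depth: 1),
        threadsPerThreadgroup: MTLSize(width: width, height: 1, depth: 1))
    encoder.endEncoding()
}
```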

Graphics & Games / Metal · Tags: Metal · Views: 362 · Activity: Dec ’25

App Freezes on iPadOS 26.x - GPU Metal Errors

I work on a Qt/QML app that uses the Esri Maps SDK for Qt and is deployed to both Windows and iPads. Since a recent upgrade to iPadOS 26.1, many iPad users have reported the application freezing after panning and/or identifying features in the map. It runs fine for our Windows users. I was able to reproduce this and captured the following error messages when the freeze happens:

    IOGPUMetalError: Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)
    IOGPUMetalError: Invalid Resource (00000009:kIOGPUCommandBufferCallbackErrorInvalidResource)

Environment:
• Qt 6.5.4 (Qt for iOS)
• Esri Maps SDK for Qt 200.3
• iPadOS 26.1

Because it appears to be a Metal error, I tried using OpenGL (Qt offers an easy way to set the target graphics API):

    QQuickWindow::setGraphicsApi(QSGRendererInterface::GraphicsApi::OpenGL);

That worked! No more freezing. But I'm seeing many posts saying that Apple has deprecated OpenGL (OpenGL ES on iPadOS), yet it still seems to be available in iPadOS 26.1. If so, will this fix just cause problems with a future iPadOS update? Are there any other suggestions for addressing this issue? Upgrading Qt and the Esri SDK to their latest versions is not an option for us. We are in the process of upgrading the full application, but that is a year or two out, so we just need a fix to buy us some time for now. I'd appreciate any thoughts or insights.

Graphics & Games / Metal · Tags: Metal, OpenGLES · Views: 588 · Activity: Dec ’25

iOS Metal delays actually displaying the frame on screen by one extra Vsync period

View Layout

Add the following views in a view controller:
• A label
• View A, with a subview of the same size: MTKView A
• View B, with a subview of the same size: MTKView B

Refresh Rates of Each View

The label view refreshes at 60 fps (driven by CADisplayLink). MTKView A and B refresh at 15 fps.

MTKView Implementation Details

• The corresponding CAMetalLayer's maximumDrawableCount is set to 2, switching to double buffering.
• The scheduling mechanism is modified: drawing is not driven by MTKView's internal loop but triggered manually, immediately upon receiving a frame.
• A new high-priority queue is created for drawing, instead of drawing on the main queue.

Manual drawing is enabled with:

    self.metalView.enableSetNeedsDisplay = NO;
    self.metalView.paused = YES;

MTKView Latency Tracking

The GPU completion time T1 is observed through the command buffer's addCompletedHandler callback. The presentation time T2 of the frame is observed through the addPresentedHandler callback of the MTKView's currentDrawable. Testing shows that T2 - T1 > 16.6 ms (the Vsync period at 60 Hz). This means that after GPU rendering in the MTKView finishes, the frame is not displayed at the next Vsync but only at the one after that. I believe there is an extra 16.6 ms of latency here, which I want to eliminate by adjusting the rendering mechanism.

Observation from Instruments

Instruments agrees with the test results above: after the Metal encoder finishes, the surface in the Display track switches only after the Vsync after next. See the image in the link for details.

Questions

As a beginner, my understanding is that once the MTKView's GPU rendering finishes, the frame should become visible at the next Vsync. However, that is not what I observe. Does a subview MTKView need to wait one more Vsync cycle before it is drawn into the actual display buffer? The label updates its text at 60 fps, so the entire interface should be displayed at 60 fps. Is the MTKView's content not synchronized with the rest of the interface when the display happens?

Explanation of the Reasoning Behind Some MTKView Code Details

Changing from the default triple buffering to double buffering helps reduce the latency introduced by rendering. I trigger the draw method manually instead of using MTKView's own scheduling because that mechanism is driven by CADisplayLink: if a frame arrives partway through a Vsync window, it has to wait for the next Vsync window before the draw is triggered, which adds waiting latency.
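For the latency measurement itself, MTLCommandBuffer.gpuEndTime and MTLDrawable.presentedTime are both host times in seconds, so their difference gives the GPU-completion-to-glass interval without depending on when the completion handler happens to run. A minimal sketch, assuming the manual-draw setup described above. `configureForManualDrawing`, `renderFrame`, and `presentationLatencyMs` are hypothetical names, and the actual draw calls are elided.

```swift
import MetalKit
import QuartzCore

/// Milliseconds from GPU completion to the frame reaching the glass,
/// or nil if the drawable was never shown (presentedTime stays 0).
func presentationLatencyMs(gpuEndTime: CFTimeInterval, presentedTime: CFTimeInterval) -> Double? {
    guard presentedTime > 0 else { return nil }
    return (presentedTime - gpuEndTime) * 1000
}

/// Mirrors the configuration described in the post: manual drawing, two drawables.
func configureForManualDrawing(_ view: MTKView) {
    view.enableSetNeedsDisplay = false
    view.isPaused = true
    if let layer = view.layer as? CAMetalLayer {
        layer.maximumDrawableCount = 2
    }
}

/// Encodes one frame and logs how long the finished frame waited before being presented.
func renderFrame(in view: MTKView, queue: MTLCommandQueue) {
    guard let drawable = view.currentDrawable,
          let pass = view.currentRenderPassDescriptor,
          let commandBuffer = queue.makeCommandBuffer(),
          let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: pass) else { return }
    // ... encode the frame's draw calls here ...
    encoder.endEncoding()

    // Register before presenting; the handler runs once the drawable is on screen (or dropped).
    drawable.addPresentedHandler { presented in
        if let ms = presentationLatencyMs(gpuEndTime: commandBuffer.gpuEndTime,
                                          presentedTime: presented.presentedTime) {
            print(String(format: "GPU end -> on glass: %.2f ms", ms))
        }
    }
    commandBuffer.present(drawable)
    commandBuffer.commit()
}
```

Using gpuEndTime rather than the wall-clock time inside addCompletedHandler removes the callback's own scheduling delay from T1, which makes it easier to tell whether the extra interval is really a whole Vsync.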

Graphics & Games / Metal · Tags: Metal · Views: 621 · Activity: Dec ’25

Deterministic RNG behaviour across Mac M1 CPU and Metal GPU – BigCrush pass & structural diagnostics

Hello,

I am currently working on a research project under ENINCA Consulting, focused on advanced diagnostic tools for pseudorandom number generators (structural metrics, multi-seed stability, cross-architecture reproducibility, and indicators complementary to TestU01).

To validate this diagnostic framework, I prototyped a small non-linear 64-bit PRNG (not as a goal in itself, but simply as a vehicle for testing the methodology). During these evaluations, I observed something interesting on Apple Silicon (Mac M1):
• bit-exact reproducibility between the M1 ARM CPU and the M1 GPU via Metal,
• a full BigCrush pass on both the CPU and Metal backends,
• excellent p-values,
• stable behaviour across multiple seeds and runs.

This was not the intended objective (the goal was mainly to validate the diagnostic concepts), but these results raised some questions about deterministic compute behaviour in Metal.

My question: is there any official guidance on achieving (or expecting) deterministic RNG or compute behaviour across CPU ↔ Metal GPU on Apple Silicon? More specifically:
• Are deterministic compute kernels expected or guaranteed on Metal for scientific workloads?
• Are there recommended patterns or best practices for ensuring reproducibility across GPU generations (M1 → M2 → M3 → M4)?
• Are there known Metal features that can introduce non-determinism?

I am not sharing the internal recurrence (this work is proprietary), but I can discuss the high-level diagnostic observations if helpful.

Thank you for any insight; I'm very interested in how the Metal engineering team views deterministic compute patterns on Apple Silicon.

Pascal
ENINCA Consulting
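On the reproducibility question, integer-only generators are the easy case. Wrapping 64-bit integer arithmetic, shifts, and XORs are fully specified, so the same sequence should come out on any CPU or GPU that implements them. Floating-point code is where differences usually creep in; for example, as far as I know, Metal's runtime compiler enables fast math by default through MTLCompileOptions. The sketch below uses SplitMix64, a well-known public generator, purely as a stand-in. It is not the poster's proprietary recurrence, and it shows only the CPU side.

```swift
/// SplitMix64, used here only as a public stand-in for an integer-only PRNG.
struct SplitMix64 {
    private(set) var state: UInt64

    init(seed: UInt64) { state = seed }

    /// Only wrapping integer operations (&+=, &*, ^, >>) are used, and their results are
    /// fully specified, so the sequence is bit-identical on any conforming platform.
    mutating func next() -> UInt64 {
        state &+= 0x9E37_79B9_7F4A_7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58_476D_1CE4_E5B9
        z = (z ^ (z >> 27)) &* 0x94D0_49BB_1331_11EB
        return z ^ (z >> 31)
    }

    /// Maps the top 24 bits to a Float in [0, 1). Both the conversion and the scaling by 2^-24
    /// are exact, so the result does not depend on rounding mode or fast-math settings.
    mutating func nextUnitFloat() -> Float {
        Float(next() >> 40) * (1.0 / 16_777_216.0)
    }
}

var generator = SplitMix64(seed: 42)
let firstFour = (0..<4).map { _ in generator.next() }
print(firstFour.map { String($0, radix: 16) })
```

Converting to floating point only at the very end, and only through exact operations, keeps the floating-point side of the CPU and GPU paths out of the reproducibility question.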

Graphics & Games / Metal · Views: 227 · Activity: Nov ’25

Deterministic RNG behaviour across Mac M1 CPU and Metal GPU – BigCrush pass & structural diagnostics

Graphics & Games / Metal · Tags: ML Compute, Metal, Metal Performance Shaders, Apple Silicon · Views: 343 · Activity: Nov ’25

MetalFX for Unity 2022.3.62f3?

Hi, I’m testing Unity’s Spaceship HDRP demo on iPhone 17 Pro Max and iPad Pro M4 (iOS 26.1). Everything renders correctly, and my custom MetalFX Spatial plugin initializes successfully: it briefly reports active scaling (e.g. 1434×660 → 2868×1320 at 50% scaling), then reverts to native rendering a few frames later.

Setup:
• Xcode 16.1 (targeting iOS 18)
• Unity 2022.3.62f3 (HDRP)
• Metal backend
• Dynamic Resolution enabled in HDRP assets and cameras

Relevant Xcode console excerpt:

    [MetalFXPlugin] MetalFX_Enable(True) called.
    [SpaceshipOptions] MetalFX enabled with HDRP dynamic resolution integration.
    [SpaceshipOptions] Disabled TAA for MetalFX Spatial.
    [SpaceshipOptions] Created runtime RenderTexture: 1434x660
    [MetalFX] Spatial scaler created (1434x660 → 2868x1320).
    [MetalFX] Processed frame with scaler.
    [MetalFXPlugin] Sent RenderTexture (1434x660) to MetalFX. Output target 2868x1320.
    [SpaceshipOptions] MetalFX target set: 1434x660
    [SpaceshipOptions] Camera targetTexture cleared after MetalFX handoff.

It looks like HDRP clears the camera’s target texture right after MetalFX submits the frame, which causes it to revert to native rendering. Is there a recommended way to persist or rebind the MetalFX output texture when using HDRP on iOS?

Unity doesn’t appear to support MetalFX in the Editor either.

Thanks!

Graphics & Games / Metal · Tags: MetalFX · Views: 296 · Activity: Nov ’25

Help Request! How to Render Models with Submeshes Using Metal 4?

Hi, I'm a beginner with Metal 4 and Model I/O 🥺. I can render simple models with just one mesh, but when I try to render models with submeshes, nothing shows up on screen. Can anyone help me figure out how to properly render models with multiple submeshes? I think I'm not iterating through them correctly, or maybe I'm missing some buffer setup. Here's what I have so far: https://www.icloud.com.cn/iclouddrive/0a6x_NLwlWy-herPocExZ8g3Q#LoadModel
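The usual structure is two nested loops. A Model I/O asset can contain several meshes, and each mesh can contain several submeshes that share the mesh's vertex buffers but have their own index buffers. Drawing only meshes[0], or only submeshes[0], is a common way to end up with missing geometry. The sketch uses the classic MTLRenderCommandEncoder because the loop structure is the same. A Metal 4 encoder binds buffers and issues draws with different signatures, which are not shown here. `loadMeshes` and `draw(_:with:)` are hypothetical names, and the vertex-buffer indices assume the shader's vertex descriptor uses layouts 0..<n.

```swift
import MetalKit
import ModelIO

/// Loads every mesh in a Model I/O asset as MTKMesh objects, not just the first one.
func loadMeshes(from url: URL,
                device: MTLDevice,
                vertexDescriptor: MDLVertexDescriptor) throws -> [MTKMesh] {
    let allocator = MTKMeshBufferAllocator(device: device)
    let asset = MDLAsset(url: url, vertexDescriptor: vertexDescriptor, bufferAllocator: allocator)
    let (_, meshes) = try MTKMesh.newMeshes(asset: asset, device: device)
    return meshes
}

/// Binds a mesh's vertex buffers once, then issues one indexed draw per submesh.
func draw(_ mesh: MTKMesh, with encoder: MTLRenderCommandEncoder) {
    for (index, vertexBuffer) in mesh.vertexBuffers.enumerated() {
        // Assumes the vertex descriptor's buffer layouts are numbered 0..<n.
        encoder.setVertexBuffer(vertexBuffer.buffer, offset: vertexBuffer.offset, index: index)
    }
    for submesh in mesh.submeshes {
        encoder.drawIndexedPrimitives(type: submesh.primitiveType,
                                      indexCount: submesh.indexCount,
                                      indexType: submesh.indexType,
                                      indexBuffer: submesh.indexBuffer.buffer,
                                      indexBufferOffset: submesh.indexBuffer.offset)
    }
}
```

In the render loop this becomes `for mesh in meshes { draw(mesh, with: encoder) }`. Per-submesh materials, if any, would be bound inside the inner loop before each draw.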

Graphics & Games / Metal · Tags: Metal, MetalKit · Views: 324 · Activity: Nov ’25

Are there complete code examples available for “Combine Metal 4 machine learning and graphics”?

Hello, I recently watched the WWDC2025 session titled “Combine Metal 4 machine learning and graphics” (https://developer.apple.com/videos/play/wwdc2025/262/ ), and I’m very excited about the new Metal 4 features that integrate machine learning with graphics, such as neural ambient occlusion, shader-based ML inference, and the use of MTLTensor and MTL4MachineLearningCommandEncoder. While the session includes helpful code snippets and a compelling debug demo (e.g., the neural ambient occlusion example), the implementation details are not fully shown, and I haven’t been able to find a complete, runnable sample project that demonstrates end-to-end integration of ML and rendering in Metal 4.

Would Apple be able to provide a full, working example (such as an Xcode project) that shows how to:
• export a model to an .mlpackage,
• convert it to an .mtlpackage,
• use MTL4MachineLearningCommandEncoder alongside render passes, or
• embed small neural networks directly in shaders using Shader ML?

Having such a sample would greatly help developers like me adopt these powerful new capabilities correctly and efficiently. Thank you very much for your time and support!

Best regards,

Graphics & Games / Metal · Tags: Metal, MetalKit, Core ML · Activity: Nov ’25

MPSMatrixRandom segfaults when run in an async context

The following minimal snippet segfaults with SDK 26.0 and 26.1. It doesn't crash if I remove async from the enclosing function signature, but that is impractical in a real project.

    import Metal
    import MetalPerformanceShaders

    let SEED = UInt64(0x0)
    typealias T = Float16

    /* Why run in an async context? Because of a global GPU object, an async makeMTLFunction,
       and an async makeMTLComputePipelineState. Nevertheless, the bug can be triggered
       without using a global @MainActor let myGPU = MyGPU() */

    @main
    struct CMDLine {
        static func main() async {
            let ptr = UnsafeMutablePointer<T>.allocate(capacity: 0)
            async let future: Void = randomFillOnGPU(ptr, count: 0)
            print("Main thread is playing around")
            await future
            print("Successfully reached the end.")
        }

        static func randomFillOnGPU(_ buf: UnsafeMutablePointer<T>, count destbufcount: Int) async {
            // let (device, queue) = await (myGPU.device, myGPU.commandqueue)
            let myGPU = MyGPU()
            let (device, queue) = (myGPU.device, myGPU.commandqueue)
            // Init MTLBuffer, async let makeFunction, makeComputePipelineState, etc.
            let tempDataType = MPSDataType.uInt32
            let randfiller = MPSMatrixRandomMTGP32(device: device,
                                                   destinationDataType: tempDataType,
                                                   seed: Int(bitPattern: UInt(SEED)))
            print("randomFillOnGPU: successfully created MPSMatrixRandom.")
            // try await computePipelineState
            // ^ Crashes before this could return
            // Or in this minimal case, after randomFillOnGPU() returns
            // make encoder, set pso, dispatch, commit...
        }
    }

    actor MyGPU {
        let device: MTLDevice
        let commandqueue: MTLCommandQueue

        init() {
            guard let dev: MTLDevice = MPSGetPreferredDevice(.skipRemovable),
                  let cq = dev.makeCommandQueue(),
                  dev.supportsFamily(.apple6) || dev.supportsFamily(.mac2)
            else {
                print("Unable to get Metal Device! Exiting")
                exit(EX_UNAVAILABLE)
            }
            print("Selected device: \(String(format: "%llX", dev.registryID))")
            self.device = dev
            self.commandqueue = cq
            print("myGPU: initialization complete.")
        }
    }

See FB20916929.

Apparently the Objective-C autorelease pool is releasing the wrong address during a context switch (across suspension points). I wonder why such an obvious case hasn't been caught before.
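Since the crash goes away when the enclosing function isn't async, one untested workaround follows from that observation. Confine all MPS object creation and encoding to a synchronous helper wrapped in an explicit autoreleasepool, so no autoreleased Objective-C object is alive across a suspension point. Then bridge command-buffer completion back to async code with a checked continuation. This is a sketch under that assumption, not a confirmed fix for FB20916929. `encodeRandomFill` and `randomFill` are hypothetical names, and the destination matrix is assumed to use MPSDataType.uInt32 to match the generator.

```swift
import Metal
import MetalPerformanceShaders

/// Hypothetical helper: creates and encodes the MPS kernel synchronously inside an
/// explicit autorelease pool. The returned command buffer is encoded but NOT committed.
func encodeRandomFill(device: MTLDevice,
                      queue: MTLCommandQueue,
                      destination: MPSMatrix,
                      seed: Int) -> MTLCommandBuffer? {
    autoreleasepool { () -> MTLCommandBuffer? in
        guard let commandBuffer = queue.makeCommandBuffer() else { return nil }
        // MTGP32 writes 32-bit values; `destination` is assumed to be a .uInt32 matrix.
        let filler = MPSMatrixRandomMTGP32(device: device,
                                           destinationDataType: .uInt32,
                                           seed: seed)
        filler.encode(commandBuffer: commandBuffer, destinationMatrix: destination)
        return commandBuffer
    }
}

/// The async caller only commits and waits; no MPS objects are created on this side of an `await`.
func randomFill(device: MTLDevice,
                queue: MTLCommandQueue,
                destination: MPSMatrix,
                seed: Int) async {
    guard let commandBuffer = encodeRandomFill(device: device, queue: queue,
                                               destination: destination, seed: seed) else { return }
    await withCheckedContinuation { (continuation: CheckedContinuation<Void, Never>) in
        // Register the handler before committing so completion cannot be missed.
        commandBuffer.addCompletedHandler { _ in continuation.resume() }
        commandBuffer.commit()
    }
}
```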

Graphics & Games / Metal · Tags: Swift, Metal Performance Shaders · Views: 197 · Activity: Nov ’25

Unable to compile Core Image filter on Xcode 26 due to missing Metal toolchain

I have a Core Image filter in my app that uses Metal. I can't compile it because the build complains that the executable tool metal is not available, but I have installed it in Xcode. The "Components" section of Xcode Settings shows it as downloaded, and running the suggested command also shows it as installed. Any advice?

Xcode version: 26.0 beta (17A5241e)

Build output (showing all errors only):

    Build target Lessons of project StudyJapanese with configuration Light

    RuleScriptExecution /Users/chris/Library/Developer/Xcode/DerivedData/StudyJapanese-glbneyedpsgxhscqueifpekwaofk/Build/Intermediates.noindex/StudyJapanese.build/Light-iphonesimulator/Lessons.build/DerivedSources/OtsuThresholdKernel.ci.air /Users/chris/Code/SerpentiSei/Shared/iOS/CoreImage/OtsuThresholdKernel.ci.metal normal undefined_arch (in target 'Lessons' from project 'StudyJapanese')
        cd /Users/chris/Code/SerpentiSei/StudyJapanese
        /bin/sh -c xcrun\ metal\ -w\ -c\ -fcikernel\ \"\$\{INPUT_FILE_PATH\}\"\ -o\ \"\$\{SCRIPT_OUTPUT_FILE_0\}\"' '

    error: error: cannot execute tool 'metal' due to missing Metal Toolchain; use: xcodebuild -downloadComponent MetalToolchain

    /Users/chris/Code/SerpentiSei/StudyJapanese/error:1:1: cannot execute tool 'metal' due to missing Metal Toolchain; use: xcodebuild -downloadComponent MetalToolchain

    Build failed    6/9/25, 8:31 PM    27.1 seconds

Result of xcodebuild -downloadComponent MetalToolchain (after switching to Xcode-beta.app with xcode-select):

    xcodebuild -downloadComponent MetalToolchain
    Beginning asset download...
    Downloaded asset to: /System/Library/AssetsV2/com_apple_MobileAsset_MetalToolchain/4d77809b60771042e514cfcf39662c6d1c195f7d.asset/AssetData/Restore/022-19457-035.dmg
    Done downloading: Metal Toolchain (17A5241c).

Result of "Copy Information":

    Metal Toolchain 26.0 [com.apple.MobileAsset.MetalToolchain: 17.0 (17A5241c)] (Installed)

Graphics & Games / Metal · Tags: Metal, Core Image · Views: 3.7k · Activity: Oct ’25

Float64 (Double Precision) Support on MPS with PyTorch on Apple Silicon?

Hi everyone,

This project uses PyTorch on an Apple Silicon Mac (M1/M2/etc.), and the goal is to use the MPS backend for GPU acceleration. However, the workflow depends on Float64 (double-precision) floating-point numbers for certain computations, and I've run into this error:

    Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead

It seems that the MPS backend doesn't currently support Float64 for direct GPU computation.

Questions for the community:
• Are there any known workarounds or best practices for handling Float64-dependent operations when using the MPS backend with PyTorch?
• For those working on high-precision tasks on Apple Silicon, what strategies do you use to balance performance against the need for Float64?

Offloading to the CPU is an option, but I'd like to know whether there are specific techniques or libraries within the Apple ecosystem that could streamline this while keeping performance as high as possible.

Any insights, tips, or experiences would be appreciated.

Thanks in advance,
Jonaid
MacBook Pro M3 Max
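One technique that can stay in float32 (and therefore on the GPU) for some of these cases is compensated (Kahan) summation. It carries each addition's rounding error forward, which recovers much of the accuracy lost in long reductions. It is not a general substitute for float64; it does nothing for, say, ill-conditioned linear solves. The sketch below is plain Swift on the CPU, chosen only to show the effect. Writing the same loop as a GPU kernel or with float32 tensors is an assumption whose performance I haven't measured.

```swift
/// Naive left-to-right float32 summation.
func naiveSum(_ values: [Float]) -> Float {
    var sum: Float = 0
    for v in values { sum += v }
    return sum
}

/// Kahan (compensated) summation: `compensation` holds the low-order bits
/// lost by the previous addition and feeds them back into the next one.
func kahanSum(_ values: [Float]) -> Float {
    var sum: Float = 0
    var compensation: Float = 0
    for v in values {
        let y = v - compensation
        let t = sum + y
        compensation = (t - sum) - y
        sum = t
    }
    return sum
}

// 1.0 followed by a million tiny terms; the exact sum is about 1.01.
let values: [Float] = [1.0] + Array(repeating: 1e-8, count: 1_000_000)
print("naive:", naiveSum(values), "kahan:", kahanSum(values))
```

The naive loop never moves off 1.0, because each 1e-8 is below half a float32 ulp at 1.0 and rounds away, while the compensated loop accumulates those lost bits until they are large enough to register.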

Graphics & Games / Metal · Views: 511 · Activity: Oct ’25

Context

I’m deploying large language models on iPhone using llama.cpp. A new iPhone Air (12 GB RAM) reports a Metal MTLDevice.recommendedMaxWorkingSetSize of 8,192 MB, and my attempt to load Llama-2-13B Q4_K (~7.32 GB of weights) fails during model initialization.

Environment
• Device: iPhone Air (12 GB RAM)
• iOS: 26
• Xcode: 26.0.1
• Build: llama.cpp with the Metal backend enabled
• App runs on device (not the Simulator)

What I’m seeing
• MTLCreateSystemDefaultDevice().recommendedMaxWorkingSetSize == 8192 MiB
• Loading Llama-2-13B Q4_K (7.32 GB) fails to complete. Logs indicate memory pressure / allocation issues consistent with the 8 GB working-set guidance.
• Smaller models (e.g., 7B/8B with similar quantization) load and run (8B Q4_K provides around 9 tokens/second decoding speed).

Questions
• Is 8,192 MB an expected recommendedMaxWorkingSetSize on a 12 GB iPhone? What values should I expect on other 2025 devices, including iPhone 17 (8 GB RAM) and iPhone 17 Pro (12 GB RAM)?
• Is it strictly enforced by Metal allocations (heaps/buffers), or is it advisory for best performance/eviction behavior?
• Can a process practically exceed it for long-lived buffers without immediate Jetsam risk?
• Is there any guidance for LLM scenarios near the limit?
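Weights are not the only large allocation. The KV cache grows with context length, and for Llama-2-13B (40 layers, 5,120-dimensional embeddings, standard multi-head attention) with an f16 cache it costs 2 × 40 × 5,120 × 2 bytes, about 0.8 MB, per token. The sketch below is a back-of-the-envelope pre-flight check under those assumptions. llama.cpp's real footprint also includes compute buffers and depends on its configuration, and `kvCacheBytes`, `fitsInBudget`, and `currentMetalBudget` are hypothetical helper names. The ~7.32 GB weight figure is taken as 7.32 × 10⁹ bytes.

```swift
#if canImport(Metal)
import Metal
#endif

/// Estimated KV-cache size for a dense transformer: keys and values (factor 2)
/// for every layer and every context position.
func kvCacheBytes(layers: Int, contextLength: Int, embeddingDim: Int, bytesPerElement: Int) -> UInt64 {
    UInt64(2 * layers * contextLength * embeddingDim * bytesPerElement)
}

/// True if weights + KV cache + extra scratch fit under the given budget.
func fitsInBudget(weightBytes: UInt64, kvBytes: UInt64, scratchBytes: UInt64, budgetBytes: UInt64) -> Bool {
    weightBytes + kvBytes + scratchBytes <= budgetBytes
}

#if canImport(Metal)
/// The device's advisory working-set size and what this process has already allocated on it.
func currentMetalBudget() -> (recommended: UInt64, allocated: UInt64)? {
    guard let device = MTLCreateSystemDefaultDevice() else { return nil }
    return (device.recommendedMaxWorkingSetSize, UInt64(device.currentAllocatedSize))
}
#endif

let budget: UInt64 = 8192 * 1024 * 1024   // 8,192 MiB, as reported on the iPhone Air
let weights: UInt64 = 7_320_000_000       // ~7.32 GB of Q4_K weights
// Llama-2-13B: 40 layers, 5,120-dimensional embeddings, f16 KV cache (2 bytes per element).
let kv2048 = kvCacheBytes(layers: 40, contextLength: 2048, embeddingDim: 5120, bytesPerElement: 2)
let kv512 = kvCacheBytes(layers: 40, contextLength: 512, embeddingDim: 5120, bytesPerElement: 2)
print("n_ctx 2048 fits:", fitsInBudget(weightBytes: weights, kvBytes: kv2048, scratchBytes: 0, budgetBytes: budget))
print("n_ctx 512 fits:", fitsInBudget(weightBytes: weights, kvBytes: kv512, scratchBytes: 0, budgetBytes: budget))
```

Under these assumptions, a 2,048-token context already exceeds the 8,192 MiB figure while 512 tokens fits, before any compute buffers are counted, so reducing the context length or quantizing the KV cache may matter as much as the weight format.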

Graphics & Games / Metal · Tags: ML Compute, iOS, iPhone, Metal · Views: 702 · Activity: Oct ’25

Why is there no Metal on Apple Watch?

And if that's the case, how are the beautiful system watch faces with smoke effects and other particles made?

Graphics & Games / Metal · Views: 333 · Activity: Oct ’25

How to use MetalPerformancePrimitives

I am trying to learn the new Metal Performance Primitives APIs. I have added the MetalPerformancePrimitives framework and included the header in my shader code as per the documentation:

    #include <MetalPeformancePrimitives/MetalPeformancePrimitives.h>

Unfortunately, Xcode complains that the header cannot be found. How do I include it properly? I am using Xcode 26 on Tahoe. The MetalPerformancePrimitives framework is present on my machine, and I can inspect its headers in the filesystem.

Graphics & Games / Metal · Views: 807 · Activity: Oct ’25

Can you delete a MTLLibrary once shaders are placed into pipeline?

Hello, I am quite new to using the Metal API and was wondering whether it is common (or even possible) to release the library that was used to reference the shaders once a pipeline has been created, if I know I will never need to create another pipeline with the same shaders. I'm only asking because this is possible in other APIs, but Apple never mentions (as far as I have found) whether this is safe.
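As far as I know, a compiled MTLComputePipelineState (or MTLRenderPipelineState) does not need the MTLLibrary or MTLFunction that produced it to stay alive, so letting them go out of scope after pipeline creation is a common pattern. You would keep the library only if you expect to build more pipelines from it, for example with different function constants. A minimal sketch under that assumption; `makeComputePipeline` and `PipelineSetupError` are hypothetical names.

```swift
import Metal

enum PipelineSetupError: Error {
    case missingFunction(String)
}

/// Builds a compute pipeline and returns only the pipeline state. `library` and `function`
/// are locals, so ARC releases them when this function returns.
func makeComputePipeline(device: MTLDevice,
                         source: String,
                         functionName: String) throws -> MTLComputePipelineState {
    let library = try device.makeLibrary(source: source, options: nil)
    guard let function = library.makeFunction(name: functionName) else {
        throw PipelineSetupError.missingFunction(functionName)
    }
    return try device.makeComputePipelineState(function: function)
}

let kernelSource = """
#include <metal_stdlib>
using namespace metal;

kernel void fill(device float *out [[buffer(0)]],
                 uint id [[thread_position_in_grid]]) {
    out[id] = 1.0f;
}
"""
```

Only the returned pipeline state needs to be stored (for example, in a renderer property). It is what gets passed to setComputePipelineState(_:) at dispatch time.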

Graphics & Games / Metal · Views: 409 · Activity: Oct ’25

10-bit support in iPad Pro

Hi, I’m using the latest iPad Pro (13-inch), and I can see that Metal offers an rgb10a2Unorm texture format for rendering, but when I render a grey ramp and measure the actual luminance, I get a pattern I would expect from an 8-bit texture (see below). Before I start ripping apart all my code, is there anything else I need to do to convince iOS to render my texture in 10-bit? I already tried setting the pixelFormat of my CAMetalLayer to rgb10a2Unorm, but that didn’t change anything.
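To sanity-check the measurement, it helps to know how many distinct code values a full-width grey ramp should produce at each bit depth. The sketch below quantizes a linear 0...1 ramp the way a UNORM format stores it, as round(v × (2^bits − 1)). Across a 2,048-pixel ramp, that gives 1,024 levels at 10 bits and 256 at 8 bits, so an 8-bit-like staircase in the measurement suggests the values are being reduced to 8 bits somewhere between the shader and the panel. `unormCode` and `distinctLevels` are hypothetical helper names, and the 2,048-pixel width is an arbitrary choice.

```swift
/// Quantizes a normalized value to an unsigned-normalized integer code of the given bit depth.
func unormCode(_ value: Double, bits: Int) -> Int {
    let maxCode = Double((1 << bits) - 1)
    let clamped = min(max(value, 0), 1)
    return Int((clamped * maxCode).rounded())
}

/// Counts the distinct codes that a horizontal 0...1 ramp `width` pixels wide produces.
func distinctLevels(width: Int, bits: Int) -> Int {
    var codes = Set<Int>()
    for x in 0..<width {
        let v = Double(x) / Double(width - 1)
        codes.insert(unormCode(v, bits: bits))
    }
    return codes.count
}

print("10-bit levels:", distinctLevels(width: 2048, bits: 10))
print("8-bit levels:", distinctLevels(width: 2048, bits: 8))
```

If the measured ramp changes only every fourth 10-bit code, that matches the 8-bit count, which points to a conversion step rather than to the ramp itself.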

Graphics & Games / Metal · Tags: iPad, Metal · Views: 465 · Activity: Sep ’25

Metal fails to create PSO on AMD based GPUs

Hello,

The shaders in our application are written in HLSL, and we rely on Metal Shader Converter to convert DXIL to Metal IR. We ran into an issue on AMD GPUs that causes Metal pipeline state creation to fail when a vertex stage-in function is used. Here's the error reported by Metal in the Xcode output:

    Compiler failed with XPC_ERROR_CONNECTION_INTERRUPTED
    XPC_ERROR_CONNECTION_INTERRUPTED
    MTLCompiler: Compilation failed with XPC_ERROR_CONNECTION_INTERRUPTED on 4 try. This error suggests an unexpected interruption in the connection. Possible reasons: a crash in the compiler service, termination by the OS due to resource constraints (e.g., jetsam), a timeout in the service, or an issue with IPC. Verify system stability and check the logs for more details.
    Compiler failed with XPC_ERROR_CONNECTION_INVALID
    XPC_ERROR_CONNECTION_INVALID
    MTLCompiler: Compiler encountered XPC_ERROR_CONNECTION_INVALID: failed to check-in, peer may have been unloaded: mach_error=10000003 (is the OS shutting down or process jetsammed?)
    Compilation failed due to an interrupted connection: XPC_ERROR_CONNECTION_INTERRUPTED. This error occurred after multiple retries.

This seems to indicate an internal compiler error. I have a minimal repro here: https://github.com/kcloudy0717/metal_pso_fail/tree/main. Simply follow the instructions in the README.

Graphics & Games / Metal · Tags: Metal, Metal Shader Converter · Views: 231 · Activity: Sep ’25

Pink screen on MTLCommandBuffer.presentDrawable.

I rewrote my graphics pipeline to make better use of load/store actions for clearing and don't-care cases. All my tests pass, and in the Metal debugger all the draw calls succeed. But when I present drawables (before [commandBuffer commit]), I only get a pink screen. I've tried everything I can think of, such as making sure the back buffer uses the same pixel format as my render targets, but it's still pink. Could you point me in the right direction so I can fix this, or help explain why it's pink? That would be really helpful.

Thank you,
Brian Hapgood

Graphics & Games / Metal · Views: 424 · Activity: Sep ’25

Metal HUD Display Value Range

I can't seem to get the Metal HUD to display value ranges (pre-26 Tahoe). The documented environment variable MTL_HUD_SHOW_VALUE_RANGE doesn't seem to work: https://developer.apple.com/documentation/xcode/monitoring-your-metal-apps-graphics-performance#Display-the-value-range-of-metrics

Is anyone having any luck?

Graphics & Games / Metal · Views: 361 · Activity: Sep ’25