Render advanced 3D graphics and perform data-parallel computations on graphics processors with Metal.

Metal Documentation

Posts under Metal subtopic

Background GPU Access availability
I would love to use Background GPU Access to do some video processing in the background. However, the documentation of BGContinuedProcessingTaskRequest.Resources.gpu clearly states: "Not all devices support background GPU use. For more information, see Performing long-running tasks on iOS and iPadOS." Is there a list of currently released devices that do (or don't) support background GPU use? That would help us understand what part of our user base can use this feature, and what hardware we need to test on as developers. For example, it seems that it isn't supported on an iPad Pro M1 with the current iOS 26 beta. The simulators also don't seem to support the background GPU resource. It would be great to understand what hardware is capable of using this feature!
Replies
5
Boosts
0
Views
1.2k
Activity
5h
Timestamp counter heap always returns zero
Hi, I am trying to use a timestamp counter heap, but it always seems to report timestamp zero. Consider this example program:

    #include <Metal/Metal.h>
    #include <assert.h>

    int main(int argc, char *argv[]) {
        auto device = MTLCreateSystemDefaultDevice();
        assert(device);
        auto descriptor = [MTL4CounterHeapDescriptor new];
        [descriptor setType:MTL4CounterHeapTypeTimestamp];
        [descriptor setCount:1];
        auto heap = [device newCounterHeapWithDescriptor:descriptor error:nullptr];
        assert(heap);
        [heap invalidateCounterRange:NSMakeRange(0, 1)];
        auto command_buffer = [device newCommandBuffer];
        assert(command_buffer);
        auto allocator = [device newCommandAllocator];
        assert(allocator);
        [command_buffer beginCommandBufferWithAllocator:allocator];
        auto encoder = [command_buffer computeCommandEncoder];
        assert(encoder);
        [encoder writeTimestampWithGranularity:MTL4TimestampGranularityPrecise intoHeap:heap atIndex:0];
        [encoder endEncoding];
        [command_buffer endCommandBuffer];
        auto queue = [device newMTL4CommandQueue];
        assert(queue);
        auto event = [device newSharedEvent];
        assert(event);
        [queue commit:&command_buffer count:1];
        [queue signalEvent:event value:1];
        [event waitUntilSignaledValue:1 timeoutMS:UINT64_MAX];
        auto data = [heap resolveCounterRange:NSMakeRange(0, 1)];
        printf("size %lu: %llu\n", data.length, *(uint64_t*)data.bytes);
        return 0;
    }

Trying to compile and run:

    % clang++ -g -O0 -o test test.mm -framework Metal -framework Foundation && MTL_DEBUG_LAYER=1 ./test
    2026-06-23 14:44:48.006 test[26472:1588857] Metal API Validation Enabled
    size 8: 0

I would have expected to receive

    size 8: [some random non-zero number]

with that number being a GPU timestamp of when the command was executed, but I always get zero. Does anybody have an idea of what I am doing wrong?
Replies
0
Boosts
0
Views
73
Activity
4d
MDLAsset loads texture in usdz file with wrong colorspace
I have a very basic usdz file from this repo. I call loadTextures() after loading the usdz via MDLAsset. Inspecting the MDLTexture object, I can tell it is assigning a colorspace of linear RGB instead of sRGB, although the image file in the usdz is sRGB. This causes the textures to ultimately render as oversaturated. In the code I later convert the MDLTexture to an MTLTexture via MTKTextureLoader, but if I set the sRGB option it seems to be ignored. This significantly impacts the usefulness of Model I/O if it can't load a simple usdz texture correctly. Am I missing something? Thanks!
Replies
4
Boosts
3
Views
1k
Activity
6d
Comprehensive documentation and literature
The WWDC videos like the new "Boost your graphics performance with the M5 and A19 GPUs" contain extremely valuable information and tips on how to discover, diagnose and remedy performance issues. They seem to serve as quick reminders and distilled summaries of more comprehensive documentation that I assume can be found somewhere. Where do we find the underlying comprehensive documentation that explains Apple Silicon GPU architecture? How can I learn to understand the basis of the data presented by the Xcode Metal Debugger? Any hints at external literature and resources are welcome.
Replies
1
Boosts
0
Views
184
Activity
2w
Documentation and literature
The WWDC videos like the new "Boost your graphics performance with the M5 and A19 GPUs" contain extremely valuable information and tips on how to discover, diagnose and remedy performance issues. They seem to serve as quick reminders and distilled summaries of more comprehensive documentation that I assume can be found somewhere. Where do we find the underlying comprehensive documentation that explains Apple Silicon GPU architecture? How can I learn to understand the basis of the data presented by the Xcode Metal Debugger? Any hints at external literature and resources are welcome.
Replies
0
Boosts
0
Views
151
Activity
2w
Performance Optimization for Large-Kernel Image Processing
I am processing large images where each output pixel depends on a large neighborhood of surrounding pixels. As a result, the shader performs a very high number of texture sampling operations, which appears to cause cache misses and becomes a performance bottleneck. Since neighboring threads often process adjacent pixels, many of the sampled pixels overlap between threads. Although each thread operates on a slightly different output pixel, a large portion of the texture accesses are effectively identical. Does Metal provide mechanisms that allow neighboring threads to share or synchronize intermediate results in order to reduce redundant texture fetches? Are there recommended approaches for exploiting data reuse across threads, for example through threadgroup memory or other Metal-specific features? In this type of workload, how effective is texture gathering (gather) for reducing sampling overhead, especially when only the RGB channels of an RGBA texture are required? Would using gather generally improve cache utilization and performance in this scenario? When using gather, what is the preferred way to handle texture borders and edge conditions without introducing per-thread branching (e.g., explicit if statements)? Any recommendations for optimizing large-radius neighborhood operations in Metal would be greatly appreciated.
Replies
1
Boosts
0
Views
181
Activity
2w
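On the large-kernel image processing question above: one common way to exploit the overlap between neighboring threads is to have each threadgroup copy its tile plus a halo of `R` pixels into threadgroup memory once, synchronize, and then filter from that copy. The sketch below is a CPU model in C++, not Metal Shading Language, so the local array only stands in for threadgroup memory, and every name in it (`Image`, `naive_box_sum`, `tiled_box_sums`, `TILE`, `R`) is hypothetical. It uses a plain box sum as a placeholder filter and clamps coordinates with min/max arithmetic, which is one way to handle borders without an explicit if statement. Whether this beats the texture cache or `gather` on a given GPU is something only profiling can answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sizes: one "threadgroup" covers TILE x TILE outputs, filter radius R.
constexpr int TILE = 16;
constexpr int R = 4;
constexpr int HALO = TILE + 2 * R;  // tile edge length including the halo

struct Image {
    int width;
    int height;
    std::vector<float> pixels;

    // Clamp-to-edge addressing done with min/max, so no per-pixel branch is written.
    float load(int x, int y) const {
        x = std::max(0, std::min(x, width - 1));
        y = std::max(0, std::min(y, height - 1));
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Naive path: every output pixel reads its full (2R+1)^2 neighborhood from the image.
float naive_box_sum(const Image& img, int px, int py, long& fetches) {
    float sum = 0.0f;
    for (int dy = -R; dy <= R; ++dy) {
        for (int dx = -R; dx <= R; ++dx) {
            sum += img.load(px + dx, py + dy);
            ++fetches;
        }
    }
    return sum;
}

// Tiled path: read tile + halo once into a local array (standing in for
// threadgroup memory), then compute all TILE x TILE outputs from that copy.
std::vector<float> tiled_box_sums(const Image& img, int ox, int oy, long& fetches) {
    assert(img.width > 0 && img.height > 0);
    static float tile[HALO][HALO];
    for (int y = 0; y < HALO; ++y) {
        for (int x = 0; x < HALO; ++x) {
            tile[y][x] = img.load(ox + x - R, oy + y - R);
            ++fetches;
        }
    }
    // A real kernel would need a threadgroup barrier here before reading the tile.
    std::vector<float> out(TILE * TILE);
    for (int ty = 0; ty < TILE; ++ty) {
        for (int tx = 0; tx < TILE; ++tx) {
            float sum = 0.0f;
            for (int dy = 0; dy <= 2 * R; ++dy)
                for (int dx = 0; dx <= 2 * R; ++dx)
                    sum += tile[ty + dy][tx + dx];
            out[ty * TILE + tx] = sum;
        }
    }
    return out;
}
```

With a 16x16 tile and radius 4, the naive path issues 256 x 81 = 20,736 image reads for the tile, while the tiled path issues 24 x 24 = 576. The filter taps still happen, but against the local copy instead of the texture.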
Opportunities to use Apple intelligence.
Are there opportunities for developers to use Apple Intelligence models through Metal in ways that unlock new rendering, simulation, or real-time content generation techniques?
Replies
1
Boosts
0
Views
173
Activity
2w
Memory allocation of textures in Metal
When does Metal allocate and deallocate memory for textures? I've observed that textures live for the whole lifetime of the commandBuffer. So, if I have multiple large textures that I need in subsequent shaders, it would make sense to work with multiple commandBuffers to enable deallocation and reduce peak memory usage. Is that correct? Do you have any other suggestions on how to reduce peak memory usage when working with large Metal textures? Hint: I am using compute shaders only.
Replies
1
Boosts
1
Views
191
Activity
2w
MetalFX upscaler/denoiser and instant changes
Hi, what's the best way to handle drastic changes in scene characteristics with the new MTLFXTemporalDenoisedScaler? Let's say a visible object in the scene radically changes its material properties. I can modify the albedo and roughness textures accordingly, but I suspect the history will be corrupted. Blending visual information between the new frame and the previous ones might be nonsense. I guess the problem is the same when objects appear or disappear instantly. Does the upscaler manage these events for us (by lowering blending), or should we use the reactive mask, the denoise strength mask, or something similar to handle them?
Replies
3
Boosts
0
Views
440
Activity
2w
metal shader converter library distribution
The documentation is unclear. I need clarification on Metal shader converter library distribution. Am I allowed to distribute the library as part of my macOS app bundle?
Replies
0
Boosts
1
Views
147
Activity
3w
Metal GPU Driver Crash on M5 Pro + macOS 26.5 — kIOGPUCommandBufferCallbackErrorOutOfMemory with <2GB working sets
Summary

The Metal driver AGXMetalG17X 351.2 on macOS 26.5 (25F71) for the M5 Pro chip crashes with kIOGPUCommandBufferCallbackErrorOutOfMemory (00000008) when running LLM inference workloads with working sets as small as ~1.5GB, despite 24GB of unified memory being available and Apple Diagnostics confirming the hardware is fully functional. This affects multiple tools: MLX, llama.cpp (Metal backend), and native apps using Metal for inference.

System

| Component | Value |
| Model | MacBook Pro (Mac17,9) |
| Chip | Apple M5 Pro (applegpu_g17s) |
| GPU Cores | 16 |
| RAM | 24 GB LPDDR5 |
| macOS | 26.5 (25F71) |
| Metal | Metal 4 |
| GPU Driver | AGXMetalG17X 351.2 |
| Xcode | 26.5 (17F42) |

Reproduction

MLX (Python)

    pip install mlx mlx-lm
    python -m mlx_lm.generate \
      --model mlx-community/Qwen2.5-3B-Instruct-4bit \
      --max-tokens 10 \
      --prompt "Hello"

Expected: Normal text generation
Actual: Crash with:

    libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)

llama.cpp

    brew install llama.cpp
    llama-cli --model model.gguf --prompt "Hello" --n-predict 20 --n-gpu-layers 99

Expected: Fast GPU generation
Actual: Process hangs indefinitely

Test Results

| Tool | Model | Peak Memory | Result |
| MLX | Qwen2.5-0.5B-4bit | 0.36 GB | ✅ Works |
| MLX | Qwen2.5-1.5B-4bit | 0.98 GB | ✅ Works |
| MLX | Qwen3-1.7B-4bit | 1.01 GB | ✅ Works |
| MLX | Qwen2.5-3B-4bit | ~1.5 GB | ❌ Metal OOM crash |
| MLX | Qwen3-4B-4bit | ~2.1 GB | ❌ Metal OOM crash |
| MLX | Qwen3-8B-4bit | ~4.5 GB | ❌ Metal OOM crash |
| llama.cpp | Qwen2.5-0.5B GGUF | ~0.5 GB | ❌ Hangs with GPU |
| llama.cpp | Qwen2.5-0.5B GGUF | ~0.5 GB | ✅ Works with CPU only |

Key Evidence

- Hardware is healthy: Apple Diagnostics passed all tests
- Basic Metal works: matmul, array ops work fine
- CPU inference works: llama.cpp with -ngl 0 runs correctly
- The error is NOT about actual memory exhaustion: kIOGPUCommandBufferCallbackErrorOutOfMemory means the kernel rejects the Metal memory commit, not that physical memory is full. The system reports 17.76GB available for Metal working set.

Crash Log Extract

    Thread 31 Crashed:
    0 libsystem_kernel.dylib __pthread_kill + 8
    1 libsystem_pthread.dylib pthread_kill + 296
    2 libsystem_c.dylib abort + 148
    3 Metal MTLReportFailure.cold.1 + 48
    4 Metal MTLReportFailure + 576
    5 Metal -[_MTLCommandBuffer addCompletedHandler:] + 104
    ...
    Exception Type: EXC_CRASH (SIGABRT)
    Termination Reason: Namespace SIGNAL, Code 6, Abort trap: 6

Related Issues

- ml-explore/mlx#3586: Metal compiler regression on macOS 26.5
- ml-explore/mlx#3534: M5 float32 precision issue
- ml-explore/mlx#3568: M5 random divergence
- ml-explore/mlx#3539: Metal residency OOM (M4 Max)

Request

Please investigate the AGXMetalG17X driver for M5 Pro on macOS 26.5. The driver appears to incorrectly reject Metal memory commits for LLM inference workloads, even when the working set is well within the system's reported limits (1.5GB requested vs 17.76GB available). Happy to provide full crash logs, sysdiagnose archives, or run additional tests.
Replies
0
Boosts
0
Views
306
Activity
May ’26
Inexplicable Metal crash ever since iOS 26.5 beta 4
Hi all, I'm working on updating my audio visualizer app. I'm adding new visualizers based on Metal 4 compute shaders. They worked in iOS 26.4 and iOS 26.5 up until beta 3. However, after that, the visualizers started crashing the phone and forcing a restart. On the latest version of iOS 26.5, the crash is still there. I submitted feedback, but haven't heard anything back just yet. I was wondering if others have faced this same issue, and if there are any workarounds. Here is my repo if you want to look at the code (forgive me if it's sloppy, I'm quite new to graphics programming and Metal): https://github.com/aabagdi/VisualMan/tree/main Thank you!
Replies
4
Boosts
0
Views
1.5k
Activity
May ’26
Metal 4 support in iOS simulator
I'm updating our app to support Metal 4, but the Metal 4 types don't seem to be recognized when targeting the simulator. Is it known whether Metal 4 will be supported in the simulator in the near future, or am I setting up the app wrong?
Replies
6
Boosts
0
Views
1.4k
Activity
May ’26
Using setVertexBytes for index primitives
When drawing indexed primitives, is there a method to provide the indices through a temporary buffer, like setVertexBytes does for vertex data? Right now I have to create a temporary Metal buffer even for a small number of vertices and toss it after rendering with drawIndexedPrimitives.
Replies
1
Boosts
0
Views
654
Activity
May ’26
MTL4FXTemporalDenoisedScaler initialization
I’m trying to use MTL4FXTemporalDenoisedScaler, and I’m seeing a crash during initialization even with a very simple sample app. I created a minimal sample here: https://github.com/tatsuya-ogawa/MetalFXInitExample
The exception is: NSException: "-[AGXG16XFamilyHeap baseObject]: unrecognized selector sent to instance ..."
What I found is:
• This works: descriptor.makeTemporalDenoisedScaler(device: device)
• This crashes: descriptor.makeTemporalDenoisedScaler(device: device, compiler: metal4Compiler)
So the issue seems to happen only with the Metal4FX version. For testing, I’m using an iPhone 15 Pro. According to the Metal Feature Set Tables, MetalFX denoised upscaling should be supported on Apple9 and later, so I believe the device itself should meet the requirements. Reference: https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf
Has anyone seen this before, or does anyone know what might be causing it? I’d appreciate any advice. Thanks.
Replies
4
Boosts
2
Views
562
Activity
Apr ’26
Cannot load .mtlpackage to MTLLibrary
After watching the WWDC 2025 session "Combine Metal 4 machine learning and graphics", I decided to try integrating the latest MTL4MachineLearningCommandEncoder into my existing render pipeline. After a lot of trial and error, I managed to set up the pipeline and get the app to compile. However, I am now stuck on creating an MTLLibrary from a .mtlpackage. Here is the code I have to create an MTLLibrary according to the WWDC session https://developer.apple.com/videos/play/wwdc2025/262/?time=550:

    let coreMLFilePath = bundle.path(forResource: "my_model", ofType: "mtlpackage")!
    let coreMLURL = URL(string: coreMLFilePath)!
    do {
        metalDevice.makeLibrary(URL: coreMLURL)
    } catch {
        print("error: \(error)")
    }

With the above code, I am getting this error:

    Error Domain=MTLLibraryErrorDomain Code=1 "Invalid metal package" UserInfo={NSLocalizedDescription=Invalid metal package}

What is the correct way to create an MTLLibrary from a .mtlpackage? Do I see this error because the .mtlpackage I am using is incorrect? How should I go about debugging this? I'd really appreciate some help, as I have been stuck on this for some time now. Thanks in advance!
Replies
1
Boosts
0
Views
685
Activity
Apr ’26
Can a compute pipeline be as efficient as a render pipeline for rasterization?
I'm new to graphics and game design, and I wanted to know whether a compute pipeline can be as efficient as a render pipeline for rasterization, along with an explanation of how and why. Also, is it possible to perform rasterization manually with a render pipeline, that is, to manipulate individual pixel data in a Metal texture yourself?
Replies
1
Boosts
0
Views
757
Activity
Apr ’26
Question on setVertexBytes
I think that if your buffer is smaller than 4 KB, it's recommended to use setVertexBytes. My question is: can I keep hammering on setVertexBytes as the primary method to issue multiple draw calls within a render buffer and rely on Metal to figure out how to orphan and replace the target buffer? A lot of the primitives I am drawing are smaller than 4 KB, and the process of wiring down larger segments of memory for individual buffers for each draw call seems to be a negative. It's also just simpler to copy, submit, and forget about buffer synchronization.
Replies
2
Boosts
0
Views
816
Activity
Apr ’26
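For the setVertexBytes questions above, one pattern often used for small per-draw data is to sub-allocate from a single large buffer per frame instead of creating a buffer per draw, and to bind that buffer at a different offset for each draw. The sketch below shows only the CPU-side offset bookkeeping in C++. `TransientArena`, `push`, `reset`, and `kNoSpace` are hypothetical names, the `std::vector` merely stands in for the contents of a shared Metal buffer, and the alignment values a caller passes are placeholders that would need to be checked against the requirements of the target device.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sentinel returned when the arena is full.
constexpr std::size_t kNoSpace = static_cast<std::size_t>(-1);

// Bump allocator over one block of memory. In a renderer the block would be
// the contents of one large shared buffer, reused frame after frame.
class TransientArena {
public:
    explicit TransientArena(std::size_t capacity) : storage_(capacity), head_(0) {}

    // Copies `size` bytes into the arena at the next offset that satisfies
    // `alignment` (a power of two) and returns that offset, or kNoSpace.
    std::size_t push(const void* data, std::size_t size, std::size_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::size_t offset = (head_ + alignment - 1) & ~(alignment - 1);
        if (offset > storage_.size() || size > storage_.size() - offset) {
            return kNoSpace;
        }
        std::memcpy(storage_.data() + offset, data, size);
        head_ = offset + size;
        return offset;
    }

    // Rewinds the arena. Only safe once the GPU has finished every command
    // that reads from it, for example after the frame's completion callback.
    void reset() { head_ = 0; }

    const std::uint8_t* data() const { return storage_.data(); }

private:
    std::vector<std::uint8_t> storage_;
    std::size_t head_;
};
```

The returned offset is what would be passed as the buffer offset when binding vertex data or as the index buffer offset of an indexed draw. Keeping one arena per in-flight frame is an assumption of this sketch; it avoids overwriting data the GPU may still be reading.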
GPTK 3 and D3DMetal issue with Modern Pipeline Creation
Death Stranding 2: On the Beach (v1.0.48.0, Steam) crashes during rendering initialization when running through CrossOver 26 with D3DMetal 3.0 on an Apple M2 Max Mac Studio running macOS Sequoia. The game successfully initializes Streamline, NVAPI, DLSS (Result::eOk), DLSSG (Result::eOk), Reflex, and XeSS; all subsystems report success. The crash occurs immediately afterward, during rendering pipeline creation, before the game reaches NXStorage initialization or window creation.
Minidump analysis confirms the crash is an access violation (0xc0000005) at DS2.exe+0x67233d, writing to address 0x0. RAX=0x0 (null pointer being dereferenced), R12=0xFFFFFFFFFFFFFFFF (error/invalid handle return). The game appears to call a D3D12 API (likely CheckFeatureSupport or a pipeline state creation function) that D3DMetal acknowledges as supported but for which it returns null or invalid data. The game trusts the response and dereferences the null pointer.
Two other Nixxes titles using the same engine and D3DMetal setup run without issue: Spider-Man 2 (~50 FPS) and Horizon Zero Dawn Remastered (~34 FPS). DS2 uses newer technology versions (DLSS 4, FSR 4, XeSS 2) and a newer DirectX 12 Agility SDK, which likely queries D3D12 features that D3DMetal does not yet fully implement. The crash also reproduces when D3DMetal reports as AMD vendor (1002) instead of NVIDIA (10de), crashing at the same executable offset, confirming it is a D3D12 feature reporting gap in D3DMetal rather than a vendor-specific issue.
How To Reproduce
1. Install CrossOver 26+ on macOS 26.4
2. Install Steam and download Death Stranding 2
3. Run Death Stranding 2 and check logs after crash in Documents\DEATH STRANDING 2 ON THE BEACH
Feedback Requests
FB22285513: Game Porting Toolkit 3 issue with Modern Pipeline Creation
Replies
1
Boosts
4
Views
945
Activity
Apr ’26
Xcode26 Replay frame broken
I got a broken frame when using Xcode to capture a frame from a Unity game and replay it. It seems like the vertex buffer is broken; I see a bunch of "nan"s in the vertex buffer. However, the game displays correctly when running, and this only started happening after I upgraded my Xcode and iPhone to Xcode 26 and iOS 26.
Replies
1
Boosts
0
Views
546
Activity
Apr ’26