Render advanced 3D graphics and perform data-parallel computations on graphics processors using Metal.

Metal Documentation

Posts under Metal subtopic

Post

Replies

Boosts

Views

Activity

Cannot Display MTKView on a sheeted view on macOS15
I use Xcode 16 and SwiftUI on macOS 15. When I render an image through an MTKView, it displays correctly in a regular view. However, when the same view is presented with the .sheet modifier, the image is not displayed. Xcode reports no error.

    import Foundation
    import MetalKit
    import SwiftUI

    struct CIImageDisplayView: NSViewRepresentable {
        typealias NSViewType = MTKView
        var ciImage: CIImage

        init(ciImage: CIImage) {
            self.ciImage = ciImage
        }

        func makeNSView(context: Context) -> MTKView {
            let view = MTKView()
            view.delegate = context.coordinator
            view.preferredFramesPerSecond = 60
            view.enableSetNeedsDisplay = true
            view.isPaused = true
            view.framebufferOnly = false
            if let defaultDevice = MTLCreateSystemDefaultDevice() {
                view.device = defaultDevice
            }
            view.delegate = context.coordinator
            return view
        }

        func updateNSView(_ nsView: MTKView, context: Context) {
        }

        func makeCoordinator() -> RawDisplayRender {
            RawDisplayRender(ciImage: self.ciImage)
        }

        class RawDisplayRender: NSObject, MTKViewDelegate {
            // MARK: Metal resources
            var device: MTLDevice!
            var commandQueue: MTLCommandQueue!

            // MARK: Core Image resources
            var context: CIContext!
            var ciImage: CIImage

            init(ciImage: CIImage) {
                self.ciImage = ciImage
                self.device = MTLCreateSystemDefaultDevice()
                self.commandQueue = self.device.makeCommandQueue()
                self.context = CIContext(mtlDevice: self.device)
            }

            func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {}

            func draw(in view: MTKView) {
                guard let currentDrawable = view.currentDrawable,
                      let commandBuffer = commandQueue.makeCommandBuffer()
                else {
                    return
                }
                let dSize = view.drawableSize
                let drawImage = self.ciImage
                let destination = CIRenderDestination(width: Int(dSize.width),
                                                      height: Int(dSize.height),
                                                      pixelFormat: view.colorPixelFormat,
                                                      commandBuffer: commandBuffer,
                                                      mtlTextureProvider: { () -> MTLTexture in
                    return currentDrawable.texture
                })
                _ = try? self.context.startTask(toClear: destination)
                _ = try? self.context.startTask(toRender: drawImage,
                                                from: drawImage.extent,
                                                to: destination,
                                                at: CGPoint(x: (dSize.width - drawImage.extent.width) / 2, y: 0))
                commandBuffer.present(currentDrawable)
                commandBuffer.commit()
            }
        }
    }

    struct ShowCIImageView: View {
        let cii = CIImage.init(contentsOf: Bundle.main.url(forResource: "9-10", withExtension: "jpg")!)!

        var body: some View {
            CIImageDisplayView.init(ciImage: cii).frame(width: 500, height: 500).background(.red)
        }
    }

    struct ContentView: View {
        @State var showImage = false

        var body: some View {
            VStack {
                Image(systemName: "globe")
                    .imageScale(.large)
                    .foregroundStyle(.tint)
                Text("Hello, world!")
                ShowCIImageView()
                Button {
                    showImage = true
                } label: {
                    Text("showImage")
                }
            }
            .frame(width: 800, height: 800)
            .padding()
            .sheet(isPresented: $showImage) {
                ShowCIImageView()
            }
        }
    }
2
1
703
Oct ’24
MTKTextureLoader loading texture error on visionOS2.0
Hello everyone. I got a texture loading error on visionOS 2.0: Can't create texture(Error Domain=MTKTextureLoaderErrorDomain Code=0 "Pixel format(MTLPixelFormatInvalid) is not valid on this device" UserInfo={NSLocalizedDescription=Pixel format(MTLPixelFormatInvalid) is not valid on this device, MTKTextureLoaderErrorKey=Pixel format(MTLPixelFormatInvalid) is not valid on this device} But this texture loads correctly on visionOS 1.3. I don't know what changed between visionOS 1.3 and visionOS 2.0. The texture is a ktx file that stores a cubemap encoded in astc6x6hdr, and the ktx texture has the glInternalFormat GL_COMPRESSED_RGBA_ASTC_6x6. I wonder whether visionOS 2.0 no longer supports the astc6x6hdr cubemap format, or whether there is something wrong with my assets.
1
0
563
Oct ’24
Why is the speed of metal shading kernel so slow?
Hi, I have recently been writing Metal Shading Language (MSL) kernels to parallelize and accelerate our algorithms. I created a simple example to measure the speedup. Since our algorithm is written in Rust, I used metal-rs as the wrapper to execute the MSL kernels from the Rust side. In this example, I am adding two arrays element by element, and the kernel looks like:

    kernel void two_array_addition_2(
        constant uint* a [[buffer(0)]],
        constant uint* b [[buffer(1)]],
        device uint* c [[buffer(2)]],
        uint idx [[thread_position_in_grid]]
    ) {
        c[idx] = a[idx] + b[idx];
    }

In main.rs, there is a function called execute_kernel(), which has everything it needs to execute the MSL kernel (such as the commandEncoder, pipelineState, etc.).

    use core::mem;
    use metal::{Buffer, MTLSize};
    use objc::rc::autoreleasepool;
    use std::time::Instant;
    use two_array_addition::abstractions::state::MetalState;

    fn execute_kernel(
        name: &str,
        state: &MetalState,
        input_a: &Buffer,
        input_b: &Buffer,
        output_c: &Buffer,
    ) -> Vec<u32> {
        // assert!(input_a.len() == input_b.len() && input_a.len() == output_c.len());
        // let len = input_a.len() as u64;
        let len = input_a.length() as u64 / mem::size_of::<u32>() as u64;

        // 1. Init the MetalState
        //  - we inited it

        // 2. Set up Pipeline State
        let pipeline = state.setup_pipeline(name).unwrap();

        // 3. Allocate the buffers for A, B, and C
        //  - we allocated outside of this function

        let mut result: &[u32] = &[];

        autoreleasepool(|| {
            // 4. Create the command buffer & command encoder
            let (command_buffer, command_encoder) = state.setup_command(
                &pipeline,
                Some(&[(0, input_a), (1, input_b), (2, output_c)]),
            );

            // 5. command encoder dispatch the threadgroup size and num of threads per threadgroup
            let threadgroup_count = MTLSize::new((len + 256 - 1) / 256, 1, 1);
            let thread_per_threadgroup = MTLSize::new(256, 1, 1);
            // let grid_size = MTLSize::new(len, 1, 1);
            // let threadgroup_count = MTLSize::new(pipeline.max_total_threads_per_threadgroup(), 1, 1);
            command_encoder.dispatch_thread_groups(threadgroup_count, thread_per_threadgroup);
            command_encoder.end_encoding();
            command_buffer.commit();
            command_buffer.wait_until_completed();

            // 6. Copy the result back to the host
            let start = Instant::now();
            result = MetalState::retrieve_contents::<u32>(output_c);
            let duration = start.elapsed();
            println!("Duration for copying result back to host: {:?}", duration);
        });

        result.to_vec()
    }

The performance is kind of interesting to me. This is the result:

    $ cargo run -r
    This is expected to run for a while... please wait...
    Generating input arrays...
    Generating input arrays...
    Generating output array...
    Generating expected output...
    Duration for allocating buffers: 2.015258s
    Executing 1st kernel (1)...
    Duration for copying result back to host: 5.75µs
    Executing 1st kernel (2)...
    Duration for copying result back to host: 542ns
    Executing 2nd kernel (1)...
    Duration for copying result back to host: 1µs
    Executing 2nd kernel (2)...
    Duration for copying result back to host: 458ns
    Duration expected: 183.406167ms
    Duration for 1st kernel (1): 1.894994875s
    Duration for 1st kernel (2): 537.318208ms
    Duration for 2nd kernel (1): 501.33275ms
    Duration for 2nd kernel (2): 497.339916ms
    You have successfully run the kernels!

The MSL kernel runs slower than the "Duration expected" baseline, even though I reckon the dataset is quite big ($2^{29}$ elements). The first kernel execution also takes longer to launch. Is there any way to optimize the MSL in this case? And in general, when you redesign an algorithm for parallelism, what are the main concerns? The machine I am using is an M1 Pro with a 14-core GPU and 16 GB of memory. Does anyone have an idea or explanation for why this happens? Thank you.
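The dispatch arithmetic in step 5 can be checked on the host without a GPU. Below is a minimal sketch in plain Rust (no metal-rs calls; all three function names are hypothetical and not part of the post's code). It reproduces the post's round-up division, counts the surplus threads in the last threadgroup, and estimates how many bytes an element-wise kernel touches. The surplus matters because the kernel as posted has no length check, so any thread with idx beyond the array length would index out of bounds.

```rust
/// Number of threadgroups needed to cover `len` elements,
/// rounding up (the same formula the post uses in step 5).
fn threadgroup_count(len: u64, threads_per_group: u64) -> u64 {
    (len + threads_per_group - 1) / threads_per_group
}

/// Threads launched beyond `len` in the last threadgroup.
/// A non-zero value means the kernel needs an `idx < len` guard.
fn surplus_threads(len: u64, threads_per_group: u64) -> u64 {
    threadgroup_count(len, threads_per_group) * threads_per_group - len
}

/// Bytes read and written by an element-wise kernel that touches
/// `buffers` arrays of `len` elements, each `elem_size` bytes wide.
fn bytes_touched(len: u64, elem_size: u64, buffers: u64) -> u64 {
    len * elem_size * buffers
}
```

For the post's $2^{29}$ elements and 256 threads per threadgroup, this gives 2,097,152 threadgroups with no surplus threads, and 6,442,450,944 bytes (6 GiB) of traffic across a, b, and c. That volume suggests the kernel is dominated by memory movement rather than arithmetic, though the post's timings alone do not prove it.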
1
0
746
Sep ’24
Options to have MSAA in Tile-Based Deferred Renderer
Hi folks, I'm working on a Tile based Deferred renderer, similar to this Apple example. I'm wondering how to add MSAA to the renderer, and I see two choices:
1. Copy the single-sampled texture at the end of the GBuffer/Lighting render pass to a multi-sampled texture and resolve from that.
2. Make all render targets (GBuffer) multi-sampled and deal with sampling/resolving all intermediate textures as well as the final, combined texture.
Which is the proper approach, and are there any examples of how to implement it? Thanks!
0
0
686
Sep ’24
Metal addCompletedHandler causes crash with Swift 6 (iOS)
The following code runs fine when compiled with Swift 5, but crashes when compiled with Swift 6 (stack trace below). In the draw method, commenting out the addCompletedHandler line fixes the problem. I'm testing on iOS 18.0 and see the same behavior in both the simulator and on a device. What's going on here?

    import Metal
    import MetalKit
    import UIKit

    class ViewController: UIViewController {
        @IBOutlet var metalView: MTKView!

        private var commandQueue: MTLCommandQueue?

        override func viewDidLoad() {
            super.viewDidLoad()

            guard let device = MTLCreateSystemDefaultDevice() else {
                fatalError("expected a Metal device")
            }
            self.commandQueue = device.makeCommandQueue()

            metalView.device = device
            metalView.enableSetNeedsDisplay = true
            metalView.isPaused = true
            metalView.delegate = self
        }
    }

    extension ViewController: MTKViewDelegate {
        func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {}

        func draw(in view: MTKView) {
            guard let commandQueue,
                  let commandBuffer = commandQueue.makeCommandBuffer()
            else { return }
            commandBuffer.addCompletedHandler { _ in } // works with Swift 5, crashes with Swift 6
            commandBuffer.commit()
        }
    }

Here's the stack trace:

    Thread 10 Queue : connection Queue (serial)
    #0  0x000000010581c3f8 in _dispatch_assert_queue_fail ()
    #1  0x000000010581c384 in dispatch_assert_queue ()
    #2  0x00000002444c63e0 in swift_task_isCurrentExecutorImpl ()
    #3  0x0000000104d71ec4 in closure #1 in ViewController.draw(in:) ()
    #4  0x0000000104d71f58 in thunk for @escaping @callee_guaranteed (@guaranteed MTLCommandBuffer) -> () ()
    #5  0x0000000105ef1950 in __47-[CaptureMTLCommandBuffer _preCommitWithIndex:]_block_invoke_2 ()
    #6  0x00000001c50b35b0 in -[MTLToolsCommandBuffer invokeCompletedHandlers] ()
    #7  0x000000019e94d444 in MTLDispatchListApply ()
    #8  0x000000019e94f558 in -[_MTLCommandBuffer didCompleteWithStartTime:endTime:error:] ()
    #9  0x000000019e95352c in -[_MTLCommandQueue commandBufferDidComplete:startTime:completionTime:error:] ()
    #10 0x0000000226ef50b0 in handleMainConnectionReplies ()
    #11 0x00000001800c9690 in _xpc_connection_call_event_handler ()
    #12 0x00000001800cad90 in _xpc_connection_mach_event ()
    #13 0x000000010581a86c in _dispatch_client_callout4 ()
    #14 0x0000000105837950 in _dispatch_mach_msg_invoke ()
    #15 0x0000000105822870 in _dispatch_lane_serial_drain ()
    #16 0x0000000105838c10 in _dispatch_mach_invoke ()
    #17 0x0000000105822870 in _dispatch_lane_serial_drain ()
    #18 0x00000001058237b0 in _dispatch_lane_invoke ()
    #19 0x00000001058301f0 in _dispatch_root_queue_drain_deferred_wlh ()
    #20 0x000000010582f75c in _dispatch_workloop_worker_thread ()
    #21 0x00000001050abb74 in _pthread_wqthread ()
3
1
1.1k
Sep ’24
Running 120Hz with low latency on M1 Max
I am trying to get a little game prototype up and running with Metal, using the metal-cpp libraries, where I run everything natively at 120Hz with a coupled renderer and Vsync turned on, so that I have the absolute physical minimum input-to-photon latency possible.

    // Create the metal view
    SDL_MetalView metal_view = SDL_Metal_CreateView(window);
    CA::MetalLayer *swap_chain = (CA::MetalLayer *)SDL_Metal_GetLayer(metal_view);

    // Set up the Metal device
    MTL::Device *device = MTL::CreateSystemDefaultDevice();
    swap_chain->setDevice(device);
    swap_chain->setPixelFormat(MTL::PixelFormat::PixelFormatBGRA8Unorm);
    swap_chain->setDisplaySyncEnabled(true);
    swap_chain->setMaximumDrawableCount(2);

I am using SDL3 just for creating the window. Now when I go through my game / render loop, I stall for a long time on getting the next drawable, which is understandable: my app runs in about 2-3ms.

    m_CurrentContext->m_Drawable = m_SwapChain->nextDrawable();
    m_CurrentContext->m_CommandBuffer = m_CommandQueue->commandBuffer()->retain();

    char frame_label[32];
    snprintf(frame_label, sizeof(frame_label), "Frame %d", m_FrameIndex);
    m_CurrentContext->m_CommandBuffer->setLabel(NS::String::string(frame_label, NS::UTF8StringEncoding));

    m_CurrentContext->m_RenderPassDescriptor[ERenderPassTypeNormal] = MTL::RenderPassDescriptor::alloc()->init();
    MTL::RenderPassColorAttachmentDescriptor* cd = m_CurrentContext->m_RenderPassDescriptor[ERenderPassTypeNormal]->colorAttachments()->object(0);
    cd->setTexture(m_CurrentContext->m_Drawable->texture());
    cd->setLoadAction(MTL::LoadActionClear);
    cd->setClearColor(MTL::ClearColor( 0.53f, 0.81f, 0.98f, 1.0f ));
    cd->setStoreAction(MTL::StoreActionStore);

However, my ProMotion display does not reliably run at 120Hz when fullscreen and using the direct-to-display system. It seems to run faster when windowed and composited, which is the opposite of what I would expect. The Metal HUD says 120Hz, but the delay in getting the next drawable and what Instruments reports say otherwise. When I profile it, the game loop has completed and is sitting there waiting for the next drawable, but the screen does not want to complete in 8.33ms, so the whole thing slows down for no discernible reason.

Also, as a game developer, I find it very strange that the command buffer needs the drawable texture to be free before it is allowed to encode commands. Usually the command buffers and the swapping of front and back render buffers are not directly dependent on each other; you only need the render buffer texture free when you want to draw to it. I could give myself another drawable, but because I am completing in less than 3ms, all it would do is add another frame of latency.

I also looked at the FramePacing example, and its behaviour is even worse at achieving a high framerate with low latency: direct-to-display is always rejected for some reason. Is this just a flaw in the Metal API? Or am I missing something important? I hope someone can help; the behaviour of the display is baffling.
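The 8.33ms figure above is the refresh interval at 120Hz, and the gap between it and the 2-3ms of work is the time the loop spends waiting. A small sketch in plain Rust of that arithmetic (both function names are hypothetical; nothing here measures a real display):

```rust
/// Refresh interval in milliseconds for a display running at `hz`.
fn frame_interval_ms(hz: f64) -> f64 {
    1000.0 / hz
}

/// Time per frame left over (spent waiting on the next drawable)
/// when the app finishes its CPU and GPU work in `work_ms`.
fn idle_ms(hz: f64, work_ms: f64) -> f64 {
    (frame_interval_ms(hz) - work_ms).max(0.0)
}
```

With 3ms of work at 120Hz, roughly 5.33ms of each frame is expected waiting time. Waits noticeably longer than that would point to the display not refreshing at 120Hz, which matches what the post describes.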
7
0
1k
Sep ’24
Creating Metal Textures from kCVPixelFormatType_Lossless_420YpCbCr10PackedBiPlanarVideoRange ('&xv0') buffers
I'm testing on an iPhone 12 Pro, running iOS 17.5.1. Playing an HDR video with AVPlayer without explicitly specifying a pixel format (but specifying Metal compatibility as below) gives buffers with the pixel format kCVPixelFormatType_Lossless_420YpCbCr10PackedBiPlanarVideoRange (&xv0).

    _videoOutput = [[AVPlayerItemVideoOutput alloc] initWithPixelBufferAttributes:@{
        (NSString*)kCVPixelBufferMetalCompatibilityKey: @(YES)
    }];

I can't find an appropriate Metal format to use for these buffers to access the data in a shader. Using MTLPixelFormatR16Unorm for the Y plane and MTLPixelFormatRG16Unorm for the UV plane causes GPU command buffer aborts. My suspicion is that this compressed format isn't actually Metal compatible due to the lack of padding bytes between pixels. Explicitly selecting kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange (which uses 16 bits per pixel) for the AVPlayerItemVideoOutput works, but I'd ideally like to use the compressed formats if possible for the bandwidth savings. With SDR video, the pixel format is the lossless 8-bit one, and there are no problems binding those buffers to Metal textures. I'm just looking for confirmation that there's currently no appropriate Metal format for binding the packed 10-bit planes. And if that's the case, is it a bug that AVPlayerItemVideoOutput uses this format despite Metal compatibility being requested?
1
0
1.1k
Sep ’24
App using MetalKit creates many IOSurfaces in rapid succession, causing MTKView to freeze and app to hang
I've got an iOS app that is using MetalKit to display raw video frames coming in from a network source. I read the pixel data in the packets into a single MTLTexture rows at a time, which is drawn into an MTKView each time a frame has been completely sent over the network. The app works, but only for several seconds (a seemingly random duration), before the MTKView seemingly freezes (while packets are still being received).

Watching the debugger while my app was running revealed that the freezing of the display happened when there was a large spike in memory. Seeing the memory profile in Instruments revealed that the spike was related to a rapid creation of many IOSurfaces and IOAccelerators. Profiling CPU Usage shows that CAMetalLayerPrivateNextDrawableLocked is what happens during this rapid creation of surfaces. What does this function do?

Being a complete newbie to iOS programming as a whole, I wonder if this issue comes from a misuse of the MetalKit library. Below is the code that I'm using to render the video frames themselves:

    class MTKViewController: UIViewController, MTKViewDelegate {

        /// Metal texture to be drawn whenever the view controller is asked to render its view.

        private var metalView: MTKView!

        private var device = MTLCreateSystemDefaultDevice()
        private var commandQueue: MTLCommandQueue?
        private var renderPipelineState: MTLRenderPipelineState?
        private var texture: MTLTexture?
        private var networkListener: NetworkListener!
        private var textureGenerator: TextureGenerator!

        override public func loadView() {
            super.loadView()
            assert(device != nil, "Failed creating a default system Metal device. Please, make sure Metal is available on your hardware.")

            initializeMetalView()
            initializeRenderPipelineState()

            networkListener = NetworkListener()
            textureGenerator = TextureGenerator(width: streamWidth, height: streamHeight, bytesPerPixel: 4, rowsPerPacket: 8, device: device!)

            networkListener.start(port: NWEndpoint.Port(8080))
            networkListener.dataRecievedCallback = { data in
                self.textureGenerator.process(data: data)
            }

            textureGenerator.onTextureBuiltCallback = { texture in
                self.texture = texture
                self.draw(in: self.metalView)
            }

            commandQueue = device?.makeCommandQueue()
        }

        public func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
            /// need implement?
        }

        public func draw(in view: MTKView) {
            guard let texture = texture,
                  let _ = device
            else { return }

            let commandBuffer = commandQueue!.makeCommandBuffer()!

            guard let currentRenderPassDescriptor = metalView.currentRenderPassDescriptor,
                  let currentDrawable = metalView.currentDrawable,
                  let renderPipelineState = renderPipelineState
            else { return }

            currentRenderPassDescriptor.renderTargetWidth = streamWidth
            currentRenderPassDescriptor.renderTargetHeight = streamHeight

            let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: currentRenderPassDescriptor)!
            encoder.pushDebugGroup("RenderFrame")
            encoder.setRenderPipelineState(renderPipelineState)
            encoder.setFragmentTexture(texture, index: 0)
            encoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: 4, instanceCount: 1)
            encoder.popDebugGroup()
            encoder.endEncoding()

            commandBuffer.present(currentDrawable)
            commandBuffer.commit()
        }

        private func initializeMetalView() {
            metalView = MTKView(frame: CGRect(x: 0, y: 0, width: streamWidth, height: streamWidth), device: device)
            metalView.delegate = self
            metalView.framebufferOnly = true
            metalView.colorPixelFormat = .bgra8Unorm
            metalView.contentScaleFactor = UIScreen.main.scale
            metalView.autoresizingMask = [.flexibleWidth, .flexibleHeight]
            view.insertSubview(metalView, at: 0)
        }

        /// initializes render pipeline state with a default vertex function mapping texture to the view's frame and a simple fragment function returning texture pixel's value.
        private func initializeRenderPipelineState() {
            guard let device = device,
                  let library = device.makeDefaultLibrary()
            else { return }

            let pipelineDescriptor = MTLRenderPipelineDescriptor()
            pipelineDescriptor.rasterSampleCount = 1
            pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
            pipelineDescriptor.depthAttachmentPixelFormat = .invalid

            /// Vertex function to map the texture to the view controller's view
            pipelineDescriptor.vertexFunction = library.makeFunction(name: "mapTexture")
            /// Fragment function to display texture's pixels in the area bounded by vertices of `mapTexture` shader
            pipelineDescriptor.fragmentFunction = library.makeFunction(name: "displayTexture")

            do {
                renderPipelineState = try device.makeRenderPipelineState(descriptor: pipelineDescriptor)
            } catch {
                assertionFailure("Failed creating a render state pipeline. Can't render the texture without one.")
                return
            }
        }
    }

My question is simply: what gives?
1
0
1k
Sep ’24
Metal os_log not working
I wanted to try the new logging feature for Metal but could not get it to work. I modified the PerformingCalculationsOnAGPU example by adding os_log_default.log_debug("Hello thread: %d", index); to log the current thread id, but I never saw any messages, either in the console or in Xcode. I also added the -fmetal-enable-logging flag. I am running the Sequoia release candidate 15.0 (24A335) on M1 Max and Xcode 16.0 (16A242). What am I missing?
2
0
991
Sep ’24
Wrong hitTest results in iOS 17.2
We’re experiencing an issue with wrong SceneKit hit-testing results in iOS 17.2 compared with iOS 16.1 when using either the Metal or OpenGLES2 engine. The code below runs when tapping on a 3D model to place a SCNNode:

    // pointInScene: tapped point
    let hitResults = sceneView.hitTest(pointInScene, options: nil)
    return hitResults.first { $0.node.name?.compare("node_name") == .orderedSame }
3
0
1.1k
Sep ’24
Disable Automatic Color Space conversion on Vision Pro Metal Shader
I am trying to convert a ThreeJS project to Metal for the Vision Pro. The issue is that ThreeJS doesn't do any color space conversion: when I output a color in a fragment shader and then read it using the Digital Color Meter in sRGB mode, I get the same value I input in the fragment shader. This is not the case when using Metal. When setting up my LayerRenderer, I set the colorFormat to rgba16Unorm, since it is the only non-sRGB color format supported for Vision Pro apps. However, switching between bgra8Unorm_srgb and rgba16Unorm seems to have no effect. When I set up the renderPassDescriptor, I use the drawable's color texture, renderPassDescriptor.colorAttachments[0].texture = drawable.colorTextures[0], and when I print its pixel format, it appears to be the one passed from the configuration. If there is any way to disable this behavior, or to apply an inverse function so that I get the original value out of the shader, that would be appreciated.
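If the mismatch comes from an sRGB transfer function being applied somewhere between the shader and the display (an assumption; the post does not establish which conversion visionOS performs), the "inverse function" asked about would be the standard sRGB decode. A minimal sketch of both directions in plain Rust follows; the function names are hypothetical, and in a real project the same piecewise formula would be written in the fragment shader instead.

```rust
/// Standard sRGB encode: linear-light value in [0, 1] to sRGB-encoded value.
fn linear_to_srgb(l: f64) -> f64 {
    if l <= 0.0031308 {
        12.92 * l
    } else {
        1.055 * l.powf(1.0 / 2.4) - 0.055
    }
}

/// Standard sRGB decode: sRGB-encoded value in [0, 1] to linear light.
/// Applying this before output cancels a later sRGB encode.
fn srgb_to_linear(s: f64) -> f64 {
    if s <= 0.04045 {
        s / 12.92
    } else {
        ((s + 0.055) / 1.055).powf(2.4)
    }
}
```

The two functions are inverses of each other. For example, an encoded value of 0.5 decodes to about 0.214 in linear light, and encoding that result returns 0.5. Whether the decode belongs in the shader here depends on where the system applies its conversion, which would need to be confirmed on device.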
0
0
710
Aug ’24
CIImageProcessorKernel using Metal Compute Pipeline error
Greetings! I have been battling with a bit of a tough issue. My use case is running a pixelwise regression model on a 2D array of images using CIImageProcessorKernel and a custom Metal shader. It mostly works great, but if the regression calculation in Metal takes too long, an error occurs and the resulting output texture has strange artifacts. The specific error is:

    Error excuting command buffer = Error Domain=MTLCommandBufferErrorDomain Code=1 "Internal Error (0000000e:Internal Error)" UserInfo={NSLocalizedDescription=Internal Error (0000000e:Internal Error), NSUnderlyingError=0x60000320ca20 {Error Domain=IOGPUCommandQueueErrorDomain Code=14 "(null)"}} (com.apple.CoreImage)

There are multiple levels of concurrency: Swift Concurrency calling the Core Image code (which shouldn't have an impact) and of course the Metal command buffer. Is there any way to ensure the compute command encoder can complete its work? Here is the full implementation of my CIImageProcessorKernel subclass:

    class ParametricKernel: CIImageProcessorKernel {
        static let device = MTLCreateSystemDefaultDevice()!

        override class var outputFormat: CIFormat {
            return .BGRA8
        }

        override class func formatForInput(at input: Int32) -> CIFormat {
            return .BGRA8
        }

        override class func process(with inputs: [CIImageProcessorInput]?, arguments: [String : Any]?, output: CIImageProcessorOutput) throws {
            guard
                let commandBuffer = output.metalCommandBuffer,
                let images = arguments?["images"] as? [CGImage],
                let mask = arguments?["mask"] as? CGImage,
                let fillTime = arguments?["fillTime"] as? CGFloat,
                let betaLimit = arguments?["betaLimit"] as? CGFloat,
                let alphaLimit = arguments?["alphaLimit"] as? CGFloat,
                let errorScaling = arguments?["errorScaling"] as? CGFloat,
                let timing = arguments?["timing"],
                let TTRThreshold = arguments?["ttrthreshold"] as? CGFloat,
                let input = inputs?.first,
                let sourceTexture = input.metalTexture,
                let destinationTexture = output.metalTexture
            else {
                return
            }

            guard let kernelFunction = device.makeDefaultLibrary()?.makeFunction(name: "parametric") else {
                return
            }

            guard let commandEncoder = commandBuffer.makeComputeCommandEncoder() else {
                return
            }

            let imagesTexture = Texture.textureFromImages(images)
            let pipelineState = try device.makeComputePipelineState(function: kernelFunction)
            commandEncoder.setComputePipelineState(pipelineState)
            commandEncoder.setTexture(imagesTexture, index: 0)

            let maskTexture = Texture.textureFromImages([mask])
            commandEncoder.setTexture(maskTexture, index: 1)
            commandEncoder.setTexture(destinationTexture, index: 2)

            var errorScalingFloat = Float(errorScaling)
            let errorBuffer = device.makeBuffer(bytes: &errorScalingFloat, length: MemoryLayout<Float>.size, options: [])
            commandEncoder.setBuffer(errorBuffer, offset: 0, index: 1)

            // Other buffers omitted....

            let threadsPerThreadgroup = MTLSizeMake(16, 16, 1)
            let width = Int(ceil(Float(sourceTexture.width) / Float(threadsPerThreadgroup.width)))
            let height = Int(ceil(Float(sourceTexture.height) / Float(threadsPerThreadgroup.height)))
            let threadGroupCount = MTLSizeMake(width, height, 1)
            commandEncoder.dispatchThreadgroups(threadGroupCount, threadsPerThreadgroup: threadsPerThreadgroup)
            commandEncoder.endEncoding()
        }
    }
3
0
991
Aug ’24
How many warps can be run in parallel on a single shader core?
The Metal feature set tables specify that, beginning with the Apple4 family, the "Maximum threads per threadgroup" is 1024. Given that a single threadgroup is guaranteed to run on the same GPU shader core, a shader core of any recent Apple GPU must be capable of running at least 1024/32 = 32 warps in parallel. From the WWDC session "Scale compute workloads across Apple GPUs" (6:17): "For relatively complex kernels, 1K to 2K concurrent threads per shader core is considered a very good occupancy." The cited sentence suggests that a single shader core is capable of running at least 2K (I assume this means 2048) threads in parallel, so 2048/32 = 64 warps running in parallel. However, I am curious what the theoretical maximum number of warps running in parallel on a single shader core is (it sounds like it is more than 64). The WWDC session describes 2K as only "very good" occupancy. How many threads would be "the best possible" occupancy?
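The warp arithmetic in the post can be restated as a small helper. This is a sketch in plain Rust, assuming (as the post does) a SIMD-group or "warp" width of 32 threads; the function name is hypothetical and nothing here queries a real device.

```rust
/// Number of SIMD-groups ("warps") needed to hold `threads` threads,
/// rounding up, for a given SIMD-group width.
fn simd_groups(threads: u32, simd_width: u32) -> u32 {
    (threads + simd_width - 1) / simd_width
}
```

With a width of 32, a 1024-thread threadgroup occupies 32 SIMD-groups, and the 2K threads quoted from the WWDC session correspond to 64. The upper bound the post asks about is a hardware property that this arithmetic cannot reveal.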
1
0
848
Aug ’24
crash log shows abort() called inside Metal driver code?
We have been having a mysterious crash in our media server app that I've never seen before. After fixing a number of other rare thread safety crashes relating to Metal buffers, we now see this rare crash, which happens inside a Metal queue, com.Metal.CompletionQueueDispatch. I have no clue what is happening here. It looks to me like Metal is specifically calling abort() for some reason. All of the other threads in the crash log appear to be in a normal state.

    Thread 70 Crashed:: updateAllMedia Dispatch queue: com.Metal.CompletionQueueDispatch
    0   libsystem_kernel.dylib    0x1af572d38 __pthread_kill + 8
    1   libsystem_pthread.dylib   0x1af5a7ee0 pthread_kill + 288
    2   libsystem_c.dylib         0x1af4e2330 abort + 168
    3   libc++abi.dylib           0x1af562b18 abort_message + 132
    4   libc++abi.dylib           0x1af552a3c demangling_terminate_handler() + 312
    5   libobjc.A.dylib           0x1af4481c8 _objc_terminate() + 160
    6   libc++abi.dylib           0x1af561eb4 std::__terminate(void (*)()) + 20
    7   libc++abi.dylib           0x1af561e50 std::terminate() + 64
    8   libdispatch.dylib         0x1af3e4288 _dispatch_client_callout4 + 40
    9   libdispatch.dylib         0x1af40053c _dispatch_mach_msg_invoke + 464
    10  libdispatch.dylib         0x1af3eb784 _dispatch_lane_serial_drain + 376
    11  libdispatch.dylib         0x1af40125c _dispatch_mach_invoke + 456
    12  libdispatch.dylib         0x1af3eb784 _dispatch_lane_serial_drain + 376
    13  libdispatch.dylib         0x1af3ec438 _dispatch_lane_invoke + 444
    14  libdispatch.dylib         0x1af3eb784 _dispatch_lane_serial_drain + 376
    15  libdispatch.dylib         0x1af3ec404 _dispatch_lane_invoke + 392
    16  libdispatch.dylib         0x1af3f6c98 _dispatch_workloop_worker_thread + 648
    17  libsystem_pthread.dylib   0x1af5a4360 _pthread_wqthread + 288
    18  libsystem_pthread.dylib   0x1af5a3080 start_wqthread + 8

Note that the thread name "updateAllMedia" is a misnomer because this thread appears to be a general Metal dispatch queue. I wish there was a debugging option in Metal that called "setThreadName" to name its internal threads.
1
0
1.2k
Aug ’24
Setting CAMetalLayer's displaySyncEnabled to FALSE will cause load on InterruptEventSourceBridge thread in kernel_task
I have a test application that draws a large number of simple textured polygons (sprites). Setting CAMetalLayer's displaySyncEnabled to FALSE will cause load on InterruptEventSourceBridge thread in kernel_task. (In this case, nanosleep is used to adjust the amount of METAL commands per unit time so that they are approximately the same) This appears to be a drawing-related thread, but there is no overhead when displaySyncEnabled is TRUE. What are these differences? A specific application is the SDL test program, SDL/test/testsprite.c. https://github.com/libsdl-org/SDL/issues/10475
1
1
618
Aug ’24
Setting CAMetalLayer's displaySyncEnabled to FALSE will cause load on InterruptEventSourceBridge thread in kernel_task
I have a test application that draws a large number of simple textured polygons (sprites). Setting CAMetalLayer's displaySyncEnabled to FALSE will cause load on InterruptEventSourceBridge thread in kernel_task. In this case, nanosleep() is used to adjust the amount of METAL commands per unit time so that they are approximately the same. This appears to be a drawing-related thread, but there is no overhead when displaySyncEnabled is TRUE. What are these differences? A specific application is the SDL test program, SDL/test/testsprite.c. https://github.com/libsdl-org/SDL/issues/10475
1
0
472
Aug ’24
Cannot Display MTKView on a sheeted view on macOS15
I'm using Xcode 16 and SwiftUI on macOS 15. When I render an image through an MTKView, it displays correctly in a regular view. However, when the same view is presented through the .sheet modifier, the image is not displayed. Xcode reports no error.

import Foundation
import MetalKit
import SwiftUI

struct CIImageDisplayView: NSViewRepresentable {
    typealias NSViewType = MTKView
    var ciImage: CIImage

    init(ciImage: CIImage) {
        self.ciImage = ciImage
    }

    func makeNSView(context: Context) -> MTKView {
        let view = MTKView()
        view.delegate = context.coordinator
        view.preferredFramesPerSecond = 60
        view.enableSetNeedsDisplay = true
        view.isPaused = true
        view.framebufferOnly = false
        if let defaultDevice = MTLCreateSystemDefaultDevice() {
            view.device = defaultDevice
        }
        view.delegate = context.coordinator
        return view
    }

    func updateNSView(_ nsView: MTKView, context: Context) { }

    func makeCoordinator() -> RawDisplayRender {
        RawDisplayRender(ciImage: self.ciImage)
    }

    class RawDisplayRender: NSObject, MTKViewDelegate {
        // MARK: Metal resources
        var device: MTLDevice!
        var commandQueue: MTLCommandQueue!

        // MARK: Core Image resources
        var context: CIContext!
        var ciImage: CIImage

        init(ciImage: CIImage) {
            self.ciImage = ciImage
            self.device = MTLCreateSystemDefaultDevice()
            self.commandQueue = self.device.makeCommandQueue()
            self.context = CIContext(mtlDevice: self.device)
        }

        func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {}

        func draw(in view: MTKView) {
            guard let currentDrawable = view.currentDrawable,
                  let commandBuffer = commandQueue.makeCommandBuffer()
            else { return }
            let dSize = view.drawableSize
            let drawImage = self.ciImage
            let destination = CIRenderDestination(width: Int(dSize.width), height: Int(dSize.height), pixelFormat: view.colorPixelFormat, commandBuffer: commandBuffer, mtlTextureProvider: { () -> MTLTexture in
                return currentDrawable.texture
            })
            _ = try? self.context.startTask(toClear: destination)
            _ = try? self.context.startTask(toRender: drawImage, from: drawImage.extent, to: destination, at: CGPoint(x: (dSize.width - drawImage.extent.width) / 2, y: 0))
            commandBuffer.present(currentDrawable)
            commandBuffer.commit()
        }
    }
}

struct ShowCIImageView: View {
    let cii = CIImage.init(contentsOf: Bundle.main.url(forResource: "9-10", withExtension: "jpg")!)!
    var body: some View {
        CIImageDisplayView.init(ciImage: cii).frame(width: 500, height: 500).background(.red)
    }
}

struct ContentView: View {
    @State var showImage = false
    var body: some View {
        VStack {
            Image(systemName: "globe")
                .imageScale(.large)
                .foregroundStyle(.tint)
            Text("Hello, world!")
            ShowCIImageView()
            Button {
                showImage = true
            } label: {
                Text("showImage")
            }
        }
        .frame(width: 800, height: 800)
        .padding()
        .sheet(isPresented: $showImage) {
            ShowCIImageView()
        }
    }
}
Replies: 2 | Boosts: 1 | Views: 703 | Activity: Oct ’24
Drawing 3D lines with 2 Vertices
Hello, my project is simple: first I want to draw wireframe hexahedrons, tetrahedrons, and octahedrons. I have drawn a cube with Metal, but I couldn't find how to apply rotation, translation, and scale. I searched for help, but the examples I found are too complicated for me. Kind regards, VanceRegnet
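Rotation, translation, and scale are usually not drawing calls at all: they are 4x4 matrices that the app multiplies together and hands to the vertex shader. The sketch below is plain Rust with no Metal dependency, just to show the math. The `Mat4` alias and every function name are made up for illustration, and it assumes column-major storage and a right-handed rotation about the Y axis.

```rust
/// 4x4 matrix stored column-major: m[column][row] (illustrative alias).
type Mat4 = [[f32; 4]; 4];

fn identity() -> Mat4 {
    [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
}

/// Moves points by (tx, ty, tz).
fn translation(tx: f32, ty: f32, tz: f32) -> Mat4 {
    let mut m = identity();
    m[3] = [tx, ty, tz, 1.0];
    m
}

/// Scales points along each axis.
fn scale(sx: f32, sy: f32, sz: f32) -> Mat4 {
    let mut m = identity();
    m[0][0] = sx;
    m[1][1] = sy;
    m[2][2] = sz;
    m
}

/// Rotates points about the Y axis by `angle` radians (right-handed).
fn rotation_y(angle: f32) -> Mat4 {
    let (s, c) = angle.sin_cos();
    [
        [c, 0.0, -s, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [s, 0.0, c, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
}

/// Matrix product a * b: applying the result means "b first, then a".
fn mul(a: &Mat4, b: &Mat4) -> Mat4 {
    let mut out = [[0.0f32; 4]; 4];
    for col in 0..4 {
        for row in 0..4 {
            for k in 0..4 {
                out[col][row] += a[k][row] * b[col][k];
            }
        }
    }
    out
}

/// Transforms a homogeneous point (x, y, z, 1) by m.
fn transform(m: &Mat4, v: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for row in 0..4 {
        for col in 0..4 {
            out[row] += m[col][row] * v[col];
        }
    }
    out
}

/// Model matrix: scale first, then rotate, then translate.
fn model_matrix(t: [f32; 3], angle_y: f32, s: [f32; 3]) -> Mat4 {
    let rs = mul(&rotation_y(angle_y), &scale(s[0], s[1], s[2]));
    mul(&translation(t[0], t[1], t[2]), &rs)
}
```

With a scale of 2, a quarter turn about Y, and a translation of (1, 2, 3), the vertex (1, 0, 0) ends up at approximately (1, 2, 1). For a wireframe shape, the same matrix would be applied to both end points of every edge.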
Replies: 2 | Boosts: 0 | Views: 920 | Activity: Oct ’24
MTKTextureLoader texture loading error on visionOS 2.0
Hello everyone. I got a texture loading error on visionOS 2.0:

Can't create texture(Error Domain=MTKTextureLoaderErrorDomain Code=0 "Pixel format(MTLPixelFormatInvalid) is not valid on this device" UserInfo={NSLocalizedDescription=Pixel format(MTLPixelFormatInvalid) is not valid on this device, MTKTextureLoaderErrorKey=Pixel format(MTLPixelFormatInvalid) is not valid on this device}

The same texture loads correctly on visionOS 1.3, and I don't know what changed between visionOS 1.3 and visionOS 2.0. The texture is a KTX file that stores a cubemap encoded as astc6x6hdr, and its glInternalFormat is GL_COMPRESSED_RGBA_ASTC_6x6. I wonder whether visionOS 2.0 no longer supports the astc6x6hdr cubemap format, or whether there is something wrong with my assets.
Replies: 1 | Boosts: 0 | Views: 563 | Activity: Oct ’24
Why is my Metal shading kernel so slow?
Hi, I have recently been writing Metal Shading Language (MSL) kernels to parallelize and speed up our algorithms, and I created a simple example to measure the speedup. Since our algorithm is written in Rust, I used metal-rs as the wrapper to execute the MSL kernels from the Rust side. In this example I add two arrays element-wise, and the kernel looks like this:

kernel void two_array_addition_2(
    constant uint* a [[buffer(0)]],
    constant uint* b [[buffer(1)]],
    device uint* c [[buffer(2)]],
    uint idx [[thread_position_in_grid]]
) {
    c[idx] = a[idx] + b[idx];
}

In main.rs there is a function called execute_kernel(), which has everything it needs to execute the MSL kernel (the command encoder, pipeline state, etc.).

use core::mem;
use metal::{Buffer, MTLSize};
use objc::rc::autoreleasepool;
use std::time::Instant;
use two_array_addition::abstractions::state::MetalState;

fn execute_kernel(
    name: &str,
    state: &MetalState,
    input_a: &Buffer,
    input_b: &Buffer,
    output_c: &Buffer,
) -> Vec<u32> {
    // assert!(input_a.len() == input_b.len() && input_a.len() == output_c.len());
    // let len = input_a.len() as u64;
    let len = input_a.length() as u64 / mem::size_of::<u32>() as u64;

    // 1. Init the MetalState
    // - we inited it

    // 2. Set up Pipeline State
    let pipeline = state.setup_pipeline(name).unwrap();

    // 3. Allocate the buffers for A, B, and C
    // - we allocated outside of this function

    let mut result: &[u32] = &[];
    autoreleasepool(|| {
        // 4. Create the command buffer & command encoder
        let (command_buffer, command_encoder) = state.setup_command(
            &pipeline,
            Some(&[(0, input_a), (1, input_b), (2, output_c)]),
        );

        // 5. command encoder dispatch the threadgroup size and num of threads per threadgroup
        let threadgroup_count = MTLSize::new((len + 256 - 1) / 256, 1, 1);
        let thread_per_threadgroup = MTLSize::new(256, 1, 1);
        // let grid_size = MTLSize::new(len, 1, 1);
        // let threadgroup_count = MTLSize::new(pipeline.max_total_threads_per_threadgroup(), 1, 1);
        command_encoder.dispatch_thread_groups(threadgroup_count, thread_per_threadgroup);
        command_encoder.end_encoding();
        command_buffer.commit();
        command_buffer.wait_until_completed();

        // 6. Copy the result back to the host
        let start = Instant::now();
        result = MetalState::retrieve_contents::<u32>(output_c);
        let duration = start.elapsed();
        println!("Duration for copying result back to host: {:?}", duration);
    });
    result.to_vec()
}

The performance results are interesting to me. This is the output:

$ cargo run -r
This is expected to run for a while... please wait...
Generating input arrays...
Generating input arrays...
Generating output array...
Generating expected output...
Duration for allocating buffers: 2.015258s
Executing 1st kernel (1)...
Duration for copying result back to host: 5.75µs
Executing 1st kernel (2)...
Duration for copying result back to host: 542ns
Executing 2nd kernel (1)...
Duration for copying result back to host: 1µs
Executing 2nd kernel (2)...
Duration for copying result back to host: 458ns
Duration expected: 183.406167ms
Duration for 1st kernel (1): 1.894994875s
Duration for 1st kernel (2): 537.318208ms
Duration for 2nd kernel (1): 501.33275ms
Duration for 2nd kernel (2): 497.339916ms
You have successfully run the kernels!

The MSL kernel runs slower than the "expected" baseline, even though I consider the dataset quite large ($2^{29}$). The first kernel execution also takes noticeably longer to launch. Is there any way to optimize the MSL in this case? And in general, when designing an algorithm for parallel execution, what should the main concerns be? The machine I am using is an M1 Pro with a 14-core GPU and 16 GB of memory.

Does anyone have an idea or explanation for why this happens? Thank you.
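One quick sanity check for a kernel like this is to convert the timings into effective memory bandwidth, because an element-wise addition does almost no arithmetic per byte moved. The sketch below is plain Rust with illustrative function names. It assumes that $2^{29}$ is the element count, that each element is a 4-byte `u32`, that each of the three buffers is touched exactly once, and that the reported duration is dominated by the kernel itself; none of this is confirmed by the post.

```rust
/// Total bytes moved by c[i] = a[i] + b[i]: two reads and one write per element.
fn bytes_moved(elements: u64, bytes_per_element: u64) -> u64 {
    3 * elements * bytes_per_element
}

/// Effective bandwidth in GB/s (10^9 bytes per second) for a measured run time.
fn effective_bandwidth_gb_s(bytes: u64, seconds: f64) -> f64 {
    bytes as f64 / seconds / 1e9
}
```

Under those assumptions the three buffers add up to about 6.4 GB on a 16 GB machine, and the 497 ms run corresponds to roughly 13 GB/s. Comparing that figure with the machine's memory bandwidth would hint at whether the kernel or the data movement around it is the bottleneck.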
Replies: 1 | Boosts: 0 | Views: 746 | Activity: Sep ’24
Options to have MSAA in Tile-Based Deferred Renderer
Hi folks, I'm working on a tile-based deferred renderer, similar to this Apple example. I'm wondering how to add MSAA to the renderer, and I see two choices:
1. Copy the single-sampled texture at the end of the GBuffer/Lighting render pass to a multi-sampled texture and resolve from that.
2. Make all render targets (GBuffer) multi-sampled and deal with sampling/resolving all intermediate textures as well as the final, combined texture.
Which is the proper approach, and are there any examples of how to implement it? Thanks!
Replies: 0 | Boosts: 0 | Views: 686 | Activity: Sep ’24
Metal addCompletedHandler causes crash with Swift 6 (iOS)
The following code runs fine when compiled with Swift 5, but crashes when compiled with Swift 6 (stack trace below). In the draw method, commenting out the addCompletedHandler line fixes the problem. I'm testing on iOS 18.0 and see the same behavior in both the simulator and on a device. What's going on here?

import Metal
import MetalKit
import UIKit

class ViewController: UIViewController {
    @IBOutlet var metalView: MTKView!
    private var commandQueue: MTLCommandQueue?

    override func viewDidLoad() {
        super.viewDidLoad()
        guard let device = MTLCreateSystemDefaultDevice() else {
            fatalError("expected a Metal device")
        }
        self.commandQueue = device.makeCommandQueue()
        metalView.device = device
        metalView.enableSetNeedsDisplay = true
        metalView.isPaused = true
        metalView.delegate = self
    }
}

extension ViewController: MTKViewDelegate {
    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {}

    func draw(in view: MTKView) {
        guard let commandQueue, let commandBuffer = commandQueue.makeCommandBuffer() else { return }
        commandBuffer.addCompletedHandler { _ in } // works with Swift 5, crashes with Swift 6
        commandBuffer.commit()
    }
}

Here's the stack trace:

Thread 10 Queue : connection Queue (serial)
#0 0x000000010581c3f8 in _dispatch_assert_queue_fail ()
#1 0x000000010581c384 in dispatch_assert_queue ()
#2 0x00000002444c63e0 in swift_task_isCurrentExecutorImpl ()
#3 0x0000000104d71ec4 in closure #1 in ViewController.draw(in:) ()
#4 0x0000000104d71f58 in thunk for @escaping @callee_guaranteed (@guaranteed MTLCommandBuffer) -> () ()
#5 0x0000000105ef1950 in __47-[CaptureMTLCommandBuffer _preCommitWithIndex:]_block_invoke_2 ()
#6 0x00000001c50b35b0 in -[MTLToolsCommandBuffer invokeCompletedHandlers] ()
#7 0x000000019e94d444 in MTLDispatchListApply ()
#8 0x000000019e94f558 in -[_MTLCommandBuffer didCompleteWithStartTime:endTime:error:] ()
#9 0x000000019e95352c in -[_MTLCommandQueue commandBufferDidComplete:startTime:completionTime:error:] ()
#10 0x0000000226ef50b0 in handleMainConnectionReplies ()
#11 0x00000001800c9690 in _xpc_connection_call_event_handler ()
#12 0x00000001800cad90 in _xpc_connection_mach_event ()
#13 0x000000010581a86c in _dispatch_client_callout4 ()
#14 0x0000000105837950 in _dispatch_mach_msg_invoke ()
#15 0x0000000105822870 in _dispatch_lane_serial_drain ()
#16 0x0000000105838c10 in _dispatch_mach_invoke ()
#17 0x0000000105822870 in _dispatch_lane_serial_drain ()
#18 0x00000001058237b0 in _dispatch_lane_invoke ()
#19 0x00000001058301f0 in _dispatch_root_queue_drain_deferred_wlh ()
#20 0x000000010582f75c in _dispatch_workloop_worker_thread ()
#21 0x00000001050abb74 in _pthread_wqthread ()
Replies: 3 | Boosts: 1 | Views: 1.1k | Activity: Sep ’24
"Drawing fully immersive content using Metal" sample is not fully available in Swift
Hi there, I'm trying out the "Drawing fully immersive content using Metal" sample, but when I select Language: Swift, some of the sample code is still shown in Objective-C. Please check and update the documentation with Swift code. Thank you.
Replies: 1 | Boosts: 0 | Views: 665 | Activity: Sep ’24
Running 120Hz with low latency on M1 Max
I am trying to get a small game prototype up and running with Metal, using the metal-cpp libraries. Everything runs natively at 120 Hz with a coupled renderer and VSync turned on, so that I get the lowest physically possible input-to-photon latency.

// Create the metal view
SDL_MetalView metal_view = SDL_Metal_CreateView(window);
CA::MetalLayer *swap_chain = (CA::MetalLayer *)SDL_Metal_GetLayer(metal_view);

// Set up the Metal device
MTL::Device *device = MTL::CreateSystemDefaultDevice();
swap_chain->setDevice(device);
swap_chain->setPixelFormat(MTL::PixelFormat::PixelFormatBGRA8Unorm);
swap_chain->setDisplaySyncEnabled(true);
swap_chain->setMaximumDrawableCount(2);

I am using SDL3 just for creating the window. When I go through my game/render loop, I stall for a long time on getting the next drawable, which is understandable because my app finishes a frame in about 2-3 ms.

m_CurrentContext->m_Drawable = m_SwapChain->nextDrawable();
m_CurrentContext->m_CommandBuffer = m_CommandQueue->commandBuffer()->retain();
char frame_label[32];
snprintf(frame_label, sizeof(frame_label), "Frame %d", m_FrameIndex);
m_CurrentContext->m_CommandBuffer->setLabel(NS::String::string(frame_label, NS::UTF8StringEncoding));
m_CurrentContext->m_RenderPassDescriptor[ERenderPassTypeNormal] = MTL::RenderPassDescriptor::alloc()->init();
MTL::RenderPassColorAttachmentDescriptor* cd = m_CurrentContext->m_RenderPassDescriptor[ERenderPassTypeNormal]->colorAttachments()->object(0);
cd->setTexture(m_CurrentContext->m_Drawable->texture());
cd->setLoadAction(MTL::LoadActionClear);
cd->setClearColor(MTL::ClearColor( 0.53f, 0.81f, 0.98f, 1.0f ));
cd->setStoreAction(MTL::StoreActionStore);

However, my ProMotion display does not reliably run at 120 Hz when fullscreen and using the direct-to-display path. It seems to run faster when windowed and composited, which is the opposite of what I would expect. The Metal HUD says 120 Hz, but the delay in getting the next drawable and the data in Instruments say otherwise. When I profile it, the game loop has completed and is sitting there waiting for the next drawable, but the screen does not complete in 8.33 ms, so the whole thing slows down for no discernible reason.

Also, as a game developer I find it very strange that the command buffer needs the drawable texture to be free before it is allowed to encode commands. Usually the command buffers and the swapping of front and back render buffers are not directly dependent on each other, and you only need the render buffer texture to be free when you want to draw to it. I could give myself another drawable, but because I am completing in less than 3 ms, all it would do is add another frame of latency.

I also looked at the FramePacing example, and it is even worse at achieving a high frame rate with low latency: direct-to-display is always rejected for some reason. Is this just a flaw in the Metal API, or am I missing something important? I hope someone can help, because the behaviour of the display is baffling.
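The timing argument in this post can be made concrete with a little arithmetic. The sketch below is plain Rust with hypothetical function names. It assumes a fixed refresh rate and treats each extra queued drawable as one extra refresh interval of latency, which is the same simplification the post uses.

```rust
/// Time available per frame, in milliseconds, at a given refresh rate.
fn frame_budget_ms(refresh_hz: f64) -> f64 {
    1000.0 / refresh_hz
}

/// Time the loop spends idle each frame once its own work is done.
fn idle_ms(refresh_hz: f64, work_ms: f64) -> f64 {
    (frame_budget_ms(refresh_hz) - work_ms).max(0.0)
}

/// Extra latency from queuing `extra_drawables` frames ahead of the display.
fn added_latency_ms(refresh_hz: f64, extra_drawables: u32) -> f64 {
    frame_budget_ms(refresh_hz) * extra_drawables as f64
}
```

At 120 Hz the budget is about 8.33 ms, so a 3 ms frame leaves roughly 5.3 ms of waiting for the next drawable, and a third drawable would add about 8.33 ms of latency under this model.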
Replies: 7 | Boosts: 0 | Views: 1k | Activity: Sep ’24
Creating Metal Textures from kCVPixelFormatType_Lossless_420YpCbCr10PackedBiPlanarVideoRange ('&xv0') buffers
I'm testing on an iPhone 12 Pro, running iOS 17.5.1. Playing an HDR video with AVPlayer without explicitly specifying a pixel format (but specifying Metal compatibility as below) gives buffers with the pixel format kCVPixelFormatType_Lossless_420YpCbCr10PackedBiPlanarVideoRange (&xv0).

_videoOutput = [[AVPlayerItemVideoOutput alloc] initWithPixelBufferAttributes:@{ (NSString*)kCVPixelBufferMetalCompatibilityKey: @(YES) }];

I can't find an appropriate Metal format to use for these buffers to access the data in a shader. Using MTLPixelFormatR16Unorm for the Y plane and MTLPixelFormatRG16Unorm for the UV plane causes GPU command buffer aborts. My suspicion is that this compressed format isn't actually Metal compatible due to the lack of padding bits between samples. Explicitly selecting kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange (which uses 16 bits per sample) for the AVPlayerItemVideoOutput works, but I'd ideally like to use the compressed formats if possible for the bandwidth savings. With SDR video, the pixel format is the lossless 8-bit one, and there are no problems binding those buffers to Metal textures. I'm just looking for confirmation that there's currently no appropriate Metal format for binding the packed 10-bit planes. And if that's the case, is it a bug that AVPlayerItemVideoOutput uses this format despite the request for Metal compatibility?
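The bandwidth difference between the two layouts can be estimated from the sample counts alone. The sketch below is plain Rust with hypothetical names. It assumes 4:2:0 subsampling with even dimensions and tight 10-bit packing with no row padding, and it ignores whatever the lossless compression saves on top. None of those layout details is confirmed by the post.

```rust
/// Samples in a 4:2:0 frame: full-resolution luma plus two quarter-resolution chroma planes.
fn samples_420(width: u64, height: u64) -> u64 {
    width * height + 2 * (width / 2) * (height / 2)
}

/// Frame size when every 10-bit sample sits in its own 16-bit container.
fn bytes_16bit_containers(width: u64, height: u64) -> u64 {
    samples_420(width, height) * 2
}

/// Frame size when 10-bit samples are packed back to back (rounded up to whole bytes).
fn bytes_packed_10bit(width: u64, height: u64) -> u64 {
    (samples_420(width, height) * 10 + 7) / 8
}
```

For a 3840x2160 frame this gives 24,883,200 bytes with 16-bit containers versus 15,552,000 bytes packed, a 37.5% reduction before any compression.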
Replies: 1 | Boosts: 0 | Views: 1.1k | Activity: Sep ’24
App using MetalKit creates many IOSurfaces in rapid succession, causing MTKView to freeze and app to hang
I've got an iOS app that is using MetalKit to display raw video frames coming in from a network source. I read the pixel data in the packets into a single MTLTexture rows at a time, which is drawn into an MTKView each time a frame has been completely sent over the network. The app works, but only for several seconds (a seemingly random duration), before the MTKView seemingly freezes (while packets are still being received).

Watching the debugger while my app was running revealed that the freezing of the display happened when there was a large spike in memory. Seeing the memory profile in Instruments revealed that the spike was related to a rapid creation of many IOSurfaces and IOAccelerators. Profiling CPU Usage shows that CAMetalLayerPrivateNextDrawableLocked is what happens during this rapid creation of surfaces. What does this function do?

Being a complete newbie to iOS programming as a whole, I wonder if this issue comes from a misuse of the MetalKit library. Below is the code that I'm using to render the video frames themselves:

class MTKViewController: UIViewController, MTKViewDelegate {
    /// Metal texture to be drawn whenever the view controller is asked to render its view.
    private var metalView: MTKView!
    private var device = MTLCreateSystemDefaultDevice()
    private var commandQueue: MTLCommandQueue?
    private var renderPipelineState: MTLRenderPipelineState?
    private var texture: MTLTexture?
    private var networkListener: NetworkListener!
    private var textureGenerator: TextureGenerator!

    override public func loadView() {
        super.loadView()
        assert(device != nil, "Failed creating a default system Metal device. Please, make sure Metal is available on your hardware.")
        initializeMetalView()
        initializeRenderPipelineState()
        networkListener = NetworkListener()
        textureGenerator = TextureGenerator(width: streamWidth, height: streamHeight, bytesPerPixel: 4, rowsPerPacket: 8, device: device!)
        networkListener.start(port: NWEndpoint.Port(8080))
        networkListener.dataRecievedCallback = { data in
            self.textureGenerator.process(data: data)
        }
        textureGenerator.onTextureBuiltCallback = { texture in
            self.texture = texture
            self.draw(in: self.metalView)
        }
        commandQueue = device?.makeCommandQueue()
    }

    public func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
        /// need implement?
    }

    public func draw(in view: MTKView) {
        guard let texture = texture, let _ = device else { return }
        let commandBuffer = commandQueue!.makeCommandBuffer()!
        guard let currentRenderPassDescriptor = metalView.currentRenderPassDescriptor,
              let currentDrawable = metalView.currentDrawable,
              let renderPipelineState = renderPipelineState
        else { return }
        currentRenderPassDescriptor.renderTargetWidth = streamWidth
        currentRenderPassDescriptor.renderTargetHeight = streamHeight
        let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: currentRenderPassDescriptor)!
        encoder.pushDebugGroup("RenderFrame")
        encoder.setRenderPipelineState(renderPipelineState)
        encoder.setFragmentTexture(texture, index: 0)
        encoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: 4, instanceCount: 1)
        encoder.popDebugGroup()
        encoder.endEncoding()
        commandBuffer.present(currentDrawable)
        commandBuffer.commit()
    }

    private func initializeMetalView() {
        metalView = MTKView(frame: CGRect(x: 0, y: 0, width: streamWidth, height: streamWidth), device: device)
        metalView.delegate = self
        metalView.framebufferOnly = true
        metalView.colorPixelFormat = .bgra8Unorm
        metalView.contentScaleFactor = UIScreen.main.scale
        metalView.autoresizingMask = [.flexibleWidth, .flexibleHeight]
        view.insertSubview(metalView, at: 0)
    }

    /// initializes render pipeline state with a default vertex function mapping texture to the view's frame and a simple fragment function returning texture pixel's value.
    private func initializeRenderPipelineState() {
        guard let device = device, let library = device.makeDefaultLibrary() else { return }
        let pipelineDescriptor = MTLRenderPipelineDescriptor()
        pipelineDescriptor.rasterSampleCount = 1
        pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
        pipelineDescriptor.depthAttachmentPixelFormat = .invalid
        /// Vertex function to map the texture to the view controller's view
        pipelineDescriptor.vertexFunction = library.makeFunction(name: "mapTexture")
        /// Fragment function to display texture's pixels in the area bounded by vertices of `mapTexture` shader
        pipelineDescriptor.fragmentFunction = library.makeFunction(name: "displayTexture")
        do {
            renderPipelineState = try device.makeRenderPipelineState(descriptor: pipelineDescriptor)
        } catch {
            assertionFailure("Failed creating a render state pipeline. Can't render the texture without one.")
            return
        }
    }
}

My question is simply: what gives?
Replies: 1 | Boosts: 0 | Views: 1k | Activity: Sep ’24
Metal os_log not working
I wanted to try the new logging feature for Metal but could not get it to work. I modified the PerformingCalculationsOnAGPU example by adding os_log_default.log_debug("Hello thread: %d", index); to log the current thread id, but I never saw any messages, either in the console or in Xcode. I also added the -fmetal-enable-logging flag. I am running the Sequoia release candidate 15.0 (24A335) on an M1 Max with Xcode 16.0 (16A242). What am I missing?
Replies: 2 | Boosts: 0 | Views: 991 | Activity: Sep ’24
Wrong hitTest results in iOS 17.2
We’re experiencing an issue with wrong SceneKit hit testing results in iOS 17.2 compared with iOS 16.1, when using either the Metal or the OpenGLES2 engine. This is the code that runs when tapping on a 3D model to place an SCNNode:

// pointInScene: tapped point
let hitResults = sceneView.hitTest(pointInScene, options: nil)
return hitResults.first { $0.node.name?.compare("node_name") == .orderedSame }
Replies: 3 | Boosts: 0 | Views: 1.1k | Activity: Sep ’24
Disable Automatic Color Space conversion on Vision Pro Metal Shader
I am trying to convert a ThreeJS project to Metal for the Vision Pro. The issue is that ThreeJS doesn't do any color space conversion: when I output a color in a fragment shader and then read it with Digital Color Meter in sRGB mode, I get the same value I put into the fragment shader. This is not the case when using Metal. When setting up my LayerRenderer I set the colorFormat to rgba16Unorm, since it is the only non-sRGB color format supported in Vision Pro apps. However, switching between bgra8Unorm_srgb and rgba16Unorm seems to have no effect. When I set up the renderPassDescriptor I use the drawable's colorTexture, renderPassDescriptor.colorAttachments[0].texture = drawable.colorTextures[0], and when I print its pixel format it appears to be the one passed in from the configuration. If there is any way to disable this behavior, or to apply an inverse function so that I get the original value out of the shader, that would be appreciated.
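If the system applies the standard sRGB transfer function to the shader output, the original value could in principle be recovered by pre-applying the opposite curve in the fragment shader. The sketch below shows both directions in plain Rust so the arithmetic can be checked on the CPU. Whether the visionOS display path applies exactly this curve is an assumption, and the function names are illustrative.

```rust
/// Encodes a linear-light value in [0, 1] with the sRGB transfer function.
fn linear_to_srgb(c: f32) -> f32 {
    if c <= 0.0031308 {
        12.92 * c
    } else {
        1.055 * c.powf(1.0 / 2.4) - 0.055
    }
}

/// Decodes an sRGB-encoded value in [0, 1] back to linear light.
fn srgb_to_linear(s: f32) -> f32 {
    if s <= 0.04045 {
        s / 12.92
    } else {
        ((s + 0.055) / 1.055).powf(2.4)
    }
}
```

If the display path encodes with `linear_to_srgb`, writing `srgb_to_linear(value)` from the shader would cancel it, and the reverse holds if the path decodes instead. The same two branches translate directly to a per-channel helper in a shader.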
Replies: 0 | Boosts: 0 | Views: 710 | Activity: Aug ’24
CIImageProcessorKernel using Metal Compute Pipeline error
Greetings! I have been battling with a bit of a tough issue. My use case is running a pixelwise regression model on a 2D array of images using CIImageProcessorKernel and a custom Metal shader. It mostly works great, but if the regression calculation in Metal takes too long, an error occurs and the resulting output texture has strange artifacts. The specific error is:

Error excuting command buffer = Error Domain=MTLCommandBufferErrorDomain Code=1 "Internal Error (0000000e:Internal Error)" UserInfo={NSLocalizedDescription=Internal Error (0000000e:Internal Error), NSUnderlyingError=0x60000320ca20 {Error Domain=IOGPUCommandQueueErrorDomain Code=14 "(null)"}} (com.apple.CoreImage)

There are multiple levels of concurrency: Swift Concurrency calling the Core Image code (which shouldn't have an impact) and of course the Metal command buffer. Is there any way to ensure the compute command encoder can complete its work? Here is the full implementation of my CIImageProcessorKernel subclass:

class ParametricKernel: CIImageProcessorKernel {
    static let device = MTLCreateSystemDefaultDevice()!

    override class var outputFormat: CIFormat { return .BGRA8 }

    override class func formatForInput(at input: Int32) -> CIFormat { return .BGRA8 }

    override class func process(with inputs: [CIImageProcessorInput]?, arguments: [String : Any]?, output: CIImageProcessorOutput) throws {
        guard let commandBuffer = output.metalCommandBuffer,
              let images = arguments?["images"] as? [CGImage],
              let mask = arguments?["mask"] as? CGImage,
              let fillTime = arguments?["fillTime"] as? CGFloat,
              let betaLimit = arguments?["betaLimit"] as? CGFloat,
              let alphaLimit = arguments?["alphaLimit"] as? CGFloat,
              let errorScaling = arguments?["errorScaling"] as? CGFloat,
              let timing = arguments?["timing"],
              let TTRThreshold = arguments?["ttrthreshold"] as? CGFloat,
              let input = inputs?.first,
              let sourceTexture = input.metalTexture,
              let destinationTexture = output.metalTexture
        else { return }
        guard let kernelFunction = device.makeDefaultLibrary()?.makeFunction(name: "parametric") else { return }
        guard let commandEncoder = commandBuffer.makeComputeCommandEncoder() else { return }

        let imagesTexture = Texture.textureFromImages(images)
        let pipelineState = try device.makeComputePipelineState(function: kernelFunction)
        commandEncoder.setComputePipelineState(pipelineState)
        commandEncoder.setTexture(imagesTexture, index: 0)
        let maskTexture = Texture.textureFromImages([mask])
        commandEncoder.setTexture(maskTexture, index: 1)
        commandEncoder.setTexture(destinationTexture, index: 2)

        var errorScalingFloat = Float(errorScaling)
        let errorBuffer = device.makeBuffer(bytes: &errorScalingFloat, length: MemoryLayout<Float>.size, options: [])
        commandEncoder.setBuffer(errorBuffer, offset: 0, index: 1)
        // Other buffers omitted....

        let threadsPerThreadgroup = MTLSizeMake(16, 16, 1)
        let width = Int(ceil(Float(sourceTexture.width) / Float(threadsPerThreadgroup.width)))
        let height = Int(ceil(Float(sourceTexture.height) / Float(threadsPerThreadgroup.height)))
        let threadGroupCount = MTLSizeMake(width, height, 1)
        commandEncoder.dispatchThreadgroups(threadGroupCount, threadsPerThreadgroup: threadsPerThreadgroup)
        commandEncoder.endEncoding()
    }
}
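The threadgroup arithmetic at the end of `process` rounds the grid up to whole 16x16 groups, so some threads fall outside the texture unless its dimensions are multiples of 16. The sketch below is plain Rust with hypothetical names and example sizes that are not from the post. It reproduces the rounding with integer math and counts the surplus threads that a kernel would need to bounds-check.

```rust
/// Threadgroups needed to cover `extent` pixels with groups of `group` threads.
fn groups_for(extent: u64, group: u64) -> u64 {
    (extent + group - 1) / group // integer ceiling division
}

/// (groups_x, groups_y) for a 2D dispatch over a width x height texture.
fn grid_2d(width: u64, height: u64, group_w: u64, group_h: u64) -> (u64, u64) {
    (groups_for(width, group_w), groups_for(height, group_h))
}

/// Threads launched beyond the texture bounds.
fn surplus_threads(width: u64, height: u64, group_w: u64, group_h: u64) -> u64 {
    let (gx, gy) = grid_2d(width, height, group_w, group_h);
    gx * group_w * gy * group_h - width * height
}
```

For a hypothetical 1920x1080 texture this yields a 120x68 grid and 15,360 threads past the bottom edge. Whether the `parametric` kernel guards against those is not visible in the post.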
Replies: 3 | Boosts: 0 | Views: 991 | Activity: Aug ’24
How many warps can be run in parallel on a single shader core?
The Metal feature set tables specify that, beginning with the Apple4 family, the "Maximum threads per threadgroup" is 1024. Given that a single threadgroup is guaranteed to run on the same GPU shader core, a shader core of any recent Apple GPU must be capable of running at least 1024/32 = 32 warps in parallel. From the WWDC session "Scale compute workloads across Apple GPUs" (6:17): "For relatively complex kernels, 1K to 2K concurrent threads per shader core is considered a very good occupancy." The cited sentence suggests that a single shader core is capable of running at least 2K (I assume this means 2048) threads in parallel, so 2048/32 = 64 warps. However, I am curious what the theoretical maximum number of warps running in parallel on a single shader core is (it sounds like it is more than 64). The WWDC session describes 2K as only "very good" occupancy. How many threads would be the best possible occupancy?
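The arithmetic in the question can be written out as below. This is a plain Rust sketch with illustrative names. The SIMD-group width of 32 is the figure the post uses and is taken here as an assumption, and nothing in the sketch says what the hardware maximum per core actually is.

```rust
/// SIMD-groups ("warps") needed to hold `threads` threads, rounding up.
fn simd_groups(threads: u32, simd_width: u32) -> u32 {
    (threads + simd_width - 1) / simd_width
}

/// Occupancy as a fraction of a hypothetical per-core thread limit.
fn occupancy(resident_threads: u32, max_threads_per_core: u32) -> f64 {
    resident_threads as f64 / max_threads_per_core as f64
}
```

With a width of 32, a full 1024-thread threadgroup needs 32 SIMD-groups and 2048 resident threads need 64. The `max_threads_per_core` argument stays a placeholder until the real limit is known.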
Replies: 1 | Boosts: 0 | Views: 848 | Activity: Aug ’24
What does kIOGPUCommandBufferCallbackErrorSubmissionsIgnored mean?
Our app encountered the following error: Execution of the command buffer was aborted due to an error during execution. Ignored (for causing prior/excessive GPU errors) (00000004:kIOGPUCommandBufferCallbackErrorSubmissionsIgnored)
Replies: 1 | Boosts: 0 | Views: 944 | Activity: Aug ’24
crash log shows abort() called inside Metal driver code?
We have been having a mysterious crash in our media server app that I've never seen before. After fixing a number of other rare thread safety crashes relating to Metal buffers, this rare crash happens inside a Metal com.Metal.CompletionQueueDispatch? I have no clue what is happening here. It looks to me like Metal is specifically calling abort() for some reason. All of the other threads in the crash log appear to be in a normal state.

Thread 70 Crashed:: updateAllMedia Dispatch queue: com.Metal.CompletionQueueDispatch
0 libsystem_kernel.dylib 0x1af572d38 __pthread_kill + 8
1 libsystem_pthread.dylib 0x1af5a7ee0 pthread_kill + 288
2 libsystem_c.dylib 0x1af4e2330 abort + 168
3 libc++abi.dylib 0x1af562b18 abort_message + 132
4 libc++abi.dylib 0x1af552a3c demangling_terminate_handler() + 312
5 libobjc.A.dylib 0x1af4481c8 _objc_terminate() + 160
6 libc++abi.dylib 0x1af561eb4 std::__terminate(void (*)()) + 20
7 libc++abi.dylib 0x1af561e50 std::terminate() + 64
8 libdispatch.dylib 0x1af3e4288 _dispatch_client_callout4 + 40
9 libdispatch.dylib 0x1af40053c _dispatch_mach_msg_invoke + 464
10 libdispatch.dylib 0x1af3eb784 _dispatch_lane_serial_drain + 376
11 libdispatch.dylib 0x1af40125c _dispatch_mach_invoke + 456
12 libdispatch.dylib 0x1af3eb784 _dispatch_lane_serial_drain + 376
13 libdispatch.dylib 0x1af3ec438 _dispatch_lane_invoke + 444
14 libdispatch.dylib 0x1af3eb784 _dispatch_lane_serial_drain + 376
15 libdispatch.dylib 0x1af3ec404 _dispatch_lane_invoke + 392
16 libdispatch.dylib 0x1af3f6c98 _dispatch_workloop_worker_thread + 648
17 libsystem_pthread.dylib 0x1af5a4360 _pthread_wqthread + 288
18 libsystem_pthread.dylib 0x1af5a3080 start_wqthread + 8

Note that the thread name "updateAllMedia" is a misnomer because this thread appears to be a general Metal dispatch queue. I wish there was a debugging option in Metal that called "setThreadName" to name its internal threads.
Replies: 1 | Boosts: 0 | Views: 1.2k | Activity: Aug ’24
visionOS 2 beta 5, Unity TextMesh shader errors
visionOS 2 beta 5, Unity text shader errors
Replies: 0 | Boosts: 0 | Views: 386 | Activity: Aug ’24
Setting CAMetalLayer's displaySyncEnabled to FALSE will cause load on InterruptEventSourceBridge thread in kernel_task
I have a test application that draws a large number of simple textured polygons (sprites). Setting CAMetalLayer's displaySyncEnabled to FALSE causes load on the InterruptEventSourceBridge thread in kernel_task. (In this case, nanosleep is used to adjust the number of Metal commands per unit time so that it is approximately the same in both configurations.) This appears to be a drawing-related thread, but there is no such overhead when displaySyncEnabled is TRUE. What causes this difference? A specific application is the SDL test program, SDL/test/testsprite.c. https://github.com/libsdl-org/SDL/issues/10475
Replies: 1 | Boosts: 1 | Views: 618 | Activity: Aug ’24
Setting CAMetalLayer's displaySyncEnabled to FALSE will cause load on InterruptEventSourceBridge thread in kernel_task
I have a test application that draws a large number of simple textured polygons (sprites). Setting CAMetalLayer's displaySyncEnabled to FALSE causes load on the InterruptEventSourceBridge thread in kernel_task. In this case, nanosleep() is used to adjust the number of Metal commands per unit time so that it is approximately the same in both configurations. This appears to be a drawing-related thread, but there is no such overhead when displaySyncEnabled is TRUE. What causes this difference? A specific application is the SDL test program, SDL/test/testsprite.c. https://github.com/libsdl-org/SDL/issues/10475
Replies: 1 | Boosts: 0 | Views: 472 | Activity: Aug ’24