Hello all, I saw this interesting VisionOS app: https://apps.apple.com/us/app/splitscreen-multi-display/id6478007837
I was wondering if there was any documentation on the Swift APIs that were used to create this app.
Discuss spatial computing on Apple platforms and how to design and build an entirely new universe of apps and games for Apple Vision Pro.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
I am trying the image tracking of ARKit on VisionPro, but there seems to be some problem when adding reference image.
Here is my code:
let images = ReferenceImage.loadReferenceImages(inGroupNamed: "photos")
print("Images: \(images)")
try await appState!.arkitSession.run([imageTracking])
It can successfully print those images, however sometimes it will print the error message like this:
ARImageTrackingRemoteService: Adding reference image <ARReferenceImage: 0x3032399e0 name="chair" physicalSize=(0.070, 0.093)> failed.
When this error message is printed, the corresponding image can not be tracked.
I do not understand why this will happen, because sometimes the image can be successfully added, but other time not, even for the same image. It makes my app not stable.
Besides, there are some other error messages, and I do not know whether it is related:
ARPredictorRemoteService <0x1042154a0>: Query queue is not running.
Execution of the command buffer was aborted due to an error during execution. Insufficient Permission (to submit GPU work from background) (00000006:kIOGPUCommandBufferCallbackErrorBackgroundExecutionNotPermitted)
In the WWDC session titled "Deep dive into volumes and immersive spaces", the developers discussed adding a Spatial Tracking Session and an Anchor Entity to detect the floor. They then glossed over some important details. They added a spatial tap gesture to let the user place content relative to the floor anchor, but they left a lot of information.
.gesture(
SpatialTapGesture(
coordinateSpace: .immersiveSpace
)
.targetedToAnyEntity()
.onEnded { value in
handleTapOnFloor(value: value)
}
)
My understanding is that an entity has to have input and collision components for gestures like this to work. How can we add a collision to an AnchorEntity when we don't know its size or shape?
I've been trying for days to understand what is happening here and I just don't get it. It is even more frustrating that the example project that Apple released does not contain any of these features.
I would like to be able
Detect the floor plane
Get the position/transform of the floor plane
Add a collider to the floor plane
Enable collisions and physics on the floor plane
Enable gestures on the floor plane
It seems to me that the Anchor Entity is placed as an entirely arbitrary position. It has absolutely no relationship to the rectangle with the floor label that I can see in the Xcode visualization. It is just a point, not a plane or rect that I can use.
I've tried manually calculating the collision shape after the anchor is detected, but nothing that I have tried works. I can't tap on the floor with gestures. I can't drop entities onto the floor. I can't seem to do ANYTHING at all with this floor anchor other than place entity at the totally arbitrary location somewhere on the floor.
Is there anyway at all with Spatial Tracking Session and Anchor Entity to get the actual plane that was detected?
struct FloorExample: View {
@State var trackingSession: SpatialTrackingSession = SpatialTrackingSession()
@State var subject: Entity?
@State var floor: AnchorEntity?
var body: some View {
RealityView { content, attachments in
let session = SpatialTrackingSession()
let configuration = SpatialTrackingSession.Configuration(tracking: [.plane])
_ = await session.run(configuration)
self.trackingSession = session
let floorAnchor = AnchorEntity(.plane(.horizontal, classification: .floor, minimumBounds: SIMD2(x: 0.1, y: 0.1)))
floorAnchor.anchoring.physicsSimulation = .none
floorAnchor.name = "FloorAnchorEntity"
floorAnchor.components.set(InputTargetComponent())
floorAnchor.components.set(CollisionComponent(shapes: .init()))
content.add(floorAnchor)
self.floor = floorAnchor
// This is just here to let me see where visinoOS decided to "place" the floor anchor.
let floorPlaced = ModelEntity(
mesh: .generateSphere(radius: 0.1),
materials: [SimpleMaterial(color: .black, isMetallic: false)])
floorAnchor.addChild(floorPlaced)
if let scene = try? await Entity(named: "AnchorLabsFloor", in: realityKitContentBundle) {
content.add(scene)
if let subject = scene.findEntity(named: "StepSphereRed") {
self.subject = subject
}
// I can see when the anchor is added
_ = content.subscribe(to: SceneEvents.AnchoredStateChanged.self) { event in
event.anchor.generateCollisionShapes(recursive: true) // this doesn't seem to work
print("**anchor changed** \(event)")
print("**anchor** \(event.anchor)")
}
// place the reset button near the user
if let panel = attachments.entity(for: "Panel") {
panel.position = [0, 1, -0.5]
content.add(panel)
}
}
} update: { content, attachments in
} attachments: {
Attachment(id: "Panel", {
Button(action: {
print("**button pressed**")
if let subject = self.subject {
subject.position = [-0.5, 1.5, -1.5]
// Remove the physics body and assign a new one - hack to remove momentum
if let physics = subject.components[PhysicsBodyComponent.self] {
subject.components.remove(PhysicsBodyComponent.self)
subject.components.set(physics)
}
}
}, label: {
Text("Reset Sphere")
})
})
}
}
}
I need to loop my videoMaterial and I don't know how to make it happen in my code.
I have included an image of my videoMaterial code.
Any help making this happen with be greatly appreciated.
Thank you,
Christopher
Topic:
Spatial Computing
SubTopic:
ARKit
Since only the user can take a screenshot using the Apple Vision Pro's top buttons, the only workaround available to an immersive app that needs a screenshot to document the user's creative interior design choices is
ask the user to take a screenshot
wait until the user taps a button indicating the screenshot has been taken
then the app asks the user to select the screenshot when the app opens the PhotoPicker
when the user presses Done, the screenshot is handed off to the app.
One wonders why there is no Apple Api for doing this in a simple privacy protective way such as:
When called, the Apple api captures the screenshot in Apple secured memory
The api displays the screenshot to the user with appropriate privacy warnings and asks if the user wants to
a. share this screenshot with the app, or
b. cancel,
c. retake the screenshot
If the user approves, the app receives the screenshot
Hello Community,
I’m currently working with the sample code “CapturingDepthUsingTheLiDARCamera” and using it to capture the depth map of an image taken with the iPhone 14 Pro.
From this depth map, I generate a point cloud using the intrinsic camera parameters.
I've noticed that objects not facing the camera directly appear distorted in the resulting point cloud.
For example: An object with surfaces that are perpendicular to each other appears with a sharper angle in the point cloud — around 60° instead of 90°.
My question is:
Is this due to the general accuracy limitations of the LiDAR sensor? Or could it be related to the sample code?
To obtain the depth map, I’m using:
AVCapturePhoto.depthData.converting(toDepthDataType: kCVPixelFormatType_DepthFloat32)
Thanks in advance for your help!
I want to step into portal world. I've know PortalCrossingComponent can make an entity to cross portal, but how to make device cross into portal world?
Hey, I have Enterprise Access on the account and have added the passthrough capability and the entitlement on the main project and the "Broadcast Upload" extension, too.
The broadcast works except it returns a black screen.
I am attaching some screenshots below of the entitlement file. I have tried searching online to no avail, so any help would be greatly appreciated. I am also attaching the code.
import Foundation
import AVFoundation
import ReplayKit
class VideoAssetWriter {
private var isRecording = false
private var outputStream: OutputStream?
private func setupConnection() {
guard outputStream == nil else { return }
print("setting up connection.")
let serverIP = macIP
let port = 12345
var readStream: Unmanaged<CFReadStream>?
var writeStream: Unmanaged<CFWriteStream>?
CFStreamCreatePairWithSocketToHost(kCFAllocatorDefault,
serverIP as CFString,
UInt32(port),
&readStream,
&writeStream)
guard let writeStream = writeStream?.takeRetainedValue() else {
print("Failed to create write stream")
return
}
self.outputStream = writeStream as OutputStream
self.outputStream?.open()
}
func startRecording() {
isRecording = true
}
func processVideoSampleBuffer(_ sampleBuffer: CMSampleBuffer) {
print("Processing Sample 1")
guard isRecording else { return }
print("Processing Sample 2")
sendVideoChunkToServer(sampleBuffer)
}
private func sendVideoChunkToServer(_ sampleBuffer: CMSampleBuffer) {
guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
print("Processing Sample 3")
let ciImage = CIImage(cvPixelBuffer: imageBuffer)
let context = CIContext()
guard let cgImage = context.createCGImage(ciImage, from: ciImage.extent) else { return }
print("Processing Sample 4")
let image = UIImage(cgImage: cgImage)
if let imageData = image.jpegData(compressionQuality: 0.5) {
guard imageData.count <= 10_000_000 else {
print("Frame too large: \(imageData.count) bytes")
return
}
if outputStream == nil {
setupConnection()
}
print("sending frame size up connection.")
// Convert to network byte order (big-endian)
var frameSize = UInt32(imageData.count).bigEndian
let sizeData = Data(bytes: &frameSize, count: MemoryLayout<UInt32>.size)
_ = sizeData.withUnsafeBytes { outputStream?.write($0.baseAddress!.assumingMemoryBound(to: UInt8.self), maxLength: sizeData.count) }
print("sending image data up connection.")
// Send frame data
_ = imageData.withUnsafeBytes { outputStream?.write($0.baseAddress!.assumingMemoryBound(to: UInt8.self), maxLength: imageData.count) }
}
}
func stopRecording() {
isRecording = false
outputStream?.close()
outputStream = nil
}
}
This is the broadcast picker view wrapper:
// Broadcast Picker View wrapper
struct BroadcastButtonView: UIViewRepresentable {
func makeUIView(context: Context) -> RPSystemBroadcastPickerView {
let broadcastPickerView = RPSystemBroadcastPickerView(
frame: CGRect(x: 0, y: 0, width: 200, height: 200)
)
// Make sure this matches your broadcast extension bundle identifier
broadcastPickerView.preferredExtension = "my-extension-bundle-identifier"
broadcastPickerView.showsMicrophoneButton = false
return broadcastPickerView
}
func updateUIView(_ uiView: RPSystemBroadcastPickerView, context: Context) {
}
}
The extension SampleHandler:
override func broadcastPaused() {
print("paused broadcast")
// User has requested to pause the broadcast. Samples will stop being delivered.
}
override func broadcastResumed() {
print("resumed broadcast")
// User has requested to resume the broadcast. Samples delivery will resume.
}
override func processSampleBuffer(_ sampleBuffer: CMSampleBuffer, with sampleBufferType: RPSampleBufferType) {
print("broadcast received")
assetWriter?.processVideoSampleBuffer(sampleBuffer)
}
Looking forward to any and all help.
Information Property list:
Information property list for the extension:
The capabilities:
I noticed in the latest macOS beta 3 that there was this update:
A new algorithm significantly improves PhotogrammetrySession reconstruction quality of low-texture objects not captured with the ObjectCaptureSession front end. It will be downloaded and cached once in the background when the PhotogrammetrySession is used at runtime. If network isn’t available at that time, the old low quality model will be used until the new one can be downloaded. There is no code change needed to get this improved model. (145220451)
I am not noticing any difference to before with the reconstructions I tested so I am assuming it's reverting to the old model but in the logs there is no way to see if it succeeds or fails to download that new model.
do you have any more information on what was improved here with some examples and what we should be looking for? also how can confirm the download of that new model has not failed?
Hi Apple Team and Developers,
First of all, I’d like to express my appreciation for the incredible results achieved using PhotogrammetrySession. I’ve been developing a portrait scanning app using Object Capture, and in many tests—especially with human models—I’ve found the reconstructed body surfaces are remarkably smooth and clean, often outperforming tools like Metashape and RealityCapture in terms of aesthetic results.
However, I’ve encountered some challenges when working with complex areas like long hair overlapping the face. For instance, with female models where strands of hair partially occlude the face, the resulting mesh tends to merge the hair and facial geometry. This leads to distorted or “melted” facial features, likely due to ambiguity in the geometry estimation phase.
Feature Suggestion:
Would it be possible to allow developers to supply two versions of the input images:
• One version (original) for texture generation
• A pre-processed version (e.g., contrast-enhanced or CLAHE filtered) to guide mesh reconstruction only
This would give us the flexibility to enhance edge features or shadow detail without affecting the final texture appearance. In other photogrammetry pipelines, applying image enhancement selectively before dense reconstruction improves geometry quality in low-contrast areas.
Question:
Is there any plan to support this kind of two-path workflow in future versions of PhotogrammetrySession? Or perhaps expose more intermediate stages or tunable parameters to developers?
Also, any hints on what we can expect from WWDC 2025 regarding improvements to Object Capture or related vision/3D technologies?
Thanks again for this powerful API. Looking forward to hearing insights from the team and other developers.
Warm regards,
KitCheng
I'm developing an AR application for the iPad pro where the primary purpose is to overlay 3D design data on top of production parts. For alignment, we are using Vuforia (model targets) which work really well locally. The further the device is moved from the point of original alignment, we are seeing quite a bit of overlay error (drift?).
My primary questions are:
Are there any best practices to stabilize frame-to-frame tracking when using model targets? We are noticing drift as soon as the device starts moving (the drift appears to occur specifically in the direction the device is moving). After about 15 feet of movement, we are observing about 3-6" of overlay error
These use cases can be over 100 feet long. In order to reset drift, we understand we'll need multiple alignment points (model targets) along the way. Is there a standard/best practice for this? Ex: have a new alignment point every x-feet?
We are using plane anchors to set our alignment. Typically we attach it to the nearest plane; however, the anchor point can be very far away (the origin of the model, which often is not near where the virtual content is). Could this be the issue? The anchor is far from the plane that we attach it too. Would moving the anchor closer to the plane we attach it too improve stability? After a few steps, the plane we originally attach too will be out of FoV anyway.
Thanks in advance!
Topic:
Spatial Computing
SubTopic:
ARKit
According to the official documentation, the .blur(radius:) modifier could apply gaussian blur to a realityview. However, when applied directly to a RealityView, nothing inside it (neither 2D attachments nor 3D entities) appears to be blurred.
Here’s the test code:
struct ContentView: View {
var body: some View {
VStack(spacing: 20) {
Text("Above the RealityView")
.font(.title)
RealityView { content, attachments in
if let text = attachments.entity(for: "2dView") {
text.position.y = 0.1
content.add(text)
}
let box = ModelEntity(
mesh: .generateBox(size: 0.1),
materials: [SimpleMaterial(color: .red, isMetallic: true)]
)
content.add(box)
} attachments: {
Attachment(id: "2dView") {
Text("Above the Box")
.font(.title)
}
}
.frame(width: 300, height: 300)
.border(.blue)
.blur(radius: 99) // Has no visual effect
Text("Below the RealityView")
.font(.subheadline)
}
.padding()
}
}
My question:
How can I make .blur(radius:) visually affect the content rendered in a RealityView?
Can you provide a working example that .blur() to visually affect any part of a RealityView?
Thanks!
In my Reality Composer Pro workflow for Vision Pro development, I’m using xcrun realitytool image to pre-compress textures into .ktx format, typically using ASTC block compression. These textures are used for cubemaps and environment assets.
I’ve noticed that regardless of the image content—whether it’s a highly detailed photo or a completely black image—once compressed with the same ASTC block size (e.g., ASTC_8x8), the resulting .ktx file size is nearly identical. There appears to be no content-aware logic that adapts the compression ratio to the actual texture complexity.
In contrast, Unreal Engine behaves differently: even when all cubemap faces are imported at the same resolution as DDS textures, the engine performs content-aware compression during packaging:
Low-complexity images are compressed more aggressively
The final packaged file size varies based on content complexity
Since Reality Composer Pro requires textures to be pre-compressed as .ktx, there’s no opportunity for runtime optimization or per-image compression adjustment.
Just wondering: is there any recommended way to implement content-aware compression for .ktx textures in Reality Composer Pro?
Or any best practices to optimize .ktx sizes based on image complexity?
Thanks!
Here is the sample project from apple of Object Tracking.
https://developer.apple.com/documentation/visionOS/exploring_object_tracking_with_arkit
can we improve it tracking accuracy and tracking when object is moving little faster, so the 3d object that draw still follow it and make it more accurate.
help me please i cant remove little messed up pixels and they are getting more and more please help me!!
2025 Macbook pro no protection screen actual screen
In the DestinationVideo demo, the onAppear in UpNextView is triggered again when it is closed, but I only want it to be triggered once. How can I achieve this?
Alternatively, I would like to capture the button click events in the player menu, as shown in the screenshot below.
In my Reality Composer Pro workflow for Vision Pro development, I’m using xcrun realitytool image to pre-compress textures into .ktx format, typically using ASTC block compression. These textures are used for cubemaps and environment assets.
I’ve noticed that regardless of the image content—whether it’s a highly detailed photo or a completely black image—once compressed with the same ASTC block size (e.g., ASTC_8x8), the resulting .ktx file size is nearly identical. There appears to be no content-aware logic that adapts the compression ratio to the actual texture complexity.
In contrast, Unreal Engine behaves differently: even when all cubemap faces are imported at the same resolution as DDS textures, the engine performs content-aware compression during packaging:
Low-complexity images are compressed more aggressively
The final packaged file size varies based on content complexity
Since Reality Composer Pro requires textures to be pre-compressed as .ktx, there’s no opportunity for runtime optimization or per-image compression adjustment.
Just wondering: is there any recommended way to implement content-aware compression for .ktx textures in Reality Composer Pro?
Or any best practices to optimize .ktx sizes based on image complexity?
Thanks!
Seeing this magical sand table, the unfolding and folding effects are similar to spreading out cards, which is very interesting. But I don't know how to achieve it. I want to see if there are any ways to achieve this effect and give some ideas. May I ask if this effect can be achieved under the existing API
I'm working on an iOS app using ARKit and RealityKit where I scan QR codes and want to place 3D models at the exact position of the QR code in the real world.
Is it possible to accurately place a 3D model at the exact position of a QR code in AR using ARKit and RealityKit? Specifically, I want the model to appear at the precise location where the QR code is detected, rather than just somewhere in the AR space.
If this is possible, could you point me in the right direction or recommend the best approach to achieve this?
Thank you for your help!
Hi there,
I was looking to add a particle emitter to my augmented reality app I'm developing using RealityKit. I'm targeting iOS. I noticed in the documentation for the ParticleEmitterComponent that it looks like iOS 18.0+ is supported, but when I try to use the ParticleEmitterComponent in my code in XCode, I get an error that it isn't found. Furthermore, this StackOverflow post seems to indicate that particle systems are not available for iOS. Would it be possible to get clarification on this?