Hi,
I'm encountering an issue in our app that uses RoomPlan and ARsession for scanning.
After prolonged use—especially under heavy load from both the scanning process and other unrelated app operations—the iPhone becomes very hot, and the following warning begins to appear more frequently:
"ARSession <0x107559680>: The delegate of ARSession is retaining 11 ARFrames. The camera will stop delivering camera images if the delegate keeps holding on to too many ARFrames. This could be a threading or memory management issue in the delegate and should be fixed."
I was able to reproduce this behavior using Apple’s RoomPlanExampleApp, with only one change: I introduced a CPU-intensive workload at the end of the startSession() function:
DispatchQueue.global().asyncAfter(deadline: .now() + 5) {
for i in 0..<4 {
var value = 10_000
DispatchQueue.global().async {
while true {
value *= 10_000
value /= 10_000
value ^= 10_000
value = 10_000
}
}
}
}
I suspect this is some RoomPlan API problem that's why a filed an feedback: 17441091
Discuss spatial computing on Apple platforms and how to design and build an entirely new universe of apps and games for Apple Vision Pro.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
In my Reality Composer Pro workflow for Vision Pro development, I’m using xcrun realitytool image to pre-compress textures into .ktx format, typically using ASTC block compression. These textures are used for cubemaps and environment assets.
I’ve noticed that regardless of the image content—whether it’s a highly detailed photo or a completely black image—once compressed with the same ASTC block size (e.g., ASTC_8x8), the resulting .ktx file size is nearly identical. There appears to be no content-aware logic that adapts the compression ratio to the actual texture complexity.
In contrast, Unreal Engine behaves differently: even when all cubemap faces are imported at the same resolution as DDS textures, the engine performs content-aware compression during packaging:
Low-complexity images are compressed more aggressively
The final packaged file size varies based on content complexity
Since Reality Composer Pro requires textures to be pre-compressed as .ktx, there’s no opportunity for runtime optimization or per-image compression adjustment.
Just wondering: is there any recommended way to implement content-aware compression for .ktx textures in Reality Composer Pro?
Or any best practices to optimize .ktx sizes based on image complexity?
Thanks!
Can we constrain or clamp translation with the new ManipulationComponent? For example, allow free movement within certain bounds.
I am running a Spatial Rendering App template demo, it shows “No People Found ” “There is no one nearby to share with”.
How can I stream videos rendered by Mac to my vision pro
I am using macOS 26.0, visionOS 26, Xcode 26
Topic:
Spatial Computing
SubTopic:
General
I have tested the MagnifyGesture code below on multiple devices:
Vision Pro - working
iPhone - working
iPad - working
macOS - not working
In Reality Composer Pro, I have also added the below components to the test model entity:
Input Target
Collision
For macOS, I tried the touchpad pinch gesture and mouse scroll wheel, but neither approach works. How to resolve this issue? Thank you.
import SwiftUI
import RealityKit
import RealityKitContent
struct ContentView: View {
var body: some View {
RealityView { content in
// Add the initial RealityKit content
if let immersiveContentEntity = try? await Entity(named: "Immersive", in: realityKitContentBundle) {
content.add(immersiveContentEntity)
}
}
.gesture(MagnifyGesture()
.targetedToAnyEntity()
.onChanged(onMagnifyChanged)
.onEnded(onMagnifyEnded))
}
func onMagnifyChanged(_ value: EntityTargetValue<MagnifyGesture.Value>) {
print("onMagnifyChanged")
}
func onMagnifyEnded(_ value: EntityTargetValue<MagnifyGesture.Value>) {
print("onMagnifyEnded")
}
}
Topic:
Spatial Computing
SubTopic:
General
We have successfully obtained the permissions for "Main Camera access" and "Passthrough in screen capture" from Apple. Currently, the video streams we have received are from the physical world and do not include the digital world. How can we obtain video streams from both the physical and digital worlds?
thank you!
Topic:
Spatial Computing
SubTopic:
Reality Composer Pro
Tags:
Enterprise
Swift
Reality Composer Pro
visionOS
Hi Apple Team,
I’m working on a human portrait scanning application using PhotogrammetrySession, and I’ve been very impressed by the results. Thank you for building such a powerful and accessible photogrammetry solution into macOS!
I do, however, have a question regarding mesh detail limitations on different Mac hardware configurations.
When using PhotogrammetrySession.Request.Detail.custom and trying to set maximumPolygonCount = 1000000, I see the following log message:
Clamped max poly count: 1000000 to device limit. 250000 is used.
This is on an M1 Max with 32 GB RAM.
I’m aware that PhotogrammetrySession.limits can report values like maximumInputImageDimension and maximumNumberOfInputImages, but I haven’t found documentation on how the maximumPolygonCount is determined, and what hardware specs influence it.
Is it tied more to:
• GPU performance (e.g. neural/graphics cores)?
• CPU architecture?
• Memory size or bandwidth?
• Or is it fixed per SoC generation?
I’d love to understand what kind of hardware upgrades (e.g. moving to M4 Pro or increasing RAM) could allow me to increase mesh complexity and generate more detailed models.
Any insights would be greatly appreciated—and if this is covered in upcoming WWDC sessions or documentation, I’d be happy to tune in.
Thanks in advance!
KitCheng
I am allowing users to go through and capture different rooms, and add a custom label to that room. Is there a way to store data about this in the captured room so that it persists into the final merge? As it is now, My users mark all their merges with custom labels, but after merging there is no way to remember which room is which in the merging process so they have to go through and manually add the labels back. For larger floor plans this is not ideal.
I am trying to loop my videoMaterial. I have researched the AXQueuePlayer and AVPlayerLooper and tried to implement them into my code.
Please see attached.
There are no errors showing up but the videoMaterial is no longer working.
Please see the attached for the working code that plays the videoMaterial.
I am stumped can anyone help me solve this?
Thank you.
Topic:
Spatial Computing
SubTopic:
Reality Composer Pro
How can I create 180-degree apple immersive videos using game engine
Topic:
Spatial Computing
SubTopic:
Reality Composer Pro
When scanning multiple rooms (10+) in a single structure using ARWorldMap for coordinate space consistency, RoomCaptureSession throws CaptureError.exceedSceneSizeLimit. The instructions here (https://developer.apple.com/documentation/roomplan/scanning-the-rooms-of-a-single-structure) provide exactly what I am doing to keep the underlying ARSession alive (by calling captureSession.stop(pause: false)) and save the results before a user moves to the next room. Scanning 11 or so rooms will cause the user to hit the exceedSceneSizeLimit error. The ARWorldMap is about 58 MB and always is around this size when hitting this issue. No anchors are present and all the data seems to be from tracking data.
On iPad devices (where I do not see this issue) the ARWorldMap grows as a significantly slower rate in size.
I save the ARWorldMap after each room is scanned and confirmed by the user. If I use the ARMap to initialize the ARSession (as described in the docs) the session will immediately error with "exceedSceneSizeLimit" once the captureSession.run() is executed. Occasionally it will allow me/the user to scan again, but either breaks mid scan or the following.
This has been working fine for the past 2 years and users have been able to scan dozens of rooms without issue. It seems only lately that it has been a problem.
I would expect the ARWorldMap to be allowed for much bigger sizes. At this point I can just about scan more area of my house with a single scan than I can when I use different captureSessions.
Few observations:
This happens on my iPhone 15 Pro Max, my iPhone 17 Pro, but not my iPad M4 (maybe memory related?). It is possible if scanning many more rooms it would happen on the iPad too.
I have tried things such as resetting the ARConfig on the underlying ARSession to reset some, but this doesn't work.
I have tried to create a new ARWorldMap and move the origin to the older map to clear out tracking data. This almost works but causes a mess of issues when a user moves at all due to the unshared coordinate space.
I believe there are three active issues regarding this: FB14454922, FB15035788, FB20642944
Could we get an update for this issue? It is a production issue and severely limits my user experience in my production application.
Hi Apple Developer Community,
I'm developing an eye-tracking application using ARKit's ARFaceTrackingConfiguration and ARFaceAnchor.blendShapes for gaze detection using Xcode. I'm experiencing several calibration and accuracy issues and would appreciate insights from the community.
Current Implementation
Using ARFaceAnchor.blendShapes (.eyeLookUpLeft, .eyeLookDownLeft, .eyeLookInLeft, .eyeLookOutLeft, etc.)
Implementing custom sensitivity curves and smoothing algorithms
Applying baseline correction and coordinate mapping
Using quadratic regression for calibration point mapping
Issues I'm Facing
1. Calibration Mismatch
Red dot position doesn't align with where I'm actually looking
Significant offset between intended gaze point and actual cursor position
Calibration seems to drift or become inaccurate over time
2. Extreme Eye Movement Requirements
Need to make exaggerated eye movements to reach screen edges/corners
Natural eye movements don't translate to proportional cursor movement
Difficulty reaching certain screen regions even with calibration
3. Sensitivity and Stability Issues
Cursor jitters or jumps around when looking at center
Too much sensitivity to micro-movements
Inconsistent behavior between calibration and normal operation
4. I also noticed that tracking on calibration screen as well as tracking on reading screen works better as expected when head movement is there, but I do not want much head movement. I want tracking with normal eye movement while reading an Ebook.
Primary Question: Word-Level Eye Tracking Feasibility
Is word-level eye tracking (tracking gaze as users read through individual words in an ebook) technically feasible with current iPhone/iPad hardware?
I understand that Apple's built-in eye tracking is primarily an accessibility feature for UI navigation. However, I'm wondering if the TrueDepth camera and ARKit's eye tracking capabilities are sufficient for:
Tracking natural reading patterns (left-to-right, line-by-line progression)
Detecting which specific words a user is looking at
Maintaining accuracy for sustained reading sessions (15-30 minutes)
Working reliably across different users and lighting conditions
Questions for the Community
Hardware Limitations: Are iPhone/iPad TrueDepth cameras capable of the precision needed for word-level tracking, or is this beyond current hardware capabilities?
Calibration Best Practices: What calibration strategies have worked best for accurate gaze mapping? How many calibration points are typically needed?
Reading-Specific Challenges: Are there particular challenges when tracking reading behavior vs. general gaze tracking?
Alternative Approaches: Are there better approaches than ARKit blend shapes for this use case?
Current Setup
Devices: iPhone 14 Pro
iOS Version: iOS 18.3
ARKit Version: Latest available
Any insights, experiences, or technical guidance would be greatly appreciated. I'm particularly interested in hearing from developers who have worked on similar eye tracking applications or have experience with the limitations and capabilities of ARKit's eye tracking features.
Thank you for your time and expertise!
I'm working on a project that uses imageTrackingProvider through ARKit on VisionPro, and I want to detect multiple images(about 5) and show info at the same time.
However, I found that it seems only 1 image could be detected by device at one time.
And the api of maximumNumberOfTrackedImages doing this seems not available for visionOS but only iOS.
Anyone knows possible ways to detect multiple images at the same time on VisionPro?
Topic:
Spatial Computing
SubTopic:
ARKit
具体表现为:在Unity编辑器中材质显示正常,但部署到Vision Pro真机后部分材质丢失或Shader效果异常(如透明通道失效、光照计算错误等)。此问题影响了开发进度,希望得到技术支持的帮助
Specific results: The materials are displayed normally in the Unity editor, but after being deployed to the Vision Pro real machine, some materials are lost or the Shader effect is abnormal (such as transparent channel failure, antenna calculation error, etc.). This problem has affected the development progress, and I hope to get help from technical support
Topic:
Spatial Computing
SubTopic:
Reality Composer Pro
Hope to achieve stable transmission
And the colors are different. The colors in the glasses are not consistent with the colors projected on the screen.
Hi,
after upgrading to 2.4.1 (from 1.0) my vision stucks on "Retrieving configuration" screen. Apple Store didn't support my case since it has been sold in USA and the product isn't still present in italian market. I don't have dev strap, how can I manage the issue?
Thank you
Topic:
Spatial Computing
SubTopic:
General
Hello,
I'm developing a visionOS application for Apple Vision Pro that aims to scan unknown physical objects, capture their 3D data (such as meshes or point clouds), and export them as 3D models. Ideally, I'd also like to visualize these reconstructions in real-time within the headset.
This functionality is similar to what's available in Reality Composer on iPad and iPhone, but I'm seeking to implement it natively on Vision Pro.
I've reviewed the visionOS documentation but haven't found clear guidance on accessing LiDAR depth data or performing scene reconstruction.
Specifically, I'm interested in:
1.Accessing LiDAR or depth data from Vision Pro's sensors.
2.Utilizing ARKit's scene reconstruction capabilities on visionOS.
3.Exporting captured 3D data as models (e.g., USDZ or OBJ formats).
Are there APIs or frameworks in visionOS that support these features?
Topic:
Spatial Computing
SubTopic:
General
In my Reality Composer Pro workflow for Vision Pro development, I’m using xcrun realitytool image to pre-compress textures into .ktx format, typically using ASTC block compression. These textures are used for cubemaps and environment assets.
I’ve noticed that regardless of the image content—whether it’s a highly detailed photo or a completely black image—once compressed with the same ASTC block size (e.g., ASTC_8x8), the resulting .ktx file size is nearly identical. There appears to be no content-aware logic that adapts the compression ratio to the actual texture complexity.
In contrast, Unreal Engine behaves differently: even when all cubemap faces are imported at the same resolution as DDS textures, the engine performs content-aware compression during packaging:
Low-complexity images are compressed more aggressively
The final packaged file size varies based on content complexity
Since Reality Composer Pro requires textures to be pre-compressed as .ktx, there’s no opportunity for runtime optimization or per-image compression adjustment.
Just wondering: is there any recommended way to implement content-aware compression for .ktx textures in Reality Composer Pro?
Or any best practices to optimize .ktx sizes based on image complexity?
Thanks!
Hi guys,
In visionOS, when using a ZStack decorated with .glassBackgroundEffect(), you can see the 3D glass background from the front, but when viewed from the side, the view appears to have no thickness.
However, I noticed that in an app built by Apple, when viewing a glass background view from the side, it appears to have thickness.
I tried adding .frame(depth:) to a glass background view, but it appears as two separate layers spaced by the depth value.
My question is:
Is there a view modifier that adds visual thickness to a glass background view, as shown in the picture?
Or, if not, how should I write a custom view modifier to achieve this effect? Thanks!
传输后的直播流分辨率显著下降,画面细节丢失、清晰度不足,导致 3D 家具商品的纹理、尺寸等关键信息无法精准展示,影响用户对商品的判断。
期望
优化流传输过程中的分辨率压缩策略,减少传输过程中的画质损耗,提升 Mac 端接收的直播流清晰度,匹配 3D 商品展示的高精度需求。