Explore the power of machine learning and Apple Intelligence within apps. Discuss integrating features, share best practices, and explore the possibilities for your app here.

All subtopics
Posts under Machine Learning & AI topic

Post

Replies

Boosts

Views

Activity

LanguageModelSession always returns very lengthy responses
No matter what, the LanguageModelSession always returns very lengthy / verbose responses. I set the maximumResponseTokens option to various small numbers but it doesn't appear to have any effect. I've even used this instructions format to keep responses between 3-8 words but it returns multiple paragraphs. Is there a way to manage LLM response length? Thanks.
3
0
323
Sep ’25
ILMessageFilterExtension memory limit
I’m considering creating an ILMessageFilterExtension using a mini LLM/SLM to detect fraud and I’ve read it has strict memory limits yet I can’t find it in the documentation. What’s the set limit or any other constraints impacting the feasibility of running 100-500mb model?
0
0
80
Apr ’25
Hardware Support for Low Precision Data Types?
Hi all, I'm trying to find out if/when we can expect mxfp8/mxfp4 support on Apple Silicon. I've noticed that mlx now has casting data types, but all computation is still done in bf16. Would be great to reduce power consumption with support for these lower precision data types since edge inference is already typically done at a lower precision! Thanks in advance.
0
0
312
Nov ’25
Missing module 'coremltools.libmilstoragepython'
Hello! I'm following the Foundation Models adapter training guide (https://developer.apple.com/apple-intelligence/foundation-models-adapter/) on my NVIDIA DGX Spark box. I'm able to train on my own data but the example notebook fails when I try to export the artifact as an fmadapter. I get the following error for the code block I'm trying to run. I haven't touched any of the code in the export folder. I tried exporting it on my Mac too and got the same error as well (given below). Would appreciate some more clarity around this. Thank you. Code Block: from export.export_fmadapter import Metadata, export_fmadapter metadata = Metadata( author="3P developer", description="An adapter that writes play scripts.", ) export_fmadapter( output_dir="./", adapter_name="myPlaywritingAdapter", metadata=metadata, checkpoint="adapter-final.pt", draft_checkpoint="draft-model-final.pt", ) Error: --------------------------------------------------------------------------- ModuleNotFoundError Traceback (most recent call last) Cell In[10], line 1 ----> 1 from export.export_fmadapter import Metadata, export_fmadapter 3 metadata = Metadata( 4 author="3P developer", 5 description="An adapter that writes play scripts.", 6 ) 8 export_fmadapter( 9 output_dir="./", 10 adapter_name="myPlaywritingAdapter", (...) 13 draft_checkpoint="draft-model-final.pt", 14 ) File /workspace/export/export_fmadapter.py:11 8 from typing import Any 10 from .constants import BASE_SIGNATURE, MIL_PATH ---> 11 from .export_utils import AdapterConverter, AdapterSpec, DraftModelConverter, camelize 13 logger = logging.getLogger(__name__) 16 class MetadataKeys(enum.StrEnum): File /workspace/export/export_utils.py:15 13 import torch 14 import yaml ---> 15 from coremltools.libmilstoragepython import _BlobStorageWriter as BlobWriter 16 from coremltools.models.neural_network.quantization_utils import _get_kmeans_lookup_table_and_weight 17 from coremltools.optimize._utils import LutParams ModuleNotFoundError: No module named 'coremltools.libmilstoragepython'
4
0
773
Oct ’25
SpeechTranscriber time indexes - detect pauses?
I'm experimenting with the new SpeechTranscriber in macOS/iOS 26, transcribing speech from a prerecorded mp4 file. Speed and quality are amazing! I've told the transcriber to include time indexes. Each run is always exactly one word, which can be very useful. When I look at the indexes the end of one run is always identical to the start of the next run, even if there's a pause. I'd like to identify pauses, perhaps to generate something like phrases for subtitling. With each run of text going into the next I can't do this, other than using punctuation - which might be rather rough. Any suggestions on detecting pauses, or getting that kind of metadata from the transcriber? Here's a short sample, showing each run with the start, end, and characters in the run: 105.9 --> 107.04 I 107.04 --> 107.16 think 107.16 --> 108.0 more 108.0 --> 108.42 lighting 108.42 --> 108.6 is 108.6 --> 108.72 definitely 108.72 --> 109.2 needed, 109.2 --> 109.92 downtown. 109.98 --> 110.4 My 110.4 --> 110.52 only 110.52 --> 110.7 question 110.7 --> 111.06 is, 111.06 --> 111.48 poll 111.48 --> 111.78 five, 111.78 --> 111.84 that 111.84 --> 112.08 you're 112.08 --> 112.38 increasing 112.38 --> 112.5 the 112.5 --> 113.34 50,000? 113.4 --> 113.58 Where 113.58 --> 113.88 exactly
0
0
255
Jun ’25
Downloading my fine tuned model from huggingface
I have used mlx_lm.lora to fine tune a mistral-7b-v0.3-4bit model with my data. I fused the mistral model with my adapters and upload the fused model to my directory on huggingface. I was able to use mlx_lm.generate to use the fused model in Terminal. However, I don't know how to load the model in Swift. I've used Imports import SwiftUI import MLX import MLXLMCommon import MLXLLM let modelFactory = LLMModelFactory.shared let configuration = ModelConfiguration( id: "pharmpk/pk-mistral-7b-v0.3-4bit" ) // Load the model off the main actor, then assign on the main actor let loaded = try await modelFactory.loadContainer(configuration: configuration) { progress in print("Downloading progress: \(progress.fractionCompleted * 100)%") } await MainActor.run { self.model = loaded } I'm getting an error runModel error: downloadError("A server with the specified hostname could not be found.") Any suggestions? Thanks, David PS, I can load the model from the app bundle // directory: Bundle.main.resourceURL! but it's too big to upload for Testflight
1
0
557
Oct ’25
Is it possible to pass the streaming output of Foundation Models down a function chain
I am writing a custom package wrapping Foundation Models which provides a chain-of-thought with intermittent self-evaluation among other things. At first I was designing this package with the command line in mind, but after seeing how well it augments the models and makes them more intelligent I wanted to try and build a SwiftUI wrapper around the package. When I started I was using synchronous generation rather than streaming, but to give the best user experience (as I've seen in the WWDC sessions) it is necessary to provide constant feedback to the user that something is happening. I have created a super simplified example of my setup so it's easier to understand. First, there is the Reasoning conversation item, which can be converted to an XML representation which is then fed back into the model (I've found XML works best for structured input) public typealias ConversationContext = XMLDocument extension ConversationContext { public func toPlainText() -> String { return xmlString(options: [.nodePrettyPrint]) } } /// Represents a reasoning item in a conversation, which includes a title and reasoning content. /// Reasoning items are used to provide detailed explanations or justifications for certain decisions or responses within a conversation. @Generable(description: "A reasoning item in a conversation, containing content and a title.") struct ConversationReasoningItem: ConversationItem { @Guide(description: "The content of the reasoning item, which is your thinking process or explanation") public var reasoningContent: String @Guide(description: "A short summary of the reasoning content, digestible in an interface.") public var title: String @Guide(description: "Indicates whether reasoning is complete") public var done: Bool } extension ConversationReasoningItem: ConversationContextProvider { public func toContext() -> ConversationContext { // <ReasoningItem title="${title}"> // ${reasoningContent} // </ReasoningItem> let root = XMLElement(name: "ReasoningItem") root.addAttribute(XMLNode.attribute(withName: "title", stringValue: title) as! XMLNode) root.stringValue = reasoningContent return ConversationContext(rootElement: root) } } Then there is the generator, which creates a reasoning item from a user query and previously generated items: struct ReasoningItemGenerator { var instructions: String { """ <omitted for brevity> """ } func generate(from input: (String, [ConversationReasoningItem])) async throws -> sending LanguageModelSession.ResponseStream<ConversationReasoningItem> { let session = LanguageModelSession(instructions: instructions) // build the context for the reasoning item out of the user's query and the previous reasoning items let userQuery = "User's query: \(input.0)" let reasoningItemsText = input.1.map { $0.toContext().toPlainText() }.joined(separator: "\n") let context = userQuery + "\n" + reasoningItemsText let reasoningItemResponse = try await session.streamResponse( to: context, generating: ConversationReasoningItem.self) return reasoningItemResponse } } I'm not sure if returning LanguageModelSession.ResponseStream<ConversationReasoningItem> is the right move, I am just trying to imitate what session.streamResponse returns. Then there is the orchestrator, which I can't figure out. It receives the streamed ConversationReasoningItems from the Generator and is responsible for streaming those to SwiftUI later and also for evaluating each reasoning item after it is complete to see if it needs to be regenerated (to keep the model on-track). I want the users of the orchestrator to receive partially generated reasoning items as they are being generated by the generator. Later, when they finish, if the evaluation passes, the item is kept, but if it fails, the reasoning item should be removed from the stream before a new one is generated. So in-flight reasoning items should be outputted aggresively. I really am having trouble figuring this out so if someone with more knowledge about asynchronous stuff in Swift, or- even better- someone who has worked on the Foundation Models framework could point me in the right direction, that would be awesome!
0
0
286
Jul ’25
CoreML model for news scoring
Is it possible to train a model using CreateML to infer a relevance numeric score of a news article based on similar trained data, something like a sentiment score ? I created a Text Classifier that assigns a category label which works perfect but I would like a solution that calculates a numeric value, not a label.
2
0
208
Mar ’25
ML contraints & Timeout clarificaitions for Message Filtering Extension
Hello everyone, I’m currently working with the Message Filtering Extension and would really appreciate some clarification around its performance and operational constraints. While the extension is extremely powerful and useful, I’ve found that some important details are either unclear or not well covered in the available documentation. There are two main areas I’m trying to understand better: Machine learning model constraints within the extension In our case, we already have an existing ML model that classifies messages (and are not dependant on Apple's built-in models). We’re evaluating whether and how it can be used inside the extension. Specifically, I’m trying to understand: Are there documented limits on the size of an ML model (e.g., maximum bundle size or model file size in MB)? What are the memory constraints for a model once loaded into memory by the extension? Under what conditions would the system terminate or “kick out” the extension due to memory or performance pressure? Message processing timeouts and execution constraints What is the timeout for processing a single received message? At what point will the OS stop waiting for the extension’s response and allow the message by default (for example, if the extension does not respond in time)? Any guidance, official references, or practical experience from Apple engineers or other developers would be greatly appreciated. Thanks in advance for your help,
0
0
249
Jan ’26
Inquiry About Building an App for Object Detection, Background Removal, and Animation
Hi all! Nice to meet you., I am planning to build an iOS application that can: Capture an image using the camera or select one from the gallery. Remove the background and keep only the detected main object. Add a border (outline) around the detected object’s shape. Apply an animation along that border (e.g., moving light or glowing effect). Include a transition animation when removing the background — for example, breaking the background into pieces as it disappears. The app Capword has a similar feature for object isolation, and I’d like to build something like that. Could you please provide any guidance, frameworks, or sample code related to: Object segmentation and background removal in Swift (Vision or Core ML). Applying custom borders and shape animations around detected objects. Recognizing the object name (e.g., “person”, “cat”, “car”) after segmentation. Thank you very much for your support. Best regards, SINN SOKLYHOR
0
0
196
Nov ’25
How can I change the output dimensions of a CoreML model in Xcode when the outputs come from a NonMaximumSuppression layer?
After exerting a custom model with nms=True. In Xcode, the outputs show as: confidence: MultiArray (0 × 5) coordinates: MultiArray (0 × 4) I want to set fixed shapes (e.g., 100 × 5, 100 × 4), but Xcode does not allow editing—the shape fields are locked. The model graph shows both outputs come directly from a NonMaximumSuppression layer. Is it possible to set fixed output dimensions for NMS outputs in CoreML?
2
0
213
2w
Is there an API to check if a Core ML compiled model is already cached?
Hello Apple Developer Community, I'm investigating Core ML model loading behavior and noticed that even when the compiled model path remains unchanged after an APP update, the first run still triggers an "uncached load" process. This seems to impact user experience with unnecessary delays. Question: Does Core ML provide any public API to check whether a compiled model (from a specific .mlmodelc path) is already cached in the system? If such API exists, we'd like to use it for pre-loading decision logic - only perform background pre-load when the model isn't cached. Has anyone encountered similar scenarios or found official solutions? Any insights would be greatly appreciated!
0
0
153
May ’25
SpeechAnalyzer / AssetInventory and preinstalled assets
During testing the “Bringing advanced speech-to-text capabilities to your app” sample app demonstrating the use of iOS 26 SpeechAnalyzer, I noticed that the language model for the English locale was presumably already downloaded. Upon checking the documentation of AssetInventory, I found out that indeed, the language model can be preinstalled on the system. Can someone from the dev team share more info about what assets are preinstalled by the system? For example, can we safely assume that the English language model will almost certainly be already preinstalled by the OS if the phone has the English locale?
1
0
271
Jul ’25
Building Real-Time Voice Input on macOS 26 with SpeechAnalyzer + ScreenCaptureKit
We built an open-source macOS menu bar app that turns speech into text and pastes it into the active app — using SpeechAnalyzer for on-device transcription, ScreenCaptureKit + Vision for screen-aware context, and FluidAudio for speaker diarization in meeting mode. Here's what we learned shipping it on macOS 26. GitHub: github.com/Marvinngg/ambient-voice Architecture The app has two modes: hotkey dictation (press to talk, release to inject) and meeting recording (continuous transcription with a floating panel). Dictation Mode Audio capture uses AVCaptureSession (more on why below). The captured audio feeds into SpeechAnalyzer via an AsyncStream: let transcriber = SpeechTranscriber( locale: locale, transcriptionOptions: [], reportingOptions: [.volatileResults, .alternativeTranscriptions], attributeOptions: [.audioTimeRange, .transcriptionConfidence] ) let analyzer = SpeechAnalyzer(modules: [transcriber]) let (inputSequence, inputBuilder) = AsyncStream.makeStream() try await analyzer.start(inputSequence: inputSequence) While recording, we capture a screenshot of the focused window using ScreenCaptureKit, run Vision OCR (VNRecognizeTextRequest), extract keywords, and inject them into SpeechAnalyzer as contextual bias: let context = AnalysisContext() context.contextualStrings[.general] = ocrKeywords try await analyzer.setContext(context) This improves accuracy for technical terms and proper nouns visible on screen. If your screen shows "SpeechAnalyzer", saying it out loud is more likely to be transcribed correctly. After transcription, an optional L2 step sends the text through a local LLM (ollama) for spoken-to-written cleanup, then CGEvent simulates Cmd+V to paste into the active app. Meeting Mode Meeting mode forks the same audio stream to two consumers: SpeechAnalyzer — real-time streaming transcription, displayed in a floating NSPanel FluidAudio buffer — accumulates 16kHz Float32 mono samples for batch speaker diarization after recording stops When the user ends the meeting, FluidAudio's performCompleteDiarization() runs on the accumulated audio. We align transcription segments with speaker segments using audioTimeRange overlap matching — each transcription segment gets assigned the speaker ID with the most time overlap. Results export to Markdown. Pitfalls We Hit on macOS 26 1. AVAudioEngine installTap doesn't fire with Bluetooth devices We started with AVAudioEngine.inputNode.installTap() for audio capture. It worked fine with built-in mics but the tap callback never fired with Bluetooth devices (tested with vivo TWS 4 Hi-Fi). Fix: switched to AVCaptureSession. The delegate callback captureOutput(_:didOutput:from:) fires reliably regardless of audio device. The tradeoff is you get CMSampleBuffer instead of AVAudioPCMBuffer, so you need a conversion step. 2. NSEvent addGlobalMonitorForEvents crashes Our global hotkey listener used NSEvent.addGlobalMonitorForEvents. On macOS 26, this crashes with a Bus error inside GlobalObserverHandler — appears to be a Swift actor runtime issue. Fix: switched to CGEventTap. Works reliably, but the callback runs on a CFRunLoop context, which Swift doesn't recognize as MainActor. 3. CGEventTap callbacks aren't on MainActor If your CGEventTap callback touches any @MainActor state, you'll get concurrency violations. The callback runs on whatever thread owns the CFRunLoop. Fix: bridge with DispatchQueue.main.async {} inside the tap callback before touching any MainActor state. 4. CGPreflightScreenCaptureAccess doesn't request permission We used CGPreflightScreenCaptureAccess() as a guard before calling ScreenCaptureKit. If it returned false, we'd bail out. The problem: this function only checks — it never triggers macOS to add your app to the Screen Recording permission list. Chicken-and-egg: you can't get permission because you never ask for it. Fix: call CGRequestScreenCaptureAccess() at app startup. This adds your app to System Settings → Screen Recording. Then let ScreenCaptureKit calls proceed without the preflight guard — SCShareableContent will also trigger the permission prompt on first use. 5. Ad-hoc signing breaks TCC permissions on every rebuild During development, codesign --sign - (ad-hoc) generates a different code directory hash on every build. macOS TCC tracks permissions by this hash, so every rebuild = new app identity = all permissions reset. Fix: sign with a stable certificate. If you have an Apple Development certificate, use that. The TeamIdentifier stays constant across rebuilds, so TCC permissions persist. We also discovered that launching via open WE.app (LaunchServices) instead of directly executing the binary is required — otherwise macOS attributes TCC permissions to Terminal, not your app. Benchmarks We ran end-to-end benchmarks on public datasets (Mac Mini M4 16GB, macOS 26): Transcription (SpeechAnalyzer, AliMeeting Chinese): • Near-field CER 34% (excluding outliers ~25%) • Far-field CER 40% (single channel, no beamforming, >30% overlap) • Processing speed 74-89x real-time Speaker diarization (FluidAudio offline): • AMI English 16 meetings: avg DER 23.2% (collar=0.25s, ignoreOverlap=True) • AliMeeting Chinese 8 meetings: DER 48.5% (including overlap regions) • Memory: RSS ~500MB, peak 730-930MB Full evaluation methodology, scripts, and raw results are in the repo. Open Source The project is MIT licensed: github.com/Marvinngg/ambient-voice It includes the macOS client (Swift 6.2, SPM), server-side distillation/training scripts (Python), and a complete evaluation framework with reproducible benchmarks. Feedback and contributions welcome.
0
0
227
1d
reinforcement learning from Apple?
I don't know if these forums are any good for rumors or plans, but does anybody know whether or not Apple plans to release a library for training reinforcement learning? It would be handy, implementing games in Swift, for example, to be able to train the computer players on the same code.
0
0
380
1w
Subject: Technical Report: Float32 Precision Ceiling & Memory Fragmentation in JAX/Metal Workloads on M3
Subject: Technical Report: Float32 Precision Ceiling & Memory Fragmentation in JAX/Metal Workloads on M3 To: Metal Developer Relations Hello, I am reporting a repeatable numerical saturation point encountered during sustained recursive high-order differential workloads on the Apple M3 (16 GB unified memory) using the JAX Metal backend. Workload Characteristics: Large-scale vector projections across multi-dimensional industrial datasets Repeated high-order finite-difference calculations Heavy use of jax.grad and lax.cond inside long-running loops Observation: Under these conditions, the Metal/MPS backend consistently enters a terminal quantization lock where outputs saturate at a fixed scalar value (2.0000), followed by system-wide NaN propagation. This appears to be a precision-limited boundary in the JAX-Metal bridge when handling high-order operations with cubic time-scale denominators. have identified the specific threshold where recursive high-order tensor derivatives exceed the numerical resolution of 32-bit consumer architectures, necessitating a migration to a dedicated 64-bit industrial stack. I have prepared a minimal synthetic test script (randomized vectors only, no proprietary logic) that reliably reproduces the allocator fragmentation and saturation behavior. Let me know if your team would like the telemetry for XLA/MPS optimization purposes. Best regards, Alex Severson Architect, QuantumPulse AI
0
0
200
2w
ANE Performance for on-device Foundation model
I'm running MacOs 26 Beta 5. I noticed that I can no longer achieve 100% usage on the ANE as I could before with Apple Foundations on-device model. Has Apple activated some kind of throttling or power limiting of the ANE? I cannot get above 3w or 40% usage now since upgrading. I'm on the high power energy mode. I there an API rate limit being applied? I kave a M4 Pro mini with 64 GB of memory.
0
0
342
Aug ’25
LanguageModelSession always returns very lengthy responses
No matter what, the LanguageModelSession always returns very lengthy / verbose responses. I set the maximumResponseTokens option to various small numbers but it doesn't appear to have any effect. I've even used this instructions format to keep responses between 3-8 words but it returns multiple paragraphs. Is there a way to manage LLM response length? Thanks.
Replies
3
Boosts
0
Views
323
Activity
Sep ’25
ILMessageFilterExtension memory limit
I’m considering creating an ILMessageFilterExtension using a mini LLM/SLM to detect fraud and I’ve read it has strict memory limits yet I can’t find it in the documentation. What’s the set limit or any other constraints impacting the feasibility of running 100-500mb model?
Replies
0
Boosts
0
Views
80
Activity
Apr ’25
Hardware Support for Low Precision Data Types?
Hi all, I'm trying to find out if/when we can expect mxfp8/mxfp4 support on Apple Silicon. I've noticed that mlx now has casting data types, but all computation is still done in bf16. Would be great to reduce power consumption with support for these lower precision data types since edge inference is already typically done at a lower precision! Thanks in advance.
Replies
0
Boosts
0
Views
312
Activity
Nov ’25
Siri cut user's voice words in German version
My app used app intents. And when user said "Prüfung der Bluetooth Funktion", screen can show the whole words. But in my app, it only can get "Bluetooth Funktion". This behaviour only happened in German version. In English version, everything worked well. Is anyone can support me? Why German version siri cut my words?
Replies
0
Boosts
0
Views
642
Activity
Nov ’25
Missing module 'coremltools.libmilstoragepython'
Hello! I'm following the Foundation Models adapter training guide (https://developer.apple.com/apple-intelligence/foundation-models-adapter/) on my NVIDIA DGX Spark box. I'm able to train on my own data but the example notebook fails when I try to export the artifact as an fmadapter. I get the following error for the code block I'm trying to run. I haven't touched any of the code in the export folder. I tried exporting it on my Mac too and got the same error as well (given below). Would appreciate some more clarity around this. Thank you. Code Block: from export.export_fmadapter import Metadata, export_fmadapter metadata = Metadata( author="3P developer", description="An adapter that writes play scripts.", ) export_fmadapter( output_dir="./", adapter_name="myPlaywritingAdapter", metadata=metadata, checkpoint="adapter-final.pt", draft_checkpoint="draft-model-final.pt", ) Error: --------------------------------------------------------------------------- ModuleNotFoundError Traceback (most recent call last) Cell In[10], line 1 ----> 1 from export.export_fmadapter import Metadata, export_fmadapter 3 metadata = Metadata( 4 author="3P developer", 5 description="An adapter that writes play scripts.", 6 ) 8 export_fmadapter( 9 output_dir="./", 10 adapter_name="myPlaywritingAdapter", (...) 13 draft_checkpoint="draft-model-final.pt", 14 ) File /workspace/export/export_fmadapter.py:11 8 from typing import Any 10 from .constants import BASE_SIGNATURE, MIL_PATH ---> 11 from .export_utils import AdapterConverter, AdapterSpec, DraftModelConverter, camelize 13 logger = logging.getLogger(__name__) 16 class MetadataKeys(enum.StrEnum): File /workspace/export/export_utils.py:15 13 import torch 14 import yaml ---> 15 from coremltools.libmilstoragepython import _BlobStorageWriter as BlobWriter 16 from coremltools.models.neural_network.quantization_utils import _get_kmeans_lookup_table_and_weight 17 from coremltools.optimize._utils import LutParams ModuleNotFoundError: No module named 'coremltools.libmilstoragepython'
Replies
4
Boosts
0
Views
773
Activity
Oct ’25
SpeechTranscriber time indexes - detect pauses?
I'm experimenting with the new SpeechTranscriber in macOS/iOS 26, transcribing speech from a prerecorded mp4 file. Speed and quality are amazing! I've told the transcriber to include time indexes. Each run is always exactly one word, which can be very useful. When I look at the indexes the end of one run is always identical to the start of the next run, even if there's a pause. I'd like to identify pauses, perhaps to generate something like phrases for subtitling. With each run of text going into the next I can't do this, other than using punctuation - which might be rather rough. Any suggestions on detecting pauses, or getting that kind of metadata from the transcriber? Here's a short sample, showing each run with the start, end, and characters in the run: 105.9 --> 107.04 I 107.04 --> 107.16 think 107.16 --> 108.0 more 108.0 --> 108.42 lighting 108.42 --> 108.6 is 108.6 --> 108.72 definitely 108.72 --> 109.2 needed, 109.2 --> 109.92 downtown. 109.98 --> 110.4 My 110.4 --> 110.52 only 110.52 --> 110.7 question 110.7 --> 111.06 is, 111.06 --> 111.48 poll 111.48 --> 111.78 five, 111.78 --> 111.84 that 111.84 --> 112.08 you're 112.08 --> 112.38 increasing 112.38 --> 112.5 the 112.5 --> 113.34 50,000? 113.4 --> 113.58 Where 113.58 --> 113.88 exactly
Replies
0
Boosts
0
Views
255
Activity
Jun ’25
Downloading my fine tuned model from huggingface
I have used mlx_lm.lora to fine tune a mistral-7b-v0.3-4bit model with my data. I fused the mistral model with my adapters and upload the fused model to my directory on huggingface. I was able to use mlx_lm.generate to use the fused model in Terminal. However, I don't know how to load the model in Swift. I've used Imports import SwiftUI import MLX import MLXLMCommon import MLXLLM let modelFactory = LLMModelFactory.shared let configuration = ModelConfiguration( id: "pharmpk/pk-mistral-7b-v0.3-4bit" ) // Load the model off the main actor, then assign on the main actor let loaded = try await modelFactory.loadContainer(configuration: configuration) { progress in print("Downloading progress: \(progress.fractionCompleted * 100)%") } await MainActor.run { self.model = loaded } I'm getting an error runModel error: downloadError("A server with the specified hostname could not be found.") Any suggestions? Thanks, David PS, I can load the model from the app bundle // directory: Bundle.main.resourceURL! but it's too big to upload for Testflight
Replies
1
Boosts
0
Views
557
Activity
Oct ’25
Siri UI returned to original design
Good morning all has anyone encountered the issue of Siri returning back to her original user interface on IOS-26? I’m trying to figure out the cause. I’ve sent feedback via the feedback app. Just seeing if anyone else has the same issue.
Replies
1
Boosts
0
Views
147
Activity
Jun ’25
Is it possible to pass the streaming output of Foundation Models down a function chain
I am writing a custom package wrapping Foundation Models which provides a chain-of-thought with intermittent self-evaluation among other things. At first I was designing this package with the command line in mind, but after seeing how well it augments the models and makes them more intelligent I wanted to try and build a SwiftUI wrapper around the package. When I started I was using synchronous generation rather than streaming, but to give the best user experience (as I've seen in the WWDC sessions) it is necessary to provide constant feedback to the user that something is happening. I have created a super simplified example of my setup so it's easier to understand. First, there is the Reasoning conversation item, which can be converted to an XML representation which is then fed back into the model (I've found XML works best for structured input) public typealias ConversationContext = XMLDocument extension ConversationContext { public func toPlainText() -> String { return xmlString(options: [.nodePrettyPrint]) } } /// Represents a reasoning item in a conversation, which includes a title and reasoning content. /// Reasoning items are used to provide detailed explanations or justifications for certain decisions or responses within a conversation. @Generable(description: "A reasoning item in a conversation, containing content and a title.") struct ConversationReasoningItem: ConversationItem { @Guide(description: "The content of the reasoning item, which is your thinking process or explanation") public var reasoningContent: String @Guide(description: "A short summary of the reasoning content, digestible in an interface.") public var title: String @Guide(description: "Indicates whether reasoning is complete") public var done: Bool } extension ConversationReasoningItem: ConversationContextProvider { public func toContext() -> ConversationContext { // <ReasoningItem title="${title}"> // ${reasoningContent} // </ReasoningItem> let root = XMLElement(name: "ReasoningItem") root.addAttribute(XMLNode.attribute(withName: "title", stringValue: title) as! XMLNode) root.stringValue = reasoningContent return ConversationContext(rootElement: root) } } Then there is the generator, which creates a reasoning item from a user query and previously generated items: struct ReasoningItemGenerator { var instructions: String { """ <omitted for brevity> """ } func generate(from input: (String, [ConversationReasoningItem])) async throws -> sending LanguageModelSession.ResponseStream<ConversationReasoningItem> { let session = LanguageModelSession(instructions: instructions) // build the context for the reasoning item out of the user's query and the previous reasoning items let userQuery = "User's query: \(input.0)" let reasoningItemsText = input.1.map { $0.toContext().toPlainText() }.joined(separator: "\n") let context = userQuery + "\n" + reasoningItemsText let reasoningItemResponse = try await session.streamResponse( to: context, generating: ConversationReasoningItem.self) return reasoningItemResponse } } I'm not sure if returning LanguageModelSession.ResponseStream<ConversationReasoningItem> is the right move, I am just trying to imitate what session.streamResponse returns. Then there is the orchestrator, which I can't figure out. It receives the streamed ConversationReasoningItems from the Generator and is responsible for streaming those to SwiftUI later and also for evaluating each reasoning item after it is complete to see if it needs to be regenerated (to keep the model on-track). I want the users of the orchestrator to receive partially generated reasoning items as they are being generated by the generator. Later, when they finish, if the evaluation passes, the item is kept, but if it fails, the reasoning item should be removed from the stream before a new one is generated. So in-flight reasoning items should be outputted aggresively. I really am having trouble figuring this out so if someone with more knowledge about asynchronous stuff in Swift, or- even better- someone who has worked on the Foundation Models framework could point me in the right direction, that would be awesome!
Replies
0
Boosts
0
Views
286
Activity
Jul ’25
CoreML model for news scoring
Is it possible to train a model using CreateML to infer a relevance numeric score of a news article based on similar trained data, something like a sentiment score ? I created a Text Classifier that assigns a category label which works perfect but I would like a solution that calculates a numeric value, not a label.
Replies
2
Boosts
0
Views
208
Activity
Mar ’25
ML contraints & Timeout clarificaitions for Message Filtering Extension
Hello everyone, I’m currently working with the Message Filtering Extension and would really appreciate some clarification around its performance and operational constraints. While the extension is extremely powerful and useful, I’ve found that some important details are either unclear or not well covered in the available documentation. There are two main areas I’m trying to understand better: Machine learning model constraints within the extension In our case, we already have an existing ML model that classifies messages (and are not dependant on Apple's built-in models). We’re evaluating whether and how it can be used inside the extension. Specifically, I’m trying to understand: Are there documented limits on the size of an ML model (e.g., maximum bundle size or model file size in MB)? What are the memory constraints for a model once loaded into memory by the extension? Under what conditions would the system terminate or “kick out” the extension due to memory or performance pressure? Message processing timeouts and execution constraints What is the timeout for processing a single received message? At what point will the OS stop waiting for the extension’s response and allow the message by default (for example, if the extension does not respond in time)? Any guidance, official references, or practical experience from Apple engineers or other developers would be greatly appreciated. Thanks in advance for your help,
Replies
0
Boosts
0
Views
249
Activity
Jan ’26
Inquiry About Building an App for Object Detection, Background Removal, and Animation
Hi all! Nice to meet you., I am planning to build an iOS application that can: Capture an image using the camera or select one from the gallery. Remove the background and keep only the detected main object. Add a border (outline) around the detected object’s shape. Apply an animation along that border (e.g., moving light or glowing effect). Include a transition animation when removing the background — for example, breaking the background into pieces as it disappears. The app Capword has a similar feature for object isolation, and I’d like to build something like that. Could you please provide any guidance, frameworks, or sample code related to: Object segmentation and background removal in Swift (Vision or Core ML). Applying custom borders and shape animations around detected objects. Recognizing the object name (e.g., “person”, “cat”, “car”) after segmentation. Thank you very much for your support. Best regards, SINN SOKLYHOR
Replies
0
Boosts
0
Views
196
Activity
Nov ’25
How can I change the output dimensions of a CoreML model in Xcode when the outputs come from a NonMaximumSuppression layer?
After exerting a custom model with nms=True. In Xcode, the outputs show as: confidence: MultiArray (0 × 5) coordinates: MultiArray (0 × 4) I want to set fixed shapes (e.g., 100 × 5, 100 × 4), but Xcode does not allow editing—the shape fields are locked. The model graph shows both outputs come directly from a NonMaximumSuppression layer. Is it possible to set fixed output dimensions for NMS outputs in CoreML?
Replies
2
Boosts
0
Views
213
Activity
2w
Is there an API to check if a Core ML compiled model is already cached?
Hello Apple Developer Community, I'm investigating Core ML model loading behavior and noticed that even when the compiled model path remains unchanged after an APP update, the first run still triggers an "uncached load" process. This seems to impact user experience with unnecessary delays. Question: Does Core ML provide any public API to check whether a compiled model (from a specific .mlmodelc path) is already cached in the system? If such API exists, we'd like to use it for pre-loading decision logic - only perform background pre-load when the model isn't cached. Has anyone encountered similar scenarios or found official solutions? Any insights would be greatly appreciated!
Replies
0
Boosts
0
Views
153
Activity
May ’25
face and body detection in the Vision framework a local model or a cloud model?
Is the face and body detection service in the Vision framework a local model or a cloud model? Is there a performance report? https://developer.apple.com/documentation/vision
Replies
1
Boosts
0
Views
503
Activity
Sep ’25
SpeechAnalyzer / AssetInventory and preinstalled assets
During testing the “Bringing advanced speech-to-text capabilities to your app” sample app demonstrating the use of iOS 26 SpeechAnalyzer, I noticed that the language model for the English locale was presumably already downloaded. Upon checking the documentation of AssetInventory, I found out that indeed, the language model can be preinstalled on the system. Can someone from the dev team share more info about what assets are preinstalled by the system? For example, can we safely assume that the English language model will almost certainly be already preinstalled by the OS if the phone has the English locale?
Replies
1
Boosts
0
Views
271
Activity
Jul ’25
Building Real-Time Voice Input on macOS 26 with SpeechAnalyzer + ScreenCaptureKit
We built an open-source macOS menu bar app that turns speech into text and pastes it into the active app — using SpeechAnalyzer for on-device transcription, ScreenCaptureKit + Vision for screen-aware context, and FluidAudio for speaker diarization in meeting mode. Here's what we learned shipping it on macOS 26. GitHub: github.com/Marvinngg/ambient-voice Architecture The app has two modes: hotkey dictation (press to talk, release to inject) and meeting recording (continuous transcription with a floating panel). Dictation Mode Audio capture uses AVCaptureSession (more on why below). The captured audio feeds into SpeechAnalyzer via an AsyncStream: let transcriber = SpeechTranscriber( locale: locale, transcriptionOptions: [], reportingOptions: [.volatileResults, .alternativeTranscriptions], attributeOptions: [.audioTimeRange, .transcriptionConfidence] ) let analyzer = SpeechAnalyzer(modules: [transcriber]) let (inputSequence, inputBuilder) = AsyncStream.makeStream() try await analyzer.start(inputSequence: inputSequence) While recording, we capture a screenshot of the focused window using ScreenCaptureKit, run Vision OCR (VNRecognizeTextRequest), extract keywords, and inject them into SpeechAnalyzer as contextual bias: let context = AnalysisContext() context.contextualStrings[.general] = ocrKeywords try await analyzer.setContext(context) This improves accuracy for technical terms and proper nouns visible on screen. If your screen shows "SpeechAnalyzer", saying it out loud is more likely to be transcribed correctly. After transcription, an optional L2 step sends the text through a local LLM (ollama) for spoken-to-written cleanup, then CGEvent simulates Cmd+V to paste into the active app. Meeting Mode Meeting mode forks the same audio stream to two consumers: SpeechAnalyzer — real-time streaming transcription, displayed in a floating NSPanel FluidAudio buffer — accumulates 16kHz Float32 mono samples for batch speaker diarization after recording stops When the user ends the meeting, FluidAudio's performCompleteDiarization() runs on the accumulated audio. We align transcription segments with speaker segments using audioTimeRange overlap matching — each transcription segment gets assigned the speaker ID with the most time overlap. Results export to Markdown. Pitfalls We Hit on macOS 26 1. AVAudioEngine installTap doesn't fire with Bluetooth devices We started with AVAudioEngine.inputNode.installTap() for audio capture. It worked fine with built-in mics but the tap callback never fired with Bluetooth devices (tested with vivo TWS 4 Hi-Fi). Fix: switched to AVCaptureSession. The delegate callback captureOutput(_:didOutput:from:) fires reliably regardless of audio device. The tradeoff is you get CMSampleBuffer instead of AVAudioPCMBuffer, so you need a conversion step. 2. NSEvent addGlobalMonitorForEvents crashes Our global hotkey listener used NSEvent.addGlobalMonitorForEvents. On macOS 26, this crashes with a Bus error inside GlobalObserverHandler — appears to be a Swift actor runtime issue. Fix: switched to CGEventTap. Works reliably, but the callback runs on a CFRunLoop context, which Swift doesn't recognize as MainActor. 3. CGEventTap callbacks aren't on MainActor If your CGEventTap callback touches any @MainActor state, you'll get concurrency violations. The callback runs on whatever thread owns the CFRunLoop. Fix: bridge with DispatchQueue.main.async {} inside the tap callback before touching any MainActor state. 4. CGPreflightScreenCaptureAccess doesn't request permission We used CGPreflightScreenCaptureAccess() as a guard before calling ScreenCaptureKit. If it returned false, we'd bail out. The problem: this function only checks — it never triggers macOS to add your app to the Screen Recording permission list. Chicken-and-egg: you can't get permission because you never ask for it. Fix: call CGRequestScreenCaptureAccess() at app startup. This adds your app to System Settings → Screen Recording. Then let ScreenCaptureKit calls proceed without the preflight guard — SCShareableContent will also trigger the permission prompt on first use. 5. Ad-hoc signing breaks TCC permissions on every rebuild During development, codesign --sign - (ad-hoc) generates a different code directory hash on every build. macOS TCC tracks permissions by this hash, so every rebuild = new app identity = all permissions reset. Fix: sign with a stable certificate. If you have an Apple Development certificate, use that. The TeamIdentifier stays constant across rebuilds, so TCC permissions persist. We also discovered that launching via open WE.app (LaunchServices) instead of directly executing the binary is required — otherwise macOS attributes TCC permissions to Terminal, not your app. Benchmarks We ran end-to-end benchmarks on public datasets (Mac Mini M4 16GB, macOS 26): Transcription (SpeechAnalyzer, AliMeeting Chinese): • Near-field CER 34% (excluding outliers ~25%) • Far-field CER 40% (single channel, no beamforming, >30% overlap) • Processing speed 74-89x real-time Speaker diarization (FluidAudio offline): • AMI English 16 meetings: avg DER 23.2% (collar=0.25s, ignoreOverlap=True) • AliMeeting Chinese 8 meetings: DER 48.5% (including overlap regions) • Memory: RSS ~500MB, peak 730-930MB Full evaluation methodology, scripts, and raw results are in the repo. Open Source The project is MIT licensed: github.com/Marvinngg/ambient-voice It includes the macOS client (Swift 6.2, SPM), server-side distillation/training scripts (Python), and a complete evaluation framework with reproducible benchmarks. Feedback and contributions welcome.
Replies
0
Boosts
0
Views
227
Activity
1d
reinforcement learning from Apple?
I don't know if these forums are any good for rumors or plans, but does anybody know whether or not Apple plans to release a library for training reinforcement learning? It would be handy, implementing games in Swift, for example, to be able to train the computer players on the same code.
Replies
0
Boosts
0
Views
380
Activity
1w
Subject: Technical Report: Float32 Precision Ceiling & Memory Fragmentation in JAX/Metal Workloads on M3
Subject: Technical Report: Float32 Precision Ceiling & Memory Fragmentation in JAX/Metal Workloads on M3 To: Metal Developer Relations Hello, I am reporting a repeatable numerical saturation point encountered during sustained recursive high-order differential workloads on the Apple M3 (16 GB unified memory) using the JAX Metal backend. Workload Characteristics: Large-scale vector projections across multi-dimensional industrial datasets Repeated high-order finite-difference calculations Heavy use of jax.grad and lax.cond inside long-running loops Observation: Under these conditions, the Metal/MPS backend consistently enters a terminal quantization lock where outputs saturate at a fixed scalar value (2.0000), followed by system-wide NaN propagation. This appears to be a precision-limited boundary in the JAX-Metal bridge when handling high-order operations with cubic time-scale denominators. have identified the specific threshold where recursive high-order tensor derivatives exceed the numerical resolution of 32-bit consumer architectures, necessitating a migration to a dedicated 64-bit industrial stack. I have prepared a minimal synthetic test script (randomized vectors only, no proprietary logic) that reliably reproduces the allocator fragmentation and saturation behavior. Let me know if your team would like the telemetry for XLA/MPS optimization purposes. Best regards, Alex Severson Architect, QuantumPulse AI
Replies
0
Boosts
0
Views
200
Activity
2w
ANE Performance for on-device Foundation model
I'm running MacOs 26 Beta 5. I noticed that I can no longer achieve 100% usage on the ANE as I could before with Apple Foundations on-device model. Has Apple activated some kind of throttling or power limiting of the ANE? I cannot get above 3w or 40% usage now since upgrading. I'm on the high power energy mode. I there an API rate limit being applied? I kave a M4 Pro mini with 64 GB of memory.
Replies
0
Boosts
0
Views
342
Activity
Aug ’25