Natural Language

Analyze natural language text and deduce its language-specific metadata using Natural Language.

Natural Language Documentation

Autocorrection and predictive text support for additional Cyrillic languages

Hello Apple Keyboard / Internationalization team,

I would like to ask about autocorrection and predictive text support for additional Cyrillic-based languages, especially Kazakh, Kyrgyz, Chuvash, and Ingush. These languages use Cyrillic scripts with their own letters, spelling rules, and word-frequency patterns. When users type in these languages, Russian-based autocorrection, or the absence of language-specific correction, can produce incorrect suggestions or replacements.

My questions are:
1. Are there plans to expand autocorrection and predictive text support to more Cyrillic-based languages?
2. Is there a recommended way for developers or language communities to provide dictionaries, word-frequency lists, corpora, or other linguistic data to help improve autocorrection?
3. Should this type of request be submitted through Feedback Assistant, the Developer Forums, or another Apple channel?

I have corpus-based frequency data and language resources for multiple Cyrillic-based languages and would be happy to share them if useful.

Thank you.
Ali Kuzhuget

App & System Services Internationalization Internationalization Natural Language Localization

186

Cyrillic keyboard long-press support for additional languages

Hello Apple Keyboard / Internationalization team,

In the current beta, I noticed new keyboard support for Tuvan and Sakha. Thank you; this is very important for Cyrillic-based languages and their communities. I also noticed improvements to the Russian keyboard long-press options, but some Cyrillic letters used by other languages still seem to be missing. For example, Ossetian uses Ӕ ӕ, and this character does not appear as a long-press option.

My questions are:
1. Are there plans to expand the Russian keyboard long-press mappings to cover more Cyrillic-based languages?
2. Is there a recommended way for language communities or developers to provide corpus/frequency data and character mappings to help improve keyboard support?
3. Should this type of request be submitted through Feedback Assistant, the Developer Forums, or another channel?

I have corpus-based frequency data and long-press mapping data for many Cyrillic-based languages and would be happy to share it if useful.

Thank you.
Ali Kuzhuget

App & System Services Internationalization Internationalization InputMethodKit Natural Language Localization

226

Autocorrection and predictive text support for additional Cyrillic languages

Machine Learning & AI General Natural Language

126

Problem running NLContextualEmbeddingModel in simulator

Environment
macOS 26
Xcode Version 26.0 beta 7 (17A5305k)
Simulator: iPhone 16 Pro, iOS 26

Problem
NLContextualEmbedding.load() fails in the simulator with the following error:

    Failed to load embedding from MIL representation: filesystem error: in create_directories: Permission denied ["/var/db/com.apple.naturallanguaged/com.apple.e5rt.e5bundlecache"]
    filesystem error: in create_directories: Permission denied ["/var/db/com.apple.naturallanguaged/com.apple.e5rt.e5bundlecache"]
    Failed to load embedding model 'mul_Latn' - '5C45D94E-BAB4-4927-94B6-8B5745C46289'
    assetRequestFailed(Optional(Error Domain=NLNaturalLanguageErrorDomain Code=7 "Embedding model requires compilation" UserInfo={NSLocalizedDescription=Embedding model requires compilation}))

This happens inside a #Playground. I'm new to this embedding model, so I'm not sure whether it's caused by my code or my environment.

Code snippet

    import Foundation
    import NaturalLanguage
    import Playgrounds

    #Playground {
        // Prefer initializing by script for broader coverage; returns NLContextualEmbedding?
        guard let embeddingModel = NLContextualEmbedding(script: .latin) else {
            print("Failed to create NLContextualEmbedding")
            return
        }
        print(embeddingModel.hasAvailableAssets)
        do {
            try embeddingModel.load()
            print("Model loaded")
        } catch {
            print("Failed to load model: \(error)")
        }
    }
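For reference, the usual sequence with NLContextualEmbedding is to make sure the model assets are present (requesting them when `hasAvailableAssets` is false) before calling `load()`. The sketch below follows that sequence and then reads per-token vectors. It is not a confirmed fix for the simulator permission error above, which may need testing on a device or a Mac instead; the helper names `loadLatinEmbedding` and `printTokenVectors` are hypothetical.

```swift
import NaturalLanguage

/// Creates the Latin-script contextual embedding, requesting its assets first if needed.
/// Returns nil if the embedding can't be created or the assets aren't available.
func loadLatinEmbedding() async throws -> NLContextualEmbedding? {
    guard let embedding = NLContextualEmbedding(script: .latin) else { return nil }
    if !embedding.hasAvailableAssets {
        // Ask the system to fetch the model assets; load() can't succeed without them.
        let result = try await embedding.requestAssets()
        guard result == .available else { return nil }
    }
    try embedding.load()
    return embedding
}

/// Prints each token of a short sentence with the length of its embedding vector.
func printTokenVectors(using embedding: NLContextualEmbedding) throws {
    let text = "The quick brown fox"
    let result = try embedding.embeddingResult(for: text, language: .english)
    result.enumerateTokenVectors(in: text.startIndex..<text.endIndex) { vector, range in
        print(text[range], vector.count)
        return true // keep enumerating
    }
}
```

Calling `unload()` when the embedding is no longer needed releases the loaded model.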

Machine Learning & AI General Beta Natural Language

3.1k

May ’26

Detection of Unavailable Characters (Tofu Box) in a String

Hi, I'd like to know the best way to detect whether part of a string contains an unavailable character, shown as '□' (a tofu box, or last-resort glyph). So far it seems we would have to parse every string and check each character's Unicode scalars individually. Since we are a business application that handles a lot of data as strings, this would be rather performance-heavy. Are there better or more efficient ways to go about this?
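A single pass over `unicodeScalars` is usually cheap, and the standard library's Unicode properties can flag scalars that are unassigned in the runtime's Unicode version, which are one common source of tofu. The sketch below is a heuristic under that assumption: it cannot see font coverage, so an assigned character that no installed font supports will not be flagged (Core Text's `CTFontGetGlyphsForCharacters` can check coverage for one specific font). It also flags private-use scalars, even though some, such as U+F8FF on Apple platforms, do render. `containsLikelyTofu` is a hypothetical name.

```swift
/// Heuristic: true if `text` contains a scalar that is unassigned or private-use
/// in the Unicode version this Swift runtime knows about. Such scalars often
/// render as the last-resort glyph, but this does not check font coverage.
func containsLikelyTofu(_ text: String) -> Bool {
    text.unicodeScalars.contains { scalar in
        switch scalar.properties.generalCategory {
        case .unassigned, .privateUse:
            return true
        default:
            return false
        }
    }
}

print(containsLikelyTofu("Hello, world")) // false
print(containsLikelyTofu("A\u{0378}B"))   // true (U+0378 is unassigned)
```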

UI Frameworks General Natural Language Core Text Live Text Writing Tools

270

Sep ’25

NLTagger.requestAssets hangs indefinitely

When I call NLTagger.requestAssets for some languages, it hangs indefinitely, both in the simulator and on a device. This happens consistently for some languages, such as Greek. An example call is NLTagger.requestAssets(for: .greek, tagScheme: .lemma). Other languages, such as French, return immediately. I captured some logs from Console and found what looks like repeated attempts to download the asset. I would expect the call to eventually terminate, either by loading the asset or by failing with an error.
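Until the hang itself is resolved, one defensive option is to stop waiting after a deadline. The sketch below wraps the completion-handler form of `NLTagger.requestAssets(for:tagScheme:completionHandler:)` with a client-side timeout; it does not cancel the underlying download, it only guarantees the caller hears back exactly once. `requestTaggerAssets`, `TaggerAssetOutcome`, and `OnceFlag` are hypothetical names.

```swift
import Foundation
import NaturalLanguage

/// Outcome of an asset request that may never call back.
enum TaggerAssetOutcome {
    case finished(NLTagger.AssetsResult)
    case failed(Error)
    case timedOut
}

/// Thread-safe flag so the completion handler runs exactly once.
final class OnceFlag: @unchecked Sendable {
    private let lock = NSLock()
    private var fired = false

    /// Returns true only for the first caller.
    func tryFire() -> Bool {
        lock.lock()
        defer { lock.unlock() }
        if fired { return false }
        fired = true
        return true
    }
}

/// Wraps NLTagger.requestAssets with a client-side timeout.
/// The download itself is not cancelled; we simply stop waiting for it.
func requestTaggerAssets(
    for language: NLLanguage,
    tagScheme: NLTagScheme,
    timeout: TimeInterval,
    completion: @escaping (TaggerAssetOutcome) -> Void
) {
    let once = OnceFlag()
    NLTagger.requestAssets(for: language, tagScheme: tagScheme) { result, error in
        guard once.tryFire() else { return } // timeout already reported
        if let error {
            completion(.failed(error))
        } else {
            completion(.finished(result))
        }
    }
    DispatchQueue.global().asyncAfter(deadline: .now() + timeout) {
        guard once.tryFire() else { return } // real result already reported
        completion(.timedOut)
    }
}
```

Called as `requestTaggerAssets(for: .greek, tagScheme: .lemma, timeout: 30) { print($0) }`, the caller receives `.timedOut` after 30 seconds if the system never responds. The lock-based flag is there because the asset callback and the timer can fire on different queues.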

Machine Learning & AI General Natural Language

281

May ’25

Named Entity Recognition Model for Measurements

In a macOS and iOS app under development, I need to identify various measurements in OCR'ed text: length, weight, counts per inch, area, and percentage. Both the unit type (e.g. UnitLength) and the measurement's unit (e.g. .inches) need to be identified in order to convert the measurement to the app's internal standard (e.g. centimetres), whose value is stored in the relevant Core Data entity. Using NLTagger and NLTokenizer is problematic because of the various representations of the measurements: e.g. "50g.", "50 g", "50 grams", "1 3/4 oz.". Currently, I use a bespoke algorithm based on String contains and step-wise evaluation of characters, which is reasonably accurate but requires frequent updating as further representations are detected. I'm aware that the Python spaCy model is capable of NER measurement recognition, but I'm reluctant to incorporate a Python-based solution into a production app (ref [https://developer.apple.com/forums/thread/30092]). My preference is for an open-source NER measurement model that can be used as, or converted to, some form of Swift-compatible machine learning model. Does anyone know of such a model?
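Not an NER model, but for the specific representations listed, a regular expression plus an alias table mapping unit spellings to Foundation units may be less brittle than character-by-character scanning. The sketch below handles mass only; the alias table, the pattern, and `parseMasses` are hypothetical and would need extending for length, area, and the other units.

```swift
import Foundation

/// Hypothetical alias table mapping surface forms to Foundation mass units.
/// Extend with further spellings as they turn up in OCR output.
let massUnitAliases: [String: UnitMass] = [
    "g": .grams, "gram": .grams, "grams": .grams,
    "kg": .kilograms, "kilogram": .kilograms, "kilograms": .kilograms,
    "oz": .ounces, "ounce": .ounces, "ounces": .ounces,
    "lb": .pounds, "lbs": .pounds, "pound": .pounds, "pounds": .pounds,
]

/// Number (integer or decimal), optionally followed by " a/b" (mixed number)
/// or "/b" (simple fraction), optional space, a unit word, optional period.
let massPattern = #"(\d+(?:\.\d+)?)(?:\s+(\d+)/(\d+)|/(\d+))?\s*([A-Za-z]+)\.?"#

/// Extracts mass measurements such as "50g.", "50 g", "50 grams" or "1 3/4 oz."
func parseMasses(in text: String) -> [Measurement<UnitMass>] {
    guard let regex = try? NSRegularExpression(pattern: massPattern) else { return [] }
    let nsText = text as NSString
    var results: [Measurement<UnitMass>] = []
    for match in regex.matches(in: text, range: NSRange(location: 0, length: nsText.length)) {
        // Text of capture group `index`, or nil if the group didn't participate.
        func group(_ index: Int) -> String? {
            let range = match.range(at: index)
            return range.location == NSNotFound ? nil : nsText.substring(with: range)
        }
        guard let wholeText = group(1), var value = Double(wholeText),
              let unitText = group(5), let unit = massUnitAliases[unitText.lowercased()]
        else { continue } // not a number followed by a known mass unit
        if let numerator = group(2).flatMap({ Double($0) }),
           let denominator = group(3).flatMap({ Double($0) }), denominator != 0 {
            value += numerator / denominator      // "1 3/4" -> 1.75
        } else if let denominator = group(4).flatMap({ Double($0) }), denominator != 0 {
            value /= denominator                  // "3/4" -> 0.75
        }
        results.append(Measurement(value: value, unit: unit))
    }
    return results
}
```

Each result is a `Measurement<UnitMass>`, so converting to the app's internal standard is a call such as `measurement.converted(to: .grams)`. For most other unit types mainly the alias table changes, though symbols such as `%` would need the unit group of the pattern widened.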

Machine Learning & AI General Swift Natural Language Machine Learning

176

Mar ’25

Urdu Language Keyboard Bug

Hello Apple,

I've been using iOS for many years and never had any issues with the Urdu keyboard, but since the new 18.4 beta update, some words don't work correctly. For example, my friend's name is "راعنیہ", but the updated version can't type it as one word and keeps separating it as "راعنی ہ". It's very frustrating to use, and it's not just this one word: there are many other words it can't handle properly. Also, the new font and the no-gap concept strain my eyes while reading or even typing. I hope Apple fixes this as soon as possible.

Thank you

Media Technologies General HTML Natural Language

522

Feb ’25

Create ML Trouble Loading CSV to Train Word Tagger With Commas in Training Data

I'm using Numbers to build a spreadsheet that I export as a CSV. I then import this file into Create ML to train a word tagger model. Everything has worked fine for all the models I've trained so far, but now I'm running into a case that breaks the import process: commas within the training data. None of Apple's examples show this case.

My project takes Navajo text that has been tokenized by syllable and labels the parts of speech.

Case that works

Raw text: Naaltsoos yídéeshtah.
Tokens column:

    Naal,tsoos, ,yí,déesh,tah,.

Labels column:

    NObj,NObj,Space,Verb,Verb,VStem,Punct

Case that breaks

Raw text: óola, béésh łigaii, tłʼoh naadą́ą́ʼ, wáin, akʼah, dóó á,shįįh

Tokens column with commas quoted (Create ML reports mismatched columns):

    óo,la,",", ,béésh, ,łi,gaii,",", ,tłʼoh, ,naa,dą́ą́ʼ,",", ,wáin,",", ,a,kʼah,",", ,dóó, ,á,shįįh

Tokens column with commas escaped (Create ML reports mismatched columns):

    óo,la,\,, ,béésh, ,łi,gaii,\,, ,tłʼoh, ,naa,dą́ą́ʼ,\,, ,wáin,\,, ,a,kʼah,\,, ,dóó, ,á,shįįh

Tokens column with commas escape-quoted using backslashes (record not detected by Create ML):

    óo,la,\",\", ,béésh, ,łi,gaii,\",\", ,tłʼoh, ,naa,dą́ą́ʼ,\",\", ,wáin,\",\", ,a,kʼah,\",\", ,dóó, ,á,shįįh

Tokens column with commas escape-quoted using doubled quotes (Create ML reports mismatched columns):

    óo,la,"","", ,béésh, ,łi,gaii,"","", ,tłʼoh, ,naa,dą́ą́ʼ,"","", ,wáin,"","", ,a,kʼah,"","", ,dóó, ,á,shįįh

Labels column:

    NSub,NSub,Punct,Space,NSub,Space,NSub,NSub,Punct,Space,NSub,Space,NSub,NSub,Punct,Space,NSub,Punct,Space,NSub,NSub,Punct,Space,Conj,Space,NSub,NSub

Solution needed

It's simple enough to escape commas within CSV files, but the format Create ML needs essentially packs an entire comma-separated list into a single column, so I end up needing a CSV record that mixes commas used as delimiters with commas used as literal characters. That's where this gets complicated. For this particular case (which seems like it would come up often when training a word tagger model), how should I properly escape a literal comma?
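One way to sidestep CSV escaping entirely: Create ML's word tagger can also take its training data as JSON, where each record holds a `tokens` array and a `labels` array, so a comma token is just the string `","`. The sketch below assumes that JSON layout (the column names are picked when setting up the model, so `tokens` and `labels` are assumptions) and encodes the first few tokens of the failing example. `TaggedSentence` and `makeWordTaggerJSON` are hypothetical names.

```swift
import Foundation

/// One training record: parallel arrays of tokens and their labels.
struct TaggedSentence: Codable {
    let tokens: [String]
    let labels: [String]
}

enum TrainingDataError: Error {
    case lengthMismatch(index: Int)
}

/// Encodes sentences as a JSON array, checking that every token has a label.
func makeWordTaggerJSON(_ sentences: [TaggedSentence]) throws -> Data {
    for (index, sentence) in sentences.enumerated() where sentence.tokens.count != sentence.labels.count {
        throw TrainingDataError.lengthMismatch(index: index)
    }
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
    return try encoder.encode(sentences)
}

// First five tokens of the failing example; the comma is an ordinary string.
let example = TaggedSentence(
    tokens: ["óo", "la", ",", " ", "béésh"],
    labels: ["NSub", "NSub", "Punct", "Space", "NSub"]
)
let json = try makeWordTaggerJSON([example])
print(String(decoding: json, as: UTF8.self))
```

The length check catches the same kind of problem Create ML reports as mismatched columns, but before import.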

Machine Learning & AI Create ML Natural Language Machine Learning Create ML TabularData

864

Jan ’25

NLModel won't initialize in MessageFilterExtension

I'm trying to create an NLModel within a MessageFilterExtension handler. The code works fine in the main app, but when I try to use it in the extension, the model fails to initialize. Even this single line fails with the error below (SMS_Classifier is the class Xcode generated for my model); the same line works fine in the main app:

    let mlModel = try SMS_Classifier(configuration: MLModelConfiguration()).model

Error:

    Unable to locate Asset for contextual word embedding model for local en.
    MLModelAsset: load failed with error Error Domain=com.apple.CoreML Code=0 "initialization of text classifier model with model data failed" UserInfo={NSLocalizedDescription=initialization of text classifier model with model data failed}

Any ideas?
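The error mentions a contextual word embedding asset, which suggests the classifier was trained with a contextual-embedding algorithm whose system assets are not being found inside the extension; that is an inference, not a confirmed cause. Whatever the cause, the extension can at least fail gracefully. The sketch below loads the compiled model as an `NLModel` and returns nil on failure so the filter can fall back to not filtering. It assumes the compiled `.mlmodelc` is a member of the extension target; `loadTextClassifier` and `classify` are hypothetical names.

```swift
import Foundation
import NaturalLanguage

/// Loads a compiled Core ML text classifier from `bundle` as an NLModel.
/// In an app extension, Bundle.main is the extension's own bundle.
/// Returns nil instead of throwing so the caller can fall back safely.
func loadTextClassifier(resourceName: String, bundle: Bundle = .main) -> NLModel? {
    guard let url = bundle.url(forResource: resourceName, withExtension: "mlmodelc") else {
        print("Compiled model \(resourceName).mlmodelc not found in \(bundle.bundlePath)")
        return nil
    }
    do {
        return try NLModel(contentsOf: url)
    } catch {
        // An asset-loading failure like the one in the post would surface here.
        print("NLModel failed to load: \(error)")
        return nil
    }
}

/// Returns the predicted label, or nil if the model is unavailable.
func classify(_ messageBody: String, with model: NLModel?) -> String? {
    model?.predictedLabel(for: messageBody)
}
```

If the failure does come from the embedding assets, retraining the classifier with an algorithm that does not rely on contextual embeddings may be worth trying.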

Machine Learning & AI General Natural Language

1.1k

Jan ’25

Adding Central Kurdish language to VoiceOver functionality

We are requesting the addition of Central Kurdish language support for Text-to-Speech (TTS) and VoiceOver on Apple products. Our TTS model achieves 99.9% accuracy, making it a reliable tool for this purpose. This would bring meaningful benefits to over 10,000 visually impaired and more than 40,000 illiterate individuals in the Kurdistan Region of Iraq, helping them access digital information, navigate devices, and perform tasks more independently. Central Kurdish VoiceOver support would significantly improve accessibility and quality of life for these individuals, promoting inclusivity and digital literacy in the region.

Accessibility & Inclusion General Natural Language

594

Nov ’24

NLTagger not filtering words such as "And, to, a, in"

What am I not understanding here? In short, the view loads text from the JSON descriptions, should filter out these words, and then return and display a list of the most-used words. Debugging shows the words being identified by the code, but they are not filtered out.

    private func loadWordCounts() {
        DispatchQueue.global(qos: .background).async {
            let fileManager = FileManager.default
            guard let documentsDirectory = try? fileManager.url(for: .documentDirectory, in: .userDomainMask, appropriateFor: nil, create: false) else {
                return
            }
            let descriptions = loadDescriptions(fileManager: fileManager, documentsDirectory: documentsDirectory)
            var counts = countWords(in: descriptions)
            let tagsToRemove: Set<NLTag> = [
                .verb, .pronoun, .determiner, .particle, .preposition, .conjunction, .interjection, .classifier
            ]
            for (word, _) in counts {
                let tagger = NLTagger(tagSchemes: [.lexicalClass])
                tagger.string = word
                let (tag, _) = tagger.tag(at: word.startIndex, unit: .word, scheme: .lexicalClass)
                if let unwrappedTag = tag, tagsToRemove.contains(unwrappedTag) {
                    counts[word] = 0
                }
            }
            DispatchQueue.main.async {
                self.wordCounts = counts
            }
        }
    }
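In the posted loop, a matching word is set to 0 rather than removed (`counts[word] = 0`), so it most likely stays in the dictionary and still appears in the list; `counts.removeValue(forKey: word)` would drop it. Tagging words one at a time also gives the tagger no sentence context, which can make tags less reliable. The sketch below instead tags the full text once and never counts excluded words; `contentWordCounts` is a hypothetical name, and the default excluded set mirrors the one in the post.

```swift
import NaturalLanguage

/// Counts words in `text`, skipping words whose lexical class is excluded.
/// Tagging the whole text at once lets the tagger use sentence context.
func contentWordCounts(
    in text: String,
    excluding excludedTags: Set<NLTag> = [.verb, .pronoun, .determiner, .particle,
                                          .preposition, .conjunction, .interjection, .classifier]
) -> [String: Int] {
    let tagger = NLTagger(tagSchemes: [.lexicalClass])
    tagger.string = text
    var counts: [String: Int] = [:]
    let options: NLTagger.Options = [.omitWhitespace, .omitPunctuation, .omitOther]
    tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word,
                         scheme: .lexicalClass, options: options) { tag, range in
        if let tag, excludedTags.contains(tag) {
            return true // skip the word entirely instead of storing a 0 count
        }
        counts[text[range].lowercased(), default: 0] += 1
        return true
    }
    return counts
}
```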

Machine Learning & AI General Natural Language

625

Oct ’24

iOS 17.4.1 and later, system language Chinese, NFC turned off: NFC popup invoked in the app displays in English

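On iOS 17.4.1 and later, with the system language set to Chinese and system NFC turned off, the CoreNFC popup invoked from within the app displays in English. (It should display in Chinese; on iOS 17.4.1 it displayed correctly in Chinese.)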

App & System Services General Natural Language

1.1k

Aug ’24

iOS 18.1 beta - App crashes at runtime while using Translation.TranslationError in project

I'm trying to cast the error thrown by TranslationSession.translations(from:) to Translation.TranslationError. However, the app crashes at runtime whenever Translation.TranslationError is used in the project.

Environment:
iOS version: 18.1 beta
Xcode version: 16 beta

    dyld[14615]: Symbol not found: _$s11Translation0A5ErrorVMa
    Referenced from: <3426152D-A738-30C1-8F06-47D2C6A1B75B> /private/var/containers/Bundle/Application/043A25BC-E53E-4B28-B71A-C21F77C0D76D/TranslationAPI.app/TranslationAPI.debug.dylib
    Expected in: /System/Library/Frameworks/Translation.framework/Translation

Machine Learning & AI Core ML ML Compute Natural Language Live Text Apple Intelligence

1.3k

Aug ’24

Is it possible to convert a model trained with CRFSuite to NLModel / CoreML?

I was wondering whether there is a quick way to convert a model trained with the open-source CRFSuite for use with NLTagger. It seems like retraining should be possible, but is automatic conversion supported?
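This does not convert the CRFSuite model; it is the retraining route, sketched with Create ML's `MLWordTagger` on macOS. It assumes the CRFSuite training sequences have been exported as JSON records with parallel `tokens` and `labels` arrays. Note that `MLWordTagger` learns its own features, so CRFSuite feature templates do not carry over. `trainWordTagger` is a hypothetical name.

```swift
import CreateML
import Foundation

/// Trains an MLWordTagger from a JSON file of {"tokens": [...], "labels": [...]}
/// records, writes it as an .mlmodel, and returns accuracy on a held-out split.
func trainWordTagger(from trainingURL: URL, to outputURL: URL) throws -> Double {
    let data = try MLDataTable(contentsOf: trainingURL)
    // Hold out roughly 20% of the records for evaluation.
    let (trainingData, testingData) = data.randomSplit(by: 0.8, seed: 5)
    let tagger = try MLWordTagger(trainingData: trainingData,
                                  tokenColumn: "tokens",
                                  labelColumn: "labels")
    let metrics = tagger.evaluation(on: testingData, tokenColumn: "tokens", labelColumn: "labels")
    try tagger.write(to: outputURL)
    return 1.0 - metrics.classificationError
}
```

After the written `.mlmodel` is compiled (Xcode does this when the model is added to a target), it can be loaded with `NLModel(contentsOf:)` and attached to a tagger through `setModels(_:forTagScheme:)` under a custom tag scheme.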

Machine Learning & AI General Natural Language Core ML

1.6k

Jun ’24

Can't get Text Input Source region designator

I have a macOS application with a minimum deployment target of macOS 12.0. I need to get the current keyboard's region designator. For example, if the user selects the English Canadian input source, I want the result to be the en-CA locale identifier. I get the current keyboard language with the following code:

    func keyboardLanguage() -> String? {
        let keyboard = TISCopyCurrentKeyboardInputSource().takeRetainedValue()
        let languagesPtr = TISGetInputSourceProperty(keyboard, kTISPropertyInputSourceLanguages)!
        let languages = Unmanaged<AnyObject>.fromOpaque(languagesPtr).takeUnretainedValue() as? [String]
        return languages?.first
    }

This returns the language as en, but I don't see how to get the region from Text Input Sources. I can get the input source ID:

    let keyboard = TISCopyCurrentKeyboardInputSource().takeRetainedValue()
    let idPtr = TISGetInputSourceProperty(keyboard, kTISPropertyInputSourceID)!
    let id = Unmanaged<AnyObject>.fromOpaque(idPtr).takeUnretainedValue() as? String
    print(String(describing: id))

This prints com.apple.keylayout.Canadian, which points to the Canadian region but is not a region designator. I could possibly parse this ID and map it to a region designator, but first, I'm not sure I would capture all of the regions, and second, what happens if the format of the ID changes? If someone can point me to the correct API to use, it would be much appreciated.

Accessibility & Inclusion General Internationalization Natural Language Localization

Apr ’24

Getting a list of words recognized by Speech

Is there a way to extract the list of words recognized by the Speech framework? I'm trying to filter out words that won't appear in the transcription output, but to do that I need a list of the words that can appear. SFSpeechLanguageModel.Configuration can be initialized with a vocabulary, but there doesn't seem to be a way to read it back, and while there are ways to create custom vocabularies, I have yet to find a way to retrieve one. I added the Natural Language tag in case that framework might contribute to a solution.

Machine Learning & AI General Speech Natural Language

910

Feb ’24

WeatherKit localization options

I am working on an app that pulls data from WeatherKit, including the conditionCode property, whose content is displayed to the user. I want to localize the data pulled from WeatherKit, but when pulling data from:

    weatherkit.apple.com/api/v1/weather/de/{latitude}/{longitude}

the conditionCode and other strings are in English. The same is true if the language parameter is set to es, ja, or something else. Am I doing something wrong, or is localization not yet supported in WeatherKit? I can't find any documentation on this.
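The conditionCode values appear to be identifiers (for example `Clear` or `MostlyCloudy`) rather than display text, so one approach is to localize them in the app. The sketch below looks up a hypothetical key scheme, `condition.<code>`, in the app's strings files and falls back to splitting the camel-cased code into words; `localizedConditionText` is a hypothetical name.

```swift
import Foundation

/// Display text for a conditionCode such as "MostlyCloudy".
/// Looks up "condition.<code>" in the bundle's Localizable.strings (hypothetical
/// key scheme); if no translation exists, splits the camel-cased code into words.
func localizedConditionText(for conditionCode: String, bundle: Bundle = .main) -> String {
    let fallback = conditionCode.replacingOccurrences(
        of: "([a-z])([A-Z])", with: "$1 $2", options: .regularExpression)
    return bundle.localizedString(forKey: "condition.\(conditionCode)", value: fallback, table: nil)
}
```

With a `de` strings file containing `"condition.MostlyCloudy" = "Überwiegend bewölkt";`, German users would see that text; without one, they see `Mostly Cloudy`.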

Accessibility & Inclusion General Localization Natural Language Internationalization WeatherKit

3.4k

Jan ’24

NLTagger does not enumerate anymore?

I am using NLTagger to tag the lexical classes of words, but it suddenly stopped working. I boiled my code down to the most basic version, but it never executes the closure of the enumerateTags() function. What do I have to change, or what should I try?

    for e in sentenceArray {
        let cupcake = "I like you, have a cupcake"
        tagger.string = cupcake
        tagger.enumerateTags(in: cupcake.startIndex..<cupcake.endIndex, unit: .word, scheme: .nameTypeOrLexicalClass) { tag, range in
            print("TAG")
            return true
        }
    }
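One thing worth checking, as an assumption rather than a confirmed cause: whether `tagger` was created with `.nameTypeOrLexicalClass` among its `tagSchemes`, since the snippet doesn't show how it was initialized. The sketch below checks that before enumerating and also prints which schemes the system reports for English words; `wordTags` is a hypothetical name.

```swift
import NaturalLanguage

/// Returns "word: tag" strings for each word in `text`.
/// Assumption: enumerateTags only reports tags for schemes the tagger was
/// created with, so the scheme is checked against `tagger.tagSchemes` first.
func wordTags(in text: String, scheme: NLTagScheme, tagger: NLTagger) -> [String] {
    guard tagger.tagSchemes.contains(scheme) else {
        print("Tagger schemes are \(tagger.tagSchemes.map(\.rawValue)); \(scheme.rawValue) is not among them")
        return []
    }
    tagger.string = text
    var tags: [String] = []
    let options: NLTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]
    tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word,
                         scheme: scheme, options: options) { tag, range in
        tags.append("\(text[range]): \(tag?.rawValue ?? "none")")
        return true // keep enumerating
    }
    return tags
}

// Schemes the system supports for English words on this OS version.
print(NLTagger.availableTagSchemes(for: .word, language: .english).map(\.rawValue))

let tagger = NLTagger(tagSchemes: [.nameTypeOrLexicalClass])
print(wordTags(in: "I like you, have a cupcake", scheme: .nameTypeOrLexicalClass, tagger: tagger))
```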

Machine Learning & AI General Natural Language

2.4k

Jan ’24

iOS: Get next word predictions

Does iOS provide an API for getting text predictions based on previous text? I tried UITextChecker.completions as follows:

    let str = "Hello"
    let range = NSMakeRange(str.utf16.count, 0)
    let tc = UITextChecker()
    let completions = tc.completions(forPartialWordRange: range, in: str, language: "en-US")
    print(completions)

However, this only works for completing words, not sentences. Does iOS have a way of doing this? I read somewhere that macOS does. If not, what workarounds or alternatives would you recommend?
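`UITextChecker` only completes a partial word, so it won't predict the next word. In the posted code the range has zero length at the end of the string, while the method's parameter describes the range of the partial word to complete, so a range that covers the word may behave differently. The sketch below completes the last partial word of a longer text; `completionsForLastWord` is hypothetical, and the `en_US` default is an assumption (check `UITextChecker.availableLanguages` for the identifiers on a given device).

```swift
import UIKit

/// UITextChecker completions for the last (partial) word of `text`.
/// Still word completion, not next-word prediction.
func completionsForLastWord(in text: String, language: String = "en_US") -> [String] {
    let nsText = text as NSString
    // The trailing word starts just after the last whitespace character.
    let lastSpace = nsText.rangeOfCharacter(from: .whitespacesAndNewlines, options: .backwards)
    let start = lastSpace.location == NSNotFound ? 0 : lastSpace.location + lastSpace.length
    let partialWordRange = NSRange(location: start, length: nsText.length - start)
    guard partialWordRange.length > 0 else { return [] } // nothing to complete
    return UITextChecker().completions(forPartialWordRange: partialWordRange,
                                       in: text, language: language) ?? []
}

print(UITextChecker.availableLanguages)
print(completionsForLastWord(in: "Hello, how are you tod"))
```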