Bring your LLM provider to the Foundation Models framework

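Extend the Foundation Models framework by implementing a LanguageModelExecutor for a new model. Learn how to connect to a LanguageModelSession's transcript, manage session state effectively, and optimize your use of the KV cache. We'll also look at how to support custom segment types and take advantage of advanced generative AI features.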

Chapters
- 0:00 - Introduction
- 3:37 - Packaging
- 4:48 - Protocol
- 14:50 - Authentication
- 15:51 - Customization
- 19:47 - Next steps

WWDC26
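Hi! I'm Christopher Webb, an engineer on the Machine Learning Research team, and today I'm excited to talk about a new way to use the Foundation Models framework. We introduced the Foundation Models framework previously to give you access to Apple's on-device language model. Now we've extended the framework to work with almost any LLM, local or server-based, from anyone, from large companies to individual developers. You can use the framework to easily build your own model integrations.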
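We rebuilt the on-device system language model from the ground up. It's smarter, follows instructions better, and can take images directly in the prompt. In addition to the system model, we've added three new options. Private Cloud Compute is the model behind many Apple Intelligence features; it offers reasoning, a 32K-token context window, and privacy guarantees. Core AI lets you run local models efficiently, including on the Apple Neural Engine (ANE). MLX gives you access to thousands of models on Hugging Face through MLX-Community.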
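All of these are built on a new public protocol, so developers can bring cutting-edge AI models into their apps using the same framework. Anthropic and Google will soon extend the Foundation Models framework with their own Swift packages, bringing frontier Claude and Gemini models to every Swift developer. Whichever model you use, whether it's Apple's, your own, or one from the community, you call it the same way, because every model conforms to the LanguageModel protocol.
For app developers, I'll show how to call these models through a familiar API. For model providers, I'll explain how to create your own LanguageModel package.
First, a preview of how easy it is. Here's the on-device Foundation Model: create it, pass it to a session, and call the respond function. There are more model options, too. If you need more capability, try Private Cloud Compute; you just swap the model. If you want to ship your own model, just point Core AI at your resources. And if you want to try the latest open-source models, just pass a model ID and the framework handles the rest.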
Models built on the LanguageModel protocol get all of the Foundation Models framework's great features, such as Dynamic Profiles. For an overview of what's been added, see "What's new in the Foundation Models framework."
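You can switch models this easily because every LanguageModel follows the same protocol: the system language model, PCC, Core AI, MLX, and community-built models alike. Model providers, please join in! Here's how.
There are four steps to bringing your model to the framework. First, packaging: a well-crafted Swift package makes adoption easy for developers. Second, implementing the protocol: you define a type that describes your model and an executor that runs it. Third, implementation practices, including authentication best practices for server-based models. Finally, customization: if you need to adjust something, you can adapt the protocol's building blocks to your needs, anything from attaching response metadata to defining an entirely new modality.
Let's start with packaging. We recommend Swift Package Manager, so developers can easily add your package as a dependency of their app. I'll cover how to set up Package.swift and how to publish releases.
One important consideration is which platforms you support. The Foundation Models framework supports iOS, macOS, visionOS, and watchOS so developers can build a wide range of experiences, and we recommend offering the same support. Because the Foundation Models framework is being released as open source, your package may also be useful to developers deploying Swift on servers, so consider supporting Linux too. A third consideration is dependencies. Every dependency adds bytes that developers ship to their users, so think carefully about what you link into your package.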
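Publishing your package is as simple as creating a git tag. Swift Package Manager is decentralized, so your repository URL is your distribution channel: developers paste the URL into Xcode, start working with your model, and integrate it into their apps. For details, see "Creating Swift Packages."
Once your package is ready, move on to the protocol. The protocol is the bridge between your model and the Foundation Models framework, and it has two key parts. The first is LanguageModel, which describes your model to the framework. It declares the model's features through capabilities and provides the configuration the framework needs to set up the model's executor.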
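The second is LanguageModelExecutor, where the actual work happens. It has an initializer that takes a Configuration, a prewarm function that prepares resources before the first request, and a respond function that streams generated output back to the session.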
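Configuration is what connects the two types: the model provides it, and the framework uses it to build the executor. Now that you've seen the protocols in code, let's look at how a model's configuration works together with its executor.
Each session has an executor store. When Model1 arrives, the framework looks up the store using the model's configuration but finds no matching executor, so the LanguageModelSession creates a new executor and stores it. Model2 produces the same configuration, and because Configuration is Hashable, the framework recognizes the match and returns the same executor. The lookup key is the configuration, not the model. Model3 produces a different configuration, so it gets its own executor. Each unique configuration corresponds to exactly one executor in the store.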
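To make the lookup concrete, here is a minimal sketch of a per-session store keyed by a Hashable configuration. ExecutorConfiguration, SketchExecutor, and ExecutorStore are hypothetical stand-ins, not framework types; the session manages its own store, so a package doesn't write this code itself.

```swift
// Hypothetical stand-in for an executor configuration; not the framework's type.
struct ExecutorConfiguration: Hashable {
    var modelName: String
    var contextSize: Int
}

// Stand-in executor: in a real package this would hold weights or a connection.
final class SketchExecutor {
    let configuration: ExecutorConfiguration

    init(configuration: ExecutorConfiguration) {
        self.configuration = configuration
    }
}

// Per-session store keyed by configuration, not by model.
final class ExecutorStore {
    private var executors: [ExecutorConfiguration: SketchExecutor] = [:]
    private(set) var creationCount = 0

    func executor(for configuration: ExecutorConfiguration) -> SketchExecutor {
        // Equal configurations hash to the same key, so they share one executor.
        if let existing = executors[configuration] {
            return existing
        }
        let created = SketchExecutor(configuration: configuration)
        executors[configuration] = created
        creationCount += 1
        return created
    }
}

let store = ExecutorStore()
let first = store.executor(for: ExecutorConfiguration(modelName: "base", contextSize: 4096))   // like Model1: created
let second = store.executor(for: ExecutorConfiguration(modelName: "base", contextSize: 4096))  // like Model2: reused
let third = store.executor(for: ExecutorConfiguration(modelName: "base", contextSize: 32768))  // like Model3: new executor
```

Two models that produce equal configurations end up sharing one executor, which is why Configuration has to be Hashable.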
What does this look like in code? Here's an example LanguageModel implementation. It declares its capabilities and returns the configuration the framework uses to find its executor.
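The executor does the real work: loading weights, managing resources, and streaming tokens to the session. The framework builds it from the configuration the model provides and passes the model in with every request. This separation keeps constructing a model simple.
When a session is released, its store is released too. Every stored executor is released, its deinit runs, weights are freed, and connections are closed. It all happens automatically; you don't have to write any cleanup code yourself.
Within that lifecycle, the executor has one more function: prewarm. Before a request arrives, a developer can ask the framework to prewarm. It's your chance to do expensive setup ahead of time, such as loading weights or opening connections, anything that could slow down the first response. Let's see how to use it. One approach is to put that setup in a private helper that loads the weights once and caches them. The prewarm function calls the helper right away, so the weights are ready before the first request arrives. However, prewarm isn't guaranteed to run, so respond calls the same helper.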
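The lifecycle above can be modeled in a few lines. This sketch assumes a hypothetical CachingExecutor whose private helper loads stand-in weights once: prewarm just calls the helper early, respond calls it too in case prewarm never ran, and deinit fires when the last reference goes away (in the framework, that's the session's store). ResourceLog is a hypothetical counter used only to observe loads and releases.

```swift
// Hypothetical names; this models the lifecycle, not the framework's API.
final class ResourceLog {
    var weightLoads = 0
    var releases = 0
}

final class CachingExecutor {
    private let log: ResourceLog
    private var weights: [Float]?  // stands in for loaded model weights

    init(log: ResourceLog) {
        self.log = log
    }

    // Private helper: loads the weights once and returns the cached copy afterward.
    private func loadWeightsIfNeeded() -> [Float] {
        if let cached = weights { return cached }
        log.weightLoads += 1
        let loaded = [Float](repeating: 0, count: 4)
        weights = loaded
        return loaded
    }

    // Optional head start; nothing may depend on this having run.
    func prewarm() {
        _ = loadWeightsIfNeeded()
    }

    func respond(to prompt: String) -> String {
        let loaded = loadWeightsIfNeeded()  // reuses the cache if prewarm already loaded it
        return "\(prompt) [\(loaded.count) weights]"
    }

    // Runs automatically once the owner releases the executor; free resources here.
    deinit {
        log.releases += 1
    }
}

let prewarmedLog = ResourceLog()
do {
    let executor = CachingExecutor(log: prewarmedLog)
    executor.prewarm()
    _ = executor.respond(to: "Hello")
    _ = executor.respond(to: "Again")
}  // the last reference ends with this scope, so deinit has run by here

let coldLog = ResourceLog()
do {
    let executor = CachingExecutor(log: coldLog)  // prewarm is never called
    _ = executor.respond(to: "Hello")
}
```

Either way the weights load exactly once, and no explicit cleanup call is needed.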
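Either way, the weights are loaded only once. If your executor has no expensive setup, as with a server-backed model, prewarm can simply do nothing.
When respond is called, the executor gets to work. It converts the conversation's transcript into the format the model expects, applies the options the developer set, and streams generation events to the session.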
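From the developer's side, the session is the single point of contact for the whole interaction: initialize the model, create a session, call respond, and await the result. The executor and everything else in your package sit inside the session, out of sight. Developers never see the machinery, but here's what's happening behind the scenes. The framework passes in transcript entries, but the inference engine only handles its own native types. So the executor sits in the middle, converting entries into messages the inference engine understands and passing them along for inference. When the inference engine responds, the same translation runs in reverse: messages are converted back into transcript entries and streamed to the session.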
First, let's focus on the transcript, which is what flows into and out of the executor.
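The transcript is the conversation so far, represented as a sequence of entries. Each entry has a role: Instructions set by the developer, Prompts from the user, tool calls made by the model, the outputs those tools returned, and Responses generated by the model.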
At a high level, the executor's job is to convert each transcript entry into a message the inference engine can read. So what's inside a transcript? The Foundation Models framework defines six entry types.
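Your model defines its own roles, and whatever your model's format, the executor's job is to map between the two. In this example, Instructions, Prompt, and Response map to system, user, and assistant. Tool calls, tool outputs, and reasoning also map to assistant: they're part of what the model did during its turn, and this model has no dedicated roles for them. If your model defines a dedicated tool role, you can route them there instead. Either way, the executor is in control.
The executor reads the conversation, but each request carries more than history. It also carries the developer's intent for how the model should respond, expressed in two additional properties.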
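Before looking at those two properties, here's a minimal sketch of the role mapping just described, using simplified stand-in types rather than the framework's Transcript entries. Entry, Role, Message, and messages(from:engineHasToolRole:) are hypothetical; the sketch routes only tool outputs to a dedicated tool role when the engine has one, which is one reasonable choice among several.

```swift
// Simplified stand-in for transcript entries; the framework's real entry types differ.
enum Entry {
    case instructions(String)
    case prompt(String)
    case toolCalls(String)
    case toolOutput(String)
    case response(String)
    case reasoning(String)
}

// Roles a hypothetical chat-style inference engine accepts.
enum Role {
    case system, user, assistant, tool
}

struct Message {
    var role: Role
    var content: String
}

// Tool calls and reasoning belong to the model's turn, so they map to `assistant`.
// Tool outputs use a dedicated `tool` role only when the engine defines one.
func messages(from entries: [Entry], engineHasToolRole: Bool = false) -> [Message] {
    entries.map { entry -> Message in
        switch entry {
        case .instructions(let text):
            return Message(role: .system, content: text)
        case .prompt(let text):
            return Message(role: .user, content: text)
        case .response(let text), .toolCalls(let text), .reasoning(let text):
            return Message(role: .assistant, content: text)
        case .toolOutput(let text):
            return Message(role: engineHasToolRole ? .tool : .assistant, content: text)
        }
    }
}

let conversation: [Entry] = [
    .instructions("You are a helpful assistant"),
    .prompt("What's the weather in Pittsburgh?"),
    .toolCalls("getWeather(location: \"Pittsburgh\")"),
    .toolOutput("65°F, sunny"),
    .response("It's 65°F and sunny in Pittsburgh"),
]
let roles = messages(from: conversation).map(\.role)
```

The reverse direction, turning engine output back into entries, would be a second function with the same table read the other way.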
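Every request object can include ContextOptions and GenerationOptions. ContextOptions control what goes into the prompt, such as the reasoning level you want the model to use or a schema for the response. GenerationOptions control the decoder loop: the sampling strategy, temperature, maximum response length, and so on.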
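Here is a sketch of applying generation options when building a request for a service with a narrower API than the framework's. GenerationSettings, ServiceRequest, OptionError, and the minimumTokensForSchema parameter are hypothetical; a real executor would read request.generationOptions and, when it can't comply, throw LanguageModelError rather than a custom type.

```swift
// Hypothetical stand-ins; the framework's GenerationOptions and your service's API differ.
enum Sampling {
    case greedy
    case random
}

struct GenerationSettings {
    var sampling: Sampling? = nil
    var temperature: Double? = nil
    var maximumResponseTokens: Int? = nil
}

// What an imagined service accepts: only a temperature and an output-token cap.
struct ServiceRequest: Equatable {
    var temperature: Double = 1.0
    var maxOutputTokens: Int? = nil
}

enum OptionError: Error, Equatable {
    case budgetTooSmall(needed: Int, budget: Int)
}

// Applies the developer's options: approximate where that stays honest, throw where it can't.
func makeServiceRequest(
    from settings: GenerationSettings,
    minimumTokensForSchema: Int? = nil
) throws -> ServiceRequest {
    var request = ServiceRequest()
    if settings.sampling == .greedy {
        // Approximation: the service has no greedy mode; temperature 0 is the closest equivalent.
        request.temperature = 0
    } else if let temperature = settings.temperature {
        request.temperature = temperature
    }
    if let budget = settings.maximumResponseTokens {
        if let needed = minimumTokensForSchema, budget < needed {
            // No honest approximation satisfies both the budget and the schema's required fields.
            throw OptionError.budgetTooSmall(needed: needed, budget: budget)
        }
        request.maxOutputTokens = budget
    }
    return request
}
```

For example, greedy sampling with any temperature yields a request with temperature 0, while a 10-token budget against a schema needing 50 tokens throws.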
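Here's how that looks inside respond: both options arrive with the request, and the executor pulls them out and passes them along when it calls the model. That's everything on the receiving side: the transcript and the options, all parsed.
Next is the response that developers see. On the response side, you send several things: the text the inference engine generates, tool calls and reasoning, and the metadata that goes with them, all as events on the channel. Each chunk the inference engine emits, whether a token or a fragment of a tool call, becomes an event such as textDelta or toolCallDelta, and the framework writes them into the transcript. The Foundation Models framework offers both one-shot and streaming responses, but your implementation always streams; the one-shot API collects the deltas internally.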
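One way to see why a one-shot API can sit on top of an always-streaming executor: in this sketch, the executor side yields one event per chunk (with metadata and usage first, the order this session recommends), and a one-shot caller simply concatenates the text deltas. GenerationEvent, streamEvents(for:), and collectText(from:) are hypothetical and rely only on the standard library's AsyncStream.

```swift
// Hypothetical event type; the framework's channel events have different names.
enum GenerationEvent {
    case metadata([String: String])
    case usage(promptTokens: Int)
    case textDelta(String)
    case toolCallDelta(String)
}

// Executor side: always stream, one event per chunk, metadata and usage first.
func streamEvents(for chunks: [String]) -> AsyncStream<GenerationEvent> {
    AsyncStream { continuation in
        continuation.yield(.metadata(["modelID": "my-model", "requestID": "request-1"]))
        continuation.yield(.usage(promptTokens: 12))
        for chunk in chunks {
            continuation.yield(.textDelta(chunk))
        }
        continuation.finish()
    }
}

// One-shot side: the complete response is just the text deltas, collected in order.
func collectText(from events: AsyncStream<GenerationEvent>) async -> String {
    var text = ""
    for await event in events {
        if case .textDelta(let delta) = event {
            text += delta
        }
    }
    return text
}

let fullResponse = await collectText(from: streamEvents(for: ["It's ", "65°F ", "and sunny."]))
```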
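So far we've looked at this from the model's side: events go out as the model generates. Now let's step into the developer's shoes for a moment. They've called respond and are waiting. What do they need first?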
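That's the executor's handshake with the developer, and it follows a deliberate order. First, a metadata update with the model ID and request ID, which developers can use for logging and debugging. Next, a usage update with the prompt token count, for billing. Sending these first lets developers know what each request costs without waiting for the whole stream. Finally, for each token the model generates, send a text delta the moment it arrives. The framework streams those deltas to the session as they come in, so users see the response word by word instead of all at once.
Earlier I explained how the framework caches executors by configuration. For stateful integrations, ones that keep a KV cache or a persistent session across calls, that caching lets you minimize network load and avoid repeating work. Let's look at how to design for it and how an executor can preserve work between calls.
The executor receives the complete transcript on every call to respond. Say it last processed an instruction, a prompt, and the response it generated. When the next call arrives, it compares the new transcript with the one it saved. Most of the time, new entries have simply been appended: a new prompt follows the last response. In that case, you can keep your existing state and process only the new part. Sometimes, though, the comparison shows that entries were removed or changed, for example when a developer dropped older entries to save context. Then you need to invalidate everything from the point where the transcript diverged. The framework provides the complete transcript on every call; the executor decides what counts as a match and how to handle changes.
Sometimes your model can't fully do what the developer asked. Then the executor has two choices: approximate or throw. Be as flexible as you can and respect the developer's intent. But sometimes no honest approximation exists. If a developer sets a token limit and also specifies a schema with required fields, there may be no way to satisfy both. In that case, throw. The Foundation Models framework provides LanguageModelError for exactly this purpose: context window overflow, rate limiting, refusals, and more. If you throw these, any developer who has used the framework already knows how to handle them.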
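Returning to the KV cache discussion above, the transcript comparison can be sketched as a longest-shared-prefix check. This assumes the executor remembers which entries its cache already covers, represented here as plain strings; CachePlan and cachePlan(cached:incoming:) are hypothetical names, and what counts as a match remains the executor's decision.

```swift
// Hypothetical sketch: entries are plain strings here; a real executor would compare
// whatever it derives from Transcript entries.
enum CachePlan: Equatable {
    // Pure append: keep the cached state and process only these new entries.
    case extend(newEntries: [String])
    // Divergence: drop cached state from this index onward, then process the rest.
    case invalidate(from: Int, newEntries: [String])
}

// Compares the entries the cache already covers with the incoming transcript.
func cachePlan(cached: [String], incoming: [String]) -> CachePlan {
    // Length of the longest shared prefix.
    var shared = 0
    while shared < cached.count, shared < incoming.count, cached[shared] == incoming[shared] {
        shared += 1
    }
    let remainder = Array(incoming[shared...])
    if shared == cached.count {
        return .extend(newEntries: remainder)
    }
    return .invalidate(from: shared, newEntries: remainder)
}

let saved = ["instructions", "prompt: weather?", "response: 65°F"]
```

Appending a prompt to `saved` yields `.extend` with just that prompt; dropping the middle entries yields `.invalidate(from: 1, ...)`, so only the instructions' cached state survives.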
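If none of the built-in LanguageModelError cases cover the situation, define your own error type. Some failures only make sense for your particular service, such as subscription tiers, features, or account state. With case names that carry intent, developers who catch them know exactly what happened. Custom errors are powerful and sometimes necessary, but each one is a new case developers must learn, catch, and handle in their app. Use the built-in LanguageModelError where it fits, and save custom errors for failures that only your service can produce.
That completes the protocol requirements. Next, let's look at handling authentication. As a package author, your role is to make it easy for developers to do the right thing. If your initializer takes an API key as a string, developers will drift toward the easiest path. Instead, provide a token provider or a sign-in flow so developers can do the right thing. If your package obtains access tokens on the developer's behalf, store them securely in the Keychain.
Handling credentials is half the story; authenticating the device is the other half. If you ship a cloud-based LanguageModel package, it's worth a closer look. The related session covers device verification: detecting tampered builds, signing payloads, and using Apple's fraud signals to protect your service from illegitimate traffic. Check out "Secure your apps with App Attest."
You've packaged your model, implemented the protocol, and handled authentication. You now have a solid LanguageModel package with all the fundamentals in place. Next, it's time to differentiate. The protocol gives you room to shape the LanguageModelSession around capabilities only your model has. Response metadata is a lightweight option: it lets you attach extra information to responses and gives developers a clear way to access it.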
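Going back to the authentication guidance above, here is a minimal sketch of an initializer that asks for a token provider instead of an API key string. AccessToken, AccessTokenProvider, MyModelClient, and StubTokenProvider are hypothetical; a production package would also persist tokens in the Keychain, which this sketch leaves out.

```swift
// Hypothetical names throughout; a design sketch, not any specific package's API.
struct AccessToken {
    var value: String
    var expiresAt: Double  // seconds on an app-defined clock, kept simple for the sketch
}

// Apps supply tokens through this protocol instead of passing a raw API key string.
protocol AccessTokenProvider: Sendable {
    func fetchToken() async throws -> AccessToken
}

actor MyModelClient {
    private let provider: any AccessTokenProvider
    private var cached: AccessToken?
    private(set) var fetchCount = 0

    init(tokenProvider: any AccessTokenProvider) {
        self.provider = tokenProvider
    }

    // Returns a valid token, asking the provider again only after the cached one expires.
    func validToken(now: Double) async throws -> String {
        if let token = cached, token.expiresAt > now {
            return token.value
        }
        let fresh = try await provider.fetchToken()
        fetchCount += 1
        cached = fresh
        return fresh.value
    }
}

// Stub for illustration; a real provider might run a sign-in flow or read the Keychain.
struct StubTokenProvider: AccessTokenProvider {
    func fetchToken() async throws -> AccessToken {
        AccessToken(value: "token-abc", expiresAt: 100)
    }
}

let client = MyModelClient(tokenProvider: StubTokenProvider())
let firstToken = try await client.validToken(now: 0)      // fetched
let cachedToken = try await client.validToken(now: 50)    // served from the cache
let renewedToken = try await client.validToken(now: 150)  // expired, so fetched again
```

Because the initializer only accepts a provider, there is no code path that invites an app to hard-code a secret.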
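You can attach your own custom metadata to responses. Here, after streaming finishes, the executor sends tokensPerSecond and timeToFirstToken to the channel. We recommend giving developers utilities and documentation that make metadata easy to work with: clear keys, typed accessors, whatever makes sense. Under the hood, metadata is just a dictionary that can hold strings, numbers, and other built-in types. But sometimes you need more flexibility.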
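Before turning to that, here's what clear keys and typed accessors might look like. This sketch assumes metadata reaches the app as a String-keyed dictionary of Sendable values, which matches the description above but may not match the framework's actual type; MyModelMetadataKey, the accessors, and performanceMetadata(tokenCount:firstTokenAt:finishedAt:) are hypothetical.

```swift
// Assumption: metadata is a String-keyed dictionary of Sendable values.
typealias ResponseMetadata = [String: any Sendable]

// Keys defined once by the package, so apps don't scatter string literals.
enum MyModelMetadataKey {
    static let tokensPerSecond = "tokensPerSecond"
    static let timeToFirstToken = "timeToFirstToken"
}

// Typed accessors the package could ship alongside its model.
extension Dictionary where Key == String, Value == any Sendable {
    var tokensPerSecond: Double? { self[MyModelMetadataKey.tokensPerSecond] as? Double }
    var timeToFirstToken: Double? { self[MyModelMetadataKey.timeToFirstToken] as? Double }
}

// Builds the metadata from timings measured in seconds since generation started.
func performanceMetadata(tokenCount: Int, firstTokenAt: Double?, finishedAt: Double) -> ResponseMetadata {
    let tokensPerSecond = finishedAt > 0 ? Double(tokenCount) / finishedAt : 0
    return [
        MyModelMetadataKey.tokensPerSecond: tokensPerSecond,
        MyModelMetadataKey.timeToFirstToken: firstTokenAt ?? 0,
    ]
}

let metadata = performanceMetadata(tokenCount: 120, firstTokenAt: 0.25, finishedAt: 2.0)
```

An app then reads `metadata.tokensPerSecond` as a `Double?` instead of casting from a string key it had to guess.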
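Custom segments are the answer. Define a new segment type, receive it in your executor, and stream results back through the same channel, all without developers leaving LanguageModelSession. Custom segment types let you extend the protocol: when new modalities arrive, whether audio, video, or beyond, developers can send that data to your model in a typed, structured way.
Here's how it works. First, define a type that conforms to Transcript.CustomSegment. Custom segments require PromptRepresentable, so developers can pass them directly in a prompt, just like text. In the executor, you receive them in the transcript as a customSegment, arriving alongside the text entries you already handle. When the model responds, send results back through the channel as a custom segment update. The segment ID controls whether you add a new segment or update one that's already streaming, giving you full control over how results stream to the app.
With custom segments covered, one more topic is worth discussing: recommendations for server-side tools. Server-side tools are capabilities the model runs on its own, such as web search, code execution, and image generation. The model calls them, the server runs them, and the executor watches the resulting stream. I'll describe three levels of detail, each surfacing more of the tool's work, using web search as the example.
Server-side tools are named, typed values on your model. Developers construct the model with the tools they want, and the executor receives them through the model on each request, the same way as every other capability the model declares. First, the simplest pattern: run the tool privately and stream only the answer. The tool grounds the model's response, but its work stays inside the executor.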
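Stepping back to custom segments for a moment: this sketch models, on the consuming side, the rule that the segment ID decides between adding a segment and updating one that's already streaming, which helps when deciding which IDs an executor should send. AudioClip and SegmentTimeline are hypothetical types, not framework API.

```swift
// Hypothetical consumer-side model of segment updates; names are illustrative.
struct AudioClip: Equatable {
    var id: String
    var file: String
    var isComplete: Bool
}

struct SegmentTimeline {
    private(set) var segments: [AudioClip] = []

    // A new ID appends a segment; a known ID replaces the one already streaming.
    mutating func apply(_ update: AudioClip) {
        if let index = segments.firstIndex(where: { $0.id == update.id }) {
            segments[index] = update
        } else {
            segments.append(update)
        }
    }
}

var timeline = SegmentTimeline()
timeline.apply(AudioClip(id: "clip-1", file: "partial.m4a", isComplete: false))
timeline.apply(AudioClip(id: "clip-1", file: "final.m4a", isComplete: true))   // same ID: updated in place
timeline.apply(AudioClip(id: "clip-2", file: "second.m4a", isComplete: true))  // new ID: appended
```

Reusing an ID lets an executor stream progressive versions of one result; a fresh ID starts a separate segment.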
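The framework streams the text deltas you append into the transcript, and no trace of the tool remains. Besides grounding the response in the tool's output, you can attach extra metadata to the response. When a text delta carries metadata, such as a citation, forward both to the channel, and the framework attaches the metadata to the text segment in the transcript.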
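The three levels of visibility for server-side tools amount to a forwarding policy. This sketch uses hypothetical ToolVisibility, ServerChunk, and ChannelEvent types to show which events an executor might forward at each level: text only, text plus citation metadata, or text, metadata, and the tool's own output as a custom segment.

```swift
// Hypothetical types; a real executor would send the framework's channel events instead.
enum ToolVisibility {
    case privateGrounding  // level 1: ground the answer, forward only text
    case withMetadata      // level 2: also attach metadata such as citations
    case surfaced          // level 3: also forward the tool's structured output
}

// Chunks an imagined server streams back while running web search.
enum ServerChunk {
    case webSearch(url: String, html: String)
    case textDelta(String, citation: String?)
}

// What the executor forwards to the app.
enum ChannelEvent: Equatable {
    case appendText(String)
    case attachMetadata([String: String])
    case customSegment(url: String)
}

func events(for chunk: ServerChunk, visibility: ToolVisibility) -> [ChannelEvent] {
    switch chunk {
    case .webSearch(let url, _):
        // Only the fully surfaced level exposes the tool's own output.
        return visibility == .surfaced ? [.customSegment(url: url)] : []
    case .textDelta(let text, let citation):
        var forwarded: [ChannelEvent] = [.appendText(text)]
        if visibility != .privateGrounding, let citation {
            forwarded.append(.attachMetadata(["citation": citation]))
        }
        return forwarded
    }
}

func forward(_ chunks: [ServerChunk], visibility: ToolVisibility) -> [ChannelEvent] {
    chunks.flatMap { events(for: $0, visibility: visibility) }
}

let searchChunks: [ServerChunk] = [
    .webSearch(url: "https://example.com/apple-park", html: "<p>...</p>"),
    .textDelta("One Apple Park Way", citation: "https://example.com/apple-park"),
]
```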
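Finally, you can surface the tool's work itself. Forward the tool's structured output to the channel as a custom segment, alongside the text and metadata, so the app gets everything the model produced through a single channel. With server-side tools, the events you forward, the metadata you attach, and the custom segments you design determine what apps using your package can show their users.
One more thing to keep in mind: whether you're choosing a package or shipping one, everyone in the chain needs to understand the privacy implications of the model behind it. On-device and cloud-based models have very different privacy characteristics, and users deserve to know which one they're using.
You've seen how to bring your model to the framework. These sessions show what developers will build with it. "Integrate On-Device AI Models into Your App Using Core AI" explains how to bundle local models directly into an app. "Build with the new Apple Foundation Model on Private Cloud Compute" covers server-scale inference with Apple's privacy guarantees in detail. "Build agentic app experiences with the Foundation Models framework" shows how developers use dynamic profiles to build multistep, tool-using workflows on top of models like yours.
We're excited for what's next, and we look forward to a thriving ecosystem of LanguageModel packages where Swift developers are free to choose the best model for their apps. We can't wait to see what you build.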

import FoundationModels
import MLXFoundationModels

// On-device Apple Foundation Model
let model = SystemLanguageModel()

// Private Cloud Compute model
// let model = PrivateCloudComputeLanguageModel()

// Custom Core AI model
// let model = try await CoreAILanguageModel(resourcesAt: modelURL)

// Open-source MLX model from Hugging Face
// let model = MLXLanguageModel(modelID: "mlx-community/my-model")

let session = LanguageModelSession(model: model)
let response = try await session.respond(to: "...")
print(response.content)

3:46 - Configure Package.swift for your model package

// Package.swift

import PackageDescription

let package = Package(
    name: "MyModel",
    platforms: [
        .macOS(.v27), .iOS(.v27), .visionOS(.v27), .watchOS(.v27)
    ],
    products: [
        .library(name: "MyModel", targets: ["MyModel"])
    ],
    dependencies: [
        .package(url: "...", .upToNextMinor(from: "1.0.0"))
    ],
    targets: [
        // internal: runtime target, not exported as a product
        .target(name: "MyModelRuntime"),
        // public: LanguageModel conformance
        .target(name: "MyModel", dependencies: ["MyModelRuntime"]),
        .testTarget(name: "MyModelTests", dependencies: ["MyModel"])
    ]
)

4:56 - LanguageModel and LanguageModelExecutor protocols

// LanguageModel protocol

public protocol LanguageModel: Sendable {
    var capabilities: LanguageModelCapabilities { get }
    var executorConfiguration: Executor.Configuration { get }
}

// LanguageModelExecutor protocol

public protocol LanguageModelExecutor: Sendable {
    init(configuration: Configuration) throws
    func prewarm(model: Model, transcript: Transcript)
    func respond(
        to request: LanguageModelExecutorGenerationRequest,
        model: Model,
        streamingInto channel: LanguageModelExecutorGenerationChannel
    ) async throws
}

6:25 - Implement LanguageModel and Executor conformances

// LanguageModel conformance
public struct MyLanguageModel: LanguageModel {
    typealias Executor = MyLanguageModelExecutor

    public var capabilities: LanguageModelCapabilities {
        LanguageModelCapabilities(capabilities: [
            .toolCalling, .guidedGeneration, .reasoning
        ])
    }

    public var executorConfiguration: Executor.Configuration {
        Executor.Configuration(/* ... */)
    }
}

// Executor conformance
public struct MyLanguageModelExecutor: LanguageModelExecutor {
    public typealias Model = MyLanguageModel

    public struct Configuration: Hashable, Sendable { /* ... */ }

    public init(configuration: Configuration) throws { /* ... */ }

    public func respond(
        to request: LanguageModelExecutorGenerationRequest,
        model: MyLanguageModel,
        streamingInto channel: LanguageModelExecutorGenerationChannel
    ) async throws { /* ... */ }
}

7:28 - Manage model resources with prewarm and respond

// One approach to managing resources

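import Synchronization

// A final class rather than a struct: Mutex is noncopyable, so it can't be
// stored in a copyable struct, and the cache must be shared across calls.
final class MyLanguageModelExecutor: LanguageModelExecutor {

    // Cached weights, nil until the first load. Assumes LoadedWeights is Sendable.
    private let loadedModel = Mutex<LoadedWeights?>(nil)

    // Private helper: loads the weights once and returns the cached copy afterward.
    private func loadModelIfNeeded() throws -> LoadedWeights {
        try loadedModel.withLock { cached in
            if let weights = cached { return weights }
            let weights = try loadWeights()
            cached = weights
            return weights
        }
    }

    // Best-effort head start; prewarm isn't guaranteed to run.
    func prewarm(model: MyLanguageModel, transcript: Transcript) {
        _ = try? loadModelIfNeeded()
    }

    func respond( ... ) async throws {
        // Loads here if prewarm never ran; otherwise reuses the cached weights.
        let weights = try loadModelIfNeeded()
        // ...generate with 'weights'...
    }
}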

9:00 - Map Transcript entries to model messages

// Transcript entries

let transcript = Transcript(entries: [
    .instructions( ... ),  // "You are a helpful assistant"

    .prompt( ... ),        // "What's the weather in Pittsburgh?"
    .toolCalls( ... ),     // getWeather(location: "Pittsburgh")
    .toolOutput( ... ),    // 65°F, sunny
    .response( ... ),      // "It's 65°F and sunny in Pittsburgh"

    .prompt( ... ),        // "What's the address of Apple Park?"
    .response( ... ),      // "One Apple Park Way, Cupertino, CA 95014"
])

10:42 - Read generation and context options from the request

// Parse generation and context options

func respond(
    to request: LanguageModelExecutorGenerationRequest,
    model: MyLanguageModel,
    streamingInto channel: LanguageModelExecutorGenerationChannel
) async throws {
    let reasoningLevel = request.contextOptions.reasoningLevel
    let temperature = request.generationOptions.temperature
    let maxTokens = request.generationOptions.maximumResponseTokens
}

11:47 - Stream tokens and metadata through the channel

// Streaming text tokens

func respond( ... ) async throws {
    // 1. Report metadata
    await channel.send(.response(action: .updateMetadata([
        "modelID": "my-model-2026-06-08",
        "requestID": request.id.uuidString
    ])))
    // 2. Report prompt token usage before generating
    await channel.send(.response(action: .updateUsage(
        input: .init(totalTokenCount: promptTokens, cachedTokenCount: cachedTokens),
        output: .init(totalTokenCount: 0, reasoningTokenCount: 0)
    )))
    // 3. Stream text deltas as the model generates
    for try await token in tokens {
        await channel.send(.response(action: .appendText(token)))
    }
}

13:33 - Honor the developer's intent or throw

// Honor the developer's intention where possible

// The developer set sampling: .greedy, but our service only takes temperature
if request.generationOptions.sampling?.kind == .greedy {
    serviceRequest.temperature = 0
}

// Otherwise, throw an error

// The token budget is too small to satisfy the schema
if let schema = request.schema,
   let budget = request.generationOptions.maximumResponseTokens,
   budget < minimumTokens(for: schema) {
    throw LanguageModelError.unsupportedCapability(
        .init(
            capability: .guidedGeneration,
            debugDescription: "Token budget too small to satisfy this schema."
        )
    )
}

13:57 - Built-in errors that any model can throw

// Built-in errors that any model can throw

public enum LanguageModelError: LocalizedError, CustomDebugStringConvertible {
    // Transcript grew past the model's context window. Trim entries and retry.
    case contextSizeExceeded(     )
    // Too many requests in a short window. Space them out or reduce load.
    case rateLimited(     )
    // Model declined to answer. Fall back to a message of your choosing.
    case refusal(     )
    // Safety guardrails tripped on the prompt or the response.
    case guardrailViolation(     )
    // Model lacks a feature you used, such as guided generation or tools.
    case unsupportedCapability(     )
    // Prompt contains content the model can't process (bad files, unknown formats).
    case unsupportedTranscriptContent(     )
    // A generation guide (e.g., a regex pattern) isn't supported by this model.
    case unsupportedGenerationGuide(     )
    // Prompt asked for output in a language or locale the model doesn't support.
    case unsupportedLanguageOrLocale(     )
    // Request timed out before the model produced a response.
    case timeout(     )
}

14:14 - Handle errors from your model executor

// Custom errors

public enum MyModelError: Error, LocalizedError {
    // User hit monthly token limit. Prompt upgrade or wait for reset.
    case exceededSubscriptionTierLimit
    // Model variant isn't enabled on this account.
    case modelNotProvisioned
    // Billing or policy review locked this account.
    case accountSuspended

    public var errorDescription: String? {
        switch self {
        case .exceededSubscriptionTierLimit:
            String(localized: "Your plan limit has been reached.")
        // ...
        }
    }
}

16:08 - Attach custom metadata to responses

// Attach service-specific performance metadata

let elapsed = Date().timeIntervalSince(startTime)
let tokensPerSecond = Double(tokenCount) / elapsed
let timeToFirstToken = firstTokenTime?.timeIntervalSince(startTime) ?? 0

await channel.send(.response(action: .updateMetadata([
    "tokensPerSecond": tokensPerSecond,
    "timeToFirstToken": timeToFirstToken
])))

17:05 - Define and use custom Transcript segments

// Define a custom segment
public struct AudioSegment: Transcript.CustomSegment {
    public var id: String
    public var content: URL
}

// Pass it in a prompt
let recording = AudioSegment(id: UUID().uuidString, content: URL(filePath: "/path/to/recording.m4a"))
let response = try await session.respond {
    "Where was Frank Lloyd Wright's original architecture school located?"
    recording
}

// Emit a custom segment from the executor
for try await event in stream {
    switch event {
    case .audioFileGenerated(let file):
        await channel.send(.response(action: .updateCustomSegment(
            AudioSegment(id: file.id, content: file.url)
        )))
    }
}

18:09 - Implement server-side tools in your model

// Configure server-side tools
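public struct MyLanguageModel: LanguageModel {
    public struct ServerTool: Sendable {
        public static let webSearch: ServerTool = ...
    }

    // Stored so the executor can read the tools from the model on each request
    public let serverTools: [ServerTool]

    public init(serverTools: [ServerTool] = []) {
        self.serverTools = serverTools
    }
}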

// Surface tool results through the channel
let client = MyServerClient(serverTools: model.serverTools)
let response = try await client.send(prompt: .init(request))
for try await chunk in response {
    switch chunk {
    case .webSearch(let webSearch):
        await channel.send(.response(action: .updateCustomSegment(
            WebSearchSegment(url: webSearch.url, content: webSearch.html)
        )))
    case .textDelta(let textDelta):
        await channel.send(.response(action: .appendText(
            textDelta.text, tokenCount: textDelta.tokenCount
        )))
    }
}

- 0:00 - Introduction
- Overview of the Foundation Models framework opening to nearly any LLM. Covers improvements to the on-device System Language Model, three new model options (Private Cloud Compute, Core AI, and MLX), upcoming Anthropic and Google partner integrations, and a code preview showing how any model can be swapped into a LanguageModelSession using the same Swift API.
- 3:37 - Packaging
- How to package your LLM provider as a Swift package — configuring Package.swift with the right platform targets (iOS, macOS, visionOS, watchOS, and Linux), being deliberate about dependencies to minimize shipped bytes, and publishing a release via a git tag that developers can paste directly into Xcode.
- 4:48 - Protocol
- The two core protocol types bridging your model to the framework: LanguageModel (declares capabilities and provides a Configuration) and LanguageModelExecutor (handles prewarm, translates Transcript entries to your inference engine's native format, applies ContextOptions and GenerationOptions, and streams responses with metadata-first ordering). Covers executor caching by configuration and KV cache state reuse across calls, plus how to approximate unsupported options or throw LanguageModelError when needed.
- 14:50 - Authentication
- Best practices for credential handling — designing initializers that guide developers toward secure usage rather than plain API key strings, persisting tokens securely via Keychain, and using App Attest for device attestation to verify devices, catch tampered builds, and protect cloud-based language model services.
- 15:51 - Customization
- How to differentiate your model package beyond the protocol fundamentals — attaching custom response metadata (e.g., tokensPerSecond, timeToFirstToken), defining custom segment types for new input and output modalities (audio, video, and beyond), and implementing server-side tools (web search, code execution, image generation) at three levels of visibility: privately grounded, metadata-enriched, or fully surfaced through custom segments.
- 19:47 - Next steps
- Privacy considerations when choosing or shipping a model package — on-device versus cloud-based models have very different characteristics and users deserve to know which they're getting. Pointers to companion sessions on Core AI model integration, Private Cloud Compute, and building agentic app experiences on top of the new model ecosystem.
