Expert guide for Apple's modern Speech framework (macOS 26+, iOS 26+) featuring SpeechAnalyzer and SpeechTranscriber for on-device speech-to-text transcription.
Installation
After installing, this skill will be available to your AI coding assistant.
Verify installation:
npx agent-skills-cli list

Skill Instructions
name: SpeechAnalyzer Framework Expert
description: Expert guide for Apple's modern Speech framework (macOS 26+, iOS 26+) featuring SpeechAnalyzer and SpeechTranscriber for on-device speech-to-text transcription.
version: 1.0
activation: Activate for queries on Speech framework, SpeechAnalyzer, SpeechTranscriber, on-device speech recognition, audio transcription, or migrating from WhisperKit/SFSpeechRecognizer.
SpeechAnalyzer Framework Expert
This skill provides comprehensive guidance on implementing Apple's modern Speech framework introduced for macOS 26+ and iOS 26+. The framework features SpeechAnalyzer and SpeechTranscriber for on-device speech-to-text transcription, offering 2.2x faster performance than Whisper Large V3 Turbo with out-of-process execution and automatic system updates.
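Because the framework exists only on these OS releases, gate its use behind an availability check. A minimal sketch, assuming the macOS 26+/iOS 26+ requirement stated above; the function name and the fallback branch are illustrative:

```swift
import Speech

// Hypothetical entry point: route to the new analyzer only where it exists.
func chooseTranscriptionPath() {
    if #available(macOS 26, iOS 26, *) {
        // Use SpeechAnalyzer + SpeechTranscriber (see the examples below).
    } else {
        // Fall back to SFSpeechRecognizer on older systems.
    }
}
```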
Best Practices
- **Always allocate the locale after model download**: Call `transcriber.allocate(locale:)` only after downloading assets via `AssetInventory.assetInstallationRequest()` to avoid "Cannot use modules with unallocated locales" errors.
- **Use proper audio format conversion**: Convert audio buffers to the analyzer's required format using `AVAudioConverter` with `primeMethod = .none` to prevent timestamp drift.
- **Follow the complete setup sequence**: Instantiate transcriber → download models → allocate locale → create analyzer → get best format → create AsyncStream → start analyzer → consume results. Skipping steps causes runtime errors.
- **Leverage out-of-process execution**: The framework runs transcription in a separate process, eliminating memory limits in your app and providing automatic system updates without app redeployment.
- **Handle both volatile and final results**: Check `result.isFinal` to distinguish live transcription updates (volatile) from completed segments (final) for proper UI updates.
- **Check locale support before use**: Use `SpeechTranscriber.supportedLocales` and `installedLocales` to verify language availability before creating transcriber instances.
- **Implement proper AsyncStream lifecycle management**: Create input streams with `AsyncStream<AnalyzerInput>.makeStream()` and yield converted buffers to the continuation for reliable transcription.
- **Use structured concurrency patterns**: Leverage async/await and actor isolation for thread-safe transcription state management and result processing (see the sketch after this list).
- **Monitor asset installation progress**: Track download progress for large language models to provide user feedback during initial setup.
- **Handle transcription errors gracefully**: Catch and handle errors from `analyzer.start()` and result iteration to provide fallback behavior when transcription fails.
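The structured-concurrency and AsyncStream lifecycle bullets combine naturally: own the results-iteration task so teardown can cancel it. A minimal sketch reusing the `transcriber.results`, `result.isFinal`, and `result.transcription` surface shown throughout this guide; the `TranscriptionSession` type and its method names are hypothetical:

```swift
import Speech

// Hypothetical wrapper: ties result consumption to a cancellable Task so
// teardown ends the iteration instead of leaking it.
@MainActor
final class TranscriptionSession {
    private var resultsTask: Task<Void, Never>?

    func begin(consuming transcriber: SpeechTranscriber) {
        resultsTask = Task {
            do {
                for try await result in transcriber.results {
                    // Volatile results replace the live line; final results are kept.
                    if result.isFinal {
                        print("Final: \(result.transcription)")
                    } else {
                        print("Live: \(result.transcription)")
                    }
                }
            } catch {
                print("Result stream ended with error: \(error)")
            }
        }
    }

    func end() {
        resultsTask?.cancel()  // stops iterating and releases the reference
        resultsTask = nil
    }
}
```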
Speech Framework Guidelines
- Use `SpeechAnalyzer` as the main coordinator, passing an array of modules like `[SpeechTranscriber]`
- Always get the best audio format with `SpeechAnalyzer.bestAvailableAudioFormat(compatibleWith:)` before processing
- Feed audio as `AnalyzerInput` objects containing converted `AVAudioPCMBuffer` instances
- Consume transcription results as an `AsyncSequence` using `for try await result in transcriber.results`
- Support 10+ languages, including English, Spanish, French, German, Italian, Japanese, Korean, Portuguese, Russian, and Chinese
- Prefer on-device transcription for privacy-sensitive applications
- Use `reportingOptions: [.volatileResults]` for real-time live captioning
- Include `attributeOptions: [.audioTimeRange]` for precise word-level timing information
- Implement proper cleanup by canceling tasks and releasing analyzer references (see the teardown sketch after this list)
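For the cleanup guideline, teardown order matters: stop the audio source first, then close the input stream, then drop references. A condensed sketch mirroring Example 3's `stopTranscription()` below; the free function and its parameters are illustrative:

```swift
import AVFoundation
import Speech

// Illustrative teardown: stop audio, end the input stream, cancel consumers.
func teardown(audioEngine: AVAudioEngine?,
              inputContinuation: AsyncStream<AnalyzerInput>.Continuation?,
              resultsTask: Task<Void, Never>?) {
    audioEngine?.stop()
    audioEngine?.inputNode.removeTap(onBus: 0)
    inputContinuation?.finish()   // signals end of input to the analyzer
    resultsTask?.cancel()         // ends result iteration
    // Finally, nil out stored analyzer/transcriber references in the caller.
}
```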
Examples
Example 1: Basic SpeechAnalyzer Setup
User Prompt: "Show me how to set up SpeechAnalyzer with SpeechTranscriber for English transcription."
Expected Output:
```swift
import Speech
@MainActor
class TranscriptionManager {
private var transcriber: SpeechTranscriber?
private var analyzer: SpeechAnalyzer?
private var inputContinuation: AsyncStream<AnalyzerInput>.Continuation?
func setupTranscription() async throws {
let locale = Locale(identifier: "en-US")
// Step 1: Create transcriber
let transcriber = SpeechTranscriber(
locale: locale,
transcriptionOptions: [],
reportingOptions: [.volatileResults],
attributeOptions: [.audioTimeRange]
)
// Step 2: Download model if needed
if let request = try await AssetInventory.assetInstallationRequest(
supporting: [transcriber]
) {
try await request.downloadAndInstall()
}
// Step 3: CRITICAL - Allocate locale AFTER download
try await transcriber.allocate(locale: locale)
// Step 4: Create analyzer
let analyzer = SpeechAnalyzer(modules: [transcriber])
// Step 5: Get best format
let format = await SpeechAnalyzer.bestAvailableAudioFormat(
compatibleWith: [transcriber]
)
print("Best format: \(format)")
// Step 6: Create AsyncStream for input
let (inputSequence, inputBuilder) = AsyncStream<AnalyzerInput>.makeStream()
self.inputContinuation = inputBuilder
// Step 7: Start analyzer
try await analyzer.start(inputSequence: inputSequence)
// Store references
self.transcriber = transcriber
self.analyzer = analyzer
// Step 8: Start consuming results
Task {
for try await result in transcriber.results {
if result.isFinal {
print("Final: \(result.transcription)")
} else {
print("Live: \(result.transcription)")
}
}
}
}
}
```
Example 2: Audio Buffer Conversion
User Prompt: "How do I convert AVAudioPCMBuffer to the format required by SpeechAnalyzer?"
Expected Output:
```swift
import AVFoundation
import Speech
class BufferConverter {
private var converter: AVAudioConverter?
func convertBuffer(_ buffer: AVAudioPCMBuffer,
to format: AVAudioFormat) throws -> AVAudioPCMBuffer {
// Create converter if needed or format changed
if converter == nil || converter?.outputFormat != format {
converter = AVAudioConverter(from: buffer.format, to: format)
// CRITICAL: Prevents timestamp drift
converter?.primeMethod = .none
}
guard let converter = converter else {
throw ConversionError.converterCreationFailed
}
// Calculate output frame capacity
let ratio = format.sampleRate / buffer.format.sampleRate
let outputFrameCapacity = AVAudioFrameCount(ceil(Double(buffer.frameLength) * ratio))
guard let outputBuffer = AVAudioPCMBuffer(
pcmFormat: format,
frameCapacity: outputFrameCapacity
) else {
throw ConversionError.bufferAllocationFailed
}
        var error: NSError?
        var providedInput = false
        let status = converter.convert(to: outputBuffer, error: &error) { _, outStatus in
            // Provide the input buffer exactly once; afterwards report .noDataNow,
            // otherwise the converter can pull the same buffer repeatedly.
            if providedInput {
                outStatus.pointee = .noDataNow
                return nil
            }
            providedInput = true
            outStatus.pointee = .haveData
            return buffer
        }
guard status != .error, error == nil else {
throw ConversionError.conversionFailed(error)
}
return outputBuffer
}
enum ConversionError: Error {
case converterCreationFailed
case bufferAllocationFailed
case conversionFailed(Error?)
}
}
```
Example 3: Complete Microphone Transcription
User Prompt: "Create a complete example that captures microphone audio and transcribes it in real-time."
Expected Output:
```swift
import AVFoundation
import Speech
import SwiftUI
@MainActor
@Observable
class LiveTranscriptionManager {
var liveText: String = ""
var finalText: String = ""
var isRecording = false
private var transcriber: SpeechTranscriber?
private var analyzer: SpeechAnalyzer?
private var audioEngine: AVAudioEngine?
private var inputContinuation: AsyncStream<AnalyzerInput>.Continuation?
private var bufferConverter = BufferConverter()
private var analyzerFormat: AVAudioFormat?
func startTranscription() async throws {
let locale = Locale(identifier: "en-US")
// Setup transcriber and analyzer
let transcriber = SpeechTranscriber(
locale: locale,
transcriptionOptions: [],
reportingOptions: [.volatileResults],
attributeOptions: [.audioTimeRange]
)
// Download model if needed
if let request = try await AssetInventory.assetInstallationRequest(
supporting: [transcriber]
) {
try await request.downloadAndInstall()
}
// Allocate locale
try await transcriber.allocate(locale: locale)
// Create analyzer
let analyzer = SpeechAnalyzer(modules: [transcriber])
// Get best format
let format = await SpeechAnalyzer.bestAvailableAudioFormat(
compatibleWith: [transcriber]
)
self.analyzerFormat = format
// Create input stream
let (inputSequence, inputBuilder) = AsyncStream<AnalyzerInput>.makeStream()
self.inputContinuation = inputBuilder
// Start analyzer
try await analyzer.start(inputSequence: inputSequence)
self.transcriber = transcriber
self.analyzer = analyzer
// Start consuming results
Task {
for try await result in transcriber.results {
if result.isFinal {
self.finalText += result.transcription + " "
self.liveText = ""
} else {
self.liveText = result.transcription
}
}
}
// Setup audio engine
try setupAudioEngine()
isRecording = true
}
private func setupAudioEngine() throws {
let audioEngine = AVAudioEngine()
let inputNode = audioEngine.inputNode
let inputFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 4096, format: inputFormat) { [weak self] buffer, time in
guard let self = self,
let analyzerFormat = self.analyzerFormat else { return }
do {
// Convert buffer to analyzer format
let convertedBuffer = try self.bufferConverter.convertBuffer(
buffer,
to: analyzerFormat
)
// Feed to analyzer
let input = AnalyzerInput(buffer: convertedBuffer)
self.inputContinuation?.yield(input)
} catch {
print("Buffer conversion error: \(error)")
}
}
audioEngine.prepare()
try audioEngine.start()
self.audioEngine = audioEngine
}
func stopTranscription() {
audioEngine?.stop()
audioEngine?.inputNode.removeTap(onBus: 0)
inputContinuation?.finish()
audioEngine = nil
analyzer = nil
transcriber = nil
isRecording = false
}
}
// SwiftUI View
struct LiveTranscriptionView: View {
@State private var manager = LiveTranscriptionManager()
var body: some View {
VStack(spacing: 20) {
Text("Live Transcription")
.font(.title)
ScrollView {
VStack(alignment: .leading, spacing: 10) {
if !manager.finalText.isEmpty {
Text(manager.finalText)
.foregroundColor(.primary)
}
if !manager.liveText.isEmpty {
Text(manager.liveText)
.foregroundColor(.secondary)
.italic()
}
}
.frame(maxWidth: .infinity, alignment: .leading)
.padding()
}
.frame(maxHeight: 300)
.background(Color(.systemGray6))
.cornerRadius(10)
Button(manager.isRecording ? "Stop" : "Start Recording") {
Task {
if manager.isRecording {
manager.stopTranscription()
} else {
try? await manager.startTranscription()
}
}
}
.buttonStyle(.borderedProminent)
}
.padding()
}
}
```
Example 4: Locale Management and Model Download
User Prompt: "How do I check which languages are supported and download models for a specific locale?"
Expected Output:
```swift
import Speech
@MainActor
class LocaleManager {
func checkAndDownloadLocale(_ localeIdentifier: String) async throws -> Bool {
let locale = Locale(identifier: localeIdentifier)
// Check if locale is supported
let supported = await SpeechTranscriber.supportedLocales
guard supported.contains(locale) else {
print("Locale \(localeIdentifier) is not supported")
return false
}
// Check if already installed
let installed = await SpeechTranscriber.installedLocales
if installed.contains(locale) {
print("Locale \(localeIdentifier) is already installed")
return true
}
// Create temporary transcriber to check requirements
let transcriber = SpeechTranscriber(
locale: locale,
transcriptionOptions: [],
reportingOptions: [],
attributeOptions: []
)
// Request installation
if let request = try await AssetInventory.assetInstallationRequest(
supporting: [transcriber]
) {
print("Downloading models for \(localeIdentifier)...")
            // Monitor progress (optional): request.progress is a Foundation
            // Progress, so observe fractionCompleted via KVO rather than
            // iterating it as an AsyncSequence.
            let observation = request.progress.observe(\.fractionCompleted, options: [.new]) { progress, _ in
                print("Download progress: \(progress.fractionCompleted * 100)%")
            }
            defer { observation.invalidate() }
try await request.downloadAndInstall()
print("Installation complete")
return true
}
return true // Already available
}
func listAvailableLocales() async {
let supported = await SpeechTranscriber.supportedLocales
let installed = await SpeechTranscriber.installedLocales
print("Supported locales:")
for locale in supported {
let status = installed.contains(locale) ? "✓ Installed" : "⬇ Available"
print(" \(locale.identifier) - \(status)")
}
}
}
```
Example 5: Error Handling and Recovery
User Prompt: "What are common errors when using SpeechAnalyzer and how do I handle them?"
Expected Output:
```swift
import Speech
@MainActor
class RobustTranscriptionManager {
private var transcriber: SpeechTranscriber?
private var analyzer: SpeechAnalyzer?
enum TranscriptionError: Error {
case localeNotAllocated
case analyzerStartFailed
case modelNotDownloaded
case audioFormatMismatch
case transcriptionFailed(Error)
var userMessage: String {
switch self {
case .localeNotAllocated:
return "Language not initialized. Please try again."
case .analyzerStartFailed:
return "Failed to start transcription service."
case .modelNotDownloaded:
return "Language model needs to be downloaded first."
case .audioFormatMismatch:
return "Audio format is incompatible with transcription."
case .transcriptionFailed(let error):
return "Transcription error: \(error.localizedDescription)"
}
}
}
func safeSetupTranscription(locale: Locale) async throws {
do {
// Step 1: Verify locale support
let supported = await SpeechTranscriber.supportedLocales
guard supported.contains(locale) else {
throw TranscriptionError.modelNotDownloaded
}
// Step 2: Create transcriber
let transcriber = SpeechTranscriber(
locale: locale,
transcriptionOptions: [],
reportingOptions: [.volatileResults],
attributeOptions: [.audioTimeRange]
)
// Step 3: Download model with error handling
if let request = try await AssetInventory.assetInstallationRequest(
supporting: [transcriber]
) {
do {
try await request.downloadAndInstall()
} catch {
throw TranscriptionError.modelNotDownloaded
}
}
// Step 4: Allocate locale with retry
do {
try await transcriber.allocate(locale: locale)
} catch {
// Common error: "Cannot use modules with unallocated locales"
throw TranscriptionError.localeNotAllocated
}
// Step 5: Create analyzer
let analyzer = SpeechAnalyzer(modules: [transcriber])
// Step 6: Get format and validate
let format = await SpeechAnalyzer.bestAvailableAudioFormat(
compatibleWith: [transcriber]
)
guard format.sampleRate > 0 else {
throw TranscriptionError.audioFormatMismatch
}
            // Step 7: Create input stream (this skeleton discards the
            // continuation; a real implementation keeps it to feed audio,
            // as in Example 3)
            let (inputSequence, _) = AsyncStream<AnalyzerInput>.makeStream()
// Step 8: Start analyzer with error handling
do {
try await analyzer.start(inputSequence: inputSequence)
} catch {
throw TranscriptionError.analyzerStartFailed
}
self.transcriber = transcriber
self.analyzer = analyzer
// Step 9: Consume results with error handling
Task {
do {
for try await result in transcriber.results {
processResult(result)
}
} catch {
print("Result iteration error: \(error)")
throw TranscriptionError.transcriptionFailed(error)
}
}
} catch let error as TranscriptionError {
print("Transcription setup failed: \(error.userMessage)")
throw error
} catch {
print("Unexpected error: \(error)")
throw TranscriptionError.transcriptionFailed(error)
}
}
private func processResult(_ result: SpeechTranscriptionResult) {
// Process transcription result
if result.isFinal {
print("Final: \(result.transcription)")
} else {
print("Volatile: \(result.transcription)")
}
}
}
```