# ios-vision-ocr
## Usage

After installing, this skill will be available to your AI coding assistant.

Verify installation: `npx agent-skills-cli list`

## Skill Instructions
---
name: ios-vision-ocr
version: 0.1.0
author: cmtzco
description: >-
  iOS Vision Framework OCR for text recognition from images. When to use:
  when extracting text from images, implementing document scanning, building
  photo analysis features, working with VNRecognizeTextRequest, or processing
  captured photos. What problems it solves: provides accurate on-device text
  recognition, supports multiple languages, handles image preprocessing, and
  processes images efficiently.
---
# iOS Vision OCR

## Overview
The Vision Framework's text recognition capabilities enable iOS apps to extract text from images and photos. This skill covers setting up VNRecognizeTextRequest, configuring image handlers, processing recognition results, and handling different image sources.
## Identified Patterns

This skill addresses the following patterns identified in the codebase:
### Pattern 1: Text Recognition Request Setup

- Location: DadBrain/DadBrain/ViewModels/PhotoScanner.swift
- Description: VNRecognizeTextRequest initialization with configuration (sketched below)
- Frequency: 1 complete implementation
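A minimal sketch of this pattern, assuming the standard VNRecognizeTextRequest configuration surface; the exact settings used in PhotoScanner.swift may differ:

```swift
import Vision

// Illustrative request setup; the actual configuration in
// PhotoScanner.swift may differ.
let request = VNRecognizeTextRequest()
request.recognitionLevel = .accurate       // .fast trades accuracy for speed
request.usesLanguageCorrection = true      // apply language-model correction
request.recognitionLanguages = ["en-US"]   // constrain languages when known
```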
### Pattern 2: Image Handler Processing

- Location: DadBrain/DadBrain/ViewModels/PhotoScanner.swift
- Description: VNImageRequestHandler for processing captured images (sketched below)
- Frequency: 1 complete implementation
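A sketch of this pattern, assuming the captured photo arrives as a UIImage; the function name `performRecognition` is illustrative, not taken from the codebase:

```swift
import Vision
import UIKit

// Hypothetical wrapper; `image` is assumed to be a UIImage from the
// camera or photo library.
func performRecognition(on image: UIImage, request: VNRecognizeTextRequest) throws {
    guard let cgImage = image.cgImage else { return }
    // Vision ignores UIImage orientation metadata, so pass it explicitly
    // (see the mapping sketch under Common Pitfalls).
    let handler = VNImageRequestHandler(cgImage: cgImage,
                                        orientation: .up,  // substitute the real orientation
                                        options: [:])
    try handler.perform([request])  // synchronous; call off the main thread
}
```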
### Pattern 3: Result Extraction

- Location: DadBrain/DadBrain/ViewModels/PhotoScanner.swift
- Description: Processing VNRecognizedTextObservation results (sketched below)
- Frequency: 1 complete implementation
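A sketch of the extraction step, assuming the request has already been performed; `extractText` is an illustrative name:

```swift
import Vision

// Illustrative extraction; joins the top candidate of every detected region.
func extractText(from request: VNRecognizeTextRequest) -> String {
    let observations = request.results as? [VNRecognizedTextObservation] ?? []
    return observations
        .compactMap { $0.topCandidates(1).first?.string }
        .joined(separator: "\n")
}
```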
## TODO: Implementation

This skeleton skill needs Phase 2 refinement. Areas to develop:

### Workflow Instructions
<!-- TODO: Add step-by-step workflow for Vision OCR integration -->
- Import the Vision framework
- Configure VNRecognizeTextRequest with a recognition level
- Create a VNImageRequestHandler from the image data
- Perform the recognition request
- Extract and process the text results (see the end-to-end sketch below)
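A minimal end-to-end sketch of these steps, assuming a UIImage input and a completion-handler style; `scanText` is a hypothetical name, not the API used in PhotoScanner.swift:

```swift
import Vision
import UIKit

// End-to-end sketch of the steps above; adapt error handling and
// threading to your app.
func scanText(in image: UIImage, completion: @escaping (Result<String, Error>) -> Void) {
    guard let cgImage = image.cgImage else {
        completion(.success(""))
        return
    }
    // Steps 1-2: configure the request and its completion handler.
    let request = VNRecognizeTextRequest { request, error in
        if let error = error {
            completion(.failure(error))
            return
        }
        // Step 5: extract and join the top candidates.
        let observations = request.results as? [VNRecognizedTextObservation] ?? []
        completion(.success(observations
            .compactMap { $0.topCandidates(1).first?.string }
            .joined(separator: "\n")))
    }
    request.recognitionLevel = .accurate

    // Steps 3-4: build the handler and perform the request off the main thread.
    DispatchQueue.global(qos: .userInitiated).async {
        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
        do {
            try handler.perform([request])
        } catch {
            completion(.failure(error))
        }
    }
}
```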
### Best Practices

<!-- TODO: Research and document Vision Framework best practices -->
- Choosing recognition accuracy levels
- Handling multiple text regions
- Processing images efficiently
- Supporting multiple languages (configuration sketched below)
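A configuration sketch touching each of these points; the specific values (text-height threshold, language list) are placeholders, and API availability varies by iOS version:

```swift
import Vision

// Illustrative settings for the practices above.
let request = VNRecognizeTextRequest()

// Accuracy level: .fast suits live camera feeds, .accurate suits stills.
request.recognitionLevel = .fast

// Efficiency: skip text smaller than 5% of the image height.
request.minimumTextHeight = 0.05

// Multiple languages: order expresses priority.
request.recognitionLanguages = ["en-US", "de-DE"]

// Query what the current request revision supports (instance API, iOS 15+).
if let supported = try? request.supportedRecognitionLanguages() {
    print("Supported languages:", supported)
}
```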
### Common Pitfalls

<!-- TODO: Document error-prone areas and how to avoid them -->
- Forgetting to set the recognition level
- Not handling image orientation (mapping sketched below)
- Processing very large images (performance)
- Missing camera permission
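For the orientation pitfall, a widely used mapping from UIImage.Orientation to the CGImagePropertyOrientation that Vision expects; this mirrors Apple's sample code and is not taken from this codebase:

```swift
import UIKit
import ImageIO

// Translate UIImage orientation into the type Vision's handler accepts.
extension CGImagePropertyOrientation {
    init(_ orientation: UIImage.Orientation) {
        switch orientation {
        case .up:            self = .up
        case .upMirrored:    self = .upMirrored
        case .down:          self = .down
        case .downMirrored:  self = .downMirrored
        case .left:          self = .left
        case .leftMirrored:  self = .leftMirrored
        case .right:         self = .right
        case .rightMirrored: self = .rightMirrored
        @unknown default:    self = .up
        }
    }
}

// Usage when building the handler:
// VNImageRequestHandler(cgImage: cgImage,
//                       orientation: CGImagePropertyOrientation(image.imageOrientation),
//                       options: [:])
```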
## Suggested Resources

Based on codebase analysis, consider adding:

### Scripts (`scripts/`)

- `vision-text-extractor.swift` - Standalone text extraction utility
- `ocr-test-generator.swift` - Test image generator for OCR

### References (`references/`)

- `vision-framework-guide.md` - Apple's Vision Framework documentation
- `text-recognition-config.md` - VNRecognizeTextRequest configuration options
- `image-preprocessing.md` - Preparing images for optimal OCR results

### Assets (`assets/`)

- `recognition-templates/` - Different recognition scenarios
- `test-images/` - Sample images for testing OCR
## Evidence from Codebase

| File | Pattern | Relevance |
|---|---|---|
| DadBrain/DadBrain/ViewModels/PhotoScanner.swift | Complete Vision OCR integration | Full implementation with result processing |
| DadBrain/DadBrain/Resources/Info.plist | Camera permissions | NSCameraUsageDescription, NSPhotoLibraryUsageDescription |
## Refinement Priority

Score: 6.6/10. Priority: Medium.

### Refinement Tasks
- Research the Vision framework documentation
- Create the vision-text-extractor script
- Document recognition-level options
- Add examples for different image sources
- Create a performance optimization guide
Skeleton generated by ContextHarness `/baseline`. Run Phase 2 skill refinement to complete the implementation.