@benchflow-ai/image-ocr
benchflow-ai · 230 stars · 165 forks · Updated 1/18/2026

Extract text content from images using Tesseract OCR via Python

Installation

skills install @benchflow-ai/image-ocr
Works with Claude Code, Cursor, Copilot, Codex, and Antigravity.

Details

Path: tasks/jpg-ocr-stat/environment/skills/image-ocr/SKILL.md
Branch: main
Scoped Name: @benchflow-ai/image-ocr

Usage

After installing, this skill will be available to your AI coding assistant.

Verify installation:

skills list

Skill Instructions


name: image-ocr
description: Extract text content from images using Tesseract OCR via Python

Image OCR Skill

Purpose

This skill enables accurate text extraction from image files (JPG, PNG, etc.) using Tesseract OCR via the pytesseract Python library. It is suitable for scanned documents, screenshots, photos of text, receipts, forms, and other visual content containing text.

When to Use

  • Extracting text from scanned documents or photos
  • Reading text from screenshots or image captures
  • Processing batch image files that contain textual information
  • Converting visual documents to machine-readable text
  • Extracting structured data from forms, receipts, or tables in images

Required Libraries

The following Python libraries are required (pytesseract also requires the Tesseract binary to be installed on the system):

import pytesseract
from PIL import Image
import json
import os

Input Requirements

  • File formats: JPG, JPEG, PNG, WEBP
  • Image quality: Minimum 300 DPI recommended for printed text; clear and legible text
  • File size: Under 5MB per image (resize if necessary)
  • Text language: Specify if non-English to improve accuracy
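The format and size requirements above can be enforced with a small pre-flight helper before running OCR. This is an illustrative sketch, not part of the skill itself; the `prepare_input` name and the 3000-pixel cap are assumptions:

```python
import os
from PIL import Image

MAX_BYTES = 5 * 1024 * 1024  # 5MB limit from the input requirements
SUPPORTED = {'.jpg', '.jpeg', '.png', '.webp'}

def prepare_input(image_path, max_dim=3000):
    """Reject unsupported formats and downscale oversized files before OCR."""
    ext = os.path.splitext(image_path)[1].lower()
    if ext not in SUPPORTED:
        raise ValueError(f"Unsupported format: {ext}")
    img = Image.open(image_path)
    if os.path.getsize(image_path) > MAX_BYTES:
        # thumbnail() resizes in place, preserves aspect ratio, never upscales
        img.thumbnail((max_dim, max_dim))
    return img
```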

Output Schema

All extracted content must be returned as valid JSON conforming to this schema:

{
  "success": true,
  "filename": "example.jpg",
  "extracted_text": "Full raw text extracted from the image...",
  "confidence": "high|medium|low",
  "metadata": {
    "language_detected": "en",
    "text_regions": 3,
    "has_tables": false,
    "has_handwriting": false
  },
  "warnings": [
    "Text partially obscured in bottom-right corner",
    "Low contrast detected in header section"
  ]
}

Field Descriptions

  • success: Boolean indicating whether text extraction completed successfully
  • filename: Original image filename
  • extracted_text: Complete text content in reading order (top-to-bottom, left-to-right)
  • confidence: Overall OCR confidence level based on image quality and text clarity
  • metadata.language_detected: ISO 639-1 language code
  • metadata.text_regions: Number of distinct text blocks identified
  • metadata.has_tables: Whether tabular data structures were detected
  • metadata.has_handwriting: Whether handwritten text was detected
  • warnings: Array of quality issues or potential errors
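As a lightweight pre-return check, the required fields above can be validated programmatically. This is a sketch against the schema in this document; `validate_result` is an illustrative helper, not part of the skill:

```python
import json

REQUIRED_FIELDS = {"success", "filename", "extracted_text", "confidence", "metadata"}
REQUIRED_METADATA = {"language_detected", "text_regions", "has_tables", "has_handwriting"}

def validate_result(result):
    """Return a list of schema problems; an empty list means the result is valid."""
    problems = []
    missing = REQUIRED_FIELDS - result.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if result.get("confidence") not in {"high", "medium", "low"}:
        problems.append("confidence must be high|medium|low")
    missing_meta = REQUIRED_METADATA - result.get("metadata", {}).keys()
    if missing_meta:
        problems.append(f"missing metadata: {sorted(missing_meta)}")
    if not isinstance(result.get("warnings", []), list):
        problems.append("warnings must be a list")
    # Round-trip through json to confirm the result is serializable
    try:
        json.loads(json.dumps(result))
    except (TypeError, ValueError) as e:
        problems.append(f"not JSON-serializable: {e}")
    return problems
```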

Code Examples

Basic OCR Extraction

import pytesseract
from PIL import Image

def extract_text_from_image(image_path):
    """Extract text from a single image using Tesseract OCR."""
    img = Image.open(image_path)
    text = pytesseract.image_to_string(img)
    return text.strip()

OCR with Confidence Data

import pytesseract
from PIL import Image

def extract_with_confidence(image_path):
    """Extract text with per-word confidence scores."""
    img = Image.open(image_path)

    # Get detailed OCR data including confidence
    data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)

    words = []
    confidences = []

    for i, word in enumerate(data['text']):
        if word.strip():  # Skip empty strings
            words.append(word)
            confidences.append(data['conf'][i])

    # Average only positive confidences (Tesseract uses -1 for non-text elements)
    positive = [c for c in confidences if c > 0]
    avg_confidence = sum(positive) / len(positive) if positive else 0

    return {
        'text': ' '.join(words),
        'average_confidence': avg_confidence,
        'word_count': len(words)
    }

Full OCR with JSON Output

import pytesseract
from PIL import Image
import json
import os

def ocr_to_json(image_path):
    """Perform OCR and return results as JSON."""
    filename = os.path.basename(image_path)
    warnings = []

    try:
        img = Image.open(image_path)

        # Get detailed OCR data
        data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)

        # Extract text preserving structure
        text = pytesseract.image_to_string(img)

        # Calculate confidence
        confidences = [c for c in data['conf'] if c > 0]
        avg_conf = sum(confidences) / len(confidences) if confidences else 0

        # Determine confidence level
        if avg_conf >= 80:
            confidence = "high"
        elif avg_conf >= 50:
            confidence = "medium"
        else:
            confidence = "low"
            warnings.append(f"Low OCR confidence: {avg_conf:.1f}%")

        # Count text regions (blocks)
        block_nums = set(data['block_num'])
        text_regions = len([b for b in block_nums if b > 0])

        result = {
            "success": True,
            "filename": filename,
            "extracted_text": text.strip(),
            "confidence": confidence,
            "metadata": {
                "language_detected": "en",
                "text_regions": text_regions,
                "has_tables": False,
                "has_handwriting": False
            },
            "warnings": warnings
        }

    except Exception as e:
        result = {
            "success": False,
            "filename": filename,
            "extracted_text": "",
            "confidence": "low",
            "metadata": {
                "language_detected": "unknown",
                "text_regions": 0,
                "has_tables": False,
                "has_handwriting": False
            },
            "warnings": [f"OCR failed: {str(e)}"]
        }

    return result

# Usage
result = ocr_to_json("document.jpg")
print(json.dumps(result, indent=2))

Batch Processing Multiple Images

import pytesseract
from PIL import Image
import json
import os
from pathlib import Path

def process_image_directory(directory_path, output_file):
    """Process all images in a directory and save results."""
    image_extensions = {'.jpg', '.jpeg', '.png', '.webp'}
    results = []

    for file_path in sorted(Path(directory_path).iterdir()):
        if file_path.suffix.lower() in image_extensions:
            result = ocr_to_json(str(file_path))
            results.append(result)
            print(f"Processed: {file_path.name}")

    # Save results
    with open(output_file, 'w') as f:
        json.dump(results, f, indent=2)

    return results

Tesseract Configuration Options

Language Selection

# Specify language (default is English)
text = pytesseract.image_to_string(img, lang='eng')

# Multiple languages
text = pytesseract.image_to_string(img, lang='eng+fra+deu')

Page Segmentation Modes (PSM)

Use --psm to control how Tesseract segments the image:

# PSM 3: Fully automatic page segmentation (default)
text = pytesseract.image_to_string(img, config='--psm 3')

# PSM 4: Assume single column of text
text = pytesseract.image_to_string(img, config='--psm 4')

# PSM 6: Assume uniform block of text
text = pytesseract.image_to_string(img, config='--psm 6')

# PSM 11: Sparse text - find as much text as possible
text = pytesseract.image_to_string(img, config='--psm 11')

Common PSM values:

  • 0: Orientation and script detection (OSD) only
  • 3: Fully automatic page segmentation (default)
  • 4: Single column of text of variable sizes
  • 6: Uniform block of text
  • 7: Single text line
  • 11: Sparse text
  • 13: Raw line

Image Preprocessing

For better OCR accuracy, preprocess images:

from PIL import Image, ImageFilter, ImageOps

def preprocess_image(image_path):
    """Preprocess image for better OCR results."""
    img = Image.open(image_path)

    # Convert to grayscale
    img = img.convert('L')

    # Increase contrast
    img = ImageOps.autocontrast(img)

    # Apply slight sharpening
    img = img.filter(ImageFilter.SHARPEN)

    return img

# Use preprocessed image for OCR
img = preprocess_image("document.jpg")
text = pytesseract.image_to_string(img)

Advanced Preprocessing Strategies

For difficult images (low contrast, faded text, dark backgrounds), try multiple preprocessing approaches:

  1. Grayscale + Autocontrast - Basic enhancement for most images
  2. Inverted - Use ImageOps.invert() for dark backgrounds with light text
  3. Scaling - Upscale small images (e.g., 2x) before OCR to improve character recognition
  4. Thresholding - Convert to binary using img.point(lambda p: 255 if p > threshold else 0) with different threshold values (e.g., 100, 128)
  5. Sharpening - Apply ImageFilter.SHARPEN to improve edge clarity
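The five strategies above can be generated in one pass so each variant can be fed to OCR in turn. A minimal sketch; `preprocessing_variants` and the default parameters are illustrative, not part of the skill:

```python
from PIL import Image, ImageFilter, ImageOps

def preprocessing_variants(img, threshold=128, scale=2):
    """Build the preprocessing variants described above from a PIL image."""
    gray = ImageOps.grayscale(img)
    return {
        "autocontrast": ImageOps.autocontrast(gray),
        "inverted": ImageOps.invert(ImageOps.autocontrast(gray)),
        # Upscaling helps Tesseract with small characters
        "scaled": gray.resize((gray.width * scale, gray.height * scale),
                              Image.LANCZOS),
        # Binary threshold: pixels above the cutoff become white, others black
        "thresholded": gray.point(lambda p: 255 if p > threshold else 0),
        "sharpened": gray.filter(ImageFilter.SHARPEN),
    }
```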

Multi-Pass OCR Strategy

For challenging images, a single OCR pass may miss text. Use multiple passes with different configurations:

  1. Try multiple PSM modes - Different page segmentation modes work better for different layouts (e.g., --psm 6 for blocks, --psm 4 for columns, --psm 11 for sparse text)

  2. Try multiple preprocessing variants - Run OCR on several preprocessed versions of the same image

  3. Combine results - Aggregate text from all passes to maximize extraction coverage

import pytesseract
from PIL import Image, ImageFilter, ImageOps

def multi_pass_ocr(image_path):
    """Run OCR with multiple strategies and combine results."""
    img = Image.open(image_path)
    gray = ImageOps.grayscale(img)

    # Generate preprocessing variants
    variants = [
        ImageOps.autocontrast(gray),
        ImageOps.invert(ImageOps.autocontrast(gray)),
        gray.filter(ImageFilter.SHARPEN),
    ]

    # PSM modes to try
    psm_modes = ['--psm 6', '--psm 4', '--psm 11']

    all_text = []
    for variant in variants:
        for psm in psm_modes:
            try:
                text = pytesseract.image_to_string(variant, config=psm)
                if text.strip():
                    all_text.append(text)
            except Exception:
                pass

    # Combine all extracted text
    return "\n".join(all_text)

This approach improves extraction for receipts, faded documents, and images with varying quality.
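Because the passes overlap heavily, de-duplicating lines when combining keeps the merged output readable. A sketch of one way to do the "combine results" step; `combine_passes` is an illustrative helper, not part of the skill:

```python
def combine_passes(texts):
    """Merge text from multiple OCR passes, keeping each distinct line once."""
    seen = set()
    lines = []
    for text in texts:
        for line in text.splitlines():
            # Normalize for comparison so case/whitespace variants collapse
            key = line.strip().lower()
            if key and key not in seen:
                seen.add(key)
                lines.append(line.strip())
    return "\n".join(lines)
```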

Error Handling

Common Issues and Solutions

Issue: Tesseract not found

# Verify Tesseract is installed
try:
    pytesseract.get_tesseract_version()
except pytesseract.TesseractNotFoundError:
    print("Tesseract is not installed or not in PATH")

Issue: Poor OCR quality

  • Preprocess image (grayscale, contrast, sharpen)
  • Use appropriate PSM mode for the document type
  • Ensure image resolution is sufficient (300+ DPI)

Issue: Empty or garbage output

  • Check if image contains actual text
  • Try different PSM modes
  • Verify image is not corrupted

Quality Self-Check

Before returning results, verify:

  • Output is valid JSON (use json.loads() to validate)
  • All required fields are present (success, filename, extracted_text, confidence, metadata)
  • Text preserves logical reading order
  • Confidence level reflects actual OCR quality
  • Warnings array includes all detected issues
  • Special characters are properly escaped in JSON

Limitations

  • Tesseract works best with printed text; handwriting recognition is limited
  • Accuracy decreases with decorative fonts, artistic text, or extreme stylization
  • Mathematical equations and special notation may not extract accurately
  • Redacted or watermarked text cannot be recovered
  • Severe image degradation (blur, noise, low resolution) reduces accuracy
  • Complex multi-column layouts may require custom PSM configuration

Version History

  • 1.0.0 (2026-01-13): Initial release with Tesseract/pytesseract OCR