OCR PDFs with docling while tracking per-page text and rasterize PDFs to images. Use for PDF ingestion, page-aware text extraction, rendering pages to images, or inspecting PDF metadata, with outputs saved under a local project directory.
After installing, this skill will be available to your AI coding assistant.
Verify installation:
skills list
Skill Instructions
name: pdf-viewing
description: OCR PDFs with docling while tracking per-page text and rasterize PDFs to images. Use for PDF ingestion, page-aware text extraction, rendering pages to images, or inspecting PDF metadata, with outputs saved under a local project directory.
PDF Viewing
Overview
Use this skill to OCR PDFs with docling and preserve page numbers, rasterize pages into images, or inspect PDF metadata. Always save outputs under a local project directory (default: ./.pdf-artifacts/<pdf-stem>).
Quick Start
- Create a local venv in the project and install deps as dev dependencies:
  uv venv
  uv add --dev docling typer pymupdf
- Run the OCR tool:
  uv run python scripts/pdf_tools.py ocr path/to/file.pdf --out-dir ./.pdf-artifacts/<pdf-stem>
- Run the rasterizer:
  uv run python scripts/pdf_tools.py rasterize path/to/file.pdf --out-dir ./.pdf-artifacts/<pdf-stem>
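If the commands above need to be launched from another Python script, one option is to build the argument list programmatically. The sketch below only assembles the command line documented in this skill; `build_command` is a hypothetical helper name, not part of `scripts/pdf_tools.py`.

```python
import shlex
from typing import List, Optional


def build_command(subcommand: str, pdf_path: str, out_dir: Optional[str] = None) -> List[str]:
    """Assemble the argv list for one pdf_tools.py call (hypothetical helper)."""
    argv = ["uv", "run", "python", "scripts/pdf_tools.py", subcommand, pdf_path]
    if out_dir is not None:
        # --out-dir is only appended when the caller wants a non-default location.
        argv += ["--out-dir", out_dir]
    return argv


if __name__ == "__main__":
    for sub in ("ocr", "rasterize"):
        # shlex.join renders the list as a copy-pasteable shell line.
        print(shlex.join(build_command(sub, "path/to/file.pdf")))
```

Passing the resulting list to `subprocess.run(argv, check=True)` would execute the tool without going through a shell, which avoids quoting problems with unusual file names.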
Tasks
OCR with page tracking
- Command:
  uv run python scripts/pdf_tools.py ocr <pdf-path>
- Output (default):
  ./.pdf-artifacts/<pdf-stem>/ocr-pages.json and ocr-pages.txt
- Behavior: Copies the input PDF into the output dir and writes per-page text with page numbers.
Explicit flags
- --out-dir <dir>: output directory (relative paths are rooted at the project root)
- --project-root <dir>: base path for relative output paths
- --copy-pdf / --no-copy-pdf: control whether the input PDF is copied into the output dir
- --pages <ranges>: page ranges like 1-3,5,8-10
- --pages-json <name>: JSON filename for per-page text
- --pages-text <name>: TXT filename for per-page text
- --manifest <name>: manifest filename (default manifest.json)
- --overwrite / --no-overwrite: overwrite existing outputs
- --dry-run: show what would be written
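To make the `--pages` syntax concrete, here is a minimal sketch of how a spec such as `1-3,5,8-10` could be expanded into page numbers. It assumes pages are 1-based and ranges are inclusive; the actual parsing in `scripts/pdf_tools.py` is not shown in this document and may differ. `parse_page_ranges` is a hypothetical name.

```python
from typing import List


def parse_page_ranges(spec: str) -> List[int]:
    """Expand a spec like '1-3,5,8-10' into a sorted list of unique page numbers.

    Assumption: pages are 1-based and both ends of a range are included.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1-3,,5"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.add(page)
    return sorted(pages)


if __name__ == "__main__":
    print(parse_page_ranges("1-3,5,8-10"))  # [1, 2, 3, 5, 8, 9, 10]
```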
Rasterize to images
- Command:
  uv run python scripts/pdf_tools.py rasterize <pdf-path>
- Output (default):
  ./.pdf-artifacts/<pdf-stem>/images/page-0001.jpg (etc.)
- Behavior: Renders each PDF page to an image at the requested DPI.
Explicit flags
- --out-dir <dir>: output directory (relative paths are rooted at the project root)
- --project-root <dir>: base path for relative output paths
- --copy-pdf / --no-copy-pdf: control whether the input PDF is copied into the output dir
- --pages <ranges>: page ranges like 1-3,5,8-10
- --dpi <int>: rasterization DPI
- --format <jpg|png|webp>: output image format
- --quality <1-100>: for jpg/webp
- --images-dir <name>: directory for output images
- --images-manifest <name>: JSON manifest name with image paths
- --manifest <name>: manifest filename (default manifest.json)
- --overwrite / --no-overwrite: overwrite existing outputs
- --dry-run: show what would be written
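As a rough illustration of what rendering at a requested DPI involves, the sketch below converts a DPI value into a PyMuPDF zoom factor (PDF page units are 1/72 inch) and builds file names in the documented `images/page-0001.jpg` pattern. The helper names and the `rasterize` function are hypothetical and are not taken from `scripts/pdf_tools.py`; the sketch writes PNG by default to keep the save call simple.

```python
from pathlib import Path
from typing import List

PDF_UNITS_PER_INCH = 72  # PDF user space is defined at 72 units per inch


def zoom_for_dpi(dpi: int) -> float:
    """Scale factor that turns 72-unit-per-inch page space into `dpi` pixels per inch."""
    return dpi / PDF_UNITS_PER_INCH


def image_path(out_dir: str, page_number: int, fmt: str = "jpg", images_dir: str = "images") -> Path:
    """Build a path like <out_dir>/images/page-0001.jpg for a 1-based page number."""
    return Path(out_dir) / images_dir / f"page-{page_number:04d}.{fmt}"


def rasterize(pdf_path: str, out_dir: str, dpi: int, fmt: str = "png") -> List[Path]:
    """Render every page to an image file (hypothetical sketch, requires PyMuPDF)."""
    import fitz  # PyMuPDF; imported lazily so the helpers above work without it

    written = []
    zoom = zoom_for_dpi(dpi)
    with fitz.open(pdf_path) as doc:
        for number, page in enumerate(doc, start=1):
            target = image_path(out_dir, number, fmt)
            target.parent.mkdir(parents=True, exist_ok=True)
            # The matrix scales the page uniformly in x and y before rendering.
            page.get_pixmap(matrix=fitz.Matrix(zoom, zoom)).save(str(target))
            written.append(target)
    return written
```

For example, `zoom_for_dpi(144)` gives 2.0, so a page that is 612 units wide (US Letter) would render 1224 pixels wide.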
Inspect metadata
- Command:
  uv run python scripts/pdf_tools.py inspect <pdf-path>
- Output: writes metadata to stdout and manifest.json unless --no-manifest
Clean artifacts
- Command:
  uv run python scripts/pdf_tools.py clean <pdf-path>
- Behavior: Removes the artifact directory for the PDF.
Notes
- Keep outputs inside the current project by using the default output directory or a relative --out-dir.
- If docling requires extra OCR backends in your environment, install them before running OCR.
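The first note can be illustrated with a small path check. This is a sketch under assumptions: it mirrors the documented default of `./.pdf-artifacts/<pdf-stem>` and the rule that relative output paths are rooted at the project root, but the containment check itself is an addition for illustration, and nothing in this document says the real CLI enforces it. `resolve_out_dir` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional


def resolve_out_dir(pdf_path: str, out_dir: Optional[str] = None, project_root: str = ".") -> Path:
    """Resolve the artifact directory and refuse locations outside the project.

    Assumption: the outside-the-project check is illustrative only.
    """
    root = Path(project_root).resolve()
    # Default layout: .pdf-artifacts/<pdf-stem> under the project root.
    chosen = Path(out_dir) if out_dir else Path(".pdf-artifacts") / Path(pdf_path).stem
    resolved = (chosen if chosen.is_absolute() else root / chosen).resolve()
    if resolved != root and root not in resolved.parents:
        raise ValueError(f"{resolved} is outside the project root {root}")
    return resolved


if __name__ == "__main__":
    print(resolve_out_dir("docs/report.pdf"))
```

A relative value such as `../elsewhere` resolves outside the root and raises `ValueError`, while the default or any relative path below the root is returned as an absolute path.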
Tools
scripts/pdf_tools.py: Typer CLI with ocr, rasterize, inspect, and clean commands.
