Extract transcripts, titles, and thumbnails from YouTube videos. Use for ingesting video content, capturing captions with timestamps, or downloading video metadata.
Installation
After installing, this skill will be available to your AI coding assistant.
Verify installation:
  skills list
Skill Instructions
name: youtube-extractor
description: Extract transcripts, titles, and thumbnails from YouTube videos. Use for ingesting video content, capturing captions with timestamps, or downloading video metadata.
YouTube Extractor
Overview
Use this skill to extract transcripts (with timestamps), titles, descriptions, and thumbnails from YouTube videos. Outputs are saved under a local project directory (default: ./.youtube-artifacts/<video-id>).
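As a rough illustration of the default output layout, the sketch below derives a video ID from a URL and builds the matching directory path. The helper names `video_id_from_url` and `artifact_dir` are hypothetical, and it assumes the output directory is simply a base directory joined with the video ID; the skill's own script may resolve IDs and paths differently.

```python
from pathlib import Path
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url: str) -> str:
    """Return the video ID from a watch URL or a youtu.be short link."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
    else:
        # Standard watch URLs carry the ID in the "v" query parameter.
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    if not video_id:
        raise ValueError(f"no video ID found in {url!r}")
    return video_id


def artifact_dir(url: str, base_dir: str = ".youtube-artifacts") -> Path:
    """Build <base_dir>/<video-id> without touching the filesystem (assumed layout)."""
    return Path(base_dir) / video_id_from_url(url)


print(artifact_dir("https://youtube.com/watch?v=VIDEO_ID"))
```

Only the two common URL shapes are handled here; other forms (embeds, shorts, playlists) would need extra cases.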
Quick Start
- Create a local venv and install deps:
  uv venv
  uv add --dev yt-dlp typer
- Extract all metadata and transcript:
  uv run python scripts/youtube_tools.py extract "https://youtube.com/watch?v=VIDEO_ID"
- Get just the transcript:
  uv run python scripts/youtube_tools.py transcript "https://youtube.com/watch?v=VIDEO_ID"
Tasks
Extract all (default)
- Command:
  uv run python scripts/youtube_tools.py extract <url>
- Output: ./.youtube-artifacts/<video-id>/
  - metadata.json: title, description, channel, duration, upload date
  - transcript.json: captions with timestamps
  - transcript.txt: plain text transcript
  - thumbnail.jpg: highest resolution thumbnail
- Behavior: Downloads all available metadata, the transcript, and the thumbnail in one run.
Flags
- --out-dir <dir>: output directory
- --no-thumbnail: skip thumbnail download
- --no-transcript: skip transcript extraction
- --lang <code>: preferred transcript language (default: en)
- --overwrite / --no-overwrite: overwrite existing outputs
Transcript only
- Command:
  uv run python scripts/youtube_tools.py transcript <url>
- Output: transcript.json and transcript.txt
- Behavior: Extracts only captions/subtitles.
Flags
- --out-dir <dir>: output directory
- --lang <code>: preferred language
- --format <json|txt|both>: output format (default: both)
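The relationship between the two transcript outputs can be sketched as a small conversion from timestamped segments to plain text. The schema used here (a JSON list of objects with `start` and `text` keys) is an assumption for illustration only, as is the function name `segments_to_text`; the actual layout of transcript.json is not documented in this skill.

```python
import json


def segments_to_text(segments: list[dict]) -> str:
    """Join caption segments into plain text, one non-empty segment per line."""
    lines = []
    for seg in segments:
        # Caption text may contain line breaks or doubled spaces; collapse them.
        text = " ".join(seg["text"].split())
        if text:
            lines.append(text)
    return "\n".join(lines)


# Hypothetical transcript.json content in the assumed schema.
raw = (
    '[{"start": 0.0, "text": "Hello  there"},'
    ' {"start": 2.5, "text": "  "},'
    ' {"start": 83.4, "text": "second line"}]'
)
print(segments_to_text(json.loads(raw)))
```

Whitespace-only segments are dropped so the text output has no blank lines.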
Metadata only
- Command:
  uv run python scripts/youtube_tools.py metadata <url>
- Output: metadata.json
- Behavior: Extracts title, description, channel info, duration.
Thumbnail only
- Command:
  uv run python scripts/youtube_tools.py thumbnail <url>
- Output: thumbnail.jpg
- Behavior: Downloads highest resolution thumbnail available.
Flags
- --out-dir <dir>: output directory
- --quality <best|high|medium|low>: thumbnail quality (default: best)
Storyboard frames
- Command:
  uv run python scripts/youtube_tools.py storyboard <url>
- Output: frames/ directory with individual timestamped JPGs + storyboard_manifest.json
- Behavior: Extracts YouTube's preview thumbnails (used in the video scrubber) into individual frames at ~2s intervals.
Flags
- --out-dir <dir>: output directory
- --with-transcript / -t: also extract transcript and align each segment to its nearest frame
- --lang <code>: transcript language (if using --with-transcript)
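The nearest-frame alignment that --with-transcript describes could look roughly like the sketch below. It assumes frame timestamps are sorted in ascending order and that each segment has a `start` time in seconds; the function names and the `frame_index` key are hypothetical, and the script's actual alignment rule (for example, how ties are broken) is not stated.

```python
from bisect import bisect_left


def nearest_frame_index(frame_times: list[float], t: float) -> int:
    """Index of the frame whose timestamp is closest to t (frame_times sorted ascending)."""
    if not frame_times:
        raise ValueError("frame_times is empty")
    pos = bisect_left(frame_times, t)
    if pos == 0:
        return 0
    if pos == len(frame_times):
        return len(frame_times) - 1
    before, after = frame_times[pos - 1], frame_times[pos]
    # Ties go to the earlier frame (an arbitrary choice for this sketch).
    return pos - 1 if t - before <= after - t else pos


def align_segments(segments: list[dict], frame_times: list[float]) -> list[dict]:
    """Return copies of the segments with the nearest frame index attached."""
    return [
        {**seg, "frame_index": nearest_frame_index(frame_times, seg["start"])}
        for seg in segments
    ]


# Hypothetical data: frames every 2 seconds, three caption segments.
frame_times = [0.0, 2.0, 4.0, 6.0]
segments = [
    {"start": 0.4, "text": "intro"},
    {"start": 3.1, "text": "topic"},
    {"start": 9.0, "text": "end"},
]
for row in align_segments(segments, frame_times):
    print(row["start"], "->", row["frame_index"])
```

Binary search keeps each lookup logarithmic in the number of frames, which matters little at ~2s intervals but costs nothing extra.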
Higher quality frames with ffmpeg
Storyboard frames are low-res (~320x180). For full quality frames aligned to transcript timestamps:
- Download video with yt-dlp:
  yt-dlp -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]" -o video.mp4 <url>
- Extract frames at specific timestamps from transcript.json:
  ffmpeg -ss 00:01:23 -i video.mp4 -frames:v 1 frame_83s.jpg
- Or extract frames at regular intervals:
  mkdir -p frames  # ffmpeg does not create the output directory
  ffmpeg -i video.mp4 -vf "fps=0.5" frames/frame_%04d.jpg  # 1 frame every 2 seconds
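To repeat the single-timestamp ffmpeg command above for every transcript segment, one option is to generate the argument lists in Python. This is a sketch under assumptions: the segment shape (a `start` key in seconds) and the helper names `hms` and `frame_command` are hypothetical, and the commands are only printed here, not executed.

```python
import shlex


def hms(seconds: float) -> str:
    """Format seconds as HH:MM:SS for ffmpeg's -ss option (fractions truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def frame_command(video: str, seconds: float) -> list[str]:
    """Argument list for grabbing one frame at the given offset."""
    return [
        "ffmpeg", "-ss", hms(seconds), "-i", video,
        "-frames:v", "1", f"frame_{int(seconds)}s.jpg",
    ]


# Hypothetical segments; in practice these would come from transcript.json.
segments = [{"start": 83.0, "text": "first"}, {"start": 125.5, "text": "second"}]
for seg in segments:
    print(shlex.join(frame_command("video.mp4", seg["start"])))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to actually extract the frame, provided ffmpeg is on the PATH.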
Notes
- Requires network access to YouTube.
- Some videos have no manually authored transcript; auto-generated captions are used as a fallback when they exist.
- The storyboard command requires pillow:
  uv add --dev pillow
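The caption fallback mentioned in the notes above can be sketched as a lookup over a yt-dlp style info dictionary, which keeps manual tracks under `subtitles` and auto-generated ones under `automatic_captions`, each keyed by language code. The function name `pick_caption_track` and the sample data are hypothetical, and whether scripts/youtube_tools.py selects tracks this way is an assumption.

```python
from typing import Optional


def pick_caption_track(info: dict, lang: str = "en") -> Optional[tuple]:
    """Return (source, formats) for lang, preferring manual subtitles over auto captions."""
    for source in ("subtitles", "automatic_captions"):
        tracks = info.get(source) or {}
        if lang in tracks:
            return source, tracks[lang]
    return None  # no captions of either kind for this language


# Hypothetical, trimmed-down info dictionary with only auto-generated English captions.
info = {
    "subtitles": {},
    "automatic_captions": {
        "en": [{"ext": "vtt", "url": "https://example.invalid/en.vtt"}],
    },
}

choice = pick_caption_track(info, "en")
print(choice[0] if choice else "no captions")
```

Returning `None` rather than raising lets the caller decide whether a missing transcript is an error or just a skipped output.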
Tools
scripts/youtube_tools.py: Typer CLI with extract, transcript, metadata, thumbnail, and storyboard commands.
