skill-validator

@panaversity/skill-validator

23 forks

Updated 3/31/2026

Validates skills against production-level criteria with 9-category scoring. This skill should be used when reviewing, auditing, or improving skills to ensure quality standards. Evaluates structure, content, user interaction, documentation, domain standards, technical robustness, maintainability, zero-shot implementation, and reusability. Returns actionable validation report with scores and improvement recommendations.

Installation

$npx agent-skills-cli install @panaversity/skill-validator

Claude Code

Cursor

Copilot

Codex

Antigravity

Details

Repositorypanaversity/claude-code-skills-lab

Path.claude/skills/skill-validator/SKILL.md

Branchmain

Scoped Name@panaversity/skill-validator

Usage

After installing, this skill will be available to your AI coding assistant.

Verify installation:

npx agent-skills-cli list

Skill Instructions

name: skill-validator description: | Validates skills against production-level criteria with 9-category scoring. This skill should be used when reviewing, auditing, or improving skills to ensure quality standards. Evaluates structure, content, user interaction, documentation, domain standards, technical robustness, maintainability, zero-shot implementation, and reusability. Returns actionable validation report with scores and improvement recommendations.

Skill Validator

Validate any skill against production-level quality criteria.

Validation Workflow

Phase 1: Gather Context

Read the skill's SKILL.md completely
Identify skill type from frontmatter description:
- Builder skill (creates artifacts)
- Guide skill (provides instructions)
- Automation skill (executes workflows)
- Analyzer skill (extracts insights)
- Validator skill (enforces quality)
- Hybrid skill (combination of above)
Read all reference files in references/ directory
Check for assets/scripts directories
Note frontmatter fields (name, description, allowed-tools, model)

Phase 2: Apply Criteria

Evaluate against 9 criteria categories. Each criterion scores 0-3:

0: Missing/Absent
1: Present but inadequate
2: Adequate implementation
3: Excellent implementation

Criteria Categories

1. Structure & Anatomy (Weight: 12%)

Criterion	What to Check
SKILL.md exists	Root file present
Line count	<500 lines (context is precious)
Frontmatter complete	`name` and `description` present in YAML
Name constraints	Lowercase, numbers, hyphens only; ≤64 chars; matches directory
Description format	[What] + [When] format; ≤1024 chars
Description style	Third-person: "This skill should be used when..."
No extraneous files	No README.md, CHANGELOG.md, LICENSE in skill dir
Progressive disclosure	Details in `references/`, not bloated SKILL.md
Asset organization	Templates in `assets/`, scripts in `scripts/`
Large file guidance	If references >10k words, grep patterns in SKILL.md

Fail condition: Missing SKILL.md or >800 lines = automatic fail

2. Content Quality (Weight: 15%)

Criterion	What to Check
Conciseness	No verbose explanations, context is public good
Imperative form	Instructions use "Do X" not "You should do X"
Appropriate freedom	Constraints where needed, flexibility where safe
Scope clarity	Clear what skill does AND does not do
No hallucination risk	No instructions that encourage making up info
Output specification	Clear expected outputs defined

3. User Interaction (Weight: 12%)

Criterion	What to Check
Clarification triggers	Asks questions before acting on ambiguity
Required vs optional	Distinguishes must-know from nice-to-know
Graceful handling	What to do when user doesn't answer
No over-asking	Doesn't ask obvious or inferrable questions
Question pacing	Avoids too many questions in single message
Context awareness	Uses available context before asking

Key pattern to look for:

## Required Clarifications
1. Question about X
2. Question about Y

## Optional Clarifications
3. Question about Z (if relevant)

Note: Avoid asking too many questions in a single message.

4. Documentation & References (Weight: 10%)

Criterion	What to Check
Source URLs	Official documentation links provided
Reference files	Complex details in `references/` not main file
Fetch guidance	Instructions to fetch docs for unlisted patterns
Version awareness	Notes about checking for latest patterns
Example coverage	Good/bad examples for key patterns

Key pattern to look for:

| Resource | URL | Use For |
|----------|-----|---------|
| Official Docs | https://... | Complex cases |

5. Domain Standards (Weight: 10%)

Criterion	What to Check
Best practices	Follows domain conventions (e.g., WCAG, OWASP)
Enforcement mechanism	Checklists, validation steps, must-verify items
Anti-patterns	Lists what NOT to do
Quality gates	Output checklist before delivery

Key pattern to look for:

### Must Follow
- [ ] Requirement 1
- [ ] Requirement 2

### Must Avoid
- Antipattern 1
- Antipattern 2

6. Technical Robustness (Weight: 8%)

Criterion	What to Check
Error handling	Guidance for failure scenarios
Security considerations	Input validation, secrets handling if relevant
Dependencies	External tools/APIs documented
Edge cases	Common edge cases addressed
Testability	Can outputs be verified?

7. Maintainability (Weight: 8%)

Criterion	What to Check
Modularity	References are self-contained topics
Update path	Easy to update when standards change
No hardcoded values	Uses placeholders/variables where appropriate
Clear organization	Logical section ordering

8. Zero-Shot Implementation (Weight: 12%)

Skills should enable single-interaction implementation with embedded expertise.

Criterion	What to Check
Before Implementation section	Context gathering guidance present
Codebase context	Guidance to scan existing structure/patterns
Conversation context	Uses discussed requirements/decisions
Embedded expertise	Domain knowledge in `references/`, not runtime discovery
User-only questions	Only asks for USER requirements, not domain knowledge

Key pattern to look for:

## Before Implementation

Gather context to ensure successful implementation:

| Source | Gather |
|--------|--------|
| **Codebase** | Existing structure, patterns, conventions |
| **Conversation** | User's specific requirements |
| **Skill References** | Domain patterns from `references/` |
| **User Guidelines** | Project-specific conventions |

Red flag: Skill instructs to "research" or "discover" domain knowledge at runtime instead of embedding it.

9. Reusability (Weight: 13%)

Skills should handle variations, not single requirements.

Criterion	What to Check
Handles variations	Not hardcoded to single use case
Variable elements	Clarifications capture what VARIES
Constant patterns	Domain best practices encoded as constants
Not requirement-specific	Avoids hardcoded data, tools, configs
Abstraction level	Appropriate generalization for domain

Good example:

"Create visualizations - adaptable to data shape, chart type, library"

Bad example (too specific):

"Create bar chart with sales data using Recharts"

Key check: Does the skill work for multiple use cases within its domain?

Type-Specific Validation

After scoring general criteria, verify type-specific requirements:

Type	Must Have
Builder	Clarifications, Output Spec, Domain Standards, Output Checklist
Guide	Workflow Steps, Examples (Good/Bad), Official Docs links
Automation	Scripts in `scripts/`, Dependencies, Error Handling, I/O Spec
Analyzer	Analysis Scope, Evaluation Criteria, Output Format, Synthesis
Validator	Quality Criteria, Scoring Rubric, Thresholds, Remediation

Scoring: Deduct 10 points if type-specific requirements missing for identified type.

Scoring Guide

Category Scores

Calculate each category score:

Category Score = (Sum of criterion scores) / (Max possible) * 100

Overall Score

Overall = Σ(Category Score × Weight)

Rating Thresholds

Score	Rating	Meaning
90-100	Production	Ready for wide use
75-89	Good	Minor improvements needed
60-74	Adequate	Functional but needs work
40-59	Developing	Significant gaps
0-39	Incomplete	Major rework required

Output Format

Generate validation report:

# Skill Validation Report: [skill-name]

**Rating**: [Production/Good/Adequate/Developing/Incomplete]
**Overall Score**: [X]/100

## Summary
[2-3 sentence assessment]

## Category Scores

| Category | Score | Weight | Weighted |
|----------|-------|--------|----------|
| Structure & Anatomy | X/100 | 12% | X |
| Content Quality | X/100 | 15% | X |
| User Interaction | X/100 | 12% | X |
| Documentation | X/100 | 10% | X |
| Domain Standards | X/100 | 10% | X |
| Technical Robustness | X/100 | 8% | X |
| Maintainability | X/100 | 8% | X |
| Zero-Shot Implementation | X/100 | 12% | X |
| Reusability | X/100 | 13% | X |
| **Type-Specific Deduction** | -X | - | -X |

## Critical Issues (if any)
- [Issue requiring immediate fix]

## Improvement Recommendations
1. **High Priority**: [Specific action]
2. **Medium Priority**: [Specific action]
3. **Low Priority**: [Specific action]

## Strengths
- [What skill does well]

Quick Validation Checklist

For rapid assessment, check these critical items:

Structure & Frontmatter

SKILL.md <500 lines
Frontmatter: name (≤64 chars, lowercase, hyphens) + description (≤1024 chars)
Description uses third-person style ("This skill should be used when...")
No README.md/CHANGELOG.md in skill directory

Content & Interaction

Has clarification questions (Required vs Optional)
Has output specification
Has official documentation links

Zero-Shot & Reusability

Has "Before Implementation" section (context gathering)
Domain expertise embedded in references/ (not runtime discovery)
Handles variations (not requirement-specific)

Type-Specific (check based on skill type)

Builder: Clarifications + Output Spec + Standards + Checklist
Guide: Workflow + Examples + Docs
Automation: Scripts + Dependencies + Error Handling
Analyzer: Scope + Criteria + Output Format
Validator: Criteria + Scoring + Thresholds + Remediation

If 10+ checked: Likely Production (90+) If 7-9 checked: Likely Good (75-89) If 5-6 checked: Likely Adequate (60-74) If <5 checked: Needs significant work

Reference Files

File	When to Read
`references/detailed-criteria.md`	Deep evaluation of specific criterion
`references/scoring-examples.md`	Example validations for calibration
`references/improvement-patterns.md`	Common fixes for common issues

Usage Examples

Validate a skill

Validate the chatgpt-widget-creator skill against production criteria

Quick audit

Quick validation check on mcp-builder skill

Focused review

Check if skill-creator skill has proper user interaction patterns

More by panaversity

View all

browsing-with-playwright

Browser automation using Playwright MCP. Navigate websites, fill forms, click elements, take screenshots, and extract data. Use for web browsing, form submission, web scraping, or UI testing. NOT for static content (use curl/wget).

interview

This skill conducts discovery conversations to understand user intent and agree on approach before taking action. It should be used when the user explicitly calls /interview, asks for recommendations, needs brainstorming, wants to clarify, or when the request could be misunderstood. Prevents building the wrong thing by uncovering WHY behind WHAT.

fetch-library-docs

Fetches official documentation for external libraries and frameworks (React, Next.js, Prisma, FastAPI, Express, Tailwind, MongoDB, etc.) with 60-90% token savings via content-type filtering. Use this skill when implementing features using library APIs, debugging library-specific errors, troubleshooting configuration issues, installing or setting up frameworks, integrating third-party packages, upgrading between library versions, or looking up correct API patterns and best practices. Triggers automatically during coding work - fetch docs before writing library code to get correct patterns, not after guessing wrong.

skill-creator-pro

Creates production-grade, reusable skills that extend Claude's capabilities. This skill should be used when users want to create a new skill, improve an existing skill, or build domain-specific intelligence. Gathers context from codebase, conversation, and authentic sources before creating adaptable skills.