Agent Skills
therealchrisrock

rubric-design

@therealchrisrock/rubric-design

This skill provides guidance on designing effective evaluation rubrics. Use when:

  • Creating criteria for content or code quality assessment
  • Defining weights and thresholds for evaluation
  • Designing check types (pattern-based vs custom)
  • Structuring rubrics for maintainability and reusability

Installation

$ npx agent-skills-cli install @therealchrisrock/rubric-design

Works with Claude Code, Cursor, Copilot, Codex, and Antigravity.

Details

Path: eval-framework/skills/rubric-design/SKILL.md
Branch: main
Scoped Name: @therealchrisrock/rubric-design

Usage

After installing, this skill will be available to your AI coding assistant.

Verify installation:

npx agent-skills-cli list

Skill Instructions


name: rubric-design
description: |
  This skill provides guidance on designing effective evaluation rubrics. Use when:
  • Creating criteria for content or code quality assessment
  • Defining weights and thresholds for evaluation
  • Designing check types (pattern-based vs custom)
  • Structuring rubrics for maintainability and reusability
version: 1.0.0

Designing Effective Evaluation Rubrics

Design rubrics that are clear, measurable, and actionable. Good rubrics produce consistent results and provide useful feedback.

Core Principles

Measurable over subjective: Every criterion should have concrete checks that can be evaluated consistently. "Writes well" is bad. "Uses active voice and leads with verbs" is good.

Weighted by importance: Not all criteria are equal. Assign weights that reflect actual impact. Security issues might be 40% of a code review rubric while style is 10%.

Thresholds reflect reality: Set thresholds that match real-world expectations. A brand voice rubric for marketing copy might require 80%, while a security rubric for production code might require 95%.

Actionable feedback: Every check should produce feedback that tells the user exactly how to fix the issue.
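
For instance, the vague criterion "writes well" from above can be decomposed into measurable, actionable checks. A minimal sketch using the schema described below; the criterion name, patterns, and messages are illustrative:

criteria:
  active-voice:
    weight: 20
    threshold: 80
    description: "Leads with verbs and avoids passive constructions"
    checks:
      # Deterministic: flag common passive-voice markers
      - type: absence
        pattern: "\\b(is|was|were|been)\\s+\\w+ed\\b"
        message: "Rewrite in active voice: name the actor, then the verb"
      # Nuanced: let the evaluator judge overall verb-forwardness
      - type: custom
        prompt: "Does each sentence lead with a strong action verb?"
        message: "Open sentences with verbs rather than qualifiers"

Note that each check's message tells the user exactly how to fix the issue, per the principle above.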

Rubric Schema

See references/schema.md for the complete YAML schema reference.

Essential Fields

name: rubric-name          # Identifier (kebab-case)
version: 1.0.0             # Semantic version
description: |             # When to use this rubric
  Evaluates marketing copy for brand voice alignment

scope:
  type: content            # content | behavior | both
  file_patterns:           # Optional file filters
    - "*.md"
    - "app/routes/**/*.tsx"

criteria:
  criterion-name:
    weight: 25             # Percentage (all weights sum to 100)
    threshold: 80          # Minimum score to pass this criterion
    description: "..."     # What this measures
    checks: [...]          # Evaluation checks
    examples:              # Pass/fail examples
      pass: "Good example"
      fail: "Bad example"

passing:
  min_score: 75            # Overall minimum
  required_criteria: []    # Must-pass regardless of score

Check Types

Pattern-Based Checks (Fast, Deterministic)

Use for objective, pattern-matchable criteria:

checks:
  # Absence: Content should NOT contain pattern
  - type: absence
    pattern: "\\b(might|could|potentially)\\b"
    message: "Remove hedge words for more confident tone"

  # Presence: Content MUST contain pattern
  - type: presence
    pattern: "\\b(you|your)\\b"
    message: "Address the reader directly"

  # Pattern: Content should match format
  - type: pattern
    pattern: "^[A-Z]"
    message: "Headlines should start with capital letter"

When to use pattern checks:

  • Detecting forbidden words/phrases
  • Enforcing required elements
  • Validating format/structure
  • Checking naming conventions

Custom Checks (LLM-Evaluated)

Use for nuanced, context-dependent criteria:

checks:
  - type: custom
    prompt: "Does this content lead with action verbs and avoid passive voice?"
    message: "Use active voice and lead with verbs"

  - type: custom
    prompt: "Is the tone confident without being arrogant?"
    message: "Adjust tone: confident but approachable"

When to use custom checks:

  • Evaluating tone/voice
  • Assessing logical flow
  • Checking context-appropriate content
  • Nuanced quality assessments

Criterion Design

Good Criteria Have

  1. Clear name: directness, error-handling, test-coverage
  2. Focused scope: One quality dimension per criterion
  3. Multiple checks: 2-5 checks that together assess the criterion
  4. Concrete examples: Pass and fail that clarify expectations
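
Putting these four properties together, a complete criterion might look like this. A sketch only; the name, patterns, and examples are illustrative:

criteria:
  error-handling:
    weight: 30
    threshold: 85
    description: "Errors are caught, contextualized, and recoverable"
    checks:
      # Deterministic: no silently swallowed exceptions
      - type: absence
        pattern: "catch\\s*\\(\\w*\\)\\s*\\{\\s*\\}"
        message: "Remove empty catch blocks: log or rethrow with context"
      # Nuanced: judge message quality in context
      - type: custom
        prompt: "Do error messages state what failed and how to recover?"
        message: "Add error context: what failed and how to fix"
    examples:
      pass: "throw new Error(`Config load failed: ${path}. Check file permissions.`)"
      fail: "throw new Error('error')"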

Common Criterion Categories

Content criteria (what it says):

  • Directness, specificity, accuracy, completeness

Style criteria (how it's written):

  • Tone, voice, formatting, readability

Structural criteria (how it's organized):

  • Hierarchy, flow, sections, navigation

Behavioral criteria (what it does):

  • Error handling, logging, testing, security
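
A single rubric often draws from several categories. A skeleton sketch for a code-review rubric, one criterion per category (names are illustrative):

criteria:
  accuracy:        { weight: 30 }  # Content: what it says
  readability:     { weight: 20 }  # Style: how it's written
  structure:       { weight: 20 }  # Structural: how it's organized
  error-handling:  { weight: 30 }  # Behavioral: what it does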

Weight Distribution

Weights should reflect actual importance:

# Brand Voice Rubric Example
criteria:
  directness:    { weight: 30 }  # Core brand attribute
  specificity:   { weight: 25 }  # Important for trust
  tone:          { weight: 20 }  # Supports brand
  audience:      { weight: 15 }  # Enables connection
  formatting:    { weight: 10 }  # Nice to have

Guidelines:

  • All weights must sum to 100
  • Most important criterion: 25-40%
  • Supporting criteria: 15-25%
  • Minor criteria: 5-15%
  • Avoid equal weights (they signal a lack of prioritization)

Threshold Setting

Choose thresholds based on context:

Context             Threshold Range    Rationale
Security-critical   90-100%            Can't compromise
Production code     80-90%             High standards
Marketing copy      70-85%             Room for creativity
Draft content       60-75%             Early feedback
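
Criterion thresholds work together with the rubric-level passing block. A sketch for a security-critical rubric; the criterion names are illustrative:

passing:
  min_score: 90              # Security-critical: top of the range
  required_criteria:         # Must pass regardless of overall score
    - input-validation
    - secret-handling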

Examples

See examples/ for working rubric examples:

  • brand-voice.yaml - Marketing copy evaluation
  • code-security.yaml - Security audit rubric
  • api-design.yaml - API review criteria

Anti-Patterns

Avoid:

  • Vague criteria: "Code quality" (unmeasurable)
  • Overlapping checks: Testing same thing twice
  • Extreme thresholds: 100% (nothing passes) or 50% (everything passes)
  • Missing examples: Leaves room for interpretation
  • Generic messages: "Fix this" (not actionable)

Prefer:

  • Specific criteria: "Error messages include context and recovery steps"
  • Distinct checks: Each check tests something unique
  • Reasonable thresholds: Based on real-world expectations
  • Clear examples: Both pass and fail cases
  • Actionable messages: "Add error context: what failed and how to fix"
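
To make the contrast concrete, here is an anti-pattern criterion rewritten into the preferred form. A sketch; names, prompts, and messages are illustrative:

# Avoid: vague, unmeasurable, generic
criteria:
  code-quality:
    weight: 50
    checks:
      - type: custom
        prompt: "Is the code good?"
        message: "Fix this"

# Prefer: specific, measurable, actionable
criteria:
  error-messages:
    weight: 25
    threshold: 80
    description: "Error messages include context and recovery steps"
    checks:
      - type: custom
        prompt: "Do error messages include context and recovery steps?"
        message: "Add error context: what failed and how to fix"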