Agent Skills
therealchrisrock

rubric-design

@therealchrisrock/rubric-design

This skill provides guidance on designing effective evaluation rubrics. Use when:

  • Creating criteria for content or code quality assessment
  • Defining weights and thresholds for evaluation
  • Designing check types (pattern-based vs custom)
  • Structuring rubrics for maintainability and reusability

Installation

$ npx agent-skills-cli install @therealchrisrock/rubric-design

Works with Claude Code, Cursor, Copilot, Codex, and Antigravity.

Details

Path: eval-framework/skills/rubric-design/SKILL.md
Branch: main
Scoped Name: @therealchrisrock/rubric-design

Usage

After installing, this skill will be available to your AI coding assistant.

Verify installation:

npx agent-skills-cli list

Skill Instructions


name: rubric-design
description: |
  This skill provides guidance on designing effective evaluation rubrics. Use when:
  • Creating criteria for content or code quality assessment
  • Defining weights and thresholds for evaluation
  • Designing check types (pattern-based vs custom)
  • Structuring rubrics for maintainability and reusability
version: 1.0.0

Designing Effective Evaluation Rubrics

Design rubrics that are clear, measurable, and actionable. Good rubrics produce consistent results and provide useful feedback.

Core Principles

Measurable over subjective: Every criterion should have concrete checks that can be evaluated consistently. "Writes well" is bad. "Uses active voice and leads with verbs" is good.

Weighted by importance: Not all criteria are equal. Assign weights that reflect actual impact. Security issues might be 40% of a code review rubric while style is 10%.

Thresholds reflect reality: Set thresholds that match real-world expectations. A brand voice rubric for marketing copy might require 80%, while a security rubric for production code might require 95%.

Actionable feedback: Every check should produce feedback that tells the user exactly how to fix the issue.
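
For instance, the vague criterion "writes well" from above can be decomposed into measurable, actionable checks. A minimal sketch using the schema described below; the criterion name, patterns, and messages are illustrative:

criteria:
  active-voice:
    weight: 20
    threshold: 80
    description: "Leads with verbs and avoids passive constructions"
    checks:
      # Deterministic: flag common passive-voice markers
      - type: absence
        pattern: "\\b(is|was|were|been)\\s+\\w+ed\\b"
        message: "Rewrite in active voice: name the actor, then the verb"
      # Nuanced: let the evaluator judge overall verb-forwardness
      - type: custom
        prompt: "Does each sentence lead with a strong action verb?"
        message: "Open sentences with verbs rather than qualifiers"

Note that each check's message tells the user exactly how to fix the issue, per the principle above.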

Rubric Schema

See references/schema.md for the complete YAML schema reference.

Essential Fields

name: rubric-name          # Identifier (kebab-case)
version: 1.0.0             # Semantic version
description: |             # When to use this rubric
  Evaluates marketing copy for brand voice alignment

scope:
  type: content            # content | behavior | both
  file_patterns:           # Optional file filters
    - "*.md"
    - "app/routes/**/*.tsx"

criteria:
  criterion-name:
    weight: 25             # Percentage (all weights sum to 100)
    threshold: 80          # Minimum score to pass this criterion
    description: "..."     # What this measures
    checks: [...]          # Evaluation checks
    examples:              # Pass/fail examples
      pass: "Good example"
      fail: "Bad example"

passing:
  min_score: 75            # Overall minimum
  required_criteria: []    # Must-pass regardless of score

Check Types

Pattern-Based Checks (Fast, Deterministic)

Use for objective, pattern-matchable criteria:

checks:
  # Absence: Content should NOT contain pattern
  - type: absence
    pattern: "\\b(might|could|potentially)\\b"
    message: "Remove hedge words for more confident tone"

  # Presence: Content MUST contain pattern
  - type: presence
    pattern: "\\b(you|your)\\b"
    message: "Address the reader directly"

  # Pattern: Content should match format
  - type: pattern
    pattern: "^[A-Z]"
    message: "Headlines should start with capital letter"

When to use pattern checks:

  • Detecting forbidden words/phrases
  • Enforcing required elements
  • Validating format/structure
  • Checking naming conventions

Custom Checks (LLM-Evaluated)

Use for nuanced, context-dependent criteria:

checks:
  - type: custom
    prompt: "Does this content lead with action verbs and avoid passive voice?"
    message: "Use active voice and lead with verbs"

  - type: custom
    prompt: "Is the tone confident without being arrogant?"
    message: "Adjust tone: confident but approachable"

When to use custom checks:

  • Evaluating tone/voice
  • Assessing logical flow
  • Checking context-appropriate content
  • Nuanced quality assessments

Criterion Design

Good Criteria Have

  1. Clear name: directness, error-handling, test-coverage
  2. Focused scope: One quality dimension per criterion
  3. Multiple checks: 2-5 checks that together assess the criterion
  4. Concrete examples: Pass and fail that clarify expectations
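
Putting these four properties together, a complete criterion might look like this. A sketch only; the name, patterns, and examples are illustrative:

criteria:
  error-handling:
    weight: 30
    threshold: 85
    description: "Errors are caught, contextualized, and recoverable"
    checks:
      # Deterministic: no silently swallowed exceptions
      - type: absence
        pattern: "catch\\s*\\(\\w*\\)\\s*\\{\\s*\\}"
        message: "Remove empty catch blocks: log or rethrow with context"
      # Nuanced: judge message quality in context
      - type: custom
        prompt: "Do error messages state what failed and how to recover?"
        message: "Add error context: what failed and how to fix"
    examples:
      pass: "throw new Error(`Config load failed: ${path}. Check file permissions.`)"
      fail: "throw new Error('error')"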

Common Criterion Categories

Content criteria (what it says):

  • Directness, specificity, accuracy, completeness

Style criteria (how it's written):

  • Tone, voice, formatting, readability

Structural criteria (how it's organized):

  • Hierarchy, flow, sections, navigation

Behavioral criteria (what it does):

  • Error handling, logging, testing, security
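
A single rubric often draws from several categories. A skeleton sketch for a code-review rubric, one criterion per category (names are illustrative):

criteria:
  accuracy:        { weight: 30 }  # Content: what it says
  readability:     { weight: 20 }  # Style: how it's written
  structure:       { weight: 20 }  # Structural: how it's organized
  error-handling:  { weight: 30 }  # Behavioral: what it does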

Weight Distribution

Weights should reflect actual importance:

# Brand Voice Rubric Example
criteria:
  directness:    { weight: 30 }  # Core brand attribute
  specificity:   { weight: 25 }  # Important for trust
  tone:          { weight: 20 }  # Supports brand
  audience:      { weight: 15 }  # Enables connection
  formatting:    { weight: 10 }  # Nice to have

Guidelines:

  • All weights must sum to 100
  • Most important criterion: 25-40%
  • Supporting criteria: 15-25%
  • Minor criteria: 5-15%
  • Avoid equal weights (they signal a lack of prioritization)

Threshold Setting

Choose thresholds based on context:

Context             Threshold Range    Rationale
Security-critical   90-100%            Can't compromise
Production code     80-90%             High standards
Marketing copy      70-85%             Room for creativity
Draft content       60-75%             Early feedback
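
Criterion thresholds work together with the rubric-level passing block. A sketch for a security-critical rubric; the criterion names are illustrative:

passing:
  min_score: 90              # Security-critical: top of the range
  required_criteria:         # Must pass regardless of overall score
    - input-validation
    - secret-handling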

Examples

See examples/ for working rubric examples:

  • brand-voice.yaml - Marketing copy evaluation
  • code-security.yaml - Security audit rubric
  • api-design.yaml - API review criteria

Anti-Patterns

Avoid:

  • Vague criteria: "Code quality" (unmeasurable)
  • Overlapping checks: Testing same thing twice
  • Extreme thresholds: 100% (nothing passes) or 50% (everything passes)
  • Missing examples: Leaves room for interpretation
  • Generic messages: "Fix this" (not actionable)

Prefer:

  • Specific criteria: "Error messages include context and recovery steps"
  • Distinct checks: Each check tests something unique
  • Reasonable thresholds: Based on real-world expectations
  • Clear examples: Both pass and fail cases
  • Actionable messages: "Add error context: what failed and how to fix"
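
To make the contrast concrete, here is an anti-pattern criterion rewritten into the preferred form. A sketch; names, prompts, and messages are illustrative:

# Avoid: vague, unmeasurable, generic
criteria:
  code-quality:
    weight: 50
    checks:
      - type: custom
        prompt: "Is the code good?"
        message: "Fix this"

# Prefer: specific, measurable, actionable
criteria:
  error-messages:
    weight: 25
    threshold: 80
    description: "Error messages include context and recovery steps"
    checks:
      - type: custom
        prompt: "Do error messages include context and recovery steps?"
        message: "Add error context: what failed and how to fix"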