Agent Skills

rubric-design

@therealchrisrock/rubric-design
therealchrisrock · 0 forks · Updated 4/1/2026
This skill provides guidance on designing effective evaluation rubrics. Use when:

  • Creating criteria for content or code quality assessment
  • Defining weights and thresholds for evaluation
  • Designing check types (pattern-based vs custom)
  • Structuring rubrics for maintainability and reusability

Installation

$ npx agent-skills-cli install @therealchrisrock/rubric-design

Supported assistants: Claude Code, Cursor, Copilot, Codex, Antigravity

Details

Path: eval-framework/skills/rubric-design/SKILL.md
Branch: main
Scoped Name: @therealchrisrock/rubric-design

Usage

After installing, this skill will be available to your AI coding assistant.

Verify installation:

npx agent-skills-cli list

Skill Instructions


name: rubric-design
version: 1.0.0
description: |
  This skill provides guidance on designing effective evaluation rubrics. Use when:
  - Creating criteria for content or code quality assessment
  - Defining weights and thresholds for evaluation
  - Designing check types (pattern-based vs custom)
  - Structuring rubrics for maintainability and reusability

Designing Effective Evaluation Rubrics

Design rubrics that are clear, measurable, and actionable. Good rubrics produce consistent results and provide useful feedback.

Core Principles

Measurable over subjective: Every criterion should have concrete checks that can be evaluated consistently. "Writes well" is bad. "Uses active voice and leads with verbs" is good.

Weighted by importance: Not all criteria are equal. Assign weights that reflect actual impact. Security issues might be 40% of a code review rubric while style is 10%.

Thresholds reflect reality: Set thresholds that match real-world expectations. A brand voice rubric for marketing copy might require 80%, while a security rubric for production code might require 95%.

Actionable feedback: Every check should produce feedback that tells the user exactly how to fix the issue.

Rubric Schema

See references/schema.md for the complete YAML schema reference.

Essential Fields

name: rubric-name          # Identifier (kebab-case)
version: 1.0.0             # Semantic version
description: |             # When to use this rubric
  Evaluates marketing copy for brand voice alignment

scope:
  type: content            # content | behavior | both
  file_patterns:           # Optional file filters
    - "*.md"
    - "app/routes/**/*.tsx"

criteria:
  criterion-name:
    weight: 25             # Percentage (all weights sum to 100)
    threshold: 80          # Minimum score to pass this criterion
    description: "..."     # What this measures
    checks: [...]          # Evaluation checks
    examples:              # Pass/fail examples
      pass: "Good example"
      fail: "Bad example"

passing:
  min_score: 75            # Overall minimum
  required_criteria: []    # Must-pass regardless of score
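The schema above implies a straightforward scoring procedure: weight each criterion's score, sum to an overall score, and fail any criterion below its own threshold. A minimal sketch in Python (the `evaluate_rubric` helper and the example numbers are illustrative, not part of the schema):

```python
# Illustrative sketch: compute an overall rubric score from per-criterion
# scores (0-100), applying weights, per-criterion thresholds, and the
# passing block's min_score and required_criteria.

def evaluate_rubric(rubric: dict, scores: dict) -> dict:
    overall = 0.0
    failed = []
    for name, crit in rubric["criteria"].items():
        score = scores[name]
        overall += score * crit["weight"] / 100
        # A criterion below its threshold fails individually.
        if score < crit.get("threshold", 0):
            failed.append(name)
    passing = rubric["passing"]
    passed = (
        overall >= passing["min_score"]
        and not any(c in failed for c in passing.get("required_criteria", []))
    )
    return {"score": round(overall, 1), "passed": passed, "failed_criteria": failed}

rubric = {
    "criteria": {
        "directness": {"weight": 60, "threshold": 80},
        "formatting": {"weight": 40, "threshold": 70},
    },
    "passing": {"min_score": 75, "required_criteria": ["directness"]},
}
print(evaluate_rubric(rubric, {"directness": 90, "formatting": 60}))
# Overall = 90*0.6 + 60*0.4 = 78: passes min_score even though
# formatting fails its own threshold.
```

Note that a criterion can fail its threshold while the rubric still passes overall; listing it in `required_criteria` is what makes it a hard gate.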

Check Types

Pattern-Based Checks (Fast, Deterministic)

Use for objective, pattern-matchable criteria:

checks:
  # Absence: Content should NOT contain pattern
  - type: absence
    pattern: "\\b(might|could|potentially)\\b"
    message: "Remove hedge words for more confident tone"

  # Presence: Content MUST contain pattern
  - type: presence
    pattern: "\\b(you|your)\\b"
    message: "Address the reader directly"

  # Pattern: Content should match format
  - type: pattern
    pattern: "^[A-Z]"
    message: "Headlines should start with capital letter"

When to use pattern checks:

  • Detecting forbidden words/phrases
  • Enforcing required elements
  • Validating format/structure
  • Checking naming conventions

Custom Checks (LLM-Evaluated)

Use for nuanced, context-dependent criteria:

checks:
  - type: custom
    prompt: "Does this content lead with action verbs and avoid passive voice?"
    message: "Use active voice and lead with verbs"

  - type: custom
    prompt: "Is the tone confident without being arrogant?"
    message: "Adjust tone: confident but approachable"

When to use custom checks:

  • Evaluating tone/voice
  • Assessing logical flow
  • Checking context-appropriate content
  • Nuanced quality assessments

Criterion Design

Good Criteria Have

  1. Clear name: directness, error-handling, test-coverage
  2. Focused scope: One quality dimension per criterion
  3. Multiple checks: 2-5 checks that together assess the criterion
  4. Concrete examples: Pass and fail that clarify expectations
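Putting the four properties together, a well-formed criterion might look like this (the criterion content below is invented for illustration; the field names follow the schema above):

```yaml
criteria:
  error-handling:
    weight: 25
    threshold: 80
    description: "Errors are caught, logged with context, and surfaced clearly"
    checks:
      - type: presence
        pattern: "try|catch|rescue"
        message: "Wrap fallible operations in error handling"
      - type: custom
        prompt: "Do error messages explain what failed and how to recover?"
        message: "Add context and recovery steps to error messages"
    examples:
      pass: "Upload failed: file exceeds 10 MB. Compress it or split it."
      fail: "Error occurred."
```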

Common Criterion Categories

Content criteria (what it says):

  • Directness, specificity, accuracy, completeness

Style criteria (how it's written):

  • Tone, voice, formatting, readability

Structural criteria (how it's organized):

  • Hierarchy, flow, sections, navigation

Behavioral criteria (what it does):

  • Error handling, logging, testing, security

Weight Distribution

Weights should reflect actual importance:

# Brand Voice Rubric Example
criteria:
  directness:    { weight: 30 }  # Core brand attribute
  specificity:   { weight: 25 }  # Important for trust
  tone:          { weight: 20 }  # Supports brand
  audience:      { weight: 15 }  # Enables connection
  formatting:    { weight: 10 }  # Nice to have

Guidelines:

  • All weights must sum to 100
  • Most important criterion: 25-40%
  • Supporting criteria: 15-25%
  • Minor criteria: 5-15%
  • Avoid equal weights (shows lack of prioritization)
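These guidelines are easy to enforce mechanically before a rubric ships. A minimal validation sketch (the `validate_weights` helper is illustrative):

```python
# Illustrative: validate that criterion weights sum to 100 and flag
# all-equal weights, which suggest missing prioritization.
def validate_weights(criteria: dict) -> list[str]:
    problems = []
    weights = [c["weight"] for c in criteria.values()]
    if sum(weights) != 100:
        problems.append(f"weights sum to {sum(weights)}, expected 100")
    if len(weights) > 1 and len(set(weights)) == 1:
        problems.append("all weights are equal; prioritize criteria")
    return problems

criteria = {
    "directness": {"weight": 30},
    "specificity": {"weight": 25},
    "tone": {"weight": 20},
    "audience": {"weight": 15},
    "formatting": {"weight": 5},
}
print(validate_weights(criteria))
# → ['weights sum to 95, expected 100']
```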

Threshold Setting

Choose thresholds based on context:

| Context           | Threshold Range | Rationale            |
|-------------------|-----------------|----------------------|
| Security-critical | 90-100%         | Can't compromise     |
| Production code   | 80-90%          | High standards       |
| Marketing copy    | 70-85%          | Room for creativity  |
| Draft content     | 60-75%          | Early feedback       |

Examples

See examples/ for working rubric examples:

  • brand-voice.yaml - Marketing copy evaluation
  • code-security.yaml - Security audit rubric
  • api-design.yaml - API review criteria

Anti-Patterns

Avoid:

  • Vague criteria: "Code quality" (unmeasurable)
  • Overlapping checks: Testing same thing twice
  • Extreme thresholds: 100% (nothing passes) or 50% (everything passes)
  • Missing examples: Leaves room for interpretation
  • Generic messages: "Fix this" (not actionable)

Prefer:

  • Specific criteria: "Error messages include context and recovery steps"
  • Distinct checks: Each check tests something unique
  • Reasonable thresholds: Based on real-world expectations
  • Clear examples: Both pass and fail cases
  • Actionable messages: "Add error context: what failed and how to fix"
