eval-patterns

@therealchrisrock/eval-patterns
by therealchrisrock · Updated 4/7/2026

This skill provides common evaluation patterns and integration guidance. Use when:

  • Integrating eval-framework with other plugins
  • Designing evaluation workflows
  • Choosing between content and behavior evaluation
  • Setting up project-local rubrics

Installation

$ npx agent-skills-cli install @therealchrisrock/eval-patterns

Supported assistants: Claude Code, Cursor, Copilot, Codex, Antigravity

Details

Path: eval-framework/skills/eval-patterns/SKILL.md
Branch: main
Scoped Name: @therealchrisrock/eval-patterns

Usage

After installing, this skill will be available to your AI coding assistant.

Verify installation:

npx agent-skills-cli list

Skill Instructions


name: eval-patterns
description: |
  This skill provides common evaluation patterns and integration guidance. Use when:
  • Integrating eval-framework with other plugins
  • Designing evaluation workflows
  • Choosing between content and behavior evaluation
  • Setting up project-local rubrics
version: 1.0.0

Evaluation Patterns & Integration

Common patterns for using the eval-framework effectively in different contexts.

Evaluation Types

Content Evaluation

Evaluates static content: copy, documentation, code files.

Use for:

  • Marketing copy review
  • Documentation quality
  • Code style/patterns
  • Configuration validation

Invocation:

/eval-run brand-voice app/routes/sell-on-vouchline.tsx

Behavior Evaluation

Evaluates actions and outputs: what Claude did, not just what exists.

Use for:

  • Code review after implementation
  • Commit message quality
  • Test coverage verification
  • API response validation

Invocation:

Judge agent triggered: "Review what I just implemented against the code-security rubric"

Combined Evaluation

Evaluates both content and behavior together.

Use for:

  • Full code review (style + security + behavior)
  • Documentation with examples (accuracy + completeness)
  • Feature implementation review

Project-Local Setup

Directory Structure

your-project/
├── .claude/
│   └── evals/
│       ├── brand-voice.yaml      # Project rubrics
│       ├── code-security.yaml
│       └── api-design.yaml

Quick Setup

  1. Create directory: mkdir -p .claude/evals
  2. Create rubric: /eval-create brand-voice --from docs/brand/voice.md
  3. Run evaluation: /eval-run brand-voice
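The steps above can be sketched as a single shell session. Note that `/eval-create` is the supported way to generate a rubric; the YAML keys written here (`name`, `version`, `criteria`) are illustrative placeholders, not the framework's documented schema:

```shell
# Bootstrap a project-local rubric directory by hand.
# The YAML keys below are illustrative only; /eval-create
# normally generates this file.
mkdir -p .claude/evals
cat > .claude/evals/brand-voice.yaml <<'EOF'
name: brand-voice
version: 1.0.0
criteria:
  - id: active-voice
    description: Copy uses an active, confident tone
EOF
ls .claude/evals
```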

Rubric Discovery

The judge agent automatically discovers rubrics in:

  1. .claude/evals/*.yaml (project-local)
  2. .claude/evals/*.yml (alternate extension)
  3. Explicit paths passed to commands
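To preview what the judge would pick up in a given project, the discovery globs above can be reproduced with a quick `find` (the rubric file names here are just examples):

```shell
# Create two example rubrics, then list everything matching the
# discovery globs (.claude/evals/*.yaml and *.yml)
mkdir -p .claude/evals
touch .claude/evals/brand-voice.yaml .claude/evals/code-security.yml
find .claude/evals -type f \( -name '*.yaml' -o -name '*.yml' \) | sort
```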

Integration Patterns

Pattern 1: Post-Implementation Review

After completing significant work, invoke judge for quality check:

User: "I just finished the authentication module"
Claude: [Uses judge agent to evaluate against code-security rubric]

The judge agent's when_to_use description allows it to be triggered proactively when a code review is requested.

Pattern 2: Command-Based Validation

Explicit validation during development:

/eval-run brand-voice app/routes/sell-on-vouchline.tsx

Returns structured feedback before committing.

Pattern 3: Plugin Integration

Other plugins can invoke the judge programmatically:

## In your plugin's agent/command:

Invoke the eval-framework judge agent with:
- Rubric: [name or path]
- Content: [what to evaluate]
- Context: [additional context]

The judge will return structured evaluation results.

Pattern 4: Pre-Commit Workflow

Manual pre-commit check (not automated hook):

User: "Check my changes before I commit"
Claude: [Runs relevant rubrics against staged files]
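The "staged files" such a check runs over are simply the output of `git diff --cached`. For illustration, using a throwaway repository (the `demo` directory and `app.js` file are hypothetical):

```shell
# List the staged files a pre-commit evaluation would cover
git init -q demo
echo 'console.log("hi")' > demo/app.js
git -C demo add app.js
git -C demo diff --cached --name-only   # prints: app.js
```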

Choosing Rubrics

By Content Type

Content           Recommended Rubric
Marketing copy    brand-voice
API code          code-security, api-design
Documentation     docs-quality
Test files        test-coverage
Config files      config-validation

By Quality Gate

Gate          Threshold   Required Criteria
Draft review  60%         None
PR review     75%         Core criteria
Production    85%         All security
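The gate thresholds above are policy, not something the table enforces by itself. One way they might live in a rubric file is as a thresholds block; the field names here are assumptions, not the framework's documented schema:

```yaml
# Hypothetical fields: the eval-framework rubric schema is not
# documented here, so treat every key below as illustrative.
name: code-security
thresholds:
  draft: 60        # minimum score (%) for a draft review pass
  pr: 75           # minimum score for PR review
  production: 85   # production gate; all security criteria required
```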

Rubric Composition

Layered Rubrics

Create focused rubrics that can be run together:

# code-style.yaml - formatting, naming
# code-security.yaml - vulnerabilities
# code-perf.yaml - performance patterns

Run multiple: /eval-run code-style && /eval-run code-security

Domain-Specific Rubrics

Create rubrics for specific features:

# auth-flow.yaml - authentication patterns
# payment-handling.yaml - financial code
# user-input.yaml - input validation

Best Practices

Start Simple: Begin with 2-3 criteria, add more as needed.

Iterate Rubrics: Version your rubrics and refine based on false positives/negatives.

Context Matters: Include file patterns in scope to auto-filter relevant files.

Required vs Optional: Use required_criteria for must-pass items, let others contribute to score.
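A rubric combining both kinds of criteria might look like the sketch below. Only `required_criteria` is named by this skill; the surrounding keys are illustrative assumptions:

```yaml
# Sketch only: required_criteria is the field named above; other keys
# are assumed for illustration and may not match the real schema.
name: code-security
required_criteria:
  - no-hardcoded-secrets        # must pass regardless of overall score
criteria:
  - id: no-hardcoded-secrets
    description: No API keys or passwords committed in source
  - id: error-handling          # optional; contributes to the score
    description: Errors are caught and surfaced with context
```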

Actionable Feedback: Every check message should explain how to fix the issue it flags.

Troubleshooting

Rubric not found: Check .claude/evals/ exists and rubric name matches file.

False positives: Refine regex patterns or use custom checks for nuance.

Score too low: Review your thresholds; they may be too strict for your context.

Slow evaluation: Reduce custom checks (LLM-evaluated) where pattern checks work.

Reference Files

See references/ for additional patterns:

  • integration-examples.md - Real-world integration examples
