Tracks task success metrics to improve future model routing. Use when completing any implementation, refactor, bugfix, or testing task.
Installation
Details
Usage
After installing, this skill will be available to your AI coding assistant.
Verify installation:
npx agent-skills-cli listSkill Instructions
name: effectiveness-tracking description: Tracks task success metrics to improve future model routing. Use when completing any implementation, refactor, bugfix, or testing task.
Effectiveness Tracking Protocol
On Task Completion
After tests pass and task is complete, log effectiveness data for smart routing learning.
Required Metrics
-
Task Pattern: Which pattern did this task follow?
architecture- System design, integration planningmulti-file-refactor- Restructuring 5+ filesbugfix- Fixing broken functionalityfeature- Adding new capability (2-5 files)testing- Writing test coveragedocumentation- Markdown/docs updates
-
Model Used: Which model completed this task
- Example:
claude-sonnet-4.5,claude-opus-4.5,gpt-5
- Example:
-
Prompts Used: Total conversation turns to completion
- Count from task start to passing tests
-
Success: Did tests pass?
trueorfalse
-
Timestamp: When completed
- ISO 8601 format
Storage Format
Append to dev-data/effectiveness-log.jsonl (JSON Lines format):
{"pattern":"multi-file-refactor","model":"claude-sonnet-4.5","prompts":2,"success":true,"timestamp":"2026-01-03T12:00:00Z","files_changed":7}
{"pattern":"bugfix","model":"claude-haiku","prompts":1,"success":true,"timestamp":"2026-01-03T12:15:00Z","files_changed":1}
{"pattern":"feature","model":"claude-opus-4.5","prompts":3,"success":true,"timestamp":"2026-01-03T12:30:00Z","files_changed":4}
Querying Historical Data
Before starting complex tasks, query effectiveness log:
// Example: What's the best model for multi-file refactor?
const logs = readJsonLines('dev-data/effectiveness-log.jsonl');
const pattern = logs.filter(l => l.pattern === 'multi-file-refactor' && l.success);
const byModel = groupBy(pattern, 'model');
const avgPrompts = {
opus: average(byModel.opus.map(l => l.prompts)),
sonnet: average(byModel.sonnet.map(l => l.prompts)),
haiku: average(byModel.haiku.map(l => l.prompts))
};
// Recommend model with lowest average prompts
Integration with Smart Router
The effectiveness log feeds the smart routing recommendation engine:
- User starts task → pattern detected
- Query log for similar past tasks
- Calculate average prompts by model
- Recommend model with best effectiveness (fewest prompts)
- Show cost as secondary info
Example Recommendation
📊 Smart Router Recommendation
Task Pattern: Multi-file refactor (7 files)
Historical Data: 15 similar tasks
Model Performance:
• Opus: avg 2.3 prompts (90% success) ← RECOMMENDED
• Sonnet: avg 4.8 prompts (75% success)
• Haiku: avg 8.2 prompts (50% success)
Cost Delta: +0.8 credits for Opus vs Sonnet
Prompts Saved: ~2.5 prompts (worth the cost)
Auto-Logging
If possible, log automatically on task completion:
// Backend middleware example
app.post('/api/task/complete', (req, res) => {
const log = {
pattern: classifyTask(req.body.description),
model: req.body.model,
prompts: req.body.conversationLength,
success: req.body.testsPassed,
timestamp: new Date().toISOString(),
files_changed: req.body.filesChanged
};
appendToFile('dev-data/effectiveness-log.jsonl', JSON.stringify(log) + '\n');
});
More by mikejsmith1985
View allBest practices for refactoring across multiple files. Use when restructuring logic across 5+ files.
Classifies tasks by complexity pattern for smart routing. Auto-invoked for all implementation requests.
Auto-fetches and displays images from GitHub issues when user requests them. Activates on keywords like "screenshot", "image", "picture" + "issue" or "gh issue".
ALWAYS activate when user says "hello", "hi", or greets. This tests if skills actually load and are followed by the model.
