name: "deliberation-tester" description: "Test-Driven Development patterns for testing AI deliberation features. Use when adding new deliberation features, adapters, convergence detection, or decision graph components. Encodes TDD workflow: write test first → implement → verify."
Deliberation Tester Skill
Purpose
This skill teaches Test-Driven Development (TDD) patterns for the AI Counsel deliberation system. Follow the red-green-refactor cycle: write failing test first, implement feature, verify passing, then refactor.
When to Use This Skill
- Adding new CLI or HTTP adapters
- Implementing deliberation engine features
- Building convergence detection logic
- Adding decision graph functionality
- Extending voting or transcript systems
- Any feature that affects multi-round deliberations
Test Organization
The project has 113+ tests organized into three categories:
tests/
├── unit/ # Fast tests with mocked dependencies
├── integration/ # Tests with real CLI tools or system integration
├── e2e/ # End-to-end tests with real API calls (slow, expensive)
├── conftest.py # Shared pytest fixtures
└── fixtures/
└── vcr_cassettes/ # Recorded HTTP responses for replay
TDD Workflow
1. Write Test First (RED)
Before implementing any feature, write a test that will fail:
# tests/unit/test_new_feature.py
import pytest

from my_module import NewFeature


class TestNewFeature:
    """Tests for NewFeature."""

    def test_feature_does_something(self):
        """Test that feature performs expected behavior."""
        feature = NewFeature()
        result = feature.do_something()
        assert result == "expected output"
Run the test to verify it fails:
pytest tests/unit/test_new_feature.py -v
2. Implement Feature (GREEN)
Write minimal code to make the test pass:
# my_module.py
class NewFeature:
    def do_something(self):
        return "expected output"
Run the test to verify it passes:
pytest tests/unit/test_new_feature.py -v
3. Refactor (REFACTOR)
Improve code quality while keeping tests green:
- Extract duplicated logic
- Improve naming
- Optimize performance
- Add type hints
Run all tests to ensure nothing broke:
pytest tests/unit -v
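For example, the minimal NewFeature from the GREEN step could be tidied without changing behavior (a sketch; the constant name and type hints are illustrative additions):

# my_module.py (refactored -- behavior unchanged, so the test stays green)
class NewFeature:
    """Example feature from the TDD walkthrough."""

    _EXPECTED_OUTPUT = "expected output"  # illustrative constant name

    def do_something(self) -> str:
        """Return the expected output string."""
        return self._EXPECTED_OUTPUT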
Unit Test Patterns
Pattern 1: Mock Adapters for Engine Tests
Use shared fixtures from conftest.py to mock adapters:
# tests/unit/test_engine.py
import pytest

from deliberation.engine import DeliberationEngine
from models.schema import Participant


class TestDeliberationEngine:
    """Tests for DeliberationEngine."""

    def test_engine_initialization(self, mock_adapters):
        """Test engine initializes with adapters."""
        engine = DeliberationEngine(mock_adapters)
        assert engine.adapters == mock_adapters
        assert len(engine.adapters) == 2

    @pytest.mark.asyncio
    async def test_execute_round_single_participant(self, mock_adapters):
        """Test executing single round with one participant."""
        engine = DeliberationEngine(mock_adapters)
        participants = [
            Participant(cli="claude", model="claude-3-5-sonnet", stance="neutral")
        ]

        # Configure mock return value
        mock_adapters["claude"].invoke_mock.return_value = "This is Claude's response"

        responses = await engine.execute_round(
            round_num=1,
            prompt="What is 2+2?",
            participants=participants,
            previous_responses=[],
        )

        assert len(responses) == 1
        assert responses[0].response == "This is Claude's response"
        assert responses[0].participant == "claude-3-5-sonnet@claude"
Key Points:
- Use @pytest.mark.asyncio for async tests
- Use the mock_adapters fixture from conftest.py
- Configure mock return values with invoke_mock.return_value
- Assert on response structure and content
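Building on the same fixture, a second round-execution test can cover multiple participants. A sketch that lives inside the same TestDeliberationEngine class; the codex model name and the stances are illustrative choices:

    @pytest.mark.asyncio
    async def test_execute_round_two_participants(self, mock_adapters):
        """Both mocked adapters should contribute one response each."""
        engine = DeliberationEngine(mock_adapters)
        participants = [
            Participant(cli="claude", model="claude-3-5-sonnet", stance="for"),
            Participant(cli="codex", model="gpt-4o", stance="against"),  # model name illustrative
        ]
        mock_adapters["claude"].invoke_mock.return_value = "Claude's take"
        mock_adapters["codex"].invoke_mock.return_value = "Codex's take"

        responses = await engine.execute_round(
            round_num=1,
            prompt="Tabs or spaces?",
            participants=participants,
            previous_responses=[],
        )

        assert len(responses) == 2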
Pattern 2: Mock Subprocesses for CLI Adapter Tests
Use unittest.mock.patch to mock subprocess execution:
# tests/unit/test_adapters.py
import asyncio
from unittest.mock import AsyncMock, Mock, patch

import pytest

from adapters.claude import ClaudeAdapter


class TestClaudeAdapter:
    """Tests for ClaudeAdapter."""

    @pytest.mark.asyncio
    @patch("adapters.base.asyncio.create_subprocess_exec")
    async def test_invoke_success(self, mock_subprocess):
        """Test successful CLI invocation."""
        # Mock subprocess
        mock_process = Mock()
        mock_process.communicate = AsyncMock(
            return_value=(b"Claude Code output\n\nActual model response here", b"")
        )
        mock_process.returncode = 0
        mock_subprocess.return_value = mock_process

        adapter = ClaudeAdapter(
            args=["-p", "--model", "{model}", "{prompt}"]
        )
        result = await adapter.invoke(
            prompt="What is 2+2?",
            model="claude-3-5-sonnet-20241022"
        )

        assert result == "Actual model response here"
        mock_subprocess.assert_called_once()

    @pytest.mark.asyncio
    @patch("adapters.base.asyncio.create_subprocess_exec")
    async def test_invoke_timeout(self, mock_subprocess):
        """Test timeout handling."""
        mock_process = Mock()
        mock_process.communicate = AsyncMock(side_effect=asyncio.TimeoutError())
        mock_subprocess.return_value = mock_process

        adapter = ClaudeAdapter(args=["-p", "{model}", "{prompt}"], timeout=1)

        with pytest.raises(RuntimeError, match="timeout"):
            await adapter.invoke(prompt="test", model="sonnet")
Key Points:
- Patch
asyncio.create_subprocess_execat the import path - Mock
communicate()to return(stdout, stderr)tuple - Use
AsyncMockfor async methods - Test both success and error cases
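Beyond timeouts, a CLI process can exit non-zero. A sketch for the same TestClaudeAdapter class, assuming the base adapter surfaces a non-zero exit code as a RuntimeError (verify against the actual BaseCLIAdapter behavior):

    @pytest.mark.asyncio
    @patch("adapters.base.asyncio.create_subprocess_exec")
    async def test_invoke_nonzero_exit_raises(self, mock_subprocess):
        """A failing CLI process should surface as an error (assumed behavior)."""
        mock_process = Mock()
        mock_process.communicate = AsyncMock(return_value=(b"", b"model not found"))
        mock_process.returncode = 1
        mock_subprocess.return_value = mock_process

        adapter = ClaudeAdapter(args=["-p", "--model", "{model}", "{prompt}"])
        with pytest.raises(RuntimeError):
            await adapter.invoke(prompt="test", model="sonnet")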
Pattern 3: HTTP Adapter Tests with Mock Responses
Test HTTP adapters without making real API calls:
# tests/unit/test_ollama_adapter.py
import pytest

from adapters.ollama import OllamaAdapter


class TestOllamaAdapter:
    """Tests for Ollama HTTP adapter."""

    def test_adapter_initialization(self):
        """Test adapter initializes with correct base_url and defaults."""
        adapter = OllamaAdapter(base_url="http://localhost:11434", timeout=60)
        assert adapter.base_url == "http://localhost:11434"
        assert adapter.timeout == 60
        assert adapter.max_retries == 3

    def test_build_request_structure(self):
        """Test build_request returns correct endpoint, headers, body."""
        adapter = OllamaAdapter(base_url="http://localhost:11434")
        endpoint, headers, body = adapter.build_request(
            model="llama2", prompt="What is 2+2?"
        )
        assert endpoint == "/api/generate"
        assert headers["Content-Type"] == "application/json"
        assert body["model"] == "llama2"
        assert body["prompt"] == "What is 2+2?"
        assert body["stream"] is False

    def test_parse_response_extracts_content(self):
        """Test parse_response extracts 'response' field from JSON."""
        adapter = OllamaAdapter(base_url="http://localhost:11434")
        response_json = {
            "model": "llama2",
            "response": "The answer is 4.",
            "done": True,
        }
        result = adapter.parse_response(response_json)
        assert result == "The answer is 4."

    def test_parse_response_missing_field_raises_error(self):
        """Test parse_response raises error if 'response' field missing."""
        adapter = OllamaAdapter(base_url="http://localhost:11434")
        response_json = {"model": "llama2", "done": True}
        with pytest.raises(KeyError) as exc_info:
            adapter.parse_response(response_json)
        assert "response" in str(exc_info.value).lower()
Key Points:
- Test build_request() separately from parse_response()
- Verify request structure (endpoint, headers, body)
- Test response parsing with valid and invalid JSON
- Use pytest.raises() for error cases
Pattern 4: Pydantic Model Validation Tests
Test data models with valid and invalid inputs:
# tests/unit/test_models.py
import pytest
from pydantic import ValidationError

from models.schema import Participant, Vote


class TestParticipant:
    """Tests for Participant model."""

    def test_valid_participant(self):
        """Test participant creation with valid data."""
        p = Participant(cli="claude", model="sonnet", stance="neutral")
        assert p.cli == "claude"
        assert p.model == "sonnet"
        assert p.stance == "neutral"

    def test_invalid_cli_raises_error(self):
        """Test invalid CLI name raises validation error."""
        with pytest.raises(ValidationError) as exc_info:
            Participant(cli="invalid", model="test", stance="neutral")
        assert "cli" in str(exc_info.value).lower()

    def test_invalid_stance_raises_error(self):
        """Test invalid stance raises validation error."""
        with pytest.raises(ValidationError) as exc_info:
            Participant(cli="claude", model="sonnet", stance="maybe")
        assert "stance" in str(exc_info.value).lower()


class TestVote:
    """Tests for Vote model."""

    def test_valid_vote(self):
        """Test vote creation with valid data."""
        vote = Vote(
            option="Option A",
            confidence=0.85,
            rationale="Strong evidence supports this",
            continue_debate=False
        )
        assert vote.confidence == 0.85
        assert vote.continue_debate is False

    def test_confidence_out_of_range_raises_error(self):
        """Test confidence outside 0.0-1.0 raises error."""
        with pytest.raises(ValidationError) as exc_info:
            Vote(option="A", confidence=1.5, rationale="test")
        assert "confidence" in str(exc_info.value).lower()
Key Points:
- Test valid model creation
- Test each validation rule with invalid data
- Use pytest.raises(ValidationError) for schema violations
- Check that the error message contains the problematic field
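Round-trip serialization is also cheap to cover. A sketch for the TestVote class, assuming the project is on Pydantic v2 (model_dump/model_validate); on v1 the equivalents are dict() and parse_obj():

    def test_vote_round_trip_serialization(self):
        """Vote survives a dump/validate round trip (assumes Pydantic v2 API)."""
        vote = Vote(
            option="Option A",
            confidence=0.85,
            rationale="Strong evidence supports this",
            continue_debate=False,
        )
        restored = Vote.model_validate(vote.model_dump())
        assert restored == vote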
Integration Test Patterns
Pattern 5: Real Adapter Integration Tests
Test adapters with real CLI invocations (requires tools installed):
# tests/integration/test_engine_convergence.py
from unittest.mock import AsyncMock

import pytest

from deliberation.engine import DeliberationEngine
from models.config import load_config
from models.schema import Participant


@pytest.mark.integration
class TestEngineConvergenceIntegration:
    """Test convergence detection integrated with deliberation engine."""

    @pytest.fixture
    def config(self):
        """Load test config."""
        return load_config("config.yaml")

    @pytest.mark.asyncio
    async def test_engine_detects_convergence_with_similar_responses(
        self, config, mock_adapters
    ):
        """Engine should detect convergence when responses are similar."""
        from deliberation.convergence import ConvergenceDetector

        engine = DeliberationEngine(adapters=mock_adapters)
        engine.convergence_detector = ConvergenceDetector(config)
        engine.config = config

        # Mock adapters to return similar responses
        mock_adapters["claude"].invoke = AsyncMock(
            side_effect=[
                "TypeScript is better for large projects",
                "TypeScript is better for large projects due to type safety",
            ]
        )

        # Execute rounds and verify convergence detection
        # ... test logic here
Key Points:
- Mark with @pytest.mark.integration
- Load real config with load_config()
- Can use real CLI tools or mocked responses
- Test feature integration, not just units
Pattern 6: VCR Cassettes for HTTP Tests
Record and replay HTTP responses for consistent testing:
# tests/integration/test_ollama_integration.py
import pytest
import vcr

from adapters.ollama import OllamaAdapter

# Configure VCR to record/replay HTTP interactions
my_vcr = vcr.VCR(
    cassette_library_dir='tests/fixtures/vcr_cassettes/ollama',
    record_mode='once',  # Record once, then replay
    match_on=['method', 'scheme', 'host', 'port', 'path', 'query', 'body'],
)


@pytest.mark.integration
class TestOllamaIntegration:
    """Integration tests for Ollama adapter with VCR."""

    @pytest.mark.asyncio
    @my_vcr.use_cassette('ollama_generate_success.yaml')
    async def test_real_ollama_request(self):
        """Test real Ollama request (recorded to cassette)."""
        adapter = OllamaAdapter(base_url="http://localhost:11434", timeout=60)
        result = await adapter.invoke(
            prompt="What is 2+2?",
            model="llama2"
        )
        assert isinstance(result, str)
        assert len(result) > 0
Key Points:
- Install VCR: pip install vcrpy
- Configure the cassette directory and match criteria
- First run records HTTP interactions to YAML file
- Subsequent runs replay from cassette (no network calls)
- Commit cassettes to repo for CI/CD consistency
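If a recorded service ever requires credentials, scrub them before committing cassettes. A sketch extending the same VCR configuration with vcrpy's filter_headers option (Ollama itself normally needs no auth, so this is precautionary):

my_vcr = vcr.VCR(
    cassette_library_dir='tests/fixtures/vcr_cassettes/ollama',
    record_mode='once',
    match_on=['method', 'scheme', 'host', 'port', 'path', 'query', 'body'],
    filter_headers=['authorization', 'x-api-key'],  # keep secrets out of committed cassettes
)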
Pattern 7: Performance and Latency Tests
Test performance characteristics of features:
# tests/integration/test_performance.py
import time

import pytest

from decision_graph.cache import DecisionCache


@pytest.mark.integration
class TestCachePerformance:
    """Performance tests for decision graph cache."""

    @pytest.mark.asyncio
    async def test_cache_hit_latency(self):
        """Test cache hit latency is under 5μs."""
        cache = DecisionCache(max_size=200)

        # Warm up cache
        cache.set("test_key", "test_value")

        # Measure cache hit time
        start = time.perf_counter()
        for _ in range(1000):
            result = cache.get("test_key")
        elapsed = time.perf_counter() - start

        avg_latency_us = (elapsed / 1000) * 1_000_000
        assert avg_latency_us < 5, f"Cache hit too slow: {avg_latency_us}μs"
        assert result == "test_value"

    @pytest.mark.asyncio
    async def test_query_latency_with_1000_nodes(self):
        """Test query latency stays under 100ms with 1000 nodes."""
        # Setup: create 1000 decision nodes and a query_engine over them
        # ... setup code

        start = time.perf_counter()
        results = await query_engine.search_similar("test question", limit=5)
        elapsed = time.perf_counter() - start

        assert elapsed < 0.1, f"Query too slow: {elapsed*1000}ms"
        assert len(results) <= 5
Key Points:
- Use time.perf_counter() for high-resolution timing
- Test realistic data volumes (1000+ nodes)
- Assert on performance targets (p95, p99 latencies)
- Run separately from fast unit tests
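The average-latency assertion above can hide tail spikes. A sketch of a percentile-based check for the same TestCachePerformance class, using only the standard library; the 10μs p95 budget is illustrative, not a project target:

    def test_cache_hit_p95_latency(self):
        """Assert the 95th-percentile cache-hit latency, not just the mean."""
        cache = DecisionCache(max_size=200)
        cache.set("test_key", "test_value")

        samples = []
        for _ in range(1000):
            start = time.perf_counter()
            cache.get("test_key")
            samples.append(time.perf_counter() - start)

        samples.sort()
        p95_us = samples[int(len(samples) * 0.95)] * 1_000_000
        assert p95_us < 10, f"p95 cache hit too slow: {p95_us:.2f}μs"  # budget is illustrative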
Test Naming Conventions
Follow these naming patterns for clarity:
# Test class: Test<ComponentName>
class TestDeliberationEngine:

    # Test method: test_<what>_<condition>_<outcome>
    def test_engine_initialization_with_adapters_succeeds(self):
        pass

    def test_invoke_with_timeout_raises_runtime_error(self):
        pass

    def test_parse_response_with_missing_field_raises_key_error(self):
        pass
Patterns:
- Class: Test<ComponentName> (PascalCase)
- Method: test_<action>_<condition>_<result> (snake_case)
- Use descriptive names that explain the test scenario
- Group related tests in classes
Running Tests
Run All Tests
# All tests with coverage
pytest --cov=. --cov-report=html
# View coverage report
open htmlcov/index.html
Run Specific Test Types
# Unit tests only (fast, no external dependencies)
pytest tests/unit -v
# Integration tests (requires CLI tools)
pytest tests/integration -v -m integration
# End-to-end tests (real API calls, slow)
pytest tests/e2e -v -m e2e
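The -m integration and -m e2e selections assume those markers are registered; otherwise pytest emits unknown-mark warnings. If they are not already declared in pytest.ini or pyproject.toml, one way is a sketch like this in tests/conftest.py:

# tests/conftest.py (only needed if the markers are not registered elsewhere)
def pytest_configure(config):
    """Register custom markers so -m integration / -m e2e run warning-free."""
    config.addinivalue_line("markers", "integration: tests needing real CLI tools or system integration")
    config.addinivalue_line("markers", "e2e: end-to-end tests that make real API calls (slow, expensive)")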
Run Specific Tests
# Single file
pytest tests/unit/test_engine.py -v
# Single test class
pytest tests/unit/test_engine.py::TestDeliberationEngine -v
# Single test method
pytest tests/unit/test_engine.py::TestDeliberationEngine::test_engine_initialization -v
# Tests matching pattern
pytest -k "convergence" -v
Debugging Tests
# Show print statements
pytest tests/unit/test_engine.py -v -s
# Drop into debugger on failure
pytest tests/unit/test_engine.py -v --pdb
# Show full diff on assertion failures
pytest tests/unit/test_engine.py -vv
Code Quality Checks
After writing tests, run quality checks:
# Format code (black)
black .
# Lint (ruff)
ruff check .
# Type check (optional, mypy)
mypy .
TDD Example: Adding a New CLI Adapter
Step 1: Write Failing Test
# tests/unit/test_new_cli.py
import pytest

from adapters.new_cli import NewCLIAdapter


class TestNewCLIAdapter:
    """Tests for NewCLIAdapter."""

    def test_adapter_initialization(self):
        """Test adapter initializes with correct command."""
        adapter = NewCLIAdapter(args=["--model", "{model}", "{prompt}"])
        assert adapter.command == "new-cli"
        assert adapter.timeout == 60

    def test_parse_output_extracts_response(self):
        """Test parse_output extracts model response."""
        adapter = NewCLIAdapter(args=[])
        raw = "Some CLI header\n\nActual response here"
        result = adapter.parse_output(raw)
        assert result == "Actual response here"
Run test: pytest tests/unit/test_new_cli.py -v (FAILS)
Step 2: Implement Adapter
# adapters/new_cli.py
from adapters.base import BaseCLIAdapter


class NewCLIAdapter(BaseCLIAdapter):
    """Adapter for new-cli tool."""

    def __init__(self, args: list[str], timeout: int = 60):
        super().__init__(command="new-cli", args=args, timeout=timeout)

    def parse_output(self, raw_output: str) -> str:
        """Extract response from CLI output."""
        lines = raw_output.strip().split("\n")
        # Skip header, return content
        return "\n".join(lines[2:]).strip()
Run test: pytest tests/unit/test_new_cli.py -v (PASSES)
Step 3: Add Integration Test
# tests/integration/test_new_cli_integration.py
import pytest

from adapters.new_cli import NewCLIAdapter


@pytest.mark.integration
class TestNewCLIIntegration:
    """Integration tests for NewCLI adapter."""

    @pytest.mark.asyncio
    async def test_real_cli_invocation(self):
        """Test real CLI invocation (requires new-cli installed)."""
        adapter = NewCLIAdapter(args=["--model", "{model}", "{prompt}"])
        result = await adapter.invoke(
            prompt="What is 2+2?",
            model="default-model"
        )
        assert isinstance(result, str)
        assert len(result) > 0
Step 4: Register Adapter
# adapters/__init__.py
from adapters.new_cli import NewCLIAdapter
# (existing imports for ClaudeAdapter, CodexAdapter, etc. stay as they are)


def create_adapter(name: str, config):
    """Factory function for creating adapters."""
    cli_adapters = {
        "claude": ClaudeAdapter,
        "codex": CodexAdapter,
        "new_cli": NewCLIAdapter,  # Add here
    }
    # ... rest of factory logic
Step 5: Update Schema
# models/schema.py
class Participant(BaseModel):
    """Participant in deliberation."""

    cli: Literal["claude", "codex", "droid", "gemini", "new_cli"]  # Add here
    model: str
    stance: Literal["for", "against", "neutral"]
Step 6: Run All Tests
# Verify no regressions
pytest tests/unit -v
pytest tests/integration -v -m integration
# Check coverage
pytest --cov=adapters --cov-report=term-missing
Common Testing Pitfalls
1. Not Using Async Fixtures
# WRONG: Mixing sync and async
def test_async_function(self):
    result = my_async_function()  # Returns coroutine, not result

# RIGHT: Mark test as async
@pytest.mark.asyncio
async def test_async_function(self):
    result = await my_async_function()
2. Mock Path Mismatch
# WRONG: Patching the whole module, or patching at the definition site instead of the usage site
@patch("adapters.base.asyncio")  # Too coarse: every asyncio call in the adapter becomes a mock
# RIGHT: Patch the specific function at the path where it is used
@patch("adapters.base.asyncio.create_subprocess_exec")
3. Forgetting to Reset Mocks
# WRONG: Reusing mock without reset
mock_adapter.invoke_mock.return_value = "first"
# ... test 1
mock_adapter.invoke_mock.return_value = "second"
# ... test 2 (but test 1 state might leak)

# RIGHT: Use fixtures or reset
@pytest.fixture(autouse=True)
def reset_mocks(self, mock_adapters):
    yield
    for adapter in mock_adapters.values():
        adapter.invoke_mock.reset_mock()
4. Not Testing Error Cases
# INCOMPLETE: Only testing happy path
async def test_adapter_invoke_success(self):
    result = await adapter.invoke("prompt", "model")
    assert result == "response"

# COMPLETE: Test errors too
async def test_adapter_invoke_timeout(self):
    with pytest.raises(RuntimeError, match="timeout"):
        await adapter.invoke("prompt", "model")

def test_adapter_invoke_invalid_response(self):
    with pytest.raises(ValueError, match="invalid"):
        adapter.parse_response({})
Shared Fixtures Reference
Located in tests/conftest.py:
# mock_adapters fixture
def test_with_mocks(self, mock_adapters):
    """Use pre-configured mock adapters."""
    claude = mock_adapters["claude"]
    codex = mock_adapters["codex"]
    # Both have invoke_mock configured

# sample_config fixture
def test_with_config(self, sample_config):
    """Use sample configuration dict."""
    assert sample_config["defaults"]["rounds"] == 2
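If you need to extend the fixtures, mock_adapters is roughly shaped like the sketch below. This is an assumption about its implementation, so check the real tests/conftest.py before relying on details:

import pytest
from unittest.mock import AsyncMock

@pytest.fixture
def mock_adapters():
    """Two fake adapters ("claude" and "codex") with configurable invoke mocks (assumed shape)."""
    adapters = {}
    for name in ("claude", "codex"):
        adapter = AsyncMock()
        adapter.invoke_mock = adapter.invoke  # alias so tests can set return values
        adapters[name] = adapter
    return adapters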
Test Coverage Goals
- Unit tests: 90%+ coverage of core logic
- Integration tests: Critical workflows (engine execution, convergence, voting)
- E2E tests: Minimal, focused on user-facing scenarios
- Total: 113+ tests currently, growing with each feature
Final Checklist
Before committing new features:
- Write unit test first (RED)
- Implement minimal feature (GREEN)
- Add integration test if needed
- Refactor while keeping tests green
- Run pytest tests/unit -v (all pass)
- Run black . && ruff check . (no errors)
- Check coverage: pytest --cov=. --cov-report=term-missing
- Update CLAUDE.md if architecture changed
- Commit with clear message describing what was tested
Resources
- Pytest docs: https://docs.pytest.org/
- Async testing: https://pytest-asyncio.readthedocs.io/
- VCR.py docs: https://vcrpy.readthedocs.io/
- Project tests: /Users/harrison/Github/ai-counsel/tests/
- CLAUDE.md: /Users/harrison/Github/ai-counsel/CLAUDE.md
Remember: Tests are documentation. Write tests that explain what the feature does and why it's important.