Execute PostHog incident response procedures with triage, mitigation, and postmortem. Use when responding to PostHog-related outages, investigating errors, or running post-incident reviews for PostHog integration failures. Trigger with phrases like "posthog incident", "posthog outage", "posthog down", "posthog on-call", "posthog emergency", "posthog broken".
Installation
Details
Usage
After installing, this skill will be available to your AI coding assistant.
Verify installation:
skills listSkill Instructions
name: posthog-incident-runbook description: | Execute PostHog incident response procedures with triage, mitigation, and postmortem. Use when responding to PostHog-related outages, investigating errors, or running post-incident reviews for PostHog integration failures. Trigger with phrases like "posthog incident", "posthog outage", "posthog down", "posthog on-call", "posthog emergency", "posthog broken". allowed-tools: Read, Grep, Bash(kubectl:), Bash(curl:) version: 1.0.0 license: MIT author: Jeremy Longshore jeremy@intentsolutions.io
PostHog Incident Runbook
Overview
Rapid incident response procedures for PostHog-related outages.
Prerequisites
- Access to PostHog dashboard and status page
- kubectl access to production cluster
- Prometheus/Grafana access
- Communication channels (Slack, PagerDuty)
Severity Levels
| Level | Definition | Response Time | Examples |
|---|---|---|---|
| P1 | Complete outage | < 15 min | PostHog API unreachable |
| P2 | Degraded service | < 1 hour | High latency, partial failures |
| P3 | Minor impact | < 4 hours | Webhook delays, non-critical errors |
| P4 | No user impact | Next business day | Monitoring gaps |
Quick Triage
# 1. Check PostHog status
curl -s https://status.posthog.com | jq
# 2. Check our integration health
curl -s https://api.yourapp.com/health | jq '.services.posthog'
# 3. Check error rate (last 5 min)
curl -s localhost:9090/api/v1/query?query=rate(posthog_errors_total[5m])
# 4. Recent error logs
kubectl logs -l app=posthog-integration --since=5m | grep -i error | tail -20
Decision Tree
PostHog API returning errors?
├─ YES: Is status.posthog.com showing incident?
│ ├─ YES → Wait for PostHog to resolve. Enable fallback.
│ └─ NO → Our integration issue. Check credentials, config.
└─ NO: Is our service healthy?
├─ YES → Likely resolved or intermittent. Monitor.
└─ NO → Our infrastructure issue. Check pods, memory, network.
Immediate Actions by Error Type
401/403 - Authentication
# Verify API key is set
kubectl get secret posthog-secrets -o jsonpath='{.data.api-key}' | base64 -d
# Check if key was rotated
# → Verify in PostHog dashboard
# Remediation: Update secret and restart pods
kubectl create secret generic posthog-secrets --from-literal=api-key=NEW_KEY --dry-run=client -o yaml | kubectl apply -f -
kubectl rollout restart deployment/posthog-integration
429 - Rate Limited
# Check rate limit headers
curl -v https://api.posthog.com 2>&1 | grep -i rate
# Enable request queuing
kubectl set env deployment/posthog-integration RATE_LIMIT_MODE=queue
# Long-term: Contact PostHog for limit increase
500/503 - PostHog Errors
# Enable graceful degradation
kubectl set env deployment/posthog-integration POSTHOG_FALLBACK=true
# Notify users of degraded service
# Update status page
# Monitor PostHog status for resolution
Communication Templates
Internal (Slack)
🔴 P1 INCIDENT: PostHog Integration
Status: INVESTIGATING
Impact: [Describe user impact]
Current action: [What you're doing]
Next update: [Time]
Incident commander: @[name]
External (Status Page)
PostHog Integration Issue
We're experiencing issues with our PostHog integration.
Some users may experience [specific impact].
We're actively investigating and will provide updates.
Last updated: [timestamp]
Post-Incident
Evidence Collection
# Generate debug bundle
./scripts/posthog-debug-bundle.sh
# Export relevant logs
kubectl logs -l app=posthog-integration --since=1h > incident-logs.txt
# Capture metrics
curl "localhost:9090/api/v1/query_range?query=posthog_errors_total&start=2h" > metrics.json
Postmortem Template
## Incident: PostHog [Error Type]
**Date:** YYYY-MM-DD
**Duration:** X hours Y minutes
**Severity:** P[1-4]
### Summary
[1-2 sentence description]
### Timeline
- HH:MM - [Event]
- HH:MM - [Event]
### Root Cause
[Technical explanation]
### Impact
- Users affected: N
- Revenue impact: $X
### Action Items
- [ ] [Preventive measure] - Owner - Due date
Instructions
Step 1: Quick Triage
Run the triage commands to identify the issue source.
Step 2: Follow Decision Tree
Determine if the issue is PostHog-side or internal.
Step 3: Execute Immediate Actions
Apply the appropriate remediation for the error type.
Step 4: Communicate Status
Update internal and external stakeholders.
Output
- Issue identified and categorized
- Remediation applied
- Stakeholders notified
- Evidence collected for postmortem
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Can't reach status page | Network issue | Use mobile or VPN |
| kubectl fails | Auth expired | Re-authenticate |
| Metrics unavailable | Prometheus down | Check backup metrics |
| Secret rotation fails | Permission denied | Escalate to admin |
Examples
One-Line Health Check
curl -sf https://api.yourapp.com/health | jq '.services.posthog.status' || echo "UNHEALTHY"
Resources
Next Steps
For data handling, see posthog-data-handling.
More by jeremylongshore
View allRabbitmq Queue Setup - Auto-activating skill for Backend Development. Triggers on: rabbitmq queue setup, rabbitmq queue setup Part of the Backend Development skill category.
evaluating-machine-learning-models: This skill allows Claude to evaluate machine learning models using a comprehensive suite of metrics. It should be used when the user requests model performance analysis, validation, or testing. Claude can use this skill to assess model accuracy, precision, recall, F1-score, and other relevant metrics. Trigger this skill when the user mentions "evaluate model", "model performance", "testing metrics", "validation results", or requests a comprehensive "model evaluation".
building-neural-networks: This skill allows Claude to construct and configure neural network architectures using the neural-network-builder plugin. It should be used when the user requests the creation of a new neural network, modification of an existing one, or assistance with defining the layers, parameters, and training process. The skill is triggered by requests involving terms like "build a neural network," "define network architecture," "configure layers," or specific mentions of neural network types (e.g., "CNN," "RNN," "transformer").
Oauth Callback Handler - Auto-activating skill for API Integration. Triggers on: oauth callback handler, oauth callback handler Part of the API Integration skill category.
