Respond to production incidents systematically with triage, investigation, resolution, and post-mortem analysis to minimize downtime and prevent recurrence. Use when handling production outages, triaging incidents, investigating critical bugs, coordinating incident response, implementing hotfixes, conducting post-mortems, or establishing incident response procedures.
Usage
After installing, this skill will be available to your AI coding assistant.
Verify installation:
npx agent-skills-cli list
Skill Instructions
name: incident-response
description: Respond to production incidents systematically with triage, investigation, resolution, and post-mortem analysis to minimize downtime and prevent recurrence. Use when handling production outages, triaging incidents, investigating critical bugs, coordinating incident response, implementing hotfixes, conducting post-mortems, or establishing incident response procedures.
Incident Response - Production Issue Management
When to use this skill
- Responding to production outages
- Triaging critical incidents
- Investigating high-severity bugs
- Coordinating incident response teams
- Implementing emergency hotfixes
- Conducting post-mortem analyses
- Establishing incident response procedures
- Communicating status during incidents
- Creating runbooks for common issues
- Implementing rollback strategies
- Documenting incident timelines
- Preventing incident recurrence
Incident Response Process
1. Detect
- Monitoring alerts
- User reports
- Automated checks
2. Triage
- Assess severity (P0-P4)
- Page on-call engineer
- Create incident channel
3. Mitigate
- Rollback to last known good
- Scale resources
- Apply hotfix
- Communicate status
4. Resolve
- Verify fix
- Monitor metrics
- Update status page
- Close incident
5. Postmortem
- Timeline of events
- Root cause analysis
- Action items
- Follow-up tasks
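The five stages above can be modeled as an ordered lifecycle that also records the timeline needed for the postmortem. This is a minimal sketch, not a prescribed implementation: the `Stage` and `Incident` names, the strict one-step-forward rule, and the tuple layout of timeline entries are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Stage(Enum):
    # Values follow the numbering of the process steps above.
    DETECT = 1
    TRIAGE = 2
    MITIGATE = 3
    RESOLVE = 4
    POSTMORTEM = 5


@dataclass
class Incident:
    title: str
    stage: Stage = Stage.DETECT
    # Each entry is (UTC timestamp, stage name, note); hypothetical layout.
    timeline: list = field(default_factory=list)

    def log(self, note: str) -> None:
        """Record a timestamped note under the current stage."""
        self.timeline.append((datetime.now(timezone.utc), self.stage.name, note))

    def advance(self, note: str) -> Stage:
        """Move to the next stage in order and record why."""
        if self.stage is Stage.POSTMORTEM:
            raise ValueError("incident is already in the final stage")
        self.stage = Stage(self.stage.value + 1)
        self.log(note)
        return self.stage


incident = Incident("Service X degraded")
incident.log("Monitoring alert fired")
incident.advance("On-call paged, incident channel created")
incident.advance("Rolled back to last known good release")
print(incident.stage.name)
```

Logging every transition as it happens means the "Timeline of events" item in the postmortem can be read straight from `incident.timeline` instead of being reconstructed from memory.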
Severity Levels
- P0 (Critical): Complete outage, data loss
- P1 (High): Major feature broken, revenue impact
- P2 (Medium): Degraded performance, workaround exists
- P3 (Low): Minor bug, cosmetic issue
- P4 (Informational): Enhancement request
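As a rough illustration, the severity table can be expressed as a function that checks the most serious conditions first. The function name and its keyword flags are hypothetical; real triage usually involves judgment that a handful of booleans cannot capture.

```python
def classify_severity(
    *,
    complete_outage: bool = False,
    data_loss: bool = False,
    major_feature_broken: bool = False,
    revenue_impact: bool = False,
    degraded_performance: bool = False,
    minor_bug: bool = False,
) -> str:
    """Map observed impact to a P0-P4 label, highest severity first."""
    if complete_outage or data_loss:
        return "P0"
    if major_feature_broken or revenue_impact:
        return "P1"
    if degraded_performance:
        return "P2"
    if minor_bug:
        return "P3"
    # Nothing is broken: treat it as informational (e.g. an enhancement request).
    return "P4"


print(classify_severity(degraded_performance=True, data_loss=True))
```

Checking conditions from P0 downward ensures that an incident matching several rows is assigned the most severe applicable level.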
Example Runbook
```markdown
High CPU Usage Runbook
Symptoms
- Server CPU > 90%
- Slow response times
- Request timeouts
Investigation
- Check top processes: `top`
- Check memory: `free -h`
- Check logs: `tail -f app.log`
Mitigation
- Scale horizontally: Add servers
- Restart service: `systemctl restart app`
- Rate limit: Enable aggressive rate limiting
Resolution
- Identify root cause (N+1 query, memory leak, etc.)
- Deploy fix
- Monitor for 1 hour
```
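The "Server CPU > 90%" symptom from the runbook is easier to act on when it means sustained load, not a single spike. The sketch below assumes CPU percentages are already being sampled by some monitoring tool; the function name and the `min_consecutive` parameter are illustrative assumptions, not part of the runbook.

```python
def cpu_symptom_present(samples, threshold=90.0, min_consecutive=3):
    """Return True if `min_consecutive` samples in a row exceed `threshold` percent."""
    streak = 0
    for pct in samples:
        # Extend the streak on a hot sample, reset it otherwise.
        streak = streak + 1 if pct > threshold else 0
        if streak >= min_consecutive:
            return True
    return False


print(cpu_symptom_present([95.0, 96.0, 50.0, 97.0, 98.0, 99.0]))
```

Requiring several consecutive hot samples is one way to avoid paging on brief spikes; a suitable window depends on the sampling interval in use.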
Communication Template
```
[INCIDENT] Service X degraded
Status: Investigating
Impact: 20% of users seeing slow load times
ETA: 30 minutes
Updates:
- 10:00 AM: Issue detected
- 10:05 AM: On-call paged, investigation started
- 10:15 AM: Root cause identified (database bottleneck)
- 10:30 AM: Fix deployed, monitoring
Next update: 11:00 AM
```
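To keep updates consistent across an incident, the template can be filled from structured data. This is a small sketch under assumed names: `render_update` and its parameters are hypothetical, and the field order simply mirrors the template above.

```python
def render_update(service, state, status, impact, eta, updates, next_update):
    """Render a status message; `updates` is a list of (time, text) pairs."""
    lines = [
        f"[INCIDENT] {service} {state}",
        f"Status: {status}",
        f"Impact: {impact}",
        f"ETA: {eta}",
        "Updates:",
    ]
    # One bullet per timeline entry, oldest first.
    lines += [f"- {time}: {text}" for time, text in updates]
    lines.append(f"Next update: {next_update}")
    return "\n".join(lines)


message = render_update(
    service="Service X",
    state="degraded",
    status="Investigating",
    impact="20% of users seeing slow load times",
    eta="30 minutes",
    updates=[
        ("10:00 AM", "Issue detected"),
        ("10:05 AM", "On-call paged, investigation started"),
    ],
    next_update="11:00 AM",
)
print(message)
```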
