jeremylongshore

fireflies-incident-runbook

@jeremylongshore/fireflies-incident-runbook
Updated 1/18/2026

Execute Fireflies.ai incident response procedures with triage, mitigation, and postmortem. Use when responding to Fireflies.ai-related outages, investigating errors, or running post-incident reviews for Fireflies.ai integration failures. Trigger with phrases like "fireflies incident", "fireflies outage", "fireflies down", "fireflies on-call", "fireflies emergency", "fireflies broken".

Installation

skills install @jeremylongshore/fireflies-incident-runbook

Details

Path: plugins/saas-packs/fireflies-pack/skills/fireflies-incident-runbook/SKILL.md
Branch: main
Scoped Name: @jeremylongshore/fireflies-incident-runbook

Usage

After installing, this skill will be available to your AI coding assistant.

Verify installation:

skills list

Skill Instructions


name: fireflies-incident-runbook
description: |
  Execute Fireflies.ai incident response procedures with triage, mitigation, and postmortem.
  Use when responding to Fireflies.ai-related outages, investigating errors, or running
  post-incident reviews for Fireflies.ai integration failures.
  Trigger with phrases like "fireflies incident", "fireflies outage", "fireflies down",
  "fireflies on-call", "fireflies emergency", "fireflies broken".
allowed-tools: Read, Grep, Bash(kubectl:), Bash(curl:)
version: 1.0.0
license: MIT
author: Jeremy Longshore jeremy@intentsolutions.io

Fireflies.ai Incident Runbook

Overview

Rapid incident response procedures for Fireflies.ai-related outages.

Prerequisites

  • Access to Fireflies.ai dashboard and status page
  • kubectl access to production cluster
  • Prometheus/Grafana access
  • Communication channels (Slack, PagerDuty)

Severity Levels

| Level | Definition | Response Time | Examples |
|-------|------------|---------------|----------|
| P1 | Complete outage | < 15 min | Fireflies.ai API unreachable |
| P2 | Degraded service | < 1 hour | High latency, partial failures |
| P3 | Minor impact | < 4 hours | Webhook delays, non-critical errors |
| P4 | No user impact | Next business day | Monitoring gaps |

Quick Triage

# 1. Check Fireflies.ai status
curl -s https://status.fireflies.com | jq

# 2. Check our integration health
curl -s https://api.yourapp.com/health | jq '.services.fireflies'

# 3. Check error rate (last 5 min)
curl -s -G localhost:9090/api/v1/query --data-urlencode 'query=rate(fireflies_errors_total[5m])'

# 4. Recent error logs
kubectl logs -l app=fireflies-integration --since=5m | grep -i error | tail -20
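
The log tail from step 4 can be summarized by HTTP status code, which makes it easier to pick the matching section under "Immediate Actions by Error Type". The following is a minimal sketch, not part of the original runbook. It assumes each log line carries the status as `status=NNN`; both that log format and the function name `summarize_status_codes` are hypothetical and should be adjusted to the integration's real logs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count HTTP status codes found in log lines on stdin.
# Assumes the status appears as "status=NNN" (an assumption, not a known
# log format of the integration). Prints one "CODE COUNT" pair per line,
# sorted by code. Lines without a status are ignored.
summarize_status_codes() {
  grep -oE 'status=[0-9]{3}' \
    | cut -d= -f2 \
    | sort \
    | uniq -c \
    | awk '{ printf "%s %s\n", $2, $1 }'
}
```

Possible usage during triage: `kubectl logs -l app=fireflies-integration --since=5m | summarize_status_codes`.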

Decision Tree

Fireflies.ai API returning errors?
├─ YES: Is status.fireflies.com showing incident?
│   ├─ YES → Wait for Fireflies.ai to resolve. Enable fallback.
│   └─ NO → Our integration issue. Check credentials, config.
└─ NO: Is our service healthy?
    ├─ YES → Likely resolved or intermittent. Monitor.
    └─ NO → Our infrastructure issue. Check pods, memory, network.
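
The tree above can also be written as a small function, which is convenient when the three answers come from scripted checks. This is an illustrative sketch only; the function name `triage_decision`, its yes/no arguments, and the output labels are assumptions introduced here.

```shell
#!/usr/bin/env bash
# Hypothetical encoding of the decision tree above.
# Each argument is "yes" or "no":
#   $1 - is the Fireflies.ai API returning errors?
#   $2 - is the status page showing an incident?
#   $3 - is our own service healthy?
triage_decision() {
  local api_errors="$1" vendor_incident="$2" service_healthy="$3"
  if [ "$api_errors" = "yes" ]; then
    if [ "$vendor_incident" = "yes" ]; then
      echo "vendor-outage: wait for Fireflies.ai, enable fallback"
    else
      echo "integration-issue: check credentials and config"
    fi
  elif [ "$service_healthy" = "yes" ]; then
    echo "resolved-or-intermittent: monitor"
  else
    echo "infrastructure-issue: check pods, memory, network"
  fi
}
```

For example, `triage_decision yes no yes` prints the integration-issue branch, because the API is failing while the status page reports no incident.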

Immediate Actions by Error Type

401/403 - Authentication

# Verify API key is set (print only its length so the key is not exposed in the terminal)
kubectl get secret fireflies-secrets -o jsonpath='{.data.api-key}' | base64 -d | wc -c

# Check if key was rotated
# → Verify in Fireflies.ai dashboard

# Remediation: Update secret and restart pods
kubectl create secret generic fireflies-secrets --from-literal=api-key=NEW_KEY --dry-run=client -o yaml | kubectl apply -f -
kubectl rollout restart deployment/fireflies-integration

429 - Rate Limited

# Check rate limit headers
curl -sv https://api.fireflies.ai/graphql 2>&1 | grep -i rate

# Enable request queuing
kubectl set env deployment/fireflies-integration RATE_LIMIT_MODE=queue

# Long-term: Contact Fireflies.ai for limit increase
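
Ad-hoc checks run from a terminal during a rate-limit incident should avoid adding to the load. One common approach is to retry with exponential backoff. The sketch below is an assumption-labeled example and does not describe what `RATE_LIMIT_MODE=queue` does inside the integration; the helper name `retry_with_backoff` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a command with exponential backoff.
#   $1 - maximum number of attempts
#   $2 - initial delay in seconds (doubled after each failed attempt)
#   remaining arguments - the command to run
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry_with_backoff() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

Possible usage: `retry_with_backoff 5 2 curl -sf https://api.yourapp.com/health`. With `curl -f`, HTTP errors such as 429 produce a non-zero exit code, so the helper waits 2, 4, 8, and 16 seconds between the five attempts.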

500/503 - Fireflies.ai Errors

# Enable graceful degradation
kubectl set env deployment/fireflies-integration FIREFLIES_FALLBACK=true

# Notify users of degraded service
# Update status page

# Monitor Fireflies.ai status for resolution

Communication Templates

Internal (Slack)

🔴 P1 INCIDENT: Fireflies.ai Integration
Status: INVESTIGATING
Impact: [Describe user impact]
Current action: [What you're doing]
Next update: [Time]
Incident commander: @[name]
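
Filling the internal template from a function keeps updates consistent across responders. This is a minimal sketch; the function name `render_incident_update` and its argument order are assumptions, and posting the result to Slack is intentionally left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: render the internal incident update from arguments.
#   $1 severity (e.g. P1)   $2 status        $3 impact
#   $4 current action       $5 next update   $6 commander handle (without @)
render_incident_update() {
  local severity="$1" status="$2" impact="$3"
  local action="$4" next_update="$5" commander="$6"
  cat <<EOF
🔴 ${severity} INCIDENT: Fireflies.ai Integration
Status: ${status}
Impact: ${impact}
Current action: ${action}
Next update: ${next_update}
Incident commander: @${commander}
EOF
}
```

Example call: `render_incident_update P1 INVESTIGATING "Transcripts delayed" "Checking API key" "14:30 UTC" alice`.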

External (Status Page)

Fireflies.ai Integration Issue

We're experiencing issues with our Fireflies.ai integration.
Some users may experience [specific impact].

We're actively investigating and will provide updates.

Last updated: [timestamp]

Post-Incident

Evidence Collection

# Generate debug bundle
./scripts/fireflies-debug-bundle.sh

# Export relevant logs
kubectl logs -l app=fireflies-integration --since=1h > incident-logs.txt

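# Capture metrics for the last 2 hours
# (query_range needs start, end, and step; `date -d` is GNU date syntax)
curl -s -G localhost:9090/api/v1/query_range \
  --data-urlencode 'query=fireflies_errors_total' \
  --data-urlencode "start=$(date -u -d '2 hours ago' +%s)" \
  --data-urlencode "end=$(date -u +%s)" \
  --data-urlencode 'step=60' > metrics.json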

Postmortem Template

## Incident: Fireflies.ai [Error Type]
**Date:** YYYY-MM-DD
**Duration:** X hours Y minutes
**Severity:** P[1-4]

### Summary
[1-2 sentence description]

### Timeline
- HH:MM - [Event]
- HH:MM - [Event]

### Root Cause
[Technical explanation]

### Impact
- Users affected: N
- Revenue impact: $X

### Action Items
- [ ] [Preventive measure] - Owner - Due date
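
The Duration field in the template can be computed from the incident's start and end times instead of being estimated by hand. The sketch below assumes both times are available as Unix epoch seconds; the function name `format_duration` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: format an incident duration for the postmortem.
#   $1 - start time in Unix epoch seconds
#   $2 - end time in Unix epoch seconds
# Prints "X hours Y minutes", with any leftover seconds truncated.
format_duration() {
  local start="$1" end="$2"
  local total=$((end - start))
  if [ "$total" -lt 0 ]; then
    echo "end precedes start" >&2
    return 1
  fi
  echo "$((total / 3600)) hours $(( (total % 3600) / 60 )) minutes"
}
```

For instance, `format_duration 1000 6400` covers 5400 seconds and prints `1 hours 30 minutes`.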

Instructions

Step 1: Quick Triage

Run the triage commands to identify the issue source.

Step 2: Follow Decision Tree

Determine if the issue is Fireflies.ai-side or internal.

Step 3: Execute Immediate Actions

Apply the appropriate remediation for the error type.

Step 4: Communicate Status

Update internal and external stakeholders.

Output

  • Issue identified and categorized
  • Remediation applied
  • Stakeholders notified
  • Evidence collected for postmortem

Error Handling

| Issue | Cause | Solution |
|-------|-------|----------|
| Can't reach status page | Network issue | Use mobile or VPN |
| kubectl fails | Auth expired | Re-authenticate |
| Metrics unavailable | Prometheus down | Check backup metrics |
| Secret rotation fails | Permission denied | Escalate to admin |

Examples

One-Line Health Check

# jq -e exits non-zero on empty, null, or false output, so a failed curl also prints UNHEALTHY
curl -sf https://api.yourapp.com/health | jq -e '.services.fireflies.status' || echo "UNHEALTHY"

Next Steps

For data handling, see fireflies-data-handling.
