jeremylongshore

lindy-incident-runbook

@jeremylongshore/lindy-incident-runbook
jeremylongshore
1,004
123 forks
Updated 1/18/2026

Incident response runbook for Lindy AI integrations. Use when responding to incidents, troubleshooting outages, or creating on-call procedures. Trigger with phrases like "lindy incident", "lindy outage", "lindy on-call", "lindy runbook".

Installation

$skills install @jeremylongshore/lindy-incident-runbook

Details

Path: plugins/saas-packs/lindy-pack/skills/lindy-incident-runbook/SKILL.md
Branch: main
Scoped Name: @jeremylongshore/lindy-incident-runbook

Usage

After installing, this skill will be available to your AI coding assistant.

Verify installation:

skills list

Skill Instructions


name: lindy-incident-runbook
description: |
  Incident response runbook for Lindy AI integrations. Use when responding to incidents, troubleshooting outages, or creating on-call procedures. Trigger with phrases like "lindy incident", "lindy outage", "lindy on-call", "lindy runbook".
allowed-tools: Read, Write, Edit, Bash(curl:*)
version: 1.0.0
license: MIT
author: Jeremy Longshore jeremy@intentsolutions.io

Lindy Incident Runbook

Overview

Incident response procedures for Lindy AI integration issues.

Prerequisites

  • Access to the Lindy dashboard
  • Monitoring dashboards available
  • Escalation contacts known
  • Admin access to production

Incident Severity Levels

| Severity | Description | Response Time | Examples |
|---|---|---|---|
| SEV1 | Complete outage | 15 minutes | All agents failing |
| SEV2 | Partial outage | 30 minutes | One critical agent down |
| SEV3 | Degraded | 2 hours | High latency, some errors |
| SEV4 | Minor | 24 hours | Cosmetic issues |
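The severity table can be encoded as a small classifier so that alerting or chat tooling assigns levels consistently. This is a minimal sketch. It assumes hypothetical boolean impact signals (`allAgentsFailing`, `criticalAgentDown`, `degraded`) that your own monitoring would have to supply. The response-time targets come from the table.

```typescript
type Severity = 'SEV1' | 'SEV2' | 'SEV3' | 'SEV4';

// Hypothetical impact signals; populate these from your own monitoring.
interface ImpactSignals {
  allAgentsFailing: boolean;  // complete outage
  criticalAgentDown: boolean; // partial outage
  degraded: boolean;          // high latency or elevated error rate
}

// Target response times from the severity table, in minutes.
const RESPONSE_TIME_MINUTES: Record<Severity, number> = {
  SEV1: 15,
  SEV2: 30,
  SEV3: 120,
  SEV4: 24 * 60,
};

// Return the most severe level whose condition matches; anything else is SEV4.
function classifySeverity(s: ImpactSignals): Severity {
  if (s.allAgentsFailing) return 'SEV1';
  if (s.criticalAgentDown) return 'SEV2';
  if (s.degraded) return 'SEV3';
  return 'SEV4';
}
```

Checking the most severe condition first means a complete outage is never downgraded because a milder signal is also true.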

Quick Diagnostics

Step 1: Check Lindy Status

# Check Lindy status page
curl -s https://status.lindy.ai/api/v1/status | jq '.status'

# Check API health
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Authorization: Bearer $LINDY_API_KEY" \
  https://api.lindy.ai/v1/health
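The HTTP code printed by the health check is easiest to act on when it is mapped to the next diagnostic step. This sketch follows standard HTTP semantics: 401/403 for credential problems, 429 for rate limiting and 5xx for server-side failures. `triageHealthCheck` and its labels are hypothetical. curl prints `000` when no HTTP response arrived at all, which points at DNS, TLS or the local network rather than at Lindy.

```typescript
type Triage = 'healthy' | 'local-network' | 'auth' | 'rate-limited' | 'lindy-outage' | 'investigate';

// Map the status code printed by the health-check curl to a triage hint.
function triageHealthCheck(httpCode: string): Triage {
  const code = Number.parseInt(httpCode.trim(), 10);
  if (Number.isNaN(code) || code === 0) return 'local-network'; // "000": no HTTP response
  if (code >= 200 && code < 300) return 'healthy';
  if (code === 401 || code === 403) return 'auth';              // continue with Step 2
  if (code === 429) return 'rate-limited';                       // continue with Step 3
  if (code >= 500) return 'lindy-outage';                        // compare with the status page
  return 'investigate';                                          // other 4xx: check the request itself
}
```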

Step 2: Verify Authentication

# Test API key
curl -s -H "Authorization: Bearer $LINDY_API_KEY" \
  https://api.lindy.ai/v1/users/me | jq '.email'

Step 3: Check Rate Limits

# Check rate limit headers (GET request; -D - writes response headers to stdout)
curl -s -o /dev/null -D - -H "Authorization: Bearer $LINDY_API_KEY" \
  https://api.lindy.ai/v1/users/me | grep -i "x-ratelimit"
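Once the headers are captured, you can compute the remaining budget and the time until reset. This sketch assumes the common `X-RateLimit-Limit` / `X-RateLimit-Remaining` / `X-RateLimit-Reset` convention, with `Reset` given as a Unix timestamp in seconds. Lindy's actual header names and semantics are not confirmed here, so compare them against the Step 3 output before relying on this. `parseRateLimit` is a hypothetical helper.

```typescript
interface RateLimitStatus {
  limit: number;
  remaining: number;
  resetsInSeconds: number;
  nearlyExhausted: boolean;
}

// Assumed header names (X-RateLimit-* convention, not confirmed for Lindy);
// assumes x-ratelimit-reset is a Unix timestamp in seconds.
function parseRateLimit(headers: Record<string, string>, nowMs: number = Date.now()): RateLimitStatus | null {
  // Header names are case-insensitive, so normalize the keys first.
  const h: Record<string, string> = {};
  for (const [k, v] of Object.entries(headers)) h[k.toLowerCase()] = v;

  const limit = Number(h['x-ratelimit-limit']);
  const remaining = Number(h['x-ratelimit-remaining']);
  const reset = Number(h['x-ratelimit-reset']);
  if (![limit, remaining, reset].every((n) => Number.isFinite(n))) return null; // missing or malformed

  return {
    limit,
    remaining,
    resetsInSeconds: Math.max(0, Math.round(reset - nowMs / 1000)),
    nearlyExhausted: remaining <= limit * 0.1, // 10% or less of the budget left
  };
}
```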

Common Incidents

Incident: Complete API Outage

Symptoms:

  • All API calls failing
  • 5xx errors from Lindy

Runbook:

1. [ ] Check https://status.lindy.ai
2. [ ] Verify it's not a local network issue
3. [ ] Check whether other services on the same network are reachable
4. [ ] Enable fallback mode if available
5. [ ] Notify stakeholders
6. [ ] Open support ticket with Lindy
7. [ ] Monitor status page for updates

Fallback Code:

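// Assumes `lindy` is an initialized Lindy client, e.g. `new Lindy({ apiKey })` as in the diagnostic script.
async function runWithFallback(agentId: string, input: string) {
  try {
    return await lindy.agents.run(agentId, { input });
  } catch (error: any) {
    // 5xx: treat as a Lindy-side outage and return a degraded response instead of failing
    if (typeof error?.status === 'number' && error.status >= 500) {
      return {
        output: 'Service temporarily unavailable. Please try again later.',
        fallback: true,
      };
    }
    // Anything else (auth, validation, 429) needs attention on our side, so surface it
    throw error;
  }
}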
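Fallback mode can go further than a per-request catch. A circuit breaker stops sending traffic to Lindy while it is failing, which keeps an outage from turning into timeouts and retries on your side. This is a general resilience pattern swapped in for illustration; neither the runbook nor Lindy prescribes it. The `CircuitBreaker` class and its thresholds are hypothetical.

```typescript
type CircuitState = 'closed' | 'open' | 'half-open';

// Opens after `failureThreshold` consecutive failures, rejects requests for
// `cooldownMs`, then lets trial traffic through (half-open) to probe recovery.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: CircuitState = 'closed';
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(failureThreshold = 5, cooldownMs = 30_000, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // True if a real request to Lindy may be attempted right now.
  canRequest(): boolean {
    if (this.state === 'open') {
      if (this.now() - this.openedAt < this.cooldownMs) return false;
      this.state = 'half-open'; // cooldown elapsed: allow a trial request
    }
    return true;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }

  recordFailure(): void {
    this.failures += 1;
    // A failed trial reopens immediately; otherwise open once the threshold is reached.
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }

  getState(): CircuitState {
    return this.state;
  }
}
```

To combine it with `runWithFallback`:

1. Check `breaker.canRequest()` before calling Lindy.
2. If it returns `false`, return the fallback response immediately.
3. Call `recordFailure()` on 5xx errors and `recordSuccess()` on successful runs.

Injecting `now` makes the state transitions testable without real waiting.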

Incident: Rate Limiting

Symptoms:

  • 429 errors
  • "Rate limit exceeded" messages

Runbook:

1. [ ] Check current usage in dashboard
2. [ ] Identify spike source (which agent/automation)
3. [ ] Reduce request rate or implement throttling
4. [ ] Consider upgrading the plan if the traffic is legitimate
5. [ ] Implement request queuing
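For step 2, counting requests per agent over the spike window usually shows where the traffic comes from. This is a sketch that assumes request logs can be reduced to a hypothetical `RequestLog` record with an `agentId` and a millisecond timestamp. Adapt the shape to whatever your logging actually captures.

```typescript
// Hypothetical log record shape.
interface RequestLog {
  agentId: string;
  timestampMs: number;
}

// Count requests per agent within [windowStartMs, windowEndMs), busiest first.
function topRequestSources(
  logs: RequestLog[],
  windowStartMs: number,
  windowEndMs: number,
  limit = 5,
): Array<{ agentId: string; count: number }> {
  const counts = new Map<string, number>();
  for (const { agentId, timestampMs } of logs) {
    if (timestampMs < windowStartMs || timestampMs >= windowEndMs) continue;
    counts.set(agentId, (counts.get(agentId) ?? 0) + 1);
  }
  return Array.from(counts, ([agentId, count]) => ({ agentId, count }))
    .sort((a, b) => b.count - a.count)
    .slice(0, limit);
}
```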

Throttling Code:

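import PQueue from 'p-queue';

// At most 5 runs in flight, and at most 10 started per 1000 ms window
const queue = new PQueue({ concurrency: 5, interval: 1000, intervalCap: 10 });

// Assumes `lindy` is an initialized Lindy client
async function throttledRun(agentId: string, input: string) {
  return queue.add(() => lindy.agents.run(agentId, { input }));
}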
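Queuing limits how fast requests go out, but requests that still receive a 429 need to be retried with backoff. This sketch uses exponential backoff with full jitter, a common pattern swapped in here rather than anything Lindy prescribes. It assumes the client's errors expose `status` and a lower-cased `headers` object; those field names are hypothetical. It honors `Retry-After` only when the header is given in seconds; the HTTP-date form falls back to the computed backoff.

```typescript
// Delay before retry number `attempt` (0-based). Prefer the server's
// Retry-After (seconds); otherwise use full jitter: uniform in [0, ceiling).
function backoffDelayMs(
  attempt: number,
  retryAfter?: string | null,
  baseMs = 500,
  maxMs = 30_000,
  random: () => number = Math.random,
): number {
  const seconds = retryAfter ? Number(retryAfter) : NaN; // HTTP-date form parses to NaN
  if (Number.isFinite(seconds) && seconds >= 0) return Math.min(seconds * 1000, maxMs);
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Retry only on 429, up to `maxAttempts` total attempts; other errors propagate.
async function withRetryOn429<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      if (error?.status !== 429 || attempt + 1 >= maxAttempts) throw error;
      const delay = backoffDelayMs(attempt, error?.headers?.['retry-after']);
      await new Promise<void>((resolve) => setTimeout(() => resolve(), delay));
    }
  }
}
```

Wrapping calls as `withRetryOn429(() => throttledRun(agentId, input))` keeps every retry flowing through the same queue, so the retries cannot break the configured rate.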

Incident: Agent Failures

Symptoms:

  • Specific agent not responding
  • Unexpected outputs
  • Timeout errors

Runbook:

1. [ ] Identify affected agent(s)
2. [ ] Check that the agent configuration hasn't changed
3. [ ] Review recent runs for patterns
4. [ ] Test with a simple input
5. [ ] Check that the agent's tools and integrations are working
6. [ ] Roll back to the previous version if needed

Diagnostic Script:

// Assumes `Lindy` (the SDK client class used throughout this runbook) is imported from your Lindy client package.
async function diagnoseAgent(agentId: string) {
  const apiKey = process.env.LINDY_API_KEY;
  if (!apiKey) throw new Error('LINDY_API_KEY is not set');
  const lindy = new Lindy({ apiKey });

  // Get agent details
  const agent = await lindy.agents.get(agentId);
  console.log('Agent:', agent.name, agent.status);

  // Check the 10 most recent runs for failures
  const runs = await lindy.runs.list({ agentId, limit: 10 });
  const failures = runs.filter((r: any) => r.status === 'failed');
  console.log(`Failures: ${failures.length}/${runs.length}`);

  // Smoke test with a trivial input
  let testPassed = false;
  try {
    await lindy.agents.run(agentId, { input: 'Hello' });
    testPassed = true;
    console.log('Test run: SUCCESS');
  } catch (e: any) {
    console.log('Test run: FAILED -', e.message);
  }

  return { agent, runs, failures, testPassed };
}
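Reviewing recent runs for patterns goes faster when failed runs are grouped by error message. This sketch assumes a hypothetical `RunRecord` shape with `id`, `status` and an optional `error` string. Map whatever `lindy.runs.list` actually returns onto that shape. Messages that embed IDs or timestamps will not group together until those parts are normalized away.

```typescript
// Hypothetical run record shape.
interface RunRecord {
  id: string;
  status: string; // e.g. 'succeeded' or 'failed'
  error?: string; // error message for failed runs
}

// Group failed runs by error message, most frequent first.
function failurePatterns(runs: RunRecord[]): Array<{ error: string; count: number; runIds: string[] }> {
  const groups = new Map<string, string[]>();
  for (const run of runs) {
    if (run.status !== 'failed') continue;
    const key = run.error?.trim() || '(no error message)';
    const ids = groups.get(key) ?? [];
    ids.push(run.id);
    groups.set(key, ids);
  }
  return Array.from(groups, ([error, runIds]) => ({ error, count: runIds.length, runIds }))
    .sort((a, b) => b.count - a.count);
}
```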

Incident: High Latency

Symptoms:

  • Response times > 10 seconds
  • Timeouts increasing

Runbook:

1. [ ] Check Lindy status page for degradation
2. [ ] Review latency metrics by agent
3. [ ] Check if issue is with specific agent
4. [ ] Verify the agent instructions aren't causing overly long responses
5. [ ] Consider reducing max_tokens
6. [ ] Implement streaming if it isn't already in use
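Per-agent percentiles answer steps 2 and 3 directly: is one agent slow, or are they all slow? This sketch assumes hypothetical `LatencySample` records collected by your own instrumentation. It uses the nearest-rank percentile and flags agents whose p95 exceeds the 10-second symptom threshold above.

```typescript
interface LatencySample {
  agentId: string;
  durationMs: number;
}

// Nearest-rank percentile of an ascending-sorted, non-empty array.
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Per-agent p50/p95, slowest p95 first, flagging agents over the threshold.
function slowAgents(samples: LatencySample[], thresholdMs = 10_000) {
  const byAgent = new Map<string, number[]>();
  for (const { agentId, durationMs } of samples) {
    const list = byAgent.get(agentId) ?? [];
    list.push(durationMs);
    byAgent.set(agentId, list);
  }
  return Array.from(byAgent, ([agentId, durations]) => {
    const sorted = durations.slice().sort((a, b) => a - b);
    const p95 = percentile(sorted, 95);
    return { agentId, p50: percentile(sorted, 50), p95, overThreshold: p95 > thresholdMs };
  }).sort((a, b) => b.p95 - a.p95);
}
```

If only one agent is over the threshold, the problem is probably in that agent's instructions or tools. If every agent is over it, compare the timing with the status page.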

Escalation Matrix

| Level | Contact | When |
|---|---|---|
| L1 | On-call engineer | Initial response |
| L2 | Engineering lead | After 30 minutes (SEV1/SEV2) |
| L3 | VP Engineering | After 1 hour (SEV1) |
| Lindy | support@lindy.ai | External issue confirmed |
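The escalation matrix can be expressed as a function, so paging automation and humans apply it the same way. This sketch reads "After 30 min" and "After 1 hour" as at least that many minutes since the incident opened. `escalationContacts` and its inputs are hypothetical names.

```typescript
type Severity = 'SEV1' | 'SEV2' | 'SEV3' | 'SEV4';

// Contacts to engage, per the escalation matrix, given elapsed time and
// whether an external (Lindy-side) issue has been confirmed.
function escalationContacts(severity: Severity, minutesOpen: number, externalConfirmed: boolean): string[] {
  const contacts = ['On-call engineer']; // L1: always engaged
  if ((severity === 'SEV1' || severity === 'SEV2') && minutesOpen >= 30) {
    contacts.push('Engineering lead'); // L2
  }
  if (severity === 'SEV1' && minutesOpen >= 60) {
    contacts.push('VP Engineering'); // L3
  }
  if (externalConfirmed) contacts.push('support@lindy.ai'); // vendor escalation
  return contacts;
}
```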

Post-Incident

Incident Report Template

## Incident Report: [Title]

**Date:** YYYY-MM-DD
**Duration:** X hours Y minutes
**Severity:** SEV1/2/3/4
**Impact:** [Description of user impact]

### Timeline
- HH:MM - Incident detected
- HH:MM - On-call paged
- HH:MM - Root cause identified
- HH:MM - Resolution applied
- HH:MM - All clear

### Root Cause
[What caused the incident]

### Resolution
[What fixed it]

### Action Items
- [ ] [Preventive action 1]
- [ ] [Preventive action 2]
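You can fill the template's Duration and Timeline fields from recorded timestamps instead of computing them by hand. This is a sketch. It assumes the timestamps are ISO-8601 strings and that timeline times are shown in UTC. Both function names are hypothetical.

```typescript
// "X hours Y minutes" between two ISO-8601 timestamps (e.g. detection and all-clear).
function formatIncidentDuration(startIso: string, endIso: string): string {
  const ms = Date.parse(endIso) - Date.parse(startIso);
  if (!Number.isFinite(ms) || ms < 0) throw new Error('Invalid incident time range');
  const totalMinutes = Math.round(ms / 60_000);
  const hours = Math.floor(totalMinutes / 60);
  const minutes = totalMinutes % 60;
  return `${hours} hour${hours === 1 ? '' : 's'} ${minutes} minute${minutes === 1 ? '' : 's'}`;
}

// Timeline entries in the template's "- HH:MM - event" format, using UTC times.
function formatTimeline(events: Array<{ atIso: string; label: string }>): string {
  return events
    .map(({ atIso, label }) => `- ${new Date(atIso).toISOString().slice(11, 16)} - ${label}`)
    .join('\n');
}
```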

Output

  • Quick diagnostic commands
  • Common incident runbooks
  • Fallback code patterns
  • Escalation procedures
  • Post-incident template

Resources

Next Steps

Proceed to lindy-data-handling for data management.
