Implement production pre-recorded speech-to-text with Deepgram. Use when building audio transcription, batch processing, or implementing diarization and intelligence features. Trigger: "deepgram transcription", "speech to text", "transcribe audio", "batch transcription", "deepgram nova", "diarize audio".
Installation
After installing, this skill will be available to your AI coding assistant.
Verify installation:
```bash
npx agent-skills-cli list
```

Skill Instructions
```yaml
---
name: deepgram-core-workflow-a
description: |
  Implement production pre-recorded speech-to-text with Deepgram. Use when building
  audio transcription, batch processing, or implementing diarization and intelligence
  features. Trigger: "deepgram transcription", "speech to text", "transcribe audio",
  "batch transcription", "deepgram nova", "diarize audio".
allowed-tools: Read, Write, Edit, Bash(npm:), Bash(pip:), Grep
version: 1.0.0
license: MIT
author: Jeremy Longshore <jeremy@intentsolutions.io>
compatible-with: claude-code, codex, openclaw
tags: [saas, deepgram, voice-ai, transcription, workflow, stt]
---
```
Deepgram Core Workflow A: Pre-recorded Transcription
Overview
Production pre-recorded transcription service using Deepgram's REST API. Covers transcribeUrl and transcribeFile, speaker diarization, audio intelligence (summarization, topic detection, sentiment, intent), batch processing with concurrency control, and callback-based async transcription for large files.
Prerequisites
- `@deepgram/sdk` installed, `DEEPGRAM_API_KEY` configured
- Audio files: WAV, MP3, FLAC, OGG, M4A, or WebM
- For batch: `p-limit` package (`npm install p-limit`)
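A fail-fast guard for the key, sketched on the assumption it is exported as `DEEPGRAM_API_KEY` per the list above:

```typescript
// Abort early with a clear message instead of a 401 from the API.
if (!process.env.DEEPGRAM_API_KEY) {
  throw new Error('DEEPGRAM_API_KEY is not set');
}
```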
Instructions
Step 1: Transcription Service Class
```typescript
import { createClient, DeepgramClient } from '@deepgram/sdk';
import { readFileSync } from 'fs';

interface TranscribeOptions {
  model?: 'nova-3' | 'nova-2' | 'nova-2-meeting' | 'nova-2-phonecall' | 'base';
  language?: string;
  diarize?: boolean;
  utterances?: boolean;
  paragraphs?: boolean;
  smart_format?: boolean;
  summarize?: boolean;     // Audio intelligence
  detect_topics?: boolean; // Topic detection
  sentiment?: boolean;     // Sentiment analysis
  intents?: boolean;       // Intent recognition
  keywords?: string[];     // Keyword boosting: ["term:weight"]
  callback?: string;       // Async callback URL
}

class DeepgramTranscriber {
  private client: DeepgramClient;

  constructor(apiKey: string) {
    this.client = createClient(apiKey);
  }

  async transcribeUrl(url: string, opts: TranscribeOptions = {}) {
    const { result, error } = await this.client.listen.prerecorded.transcribeUrl(
      { url },
      {
        model: opts.model ?? 'nova-3',
        language: opts.language ?? 'en',
        smart_format: opts.smart_format ?? true,
        diarize: opts.diarize ?? false,
        utterances: opts.utterances ?? false,
        paragraphs: opts.paragraphs ?? false,
        summarize: opts.summarize ? 'v2' : undefined,
        detect_topics: opts.detect_topics ?? false,
        sentiment: opts.sentiment ?? false,
        intents: opts.intents ?? false,
        keywords: opts.keywords,
        callback: opts.callback,
      }
    );
    if (error) throw new Error(`Transcription failed: ${error.message}`);
    return result;
  }

  async transcribeFile(filePath: string, opts: TranscribeOptions = {}) {
    const audio = readFileSync(filePath);
    const mimetype = this.detectMimetype(filePath);
    const { result, error } = await this.client.listen.prerecorded.transcribeFile(
      audio,
      {
        model: opts.model ?? 'nova-3',
        smart_format: opts.smart_format ?? true,
        mimetype,
        diarize: opts.diarize ?? false,
        utterances: opts.utterances ?? false,
        summarize: opts.summarize ? 'v2' : undefined,
        detect_topics: opts.detect_topics ?? false,
        sentiment: opts.sentiment ?? false,
      }
    );
    if (error) throw new Error(`File transcription failed: ${error.message}`);
    return result;
  }

  private detectMimetype(path: string): string {
    const ext = path.split('.').pop()?.toLowerCase();
    const map: Record<string, string> = {
      wav: 'audio/wav', mp3: 'audio/mpeg', flac: 'audio/flac',
      ogg: 'audio/ogg', m4a: 'audio/mp4', webm: 'audio/webm',
    };
    return map[ext ?? ''] ?? 'audio/wav';
  }
}
```
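A minimal usage sketch for the class above, assuming an async context; the audio URL is a placeholder:

```typescript
const transcriber = new DeepgramTranscriber(process.env.DEEPGRAM_API_KEY!);

// Placeholder URL; any publicly reachable audio file works.
const result = await transcriber.transcribeUrl('https://example.com/meeting.mp3', {
  diarize: true,
  utterances: true,
  summarize: true,
});
console.log(result?.results.channels[0].alternatives[0].transcript);
```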
Step 2: Extract Structured Results
```typescript
function formatResult(result: any) {
  const channel = result.results.channels[0];
  const alt = channel.alternatives[0];

  return {
    transcript: alt.transcript,
    confidence: alt.confidence,
    words: alt.words?.map((w: any) => ({
      word: w.word,
      start: w.start,
      end: w.end,
      confidence: w.confidence,
      speaker: w.speaker, // Only if diarize: true
      punctuated_word: w.punctuated_word,
    })),
    // Speaker segments (requires utterances: true + diarize: true)
    utterances: result.results.utterances?.map((u: any) => ({
      speaker: u.speaker,
      text: u.transcript,
      start: u.start,
      end: u.end,
      confidence: u.confidence,
    })),
    // Audio intelligence results
    summary: result.results.summary?.short,
    topics: result.results.topics?.segments,
    sentiments: result.results.sentiments?.segments,
    intents: result.results.intents?.segments,
    metadata: {
      duration: result.metadata.duration,
      channels: result.metadata.channels,
      model: result.metadata.model_info,
      request_id: result.metadata.request_id,
    },
  };
}
```
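One common consumer of this output is a speaker-labeled transcript. A small sketch (the helper name is ours), assuming the request ran with `diarize: true` and `utterances: true` so `utterances` is populated:

```typescript
// Turn formatResult output into "Speaker N: ..." lines with rough timestamps.
function renderDiarizedTranscript(formatted: any): string {
  if (!formatted.utterances) return formatted.transcript;
  return formatted.utterances
    .map((u: any) => `Speaker ${u.speaker}: ${u.text} [${u.start.toFixed(1)}s to ${u.end.toFixed(1)}s]`)
    .join('\n');
}
```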
Step 3: Batch Processing
```typescript
import pLimit from 'p-limit';

async function batchTranscribe(
  files: string[],
  opts: TranscribeOptions = {},
  concurrency = 5
) {
  const transcriber = new DeepgramTranscriber(process.env.DEEPGRAM_API_KEY!);
  const limit = pLimit(concurrency);

  const results = await Promise.allSettled(
    files.map(file =>
      limit(async () => {
        const result = await transcriber.transcribeFile(file, opts);
        console.log(`Done: ${file} (${result.metadata.duration}s)`);
        return { file, result: formatResult(result) };
      })
    )
  );

  const succeeded = results.filter(r => r.status === 'fulfilled');
  const failed = results.filter(r => r.status === 'rejected');
  console.log(`Batch complete: ${succeeded.length} ok, ${failed.length} failed`);
  return results;
}
```
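An example invocation (the file paths are illustrative). Fulfilled entries carry `{ file, result }`; rejected entries carry the thrown error:

```typescript
const outcomes = await batchTranscribe(
  ['./audio/call-01.wav', './audio/call-02.mp3'],
  { diarize: true },
  3 // lower concurrency if you hit rate limits
);

for (const o of outcomes) {
  if (o.status === 'fulfilled') console.log(o.value.file, o.value.result.transcript);
  else console.error('Failed:', o.reason);
}
```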
Step 4: Async Callback Transcription (Large Files)
```typescript
// For files >2 hours or when you don't want to hold a connection open,
// use Deepgram's callback feature. Deepgram POSTs results to your URL.
// Note: the v3 JS SDK rejects a callback option on the synchronous methods,
// so use the dedicated transcribeUrlCallback method with a CallbackUrl.
import { createClient, CallbackUrl } from '@deepgram/sdk';

async function submitAsync(audioUrl: string, callbackUrl: string) {
  const client = createClient(process.env.DEEPGRAM_API_KEY!);
  // Deepgram returns a request_id immediately and processes in the background.
  const { result, error } = await client.listen.prerecorded.transcribeUrlCallback(
    { url: audioUrl },
    new CallbackUrl(callbackUrl), // Your HTTPS endpoint
    { model: 'nova-3', diarize: true }
  );
  if (error) throw new Error(`Async submission failed: ${error.message}`);

  // The acknowledgement is a bare { request_id }, not a full transcript.
  console.log('Submitted. Request ID:', result.request_id);
  // Deepgram will POST results to callbackUrl when done.
  // Retries up to 10 times with 30s delay on failure.
}
```
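The callback endpoint has to accept Deepgram's POST. A minimal Express sketch, assuming a hypothetical `/deepgram-webhook` route; the delivered payload has the same shape as a synchronous response, so `formatResult` from Step 2 applies unchanged:

```typescript
import express from 'express';

const app = express();
app.use(express.json({ limit: '50mb' })); // transcripts for long audio can be large

app.post('/deepgram-webhook', (req, res) => {
  // Acknowledge quickly; Deepgram retries on non-2xx responses.
  res.status(200).send('ok');

  const formatted = formatResult(req.body);
  console.log('Received transcript for request', req.body.metadata?.request_id);
  // Persist `formatted` to your store here.
});

app.listen(3000);
```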
Step 5: Keyword Boosting
```typescript
// Boost domain-specific terms for higher accuracy.
// Note: weighted keywords apply to nova-2 and earlier models; nova-3 uses
// keyterm prompting instead (see the sketch below).
const result = await transcriber.transcribeUrl(audioUrl, {
  model: 'nova-2',
  keywords: [
    'Kubernetes:1.5', // Boost weight 1.0-2.0
    'PostgreSQL:1.5',
    'microservices:1.3',
  ],
});
```
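For nova-3, Deepgram replaced weighted keywords with keyterm prompting: plain terms, no weights. A sketch calling the SDK directly, since the wrapper class above does not forward a keyterm option; this assumes an SDK version that supports the `keyterm` parameter:

```typescript
const client = createClient(process.env.DEEPGRAM_API_KEY!);

const { result: nova3Result, error } = await client.listen.prerecorded.transcribeUrl(
  { url: audioUrl },
  {
    model: 'nova-3',
    smart_format: true,
    keyterm: ['Kubernetes', 'PostgreSQL', 'microservices'], // no :weight suffix
  }
);
```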
Output
- `DeepgramTranscriber` class with URL and file transcription
- Structured result extraction with word-level timing, speakers, and intelligence
- Batch processing with configurable concurrency via `p-limit`
- Async callback pattern for large files
- Keyword boosting for domain vocabulary
Error Handling
| Error | Cause | Solution |
|---|---|---|
| 400 Bad Request | Invalid audio format | Verify file header bytes (WAV: `RIFF`, MP3: `0xFFF3`/`0xFFFB`) |
| 413 Payload Too Large | File exceeds limit | Use callback URL for async processing |
| Empty transcript | No speech in audio | Check audio volume; try `alternatives: 3` for confidence |
| 408 Timeout | Long file, sync mode | Switch to callback-based async |
| Low confidence | Background noise | Preprocess: `ffmpeg -i input.wav -af "highpass=f=200,lowpass=f=3000" clean.wav` |
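Transient failures (429 rate limits, 5xx) are usually worth a retry with backoff. A generic sketch that wraps any call above; the attempt count and delays are arbitrary defaults, not Deepgram recommendations:

```typescript
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Exponential backoff: 1s, 2s, 4s, ...
        await new Promise(resolve => setTimeout(resolve, 1000 * 2 ** (attempt - 1)));
      }
    }
  }
  throw lastError;
}

// Usage:
const transcript = await withRetry(() => transcriber.transcribeFile('./audio/call.wav'));
```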
Next Steps
Proceed to deepgram-core-workflow-b for real-time streaming transcription.