jeremylongshore

deepgram-core-workflow-a

@jeremylongshore/deepgram-core-workflow-a
jeremylongshore
1,004
123 forks
Updated 1/18/2026
View on GitHub

Implement speech-to-text transcription workflow with Deepgram. Use when building pre-recorded audio transcription, batch processing, or implementing core transcription features. Trigger with phrases like "deepgram transcription", "speech to text", "transcribe audio", "audio transcription workflow", "batch transcription".

Installation

$skills install @jeremylongshore/deepgram-core-workflow-a
Claude Code
Cursor
Copilot
Codex
Antigravity

Details

Pathplugins/saas-packs/deepgram-pack/skills/deepgram-core-workflow-a/SKILL.md
Branchmain
Scoped Name@jeremylongshore/deepgram-core-workflow-a

Usage

After installing, this skill will be available to your AI coding assistant.

Verify installation:

skills list

Skill Instructions


name: deepgram-core-workflow-a description: | Implement speech-to-text transcription workflow with Deepgram. Use when building pre-recorded audio transcription, batch processing, or implementing core transcription features. Trigger with phrases like "deepgram transcription", "speech to text", "transcribe audio", "audio transcription workflow", "batch transcription". allowed-tools: Read, Write, Edit, Bash(npm:), Bash(pip:), Grep version: 1.0.0 license: MIT author: Jeremy Longshore jeremy@intentsolutions.io

Deepgram Core Workflow A: Pre-recorded Transcription

Overview

Implement a complete pre-recorded audio transcription workflow using Deepgram's Nova-2 model.

Prerequisites

  • Completed deepgram-install-auth setup
  • Understanding of async patterns
  • Audio files or URLs to transcribe

Instructions

Step 1: Set Up Transcription Service

Create a service class to handle transcription operations.

Step 2: Implement File and URL Transcription

Add methods for both local files and remote URLs.

Step 3: Add Feature Options

Configure punctuation, diarization, and formatting.

Step 4: Process Results

Extract and format transcription results.

Output

  • Transcription service class
  • Support for file and URL transcription
  • Configurable transcription options
  • Formatted transcript output

Error Handling

ErrorCauseSolution
Audio Too LongExceeds limitsSplit into chunks or use async
Unsupported FormatInvalid audio typeConvert to WAV/MP3/FLAC
Empty ResponseNo speech detectedCheck audio quality
TimeoutLarge file processingUse callback URL pattern

Examples

TypeScript Transcription Service

// services/transcription.ts
import { createClient } from '@deepgram/sdk';
import { readFile } from 'fs/promises';

export interface TranscriptionOptions {
  model?: 'nova-2' | 'nova' | 'enhanced' | 'base';
  language?: string;
  punctuate?: boolean;
  diarize?: boolean;
  smartFormat?: boolean;
  utterances?: boolean;
  paragraphs?: boolean;
}

export interface TranscriptionResult {
  transcript: string;
  confidence: number;
  words: Array<{
    word: string;
    start: number;
    end: number;
    confidence: number;
  }>;
  utterances?: Array<{
    speaker: number;
    transcript: string;
    start: number;
    end: number;
  }>;
}

export class TranscriptionService {
  private client;

  constructor(apiKey: string) {
    this.client = createClient(apiKey);
  }

  async transcribeUrl(
    url: string,
    options: TranscriptionOptions = {}
  ): Promise<TranscriptionResult> {
    const { result, error } = await this.client.listen.prerecorded.transcribeUrl(
      { url },
      {
        model: options.model || 'nova-2',
        language: options.language || 'en',
        punctuate: options.punctuate ?? true,
        diarize: options.diarize ?? false,
        smart_format: options.smartFormat ?? true,
        utterances: options.utterances ?? false,
        paragraphs: options.paragraphs ?? false,
      }
    );

    if (error) throw new Error(error.message);

    return this.formatResult(result);
  }

  async transcribeFile(
    filePath: string,
    options: TranscriptionOptions = {}
  ): Promise<TranscriptionResult> {
    const audio = await readFile(filePath);
    const mimetype = this.getMimeType(filePath);

    const { result, error } = await this.client.listen.prerecorded.transcribeFile(
      audio,
      {
        model: options.model || 'nova-2',
        language: options.language || 'en',
        punctuate: options.punctuate ?? true,
        diarize: options.diarize ?? false,
        smart_format: options.smartFormat ?? true,
        mimetype,
      }
    );

    if (error) throw new Error(error.message);

    return this.formatResult(result);
  }

  private formatResult(result: any): TranscriptionResult {
    const channel = result.results.channels[0];
    const alternative = channel.alternatives[0];

    return {
      transcript: alternative.transcript,
      confidence: alternative.confidence,
      words: alternative.words || [],
      utterances: result.results.utterances,
    };
  }

  private getMimeType(filePath: string): string {
    const ext = filePath.split('.').pop()?.toLowerCase();
    const mimeTypes: Record<string, string> = {
      wav: 'audio/wav',
      mp3: 'audio/mpeg',
      flac: 'audio/flac',
      ogg: 'audio/ogg',
      m4a: 'audio/mp4',
      webm: 'audio/webm',
    };
    return mimeTypes[ext || ''] || 'audio/wav';
  }
}

Batch Transcription

// services/batch-transcription.ts
import { TranscriptionService, TranscriptionResult } from './transcription';

export async function batchTranscribe(
  files: string[],
  options: { concurrency?: number } = {}
): Promise<Map<string, TranscriptionResult | Error>> {
  const service = new TranscriptionService(process.env.DEEPGRAM_API_KEY!);
  const results = new Map<string, TranscriptionResult | Error>();
  const concurrency = options.concurrency || 5;

  // Process in batches
  for (let i = 0; i < files.length; i += concurrency) {
    const batch = files.slice(i, i + concurrency);

    const batchResults = await Promise.allSettled(
      batch.map(file => service.transcribeFile(file))
    );

    batchResults.forEach((result, index) => {
      const file = batch[index];
      if (result.status === 'fulfilled') {
        results.set(file, result.value);
      } else {
        results.set(file, result.reason);
      }
    });
  }

  return results;
}

Speaker Diarization

// Example with speaker diarization
const result = await service.transcribeFile('./meeting.wav', {
  diarize: true,
  utterances: true,
});

// Format as conversation
result.utterances?.forEach(utterance => {
  console.log(`Speaker ${utterance.speaker}: ${utterance.transcript}`);
});

Python Example

# services/transcription.py
from deepgram import DeepgramClient, PrerecordedOptions, FileSource
from pathlib import Path
from typing import Optional
import mimetypes

class TranscriptionService:
    def __init__(self, api_key: str):
        self.client = DeepgramClient(api_key)

    def transcribe_url(
        self,
        url: str,
        model: str = 'nova-2',
        language: str = 'en',
        diarize: bool = False
    ) -> dict:
        options = PrerecordedOptions(
            model=model,
            language=language,
            smart_format=True,
            punctuate=True,
            diarize=diarize,
        )

        response = self.client.listen.rest.v("1").transcribe_url(
            {"url": url},
            options
        )

        return self._format_result(response)

    def transcribe_file(
        self,
        file_path: str,
        model: str = 'nova-2',
        diarize: bool = False
    ) -> dict:
        with open(file_path, 'rb') as f:
            audio = f.read()

        mimetype, _ = mimetypes.guess_type(file_path)

        source = FileSource(audio, mimetype or 'audio/wav')

        options = PrerecordedOptions(
            model=model,
            smart_format=True,
            punctuate=True,
            diarize=diarize,
        )

        response = self.client.listen.rest.v("1").transcribe_file(
            source,
            options
        )

        return self._format_result(response)

    def _format_result(self, response) -> dict:
        channel = response.results.channels[0]
        alternative = channel.alternatives[0]

        return {
            'transcript': alternative.transcript,
            'confidence': alternative.confidence,
            'words': alternative.words,
        }

Resources

Next Steps

Proceed to deepgram-core-workflow-b for real-time streaming transcription.

More by jeremylongshore

View all
rabbitmq-queue-setup
1,004

Rabbitmq Queue Setup - Auto-activating skill for Backend Development. Triggers on: rabbitmq queue setup, rabbitmq queue setup Part of the Backend Development skill category.

model-evaluation-suite
1,004

evaluating-machine-learning-models: This skill allows Claude to evaluate machine learning models using a comprehensive suite of metrics. It should be used when the user requests model performance analysis, validation, or testing. Claude can use this skill to assess model accuracy, precision, recall, F1-score, and other relevant metrics. Trigger this skill when the user mentions "evaluate model", "model performance", "testing metrics", "validation results", or requests a comprehensive "model evaluation".

neural-network-builder
1,004

building-neural-networks: This skill allows Claude to construct and configure neural network architectures using the neural-network-builder plugin. It should be used when the user requests the creation of a new neural network, modification of an existing one, or assistance with defining the layers, parameters, and training process. The skill is triggered by requests involving terms like "build a neural network," "define network architecture," "configure layers," or specific mentions of neural network types (e.g., "CNN," "RNN," "transformer").

oauth-callback-handler
1,004

Oauth Callback Handler - Auto-activating skill for API Integration. Triggers on: oauth callback handler, oauth callback handler Part of the API Integration skill category.