jeremylongshore

vertex-media-master

@jeremylongshore/vertex-media-master
1,004
122 forks
Updated 1/18/2026

Vertex AI Media Master: Automatic activation for ALL Google Vertex AI multimodal operations - video processing, audio generation, image creation, and marketing campaigns.

Installation

$ skills install @jeremylongshore/vertex-media-master
Claude Code
Cursor
Copilot
Codex
Antigravity

Details

Path: backups/skills-batch-20251204-000554/plugins/productivity/003-jeremy-vertex-ai-media-master/skills/vertex-media-master/SKILL.md
Branch: main
Scoped Name: @jeremylongshore/vertex-media-master

Usage

After installing, this skill will be available to your AI coding assistant.

Verify installation:

skills list

Skill Instructions


name: Vertex AI Media Master
description: Automatic activation for ALL Google Vertex AI multimodal operations - video processing, audio generation, image creation, and marketing campaigns.

TRIGGER PHRASES:

  • "vertex ai", "gemini multimodal", "process video", "generate audio", "create images", "marketing campaign"
  • "imagen", "video understanding", "multimodal", "content generation", "media assets"

AUTO-INVOKES FOR:

  • Video processing and understanding (up to 6 hours)
  • Audio generation and transcription
  • Image generation with Imagen 4
  • Marketing campaign automation
  • Social media content creation
  • Ad creative generation
  • Multimodal content workflows

allowed-tools: Read, Write, Edit, Grep, Glob, Bash
version: 1.0.0

Vertex AI Media Master - Comprehensive Multimodal AI Operations

This Agent Skill covers Google Vertex AI multimodal capabilities for video, audio, image, and text processing, with a focus on marketing applications.

Core Capabilities

🎥 Video Processing (Gemini 2.0/2.5)

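  • Video Understanding: Process videos up to roughly 6 hours at low resolution or 2 hours at default resolution with a 2M-token context window; a 1M-token window supports about half of that
  • Long Context Window: Gemini 2.5 Pro accepts about 1M input tokens, which bounds how much video fits in a single request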
  • Audio Track Processing: Automatic audio transcription from video
  • Multi-video Analysis: Process multiple videos in single request
  • Video Summarization: Extract key moments, scenes, and insights
  • Marketing Use Cases:
    • Analyze competitor video ads
    • Extract highlights from long-form content
    • Generate video summaries for social media
    • Transcribe and caption video content
    • Identify brand mentions and product placements
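The duration limits quoted in this list follow from token arithmetic. Here is a minimal sketch of that calculation, assuming roughly 300 tokens per second of video at default resolution and 100 at low resolution. Those rates are approximate figures that do not come from this document, and `max_video_hours` is a hypothetical helper name.

```python
# Assumed approximate token cost per second of video, audio track included.
# These rates are illustrative assumptions, not quoted quota values.
TOKENS_PER_SECOND = {"default": 300, "low": 100}


def max_video_hours(context_tokens: int, resolution: str = "default",
                    reserved_tokens: int = 0) -> float:
    """Estimate how many hours of video fit in a context window.

    reserved_tokens sets aside room for the prompt and the response.
    """
    if resolution not in TOKENS_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution!r}")
    usable = max(context_tokens - reserved_tokens, 0)
    return usable / TOKENS_PER_SECOND[resolution] / 3600


if __name__ == "__main__":
    for window in (1_000_000, 2_000_000):
        for resolution in ("default", "low"):
            hours = max_video_hours(window, resolution)
            print(f"{window:>9,} tokens, {resolution:>7}: {hours:.1f} h")
```

Under these assumptions a 2M-token window holds about 1.9 hours at default resolution and about 5.6 hours at low resolution, which is in line with the 2-hour and 6-hour figures.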

🎵 Audio Generation & Processing

  • Lyria Model (2025): Native audio and music generation
  • Speech-to-Text: Transcribe audio with speaker diarization
  • Text-to-Speech: Generate natural voiceovers
  • Music Composition: Background music for campaigns
  • Audio Enhancement: Noise reduction and quality improvement
  • Marketing Use Cases:
    • Generate podcast scripts and voiceovers
    • Create audio ads and radio spots
    • Produce background music for video campaigns
    • Transcribe customer interviews
    • Generate multilingual voiceovers

🖼️ Image Generation (Imagen 4 & Gemini 2.5 Flash Image)

  • Imagen 4: Highest quality text-to-image generation
  • Gemini 2.5 Flash Image: Interleaved image generation with text
  • Style Transfer: Apply brand styles to generated images
  • Product Visualization: Generate product mockups
  • Campaign Assets: Create ad creatives and social media graphics
  • Marketing Use Cases:
    • Generate personalized ad images (Adios solution)
    • Create social media graphics at scale
    • Produce product lifestyle images
    • Generate A/B test variations
    • Create branded campaign visuals

📢 Marketing Campaign Automation

  • ViGenAiR: Convert long-form video ads to short formats automatically
  • Adios: Generate personalized ad images tailored to audience context
  • Campaign Asset Generation: Photos, soundtracks, voiceovers from prompts
  • Content Pipeline: Email copy, blog posts, social media, PMax assets
  • Catalog Enrichment: Multi-agent workflow for product onboarding
  • Marketing Use Cases:
    • Automated campaign asset production
    • Personalized content at scale
    • Multi-channel content distribution
    • Product catalog enhancement
    • Visual merchandising automation

🔧 Technical Implementation

API Integration:

import vertexai
from vertexai.generative_models import GenerativeModel, Part
from vertexai.preview.vision_models import ImageGenerationModel

# Initialize Vertex AI
vertexai.init(project="your-project", location="us-central1")

# Gemini 2.5 Pro for video
model = GenerativeModel("gemini-2.5-pro")

# Reference the video by Cloud Storage URI (placeholder path)
video_file = Part.from_uri("gs://bucket/video.mp4", mime_type="video/mp4")

# Process video with audio
response = model.generate_content([
    "Analyze this video and extract key marketing insights",
    video_file,  # Maximum length depends on the model's context window
])
print(response.text)

# Imagen 4 for image generation
# The model ID is an assumption; confirm the current Imagen 4 ID in Model Garden
imagen = ImageGenerationModel.from_pretrained("imagen-4.0-generate-001")
images = imagen.generate_images(
    prompt="Professional product photo, studio lighting, white background",
    number_of_images=4
)
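Media files need a MIME type alongside their Cloud Storage URI before they can be attached to a request. The sketch below validates a `gs://` URI and infers the type from the file extension. `media_part_args` and the extension table are hypothetical helpers, and the code assumes the result is later passed to `Part.from_uri`.

```python
import posixpath
from urllib.parse import urlparse

# Extension-to-MIME table for common media types (extend as needed).
MIME_BY_EXTENSION = {
    ".mp4": "video/mp4",
    ".mov": "video/quicktime",
    ".webm": "video/webm",
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
}


def media_part_args(uri: str) -> dict:
    """Validate a Cloud Storage URI and return keyword arguments
    shaped for Part.from_uri(uri=..., mime_type=...)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"expected gs://bucket/object, got {uri!r}")
    extension = posixpath.splitext(parsed.path)[1].lower()
    if extension not in MIME_BY_EXTENSION:
        raise ValueError(f"unsupported media extension: {extension!r}")
    return {"uri": uri, "mime_type": MIME_BY_EXTENSION[extension]}
```

With the SDK installed, `Part.from_uri(**media_part_args("gs://bucket/video.mp4"))` would build the video part that goes into `generate_content`.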

Gemini 2.5 Flash Image (Interleaved Generation):

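# Generate images within text responses
# Assumes GenerativeModel is imported as in the API Integration example.
# Image output must be enabled through the response modalities of the request.
model = GenerativeModel("gemini-2.5-flash-image")
response = model.generate_content(
    "Create a 5-step recipe with images for each step"
)
# Returns text + images interleaved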

Audio Generation (Lyria):

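# Illustrative pseudocode: the module, class, and parameter names below are
# placeholders that could not be confirmed against the published Vertex AI SDK.
# Check the Lyria model card for the supported request format.
from vertexai.preview.audio_models import AudioGenerationModel
lyria = AudioGenerationModel.from_pretrained("lyria")
audio = lyria.generate_audio(
    prompt="Upbeat background music for product launch video, 30 seconds",
    duration=30
)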

📊 Marketing Workflow Automation

1. Multi-Channel Campaign Creation:

# A single prompt returns text for every asset: copy, scripts, and
# briefs for the images and music, which are then generated separately
campaign = model.generate_content(
    """Create a product launch campaign for [product]:
    - Hero image (1920x1080)
    - 3 social media graphics (1080x1080)
    - 30-second video script
    - Background music description
    - Email marketing copy
    - Instagram caption"""
)
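A campaign response is easier to route to downstream tools when the prompt asks for JSON. This sketch parses such a response, assuming the model was instructed to return one JSON object. The key names and `parse_campaign` are hypothetical and are not part of the skill.

```python
import json

# Hypothetical keys mirroring the asset list in the campaign prompt.
REQUIRED_KEYS = {
    "hero_image_brief", "social_graphics", "video_script",
    "music_description", "email_copy", "instagram_caption",
}

FENCE = "`" * 3  # Markdown code-fence marker


def parse_campaign(response_text: str) -> dict:
    """Parse a JSON campaign object, tolerating a Markdown code fence."""
    text = response_text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line, then anything from the closing fence on.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    campaign = json.loads(text)
    if not isinstance(campaign, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = REQUIRED_KEYS - set(campaign)
    if missing:
        raise ValueError(f"campaign is missing keys: {sorted(missing)}")
    return campaign
```

Passing `campaign.text` to `parse_campaign` would then give a dictionary whose image and music briefs can feed the image and audio models.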

2. Video Repurposing Pipeline:

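# Long-form to short-form conversion (ViGenAiR approach)
# Wrap the URI in a Part; a bare string would be read as prompt text
long_video = Part.from_uri("gs://bucket/original-ad-60s.mp4", mime_type="video/mp4")
response = model.generate_content([
    "Extract 3 engaging 15-second clips from this video for TikTok/Reels",
    long_video
])
# Returns suggested clips as text; cutting the video is a separate step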
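The clip suggestions come back as text, so they have to be parsed before a video tool can cut them. This sketch assumes the model was asked to report each clip as `MM:SS-MM:SS`. That format and the `parse_clips` helper are assumptions for illustration.

```python
import re

# Matches time ranges such as "00:05-00:20" or "1:30 - 1:45".
CLIP_PATTERN = re.compile(r"(\d{1,2}):(\d{2})\s*-\s*(\d{1,2}):(\d{2})")


def parse_clips(response_text: str, max_seconds: int = 15) -> list[tuple[int, int]]:
    """Return (start, end) offsets in seconds for every clip in the text
    whose length is positive and at most max_seconds."""
    clips = []
    for m1, s1, m2, s2 in CLIP_PATTERN.findall(response_text):
        start = int(m1) * 60 + int(s1)
        end = int(m2) * 60 + int(s2)
        if 0 < end - start <= max_seconds:
            clips.append((start, end))
    return clips
```

Each `(start, end)` pair can then be handed to whichever video editing tool performs the cut.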

3. Personalized Ad Generation:

# Context-aware image generation (Adios approach)
for audience in audiences:
    ad_image = imagen.generate_images(
        prompt=f"Product ad for {product}, targeting {audience.demographics}, {audience.style_preference}",
        aspect_ratio="16:9"
    )
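The loop above relies on `audiences` and `product` being defined elsewhere. One possible shape for that data is sketched here. The `Audience` dataclass and `build_ad_prompts` are hypothetical; only the `demographics` and `style_preference` attribute names come from the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Audience:
    """One target segment for personalized ad generation."""
    name: str
    demographics: str
    style_preference: str


def build_ad_prompts(product: str, audiences: list[Audience]) -> dict[str, str]:
    """Return one image prompt per audience, keyed by audience name."""
    return {
        audience.name: (
            f"Product ad for {product}, targeting {audience.demographics}, "
            f"{audience.style_preference}"
        )
        for audience in audiences
    }
```

Building the prompts first makes them easy to review or log before any image generation call is made.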

🎯 Best Practices for Jeremy

1. Project Setup:

# Set environment variables
export GOOGLE_CLOUD_PROJECT="your-project-id"
export GOOGLE_APPLICATION_CREDENTIALS="path/to/service-account.json"

# Install SDK (the vertexai module ships with google-cloud-aiplatform)
pip install --upgrade google-cloud-aiplatform

2. Rate Limits & Quotas:

  • Gemini 2.5 Pro: 2M tokens/min (video processing)
  • Imagen 4: 100 images/min
  • Monitor usage in Cloud Console, which also shows the quotas that apply to your project and region
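A per-minute quota such as the 100 images/min figure can be respected on the client side with a sliding-window limiter. This is a minimal sketch, not a Vertex AI feature. The `SlidingWindowLimiter` class is hypothetical, and its clock is injectable so the logic can be tested without sleeping.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any rolling window of period_seconds."""

    def __init__(self, max_calls: int, period_seconds: float = 60.0,
                 clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period_seconds
        self.clock = clock
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget calls that have left the rolling window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def record(self) -> None:
        """Register that a call was just made."""
        self.calls.append(self.clock())
```

In a generation loop, call `time.sleep(limiter.wait_time())` before each request and `limiter.record()` after it, with `SlidingWindowLimiter(100)` matching the Imagen figure above.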

3. Cost Optimization:

  • Use Gemini 2.5 Flash for faster, cheaper operations
  • Batch image generation requests
  • Cache video embeddings for repeated analysis
  • Use low-resolution video setting when appropriate

4. Security & Compliance:

  • Keep API keys in Secret Manager, never in code
  • Use service accounts with minimal permissions
  • Enable VPC Service Controls to limit data exfiltration, and pin resources to a specific region for data residency
  • Log all API calls for audit trails

🚀 Advanced Marketing Use Cases

1. Campaign Performance Analysis:

# Analyze competitor campaigns
competitor_videos = ["gs://bucket/competitor1.mp4", "gs://bucket/competitor2.mp4"]
analysis = model.generate_content([
    "Compare these competitor videos: themes, messaging, CTAs, production quality",
    # Wrap each URI in a Part so it is sent as video rather than text
    *[Part.from_uri(uri, mime_type="video/mp4") for uri in competitor_videos]
])

2. Content Localization:

# Generate multilingual campaigns
for lang in ["en", "es", "fr", "de", "ja"]:
    localized_content = model.generate_content([
        f"Translate and culturally adapt this campaign for {lang} market:",
        campaign_brief,
        hero_image
    ])

3. A/B Test Generation:

# Generate variations automatically
variations = []
for style in ["minimalist", "bold", "luxury", "playful"]:
    variation = imagen.generate_images(
        prompt=f"Product ad, {style} style, {brand_guidelines}",
        number_of_images=1
    )
    variations.append(variation)
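When a test varies more than one dimension, the full grid of variants can be enumerated before any image is generated. This sketch crosses styles with aspect ratios. `variant_specs` is a hypothetical helper, and the aspect ratios in the usage note are examples only.

```python
from itertools import product


def variant_specs(base_prompt: str, styles: list[str],
                  aspect_ratios: list[str]) -> list[dict]:
    """Return one request spec per (style, aspect ratio) combination."""
    return [
        {"prompt": f"{base_prompt}, {style} style", "aspect_ratio": ratio}
        for style, ratio in product(styles, aspect_ratios)
    ]
```

For example, `variant_specs("Product ad", ["minimalist", "bold", "luxury", "playful"], ["1:1", "16:9"])` yields eight specs, each of which could be unpacked into `imagen.generate_images(**spec, number_of_images=1)`.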

📚 Reference Documentation

Pricing:

  • Gemini 2.5 Pro: $3.50/1M input tokens, $10.50/1M output tokens
  • Imagen 4: $0.04/image
  • Video processing: Included in Gemini token pricing
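The prices listed here can be combined into a rough per-campaign estimate. The sketch below takes the rates as parameters, with defaults copied from this list, because published prices change and should be checked before use. `Rates` and `estimate_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Prices in USD; defaults are the figures listed in this section."""
    input_per_million: float = 3.50
    output_per_million: float = 10.50
    per_image: float = 0.04


def estimate_cost(input_tokens: int, output_tokens: int, images: int,
                  rates: Rates = Rates()) -> float:
    """Estimate total cost in USD for one batch of Gemini and Imagen calls."""
    token_cost = (
        input_tokens / 1_000_000 * rates.input_per_million
        + output_tokens / 1_000_000 * rates.output_per_million
    )
    return round(token_cost + images * rates.per_image, 4)
```

With the default rates, one million input tokens, 100,000 output tokens, and ten images come to 3.50 + 1.05 + 0.40 = 4.95 USD.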

When This Skill Activates

This skill automatically activates when you mention:

  • Video processing, analysis, or understanding
  • Audio generation, music composition, or voiceovers
  • Image generation, ad creatives, or visual content
  • Marketing campaigns, content automation, or asset production
  • Gemini multimodal capabilities
  • Vertex AI media operations
  • Social media content, email marketing, or PMax campaigns

Integration with Other Tools

Google Cloud Services:

  • Cloud Storage for media asset management
  • BigQuery for campaign analytics
  • Cloud Functions for automation triggers
  • Vertex AI Pipelines for content workflows

Third-Party Integrations:

  • Social media APIs (LinkedIn, Twitter, Instagram)
  • Marketing automation platforms (HubSpot, Marketo)
  • CMS integrations (WordPress, Contentful)
  • DAM systems (Bynder, Cloudinary)

Success Metrics

Track These KPIs:

  • Asset generation speed (baseline: 5 images/min)
  • Content approval rate (target: >80%)
  • Campaign personalization scale (target: 1000+ variants)
  • Cost per asset (target: <$0.10/image)
  • Time saved vs manual production (target: 90% reduction)
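The targets above can be checked mechanically once the raw counts are available. This is a minimal sketch using the thresholds from the list. The `kpi_report` function and its input names are hypothetical.

```python
def kpi_report(approved: int, generated: int, total_cost: float,
               manual_hours: float, automated_hours: float) -> dict:
    """Compute KPI values and compare them with the targets listed above."""
    if generated <= 0 or manual_hours <= 0:
        raise ValueError("generated and manual_hours must be positive")
    approval_rate = approved / generated
    cost_per_asset = total_cost / generated
    time_saved = 1 - automated_hours / manual_hours
    return {
        "approval_rate": approval_rate,
        "approval_ok": approval_rate > 0.80,      # target: >80%
        "cost_per_asset": cost_per_asset,
        "cost_ok": cost_per_asset < 0.10,         # target: <$0.10/image
        "time_saved": time_saved,
        "time_saved_ok": time_saved >= 0.90,      # target: 90% reduction
    }
```

For instance, 90 approved assets out of 100 generated for 4.00 USD, produced in 4 hours instead of 50, meets all three targets.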

This skill makes Jeremy a Vertex AI multimodal expert with instant access to video processing, audio generation, image creation, and marketing automation capabilities.
