Agent Skills
Th0rgal

media-creation

@Th0rgal/media-creation
Updated 4/1/2026

Creates images and video via Alibaba Wan 2.6 (DashScope), Google Gemini/Veo, and OpenAI GPT Image 1.5 APIs, plus background extraction workflows. Trigger terms: image generation, video generation, dashscope, wan 2.6, alibaba, gemini, veo, gpt image, openai images, background removal, alpha extraction, transparent png.

Installation

$ npx agent-skills-cli install @Th0rgal/media-creation
Claude Code
Cursor
Copilot
Codex
Antigravity

Details

Path: skill/media-creation/SKILL.md
Branch: main
Scoped Name: @Th0rgal/media-creation

Usage

After installing, this skill will be available to your AI coding assistant.

Verify installation:

npx agent-skills-cli list

Skill Instructions


name: media-creation
description: >
  Creates images and video via Alibaba Wan 2.6 (DashScope), Google Gemini/Veo, and OpenAI GPT Image 1.5 APIs, plus background extraction workflows. Trigger terms: image generation, video generation, dashscope, wan 2.6, alibaba, gemini, veo, gpt image, openai images, background removal, alpha extraction, transparent png.

Use when

  • Generate images or video via Alibaba Wan, Google Gemini/Veo, or OpenAI GPT Image APIs.
  • Create transparent PNGs (prefer GPT Image 1.5 for native transparency support).
  • Convert consistent renders (3D, compositing) with different backgrounds into a transparent RGBA output.
  • Prefer Gemini for general image generation; use Wan when content is restricted (e.g., babies).

Don't use when

  • API access or credentials are not available.
  • The task does not involve media generation or background extraction.

Outputs

  • Generated media files in artifacts/ (PNG, WEBP, MP4, etc.) or API JSON responses when requested.

Templates or Examples

  • Use the API request examples below as templates.

Transparent Image Generation (Recommended Approach)

Option 1: GPT Image 1.5 Native Transparency (BEST)

GPT Image 1.5 supports native transparency output. This is the simplest and most reliable method:

curl -X POST "https://api.openai.com/v1/images/generations" \
  -H "Authorization: Bearer ${OPENAI_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-1.5",
    "prompt": "A cute cartoon cat mascot",
    "size": "1024x1024",
    "quality": "high",
    "background": "transparent",
    "output_format": "png"
  }'

Notes:

  • background: "transparent" requires output_format: "png" or "webp"
  • Returns base64 data in data[0].b64_json
  • This is the only method that produces TRUE transparency from a single generation
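Because the image arrives as base64 text in data[0].b64_json rather than as a file, it has to be decoded before it can be saved. The following is a minimal Python sketch of that step, assuming the JSON response has already been parsed into a dict. The function name save_first_image and the output path are illustrative and are not part of the skill.

```python
import base64
from pathlib import Path


def save_first_image(response: dict, out_path: str) -> Path:
    """Decode data[0].b64_json from a parsed API response and write it to disk.

    `response` is assumed to be the already-parsed JSON body of the request.
    """
    b64_payload = response["data"][0]["b64_json"]
    image_bytes = base64.b64decode(b64_payload)

    path = Path(out_path)
    # Create the output directory (for example artifacts/) if it is missing.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(image_bytes)
    return path
```

For example, save_first_image(response, "artifacts/cat.png") would write the decoded PNG under artifacts/.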

Option 2: Three-Background Extraction (For Consistent Renders Only)

⚠️ IMPORTANT LIMITATION: This workflow ONLY works when you have control over the exact pixel output:

  • ✅ 3D renders (Blender, Maya, etc.)
  • ✅ Compositing software with controlled backgrounds
  • ✅ Screenshots with different desktop backgrounds
  • ❌ Generative AI (each generation produces different results)

The algorithm requires IDENTICAL foreground pixels across all three images. Generative AI models produce different outputs even with the same prompt.

For 3D/compositing use:

python3 scripts/extract_transparency.py \
  --black render_black.png \
  --white render_white.png \
  --colored render_red.png \
  --output result.png
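To show why identical foreground pixels are required, here is a minimal per-pixel sketch of the compositing math behind this kind of extraction. It is not the contents of scripts/extract_transparency.py: it uses only the black and white renders, assumes channel values in the 0-1 range, and the function name recover_rgba is illustrative. The role of the third, colored background is covered by references/background-removal-3-bg.md and is not modeled here.

```python
def recover_rgba(on_black, on_white):
    """Recover (r, g, b, a) for one pixel from its renders on black and on white.

    Both inputs are RGB tuples with channels in the 0-1 range.
    """
    # Compositing rule: C = F * a + B * (1 - a).
    # With B = 0 (black) and B = 1 (white), white - black equals 1 - a.
    diffs = [w - k for k, w in zip(on_black, on_white)]
    alpha = 1.0 - sum(diffs) / len(diffs)
    alpha = min(1.0, max(0.0, alpha))  # clamp against rounding noise

    if alpha == 0.0:
        # Fully transparent: the foreground colour cannot be recovered.
        return (0.0, 0.0, 0.0, 0.0)

    # On black, C = F * a, so dividing by alpha un-premultiplies the colour.
    color = tuple(min(1.0, k / alpha) for k in on_black)
    return (*color, alpha)
```

If the foreground differs between the two renders, as it does between two AI generations, the difference no longer isolates alpha and the result is meaningless.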

Option 3: AI Image + Manual Background Removal

For AI-generated images that need transparency:

  1. Generate the image with any provider
  2. Use a dedicated background removal tool (rembg, remove.bg API, etc.)

Inputs the Agent Should Ask For (only if missing; otherwise proceed)

  • Provider: Alibaba Wan (DashScope), Google (Gemini/Veo), or OpenAI (GPT Image).
  • Model ID and task type (T2I, I2V, T2V).
  • Prompt text and any input image path (for I2V).
  • Output size/resolution and aspect ratio.
  • Desired output format and count.
  • For transparency: whether native transparency (GPT Image) or background extraction is needed.
  • For background extraction: paths to black/white/red background images and the colored background RGB (0-1).

API Keys

The following environment variables should be set for API access:

  • OPENAI_API_KEY - For GPT Image 1.5 generations
  • GOOGLE_GENAI_API_KEY - For Gemini image/Veo video generation
  • DASHSCOPE_API_KEY - For Alibaba Wan 2.6 image/video generation
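A quick pre-flight check can confirm that the relevant variables are set before any request is built. The sketch below uses the three variable names listed above. The provider labels ("openai", "google", "alibaba") and the function name missing_keys are illustrative assumptions.

```python
import os

# Environment variable expected for each provider (labels are illustrative).
REQUIRED_KEYS = {
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_GENAI_API_KEY",
    "alibaba": "DASHSCOPE_API_KEY",
}


def missing_keys(providers, env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [REQUIRED_KEYS[p] for p in providers if not env.get(REQUIRED_KEYS[p])]
```

Calling missing_keys(["openai"]) before a GPT Image request returns an empty list when the key is present, and only variable names, never key values, otherwise.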

Outputs / Definition of Done

  • A clear, credential-safe request plan or script snippet.
  • For generation: task submission, polling, and decode/download steps.
  • For background removal: algorithm steps and expected RGBA output.
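The submit, poll, and download sequence can be sketched with a provider-agnostic polling helper. This is a sketch under stated assumptions: fetch_status is a hypothetical callable that wraps whatever status request the provider defines, and the "status" field with the values SUCCEEDED, FAILED, and CANCELED are placeholders to adapt to the provider reference files.

```python
import time


def poll_until_done(fetch_status, interval_s=5.0, timeout_s=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() repeatedly until the task finishes or times out.

    fetch_status is a hypothetical callable returning a dict with a "status"
    key; the status strings below are assumptions, not a specific API.
    """
    deadline = clock() + timeout_s
    while True:
        task = fetch_status()
        status = task.get("status")
        if status == "SUCCEEDED":
            return task  # caller then decodes or downloads the result
        if status in ("FAILED", "CANCELED"):
            raise RuntimeError(f"task ended with status {status}")
        if clock() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval_s)
```

Passing sleep and clock as parameters keeps the helper testable without real waiting.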

Procedure

  • Use references/alibaba-wan-api.md for Wan 2.6 endpoints and parameters (image, T2V, I2V).
  • Use references/gemini-banana-api.md for Gemini image and Veo video in the Gemini API.
  • Use references/openai-gpt-image-api.md for GPT Image 1.5 endpoints and parameters.
  • Use references/background-removal-3-bg.md for the three-background alpha extraction algorithm.
  • API keys in code examples are wrapped in <encrypted> tags and stored encrypted.

Model Quick Reference

Provider | Model | Use Case
OpenAI | gpt-image-1.5 | Best for transparent images, high quality
OpenAI | gpt-image-1 | Image edits/inpainting
Google | gemini-2.5-flash-image | Fast image generation
Google | veo-3.1-generate-preview | Video generation
Alibaba | wan2.6-t2v | Text-to-video
Alibaba | wan2.6-i2v | Image-to-video
Alibaba | wan2.6-image | Image generation (fewer restrictions)

Checks & Guardrails

  • API keys must be wrapped in <encrypted> tags; they are encrypted at rest.
  • Validate image sizes and formats, and stay within provider rate limits.
  • Ensure base64 encoding formats match API expectations.
  • For transparency: verify the workflow matches the source type (render vs. AI).
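One way to check size and transparency on a returned PNG is to read its header directly. The sketch below relies only on the standard PNG layout (8-byte signature followed by the IHDR chunk); the function name png_info is illustrative. It reports an alpha channel for color types 4 and 6 only, so palette images that carry transparency in a tRNS chunk are not detected.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_info(data: bytes) -> dict:
    """Read width, height, and alpha-channel presence from PNG header bytes."""
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR payload starts at byte 16: width, height, bit depth, colour type.
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    # Colour type 4 = grayscale + alpha, 6 = RGBA.
    return {"width": width, "height": height, "has_alpha": color_type in (4, 6)}
```

For a transparent 1024x1024 output, has_alpha should be true and the dimensions should match the requested size.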

References

  • references/alibaba-wan-api.md
  • references/gemini-banana-api.md
  • references/openai-gpt-image-api.md
  • references/background-removal-3-bg.md

Scripts

  • scripts/extract_transparency.py - Extract RGBA from black/white/red background images. Usage: python3 scripts/extract_transparency.py --black img_black.png --white img_white.png --colored img_red.png --output result.png