Agent Skills

ops-devops-platform

@vasilyu1983/ops-devops-platform
vasilyu1983
51
11 forks
Updated 3/31/2026

Production-grade DevOps patterns with Kubernetes 1.34+, Terraform 1.9+, Docker 27+, ArgoCD/FluxCD GitOps, SRE, eBPF-based observability, AI-driven monitoring, CI/CD security, and cloud-native operations (AWS, GCP, Azure, Kafka).

Installation

npx agent-skills-cli install @vasilyu1983/ops-devops-platform
Claude Code
Cursor
Copilot
Codex
Antigravity

Details

Path: frameworks/claude-code-kit/framework/skills/ops-devops-platform/SKILL.md
Branch: main
Scoped Name: @vasilyu1983/ops-devops-platform

Usage

After installing, this skill will be available to your AI coding assistant.

Verify installation:

npx agent-skills-cli list

Skill Instructions


name: ops-devops-platform
description: Production-grade DevOps patterns with Kubernetes 1.34+, Terraform 1.9+, Docker 27+, ArgoCD/FluxCD GitOps, SRE, eBPF-based observability, AI-driven monitoring, CI/CD security, and cloud-native operations (AWS, GCP, Azure, Kafka).

DevOps Engineering: Quick Reference

This skill equips Claude with actionable templates, checklists, and patterns for building self-service platforms, automating infrastructure with GitOps, deploying securely with DevSecOps, scaling with Kubernetes, ensuring reliability through SRE practices, and operating production systems with AI-driven observability.

Modern Best Practices (December 2025): Kubernetes 1.34 (in-place Pod resource updates in beta, with GA slated for 1.35, releasing Dec 17), Docker 27 with BuildKit optimizations, Terraform 1.9+ with improved provider ecosystem, ArgoCD 2.14/FluxCD 2.5 GitOps patterns, eBPF-based observability (Cilium, Hubble), and AIOps for incident correlation.


Quick Reference

| Task | Tool/Framework | Command | When to Use |
|------|----------------|---------|-------------|
| Infrastructure as Code | Terraform 1.9+ | terraform plan && terraform apply | Provision cloud resources declaratively |
| GitOps Deployment | ArgoCD / FluxCD | argocd app sync myapp | Continuous reconciliation, declarative deployments |
| Container Build | Docker 27+ | docker build -t app:v1 . | Package applications with dependencies |
| Kubernetes Deployment | kubectl / Helm (K8s 1.34+) | kubectl apply -f deploy.yaml / helm upgrade app ./chart | Deploy to K8s cluster, manage releases |
| CI/CD Pipeline | GitHub Actions | Define workflow in .github/workflows/ci.yml | Automated testing, building, deploying |
| Security Scanning | Trivy / Falco | trivy image myapp:latest | Vulnerability scanning, runtime security |
| Monitoring & Alerts | Prometheus + Grafana | Configure ServiceMonitor and AlertManager | Observability, SLO tracking, incident alerts |
| Load Testing | k6 / Locust | k6 run load-test.js | Performance validation, capacity planning |
| Incident Response | PagerDuty / Opsgenie | Configure escalation policies | On-call management, automated escalation |
| Platform Engineering | Backstage / Port | Deploy internal developer portal | Self-service infrastructure, golden paths |
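
The Security Scanning row of the table above can be turned into a simple CI gate. The following is a minimal sketch, assuming the scan was run with `trivy image --format json` and that the report uses Trivy's `Results[].Vulnerabilities[].Severity` layout. The function name `blocking_findings`, the `BLOCKING` set, and the sample report are hypothetical and exist only for illustration.

```python
import json

BLOCKING = {"HIGH", "CRITICAL"}  # assumption: severities that should fail the build


def blocking_findings(report: dict, blocking: set = BLOCKING) -> list:
    """Return (target, vulnerability id, severity) for each blocking finding."""
    findings = []
    for result in report.get("Results") or []:
        # A clean target may omit "Vulnerabilities" or set it to null.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in blocking:
                findings.append(
                    (result.get("Target"), vuln.get("VulnerabilityID"), vuln["Severity"])
                )
    return findings


# Hypothetical report fragment, trimmed to the fields used above.
sample = json.loads("""
{"Results": [
  {"Target": "myapp:latest (alpine 3.20)",
   "Vulnerabilities": [
     {"VulnerabilityID": "CVE-0000-0001", "Severity": "LOW"},
     {"VulnerabilityID": "CVE-0000-0002", "Severity": "CRITICAL"}]},
  {"Target": "app/requirements.txt"}
]}
""")

found = blocking_findings(sample)
for target, vuln_id, severity in found:
    print(f"{severity}: {vuln_id} in {target}")
```

In a pipeline, the job would exit non-zero whenever `found` is non-empty. Trivy can also enforce the same threshold on its own with `--severity HIGH,CRITICAL --exit-code 1`; parsing the JSON is mainly useful when the findings need to be filtered or reported elsewhere.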

Decision Tree: Choosing DevOps Approach

What do you need to accomplish?
    ├─ Infrastructure provisioning?
    │   ├─ Cloud-agnostic → Terraform (multi-cloud support)
    │   ├─ AWS-specific → CloudFormation or Terraform
    │   ├─ GCP-specific → Deployment Manager or Terraform
    │   └─ Azure-specific → ARM templates or Terraform
    │
    ├─ Application deployment?
    │   ├─ Kubernetes cluster?
    │   │   ├─ Simple deploy → kubectl apply -f manifests/
    │   │   ├─ Complex app → Helm charts
    │   │   └─ GitOps workflow → ArgoCD or FluxCD
    │   └─ Serverless?
    │       ├─ AWS → Lambda + SAM/Serverless Framework
    │       ├─ GCP → Cloud Functions
    │       └─ Azure → Azure Functions
    │
    ├─ CI/CD pipeline setup?
    │   ├─ GitHub-based → GitHub Actions (template-github-actions.md)
    │   ├─ GitLab-based → GitLab CI
    │   ├─ Enterprise → Jenkins or Tekton
    │   └─ Security-first → Add SAST/DAST/SCA scans (template-ci-cd.md)
    │
    ├─ Observability & monitoring?
    │   ├─ Metrics → Prometheus + Grafana
    │   ├─ Distributed tracing → Jaeger or OpenTelemetry
    │   ├─ Logs → Loki or ELK stack
    │   ├─ eBPF-based → Cilium + Hubble (sidecarless)
    │   └─ Unified platform → Datadog or New Relic
    │
    ├─ Incident management?
    │   ├─ On-call rotation → PagerDuty or Opsgenie
    │   ├─ Postmortem → template-postmortem.md
    │   └─ Communication → template-incident-comm.md
    │
    ├─ Platform engineering?
    │   ├─ Self-service → Backstage or Port (internal developer portal)
    │   ├─ Policy enforcement → OPA/Gatekeeper
    │   └─ Golden paths → Template repositories + automation
    │
    └─ Security hardening?
        ├─ Container scanning → Trivy or Grype
        ├─ Runtime security → Falco or Sysdig
        ├─ Secrets management → HashiCorp Vault or cloud-native KMS
        └─ Compliance → CIS Benchmarks, template-security-hardening.md
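
A slice of the decision tree above can also be expressed as data, which makes the choices easy to query from a script or bot. This is a rough sketch rather than part of the skill itself: the dictionary keys, the `DECISION_TREE` name, and the `recommend` helper are hypothetical, and only three branches are transcribed.

```python
# Three branches of the tree, keyed as need -> context -> recommendation.
DECISION_TREE = {
    "infrastructure provisioning": {
        "cloud-agnostic": "Terraform",
        "aws": "CloudFormation or Terraform",
        "gcp": "Deployment Manager or Terraform",
        "azure": "ARM templates or Terraform",
    },
    "observability": {
        "metrics": "Prometheus + Grafana",
        "distributed tracing": "Jaeger or OpenTelemetry",
        "logs": "Loki or ELK stack",
        "ebpf": "Cilium + Hubble",
        "unified platform": "Datadog or New Relic",
    },
    "security hardening": {
        "container scanning": "Trivy or Grype",
        "runtime security": "Falco or Sysdig",
        "secrets management": "HashiCorp Vault or cloud-native KMS",
    },
}


def recommend(need: str, context: str) -> str:
    """Look up a recommendation; raise KeyError listing the valid options."""
    branch = DECISION_TREE.get(need.strip().lower())
    if branch is None:
        raise KeyError(f"unknown need {need!r}; choose from {sorted(DECISION_TREE)}")
    choice = branch.get(context.strip().lower())
    if choice is None:
        raise KeyError(f"unknown context {context!r}; choose from {sorted(branch)}")
    return choice


print(recommend("Observability", "Logs"))
print(recommend("infrastructure provisioning", "cloud-agnostic"))
```

Keeping the tree as plain data means new branches can be added without touching the lookup logic.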

When to Use This Skill

Claude should invoke this skill when users request:

  • Platform engineering patterns (self-service developer platforms, internal tools)
  • GitOps workflows (ArgoCD, FluxCD, declarative infrastructure management)
  • Infrastructure as Code patterns (Terraform, K8s manifests, policy as code)
  • CI/CD pipelines with DevSecOps (GitHub Actions, security scanning, SAST/DAST/SCA)
  • SRE incident management, AI-driven alerting, escalation, or postmortem templates
  • eBPF-based observability (Cilium, Hubble, kernel-level insights, OpenTelemetry)
  • Kubernetes operational patterns (day-2 operations, resource management, workload placement)
  • Cloud-native monitoring (Prometheus, Grafana, unified observability platforms)
  • Team workflow, communication, handover guides, and runbooks
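
To make the "policy as code" and Kubernetes resource-management items above concrete, here is a minimal sketch of a pre-deploy check that flags containers without CPU and memory requests and limits. It assumes the Deployment manifest has already been parsed from YAML into a dict. The `missing_resources` helper and the sample manifest are hypothetical, and requiring CPU limits is a policy choice that some teams deliberately skip.

```python
REQUIRED = ("cpu", "memory")  # assumption: both must be set in requests and limits


def missing_resources(manifest: dict) -> list:
    """List '<container>: <section>.<resource>' entries absent from a Deployment."""
    containers = (
        manifest.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("containers", [])
    )
    problems = []
    for container in containers:
        resources = container.get("resources") or {}
        for section in ("requests", "limits"):
            declared = resources.get(section) or {}
            for name in REQUIRED:
                if name not in declared:
                    problems.append(f"{container.get('name', '?')}: {section}.{name}")
    return problems


# Hypothetical Deployment used only to exercise the check.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "app"},
    "spec": {"template": {"spec": {"containers": [
        {"name": "web", "image": "app:v1",
         "resources": {"requests": {"cpu": "100m", "memory": "128Mi"},
                       "limits": {"memory": "256Mi"}}},
        {"name": "sidecar", "image": "proxy:v1"},
    ]}}},
}

problems = missing_resources(deployment)
for problem in problems:
    print("missing", problem)
```

A check like this fits in CI before `kubectl apply`. Enforcing the same rule at admission time is the job of a policy engine such as OPA/Gatekeeper, which the decision tree already names.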

Resources (Best Practices Guides)

Operational best practices by domain:

Each guide includes:

  • Checklists for completeness and safety
  • Common anti-patterns and remediations
  • Step-by-step patterns for safe rollout, rollback, and verification
  • Decision matrices (e.g., deployment, escalation, monitoring strategy)
  • Real-world examples and edge case handling

Templates (Copy-Paste Ready)

Production templates organized by tech stack (27 templates total):

AWS Cloud

GCP Cloud

Azure Cloud

Kubernetes

Docker

Kafka

Terraform & IaC

CI/CD Pipelines

Monitoring & Observability

Incident Response

Security


Navigation

Resources

Shared Utilities (Centralized patterns: extract, don't duplicate)

Templates

Data


Related Skills

Operations & Infrastructure:

Security & Compliance:

Software Development:

AI/ML Operations:


Operational Deep Dives

See resources/operational-patterns.md for:

  • Platform engineering blueprints and GitOps reconciliation checklists
  • DevSecOps CI/CD gates, SLO/SLI playbooks, and rollout verification steps
  • Observability patterns (eBPF), AIOps incident handling, and reliability drills
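
As an illustration of the arithmetic behind the SLO/SLI playbooks mentioned above, the sketch below computes the remaining error budget and the burn rate for a request-based SLO. It is not taken from resources/operational-patterns.md. The function names and sample numbers are hypothetical, and it assumes the SLO is expressed as a success ratio strictly between 0 and 1.

```python
def _budget_fraction(slo: float) -> float:
    """Share of requests allowed to fail, e.g. 0.001 for a 99.9% SLO."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    return 1.0 - slo


def error_budget_remaining(slo: float, total: int, failed: int) -> float:
    """Fraction of the error budget left; negative once the budget is overspent."""
    if total <= 0:
        raise ValueError("total must be positive")
    allowed_failures = _budget_fraction(slo) * total
    return 1.0 - failed / allowed_failures


def burn_rate(slo: float, observed_error_rate: float) -> float:
    """How many times faster than 'exactly on budget' errors are accruing."""
    return observed_error_rate / _budget_fraction(slo)


# Hypothetical window: 1,000,000 requests, 250 failures, 99.9% SLO.
remaining = error_budget_remaining(slo=0.999, total=1_000_000, failed=250)
# Hypothetical recent error rate of 0.5% against the same SLO.
rate = burn_rate(slo=0.999, observed_error_rate=0.005)

print(f"error budget remaining: {remaining:.1%}")
print(f"burn rate: {rate:.1f}x")
```

A burn rate of 1 means the budget would be used up exactly at the end of the SLO window. Alerting thresholds are usually set at multiples of that, chosen per service.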

External Resources

See data/sources.json for 45+ curated sources organized by tech stack:

  • Cloud Platforms: AWS, GCP, Azure documentation and best practices
  • Container Orchestration: Kubernetes, Helm, Kustomize, Docker
  • Infrastructure as Code: Terraform, CloudFormation, ARM templates
  • CI/CD & GitOps: GitHub Actions, GitLab CI, Jenkins, ArgoCD, FluxCD
  • Streaming: Apache Kafka, Confluent, Strimzi
  • Monitoring: Prometheus, Grafana, Datadog, OpenTelemetry, Jaeger
  • SRE: Google SRE books, incident response patterns
  • Security: OWASP DevSecOps, CIS Benchmarks, Trivy, Falco
  • Tools: kubectl, k9s, stern, Cosign, Syft, Terragrunt

Use this skill as a hub for safe, modern, and production-grade DevOps patterns. All templates and patterns are operational, with no theory or book summaries.