Agent SkillsAgent Skills
alirezarezvani

senior-computer-vision

@alirezarezvani/senior-computer-vision
alirezarezvani
8,326
986 forks
Updated 3/31/2026
View on GitHub

Computer vision engineering skill for object detection, image segmentation, and visual AI systems. Covers CNN and Vision Transformer architectures, YOLO/Faster R-CNN/DETR detection, Mask R-CNN/SAM segmentation, and production deployment with ONNX/TensorRT. Includes PyTorch, torchvision, Ultralytics, Detectron2, and MMDetection frameworks. Use when building detection pipelines, training custom models, optimizing inference, or deploying vision systems.

Installation

$npx agent-skills-cli install @alirezarezvani/senior-computer-vision
Claude Code
Cursor
Copilot
Codex
Antigravity

Details

Pathengineering-team/senior-computer-vision/SKILL.md
Branchmain
Scoped Name@alirezarezvani/senior-computer-vision

Usage

After installing, this skill will be available to your AI coding assistant.

Verify installation:

npx agent-skills-cli list

Skill Instructions


name: "senior-computer-vision" description: Computer vision engineering skill for object detection, image segmentation, and visual AI systems. Covers CNN and Vision Transformer architectures, YOLO/Faster R-CNN/DETR detection, Mask R-CNN/SAM segmentation, and production deployment with ONNX/TensorRT. Includes PyTorch, torchvision, Ultralytics, Detectron2, and MMDetection frameworks. Use when building detection pipelines, training custom models, optimizing inference, or deploying vision systems.

Senior Computer Vision Engineer

Production computer vision engineering skill for object detection, image segmentation, and visual AI system deployment.

Table of Contents

Quick Start

# Generate training configuration for YOLO or Faster R-CNN
python scripts/vision_model_trainer.py models/ --task detection --arch yolov8

# Analyze model for optimization opportunities (quantization, pruning)
python scripts/inference_optimizer.py model.pt --target onnx --benchmark

# Build dataset pipeline with augmentations
python scripts/dataset_pipeline_builder.py images/ --format coco --augment

Core Expertise

This skill provides guidance on:

  • Object Detection: YOLO family (v5-v11), Faster R-CNN, DETR, RT-DETR
  • Instance Segmentation: Mask R-CNN, YOLACT, SOLOv2
  • Semantic Segmentation: DeepLabV3+, SegFormer, SAM (Segment Anything)
  • Image Classification: ResNet, EfficientNet, Vision Transformers (ViT, DeiT)
  • Video Analysis: Object tracking (ByteTrack, SORT), action recognition
  • 3D Vision: Depth estimation, point cloud processing, NeRF
  • Production Deployment: ONNX, TensorRT, OpenVINO, CoreML

Tech Stack

CategoryTechnologies
FrameworksPyTorch, torchvision, timm
DetectionUltralytics (YOLO), Detectron2, MMDetection
Segmentationsegment-anything, mmsegmentation
OptimizationONNX, TensorRT, OpenVINO, torch.compile
Image ProcessingOpenCV, Pillow, albumentations
AnnotationCVAT, Label Studio, Roboflow
Experiment TrackingMLflow, Weights & Biases
ServingTriton Inference Server, TorchServe

Workflow 1: Object Detection Pipeline

Use this workflow when building an object detection system from scratch.

Step 1: Define Detection Requirements

Analyze the detection task requirements:

Detection Requirements Analysis:
- Target objects: [list specific classes to detect]
- Real-time requirement: [yes/no, target FPS]
- Accuracy priority: [speed vs accuracy trade-off]
- Deployment target: [cloud GPU, edge device, mobile]
- Dataset size: [number of images, annotations per class]

Step 2: Select Detection Architecture

Choose architecture based on requirements:

RequirementRecommended ArchitectureWhy
Real-time (>30 FPS)YOLOv8/v11, RT-DETRSingle-stage, optimized for speed
High accuracyFaster R-CNN, DINOTwo-stage, better localization
Small objectsYOLO + SAHI, Faster R-CNN + FPNMulti-scale detection
Edge deploymentYOLOv8n, MobileNetV3-SSDLightweight architectures
Transformer-basedDETR, DINO, RT-DETREnd-to-end, no NMS required

Step 3: Prepare Dataset

Convert annotations to required format:

# COCO format (recommended)
python scripts/dataset_pipeline_builder.py data/images/ \
    --annotations data/labels/ \
    --format coco \
    --split 0.8 0.1 0.1 \
    --output data/coco/

# Verify dataset
python -c "from pycocotools.coco import COCO; coco = COCO('data/coco/train.json'); print(f'Images: {len(coco.imgs)}, Categories: {len(coco.cats)}')"

Step 4: Configure Training

Generate training configuration:

# For Ultralytics YOLO
python scripts/vision_model_trainer.py data/coco/ \
    --task detection \
    --arch yolov8m \
    --epochs 100 \
    --batch 16 \
    --imgsz 640 \
    --output configs/

# For Detectron2
python scripts/vision_model_trainer.py data/coco/ \
    --task detection \
    --arch faster_rcnn_R_50_FPN \
    --framework detectron2 \
    --output configs/

Step 5: Train and Validate

# Ultralytics training
yolo detect train data=data.yaml model=yolov8m.pt epochs=100 imgsz=640

# Detectron2 training
python train_net.py --config-file configs/faster_rcnn.yaml --num-gpus 1

# Validate on test set
yolo detect val model=runs/detect/train/weights/best.pt data=data.yaml

Step 6: Evaluate Results

Key metrics to analyze:

MetricTargetDescription
mAP@50>0.7Mean Average Precision at IoU 0.5
mAP@50:95>0.5COCO primary metric
Precision>0.8Low false positives
Recall>0.8Low missed detections
Inference time<33msFor 30 FPS real-time

Workflow 2: Model Optimization and Deployment

Use this workflow when preparing a trained model for production deployment.

Step 1: Benchmark Baseline Performance

# Measure current model performance
python scripts/inference_optimizer.py model.pt \
    --benchmark \
    --input-size 640 640 \
    --batch-sizes 1 4 8 16 \
    --warmup 10 \
    --iterations 100

Expected output:

Baseline Performance (PyTorch FP32):
- Batch 1: 45.2ms (22.1 FPS)
- Batch 4: 89.4ms (44.7 FPS)
- Batch 8: 165.3ms (48.4 FPS)
- Memory: 2.1 GB
- Parameters: 25.9M

Step 2: Select Optimization Strategy

Deployment TargetOptimization Path
NVIDIA GPU (cloud)PyTorch → ONNX → TensorRT FP16
NVIDIA GPU (edge)PyTorch → TensorRT INT8
Intel CPUPyTorch → ONNX → OpenVINO
Apple SiliconPyTorch → CoreML
Generic CPUPyTorch → ONNX Runtime
MobilePyTorch → TFLite or ONNX Mobile

Step 3: Export to ONNX

# Export with dynamic batch size
python scripts/inference_optimizer.py model.pt \
    --export onnx \
    --input-size 640 640 \
    --dynamic-batch \
    --simplify \
    --output model.onnx

# Verify ONNX model
python -c "import onnx; model = onnx.load('model.onnx'); onnx.checker.check_model(model); print('ONNX model valid')"

Step 4: Apply Quantization (Optional)

For INT8 quantization with calibration:

# Generate calibration dataset
python scripts/inference_optimizer.py model.onnx \
    --quantize int8 \
    --calibration-data data/calibration/ \
    --calibration-samples 500 \
    --output model_int8.onnx

Quantization impact analysis:

PrecisionSizeSpeedAccuracy Drop
FP32100%1x0%
FP1650%1.5-2x<0.5%
INT825%2-4x1-3%

Step 5: Convert to Target Runtime

# TensorRT (NVIDIA GPU)
trtexec --onnx=model.onnx --saveEngine=model.engine --fp16

# OpenVINO (Intel)
mo --input_model model.onnx --output_dir openvino/

# CoreML (Apple)
python -c "import coremltools as ct; model = ct.convert('model.onnx'); model.save('model.mlpackage')"

Step 6: Benchmark Optimized Model

python scripts/inference_optimizer.py model.engine \
    --benchmark \
    --runtime tensorrt \
    --compare model.pt

Expected speedup:

Optimization Results:
- Original (PyTorch FP32): 45.2ms
- Optimized (TensorRT FP16): 12.8ms
- Speedup: 3.5x
- Accuracy change: -0.3% mAP

Workflow 3: Custom Dataset Preparation

Use this workflow when preparing a computer vision dataset for training.

Step 1: Audit Raw Data

# Analyze image dataset
python scripts/dataset_pipeline_builder.py data/raw/ \
    --analyze \
    --output analysis/

Analysis report includes:

Dataset Analysis:
- Total images: 5,234
- Image sizes: 640x480 to 4096x3072 (variable)
- Formats: JPEG (4,891), PNG (343)
- Corrupted: 12 files
- Duplicates: 45 pairs

Annotation Analysis:
- Format detected: Pascal VOC XML
- Total annotations: 28,456
- Classes: 5 (car, person, bicycle, dog, cat)
- Distribution: car (12,340), person (8,234), bicycle (3,456), dog (2,890), cat (1,536)
- Empty images: 234

Step 2: Clean and Validate

# Remove corrupted and duplicate images
python scripts/dataset_pipeline_builder.py data/raw/ \
    --clean \
    --remove-corrupted \
    --remove-duplicates \
    --output data/cleaned/

Step 3: Convert Annotation Format

# Convert VOC to COCO format
python scripts/dataset_pipeline_builder.py data/cleaned/ \
    --annotations data/annotations/ \
    --input-format voc \
    --output-format coco \
    --output data/coco/

Supported format conversions:

FromTo
Pascal VOC XMLCOCO JSON
YOLO TXTCOCO JSON
COCO JSONYOLO TXT
LabelMe JSONCOCO JSON
CVAT XMLCOCO JSON

Step 4: Apply Augmentations

# Generate augmentation config
python scripts/dataset_pipeline_builder.py data/coco/ \
    --augment \
    --aug-config configs/augmentation.yaml \
    --output data/augmented/

Recommended augmentations for detection:

# configs/augmentation.yaml
augmentations:
  geometric:
    - horizontal_flip: { p: 0.5 }
    - vertical_flip: { p: 0.1 }  # Only if orientation invariant
    - rotate: { limit: 15, p: 0.3 }
    - scale: { scale_limit: 0.2, p: 0.5 }

  color:
    - brightness_contrast: { brightness_limit: 0.2, contrast_limit: 0.2, p: 0.5 }
    - hue_saturation: { hue_shift_limit: 20, sat_shift_limit: 30, p: 0.3 }
    - blur: { blur_limit: 3, p: 0.1 }

  advanced:
    - mosaic: { p: 0.5 }  # YOLO-style mosaic
    - mixup: { p: 0.1 }   # Image mixing
    - cutout: { num_holes: 8, max_h_size: 32, max_w_size: 32, p: 0.3 }

Step 5: Create Train/Val/Test Splits

python scripts/dataset_pipeline_builder.py data/augmented/ \
    --split 0.8 0.1 0.1 \
    --stratify \
    --seed 42 \
    --output data/final/

Split strategy guidelines:

Dataset SizeTrainValTest
<1,000 images70%15%15%
1,000-10,00080%10%10%
>10,00090%5%5%

Step 6: Generate Dataset Configuration

# For Ultralytics YOLO
python scripts/dataset_pipeline_builder.py data/final/ \
    --generate-config yolo \
    --output data.yaml

# For Detectron2
python scripts/dataset_pipeline_builder.py data/final/ \
    --generate-config detectron2 \
    --output detectron2_config.py

Architecture Selection Guide

Object Detection Architectures

ArchitectureSpeedAccuracyBest For
YOLOv8n1.2ms37.3 mAPEdge, mobile, real-time
YOLOv8s2.1ms44.9 mAPBalanced speed/accuracy
YOLOv8m4.2ms50.2 mAPGeneral purpose
YOLOv8l6.8ms52.9 mAPHigh accuracy
YOLOv8x10.1ms53.9 mAPMaximum accuracy
RT-DETR-L5.3ms53.0 mAPTransformer, no NMS
Faster R-CNN R5046ms40.2 mAPTwo-stage, high quality
DINO-4scale85ms49.0 mAPSOTA transformer

Segmentation Architectures

ArchitectureTypeSpeedBest For
YOLOv8-segInstance4.5msReal-time instance seg
Mask R-CNNInstance67msHigh-quality masks
SAMPromptable50msZero-shot segmentation
DeepLabV3+Semantic25msScene parsing
SegFormerSemantic15msEfficient semantic seg

CNN vs Vision Transformer Trade-offs

AspectCNN (YOLO, R-CNN)ViT (DETR, DINO)
Training data needed1K-10K images10K-100K+ images
Training timeFastSlow (needs more epochs)
Inference speedFasterSlower
Small objectsGood with FPNNeeds multi-scale
Global contextLimitedExcellent
Positional encodingImplicitExplicit

Reference Documentation

→ See references/reference-docs-and-commands.md for details

Performance Targets

MetricReal-timeHigh AccuracyEdge
FPS>30>10>15
mAP@50>0.6>0.8>0.5
Latency P99<50ms<150ms<100ms
GPU Memory<4GB<8GB<2GB
Model Size<50MB<200MB<20MB

Resources

  • Architecture Guide: references/computer_vision_architectures.md
  • Optimization Guide: references/object_detection_optimization.md
  • Deployment Guide: references/production_vision_systems.md
  • Scripts: scripts/ directory for automation tools

More by alirezarezvani

View all
aws-solution-architect
8,326

Design AWS architectures for startups using serverless patterns and IaC templates. Use when asked to design serverless architecture, create CloudFormation templates, optimize AWS costs, set up CI/CD pipelines, or migrate to AWS. Covers Lambda, API Gateway, DynamoDB, ECS, Aurora, and cost optimization.

ms365-tenant-manager
8,326

Microsoft 365 tenant administration for Global Administrators. Automate M365 tenant setup, Office 365 admin tasks, Azure AD user management, Exchange Online configuration, Teams administration, and security policies. Generate PowerShell scripts for bulk operations, Conditional Access policies, license management, and compliance reporting. Use for M365 tenant manager, Office 365 admin, Azure AD users, Global Administrator, tenant configuration, or Microsoft 365 automation.

senior-architect
8,326

This skill should be used when the user asks to "design system architecture", "evaluate microservices vs monolith", "create architecture diagrams", "analyze dependencies", "choose a database", "plan for scalability", "make technical decisions", or "review system design". Use for architecture decision records (ADRs), tech stack evaluation, system design reviews, dependency analysis, and generating architecture diagrams in Mermaid, PlantUML, or ASCII format.

senior-backend
8,326

Designs and implements backend systems including REST APIs, microservices, database architectures, authentication flows, and security hardening. Use when the user asks to "design REST APIs", "optimize database queries", "implement authentication", "build microservices", "review backend code", "set up GraphQL", "handle database migrations", or "load test APIs". Covers Node.js/Express/Fastify development, PostgreSQL optimization, API security, and backend architecture patterns.