Use when fine-tuning LLMs, training custom models, or adapting foundation models for specific tasks. Invoke for configuring LoRA/QLoRA adapters, preparing JSONL training datasets, setting hyperparameters for fine-tuning runs, adapter training, transfer learning, finetuning with Hugging Face PEFT, OpenAI fine-tuning, instruction tuning, RLHF, DPO, or quantizing and deploying fine-tuned models. Trigger terms include: LoRA, QLoRA, PEFT, finetuning, fine-tuning, adapter tuning, LLM training, model training, custom model.
Installation
Details
Usage
After installing, this skill will be available to your AI coding assistant.
Verify installation:
npx agent-skills-cli listSkill Instructions
name: fine-tuning-expert description: "Use when fine-tuning LLMs, training custom models, or adapting foundation models for specific tasks. Invoke for configuring LoRA/QLoRA adapters, preparing JSONL training datasets, setting hyperparameters for fine-tuning runs, adapter training, transfer learning, finetuning with Hugging Face PEFT, OpenAI fine-tuning, instruction tuning, RLHF, DPO, or quantizing and deploying fine-tuned models. Trigger terms include: LoRA, QLoRA, PEFT, finetuning, fine-tuning, adapter tuning, LLM training, model training, custom model." license: MIT metadata: author: https://github.com/Jeffallan version: "1.1.0" domain: data-ml triggers: fine-tuning, fine tuning, finetuning, LoRA, QLoRA, PEFT, adapter tuning, transfer learning, model training, custom model, LLM training, instruction tuning, RLHF, model optimization, quantization role: expert scope: implementation output-format: code related-skills: devops-engineer
Fine-Tuning Expert
Senior ML engineer specializing in LLM fine-tuning, parameter-efficient methods, and production model optimization.
Core Workflow
- Dataset preparation β Validate and format data; run quality checks before training starts
- Checkpoint:
python validate_dataset.py --input data.jsonlβ fix all errors before proceeding
- Checkpoint:
- Method selection β Choose PEFT technique based on GPU memory and task requirements
- Use LoRA for most tasks; QLoRA (4-bit) when GPU memory is constrained; full fine-tune only for small models
- Training β Configure hyperparameters, monitor loss curves, checkpoint regularly
- Checkpoint: validation loss must decrease; plateau or increase signals overfitting
- Evaluation β Benchmark against the base model; test on held-out set and edge cases
- Checkpoint: collect perplexity, task-specific metrics (BLEU/ROUGE), and latency numbers
- Deployment β Merge adapter weights, quantize, measure inference throughput before serving
Reference Guide
Load detailed guidance based on context:
| Topic | Reference | Load When |
|---|---|---|
| LoRA/PEFT | references/lora-peft.md | Parameter-efficient fine-tuning, adapters |
| Dataset Prep | references/dataset-preparation.md | Training data formatting, quality checks |
| Hyperparameters | references/hyperparameter-tuning.md | Learning rates, batch sizes, schedulers |
| Evaluation | references/evaluation-metrics.md | Benchmarking, metrics, model comparison |
| Deployment | references/deployment-optimization.md | Model merging, quantization, serving |
Minimal Working Example β LoRA Fine-Tuning with Hugging Face PEFT
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model, TaskType
from trl import SFTTrainer
import torch
# 1. Load base model and tokenizer
model_id = "meta-llama/Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
device_map="auto",
)
# 2. Configure LoRA adapter
lora_config = LoraConfig(
task_type=TaskType.CAUSAL_LM,
r=16, # rank β increase for more capacity, decrease to save memory
lora_alpha=32, # scaling factor; typically 2Γ rank
target_modules=["q_proj", "v_proj"],
lora_dropout=0.05,
bias="none",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters() # verify: should be ~0.1β1% of total params
# 3. Load and format dataset (Alpaca-style JSONL)
dataset = load_dataset("json", data_files={"train": "train.jsonl", "test": "test.jsonl"})
def format_prompt(example):
return {"text": f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['output']}"}
dataset = dataset.map(format_prompt)
# 4. Training arguments
training_args = TrainingArguments(
output_dir="./checkpoints",
num_train_epochs=3,
per_device_train_batch_size=4,
gradient_accumulation_steps=4, # effective batch size = 16
learning_rate=2e-4,
lr_scheduler_type="cosine",
warmup_ratio=0.03, # always use warmup
fp16=False,
bf16=True,
logging_steps=10,
eval_strategy="steps",
eval_steps=100,
save_steps=200,
load_best_model_at_end=True,
)
# 5. Train
trainer = SFTTrainer(
model=model,
args=training_args,
train_dataset=dataset["train"],
eval_dataset=dataset["test"],
dataset_text_field="text",
max_seq_length=2048,
)
trainer.train()
# 6. Save adapter weights only
model.save_pretrained("./lora-adapter")
tokenizer.save_pretrained("./lora-adapter")
QLoRA variant β add these lines before loading the model to enable 4-bit quantization:
from transformers import BitsAndBytesConfig
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
Merge adapter into base model for deployment:
from peft import PeftModel
base = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, "./lora-adapter").merge_and_unload()
merged.save_pretrained("./merged-model")
Constraints
MUST DO
- Validate dataset quality before training
- Use parameter-efficient methods for large models (>7B)
- Monitor training/validation loss curves
- Document hyperparameters and training config
- Version datasets and model checkpoints
- Always include a learning rate warmup
MUST NOT DO
- Skip data quality validation
- Overfit on small datasets β use regularisation (dropout, weight decay) and early stopping
- Merge incompatible adapters (mismatched rank, base model, or target modules)
- Deploy without evaluation against a held-out set and latency benchmark
Output Templates
When implementing fine-tuning, always provide:
- Dataset preparation script with validation logic (schema checks, token-length histogram, deduplication)
- Training configuration (full
TrainingArguments+LoraConfigblock, commented) - Evaluation script reporting perplexity, task-specific metrics, and latency
- Brief design rationale β why this PEFT method, rank, and learning rate were chosen for this task
More by Jeffallan
View allUse when designing REST or GraphQL APIs, creating OpenAPI specifications, or planning API architecture. Invoke for resource modeling, versioning strategies, pagination patterns, error handling standards.
Use when designing new high-level system architecture, reviewing existing designs, or making architectural decisions. Invoke to create architecture diagrams, write Architecture Decision Records (ADRs), evaluate technology trade-offs, design component interactions, and plan for scalability. Use for system design, architecture review, microservices structuring, ADR authoring, scalability planning, and infrastructure pattern selection β distinct from code-level design patterns or database-only design tasks.
Integrates with Atlassian products to manage project tracking and documentation via MCP protocol. Use when querying Jira issues with JQL filters, creating and updating tickets with custom fields, searching or editing Confluence pages with CQL, managing sprints and backlogs, setting up MCP server authentication, syncing documentation, or debugging Atlassian API integrations.
Generates Angular 17+ standalone components, configures advanced routing with lazy loading and guards, implements NgRx state management, applies RxJS patterns, and optimizes bundle performance. Use when building Angular 17+ applications with standalone components or signals, setting up NgRx stores, establishing RxJS reactive patterns, performance tuning, or writing Angular tests for enterprise apps.
