---
name: developing-llamaindex-systems
description: >-
  Production-grade agentic system development with LlamaIndex in Python. Covers
  semantic ingestion (SemanticSplitterNodeParser, CodeSplitter,
  IngestionPipeline), retrieval strategies (BM25Retriever, hybrid search, alpha
  weighting), PropertyGraphIndex with graph stores (Neo4j), context RAG
  (RouterQueryEngine, SubQuestionQueryEngine, LLMRerank), agentic orchestration
  (ReAct, Workflows, FunctionTool), and observability (Arize Phoenix). Use when
  asked to "build a LlamaIndex agent", "set up semantic chunking", "index
  source code", "implement hybrid search", "create a knowledge graph with
  LlamaIndex", "implement query routing", "debug RAG pipeline", "add Phoenix
  observability", or "create an event-driven workflow". Triggers on
  "PropertyGraphIndex", "SemanticSplitterNodeParser", "CodeSplitter",
  "BM25Retriever", "hybrid search", "ReAct agent", "Workflow pattern",
  "LLMRerank", "Text-to-Cypher".
allowed-tools:
  - Read
  - Write
  - Bash
  - WebFetch
  - Grep
  - Glob
metadata:
  version: 1.2.0
  last-updated: 2025-12-28
  category: frameworks
  python-version: ">=3.9"
---
LlamaIndex Agentic Systems
Build production-grade agentic RAG systems with semantic ingestion, knowledge graphs, dynamic routing, and observability.
Quick Start
Build a working agent in 6 steps:
Step 1: Install Dependencies
pip install "llama-index-core>=0.10.0" llama-index-llms-openai llama-index-embeddings-openai arize-phoenix
See scripts/requirements.txt for full pinned dependencies.
Step 2: Ingest with Semantic Chunking
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.embeddings.openai import OpenAIEmbedding
embed_model = OpenAIEmbedding(model="text-embedding-3-small")
splitter = SemanticSplitterNodeParser(
buffer_size=1,
breakpoint_percentile_threshold=95,
embed_model=embed_model
)
docs = SimpleDirectoryReader(input_files=["data.pdf"]).load_data()
nodes = splitter.get_nodes_from_documents(docs)
Step 3: Build Index
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex(nodes, embed_model=embed_model)
index.storage_context.persist(persist_dir="./storage")
Step 4: Verify Index
# Confirm index built correctly
print(f"Indexed {len(index.docstore.docs)} document chunks")
# Preview a sample node
sample = list(index.docstore.docs.values())[0]
print(f"Sample chunk: {sample.text[:200]}...")
Step 5: Create Query Engine
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("What are the key concepts?")
print(response)
Step 6: Enable Observability
import phoenix as px
import llama_index.core
px.launch_app()
llama_index.core.set_global_handler("arize_phoenix")
# All subsequent queries are now traced
For a production-ready version of this pipeline, run: python scripts/ingest_semantic.py
Architecture Overview
Six pillars for agentic systems:
| Pillar | Purpose | Reference |
|---|---|---|
| Ingestion | Semantic chunking, code splitting, metadata | references/ingestion.md |
| Retrieval | BM25 keyword search, hybrid fusion | references/retrieval-strategies.md |
| Property Graphs | Knowledge graphs + vector hybrid | references/property-graphs.md |
| Context RAG | Query routing, decomposition, reranking | references/context-rag.md |
| Orchestration | ReAct agents, event-driven Workflows | references/orchestration.md |
| Observability | Tracing, debugging, evaluation | references/observability.md |
Decision Trees
Which Node Parser?
Is the content source code?
├─ Yes → CodeSplitter
│ language="python" (or typescript, javascript, java, go)
│ chunk_lines=40, chunk_lines_overlap=15
│ → See: references/ingestion.md#codesplitter
│
└─ No, it's documents:
├─ Need semantic coherence (legal, technical docs)?
│ └─ Yes → SemanticSplitterNodeParser
│ buffer_size=1 (sensitive), 3 (stable)
│ breakpoint_percentile_threshold=95 (fewer), 70 (more)
│ → See: references/ingestion.md#semanticsplitternodeparser
│
├─ Prioritize speed → SentenceSplitter
│ chunk_size=1024, chunk_overlap=20
│ → See: references/ingestion.md#sentencesplitter
│
└─ Need fine-grained retrieval → SentenceWindowNodeParser
window_size=3 (surrounding sentences in metadata)
→ See: references/ingestion.md#sentencewindownodeparser
Trade-off: Semantic chunking requires embedding calls during ingestion (cost + latency).
Which Retrieval Mode?
Query contains exact terms (function names, error codes, IDs)?
├─ Yes, exact match critical → BM25
│ retriever = BM25Retriever.from_defaults(nodes=nodes)
│ → See: references/retrieval-strategies.md#bm25retriever
│
├─ Conceptual/semantic query → Vector
│ retriever = index.as_retriever(similarity_top_k=5)
│ → See: references/context-rag.md
│
└─ Mixed or unknown query type → Hybrid (recommended default)
alpha=0.5 (equal weight), 0.3 (favor BM25), 0.7 (favor vector)
→ See: references/retrieval-strategies.md#hybrid-search
Trade-off: Hybrid adds BM25 index overhead but provides most robust retrieval.
Which Graph Extractor?
Need document navigation only (prev/next/parent)?
├─ Yes → ImplicitPathExtractor (no LLM, zero cost)
│ → See: references/property-graphs.md#implicitpathextractor
│
└─ No, need semantic relationships:
├─ Fixed ontology required (regulated domain)?
│ └─ Yes → SchemaLLMPathExtractor
│ Pass schema: {"PERSON": ["WORKS_AT"], "COMPANY": ["LOCATED_IN"]}
│ → See: references/property-graphs.md#schemallmpathextractor
│
└─ No, discovery/exploration:
└─ SimpleLLMPathExtractor
max_paths_per_chunk=10 (control noise)
→ See: references/property-graphs.md#simplellmpathextractor
Which Graph Retriever?
Need SQL-like aggregations (COUNT, SUM)?
├─ Yes, trusted environment → TextToCypherRetriever
│ Risk: LLM syntax errors, injection
│ → See: references/property-graphs.md#texttocypherretriever
│
├─ Yes, need safety → CypherTemplateRetriever
│ Pre-define: MATCH (p:Person {name: $name}) RETURN p
│ LLM only extracts parameters
│ → See: references/property-graphs.md#cyphertemplateretriever
│
└─ No, robustness priority → VectorContextRetriever
Vector search → graph traversal (path_depth=2)
Most reliable, no code generation
→ See: references/property-graphs.md#vectorcontextretriever
Which Agent Pattern?
Simple tool loop sufficient?
├─ Yes → ReActAgent (or FunctionCallingAgent with function-calling LLMs)
│ Tools via FunctionTool or ToolSpec
│ → See: references/orchestration.md#react-agent-pattern
│
└─ No, need:
├─ Branching/cycles → Workflow
│ → See: references/orchestration.md#branching
├─ Human-in-the-loop → Workflow (suspend/resume)
│ → See: references/orchestration.md#human-in-the-loop
├─ Multi-agent handoff → Workflow + Concierge pattern
│ → See: references/orchestration.md#concierge-multi-agent
└─ Parallel execution → Workflow with multiple event emissions
→ See: references/orchestration.md#workflows
Common Patterns
Pattern 1: Metadata-Enriched Ingestion
from llama_index.core.extractors import TitleExtractor, SummaryExtractor, KeywordExtractor
from llama_index.core.ingestion import IngestionPipeline
pipeline = IngestionPipeline(
transformations=[
splitter,
TitleExtractor(),
SummaryExtractor(),
KeywordExtractor(keywords=5),
embed_model,
]
)
nodes = pipeline.run(documents=docs)
Pattern 2: PropertyGraphIndex with Hybrid Retrieval
from llama_index.core import PropertyGraphIndex
from llama_index.core.indices.property_graph import SimpleLLMPathExtractor
index = PropertyGraphIndex.from_documents(
docs,
embed_model=embed_model,
kg_extractors=[SimpleLLMPathExtractor(max_paths_per_chunk=10)],
)
# Hybrid: vector search + graph traversal
retriever = index.as_retriever(include_text=True)
Pattern 3: Router with Multiple Engines
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool
tools = [
QueryEngineTool.from_defaults(
query_engine=summary_engine,
description="High-level summaries and overviews"
),
QueryEngineTool.from_defaults(
query_engine=detail_engine,
description="Specific facts, numbers, and details"
),
]
router = RouterQueryEngine(
selector=LLMSingleSelector.from_defaults(),
query_engine_tools=tools,
)
Pattern 4: Event-Driven Workflow
from llama_index.core.workflow import Workflow, step, StartEvent, StopEvent, Event
class QueryEvent(Event):
    query: str

class MyAgent(Workflow):
    @step
    async def classify(self, ev: StartEvent) -> QueryEvent:
        return QueryEvent(query=ev.get("query"))

    @step
    async def respond(self, ev: QueryEvent) -> StopEvent:
        # Assumes a query_engine attribute was attached at construction time
        result = self.query_engine.query(ev.query)
        return StopEvent(result=str(result))
# Run
agent = MyAgent(timeout=60)
result = await agent.run(query="What is X?")
Pattern 5: Reranking Pipeline
from llama_index.core.postprocessor import SimilarityPostprocessor, LLMRerank
query_engine = index.as_query_engine(
similarity_top_k=10, # Retrieve more
node_postprocessors=[
SimilarityPostprocessor(similarity_cutoff=0.7),
LLMRerank(top_n=3), # Rerank to top 3
]
)
Script Reference
| Script | Purpose | Usage |
|---|---|---|
| scripts/ingest_semantic.py | Build index with semantic chunking + graph | python scripts/ingest_semantic.py --doc path/to/file.pdf |
| scripts/agent_workflow.py | Event-driven agent template | python scripts/agent_workflow.py |
| scripts/requirements.txt | Pinned dependencies | pip install -r scripts/requirements.txt |
Adapt scripts by modifying configuration variables at the top of each file.
Reference Index
Load references based on task:
| Task | Load Reference |
|---|---|
| Configure chunking strategy | references/ingestion.md |
| Add metadata extractors | references/ingestion.md |
| Build knowledge graph | references/property-graphs.md |
| Choose graph store (Neo4j, etc.) | references/property-graphs.md |
| Implement query routing | references/context-rag.md |
| Decompose complex queries | references/context-rag.md |
| Add reranking | references/context-rag.md |
| Build ReAct agent | references/orchestration.md |
| Create Workflow | references/orchestration.md |
| Multi-agent system | references/orchestration.md |
| Setup Phoenix tracing | references/observability.md |
| Debug retrieval failures | references/observability.md |
| Evaluate agent quality | references/observability.md |
Troubleshooting
Agent says "I don't know" despite relevant indexed data
Diagnose:
# Open Phoenix UI at http://localhost:6006
# Navigate to Traces → Select query → Retrieval span → Retrieved Nodes
Fix:
# 1. Increase retrieval candidates
query_engine = index.as_query_engine(similarity_top_k=10) # was 5
# 2. Add reranking to improve precision
from llama_index.core.postprocessor import LLMRerank
query_engine = index.as_query_engine(
similarity_top_k=10,
node_postprocessors=[LLMRerank(top_n=3)]
)
Verify: Re-run query, check Phoenix shows improved relevance scores (>0.7).
Semantic chunking too slow
Diagnose:
# Time the ingestion
import time
start = time.time()
nodes = splitter.get_nodes_from_documents(docs)
print(f"Chunking took {time.time() - start:.1f}s for {len(docs)} docs")
Fix:
# Option 1: Use local embeddings (no API calls)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
# Option 2: Hybrid strategy for large corpora
bulk_nodes = SentenceSplitter().get_nodes_from_documents(bulk_docs)
critical_nodes = SemanticSplitterNodeParser(...).get_nodes_from_documents(critical_docs)
Verify: Re-run with show_progress=True, confirm <1s per document.
Graph extraction producing noise
Diagnose:
# Check extracted triples
for node in index.property_graph_store.get_triplets():
print(node) # Look for irrelevant or duplicate relationships
Fix:
# Option 1: Reduce paths per chunk
SimpleLLMPathExtractor(max_paths_per_chunk=5) # was 10
# Option 2: Use strict schema
SchemaLLMPathExtractor(
possible_entities=["PERSON", "COMPANY"],
possible_relations=["WORKS_AT", "FOUNDED"],
strict=True
)
Verify: Re-index, confirm triplet count reduced and relationships are relevant.
Workflow step not triggering
Diagnose:
# Enable verbose mode
agent = MyWorkflow(timeout=60, verbose=True)
result = await agent.run(query="test")
# Check console for: [Step Name] Received event: EventType
Fix:
# Verify type hints match exactly
class MyEvent(Event):
    query: str

@step
async def my_step(self, ev: MyEvent) -> StopEvent:  # Type hint must be MyEvent
    ...
Verify: Verbose output shows [my_step] Received event: MyEvent.
Phoenix not showing traces
Diagnose:
import phoenix as px
session = px.launch_app()
print(f"Phoenix URL: {session.url}") # Should print http://localhost:6006
Fix:
# MUST call BEFORE any LlamaIndex imports/operations
import phoenix as px
px.launch_app()
import llama_index.core
llama_index.core.set_global_handler("arize_phoenix")
# Now import and use LlamaIndex
from llama_index.core import VectorStoreIndex
Verify: Make a query, refresh Phoenix UI, trace appears within 5 seconds.
When Not to Use This Skill
This skill is specific to LlamaIndex in Python. Do not use for:
- LangChain projects — Different framework, different APIs
- Pure vector search without agents — Simpler solutions exist
- Non-Python environments — All examples are Python 3.9+
- Local-only / offline setups — Scripts default to OpenAI APIs; modification required for local models
- Simple Q&A bots — Overkill if you don't need graphs, routing, or workflows
If unsure: Check if your use case involves semantic chunking, knowledge graphs, query routing, or multi-step agents. If yes, this skill applies.
Glossary
| Term | Definition |
|---|---|
| Node | Chunk of text with metadata, the atomic unit of retrieval |
| PropertyGraphIndex | Index combining vector embeddings with labeled property graph |
| Extractor | Component that generates graph triples from text |
| Retriever | Component that fetches relevant nodes/context |
| Postprocessor | Filters or reranks nodes after retrieval |
| Workflow | Event-driven state machine for agent orchestration |
| Span | Duration-tracked operation in observability |