RAG Architecture Deep Dive

Vector Database vs Vectorless RAG

Traditional RAG systems lean on vector similarity to find relevant context. A new approach — Vectorless RAG with PageIndex — replaces embeddings with tree-structured reasoning. Here's what every architect needs to know.

📅 March 2026 ⏱ 5 min read 🏷️ RAG · LLM · Enterprise Architecture

Traditional RAG with Vector Databases

Retrieval-Augmented Generation has become the backbone of modern AI systems that answer questions using external knowledge. The traditional approach relies on vector databases and embedding-based similarity search — converting documents into high-dimensional numerical representations and finding the closest matches to a user's query.

Vector RAG Pipeline
Documents → Chunking → Embedding Model → Vector DB → Similarity Search → Top-K Chunks → LLM

How It Works

Large documents are split into smaller chunks. Each chunk is converted into a vector using embedding models (such as OpenAI embeddings, Sentence Transformers, BGE, or E5). These vectors are stored in databases like Pinecone, FAISS, Weaviate, Milvus, or Chroma, which support Approximate Nearest Neighbor (ANN) search.
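The chunking step can be sketched in a few lines. This is a minimal fixed-size character splitter with overlap (a common default); production pipelines usually split on sentence or section boundaries instead, and the sizes below are illustrative:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size character chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "A" * 500
chunks = chunk_text(doc)
print(len(chunks))  # → 4 chunks of up to 200 chars, stepping by 150
```

The overlap exists so that a sentence cut at a chunk boundary still appears whole in at least one chunk, which is exactly the context-loss problem discussed below.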

When a user asks a question, the query is also embedded into a vector. The system then compares it against stored vectors using cosine similarity, dot product, or Euclidean distance — retrieving the top-K most similar chunks and passing them as context to the LLM for answer generation.
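The scoring step above reduces to a dot product over normalized vectors. Here is a plain-NumPy sketch; the 3-dimensional vectors are toy stand-ins for real embedding-model output, which typically has hundreds or thousands of dimensions:

```python
import numpy as np

def top_k_cosine(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 2) -> list[int]:
    """Return indices of the k stored chunks most similar to the query
    by cosine similarity (exact search; vector DBs approximate this with ANN)."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                      # cosine similarity per stored chunk
    return np.argsort(scores)[::-1][:k].tolist()

# Toy "embeddings" for three chunks.
docs = np.array([[1.0, 0.0, 0.0],
                 [0.9, 0.1, 0.0],
                 [0.0, 1.0, 0.0]])
query = np.array([1.0, 0.05, 0.0])
print(top_k_cosine(query, docs))  # → [0, 1]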

Limitations of Vector DB RAG

Despite its power, vector-based retrieval has known shortcomings:

  • Chunking breaks context
  • Similarity ≠ relevance
  • Expensive embeddings
  • Infrastructure complexity
  • Black-box retrieval

For long, structured documents like financial reports or legal manuals, vector similarity can return text that is semantically close but not logically relevant to the query. This is where the vectorless approach steps in.

Vectorless RAG with PageIndex

PageIndex introduces a fundamentally different philosophy: instead of embedding text into vector space, it builds a hierarchical tree-structured index of the document — much like a table of contents. Retrieval is then performed via LLM-guided reasoning over that tree, rather than numeric similarity matching.

Vectorless RAG Pipeline
Documents → Document Parsing → Tree Structure Index → LLM Reasoning → Relevant Nodes → LLM

Tree Index Structure

📄 Document
 ├── Section 1: Introduction
 │   ├── Subsection 1.1 — Background
 │   └── Subsection 1.2 — Scope
 ├── Section 2: Financials
 │   ├── Quarterly Results
 │   │   └── ▸ Q3 Revenue ← target
 │   └── Annual Summary
 └── Section 3: Conclusion

Each node stores a section title, summary, content, and metadata. When a user asks "What is the Q3 revenue?", the LLM navigates the tree logically — Document → Financials → Quarterly Results → Q3 Revenue — mimicking how a human analyst would read. Because retrieval follows the document structure, the process is fully traceable and explainable.
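The traversal above can be sketched as a recursive descent over such a tree. PageIndex's actual API is not shown here; this is a conceptual sketch in which a simple keyword-overlap test stands in for the LLM's relevance judgment at each node (an assumption made purely so the example runs offline):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    title: str
    summary: str = ""
    children: list["Node"] = field(default_factory=list)

def navigate(node: Node, is_relevant, path=None) -> list[str]:
    """Descend into the first child judged relevant at each level and
    return the traversal path. In a real system the judgment at each
    step comes from an LLM call over the node's title and summary."""
    path = (path or []) + [node.title]
    for child in node.children:
        if is_relevant(child):
            return navigate(child, is_relevant, path)
    return path

doc = Node("Document", children=[
    Node("Introduction"),
    Node("Financials", summary="revenue results", children=[
        Node("Quarterly Results", summary="Q3 revenue", children=[
            Node("Q3 Revenue", summary="Q3 revenue figures")]),
        Node("Annual Summary")]),
    Node("Conclusion")])

query = "q3 revenue"
# Keyword overlap as a stand-in for LLM reasoning (assumption, not PageIndex's API).
relevant = lambda n: any(w in (n.title + " " + n.summary).lower() for w in query.split())
print(navigate(doc, relevant))  # → ['Document', 'Financials', 'Quarterly Results', 'Q3 Revenue']
```

The returned path is itself the explanation of why a passage was retrieved, which is what makes this style of retrieval traceable.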

When to Use Each Approach

Vector DB RAG

Best for breadth
  • Searching across many unrelated documents
  • Semantic similarity is sufficient
  • Real-time search on large datasets
  • Customer support, chatbots, product search

Vectorless RAG

Best for depth
  • Long, structured documents
  • Logical reasoning is required
  • High accuracy & traceability are critical
  • Financial filings, legal docs, research papers
Dimension      | Vector DB RAG                | Vectorless RAG
Indexing       | Embedding vectors            | Hierarchical tree
Search         | Cosine / ANN similarity      | LLM-guided tree traversal
Context        | Chunks (may lose context)    | Full sections (preserves structure)
Explainability | Low (black-box scores)       | High (traceable path)
Best fit       | Broad, multi-doc search      | Deep, structured documents
Infra cost     | Vector DB + embedding models | LLM reasoning calls

Conclusion

Vector databases revolutionised RAG by enabling semantic search at scale. But they aren't the right tool for every job. Vectorless approaches like PageIndex flip the paradigm: instead of retrieving text that is similar, they retrieve text that is logically relevant by navigating document structure with reasoning.

The future of RAG is hybrid. As these architectures mature, the most robust systems will likely combine vector similarity for broad discovery with tree-based reasoning for precision — giving you the best of both worlds.
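Such a hybrid could start with something as simple as a routing heuristic; the function name, inputs, and thresholds below are illustrative assumptions, not an established API:

```python
def choose_retriever(num_docs: int, doc_is_structured: bool, needs_trace: bool) -> str:
    """Illustrative routing heuristic: vector search for broad discovery
    across large, heterogeneous corpora; tree-based reasoning for deep,
    structured documents where traceability matters."""
    if doc_is_structured and (needs_trace or num_docs <= 10):
        return "vectorless_tree"
    return "vector_db"

print(choose_retriever(num_docs=5000, doc_is_structured=False, needs_trace=False))  # → vector_db
print(choose_retriever(num_docs=3, doc_is_structured=True, needs_trace=True))       # → vectorless_tree
```

A more mature hybrid would run both retrievers and let a reranking step, or the LLM itself, merge the candidates, but the routing decision above captures the breadth-versus-depth trade-off the table summarises.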