Research Paper · February 2026

NoVectDB

A Topological–Relational Paradigm for Post-Vector Data Management — CookiX Reference Implementation on Dynamic Graph Manifolds with Sheaf-Theoretic Composition

Ahmed Hafdi  ·  Independent Research, CookiX Project  ·  February 2026
Abstract

Vector databases suffer from three fundamental limitations rooted in the geometry of flat Euclidean space: semantic gap (distance ≠ meaning), precision collapse (concentration of measure in high dimensions), and opacity (no interpretable retrieval path).

We introduce NoVectDB (Not Only Vector Database), a formal paradigm that augments — and, where appropriate, replaces — vector similarity with typed relational edges, persistent homology signatures, and sheaf-theoretic composition rules. The reference engine CookiX demonstrates a 2.4× precision improvement on relational retrieval tasks over leading vector databases while maintaining sub-linear query scaling.

§ 1

Why Vector Databases Are Not Enough

The dominant pattern for grounding large language models in external knowledge is Retrieval-Augmented Generation (RAG), which relies almost exclusively on vector databases: systems that embed text into \(\mathbb{R}^n\) and retrieve neighbours by Euclidean or cosine distance. Despite widespread adoption, this approach has three well-documented failure modes.

1. Concentration of Measure

As the embedding dimension \(n \to \infty\), all pairwise distances converge to \(\sqrt{2}\) in probability:

\[\Pr\!\left[\left|\|X_i - X_j\| - \sqrt{2}\right| > \varepsilon\right] \leq 2\exp\!\left(-\frac{cn\varepsilon^2}{4}\right)\]

The ratio \(d_{\max}/d_{\min} \to 1\). Every document is equally distant from every other — precision collapses.
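This collapse is easy to reproduce empirically. The sketch below (ours, not part of the CookiX artifact) samples random points on the unit sphere and compares the spread \(d_{\max}/d_{\min}\) of pairwise distances in low and high dimensions:

```python
import numpy as np

def distance_spread(n_dims, n_points=200, seed=0):
    """Ratio d_max / d_min over all pairwise distances of random unit vectors."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_points, n_dims))
    x /= np.linalg.norm(x, axis=1, keepdims=True)   # project onto unit sphere
    g = x @ x.T                                     # Gram matrix of unit vectors
    d2 = np.clip(2.0 - 2.0 * g, 0.0, None)          # squared Euclidean distances
    d = np.sqrt(d2[np.triu_indices(n_points, 1)])   # upper triangle, i < j
    return d.max() / d.min()

print(round(distance_spread(3), 2))      # low dimension: wide spread
print(round(distance_spread(4096), 2))   # high dimension: ratio near 1
```

In three dimensions the ratio is large; at 4096 dimensions every distance sits in a narrow band around \(\sqrt{2}\), which is exactly the regime in which nearest-neighbour ranking stops being informative.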

[Figure: low-dimensional points remain spread out; in high dimensions they collapse — all pairwise distances ≈ √2]
2. Semantic Gap

For any non-symmetric, non-transitive relation \(\rho\), a semantic gap exists: no single metric can faithfully represent it. A metric space is always symmetric (\(d(a,b)=d(b,a)\)) and obeys the triangle inequality — but knowledge relations are not.

The verb "prevents" is directed and cannot be encoded by any inner product or \(\ell_p\) norm.

[Figure: in vector space, umbrella and coat sit at cosine distances 0.84 ≈ 0.81 — distance ≠ relation; in the knowledge graph, umbrella →[prevents] rain]
3. Retrieval Opacity

Vector retrieval returns a scalar distance. There is no path, no justification, no typed connection. When a RAG system answers incorrectly, diagnosing why a particular chunk was retrieved is intractable — one can only inspect floating-point inner products.

VECTOR RESPONSE
  score: 0.872
  chunk_id: 4f2a…
  // why? unknown

NOVECTDB RESPONSE
  umbrella
    →[prevents] rain
    →[wets] coat
  path_score: 0.97
§ 2

The Umbrella Problem

Five concepts: raincoat, rain, water, umbrella, storm. A sentence-transformer maps them to nearby points in \(\mathbb{R}^n\) — all with similar cosine distance to the query "What prevents the coat from getting wet?"

The correct answer — umbrella — requires understanding a directed causal chain. No inner product can encode the verb prevents.

[Figure: concept graph over umbrella, rain, raincoat, water, and storm with typed edges prevents, wets, is_a, causes. Query: "What prevents the coat from getting wet?" → answer: umbrella (via typed edge traversal)]

The black node (umbrella) is identified by deterministic typed-edge traversal — not proximity. Vector retrieval cannot distinguish this causal chain from irrelevant neighbours with similar embeddings.
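The traversal itself is trivial to state in code. A minimal sketch over a toy triple store — `what_prevents_wetting` is our illustrative helper, not a CookiX API:

```python
# Deterministic typed-edge traversal for the Umbrella Problem.
# Triples are (subject, relation, object); relations are directed.
edges = [
    ("umbrella", "prevents", "rain"),
    ("rain", "wets", "coat"),
    ("raincoat", "is_a", "coat"),
    ("storm", "causes", "rain"),
]

def what_prevents_wetting(target):
    # Step 1: find what wets the target (follow 'wets' edges backwards).
    wetters = [s for (s, r, o) in edges if r == "wets" and o == target]
    # Step 2: find what prevents that wetter (follow 'prevents' edges backwards).
    return [s for (s, r, o) in edges for w in wetters
            if r == "prevents" and o == w]

print(what_prevents_wetting("coat"))  # ['umbrella']
```

No inner product is consulted anywhere: the answer falls out of two deterministic edge lookups, and the path `umbrella →[prevents] rain →[wets] coat` is the justification.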

§ 3

The Knowledge Object

The atomic unit of NoVectDB storage is the Knowledge Object — a quadruple that captures every dimension of meaning:

Definition 3.1
\[\mathcal{K} = (V,\, E,\, T,\, S)\]
V — Embedding Vector
An optional vector \(V \in \mathbb{R}^n\) — a legacy embedding or content signature. Because the vector component is optional, NoVectDB is strictly more general than vector databases; pure topological–relational storage is a valid configuration.

E — Relational Edges
A set \(E = \{(r_i, \mathcal{K}_i, w_i)\}\) of typed, directed, weighted edges. Relation types \(r_i \in \mathcal{R}\) come from a controlled vocabulary: causes, is_a, part_of, prevents, contradicts, example_of.

T — Topological Signature
A vector \(T \in \mathbb{R}^m\) derived from the persistent homology of the local neighbourhood. It captures the shape of how this concept relates to its context and is stable under small graph perturbations.

S — Sheaf Section
A functor \(S: \mathrm{Star}(\mathcal{K}) \to \mathbf{Vect}\) assigning a linear transformation to each edge — defining how this object's meaning transforms in the context of adjacent objects. This enables compositional, globally consistent retrieval.
§ 4

The NoVectDB Composite Distance

For two Knowledge Objects \(\mathcal{K}_a, \mathcal{K}_b\) in the Dynamic Graph Manifold \(\mathcal{M}\), the composite distance blends three orthogonal signals:

Definition 3.4 — NoVectDB Composite Distance
\[d_{\text{NoVectDB}}(\mathcal{K}_a, \mathcal{K}_b) = \alpha \cdot d_{\text{geo}}(a,b) + \beta \cdot \bigl(1 - \mathrm{TVS}(T_a, T_b)\bigr) + \gamma \cdot \|S_a \circ_\pi S_b\|\]

The three components

\(\alpha \cdot d_{\text{geo}}(a,b)\) — Graph Geodesic
Minimum-weight path in the typed relation graph \(G\). Captures structural proximity in the knowledge network.
\(\beta \cdot (1 - \mathrm{TVS})\) — Topological Similarity
\(\mathrm{TVS}(T_a,T_b) = \exp(-\lambda \cdot W_2(\mathrm{Dgm}_a, \mathrm{Dgm}_b))\) — the 2-Wasserstein distance between persistence diagrams. Captures shape similarity of local neighbourhoods.
\(\gamma \cdot \|S_a \circ_\pi S_b\|\) — Sheaf Residual
Measures how consistently the objects' contexts compose along path \(\pi\). Low residual = highly compatible semantics.
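The composite distance is straightforward once the three signals are available. A sketch assuming the geodesic distance, the Wasserstein distance \(W_2\), and the sheaf residual are precomputed (a full implementation would derive them from the graph, the persistence diagrams, and the sheaf):

```python
import math

def tvs(w2, lam=1.0):
    # Topological similarity: TVS = exp(-lambda * W2(Dgm_a, Dgm_b)).
    # A real implementation computes W2 between persistence diagrams;
    # here it is taken as a precomputed input to keep the sketch small.
    return math.exp(-lam * w2)

def d_novectdb(d_geo, w2, sheaf_residual, alpha=0.40, beta=0.35, gamma=0.25):
    """Definition 3.4: blend of geodesic, topological, and sheaf signals.
    Defaults match the paper's mixing coefficients; alpha+beta+gamma = 1."""
    assert abs(alpha + beta + gamma - 1.0) < 1e-9
    return alpha * d_geo + beta * (1.0 - tvs(w2)) + gamma * sheaf_residual

# Identical objects: zero geodesic distance, identical diagrams (W2 = 0),
# zero composition residual -> distance 0, as Theorem 3.5 requires.
print(d_novectdb(0.0, 0.0, 0.0))  # 0.0
```

Each term is bounded below by zero, so \(d \geq 0\) holds by construction, and \(d(\mathcal{K}_a,\mathcal{K}_a)=0\) follows from all three signals vanishing on identical objects.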

The mixing coefficients must satisfy \(\alpha + \beta + \gamma = 1\); the reference configuration uses \(\alpha = 0.40\) (geo), \(\beta = 0.35\) (topo), \(\gamma = 0.25\) (sheaf).
Theorem 3.5 — Quasi-metric

(\(\mathcal{K},\, d_{\text{NoVectDB}}\)) is a quasi-metric space: \(d(K_a,K_a)=0\), \(d(K_a,K_b)\geq 0\), and the triangle inequality holds up to a bounded sheaf consistency error \(\varepsilon\). When the sheaf is globally consistent, \(\varepsilon = 0\) and the space is a true metric space.

§ 5

Persistent Homology Signatures

Persistent homology studies the shape of data across multiple scales. Given a filtration of simplicial complexes built from the local neighbourhood graph, it tracks which topological features (connected components, loops, voids) are born and die as the scale parameter \(\varepsilon\) grows.

Definition 4.1 — Topological Signature
\[T(\mathcal{K}) = \mathrm{Vectorise}\!\bigl(\mathrm{Barcode}(\mathrm{VR}(N_r(\mathcal{K})))\bigr) \in \mathbb{R}^m\]

where \(\mathrm{VR}\) is the Vietoris–Rips complex on the \(r\)-hop neighbourhood, and \(\mathrm{Vectorise}\) maps persistence barcodes to stable vectors via persistence landscapes.
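For intuition, the \(H_0\) part of a barcode can be computed with nothing more than union-find over an edge filtration. This is a toy sketch standing in for real PH engines such as GUDHI or Ripser, which also compute \(H_1\) and higher:

```python
# Toy H0 persistence: track when connected components die as the filtration
# parameter grows. Process edges in order of weight; each union kills one
# component (all components are born at 0 for a vertex set born together).
def h0_barcode(n_vertices, edges):
    """edges: list of (weight, u, v). Returns (birth, death) pairs for H0."""
    parent = list(range(n_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    bars = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            bars.append((0.0, w))           # a component dies at scale w
    bars.append((0.0, float("inf")))        # one component persists forever
    return bars

# Path graph 0-1-2: two merges, plus the immortal component.
print(h0_barcode(3, [(0.5, 0, 1), (0.8, 1, 2)]))
```

The long (infinite) bar is the single surviving component; short bars are components absorbed early, exactly the "long bars = significant topology" reading of the barcode.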

Filtration

As \(\varepsilon\) grows, edges form between nearby nodes. Loops (1-cycles) appear and disappear — each feature's persistence pair \((\text{birth}, \text{death})\) encodes topological structure.

Persistence Barcode

Each bar represents a topological feature: its left endpoint marks when the feature appears, its right endpoint when it disappears. Long bars = significant topology.

[Figure: barcode with H₀ bars (connected components) above H₁ bars (1-cycles / loops)]
Proposition 4.2 — Stability
\[\|T(\mathcal{K};\, G) - T(\mathcal{K};\, G')\|_\infty \leq C \cdot w_{\max}\]

Adding a single edge of weight \(w_{\max}\) changes the signature by at most \(C \cdot w_{\max}\). The topological signature is provably stable under small graph perturbations.

§ 6

Sheaf-Theoretic Composition

A cellular sheaf on the knowledge graph assigns a vector space (a stalk) to each Knowledge Object and a linear map to each relation edge. These maps encode how meaning transforms as you traverse a relation.

Definition 5.1 — Cellular Sheaf
\[\mathcal{F}(v) \cong \mathbb{R}^{d_v}\quad\text{(stalk at vertex } v\text{)}\] \[\mathcal{F}_{e \trianglerighteq v}: \mathcal{F}(v) \to \mathcal{F}(e)\quad\text{(restriction map on edge } e\text{)}\]

Local-to-Global Consistency

The sheaf Laplacian \(\mathcal{L}_\mathcal{F} = B_\mathcal{F}^\top B_\mathcal{F}\) captures whether local sections agree globally. Its kernel encodes globally consistent interpretations.

[Figure: stalks F(umbrella), F(rain), F(coat) ≅ ℝ³ connected by the restriction maps F_prevents and F_wets; a global section x ∈ ⊕F(v) satisfies 𝓛_F · x = 0 ⟺ globally consistent]
Theorem 5.3 — Global Consistency
\[\mathcal{L}_\mathcal{F} \cdot x = 0 \iff \text{local sections agree on all overlapping edges}\]

\(\dim \ker(\mathcal{L}_\mathcal{F})\) equals the number of independent globally consistent interpretations. This gives NoVectDB a principled answer to the question: "Do these facts agree?"
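A small numerical check of Theorem 5.3, on the umbrella chain with 1-dimensional stalks and illustrative scalar restriction maps (all values are ours, chosen for the sketch):

```python
import numpy as np

# Chain: umbrella ->[prevents] rain ->[wets] coat, stalks R^1 at each node.
# For edge e = (u, v), the coboundary row is F_{e<=u} x_u - F_{e<=v} x_v;
# the sheaf Laplacian is L = B^T B, and ker(L) holds the globally
# consistent sections.
edges = [
    (0, 1, 2.0, 1.0),   # umbrella -> rain: F_{e<=u} = 2, F_{e<=v} = 1
    (1, 2, 1.0, 0.5),   # rain -> coat:     F_{e<=u} = 1, F_{e<=v} = 0.5
]
n = 3
B = np.zeros((len(edges), n))
for row, (u, v, fu, fv) in enumerate(edges):
    B[row, u], B[row, v] = fu, -fv

L = B.T @ B
kernel_dim = n - np.linalg.matrix_rank(L)
print(kernel_dim)  # 1: exactly one independent consistent interpretation

# A consistent section: x_rain = 2 * x_umbrella, x_coat = 2 * x_rain.
x = np.array([1.0, 2.0, 4.0])
print(bool(np.allclose(L @ x, 0)))  # True
```

Here \(\dim \ker(\mathcal{L}_\mathcal{F}) = 1\): the chain admits exactly one globally consistent interpretation up to scale, and any section violating a restriction map lands outside the kernel.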

Compositional Retrieval — Equation 2
\[S_\pi = \mathcal{F}_{e_k} \circ \cdots \circ \mathcal{F}_{e_1}\] \[\|S_a \circ_\pi S_b\| = \|S_\pi(x_a) - x_b\|_2\]

The composition residual measures how well \(\mathcal{K}_a\)'s semantics arrive at \(\mathcal{K}_b\) via path \(\pi\). Small residual = high semantic compatibility along the relational chain.
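Equation 2 in code, with illustrative 2×2 restriction maps and sections (every matrix and vector here is an assumption made for the sketch, not data from the paper):

```python
import numpy as np

# Compose restriction maps along the path pi = (e1, e2) and measure how
# far K_a's section lands from K_b's section.
F_e1 = np.array([[0.0, 1.0],
                 [1.0, 0.0]])      # restriction map on edge e1 (a swap)
F_e2 = np.array([[1.0, 0.0],
                 [0.0, 1.0]])      # restriction map on edge e2 (identity)

S_pi = F_e2 @ F_e1                  # S_pi = F_{e2} o F_{e1}

x_a = np.array([1.0, 0.0])          # section at K_a
x_b = np.array([0.0, 1.0])          # section at K_b

residual = np.linalg.norm(S_pi @ x_a - x_b)   # ||S_pi(x_a) - x_b||_2
print(residual)  # 0.0 -> the two contexts compose perfectly along pi
```

Swapping `x_b` for an incompatible section yields a strictly positive residual, which is exactly the \(\gamma\)-weighted penalty in the composite distance of Definition 3.4.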

§ 7

Query Pipeline

NoVectDB retrieval is a deterministic multi-stage process, summarised in Algorithm 1.

Algorithm 1 — NoVectDB Query Pipeline
Require: Query q, graph G, sheaf F, parameters k, α, β, γ
1: I ← IntentParse(q) ▷ LLM slot extraction
2: S₀ ← DeterministicLookup(G, I) ▷ exact edge match
3: if |S₀| ≥ k then
4:     return RankBySheaf(S₀, F, q)
5: end if
6: S₁ ← GeodesicBFS(G, I.anchor, h_max) ▷ type-filtered BFS
7: S₂ ← TopoExpand(S₁, T, β) ▷ topological neighbourhood
8: S₃ ← SheafCompose(S₂, F, γ) ▷ composition residual
9: return Rank(S₃, d_NoVectDB)[:k]
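Step 6 is the workhorse of the fallback path. A runnable sketch of a type-filtered BFS, assuming a simple adjacency shape `{node: [(relation, neighbour)]}` (our representation, not the Manifold Store's):

```python
from collections import deque

def geodesic_bfs(G, anchor, allowed_relations, h_max):
    """Type-filtered BFS from the intent anchor, up to h_max hops.
    Returns {node: hop distance} -- the graph geodesic with unit weights."""
    seen = {anchor: 0}
    queue = deque([anchor])
    while queue:
        node = queue.popleft()
        if seen[node] == h_max:
            continue                       # do not expand past the horizon
        for rel, nbr in G.get(node, []):
            # Only traverse edges whose relation type the intent allows.
            if rel in allowed_relations and nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    return seen

G = {
    "umbrella": [("prevents", "rain")],
    "rain":     [("wets", "coat")],
    "coat":     [],
}
print(geodesic_bfs(G, "umbrella", {"prevents", "wets"}, h_max=2))
```

Restricting `allowed_relations` to `{"prevents"}` stops the search at `rain`, which is how the intent parser's slot types prune the candidate set before the topological and sheaf stages run.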

Query Complexity

\[O\!\left(|V_{\text{local}}| + |E_{\text{local}}|\log|E_{\text{local}}| + l^3\right)\]

where \(|V_{\text{local}}|, |E_{\text{local}}|\) are the sizes of the explored subgraph and \(l\) is the landmark count for PH computation (typically \(l=50\), costing only \(\sim 1.25 \times 10^5\) operations).

§ 8

Preliminary Results

Evaluated on 500 relational queries across three task classes over a technical document corpus (industrial pipe specifications, medical ontologies, legal case chains).

Precision@5 by System

CookiX (α=0.4): 0.830 · GraphRAG: 0.583 · Pinecone: 0.457 · Chroma: 0.437

Precision@5 by Task

System   | Task A: Single-hop | Task B: Multi-hop | Task C: Contradiction | Avg
Chroma   | 0.72               | 0.31              | 0.28                  | 0.437
Pinecone | 0.74               | 0.33              | 0.30                  | 0.457
GraphRAG | 0.78               | 0.52              | 0.45                  | 0.583
CookiX   | 0.91               | 0.82              | 0.76                  | 0.830

CookiX: 2.4× improvement on Task B · 2.6× on Task C vs vector-only baselines

Query Latency (100K objects)

System   | p₅₀ (ms) | p₉₉ (ms)
Chroma   | 12       | 45
Pinecone | 8        | 32
CookiX   | 18       | 67

CookiX trades modest latency overhead (PH computation) for substantially higher precision. Graph traversal itself adds negligible cost.

§ 9

CookiX Architecture

CookiX is the reference implementation of NoVectDB — a document-oriented topological database analogous to how MongoDB realised the NoSQL paradigm. Five core subsystems:

1
Ingestor
Accepts raw text, pre-computed embeddings, or structured documents. Uses a small LLM (3B-parameter instruction model) for relation extraction — identifying typed edges. Simultaneously computes persistent homology via Landmark Vietoris–Rips complex in \(O(l^3)\) time.
2
Manifold Store
Persistent storage for the graph \(G\), sheaf sections \(\mathcal{F}\), and metadata. Memory-mapped adjacency list (inspired by sled/redb in Rust) with type-indexed edge lookups in \(O(1)\) amortised time.
3
TopoIndex
Approximate nearest-neighbour index over topological signatures \(T\). Adapts the HNSW algorithm to operate on persistence diagram distances (Wasserstein) rather than Euclidean distances.
4
Query Engine
Implements Algorithm 1: IntentParse → DeterministicLookup → GeodesicBFS → TopoExpand → SheafCompose → Rank. Deterministic typed-edge traversal at its core.
5
API / SDK (Python + Rust)
MongoDB-like document interface:
db = cookix.connect("mydb")
db.insert({"text": "...", "edges": [...]})
results = db.query(
    "Is A compatible with B?",
    k=5, mode="reasoning"
)
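The Manifold Store's O(1) amortised type-indexed edge lookup can be sketched with two levels of hashing — a minimal in-memory illustration, not the memory-mapped implementation:

```python
from collections import defaultdict

class ManifoldStore:
    """Toy type-indexed adjacency: edges[u][relation] -> [(target, weight)]."""
    def __init__(self):
        self.edges = defaultdict(lambda: defaultdict(list))

    def insert(self, u, relation, v, weight=1.0):
        self.edges[u][relation].append((v, weight))

    def lookup(self, u, relation):
        # Two dict hops: O(1) amortised per (node, relation type) query.
        return self.edges[u][relation]

store = ManifoldStore()
store.insert("umbrella", "prevents", "rain", 0.9)
print(store.lookup("umbrella", "prevents"))  # [('rain', 0.9)]
```

Indexing by `(node, relation)` rather than by node alone is what makes the DeterministicLookup stage of Algorithm 1 a constant-time probe instead of a scan over all outgoing edges.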
§ 10

Paradigm Comparison

How NoVectDB positions against the landscape of existing retrieval paradigms:

Property              | VectorDB | GraphDB | GraphRAG | NoVectDB
Typed relations       | ×        | ✓       | partial  | ✓
Topological signature | ×        | ×       | ×        | ✓
Sheaf composition     | ×        | ×       | ×        | ✓
Interpretable path    | ×        | ✓       | partial  | ✓
Precision collapse    | yes      | N/A     | yes      | immune
Multi-hop reasoning   | weak     | strong  | medium   | strong
Sub-linear ANN        | ✓        | ×       | ✓        | ✓
Document-oriented     | ✓        | ×       | ×        | ✓
Key distinction from GraphRAG: Microsoft's GraphRAG builds a graph from LLM-extracted entities but still uses vector similarity to traverse it. NoVectDB eliminates vector similarity at query time entirely — traversal is via deterministic typed edges and topological signatures.
§ 11

Conclusion

NoVectDB presents a mathematically grounded paradigm for post-vector data management. By combining typed relational edges, persistent homology signatures, and sheaf-theoretic composition, it overcomes the fundamental limitations of flat metric spaces:

  • Semantic gap — resolved by typed directed relations
  • Precision collapse — immune; distance defined on discrete manifold
  • Retrieval opacity — full interpretable path at every query

Just as NoSQL spawned MongoDB, Cassandra, and Neo4j, NoVectDB could inspire specialised engines for legal reasoning, medical ontologies, and engineering knowledge bases. CookiX is the first, not the last.

"Stop measuring distances.
Start understanding adjacency."

Ahmed Hafdi · NoVectDB / CookiX · February 2026