Research Paper · February 2026

NoVectDB

A Topological–Relational Paradigm for Post-Vector Data Management — CookiX Reference Implementation on Dynamic Graph Manifolds with Sheaf-Theoretic Composition

Ahmed Hafdi  ·  Independent Research, CookiX Project  ·  February 2026
Abstract

Vector databases suffer from three fundamental limitations rooted in the geometry of flat Euclidean space: semantic gap (distance ≠ meaning), precision collapse (concentration of measure in high dimensions), and opacity (no interpretable retrieval path).

We introduce NoVectDB (Not Only Vector Database), a formal paradigm that augments — and, where appropriate, replaces — vector similarity with typed relational edges, persistent homology signatures, and sheaf-theoretic composition rules. The reference engine CookiX demonstrates a 2.4× precision improvement on relational retrieval tasks over leading vector databases while maintaining sub-linear query scaling.

§ 1

Why Vector Databases Are Not Enough

The dominant pattern for grounding large language models in external knowledge is Retrieval-Augmented Generation (RAG), which relies almost exclusively on vector databases: systems that embed text into \(\mathbb{R}^n\) and retrieve neighbours by Euclidean or cosine distance. Despite widespread adoption, this approach has three well-documented failure modes.

1. Concentration of Measure

As the embedding dimension \(n \to \infty\), all pairwise distances converge to \(\sqrt{2}\) in probability:

\[\Pr\!\left[\left|\|X_i - X_j\| - \sqrt{2}\right| > \varepsilon\right] \leq 2\exp\!\left(-\frac{cn\varepsilon^2}{4}\right)\]

The ratio \(d_{\max}/d_{\min} \to 1\). Every document is equally distant from every other — precision collapses.
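This collapse is easy to reproduce empirically. The sketch below (ours, not part of the CookiX artifact) samples random points on the unit sphere and compares the spread \(d_{\max}/d_{\min}\) of pairwise distances in low and high dimensions:

```python
import numpy as np

def distance_spread(n_dims, n_points=200, seed=0):
    """Ratio d_max / d_min over all pairwise distances of random unit vectors."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_points, n_dims))
    x /= np.linalg.norm(x, axis=1, keepdims=True)   # project onto unit sphere
    g = x @ x.T                                     # Gram matrix of unit vectors
    d2 = np.clip(2.0 - 2.0 * g, 0.0, None)          # squared Euclidean distances
    d = np.sqrt(d2[np.triu_indices(n_points, 1)])   # upper triangle, i < j
    return d.max() / d.min()

print(round(distance_spread(3), 2))      # low dimension: wide spread
print(round(distance_spread(4096), 2))   # high dimension: ratio near 1
```

In three dimensions the ratio is large; at 4096 dimensions every distance sits in a narrow band around \(\sqrt{2}\), which is exactly the regime in which nearest-neighbour ranking stops being informative.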

[Figure: low-dimensional points remain spread out; in high dimensions they collapse — all pairwise distances ≈ √2]
2. Semantic Gap

For any non-symmetric, non-transitive relation \(\rho\), a semantic gap exists: no single metric can faithfully represent it. A metric space is always symmetric (\(d(a,b)=d(b,a)\)) and obeys the triangle inequality — but knowledge relations are not.

The verb "prevents" is directed and cannot be encoded by any inner product or \(\ell_p\) norm.

[Figure: in vector space, umbrella and coat sit at cosine distances 0.84 ≈ 0.81 — distance ≠ relation; in the knowledge graph, umbrella →[prevents] rain]
3. Retrieval Opacity

Vector retrieval returns a scalar distance. There is no path, no justification, no typed connection. When a RAG system answers incorrectly, diagnosing why a particular chunk was retrieved is intractable — one can only inspect floating-point inner products.

VECTOR RESPONSE
  score: 0.872
  chunk_id: 4f2a…
  // why? unknown

NOVECTDB RESPONSE
  umbrella
    →[prevents] rain
    →[wets] coat
  path_score: 0.97
§ 2

The Umbrella Problem

Five concepts: raincoat, rain, water, umbrella, storm. A sentence-transformer maps them to nearby points in \(\mathbb{R}^n\) — all with similar cosine distance to the query "What prevents the coat from getting wet?"

The correct answer — umbrella — requires understanding a directed causal chain. No inner product can encode the verb prevents.

[Figure: concept graph over umbrella, rain, raincoat, water, and storm with typed edges prevents, wets, is_a, causes. Query: "What prevents the coat from getting wet?" → answer: umbrella (via typed edge traversal)]

The black node (umbrella) is identified by deterministic typed-edge traversal — not proximity. Vector retrieval cannot distinguish this causal chain from irrelevant neighbours with similar embeddings.
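The traversal itself is trivial to state in code. A minimal sketch over a toy triple store — `what_prevents_wetting` is our illustrative helper, not a CookiX API:

```python
# Deterministic typed-edge traversal for the Umbrella Problem.
# Triples are (subject, relation, object); relations are directed.
edges = [
    ("umbrella", "prevents", "rain"),
    ("rain", "wets", "coat"),
    ("raincoat", "is_a", "coat"),
    ("storm", "causes", "rain"),
]

def what_prevents_wetting(target):
    # Step 1: find what wets the target (follow 'wets' edges backwards).
    wetters = [s for (s, r, o) in edges if r == "wets" and o == target]
    # Step 2: find what prevents that wetter (follow 'prevents' edges backwards).
    return [s for (s, r, o) in edges for w in wetters
            if r == "prevents" and o == w]

print(what_prevents_wetting("coat"))  # ['umbrella']
```

No inner product is consulted anywhere: the answer falls out of two deterministic edge lookups, and the path `umbrella →[prevents] rain →[wets] coat` is the justification.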

§ 3

The Knowledge Object

The atomic unit of NoVectDB storage is the Knowledge Object — a quadruple that captures every dimension of meaning:

Definition 3.1
\[\mathcal{K} = (V,\, E,\, T,\, S)\]
V — Embedding Vector
An optional vector \(V \in \mathbb{R}^n\) — a legacy embedding or content signature. Because the vector component is optional, NoVectDB is strictly more general than vector databases; pure topological–relational storage is a valid configuration.

E — Relational Edges
A set \(E = \{(r_i, \mathcal{K}_i, w_i)\}\) of typed, directed, weighted edges. Relation types \(r_i \in \mathcal{R}\) come from a controlled vocabulary: causes, is_a, part_of, prevents, contradicts, example_of.

T — Topological Signature
A vector \(T \in \mathbb{R}^m\) derived from the persistent homology of the local neighbourhood. It captures the shape of how this concept relates to its context and is stable under small graph perturbations.

S — Sheaf Section
A functor \(S: \mathrm{Star}(\mathcal{K}) \to \mathbf{Vect}\) assigning a linear transformation to each edge — defining how this object's meaning transforms in the context of adjacent objects. This enables compositional, globally consistent retrieval.
§ 4

The NoVectDB Composite Distance

For two Knowledge Objects \(\mathcal{K}_a, \mathcal{K}_b\) in the Dynamic Graph Manifold \(\mathcal{M}\), the composite distance blends three orthogonal signals:

Definition 3.4 — NoVectDB Composite Distance
\[d_{\text{NoVectDB}}(\mathcal{K}_a, \mathcal{K}_b) = \alpha \cdot d_{\text{geo}}(a,b) + \beta \cdot \bigl(1 - \mathrm{TVS}(T_a, T_b)\bigr) + \gamma \cdot \|S_a \circ_\pi S_b\|\]

The three components

\(\alpha \cdot d_{\text{geo}}(a,b)\) — Graph Geodesic
Minimum-weight path in the typed relation graph \(G\). Captures structural proximity in the knowledge network.
\(\beta \cdot (1 - \mathrm{TVS})\) — Topological Similarity
\(\mathrm{TVS}(T_a,T_b) = \exp(-\lambda \cdot W_2(\mathrm{Dgm}_a, \mathrm{Dgm}_b))\) — the 2-Wasserstein distance between persistence diagrams. Captures shape similarity of local neighbourhoods.
\(\gamma \cdot \|S_a \circ_\pi S_b\|\) — Sheaf Residual
Measures how consistently the objects' contexts compose along path \(\pi\). Low residual = highly compatible semantics.
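The composite distance is straightforward once the three signals are available. A sketch assuming the geodesic distance, the Wasserstein distance \(W_2\), and the sheaf residual are precomputed (a full implementation would derive them from the graph, the persistence diagrams, and the sheaf):

```python
import math

def tvs(w2, lam=1.0):
    # Topological similarity: TVS = exp(-lambda * W2(Dgm_a, Dgm_b)).
    # A real implementation computes W2 between persistence diagrams;
    # here it is taken as a precomputed input to keep the sketch small.
    return math.exp(-lam * w2)

def d_novectdb(d_geo, w2, sheaf_residual, alpha=0.40, beta=0.35, gamma=0.25):
    """Definition 3.4: blend of geodesic, topological, and sheaf signals.
    Defaults match the paper's mixing coefficients; alpha+beta+gamma = 1."""
    assert abs(alpha + beta + gamma - 1.0) < 1e-9
    return alpha * d_geo + beta * (1.0 - tvs(w2)) + gamma * sheaf_residual

# Identical objects: zero geodesic distance, identical diagrams (W2 = 0),
# zero composition residual -> distance 0, as Theorem 3.5 requires.
print(d_novectdb(0.0, 0.0, 0.0))  # 0.0
```

Each term is bounded below by zero, so \(d \geq 0\) holds by construction, and \(d(\mathcal{K}_a,\mathcal{K}_a)=0\) follows from all three signals vanishing on identical objects.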

The mixing coefficients must satisfy \(\alpha + \beta + \gamma = 1\); the reference configuration uses \(\alpha = 0.40\) (geo), \(\beta = 0.35\) (topo), \(\gamma = 0.25\) (sheaf).
Theorem 3.5 — Quasi-metric

(\(\mathcal{K},\, d_{\text{NoVectDB}}\)) is a quasi-metric space: \(d(K_a,K_a)=0\), \(d(K_a,K_b)\geq 0\), and the triangle inequality holds up to a bounded sheaf consistency error \(\varepsilon\). When the sheaf is globally consistent, \(\varepsilon = 0\) and the space is a true metric space.

§ 5

Persistent Homology Signatures

Persistent homology studies the shape of data across multiple scales. Given a filtration of simplicial complexes built from the local neighbourhood graph, it tracks which topological features (connected components, loops, voids) are born and die as the scale parameter \(\varepsilon\) grows.

Definition 4.1 — Topological Signature
\[T(\mathcal{K}) = \mathrm{Vectorise}\!\bigl(\mathrm{Barcode}(\mathrm{VR}(N_r(\mathcal{K})))\bigr) \in \mathbb{R}^m\]

where \(\mathrm{VR}\) is the Vietoris–Rips complex on the \(r\)-hop neighbourhood, and \(\mathrm{Vectorise}\) maps persistence barcodes to stable vectors via persistence landscapes.
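For intuition, the \(H_0\) part of a barcode can be computed with nothing more than union-find over an edge filtration. This is a toy sketch standing in for real PH engines such as GUDHI or Ripser, which also compute \(H_1\) and higher:

```python
# Toy H0 persistence: track when connected components die as the filtration
# parameter grows. Process edges in order of weight; each union kills one
# component (all components are born at 0 for a vertex set born together).
def h0_barcode(n_vertices, edges):
    """edges: list of (weight, u, v). Returns (birth, death) pairs for H0."""
    parent = list(range(n_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    bars = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            bars.append((0.0, w))           # a component dies at scale w
    bars.append((0.0, float("inf")))        # one component persists forever
    return bars

# Path graph 0-1-2: two merges, plus the immortal component.
print(h0_barcode(3, [(0.5, 0, 1), (0.8, 1, 2)]))
```

The long (infinite) bar is the single surviving component; short bars are components absorbed early, exactly the "long bars = significant topology" reading of the barcode.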

Filtration

As \(\varepsilon\) grows, edges form between nearby nodes. Loops (1-cycles) appear and disappear — each feature's persistence pair \((\text{birth}, \text{death})\) encodes topological structure.

Persistence Barcode

Each bar represents a topological feature: its left endpoint marks when the feature appears, its right endpoint when it disappears. Long bars = significant topology.

[Figure: barcode with H₀ bars (connected components) above H₁ bars (1-cycles / loops)]
Proposition 4.2 — Stability
\[\|T(\mathcal{K};\, G) - T(\mathcal{K};\, G')\|_\infty \leq C \cdot w_{\max}\]

Adding a single edge of weight \(w_{\max}\) changes the signature by at most \(C \cdot w_{\max}\). The topological signature is provably stable under small graph perturbations.

§ 6

Sheaf-Theoretic Composition

A cellular sheaf on the knowledge graph assigns a vector space (a stalk) to each Knowledge Object and a linear map to each relation edge. These maps encode how meaning transforms as you traverse a relation.

Definition 5.1 — Cellular Sheaf
\[\mathcal{F}(v) \cong \mathbb{R}^{d_v}\quad\text{(stalk at vertex } v\text{)}\] \[\mathcal{F}_{e \trianglerighteq v}: \mathcal{F}(v) \to \mathcal{F}(e)\quad\text{(restriction map on edge } e\text{)}\]

Local-to-Global Consistency

The sheaf Laplacian \(\mathcal{L}_\mathcal{F} = B_\mathcal{F}^\top B_\mathcal{F}\) captures whether local sections agree globally. Its kernel encodes globally consistent interpretations.

[Figure: stalks F(umbrella), F(rain), F(coat) ≅ ℝ³ connected by the restriction maps F_prevents and F_wets; a global section x ∈ ⊕F(v) satisfies 𝓛_F · x = 0 ⟺ globally consistent]
Theorem 5.3 — Global Consistency
\[\mathcal{L}_\mathcal{F} \cdot x = 0 \iff \text{local sections agree on all overlapping edges}\]

\(\dim \ker(\mathcal{L}_\mathcal{F})\) equals the number of independent globally consistent interpretations. This gives NoVectDB a principled answer to the question: "Do these facts agree?"
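A small numerical check of Theorem 5.3, on the umbrella chain with 1-dimensional stalks and illustrative scalar restriction maps (all values are ours, chosen for the sketch):

```python
import numpy as np

# Chain: umbrella ->[prevents] rain ->[wets] coat, stalks R^1 at each node.
# For edge e = (u, v), the coboundary row is F_{e<=u} x_u - F_{e<=v} x_v;
# the sheaf Laplacian is L = B^T B, and ker(L) holds the globally
# consistent sections.
edges = [
    (0, 1, 2.0, 1.0),   # umbrella -> rain: F_{e<=u} = 2, F_{e<=v} = 1
    (1, 2, 1.0, 0.5),   # rain -> coat:     F_{e<=u} = 1, F_{e<=v} = 0.5
]
n = 3
B = np.zeros((len(edges), n))
for row, (u, v, fu, fv) in enumerate(edges):
    B[row, u], B[row, v] = fu, -fv

L = B.T @ B
kernel_dim = n - np.linalg.matrix_rank(L)
print(kernel_dim)  # 1: exactly one independent consistent interpretation

# A consistent section: x_rain = 2 * x_umbrella, x_coat = 2 * x_rain.
x = np.array([1.0, 2.0, 4.0])
print(bool(np.allclose(L @ x, 0)))  # True
```

Here \(\dim \ker(\mathcal{L}_\mathcal{F}) = 1\): the chain admits exactly one globally consistent interpretation up to scale, and any section violating a restriction map lands outside the kernel.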

Compositional Retrieval — Equation 2
\[S_\pi = \mathcal{F}_{e_k} \circ \cdots \circ \mathcal{F}_{e_1}\] \[\|S_a \circ_\pi S_b\| = \|S_\pi(x_a) - x_b\|_2\]

The composition residual measures how well \(\mathcal{K}_a\)'s semantics arrive at \(\mathcal{K}_b\) via path \(\pi\). Small residual = high semantic compatibility along the relational chain.
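Equation 2 in code, with illustrative 2×2 restriction maps and sections (every matrix and vector here is an assumption made for the sketch, not data from the paper):

```python
import numpy as np

# Compose restriction maps along the path pi = (e1, e2) and measure how
# far K_a's section lands from K_b's section.
F_e1 = np.array([[0.0, 1.0],
                 [1.0, 0.0]])      # restriction map on edge e1 (a swap)
F_e2 = np.array([[1.0, 0.0],
                 [0.0, 1.0]])      # restriction map on edge e2 (identity)

S_pi = F_e2 @ F_e1                  # S_pi = F_{e2} o F_{e1}

x_a = np.array([1.0, 0.0])          # section at K_a
x_b = np.array([0.0, 1.0])          # section at K_b

residual = np.linalg.norm(S_pi @ x_a - x_b)   # ||S_pi(x_a) - x_b||_2
print(residual)  # 0.0 -> the two contexts compose perfectly along pi
```

Swapping `x_b` for an incompatible section yields a strictly positive residual, which is exactly the \(\gamma\)-weighted penalty in the composite distance of Definition 3.4.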

§ 7

Query Pipeline

NoVectDB retrieval is a deterministic multi-stage process, summarised in Algorithm 1.

Algorithm 1 — NoVectDB Query Pipeline
Require: Query q, graph G, sheaf F, parameters k, α, β, γ
1: I ← IntentParse(q) ▷ LLM slot extraction
2: S₀ ← DeterministicLookup(G, I) ▷ exact edge match
3: if |S₀| ≥ k then
4:     return RankBySheaf(S₀, F, q)
5: end if
6: S₁ ← GeodesicBFS(G, I.anchor, h_max) ▷ type-filtered BFS
7: S₂ ← TopoExpand(S₁, T, β) ▷ topological neighbourhood
8: S₃ ← SheafCompose(S₂, F, γ) ▷ composition residual
9: return Rank(S₃, d_NoVectDB)[:k]
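Step 6 is the workhorse of the fallback path. A runnable sketch of a type-filtered BFS, assuming a simple adjacency shape `{node: [(relation, neighbour)]}` (our representation, not the Manifold Store's):

```python
from collections import deque

def geodesic_bfs(G, anchor, allowed_relations, h_max):
    """Type-filtered BFS from the intent anchor, up to h_max hops.
    Returns {node: hop distance} -- the graph geodesic with unit weights."""
    seen = {anchor: 0}
    queue = deque([anchor])
    while queue:
        node = queue.popleft()
        if seen[node] == h_max:
            continue                       # do not expand past the horizon
        for rel, nbr in G.get(node, []):
            # Only traverse edges whose relation type the intent allows.
            if rel in allowed_relations and nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    return seen

G = {
    "umbrella": [("prevents", "rain")],
    "rain":     [("wets", "coat")],
    "coat":     [],
}
print(geodesic_bfs(G, "umbrella", {"prevents", "wets"}, h_max=2))
```

Restricting `allowed_relations` to `{"prevents"}` stops the search at `rain`, which is how the intent parser's slot types prune the candidate set before the topological and sheaf stages run.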

Query Complexity

\[O\!\left(|V_{\text{local}}| + |E_{\text{local}}|\log|E_{\text{local}}| + l^3\right)\]

where \(|V_{\text{local}}|, |E_{\text{local}}|\) are the sizes of the explored subgraph and \(l\) is the landmark count for PH computation (typically \(l=50\), costing only \(\sim 1.25 \times 10^5\) operations).

§ 8

Preliminary Results

Evaluated on 500 relational queries across three task classes over a technical document corpus (industrial pipe specifications, medical ontologies, legal case chains).

Precision@5 by System

CookiX (α=0.4): 0.830 · GraphRAG: 0.583 · Pinecone: 0.457 · Chroma: 0.437

Precision@5 by Task

System   | Task A: Single-hop | Task B: Multi-hop | Task C: Contradiction | Avg
Chroma   | 0.72               | 0.31              | 0.28                  | 0.437
Pinecone | 0.74               | 0.33              | 0.30                  | 0.457
GraphRAG | 0.78               | 0.52              | 0.45                  | 0.583
CookiX   | 0.91               | 0.82              | 0.76                  | 0.830

CookiX: 2.4× improvement on Task B · 2.6× on Task C vs vector-only baselines

Query Latency (100K objects)

System   | p₅₀ (ms) | p₉₉ (ms)
Chroma   | 12       | 45
Pinecone | 8        | 32
CookiX   | 18       | 67

CookiX trades modest latency overhead (PH computation) for substantially higher precision. Graph traversal itself adds negligible cost.

§ 9

CookiX Architecture

CookiX is the reference implementation of NoVectDB — a document-oriented topological database analogous to how MongoDB realised the NoSQL paradigm. Five core subsystems:

1
Ingestor
Accepts raw text, pre-computed embeddings, or structured documents. Uses a small LLM (3B-parameter instruction model) for relation extraction — identifying typed edges. Simultaneously computes persistent homology via Landmark Vietoris–Rips complex in \(O(l^3)\) time.
2
Manifold Store
Persistent storage for the graph \(G\), sheaf sections \(\mathcal{F}\), and metadata. Memory-mapped adjacency list (inspired by sled/redb in Rust) with type-indexed edge lookups in \(O(1)\) amortised time.
3
TopoIndex
Approximate nearest-neighbour index over topological signatures \(T\). Adapts the HNSW algorithm to operate on persistence diagram distances (Wasserstein) rather than Euclidean distances.
4
Query Engine
Implements Algorithm 1: IntentParse → DeterministicLookup → GeodesicBFS → TopoExpand → SheafCompose → Rank. Deterministic typed-edge traversal at its core.
5
API / SDK (Python + Rust)
MongoDB-like document interface:
db = cookix.connect("mydb")
db.insert({"text": "...", "edges": [...]})
results = db.query(
    "Is A compatible with B?",
    k=5, mode="reasoning"
)
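The Manifold Store's O(1) amortised type-indexed edge lookup can be sketched with two levels of hashing — a minimal in-memory illustration, not the memory-mapped implementation:

```python
from collections import defaultdict

class ManifoldStore:
    """Toy type-indexed adjacency: edges[u][relation] -> [(target, weight)]."""
    def __init__(self):
        self.edges = defaultdict(lambda: defaultdict(list))

    def insert(self, u, relation, v, weight=1.0):
        self.edges[u][relation].append((v, weight))

    def lookup(self, u, relation):
        # Two dict hops: O(1) amortised per (node, relation type) query.
        return self.edges[u][relation]

store = ManifoldStore()
store.insert("umbrella", "prevents", "rain", 0.9)
print(store.lookup("umbrella", "prevents"))  # [('rain', 0.9)]
```

Indexing by `(node, relation)` rather than by node alone is what makes the DeterministicLookup stage of Algorithm 1 a constant-time probe instead of a scan over all outgoing edges.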
§ 10

Paradigm Comparison

How NoVectDB positions against the landscape of existing retrieval paradigms:

Property              | VectorDB | GraphDB | GraphRAG | NoVectDB
Typed relations       | ×        | ✓       | partial  | ✓
Topological signature | ×        | ×       | ×        | ✓
Sheaf composition     | ×        | ×       | ×        | ✓
Interpretable path    | ×        | ✓       | partial  | ✓
Precision collapse    | yes      | N/A     | yes      | immune
Multi-hop reasoning   | weak     | strong  | medium   | strong
Sub-linear ANN        | ✓        | ×       | ✓        | ✓
Document-oriented     | ✓        | ×       | ×        | ✓
Key distinction from GraphRAG: Microsoft's GraphRAG builds a graph from LLM-extracted entities but still uses vector similarity to traverse it. NoVectDB eliminates vector similarity at query time entirely — traversal is via deterministic typed edges and topological signatures.
§ 11

Conclusion

NoVectDB presents a mathematically grounded paradigm for post-vector data management. By combining typed relational edges, persistent homology signatures, and sheaf-theoretic composition, it overcomes the fundamental limitations of flat metric spaces:

  • Semantic gap — resolved by typed directed relations
  • Precision collapse — immune; distance defined on discrete manifold
  • Retrieval opacity — full interpretable path at every query

Just as NoSQL spawned MongoDB, Cassandra, and Neo4j, NoVectDB could inspire specialised engines for legal reasoning, medical ontologies, and engineering knowledge bases. CookiX is the first, not the last.

"Stop measuring distances.
Start understanding adjacency."

Ahmed Hafdi · NoVectDB / CookiX · February 2026