Similarity Isn't Relevance: How Draft Retrieves Context by Reasoning, Not Vectors

The Bottom Line

When an AI assistant needs context on a large codebase, the industry default is to embed everything and search by similarity: chunk the repo, vectorize the chunks, and at query time return the top-k nearest neighbors. The problem is baked into the method — similarity is not relevance. "Chunks whose vectors sit near your query" is a fuzzy proxy for "the sections that actually answer the task," and it fails in both directions: it surfaces lookalikes that don't help, and it misses the genuinely relevant section that happened to use different words.

Draft takes a different route. In its richer output mode, /draft:init emits a knowledge wiki — a tree of concept pages with a routing table at the root. When a skill needs context, it navigates that tree by reasoning, the way a human expert scans a table of contents: read the section descriptions, open only the branches that match the task, stop when the task is covered. No embeddings. No chunking. No similarity search. Retrieval is a reasoning step, and it's fully traceable.

Why Top-K Similarity Quietly Misleads

The standard retrieval pipeline looks reasonable until you watch it work on real engineering tasks:

Chunk the codebase — and hope the boundaries respect function and module scopes.
Embed every chunk, typically with a general-purpose model trained on broad text rather than on your domain.
At query time, embed the prompt and pull the k nearest chunks.
Feed those chunks to the model and hope the right one is in there.

Each step trades precision for convenience. A query about "tightening the session-expiry check" pulls back every chunk that mentions sessions — the cache layer, three tests, a logging helper — because they're all semantically adjacent. The one module that actually owns the expiry invariant might rank fifth, or fall off the list entirely if it describes itself in terms of "token lifetimes" instead of "sessions."
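To make the failure concrete, here is a minimal Python sketch of the top-k step. It assumes a toy bag-of-words vector as a stand-in for a real embedding model, and the chunk names and texts are hypothetical. Dense embeddings would likely score the "token lifetimes" paraphrase more generously than raw word counts do, so the toy exaggerates the miss. The ranking logic is the same, though: sort by proximity, keep k.

```python
import math
import re
from collections import Counter

# Toy stand-in for an embedding model (assumption): a bag-of-words vector.
STOPWORDS = {"a", "and", "by", "for", "is", "of", "that", "the"}


def embed(text: str) -> Counter:
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: dict, k: int) -> list:
    q = embed(query)
    scored = [(name, cosine(q, embed(text))) for name, text in chunks.items()]
    # Rank by similarity alone; nothing here asks which chunk owns the behavior.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical chunks: three lookalikes and the module that actually owns expiry.
CHUNKS = {
    "cache/session_cache.py": "cache session objects by session id",
    "tests/test_session.py": "test that a new session is created",
    "util/log.py": "log session events for debugging",
    "auth/token_policy.py": "enforce token lifetimes and reject expired tokens",
}

if __name__ == "__main__":
    for name, score in top_k("tighten the session expiry check", CHUNKS, k=3):
        print(f"{score:.2f}  {name}")
```

Run as a script, it prints the cache module (0.38) followed by the test and the logging helper (0.25 each). The token-policy module that owns the expiry invariant scores 0.00 and never reaches the window.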

The model then reasons over a bag of lookalikes. The retrieval step did no reasoning at all — it sorted by cosine distance. The thinking got deferred to the most expensive place possible: a context window full of near-misses.

Draft Builds a Tree, Then Reasons Over It

Draft's wiki mode decomposes a repo into one concept per file — subsystems, modules, features, entrypoints, APIs, data models — each grounded in the live code graph. The pages are organized into a shallow tree:

draft/
├── .ai-context.md         # INDEX ROOT: a short synopsis + the Concept Map
└── wiki/
    ├── index.md           # bundle root + the routing table
    ├── overview/          # system map, getting-started, glossary
    ├── systems/           # subsystems & modules (graph clusters)
    ├── features/          # user-facing capabilities spanning modules
    ├── reference/         # APIs, data models, dependencies, ADRs, runbooks
    └── entrypoints/       # binaries / mains / CLIs / handler roots

The load-bearing piece is the routing description on every concept. It isn't a summary — it's written to answer a single question: "should the agent open this page for the task at hand?" Those one-liners are collected into a Concept Map at the root, so the entire repo's structure is legible from one table before a single concept page is opened.
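The Concept Map itself is plain data. The sketch below assumes a hypothetical layout, a markdown table with concept, page, and routing-description columns (the real file's format may differ). It shows that the whole map can be read in one pass, before any concept page is opened.

```python
from dataclasses import dataclass

# Hypothetical Concept Map layout (assumption): one markdown table row per concept.
CONCEPT_MAP = """\
| concept | page | routing description |
|---|---|---|
| auth-pipeline | wiki/systems/auth-pipeline.md | Open when a task changes login, token issuance, or expiry rules. |
| session-store | wiki/systems/session-store.md | Open when a task reads or writes persisted session state. |
| billing | wiki/features/billing.md | Open when a task touches invoices, plans, or payment providers. |
"""


@dataclass(frozen=True)
class RoutingEntry:
    concept: str
    page: str
    description: str  # written to answer "should the agent open this page?"


def parse_concept_map(text: str) -> list:
    """Read every routing entry from the map without opening any concept page."""
    entries = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip blank lines, the header row, and the |---| separator row.
        if len(cells) != 3 or cells[0] == "concept" or set(cells[0]) <= {"-", ":"}:
            continue
        entries.append(RoutingEntry(*cells))
    return entries


if __name__ == "__main__":
    for entry in parse_concept_map(CONCEPT_MAP):
        print(f"{entry.concept:<14} {entry.description}")
```

`parse_concept_map(CONCEPT_MAP)` returns three `RoutingEntry` records, so an agent can weigh every routing description in a single read.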

The Retrieval Loop

When a skill like /draft:implement or /draft:review loads context, it walks the tree instead of flattening it:

Frame the query. Pull the routing terms from the active task — the domain nouns, file paths, and the primary concern (data flow, API, security, performance).
Enter at the root. Read the synopsis. For a broad task — onboarding, an architecture overview — that's enough; retrieval terminates here without opening a single concept page.
Reason over the Concept Map. For a focused task, judge each routing description as a decision, not a string match: does opening this concept help this task? Descend the strong matches first; hold the maybes as a frontier.
Descend to leaves. Open the matching section index, repeat the judgment against its concepts, and open the pages that pass. A concept page lists the exact source files it grounds (the precise set to read or edit) and its callers, for the next hop if the task spans them.
Stop on coverage. Terminate when the opened pages cover the task's routing terms, or when a small budget runs out (≈5 pages, ≈2 hops). The tree is shallow by construction, so this converges fast.
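The loop can be sketched in Python as a bounded, level-by-level walk. Everything here is an illustrative assumption, not Draft's implementation: the `Node` fields, the `covers` sets that stand in for "this page covers that routing term," and above all the judge. In Draft the judge is the model reasoning over each routing description; here it is a hard-coded table of verdicts.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    name: str
    routing: str                                  # one-line routing description
    covers: frozenset = frozenset()               # routing terms this page answers (assumption)
    children: list = field(default_factory=list)  # a section index lists its concept pages


# A judge returns "open", "maybe", or "skip" for one routing description.
Judge = Callable[[set, Node], str]


def retrieve(root: Node, task_terms: set, judge: Judge, broad: bool = False,
             max_pages: int = 5, max_hops: int = 2) -> tuple:
    """Walk the tree level by level; return (opened pages, held frontier)."""
    if broad:
        return [], []                 # the root synopsis is the whole answer
    opened, frontier, covered = [], [], set()
    level = [root]
    for _ in range(max_hops):
        next_level = []
        for parent in level:
            for child in parent.children:
                verdict = judge(task_terms, child)
                if verdict == "open" and len(opened) < max_pages:
                    opened.append(child)
                    covered |= child.covers
                    next_level.append(child)
                    if task_terms <= covered:     # stop on coverage
                        return opened, frontier
                elif verdict != "skip":           # maybes (and over-budget opens) are held
                    frontier.append(child)
        level = next_level
    return opened, frontier


# Sample tree: root -> section indexes -> concept pages (hypothetical names).
TREE = Node("root", "synopsis + Concept Map", children=[
    Node("systems", "subsystems and modules", children=[
        Node("auth-pipeline", "token issuance and expiry rules", frozenset({"expiry", "token"})),
        Node("session-logging", "structured logs for session events", frozenset({"session", "logging"})),
        Node("session-store", "persisted session state", frozenset({"session"})),
    ]),
    Node("features", "user-facing capabilities", children=[
        Node("billing", "invoices and plans", frozenset({"billing"})),
    ]),
])

# Stand-in for the model's judgment (assumption): fixed verdicts, not keyword overlap.
DECISIONS = {"systems": "open", "features": "maybe", "auth-pipeline": "open",
             "session-logging": "skip", "session-store": "open"}


def table_judge(task_terms: set, node: Node) -> str:
    return DECISIONS.get(node.name, "skip")


if __name__ == "__main__":
    opened, frontier = retrieve(TREE, {"session", "expiry"}, table_judge)
    print("opened:", [n.name for n in opened])
    print("frontier:", [n.name for n in frontier])
```

With the sample tree, the walk opens `systems`, `auth-pipeline`, and `session-store`, holds `features` on the frontier, and stops as soon as both terms are covered. `session-logging` shares the word "session," but its verdict is skip, so it is never opened. With `broad=True`, nothing is opened at all.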

Every hop is a relevance judgment, not a similarity score. A concept whose description doesn't justify opening it is skipped even if a keyword happens to overlap — the exact failure mode that sinks top-k search.

Similarity vs. Reasoning, Side by Side

Vector / top-k retrieval	Draft's tree-search retrieval
Ranks by embedding distance	Selects by reasoning over routing descriptions
Chunks the repo (boundaries guessed)	One concept per file (boundaries from the graph)
Needs a vector DB + embedding model	Plain markdown + a routing table
Per-token indexing cost, re-embed on change	$0 — regenerated locally on refresh
Opaque: "these scored highest"	Traceable: the navigation path is recorded
Returns lookalikes; misses paraphrases	Returns what the task needs, by name

Retrieval You Can Audit

A cosine score is a dead end — you can't ask why a chunk ranked third. Tree-search retrieval is explainable by construction, because the path is the explanation:

Opened concepts — the pages selected, each with the one-line reason it was opened.
Grounded paths — the union of source files those pages cover: exactly what the task will touch.
Skipped frontier — the maybes that were held but not expanded, so a follow-up task can resume from them.

Instead of "loaded these five chunks," the agent can say "navigated to the auth pipeline and session store concepts because the task changes expiry handling." That trace is reviewable by a human and re-checkable on the next run.
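A trace like that is easy to keep as a small record. This sketch uses hypothetical names (`OpenedConcept`, `RetrievalTrace`, `skipped_frontier`) for the three parts listed above, and derives grounded paths as an order-preserving union so a file shared by two concepts is listed once.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class OpenedConcept:
    name: str
    reason: str          # the one-line reason this page was opened
    sources: list        # source files the concept page says it grounds


@dataclass
class RetrievalTrace:
    opened: list = field(default_factory=list)            # OpenedConcept records
    skipped_frontier: list = field(default_factory=list)  # maybes held but not expanded

    def grounded_paths(self) -> list:
        """Union of grounded files across opened concepts, in first-seen order."""
        return list(dict.fromkeys(p for c in self.opened for p in c.sources))

    def explain(self) -> str:
        return "navigated to " + "; ".join(f"{c.name} ({c.reason})" for c in self.opened)

    def to_json(self) -> str:
        # A plain record that a reviewer, or the next run, can re-check.
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    trace = RetrievalTrace(
        opened=[
            OpenedConcept("auth-pipeline", "task changes expiry handling",
                          ["auth/token_policy.py", "auth/middleware.py"]),
            OpenedConcept("session-store", "expiry is checked against stored sessions",
                          ["store/sessions.py", "auth/middleware.py"]),
        ],
        skipped_frontier=["features/billing"],
    )
    print(trace.explain())
    print(trace.grounded_paths())
```

The file paths are illustrative. The point of the design is that the reasons, the file set, and the resumable frontier live in one serializable record rather than in a list of scores.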

When the Synopsis Is the Whole Answer

Not every task needs to descend. Broad questions — "how does this system fit together," "where do I start" — are answered by the synopsis at the index root, and retrieval stops there. Over-fetching concept pages for a broad task is treated as a mistake, not a default. Focused tasks pay only for the few pages they actually need. The cheap path stays cheap; the deep path stays precise.

And when /draft:init hasn't emitted a wiki for a repo, Draft falls back to its compact single-file context with section-level relevance scoring: same principle, smaller surface. The retrieval strategy degrades gracefully; it never gates the work.
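The fallback decision can be as simple as checking what is on disk. This minimal sketch assumes that `draft/wiki/index.md` (the bundle root in the layout shown earlier) marks wiki mode; the strategy names are hypothetical labels, and the section-level scoring itself is not shown.

```python
from pathlib import Path


def choose_retrieval(repo: Path) -> str:
    """Pick a retrieval strategy from what is on disk (assumed layout, hypothetical labels)."""
    if (repo / "draft" / "wiki" / "index.md").is_file():
        return "tree-search"   # wiki present: navigate the Concept Map and concept pages
    return "single-file"       # no wiki: score sections of the compact single-file context


if __name__ == "__main__":
    print(choose_retrieval(Path(".")))
```

Either branch returns a usable strategy, so context loading proceeds whether or not the wiki exists.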

What This Means for the Three People Reading This

If you're an engineer: the context your assistant loads for a change is now a short, named set of concepts and the exact files they ground — not a grab-bag of similar-looking chunks. Fewer wrong files in the window means fewer wrong edits out of it.

If you're an engineering leader: retrieval becomes auditable. Every AI-assisted change can show which concepts informed it and why — a reviewable trail, with no embedding vendor in the loop.

If you work in a regulated or air-gapped environment: there's no vector database to provision, no embedding API to send code to, and nothing to re-index in the cloud. The wiki is markdown in your repo; the navigation is reasoning the model already does.

The Larger Argument

Embeddings have a place, and similarity search has a place. But the question an AI assistant actually asks of a codebase — "which parts of this system does my task touch?" — is a question about relevance, and relevance requires reasoning. Draft's bet is to make that reasoning a first-class retrieval step over a structure built for it: a navigable wiki, grounded in the code graph, where the model finds the right context the way a senior engineer would — by reading the map, not by measuring distances.

Try It

# Install Draft (Claude Code plugin)
npx @drafthq/draft install claude-code

cd your-repo
# Inside a Claude Code session in that repo:
/draft:init                 # emits the knowledge wiki + Concept Map

# Then just work — context loading navigates the tree for you:
/draft:implement            # opens only the concepts the task touches
/draft:review               # routes to the affected subsystems by name

No vector store to stand up, no embeddings to pay for. The map is in your repo, and your assistant just got a lot better at reading it.