Hybrid Search

Sigil’s retrieval pipeline runs in a single SQL query that fuses two search signals, re-ranks with cognitive-science-backed scoring, and returns results in ~33ms (p50 on local Postgres).

Your query is embedded using your chosen provider (OpenAI, Voyage, or Ollama). The resulting vector is compared against all stored fact and chunk embeddings using pgvector’s cosine distance operator (<=>).

Vector search finds semantically similar results — it understands that “event delivery” and “message queue” are related even if neither word appears in the stored fact.
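As a rough illustration of what the distance comparison computes, here is a minimal sketch of cosine distance (1 minus cosine similarity) applied to toy vectors. The function name, the fact ids, and the 3-dimensional vectors are hypothetical; real embeddings have hundreds or thousands of dimensions, and in Sigil the comparison happens inside Postgres rather than in JavaScript.

```javascript
// Cosine distance: 1 - cosine similarity. 0 means same direction, 1 means orthogonal.
function cosineDistance(a, b) {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy data (hypothetical values, for illustration only).
const queryEmbedding = [1, 0, 1];
const storedFacts = [
  { id: "a", embedding: [1, 0, 1] }, // same direction as the query
  { id: "b", embedding: [0, 1, 0] }, // orthogonal to the query
  { id: "c", embedding: [2, 0, 1] }, // close, but not identical in direction
];

// Smallest distance first, mirroring ORDER BY embedding <=> $1.
const nearest = storedFacts
  .map((f) => ({ id: f.id, distance: cosineDistance(queryEmbedding, f.embedding) }))
  .sort((x, y) => x.distance - y.distance);
```

Because only the direction of the vectors matters, fact "c" ranks ahead of "b" even though its magnitude differs from the query.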

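The same query runs against a tsvector index on all stored fact text. PostgreSQL’s ts_rank_cd scores each match by cover density: how often the query terms occur and how close together they appear. Unlike BM25, it uses no corpus-wide statistics such as inverse document frequency.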

Keyword search finds exact term matches — “LISTEN/NOTIFY” retrieves facts that mention those exact words, which vector search might miss if the embedding model doesn’t treat them as distinctive.

Reciprocal Rank Fusion (RRF) fuses the two ranked lists into a single ranking:

rrf_score(d) = Σ_i 1 / (k + rank_i(d))

where k=60 (standard default), rank_i(d) is the document’s rank in list i, and the sum runs over both lists. A document missing from a list contributes 0 for that list. RRF is:

  • Robust: it uses only ranks, so the different score scales of vector search (cosine distance, 0–2) and keyword search (unbounded rank values) do not matter
  • Proven: widely used in the IR literature, where it has held up well against score-based fusion such as score averaging and max-score
  • Simple: one formula, no learned weights to tune
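The formula above can be sketched in a few lines. This is a minimal illustration, not Sigil’s implementation (which runs in SQL); the function name `rrfFuse` and the fact ids are hypothetical.

```javascript
// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) for every id it
// contains; ids missing from a list contribute nothing for that list.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical ranked id lists from the two searches.
const vectorRanked = ["f1", "f2", "f3"];
const keywordRanked = ["f3", "f4"];
const fused = rrfFuse([vectorRanked, keywordRanked]);
```

Here "f3" is only third in the vector list but first in the keyword list, so its combined score (1/63 + 1/61) beats "f1", which appears in a single list (1/61). Appearing in both lists counts for more than topping one.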

After RRF, each result’s score is multiplied by its ACT-R activation:

activation(f) = ln(n_retrievals) × decay(t_last_retrieved)

ACT-R (Adaptive Control of Thought–Rational) is a cognitive architecture from Anderson et al. that models human memory as a function of:

  • Frequency — facts retrieved often have higher activation
  • Recency — recent retrievals matter more than old ones (logarithmic decay)

In practice: if you retrieved a fact five times last week and once six months ago, ACT-R gives it high activation. A fact retrieved once, just now, gets moderate activation. A fact never retrieved gets low activation — it exists but isn’t salient.
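The behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not give the exact decay function, so `decay` and its 30-day scale are hypothetical, and the sketch uses ln(1 + n) rather than ln(n) so that a single retrieval still yields a positive activation, as the prose describes.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical decay: 1 immediately after a retrieval, falling off
// logarithmically as the last retrieval ages. scaleDays is an assumed parameter.
function decay(ageDays, scaleDays = 30) {
  return 1 / (1 + Math.log1p(ageDays / scaleDays));
}

// Assumed form: ln(1 + retrievalCount) * decay(age of last retrieval).
// Timestamps are milliseconds since the epoch; null means never retrieved.
function activation(retrievalCount, lastRetrievedAt, nowMs) {
  if (retrievalCount === 0 || lastRetrievedAt === null) return 0;
  const ageDays = Math.max(0, (nowMs - lastRetrievedAt) / MS_PER_DAY);
  return Math.log1p(retrievalCount) * decay(ageDays);
}

const nowMs = Date.UTC(2025, 0, 31);
// Six retrievals in total, the most recent one a week ago.
const frequentFact = activation(6, nowMs - 7 * MS_PER_DAY, nowMs);
// Retrieved once, just now.
const freshFact = activation(1, nowMs, nowMs);
// Never retrieved.
const dormantFact = activation(0, null, nowMs);
```

Under these assumptions the ordering matches the description: the frequently retrieved fact scores highest, the fresh one sits in the middle, and the never-retrieved one scores lowest. Since this sketch returns 0 for a never-retrieved fact, a multiplicative re-ranker would presumably need a floor so such facts can still surface.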

Facts that have been retrieved together previously get a small boost when they co-occur in results:

“Neurons that fire together wire together.” — Hebb’s rule

If “auth service” and “JWT jose library” facts are consistently retrieved together, they get a co-retrieval link. Future queries about auth inject both, even if only one is a strong match for the current query.

This models associative memory: context triggers related context, not just direct matches.
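One way to picture the co-retrieval link is as a counter per pair of facts that grows each time the pair is returned together. The sketch below is an assumption about the mechanism, not Sigil’s actual schema or weights: `recordCoRetrieval`, `applyHebbianBoost`, the 5% boost per link, and the cap of 5 are all hypothetical.

```javascript
// Order-independent key for a pair of fact ids.
function pairKey(a, b) {
  return a < b ? `${a}|${b}` : `${b}|${a}`;
}

// Strengthen the link between every pair of facts returned by the same query.
function recordCoRetrieval(links, retrievedIds) {
  for (let i = 0; i < retrievedIds.length; i++) {
    for (let j = i + 1; j < retrievedIds.length; j++) {
      const key = pairKey(retrievedIds[i], retrievedIds[j]);
      links.set(key, (links.get(key) ?? 0) + 1);
    }
  }
}

// Boost each result by its (capped) link strength to the other results present.
function applyHebbianBoost(links, results, boostPerLink = 0.05, maxLinks = 5) {
  return results
    .map((r) => {
      let strength = 0;
      for (const other of results) {
        if (other.id !== r.id) strength += links.get(pairKey(r.id, other.id)) ?? 0;
      }
      return { ...r, score: r.score * (1 + boostPerLink * Math.min(strength, maxLinks)) };
    })
    .sort((a, b) => b.score - a.score);
}

// Three earlier queries returned these two facts together.
const coRetrievalLinks = new Map();
for (let i = 0; i < 3; i++) recordCoRetrieval(coRetrievalLinks, ["auth-service", "jwt-jose"]);

const boosted = applyHebbianBoost(coRetrievalLinks, [
  { id: "auth-service", score: 0.03 },
  { id: "rate-limits", score: 0.02 },
  { id: "jwt-jose", score: 0.019 },
]);
```

In this toy run the "jwt-jose" fact starts below "rate-limits" but moves above it, because its link to "auth-service" lifts its score by 15%. The sketch only covers the boost for facts that already co-occur in the results; injecting a linked fact that is absent from the results would need an extra lookup step.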

The re-ranked results are then allocated across pod budgets (see Pods). Each pod kind has a declared hot-context slot limit. The blender:

  1. Iterates pods in priority order (vital → project → session → person → playbook)
  2. Takes facts for each pod from the ranked list, up to that pod’s slot limit
  3. Deduplicates across pods (a fact counted for session doesn’t also count for project)

This ensures diverse coverage — a single high-importance project pod doesn’t crowd out session-specific facts that are more relevant right now.
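The three steps above can be sketched as a single pass per pod kind. This is an illustrative approximation: the `blend` function, the `pods` array on each fact, and the slot numbers are hypothetical, and the real allocation happens inside the SQL query.

```javascript
// Priority order taken from the list above.
const POD_PRIORITY = ["vital", "project", "session", "person", "playbook"];

// rankedFacts: facts in final-score order, each tagged with the pod kinds it belongs to.
// slotLimits: hot-context slot limit per pod kind (missing kinds get 0 slots).
function blend(rankedFacts, slotLimits) {
  const taken = new Set(); // fact ids already allocated to some pod
  const selected = [];
  for (const pod of POD_PRIORITY) {
    let remaining = slotLimits[pod] ?? 0;
    for (const fact of rankedFacts) {
      if (remaining === 0) break;
      if (taken.has(fact.id) || !fact.pods.includes(pod)) continue;
      taken.add(fact.id); // deduplicate: a fact fills a slot in one pod only
      selected.push({ id: fact.id, pod });
      remaining -= 1;
    }
  }
  return selected;
}

// Hypothetical ranked results.
const candidates = [
  { id: "p1", pods: ["project"] },
  { id: "p2", pods: ["project"] },
  { id: "p3", pods: ["project", "session"] },
  { id: "p4", pods: ["project"] },
  { id: "s1", pods: ["session"] },
];
const hotContext = blend(candidates, { project: 3, session: 2 });
```

In this example the project pod fills its three slots with p1, p2, and p3, so p4 is dropped. The session pod skips p3 because it is already counted, and takes s1 even though s1 ranks below p4.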

The full pipeline — vector search, keyword search, RRF, ACT-R activation, pod attribution — runs in a single PostgreSQL CTE chain. No round-trips, no result materialization in Node.js. The SQL is in src/memory/search/hybrid-sql.js.

WITH
-- Top 100 facts by cosine distance to the query embedding ($1)
vector_ranked AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> $1) AS rank
  FROM sigil_facts
  ORDER BY embedding <=> $1
  LIMIT 100
),
-- Facts whose text matches the query string ($2), ranked by ts_rank_cd
keyword_ranked AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY ts_rank_cd(tsv, query) DESC) AS rank
  FROM sigil_facts, plainto_tsquery('english', $2) query
  WHERE tsv @@ query
),
-- Reciprocal Rank Fusion with k = 60; a fact missing from one list contributes 0
rrf AS (
  SELECT COALESCE(v.id, k.id) AS id,
         (COALESCE(1.0 / (60 + v.rank), 0) + COALESCE(1.0 / (60 + k.rank), 0)) AS score
  FROM vector_ranked v
  FULL OUTER JOIN keyword_ranked k ON v.id = k.id
),
-- Scale the fused score by ACT-R activation (retrieval frequency and recency)
activated AS (
  SELECT r.id,
         r.score * actR_activation(f.last_retrieved_at, f.retrieval_count) AS final_score
  FROM rrf r
  JOIN sigil_facts f ON f.id = r.id
)
-- Return the top $3 facts by final score
SELECT * FROM activated
ORDER BY final_score DESC
LIMIT $3

(Simplified for readability — the actual query includes pod attribution, Hebbian boost, and temporal validity filters.)

LongMemEval oracle split, n=100, OpenAI text-embedding-3-large:

| Metric | Result |
| --- | --- |
| R@1 | 100% |
| R@3 | 100% |
| R@10 | 100% |
| p50 search latency | 33ms |
| p95 search latency | 61ms |

In the oracle split, each question’s history contains only the sessions that hold its answer, so the search space is small and every query has a known correct result. The retrieval ceiling at this scale is effectively the quality of the embedding model. Architectural differences (hybrid vs. pure vector, ACT-R vs. no re-ranking) show up more clearly at 10K+ chunks, where scores compress and ranking choices matter more.
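For readers unfamiliar with the R@k rows, the metric can be sketched as below. This assumes one correct fact per query, which is a simplification; the `recallAtK` name and the two toy runs are hypothetical and are not taken from the benchmark.

```javascript
// R@k: fraction of queries whose correct id appears among the top k retrieved ids.
function recallAtK(runs, k) {
  const hits = runs.filter((run) => run.retrieved.slice(0, k).includes(run.gold)).length;
  return hits / runs.length;
}

// Toy runs for illustration only (not benchmark data).
const toyRuns = [
  { gold: "a", retrieved: ["a", "b", "c"] }, // correct fact at rank 1
  { gold: "x", retrieved: ["y", "x", "z"] }, // correct fact at rank 2
];
const recallAt1 = recallAtK(toyRuns, 1);
const recallAt3 = recallAtK(toyRuns, 3);
```

In the toy data R@1 is 0.5 and R@3 is 1, since the second query only finds its answer at rank 2. A reported R@1 of 100% means the correct result was ranked first for every query.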

Full methodology and reproducible scripts: eval/longmemeval/.