Hybrid Search
Sigil’s retrieval pipeline runs in a single SQL query that fuses two search signals, re-ranks with cognitive-science-backed scoring, and returns results in ~33ms (p50 on local Postgres).
The two search signals
Vector search (pgvector cosine)
Your query is embedded using your chosen provider (OpenAI, Voyage, or Ollama). The resulting vector is compared against all stored fact and chunk embeddings using pgvector’s cosine distance operator (<=>).
Vector search finds semantically similar results: it recognizes that “event delivery” and “message queue” are related even if neither phrase appears in the stored fact.
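To make the comparison concrete, here is a minimal sketch of cosine distance and a brute-force ranking over a handful of embeddings. It only illustrates the quantity the vector search orders by; the function names and the `{ id, embedding }` item shape are assumptions for this example, and in Sigil the comparison happens inside PostgreSQL, not in JavaScript.

```javascript
// Cosine distance between two equal-length vectors: 1 - cos(angle).
// 0 means same direction, 1 means orthogonal, 2 means opposite.
function cosineDistance(a, b) {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) throw new Error("a zero vector has no direction");
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored items by distance to the query vector, closest first.
// Hypothetical item shape: { id, embedding: number[] }.
function rankByCosine(queryVec, items) {
  return items
    .map((item) => ({ id: item.id, distance: cosineDistance(queryVec, item.embedding) }))
    .sort((x, y) => x.distance - y.distance);
}
```

Because cosine distance ignores vector length, two embeddings pointing in the same direction score as identical regardless of magnitude.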
Keyword search (PostgreSQL tsvector BM25)
The same query runs against a tsvector index on all stored fact text. PostgreSQL’s ts_rank_cd ranks matches by cover density: how often the query terms occur and how close together they appear. The result is a BM25-like relevance signal, though unlike true BM25 it uses no corpus-wide inverse document frequency.
Keyword search finds exact term matches — “LISTEN/NOTIFY” retrieves facts that mention those exact words, which vector search might miss if the embedding model doesn’t treat them as distinctive.
Reciprocal Rank Fusion (RRF)
RRF fuses the two ranked lists into a single ranking:

rrf_score(d) = Σ_i 1 / (k + rank_i(d))

where k=60 (standard default) and rank_i(d) is the document’s rank in list i. RRF is:
- Robust — not sensitive to score scale differences between vector (0–1) and keyword (0–∞) outputs
- Proven — consistently outperforms score averaging and max-score fusion in the IR literature
- Simple — one formula, no learned weights to tune
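The formula above can be sketched in a few lines. This is an illustration of RRF over in-memory lists, not Sigil's implementation (which runs in SQL); `rrfFuse` and the sample ids are made up for the example. Ranks are 1-based, matching what `ROW_NUMBER()` produces.

```javascript
// Reciprocal Rank Fusion over any number of ranked lists of ids.
// Each list is ordered best-first; a document missing from a list
// simply contributes nothing for that list.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1; // 1-based rank
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical inputs: ids as returned by the vector and keyword searches.
const vectorRanked = ["f1", "f2", "f3"];
const keywordRanked = ["f3", "f1", "f4"];
const fused = rrfFuse([vectorRanked, keywordRanked]);
// f1 comes first: it sits near the top of both lists.
// f3 is second, ahead of f2 and f4, which each appear in only one list.
```

Only ranks enter the computation, which is why the raw score scales of the two searches never need to be reconciled.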
ACT-R activation re-ranking
After RRF, each result’s score is multiplied by its ACT-R activation:

activation(f) = ln(n_retrievals) × decay(t_last_retrieved)

ACT-R (Adaptive Control of Thought–Rational) is a cognitive architecture from Anderson et al. that models human memory as a function of:
- Frequency — facts retrieved often have higher activation
- Recency — recent retrievals matter more than old ones (logarithmic decay)
In practice: if you retrieved a fact five times last week and once six months ago, ACT-R gives it high activation. A fact retrieved once, just now, gets moderate activation. A fact never retrieved gets low activation — it exists but isn’t salient.
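The sketch below shows one way a frequency-times-recency activation of this shape could behave. It is not Sigil's actual function: every constant and name here is an assumption. In particular, it uses 1 + ln(1 + n) instead of a bare ln(n) so that facts with zero or one retrievals keep a finite, positive activation, and it uses a logarithmic decay with an invented rate of 0.1. Timestamps are epoch milliseconds, and for a never-retrieved fact the creation time is assumed to stand in for the last retrieval.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Frequency term. ASSUMPTION: 1 + ln(1 + n), so n = 0 and n = 1 both give
// finite, positive values.
function frequencyTerm(retrievalCount) {
  return 1 + Math.log1p(retrievalCount);
}

// Recency term. ASSUMPTION: logarithmic decay with a made-up rate of 0.1.
// Returns 1 for "just now" and shrinks slowly as the gap grows.
function recencyDecay(lastTouchedAt, now, rate = 0.1) {
  const ageDays = Math.max(0, (now - lastTouchedAt) / MS_PER_DAY);
  return 1 / (1 + rate * Math.log1p(ageDays));
}

// activation = frequency x recency, mirroring the shape of the formula above.
function activation(retrievalCount, lastTouchedAt, now = Date.now()) {
  return frequencyTerm(retrievalCount) * recencyDecay(lastTouchedAt, now);
}

// Re-rank: multiply each fused score by the fact's activation.
// Hypothetical result shape: { id, score, retrievalCount, lastTouchedAt }.
function rerankByActivation(results, now = Date.now()) {
  return results
    .map((r) => ({
      ...r,
      finalScore: r.score * activation(r.retrievalCount, r.lastTouchedAt, now),
    }))
    .sort((a, b) => b.finalScore - a.finalScore);
}
```

With these assumed constants, a fact retrieved six times with its last retrieval a few days ago outranks a fact retrieved once just now, which in turn outranks a month-old fact that was never retrieved, matching the ordering described above.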
Hebbian co-retrieval boost
Facts that have been retrieved together previously get a small boost when they co-occur in results:
“Neurons that fire together wire together.” — Hebb’s rule
If “auth service” and “JWT jose library” facts are consistently retrieved together, they get a co-retrieval link. Future queries about auth inject both, even if only one is a strong match for the current query.
This models associative memory: context triggers related context, not just direct matches.
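As a rough illustration of the idea, the sketch below keeps a count for every pair of facts returned together and uses those counts to nudge scores. The storage (an in-memory `Map`), the function names, and the boost formula with its 0.05 weight are all assumptions for this example; how Sigil stores and weights co-retrieval links is not specified here.

```javascript
// Key for an unordered pair of fact ids (ids are assumed not to contain "|").
function pairKey(a, b) {
  return a < b ? `${a}|${b}` : `${b}|${a}`;
}

// Strengthen the link between every pair of facts returned together.
function recordCoRetrieval(links, resultIds) {
  for (let i = 0; i < resultIds.length; i++) {
    for (let j = i + 1; j < resultIds.length; j++) {
      const key = pairKey(resultIds[i], resultIds[j]);
      links.set(key, (links.get(key) ?? 0) + 1);
    }
  }
  return links;
}

// Boost each candidate by how strongly it is linked to the other candidates.
// ASSUMPTION: multiplier = 1 + weight * ln(1 + total link count), weight = 0.05.
function applyHebbianBoost(candidates, links, weight = 0.05) {
  return candidates
    .map((c) => {
      let linked = 0;
      for (const other of candidates) {
        if (other.id !== c.id) linked += links.get(pairKey(c.id, other.id)) ?? 0;
      }
      return { ...c, score: c.score * (1 + weight * Math.log1p(linked)) };
    })
    .sort((a, b) => b.score - a.score);
}
```

The logarithm keeps the boost small: a pair retrieved together hundreds of times gains only a modest edge over one retrieved together a few times, so a weak match cannot ride a strong link past clearly better results.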
Pod-aware blending
The re-ranked results are then allocated across pod budgets (see Pods). Each pod kind has a declared hot-context slot limit. The blender:
- Iterates pods in priority order (vital → project → session → person → playbook)
- Takes up to `slots` facts from each pod from the ranked list
- Deduplicates across pods (a fact counted for session doesn’t also count for project)
This ensures diverse coverage — a single high-importance project pod doesn’t crowd out session-specific facts that are more relevant right now.
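The three steps above can be sketched as a single loop. The priority order comes from the list; the data shapes (`{ id, pods }` facts and a `{ kind: slots }` limits object) and the name `blendByPod` are assumptions for illustration, and the real blender runs as part of the SQL pipeline.

```javascript
// Priority order taken from the list above.
const POD_PRIORITY = ["vital", "project", "session", "person", "playbook"];

// rankedFacts: best-first array of { id, pods: [kind, ...] } (hypothetical shape).
// slotLimits: { kind: maxFacts }; a missing kind gets zero slots.
function blendByPod(rankedFacts, slotLimits) {
  const taken = new Set(); // ids already assigned to some pod
  const selected = [];
  for (const kind of POD_PRIORITY) {
    let remaining = slotLimits[kind] ?? 0;
    for (const fact of rankedFacts) {
      if (remaining <= 0) break;
      // Skip facts outside this pod, and facts already counted for an earlier pod.
      if (taken.has(fact.id) || !fact.pods.includes(kind)) continue;
      taken.add(fact.id);
      selected.push({ id: fact.id, pod: kind });
      remaining -= 1;
    }
  }
  return selected;
}
```

A fact that belongs to several pods is charged to the first one in priority order, which leaves the later pod's slots free for facts that only it can supply.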
Running in one SQL query
The full pipeline (vector search, keyword search, RRF, ACT-R activation, pod attribution) runs in a single PostgreSQL CTE chain. No round-trips, no result materialization in Node.js. The SQL is in src/memory/search/hybrid-sql.js.
```sql
WITH vector_ranked AS (
  -- <=> is pgvector's cosine distance operator
  SELECT id,
         ROW_NUMBER() OVER (ORDER BY embedding <=> $1) AS rank
  FROM sigil_facts
  ORDER BY embedding <=> $1
  LIMIT 100
),
keyword_ranked AS (
  SELECT id,
         ROW_NUMBER() OVER (ORDER BY ts_rank_cd(tsv, query) DESC) AS rank
  FROM sigil_facts, plainto_tsquery('english', $2) query
  WHERE tsv @@ query
),
rrf AS (
  -- A fact missing from one list contributes 0 for that list
  SELECT COALESCE(v.id, k.id) AS id,
         (COALESCE(1.0 / (60 + v.rank), 0) + COALESCE(1.0 / (60 + k.rank), 0)) AS score
  FROM vector_ranked v
  FULL OUTER JOIN keyword_ranked k ON v.id = k.id
),
activated AS (
  SELECT r.id,
         r.score * actR_activation(f.last_retrieved_at, f.retrieval_count) AS final_score
  FROM rrf r
  JOIN sigil_facts f ON f.id = r.id
)
SELECT * FROM activated
ORDER BY final_score DESC
LIMIT $3
```

(Simplified for readability; the actual query includes pod attribution, Hebbian boost, and temporal validity filters.)
Benchmark
LongMemEval oracle split, n=100, OpenAI text-embedding-3-large:
| Metric | Result |
|---|---|
| R@1 | 100% |
| R@3 | 100% |
| R@10 | 100% |
| p50 search latency | 33ms |
| p95 search latency | 61ms |
Oracle split means each query has a known correct answer. At this scale (n=100), the retrieval ceiling is effectively set by the quality of the embedding model, so the recall figures mostly reflect the embedding model rather than the pipeline. Architectural differences (hybrid vs. pure vector, ACT-R vs. no re-ranking) show up more clearly at 10K+ chunks, where similarity scores compress and small ranking differences start to change which results make the cut.
Full methodology and reproducible scripts: eval/longmemeval/.
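For readers unfamiliar with the metrics in the table, here is a minimal sketch of how R@k and latency percentiles are commonly computed. It is not the code in eval/longmemeval/, which remains the authoritative reference; the function names, the `{ expected, retrieved }` run shape, and the nearest-rank percentile method are assumptions.

```javascript
// Recall@k for single-answer queries: the share of queries whose expected
// id appears among the first k retrieved ids.
function recallAtK(runs, k) {
  if (runs.length === 0) throw new Error("no runs to score");
  const hits = runs.filter((run) => run.retrieved.slice(0, k).includes(run.expected)).length;
  return hits / runs.length;
}

// Nearest-rank percentile (0 < p <= 100) over latency samples in milliseconds.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const index = Math.max(0, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[index];
}
```

R@1 of 100% means the expected result was the top hit for every query; p50 and p95 are the latencies that 50% and 95% of searches, respectively, finished within.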