The Brain — memory layer architecture

// 01 ingest

One pipeline. Every source.

Connect Gmail, Drive, Calendar, Slack, and GitHub, or wire up your own source. Brains backfills history, then keeps it fresh: real-time for filesystem changes, polled for inboxes, push for webhooks. Every event is deduplicated by content hash over a 24-hour window, so the same content is not ingested twice within that window.

Three intake paths, one event shape.

Every source emits the same event: a kind, a URI, a content hash, an ingested-at timestamp. The supervisor dedupes on the hash and dispatches downstream jobs for chunking, embedding, and entity extraction.

Real-time file-watcher for any local folder you point at it
Polled inbox for email or a shared drive, with retry on failure
Webhook receiver for integration push, authenticated per source
Native pulls on a per-source cadence — Gmail, Drive, Calendar, Slack, GitHub, and more
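As a rough illustration of the shared event shape and the hash-based dedup step, here is a minimal sketch. The field names, the `DedupWindow` class, and the in-memory map are assumptions made for the example, not the Brains implementation.

```typescript
// Hypothetical event shape built from the four fields named above.
interface IngestEvent {
  kind: string;        // e.g. "email" or "file" (example values are assumptions)
  uri: string;         // where the item came from
  contentHash: string; // hash of the content, computed by the source
  ingestedAt: number;  // epoch milliseconds
}

const DEDUP_WINDOW_MS = 24 * 60 * 60 * 1000; // the 24-hour window

class DedupWindow {
  // contentHash -> time the hash was last accepted
  private seen = new Map<string, number>();

  /** Returns true if the event is new and should be dispatched downstream. */
  accept(event: IngestEvent): boolean {
    const last = this.seen.get(event.contentHash);
    if (last !== undefined && event.ingestedAt - last < DEDUP_WINDOW_MS) {
      return false; // same content seen inside the window
    }
    this.seen.set(event.contentHash, event.ingestedAt);
    return true;
  }

  /** Drops hashes that have aged out, so the map stays bounded. */
  prune(now: number): void {
    this.seen.forEach((acceptedAt, hash) => {
      if (now - acceptedAt >= DEDUP_WINDOW_MS) this.seen.delete(hash);
    });
  }
}
```

Because every intake path emits the same shape, one `accept` check can sit in front of all of them.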

// sources · LIVE

Gmail · webhook

Drive · poll · 5m

Calendar · poll · 15m

Slack · webhook

GitHub · webhook

Files · chokidar

↓ ingestion daemon ↓

✓ Validate event + content hash · 2ms

✓ Dedup against 24h window · 1ms

✓ Write page row · stamp provenance · 11ms

✓ Dispatch chunk + embed jobs · 3ms

// 02 store

Typed pages. Not a blob.

Every ingested item becomes a page — a structured object with frontmatter, a normalized body, an extracted timeline, and full provenance. Pages are the unit of retrieval, sharing, and reasoning. The schema is open: extend it with your own page kinds via schema packs.

inbox/2026-06-19-acme-renewal · email

// frontmatter

from: "sarah@acme.com"
subject: "Re: renewal terms"
date: 2026-06-19T14:32
thread: acme-renewal-2026
labels: ["inbox", "deals/acme"]

// compiled_truth

Sarah agreed to revised pricing by Wed. You committed to legal-review draft + a follow-up call Thu. Deal blocker: Acme legal review on enterprise tier.

// timeline

2026-06-19 · event_date · derived from email Date header

// provenance

source_kind: webhook
source_uri: gmail:msg/CABc...DEF
ingested_at: 2026-06-19T14:33:08Z

One page. Many indexes.

Each page is chunked into smaller passages, and every passage is indexed for vector similarity and full-text search in parallel. Image-bearing pages get a multimodal index too — so a query like "the dashboard mockup" finds the actual screenshot, not just text near it.

title — derived or explicit
compiled_truth — normalized body, 2× retrieval boost
timeline — extracted temporal spine (event date → date → published → filename)
frontmatter — open-ended; ranking uses structured date fields
page_kind — markdown · code · image
generation — per-page snapshot counter for cache invalidation
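The timeline fallback order (event date → date → published → filename) could be resolved along these lines. This is a sketch only: the frontmatter keys `event_date`, `date`, and `published`, and the function name `resolveTimeline`, are assumptions for illustration.

```typescript
type TimelineSource = "event_date" | "date" | "published" | "filename";

interface TimelineEntry {
  date: string;           // calendar date, YYYY-MM-DD
  source: TimelineSource; // which fallback step produced it
}

const ISO_DATE = /\d{4}-\d{2}-\d{2}/;

// Walks the fallback chain and returns the first date it can find.
function resolveTimeline(
  frontmatter: Record<string, unknown>,
  slug: string,
): TimelineEntry | undefined {
  const fields: TimelineSource[] = ["event_date", "date", "published"];
  for (const field of fields) {
    const value = frontmatter[field];
    if (typeof value === "string") {
      const match = ISO_DATE.exec(value);
      if (match) return { date: match[0], source: field };
    }
  }
  // Last resort: a date embedded in the page slug or filename.
  const fromSlug = ISO_DATE.exec(slug);
  return fromSlug ? { date: fromSlug[0], source: "filename" } : undefined;
}
```

Recording which step produced the date keeps the timeline auditable in the same way the provenance block is.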

// 03 connect

A knowledge graph, extracted for free.

Brains walks your pages, extracts entities (people, companies, deals, events), and connects them with typed edges such as attended, works_at, invested_in, founded, and advises. The graph powers entity-aware retrieval, "find experts" walks, and trajectory queries across time, so the AI can answer "who connects me to X?" or "what did Sarah work on before joining Y?"

// knowledge graph · live

person

event

company

deal

Entities + typed edges, written as you ingest.

When a page lands, Brains scans it for entity patterns (a canonical company list, person heuristics, link extraction) and writes typed edges to your graph. Every edge is joinable, walkable, and auditable.

Typed edges — the relationship kind is part of the schema
Walked at retrieval — find_experts, find_trajectory, code-graph expansion
Auditable — every edge points back to a source page; nothing is hallucinated
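To make "typed, walked, auditable" concrete, here is a small sketch of an edge record that carries its source page, plus a bounded breadth-first walk. The `Edge` shape, the entity id format, and `neighborsWithin` are hypothetical; the walk treats edges as undirected, which is also an assumption.

```typescript
type EdgeKind = "attended" | "works_at" | "invested_in" | "founded" | "advises";

interface Edge {
  from: string;       // entity id, e.g. "person:sarah" (id format is an assumption)
  kind: EdgeKind;     // the relationship kind is part of the type
  to: string;
  sourcePage: string; // page the edge was extracted from, for auditing
}

// Breadth-first walk: returns every entity reachable within maxDepth hops,
// mapped to the hop count at which it was first reached.
function neighborsWithin(edges: Edge[], start: string, maxDepth: number): Map<string, number> {
  const adjacency = new Map<string, string[]>();
  const link = (a: string, b: string): void => {
    const list = adjacency.get(a) ?? [];
    list.push(b);
    adjacency.set(a, list);
  };
  edges.forEach((e) => {
    link(e.from, e.to);
    link(e.to, e.from);
  });

  const depthOf = new Map<string, number>([[start, 0]]);
  let frontier: string[] = [start];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbor of adjacency.get(node) ?? []) {
        if (!depthOf.has(neighbor)) {
          depthOf.set(neighbor, depth);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return depthOf;
}
```

A "who connects me to X?" query is then a matter of walking out from one entity and reading `sourcePage` on each edge along the way.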

// 04 retrieve

Hybrid retrieval, in milliseconds.

Brains answers a query by fusing three signals (vector embeddings, full-text search, and a knowledge-graph walk) with Reciprocal Rank Fusion. The fused list is re-scored with boosts for compiled-truth, backlinks, salience, and recency, then optionally reranked by a cross-encoder. A typical query returns in 20–50 ms.

One query, three signals, one ranked list.

Retrieval lanes run in parallel against the same chunk corpus. Reciprocal Rank Fusion (K=60) merges per-rank scores from each lane; boosts then nudge for known-good signals (compiled-truth gets 2.0×, recent dailies decay aggressively, evergreen concepts stay flat). An optional cross-encoder rerank polishes precision on hard queries.

Vector — cosine similarity on 1536-dim text embeddings
Full-text — weighted, stemmed, exact-match aware
Graph — typed edges (attended, works_at, invested_in) walked at depth 2
Boosts — compiled-truth, backlinks, salience, recency per page kind
Optional rerank — cross-encoder pass over the top N
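Reciprocal Rank Fusion itself is compact: each lane contributes 1 / (K + rank) for every result it returns, and the contributions are summed per result. The sketch below uses K = 60 as stated above; the function names, the use of plain string ids, and applying boosts as a single multiplier per id are assumptions for illustration.

```typescript
const RRF_K = 60;

// Each lane is a list of ids ordered best-first. Rank is 1-based.
function reciprocalRankFusion(lanes: string[][], k: number = RRF_K): Map<string, number> {
  const scores = new Map<string, number>();
  lanes.forEach((lane) => {
    lane.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  });
  return scores;
}

// Applies multiplicative boosts (e.g. 2.0 for compiled-truth) and returns
// ids sorted by boosted score, best first. Ids without a boost keep 1.0.
function rankWithBoosts(scores: Map<string, number>, boosts: Map<string, number>): string[] {
  const boosted: Array<[string, number]> = [];
  scores.forEach((score, id) => {
    boosted.push([id, score * (boosts.get(id) ?? 1)]);
  });
  return boosted.sort((a, b) => b[1] - a[1]).map(([id]) => id);
}
```

Because RRF only looks at ranks, the lanes never need comparable score scales, which is what lets cosine similarity, full-text relevance, and graph distance share one list.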

// retrieval pipeline · ~32ms

"what did Sarah say about partnership last week — and what did I commit to?"

▸ vector cosine · 0.92

▸ keyword "partnership"

▸ graph Sarah → Acme

↓ RRF (K=60) · boosts ↓

2.0× compiled-truth · +8% backlinks · +12% salience · recency 3d decay

★ inbox/2026-06-19-acme-renewal 94%

// 05 private

Your data. Your keys. Your brain.

Brains is architected so your memory never leaves your trust boundary by accident. Row-level security scopes pages per source; OAuth clients see only the sources their token allows; remote MCP callers get a privacy-stripped view that hides anything fenced as private.

// row-level security

Scoped at the storage layer.

Every read carries a sourceId resolved from the caller's token. Cross-source enumeration is impossible from the API layer, not just discouraged.

// scoped tokens

OAuth clients see what you say.

Tokens carry an allowedSources[] list. A token scoped to ['shared'] can never read your private brain — even if it asks the right question.
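A minimal sketch of what token scoping amounts to: reads are filtered by the token's source list before anything else runs. The `Token` and `StoredPage` shapes and the `readablePages` helper are hypothetical, and the filter is shown in application code only for readability; the page describes the real check as living at the storage layer.

```typescript
interface Token {
  allowedSources: string[]; // e.g. ["shared"]
}

interface StoredPage {
  sourceId: string;
  slug: string;
}

// Returns only the pages whose source appears on the token's allow-list.
// A page from any other source is never visible, whatever the query is.
function readablePages(token: Token, pages: StoredPage[]): StoredPage[] {
  const allowed = new Set(token.allowedSources);
  return pages.filter((page) => allowed.has(page.sourceId));
}
```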

// private fences

Remote callers get a stripped view.

Remote MCP requests (ctx.remote === true) skip private facts entirely, and the per-token allow-list still applies on top. The local CLI sees fenced content in full. The boundary is architectural.
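The remote/local split can be pictured as a single branch on the request context. In this sketch `ctx.remote` comes from the text above; the `Fact` shape and its `isPrivate` flag are assumptions standing in for however fences are actually marked.

```typescript
interface RequestContext {
  remote: boolean; // true for remote MCP callers, false for the local CLI
}

interface Fact {
  text: string;
  isPrivate: boolean; // hypothetical flag for content inside a private fence
}

// Remote callers get the stripped view; local callers see everything.
function visibleFacts(ctx: RequestContext, facts: Fact[]): Fact[] {
  return ctx.remote ? facts.filter((fact) => !fact.isPrivate) : facts;
}
```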

// never trains

Your pages don't train anyone's model.

Brains never sends user pages to LLM providers for training. Embedding and synthesis calls run on your API keys, with prompts you can audit. Bring your own model.

// soft-delete

Deletes are reversible for 72 hours.

Pages marked with deleted_at stay recoverable via include_deleted: true. After the grace window, autopilot hard-deletes them, so no zombie rows are left behind.
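The soft-delete behaviour reduces to two filters, sketched here under assumptions: `deleted_at` and `include_deleted` come from the text above, while the `Row` shape, epoch-millisecond timestamps, and the `listPages` and `purgeExpired` helpers are illustrative only.

```typescript
const GRACE_MS = 72 * 60 * 60 * 1000; // the 72-hour grace window

interface Row {
  slug: string;
  deleted_at?: number; // epoch milliseconds; undefined means live
}

// Normal reads hide soft-deleted rows unless include_deleted is set.
function listPages(rows: Row[], opts: { include_deleted?: boolean } = {}): Row[] {
  return rows.filter((row) => row.deleted_at === undefined || opts.include_deleted === true);
}

// What a periodic purge would keep: live rows, plus deleted rows still in grace.
function purgeExpired(rows: Row[], now: number): Row[] {
  return rows.filter((row) => row.deleted_at === undefined || now - row.deleted_at < GRACE_MS);
}
```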

// generation clock

Caches that can't go stale.

A two-tier generation clock (global + per-page) invalidates the semantic query cache the instant a page changes. Hot reads stay sub-millisecond; you never see yesterday's answer.
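One way a two-tier generation clock can work, sketched under stated assumptions: every write bumps a global counter and the written page's own counter, and each cache entry remembers both. The class names and the rule "an entry depends only on the pages it was built from" are assumptions, not the Brains design; in particular, this sketch does not detect a brand-new page that would now match a cached query.

```typescript
class GenerationClock {
  private globalGen = 0;
  private pageGen = new Map<string, number>();

  /** Called on every page write. */
  bump(pageId: string): void {
    this.globalGen += 1;
    this.pageGen.set(pageId, this.page(pageId) + 1);
  }
  global(): number {
    return this.globalGen;
  }
  page(pageId: string): number {
    return this.pageGen.get(pageId) ?? 0;
  }
}

interface CacheEntry<T> {
  value: T;
  globalGen: number;
  pageGens: Array<[string, number]>; // generation of each page the answer used
}

class QueryCache<T> {
  private entries = new Map<string, CacheEntry<T>>();
  private clock: GenerationClock;

  constructor(clock: GenerationClock) {
    this.clock = clock;
  }

  set(query: string, value: T, pageIds: string[]): void {
    const pageGens = pageIds.map((id): [string, number] => [id, this.clock.page(id)]);
    this.entries.set(query, { value, globalGen: this.clock.global(), pageGens });
  }

  get(query: string): T | undefined {
    const entry = this.entries.get(query);
    if (!entry) return undefined;
    // Tier 1: nothing anywhere has changed, so the entry is valid as-is.
    if (entry.globalGen === this.clock.global()) return entry.value;
    // Tier 2: something changed; check only the pages this answer used.
    const fresh = entry.pageGens.every(([id, gen]) => this.clock.page(id) === gen);
    if (!fresh) {
      this.entries.delete(query);
      return undefined;
    }
    entry.globalGen = this.clock.global(); // unrelated write; revalidate
    return entry.value;
  }
}
```

The global tier keeps the common case to one integer comparison; the per-page tier avoids throwing away every cached answer on every write.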

A real memory layer. Not a chat log.