Brains turns your inbox, drive, calendar, calls, and conversations into a typed, queryable substrate — indexed by vector embeddings, full-text search, and a knowledge graph, with a background process that keeps it organized.
Connect Gmail, Drive, Calendar, Slack, GitHub — or wire up your own. Brains backfills history, then keeps it fresh — real-time for filesystem changes, polled for inboxes, push for webhooks. Every event is content-hash deduped with a 24-hour window so you never ingest the same thing twice.
Every source emits the same event: a kind, a URI, a content hash, an ingested-at timestamp. The supervisor dedupes on the hash and dispatches downstream jobs for chunking, embedding, and entity extraction.
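As a rough illustration of the event shape and the 24-hour dedupe described above, here is a minimal sketch. The `IngestEvent` and `Deduper` names, the field spellings, and the in-memory map are all assumptions made for the example, not Brains' actual types.

```typescript
// Hypothetical event shape: the four fields come from the text, the names do not.
interface IngestEvent {
  kind: string;        // e.g. "email", "file", "calendar"
  uri: string;         // where the item lives at its source
  contentHash: string; // hash of the item's content
  ingestedAt: number;  // epoch milliseconds
}

const DEDUPE_WINDOW_MS = 24 * 60 * 60 * 1000; // the 24-hour window

class Deduper {
  // contentHash -> time the hash was last accepted
  private seen = new Map<string, number>();

  // Returns true when the event is new and should be dispatched downstream.
  accept(event: IngestEvent): boolean {
    const last = this.seen.get(event.contentHash);
    if (last !== undefined && event.ingestedAt - last < DEDUPE_WINDOW_MS) {
      return false; // same content seen inside the window: drop it
    }
    this.seen.set(event.contentHash, event.ingestedAt);
    return true;
  }
}
```

In this sketch, a supervisor would call `accept` once per event and only queue chunking, embedding, and entity-extraction jobs when it returns `true`.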
Every ingested item becomes a page — a structured object with frontmatter, a normalized body, an extracted timeline, and full provenance. Pages are the unit of retrieval, sharing, and reasoning. The schema is open: extend it with your own page kinds via schema packs.
Each page is chunked into smaller passages, and every passage is indexed for vector similarity and full-text search in parallel. Image-bearing pages get a multimodal index too — so a query like "the dashboard mockup" finds the actual screenshot, not just text near it.
Brains walks your pages and writes typed edges between entities (people, companies, deals, events), using relationships like attended, works_at, invested_in, founded, and advises. The graph powers entity-aware retrieval, "find experts" walks, and trajectory queries across time, so the AI can answer "who connects me to X?" or "what did Sarah work on before joining Y?"
When a page lands, Brains scans it for entity patterns (canonical company list, person heuristics, link extraction) and writes typed edges to your graph. Every edge is joinable, walkable, auditable.
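To make "walkable" concrete, here is a small sketch of a typed edge and a depth-limited walk that could answer a "who connects me to X?" question. The `Edge` shape, the `findPath` helper, and the choice to treat edges as undirected are assumptions for illustration only.

```typescript
// Relation names are taken from the text; the Edge shape is hypothetical.
type Relation = "attended" | "works_at" | "invested_in" | "founded" | "advises";

interface Edge {
  from: string;
  rel: Relation;
  to: string;
  sourcePage: string; // page the edge was extracted from, kept for auditing
}

// Breadth-first walk over edges (treated as undirected), up to maxDepth hops.
// Returns the shortest chain of entity names from start to target, or null.
function findPath(edges: Edge[], start: string, target: string, maxDepth: number): string[] | null {
  const neighbors = new Map<string, string[]>();
  const link = (a: string, b: string): void => {
    const list = neighbors.get(a) ?? [];
    list.push(b);
    neighbors.set(a, list);
  };
  for (const e of edges) {
    link(e.from, e.to);
    link(e.to, e.from);
  }

  let frontier: string[][] = [[start]];
  const visited = new Set<string>([start]);
  for (let depth = 0; depth <= maxDepth; depth++) {
    const next: string[][] = [];
    for (const path of frontier) {
      const node = path[path.length - 1]!;
      if (node === target) return path;
      for (const n of neighbors.get(node) ?? []) {
        if (!visited.has(n)) {
          visited.add(n);
          next.push([...path, n]);
        }
      }
    }
    frontier = next;
  }
  return null;
}
```

With edges such as `me attended Demo Day` and `Sarah attended Demo Day`, `findPath(edges, "me", "Sarah", 2)` would return the chain through the shared event.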
Graph queries include find_experts, find_trajectory, and code-graph expansion.
Brains answers a query by fusing three signals (vector embeddings, full-text search, and a knowledge graph walk) with Reciprocal Rank Fusion. The fused list is re-scored with boosts for compiled-truth, backlinks, salience, and recency, then optionally reranked by a cross-encoder. A typical brain returns results in 20–50 ms.
Retrieval lanes run in parallel against the same chunk corpus. Reciprocal Rank Fusion (K=60) merges the lanes by summing 1/(K + rank) for each result across them; boosts then adjust the fused score for known-good signals (compiled-truth gets 2.0×, recent dailies decay aggressively, evergreen concepts stay flat). An optional cross-encoder rerank improves precision on hard queries.
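The fusion step can be sketched in a few lines. This is the standard Reciprocal Rank Fusion formula with K=60; the function names are hypothetical, and applying boosts as multipliers on the fused score is an assumption suggested by the "2.0×" wording, not a confirmed detail.

```typescript
const RRF_K = 60;

// Each lane is a ranked list of chunk ids, best first.
// A chunk's fused score is the sum of 1 / (k + rank) over every lane it appears in.
function reciprocalRankFusion(lanes: string[][], k: number = RRF_K): Map<string, number> {
  const scores = new Map<string, number>();
  for (const lane of lanes) {
    lane.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return scores;
}

// Multiply each fused score by its boost (default 1), then sort best first.
// Assumption: boosts are multiplicative, e.g. 2.0 for a compiled-truth chunk.
function rankWithBoosts(scores: Map<string, number>, boosts: Map<string, number>): string[] {
  return Array.from(scores.entries())
    .map(([id, score]): [string, number] => [id, score * (boosts.get(id) ?? 1)])
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

For example, with a vector lane `["a", "b", "c"]`, a full-text lane `["b", "a"]`, and a graph lane `["c", "b"]`, chunk `b` wins on fusion alone because it appears in all three lanes; a 2.0 boost on `c` would move `c` to the top.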
Graph edges (attended, works_at, invested_in) are walked at depth 2.
Brains is architected so your memory never leaves your trust boundary by accident. Row-level security scopes pages per source; OAuth clients see only the sources their token allows; remote MCP callers get a privacy-stripped view that hides anything fenced as private.
Every read carries a sourceId resolved from the caller's token. Cross-source enumeration is impossible from the API layer, not just discouraged.
Tokens carry an allowedSources[] list. A token scoped to ['shared'] can never read your private brain — even if it asks the right question.
Remote MCP requests (ctx.remote === true) skip private facts and per-token allow-lists entirely. Local CLI sees full fences. The boundary is architectural.
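The scoping rules above might look roughly like this in code. The `Fact`, `Page`, `Token`, and `RequestContext` shapes and the `visibleFacts` helper are hypothetical; only `allowedSources`, `sourceId`, and `ctx.remote` come from the text. The sketch also assumes the allow-list check applies to local and remote callers alike.

```typescript
interface Fact {
  text: string;
  isPrivate: boolean; // true when the fact sits inside a private fence
}

interface Page {
  sourceId: string;
  facts: Fact[];
}

interface Token {
  allowedSources: string[];
}

interface RequestContext {
  token: Token;
  remote: boolean; // true for remote MCP callers, false for the local CLI
}

// Returns the fact texts this caller may see on the given page.
function visibleFacts(ctx: RequestContext, page: Page): string[] {
  // Source scoping: a token can only read sources on its allow-list.
  if (ctx.token.allowedSources.indexOf(page.sourceId) === -1) {
    return [];
  }
  // Remote callers get the privacy-stripped view; local callers see everything.
  return page.facts
    .filter((fact) => !(ctx.remote === true && fact.isPrivate))
    .map((fact) => fact.text);
}
```

A token scoped to `["shared"]` gets an empty result for a page in any other source, whatever the query.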
Brains never sends user pages to LLM providers for training. Embedding and synthesis calls run on your API keys, with prompts you can audit. Bring your own model.
Pages marked deleted_at stay recoverable with include_deleted: true. After the grace window, autopilot hard-deletes them, so no zombie rows remain.
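A soft-delete scheme of this kind can be sketched as two filters. The `PageRow` shape and both function names are hypothetical, and the grace window is passed in as a parameter because its actual length is not stated here.

```typescript
interface PageRow {
  id: string;
  deleted_at: number | null; // epoch milliseconds, or null when the page is live
}

// Default reads hide soft-deleted rows; include_deleted: true brings them back.
function listPages(rows: PageRow[], opts: { include_deleted?: boolean } = {}): PageRow[] {
  return rows.filter((row) => row.deleted_at === null || opts.include_deleted === true);
}

// What a background purge might do: keep live rows and rows still inside
// the grace window, and drop everything older for good.
function purgeExpired(rows: PageRow[], now: number, graceWindowMs: number): PageRow[] {
  return rows.filter((row) => row.deleted_at === null || now - row.deleted_at < graceWindowMs);
}
```

Until the purge runs past the grace window, a soft-deleted page is only hidden, never lost.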
A two-tier generation clock (global + per-page) invalidates the semantic query cache the instant a page changes. Hot reads stay sub-millisecond; you never see yesterday's answer.
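One plausible reading of a two-tier generation clock is sketched below: every write bumps a global counter and a per-page counter, and a cached answer is reused only if the counters it recorded are still current. All class and method names are hypothetical, and keying the cache by the raw query string is a simplification of whatever a semantic cache actually keys on.

```typescript
class GenerationClock {
  private global = 0;
  private perPage = new Map<string, number>();

  // Call on every page write: bumps both tiers.
  touch(pageId: string): void {
    this.global += 1;
    this.perPage.set(pageId, this.pageGen(pageId) + 1);
  }

  globalGen(): number {
    return this.global;
  }

  pageGen(pageId: string): number {
    return this.perPage.get(pageId) ?? 0;
  }
}

interface CacheEntry {
  answer: string;
  globalGen: number;
  pageGens: Array<[string, number]>; // pages the answer was built from
}

class QueryCache {
  private entries = new Map<string, CacheEntry>();
  private clock: GenerationClock;

  constructor(clock: GenerationClock) {
    this.clock = clock;
  }

  put(query: string, answer: string, pageIds: string[]): void {
    this.entries.set(query, {
      answer,
      globalGen: this.clock.globalGen(),
      pageGens: pageIds.map((id): [string, number] => [id, this.clock.pageGen(id)]),
    });
  }

  get(query: string): string | undefined {
    const entry = this.entries.get(query);
    if (entry === undefined) return undefined;
    // Tier 1: nothing has changed anywhere since the entry was cached.
    if (entry.globalGen === this.clock.globalGen()) return entry.answer;
    // Tier 2: something changed; the entry survives only if its own pages did not.
    const fresh = entry.pageGens.every(([id, gen]) => this.clock.pageGen(id) === gen);
    if (!fresh) {
      this.entries.delete(query);
      return undefined;
    }
    entry.globalGen = this.clock.globalGen(); // re-arm the fast path
    return entry.answer;
  }
}
```

The global tier keeps the common case to a single integer comparison. Note that this sketch does not account for a newly written page becoming relevant to an old query; a real implementation would need its own rule for that.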
Pages, indexes, graph, and autopilot — running on your data, with your keys.