get_indexed_file
Get Indexed File · read-only · non-destructive · idempotent · closed-world
Retrieve the complete indexed file by source_file, assembled in source order.
Unlike search_content which caps per-source chunks at MAX_CHUNKS_PER_SOURCE (2),
this tool returns all chunks for an exact source_file path, joined in
chunk_index order. Use when you already know the canonical source_file (from
search_content or search_unified) and need the full document.
Returns: str
Source: backend/src/engram/mcp/tools/context.py
Parameters
Exact source_file value (e.g., ‘projects/engram/investment-memo.md’, ‘github:B3dmar/b3dmar-hq:projects/engram/investment-memo.md’).
Optional life-context filter (‘personal’ or ‘work’). When set, the tool rejects source_files whose path does not match the scope’s configured prefixes.
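The assembly behaviour described above (all chunks for one exact source_file, joined in chunk_index order) can be sketched as follows. The chunk dictionaries and field names here are illustrative assumptions, not the real store schema in context.py.

```python
# Hypothetical sketch of get_indexed_file's assembly step: collect every
# chunk whose source_file matches exactly, sort by chunk_index, and join.
def assemble_file(chunks: list[dict], source_file: str) -> str:
    """Return the full document for `source_file` in source order."""
    matching = [c for c in chunks if c["source_file"] == source_file]
    matching.sort(key=lambda c: c["chunk_index"])  # restore source order
    return "\n".join(c["text"] for c in matching)

# Chunks may be stored out of order; exact-path match excludes other files.
chunks = [
    {"source_file": "a.md", "chunk_index": 1, "text": "second part"},
    {"source_file": "a.md", "chunk_index": 0, "text": "first part"},
    {"source_file": "b.md", "chunk_index": 0, "text": "other file"},
]
```

Note the exact-match contrast with search_content, which caps how many chunks any single source contributes to a result set.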
reindex
Reindex · writes · non-destructive · idempotent · open-world
Re-index markdown files from CONTENT_ROOT into the document vector store.
Refreshes search_content results; does not touch memories.
Returns: str
Source: backend/src/engram/mcp/tools/context.py
Parameters
Optional specific file or directory to reindex (relative to CONTENT_ROOT). If not provided, reindexes everything.
search_content
Search Content · read-only · non-destructive · idempotent · open-world
Semantic search over indexed documents (markdown files, GitHub repos) — not memories.
For memory/commitment search use search_memories instead.
A similarity_floor of 0.3 is applied by default (#2755): indexed
corpora contain enough generic text that an unrelated query (e.g. a
topic never mentioned in any indexed file) used to surface README
fragments at ~0.1 cosine similarity as top hits, which LLM callers then
treated as authoritative and confabulated answers from. Returning an
empty set is the correct behaviour — it means “no relevant matches”.
Set include_low_confidence=True to bypass the floor and inspect the
long tail (debugging, audit, or deliberate broad browse).
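The floor semantics above can be sketched as a simple filter. The result rows and field names are assumptions for illustration; the real logic lives in context.py.

```python
# Illustrative sketch of the similarity_floor behaviour: rows below the
# floor are dropped unless the caller opts into the long tail.
def apply_floor(results: list[dict], floor: float = 0.3,
                include_low_confidence: bool = False) -> list[dict]:
    """Drop weak matches; an empty list is a valid 'no relevant matches'."""
    if include_low_confidence or floor == 0.0:
        return results  # floor disabled: return everything, tail included
    return [r for r in results if r["similarity"] >= floor]

hits = [{"similarity": 0.82}, {"similarity": 0.31}, {"similarity": 0.11}]
```

With the default floor the 0.11 README-fragment-style hit is dropped rather than surfaced as a top result.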
Each result row carries both raw similarity (float in [0,1]) and a
corpus-relative relevance label ("high" / "medium" /
"low"). Prefer branching on relevance — raw cosine values are
not portable across environments. high means “top of this query’s
distribution”, low means “tail noise for this query”.
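A caller following the guidance above would branch on the relevance label rather than a raw cosine threshold. This is a minimal sketch; the row shape is an assumption.

```python
# Hedged sketch: filter on the corpus-relative relevance label, which is
# portable across environments, instead of raw cosine similarity values.
def usable_results(rows: list[dict]) -> list[dict]:
    """Keep rows at the top or middle of this query's distribution."""
    return [r for r in rows if r["relevance"] in ("high", "medium")]

rows = [
    {"relevance": "high", "similarity": 0.82},
    {"relevance": "low", "similarity": 0.31},    # tail noise for this query
    {"relevance": "medium", "similarity": 0.55},
]
```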
Response shape: the raw function returns formatted markdown for human
consumption; when invoked via MCP, the transport layer attaches a
structuredContent payload matching SearchContentResponse
({results, total, query, index_size, sources_searched, filters_applied}) so agent hosts can distinguish “unseeded index”
(index_size == 0) from “valid query, no match”
(index_size > 0 and total == 0) without a second round-trip.
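An agent host can make that distinction from the structuredContent payload alone. A sketch, assuming the SearchContentResponse keys listed above:

```python
# Illustrative branching on the structured payload: an unseeded index and a
# valid-but-empty query both yield total == 0, so index_size disambiguates.
def interpret(payload: dict) -> str:
    if payload["index_size"] == 0:
        return "unseeded index"          # nothing has been indexed yet
    if payload["total"] == 0:
        return "valid query, no match"   # empty set is a real answer
    return "matches found"
```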
Returns: str
Source: backend/src/engram/mcp/tools/context.py
Parameters
Natural language search query
Max results to return (default 8)
Optional filter by path substring (e.g., ‘career/’, ‘daily-notes/’)
Optional life context filter (‘personal’ or ‘work’)
Filter by source (‘github’ for repo content, ‘local’ for local files)
Drop results whose similarity is below this threshold, applied before ordering. Range [0.0, 1.0], default 0.3. Zero disables the filter. Ignored when include_low_confidence is True.
If True, bypass similarity_floor and return the full tail, including weak matches. Use for debugging or a deliberate broad browse; not recommended for agent responses.