LLM Wiki

A pattern for building personal knowledge bases using LLMs.

This is an idea file. It is designed to be copy-pasted into your own LLM agent (e.g. OpenAI Codex, Claude Code, OpenCode / Pi, etc.). Its goal is to communicate the high-level idea; your agent will build out the specifics in collaboration with you.

The core idea

Most people’s experience with LLMs and documents looks like RAG: you upload a collection of files, and at query time the LLM retrieves relevant chunks and generates an answer. This works, but the LLM is rediscovering knowledge from scratch on every question. There’s no accumulation. Ask a subtle question that requires synthesizing five documents, and the LLM has to find and piece together the relevant fragments every time. Nothing is built up. NotebookLM, ChatGPT file uploads, and most RAG systems work this way.

The idea here is different. Instead of just retrieving from raw documents at query time, the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files that sits between you and the raw sources. When you add a new source, the LLM doesn’t just index it for later retrieval. It reads it, extracts the key information, and integrates it into the existing wiki — updating entity pages, revising topic summaries, noting where new data contradicts old claims, strengthening or challenging the evolving synthesis. The knowledge is compiled once and then kept current, not re-derived on every query.

This is the key difference: the wiki is a persistent, compounding artifact. The cross-references are already there. The contradictions have already been flagged. The synthesis already reflects everything you’ve read. The wiki keeps getting richer with every source you add and every question you ask.

You never (or rarely) write the wiki yourself — the LLM writes and maintains all of it. You’re in charge of sourcing, exploration, and asking the right questions. The LLM does all the grunt work — the summarizing, cross-referencing, filing, and bookkeeping that makes a knowledge base actually useful over time. In practice, I have the LLM agent open on one side and Obsidian open on the other. The LLM makes edits based on our conversation, and I browse the results in real time — following links, checking the graph view, reading the updated pages. Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase.

This can apply to a lot of different contexts. A few examples:

  • Personal: tracking your own goals, health, psychology, self-improvement
  • Research: going deep on a topic over weeks or months
  • Reading a book: building a companion wiki (characters, themes, plot threads)
  • Business/team: internal wiki fed by Slack threads, meeting transcripts, project docs
  • Competitive analysis, due diligence, trip planning, course notes, hobby deep-dives

Architecture

Three layers:

  • Raw sources — immutable source documents (articles, papers, images, data files). The LLM reads from them but never modifies them.
  • The wiki — LLM-generated markdown files (summaries, entity pages, concept pages, comparisons, index, synthesis). The LLM owns this layer entirely.
  • The schema — a document (e.g. CLAUDE.md or AGENTS.md) that tells the LLM how the wiki is structured and which conventions and workflows to follow.
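The rule that raw sources are never modified can be enforced mechanically instead of being left to trust. The sketch below is one way to do it, assuming the sources live in a raw/ directory and that a hypothetical manifest file (raw.manifest.json, kept outside raw/) stores a SHA-256 digest per source. New files are allowed, since that is how sources get added; edited or deleted sources are reported, and the manifest is only refreshed when nothing was violated. The function and file names are illustrative, not part of any existing tool.

```python
import hashlib
import json
from pathlib import Path


def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def snapshot(raw_dir: Path) -> dict[str, str]:
    """Map every file under raw_dir (as a relative POSIX path) to its digest."""
    return {
        p.relative_to(raw_dir).as_posix(): hash_file(p)
        for p in sorted(raw_dir.rglob("*"))
        if p.is_file()
    }


def check_immutable(raw_dir: Path, manifest_path: Path) -> list[str]:
    """Compare raw_dir against the saved manifest (hypothetical convention).

    Returns the sources that were modified or deleted since the manifest was
    last written. Newly added files are not violations; they are recorded when
    the manifest is rewritten, which happens only if the check passes.
    """
    current = snapshot(raw_dir)
    previous: dict[str, str] = {}
    if manifest_path.exists():
        previous = json.loads(manifest_path.read_text(encoding="utf-8"))
    # A missing path yields None, so deletions are caught as well as edits.
    violations = [path for path, digest in previous.items() if current.get(path) != digest]
    if not violations:
        manifest_path.write_text(json.dumps(current, indent=2, sort_keys=True), encoding="utf-8")
    return sorted(violations)
```

Calling check_immutable(Path("raw"), Path("raw.manifest.json")) after each ingest, or from a git pre-commit hook, would catch an agent that rewrote a source in place.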

Operations

  • Ingest — drop a source into raw/ and tell the LLM to process it. The LLM reads it, discusses it with you, writes a summary page, updates the index, updates the relevant wiki pages (typically 10-15), and logs an entry. Prefer ingesting one source at a time, with a human in the loop.
  • Query — ask questions against the wiki. The LLM searches the index, reads the relevant pages, and synthesizes an answer with citations. Good answers can be filed back into the wiki as new pages.
  • Lint — a periodic health check for contradictions, stale claims, orphan pages, missing pages, missing cross-references, and data gaps.
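Part of the lint pass is purely structural and can be computed before the LLM reads anything. The sketch below finds orphan pages (no inbound links) and missing pages (links to pages that don't exist yet), assuming Obsidian-style [[wikilinks]] and that each page is identified by its file name without the .md extension. Exempting index and log from the orphan check is a choice made for this sketch, and all names in it are hypothetical. Contradictions, stale claims, and data gaps still need the LLM's judgment; a report like this is just a starting list to hand it.

```python
import re
from pathlib import Path

# Obsidian-style wikilinks: [[Target]], [[Target|alias]], [[Target#Heading]].
# Group 1 captures only the target page name.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")

# Hub pages that are not expected to have inbound links (assumption of this sketch).
EXEMPT = {"index", "log"}


def load_pages(wiki_dir: Path) -> dict[str, str]:
    """Read every markdown file under wiki_dir, keyed by file stem.

    Assumes stems are unique across subfolders.
    """
    return {p.stem: p.read_text(encoding="utf-8") for p in wiki_dir.rglob("*.md")}


def lint(pages: dict[str, str]) -> dict[str, list[str]]:
    """Return orphan pages and link targets that have no page yet."""
    inbound: dict[str, set[str]] = {name: set() for name in pages}
    missing: set[str] = set()
    for source, text in pages.items():
        for match in WIKILINK.finditer(text):
            target = match.group(1).strip()
            if target == source:
                continue  # a page linking to itself does not count
            if target in pages:
                inbound[target].add(source)
            else:
                missing.add(target)
    orphans = [name for name, sources in inbound.items() if not sources and name not in EXEMPT]
    return {"orphans": sorted(orphans), "missing": sorted(missing)}
```

For example, lint(load_pages(Path("wiki"))) returns a dict whose "orphans" list names pages nothing links to and whose "missing" list names pages the wiki refers to but has not written yet.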

Indexing and logging

  • index.md — a content-oriented catalog. Every page is listed with a link and a one-line summary, organized by category. The LLM updates it on every ingest.
  • log.md — an append-only chronological record. Starting each entry with a consistent prefix makes the log easy to parse with grep.
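The sketch below shows what a consistent prefix buys you. It assumes a hypothetical entry format, "## [YYYY-MM-DD] operation | title", appended to log.md and never edited afterwards; the format and the function names are illustrative only. Because every entry starts the same way, a few lines of Python (or plain grep) can recover the history, and any free-form notes between entries are simply skipped.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

# Hypothetical entry format: "## [2024-05-01] ingest | Title of the source".
# Any fixed prefix works; what matters is that every entry uses the same one.
ENTRY = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")


def append_entry(log_path: Path, op: str, title: str, when: Optional[date] = None) -> str:
    """Append one entry to the log without touching earlier lines; return it."""
    when = when or date.today()
    line = f"## [{when.isoformat()}] {op} | {title}"
    with log_path.open("a", encoding="utf-8") as f:  # "a" mode only ever appends
        f.write(line + "\n")
    return line


def entries(log_path: Path, op: Optional[str] = None) -> list[tuple[str, ...]]:
    """Parse (date, operation, title) tuples, optionally filtered by operation."""
    found = []
    for raw in log_path.read_text(encoding="utf-8").splitlines():
        m = ENTRY.match(raw)  # lines that are not entries are ignored
        if m and (op is None or m.group(2) == op):
            found.append(m.groups())
    return found
```

From a shell, grep '^## \[' log.md | tail -n 5 lists the five most recent entries in the same format.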

Optional: CLI tools

At larger scale, once the index alone is no longer enough to locate the relevant pages, consider a local search engine such as qmd, which offers hybrid BM25/vector search with LLM re-ranking.
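To give a feel for what the keyword half of such an engine does, here is a minimal Okapi BM25 ranker over wiki pages held in memory. It is a stand-in for illustration, not qmd's implementation or API: it has no vector search, no re-ranking, and no stemming, and it uses the commonly cited defaults k1 = 1.5 and b = 0.75. A page's score sums, over the query terms, each term's inverse document frequency times a saturated, length-normalized term frequency.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase ASCII word tokens; a real engine would also stem and drop stopwords."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 over an in-memory {page name: text} mapping (illustrative sketch)."""

    def __init__(self, pages: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.names = list(pages)
        self.docs = [Counter(tokenize(pages[n])) for n in self.names]
        self.lengths = [sum(d.values()) for d in self.docs]
        self.avgdl = sum(self.lengths) / max(len(self.docs), 1)
        # Document frequency: how many pages contain each term at least once.
        self.df = Counter(term for d in self.docs for term in d)

    def idf(self, term: str) -> float:
        n, df = len(self.docs), self.df.get(term, 0)
        # The "+ 1" inside the log keeps IDF positive even for very common terms.
        return math.log((n - df + 0.5) / (df + 0.5) + 1)

    def search(self, query: str, top_k: int = 5) -> list[tuple[str, float]]:
        """Return up to top_k (page name, score) pairs, best match first."""
        results = []
        for name, tf, dl in zip(self.names, self.docs, self.lengths):
            score = 0.0
            for term in tokenize(query):
                f = tf.get(term, 0)
                if f:
                    # Longer-than-average pages are penalized via b; k1 caps repeat counts.
                    norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                    score += self.idf(term) * f * (self.k1 + 1) / norm
            if score > 0:
                results.append((name, score))
        return sorted(results, key=lambda r: r[1], reverse=True)[:top_k]
```

For example, BM25(pages).search("associative trails") returns only pages containing at least one query term, ranked by score.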

Tips and tricks

  • Obsidian Web Clipper browser extension for converting web articles to markdown
  • Download images locally (Obsidian Settings → Files and links → Attachment folder path)
  • Graph View for seeing the shape of your wiki
  • Marp for markdown-based slide decks
  • Dataview plugin for queries over page frontmatter
  • The wiki is just a git repo of markdown files, so you get version history, branching, and collaboration for free

Why this works

The tedious part of maintaining a knowledge base is not the reading or the thinking — it’s the bookkeeping. Updating cross-references, keeping summaries current, noting when new data contradicts old claims, maintaining consistency across dozens of pages. Humans abandon wikis because the maintenance burden grows faster than the value. LLMs don’t get bored, don’t forget to update a cross-reference, and can touch 15 files in one pass.

The human’s job is to curate sources, direct the analysis, ask good questions, and think about what it all means. The LLM’s job is everything else.

The idea is related in spirit to Vannevar Bush’s Memex (1945) — a personal, curated knowledge store with associative trails between documents. The part he couldn’t solve was who does the maintenance. The LLM handles that.

Note

This document is intentionally abstract. It describes the idea, not a specific implementation. Everything mentioned above is optional and modular — pick what’s useful, ignore what isn’t. The right way to use this is to share it with your LLM agent and work together to instantiate a version that fits your needs.