
Run your first loop in 15 minutes

By the end of this tutorial you’ll have stood up a sovereign knowledge base, served it to an agent over MCP, and watched a query come back with the exact edges it traversed — the explainable retrieval that makes Dossier’s memory trustworthy. The whole path runs from this repository, offline.

We’ll build it in four steps: provision a tenant, serve a real OKF knowledge base, query it back, and curate it. Then you’ll learn how to ingest your own sources.

You’ll need:

  • Node.js ≥ 22 and git on your PATH. Check with node --version and git --version.

  • pnpm (this is a pnpm workspace).

  • A clone of the Dossier repository, built once:

    Terminal window
    git clone https://github.com/twofoldtech-dakota/dossier.git
    cd dossier
    pnpm install
    pnpm build

    pnpm build compiles the CLIs you’ll run below (dossier-runtime, dossier-mcp). Run every command from the repository root.
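If you would rather script the Node.js check than read the version by eye, the comparison can be sketched as below. The helper name `meetsMinimumNode` is made up for illustration and is not part of Dossier; the only fact taken from this tutorial is the Node.js ≥ 22 requirement.

```typescript
// Hypothetical helper (not part of Dossier): decide whether the text printed
// by `node --version` satisfies the tutorial's "Node.js >= 22" requirement.
function meetsMinimumNode(versionOutput: string, minimumMajor: number = 22): boolean {
  // `node --version` prints something like "v22.3.0"; capture the major part.
  const match = /^v?(\d+)\.\d+\.\d+/.exec(versionOutput.trim());
  if (match === null) {
    return false; // Unrecognised output is treated as "not new enough".
  }
  return Number(match[1]) >= minimumMajor;
}

console.log(meetsMinimumNode("v22.3.0")); // true
console.log(meetsMinimumNode("v20.11.1")); // false
```

Inside a Node.js script you could pass `process.version`, which holds the same string that `node --version` prints.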

  1. Provision a tenant.

    A tenant is one client’s siloed workspace: its own OKF repo, a manifest, and — by default — its own git history. This runs fully offline.

    Terminal window
    node ./packages/runtime/dist/cli.js provision \
      --root ./clients \
      --client acme-co

    You’ll see the silo created and a JSON record on stdout:

    dossier-runtime: provisioned "acme-co" at ./clients/acme-co (vcs:git).

    Confirm it landed:

    Terminal window
    node ./packages/runtime/dist/cli.js list --root ./clients

  2. Serve a real knowledge base over MCP.

    To see retrieval work right now, serve the DXA reference knowledge base that ships with the repo — 53 real OKF atoms with 174 typed edges, no network, no key:

    Terminal window
    node ./packages/mcp/dist/cli.js \
      --repo ./verticals/digital-experience-agency \
      --client dxa \
      --known-external-ids knowledge-model

    The server announces what it loaded, then waits for MCP requests on stdio:

    dossier-mcp: serving tenant "dxa" — 53 atom(s), 174 edge(s), 0 load error(s), 0 graph error(s).
    dossier-mcp: connected on stdio — awaiting MCP requests.

    This is a live Model Context Protocol server. It exposes five tools over one tenant’s repo: search_concepts, get_concept, get_related, list_concepts, and kb_health.

  3. Query your knowledge back — explainably.

    Point any MCP client at the server. The fastest way to see it is to add it to Claude Code and ask a question in plain language:

    Terminal window
    claude mcp add dossier-dxa -- \
      node ./packages/mcp/dist/cli.js \
      --repo ./verticals/digital-experience-agency \
      --client dxa \
      --known-external-ids knowledge-model

    Then, in Claude Code, ask: “Search the dxa knowledge base for the discovery process, then show me what it’s related to.” Claude calls search_concepts, then get_related, and the answer comes back with the typed edges traversed — for example:

    search_concepts("discovery process") →
      dxa-discovery (process) score 3.04
      dxa-discovery-report (artifact) score 2.99
    get_related("dxa-discovery", depth 1) → 9 neighbours, via edges:
      dxa-discovery —[owner]→ dxa-solution-architect
      dxa-discovery —[uses]→ dxa-work-management
      dxa-discovery —[produces]→ dxa-discovery-report

    That —[produces]→ edge is the point: the answer isn’t a similarity guess, it’s a walk of the real graph, and you can see why each neighbour came back. That’s explainable GraphRAG.

  4. Curate — keep the human in the loop.

    The atoms you just queried are plain files in verticals/digital-experience-agency/. Open one — say processes/dxa-discovery.md — and you’ll see its frontmatter: confidence, source, and the typed edges (owner, uses, produces). To curate is to edit the file and commit it: change a fact, promote an atom from inferred to verified, fix an edge. The repo is the system of record, so curation is just a git commit. There’s no separate database to keep in sync.
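To picture what get_related does in step 3, and why fixing an edge in step 4 changes what comes back, here is a minimal in-memory sketch of a graph walk that reports the edges it crossed. The atom ids and edge types are the three shown in the step 3 example. The `Edge` type, the `walkRelated` function, and the choice to follow only outgoing edges are assumptions made for illustration, not Dossier's implementation.

```typescript
// Toy model of "retrieval as a graph walk". The ids and edge types come from
// the example output in step 3; everything else here is an assumption.
type Edge = { from: string; type: string; to: string };
type Walk = { neighbours: string[]; via: Edge[] };

const edges: Edge[] = [
  { from: "dxa-discovery", type: "owner", to: "dxa-solution-architect" },
  { from: "dxa-discovery", type: "uses", to: "dxa-work-management" },
  { from: "dxa-discovery", type: "produces", to: "dxa-discovery-report" },
];

// Breadth-first walk over outgoing edges, up to `depth` hops from `start`.
// Every edge crossed is recorded, so each neighbour can be explained.
function walkRelated(start: string, depth: number, graph: Edge[]): Walk {
  const visited: string[] = [start];
  const neighbours: string[] = [];
  const via: Edge[] = [];
  let frontier: string[] = [start];
  for (let hop = 0; hop < depth; hop++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const edge of graph) {
        if (edge.from !== id) continue;
        via.push(edge);
        if (visited.indexOf(edge.to) === -1) {
          visited.push(edge.to);
          neighbours.push(edge.to);
          next.push(edge.to);
        }
      }
    }
    frontier = next;
  }
  return { neighbours: neighbours, via: via };
}

const result = walkRelated("dxa-discovery", 1, edges);
for (const e of result.via) {
  console.log(e.from + " -[" + e.type + "]-> " + e.to);
}
```

This toy graph holds only the three edges quoted above, so it returns three neighbours rather than the nine the real knowledge base reports. Every neighbour in the result is paired with the typed edge that reached it, which is the property that makes the answer explainable.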

You’ve proven the serve-and-query half of the loop. The other half — turning your raw documents into atoms — is the extract step, and it’s the one place a model runs.

The run verb does the whole loop in one command: ingest a source → extract OKF → emit atoms → commit. Because extraction calls a model, it needs a transport. Pick the one you have:

If you have the claude CLI on your PATH and an active subscription, extract with --subscription — no API key:

Terminal window
node ./packages/runtime/dist/cli.js run \
  --root ./clients \
  --client acme-co \
  --source-dir ./my-docs \
  --subscription

Here, ./my-docs is a directory of clean markdown or HTML. (Prefer to ingest a public website instead? Swap --source-dir ./my-docs for --url https://example.com; the default web crawler needs no API key.)

When it finishes, your tenant’s OKF repo holds freshly extracted atoms, committed as one diff in the git history of ./clients/acme-co/. Serve it the same way you served the reference KB in step 2; just point --repo at your tenant’s OKF repo:

Terminal window
node ./packages/mcp/dist/cli.js \
  --repo ./clients/acme-co/okf \
  --client acme-co
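Once the server is waiting on stdio, a client talks to it in JSON-RPC 2.0 messages, one JSON object per line. The sketch below only builds the opening messages of such a session so you can see their shape; it does not start or contact the server. The method names (`initialize`, `notifications/initialized`, `tools/list`, `tools/call`) come from the Model Context Protocol. The client name, the protocol revision string, and the `query` argument name for search_concepts are assumptions; the real argument names are whatever the server's `tools/list` response declares in each tool's input schema.

```typescript
// Shape of a JSON-RPC 2.0 message as used by MCP. Requests carry an id;
// notifications do not.
type JsonRpcMessage = {
  jsonrpc: "2.0";
  id?: number;
  method: string;
  params?: Record<string, unknown>;
};

// Build a request, leaving out `params` entirely when there are none.
function rpcRequest(id: number, method: string, params?: Record<string, unknown>): JsonRpcMessage {
  const message: JsonRpcMessage = { jsonrpc: "2.0", id: id, method: method };
  if (params !== undefined) {
    message.params = params;
  }
  return message;
}

const session: JsonRpcMessage[] = [
  // 1. Handshake. The revision string and client name are assumptions.
  rpcRequest(1, "initialize", {
    protocolVersion: "2024-11-05",
    capabilities: {},
    clientInfo: { name: "tutorial-client", version: "0.0.0" },
  }),
  // 2. Notification that the client is ready (no id, so no reply expected).
  { jsonrpc: "2.0", method: "notifications/initialized" },
  // 3. Ask which tools exist and what arguments each one takes.
  rpcRequest(2, "tools/list"),
  // 4. Call a tool. The "query" argument name is a guess; check tools/list.
  rpcRequest(3, "tools/call", {
    name: "search_concepts",
    arguments: { query: "discovery process" },
  }),
];

// The stdio transport is newline-delimited: one message per line.
const wire = session.map((m) => JSON.stringify(m)).join("\n") + "\n";
console.log(wire);
```

In practice an MCP client such as Claude Code sends these messages for you, so this is only useful for seeing what travels over the pipe.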

That’s the full loop: ingest → extract → OKF → serve → query → curate, and every pass is a diff in your own git history.

You now have a clients/acme-co/okf/ directory of OKF atoms in your own git (cat-able, git clone-able, yours forever) and a live, explainable retrieval server over it. The indexes are caches; the files are the truth.
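The idea that an index is only a cache can be made concrete with a small sketch: if a lookup structure is computed purely from file contents, it can be thrown away and rebuilt at any time with the same result. Everything below is illustrative. The two atom bodies are invented stand-ins for files on disk, and `buildIndex` is a hypothetical function, not how Dossier indexes a repo.

```typescript
// Stand-in for atom files on disk: atom id -> body text. Real atoms are
// files in the OKF repo; these two bodies are invented for illustration.
const atomFiles: { [id: string]: string } = {
  "dxa-discovery": "Discovery process that produces the discovery report",
  "dxa-discovery-report": "Report artifact produced by discovery",
};

// Hypothetical index builder: maps each lowercase word to the ids of the
// atoms that contain it. It reads nothing except the file contents.
function buildIndex(files: { [id: string]: string }): { [token: string]: string[] } {
  // A prototype-free object, so words like "constructor" are plain keys.
  const index: { [token: string]: string[] } = Object.create(null);
  const ids = Object.keys(files).sort(); // fixed order keeps output stable
  for (const id of ids) {
    const tokens = files[id].toLowerCase().split(/\W+/).filter((t) => t.length > 0);
    for (const token of tokens) {
      if (index[token] === undefined) {
        index[token] = [];
      }
      if (index[token].indexOf(id) === -1) {
        index[token].push(id);
      }
    }
  }
  return index;
}

const firstBuild = buildIndex(atomFiles);
const rebuilt = buildIndex(atomFiles); // as if the cache had been deleted
console.log(JSON.stringify(firstBuild) === JSON.stringify(rebuilt)); // true
```

Because nothing in the index exists that the files do not already say, losing it costs a rebuild and no knowledge.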