Workbench

Status: Accepted
Date: 2026-06-07
Build target: v0.5.0
Related: ADR 0002 (Rust Core), ADR 0004 (Progressive Ontology), ADR 0005 (Adjacency Index), ADR 0007 (Epistemic Model)

Context

GraphForge is positioned throughout its documentation as a lightweight, embedded, local-first knowledge analysis workbench — not a graph database, not a server, not a distributed system. The graph is the load-bearing core, but it is one asset inside a larger analytical project, not the whole product.

That positioning appears in the requirements, the overview, and the v0.5 refactor reference, but it has never been turned into an architectural decision. There is no named boundary between:

the graph (nodes, edges, properties, traversal, Cypher), and
the knowledge an analyst accretes around the graph (provenance, confidence, evidence, epistemic status), and
the workbench experience the analyst works through (the analyst verbs, search, workflows, exploration).

Two concrete symptoms show the cost of the missing boundary:

Schema drift toward the graph. The node fact schema documented in execution-model.md carried props_json, confidence, provenance_id, valid_from_ts, and valid_to_ts — graph, knowledge, and temporal concerns fused into one row. The shipped storage schema (crates/gf-storage/src/schemas.rs) had already moved away from this (topology = identity + type only; properties split out), but the doc lagged because there was no rule about where these concerns belong.
Knowledge concerns with no home. Provenance (gf-provenance) is a 17-line stub; confidence is hardcoded to 1.0; there is no model for evidence or for how an analyst's understanding evolves. These are not graph concerns, but with no knowledge layer to own them they have drifted onto graph tables as unowned columns.
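The first symptom can be made concrete with a minimal Rust sketch. It is illustrative only: the struct and function names are hypothetical, `u128` stands in for a UUID type, and `node_uuid` / `node_type` are assumed identity columns. Only the fused field names (`props_json`, `confidence`, `provenance_id`, `valid_from_ts`, `valid_to_ts`) come from the execution-model.md schema described above, and the sketch does not reproduce the shipped `schemas.rs`. It shows one fused row separating into graph-layer topology and properties rows plus a knowledge-layer row that points back by UUID.

```rust
/// Stand-in for a real UUID type (assumption, keeps the sketch std-only).
type Uuid = u128;

/// The fused row as documented in execution-model.md. `node_uuid` and
/// `node_type` are assumed identity columns; the rest are the documented fields.
#[derive(Debug, Clone)]
struct FusedNodeFact {
    node_uuid: Uuid,
    node_type: String,
    props_json: String,
    confidence: f64,
    provenance_id: Option<Uuid>,
    valid_from_ts: Option<i64>,
    valid_to_ts: Option<i64>,
}

/// Graph layer: identity + type only.
#[derive(Debug, Clone, PartialEq)]
struct TopologyRow {
    node_uuid: Uuid,
    node_type: String,
}

/// Graph layer: properties, split out from topology.
#[derive(Debug, Clone, PartialEq)]
struct PropertiesRow {
    node_uuid: Uuid,
    props_json: String,
}

/// Knowledge layer: confidence, provenance, and valid-time, referencing the node by UUID.
#[derive(Debug, Clone, PartialEq)]
struct KnowledgeRow {
    node_uuid: Uuid,
    confidence: f64,
    provenance_id: Option<Uuid>,
    valid_from_ts: Option<i64>,
    valid_to_ts: Option<i64>,
}

/// Split one fused row into its graph-layer and knowledge-layer parts.
fn split_fused(row: FusedNodeFact) -> (TopologyRow, PropertiesRow, KnowledgeRow) {
    let topology = TopologyRow {
        node_uuid: row.node_uuid,
        node_type: row.node_type,
    };
    let properties = PropertiesRow {
        node_uuid: row.node_uuid,
        props_json: row.props_json,
    };
    let knowledge = KnowledgeRow {
        node_uuid: row.node_uuid, // a reference back to the graph object, nothing more
        confidence: row.confidence,
        provenance_id: row.provenance_id,
        valid_from_ts: row.valid_from_ts,
        valid_to_ts: row.valid_to_ts,
    };
    (topology, properties, knowledge)
}
```

Every output row carries the same `node_uuid`, which is the only thing the knowledge row needs from the graph.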

Without an explicit layering, every new capability risks being implemented as more columns on the graph — contaminating graph semantics, bloating the hot path, and eroding the lightweight model.

This ADR establishes the layering and, critically, the boundary rule that keeps the graph graph-native.

Decision

GraphForge is organised into three layers. Each owns distinct concerns, and lower layers never depend on higher ones.

┌──────────────────────────────────────────────────────────────────┐
│  WORKBENCH LAYER                                                   │
│  Analyst experience: forge.rank / cluster / paths / analyze /      │
│  similar / find · hybrid search · workflows / recipes ·            │
│  exploration · project portability                                 │
│  — consumes the layers below; holds NO graph-semantic state        │
└───────────────────────────────┬──────────────────────────────────┘
                                 │ consumes (Arrow + UUID references)
┌───────────────────────────────┴──────────────────────────────────┐
│  KNOWLEDGE LAYER                                                   │
│  provenance · confidence · evidence · ontology-inference lineage · │
│  epistemic assertions + status + supersession + valid-time         │
│  (ADR 0007)                                                        │
│  — attaches to graph objects BY UUID REFERENCE ONLY                │
└───────────────────────────────┬──────────────────────────────────┘
                                 │ references (UUID), never embeds
┌───────────────────────────────┴──────────────────────────────────┐
│  GRAPH LAYER                                                       │
│  nodes · edges · properties · traversal · pattern matching ·       │
│  graph algorithms · adjacency (ADR 0005)                           │
│  — graph-native; surrogate-keyed execution; UUID identity          │
│  — stores NO knowledge or workbench semantics                      │
└────────────────────────────────────────────────────────────────────┘

Layer responsibilities

| Layer | Owns | Crates / storage (today) | Must NOT hold |
|---|---|---|---|
| Graph | Nodes, edges, properties, traversal, pattern matching, graph algorithms, adjacency index | `gf-cypher`, `gf-ir`, `gf-rel`, `gf-exec`, `gf-storage` (`topology/`, `properties/`, `indexes/adjacency/`) | Confidence semantics, epistemic status, evidence, reasoning |
| Knowledge | Provenance events + lineage, confidence + policy, evidence links, ontology-inference lineage, epistemic assertions/status/supersession/valid-time | `gf-provenance` (to be de-stubbed) + a knowledge module; `provenance/`, `knowledge/` Parquet | Graph topology, traversal logic, Cypher semantics |
| Workbench | Analyst verbs, hybrid search, workflows/recipes, exploration, project envelope | `gf-api`, bindings, search modules | Graph-semantic state, persisted knowledge logic |

The boundary rule (load-bearing)

Knowledge attaches to the graph by UUID reference, never by embedding.

The graph layer stores identity (*_uuid), type, topology, and properties. It does not store what a fact means epistemically, how confident we are, what evidence supports it, or how belief evolved.
The knowledge layer stores those concerns in its own tables, each row referencing graph objects by their *_uuid. A node carries a node_uuid; a confidence score, a provenance event, or an epistemic assertion about that node lives in the knowledge layer and points back via node_uuid.
The workbench layer consumes both lower layers and returns its results as Arrow. It holds no persisted graph-semantic state; a verb is a function over the graph and knowledge layers, not a new place to store graph meaning.
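The three statements above can be sketched in Rust under stated assumptions: the module, struct, and function names (`graph`, `knowledge`, `workbench`, `ConfidenceRecord`, `confident_nodes`) are hypothetical, `u128` stands in for a UUID type, and the verb returns plain tuples rather than Arrow to stay dependency-free. The knowledge module imports only the graph layer's identity type, and the workbench verb recomputes its answer from both layers on every call without storing anything.

```rust
/// Graph layer: identity, type, topology, properties. Knows nothing about the layers above.
mod graph {
    /// Stand-in for a real UUID type (assumption).
    pub type Uuid = u128;

    pub struct Node {
        pub node_uuid: Uuid,
        pub node_type: String,
    }

    pub struct Graph {
        pub nodes: Vec<Node>,
    }
}

/// Knowledge layer: its own table, pointing at graph objects by UUID only.
mod knowledge {
    // The only thing borrowed from the graph layer is the identity type.
    use super::graph::Uuid;

    pub struct ConfidenceRecord {
        pub node_uuid: Uuid, // a reference, never an embedded node
        pub score: f64,
    }

    pub struct Knowledge {
        pub confidence: Vec<ConfidenceRecord>,
    }
}

/// Workbench layer: verbs are functions over the layers below; no persisted state.
mod workbench {
    use super::graph::{Graph, Uuid};
    use super::knowledge::Knowledge;
    use std::collections::HashMap;

    /// Hypothetical verb: nodes whose confidence meets `min_score`.
    pub fn confident_nodes(graph: &Graph, knowledge: &Knowledge, min_score: f64) -> Vec<(Uuid, String, f64)> {
        // Join knowledge to graph by UUID. If several records reference the
        // same node, the last one wins in this simplified sketch.
        let scores: HashMap<Uuid, f64> = knowledge
            .confidence
            .iter()
            .map(|r| (r.node_uuid, r.score))
            .collect();
        graph
            .nodes
            .iter()
            .filter_map(|n| {
                let score = *scores.get(&n.node_uuid)?; // nodes without knowledge are skipped
                (score >= min_score).then(|| (n.node_uuid, n.node_type.clone(), score))
            })
            .collect()
    }
}
```

Within one Rust crate, modules can import each other in either direction, so the sketch only illustrates the intended dependency direction. Keeping layers in separate crates would let Cargo enforce it, since a lower crate would simply not list a higher one as a dependency; that is a possible enforcement mechanism, not a description of the current workspace.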

Consequence for graph-native execution

Cypher execution, traversal, and graph algorithms operate only on the graph layer. The presence or absence of knowledge-layer data (provenance, confidence, epistemic history) must never change the result of a graph-native query. This property is testable and is enforced by a boundary regression test (see ADR 0007): a graph with full epistemic history returns the same Cypher results as the same graph without it.
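A boundary regression check of this kind can be sketched in Rust. This is a hypothetical shape, not the ADR 0007 test itself: a two-step walk over an adjacency map stands in for Cypher execution, `Project`, `Assertion`, `two_hop`, and `boundary_holds` are illustrative names, and `u128` stands in for a UUID. Because the executor receives the whole project, the check is meant to catch a future change that lets knowledge-layer data alter a graph-native result.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Stand-in for a real UUID type (assumption).
type Uuid = u128;

/// Graph layer: adjacency from each node to its outgoing neighbours.
#[derive(Clone, Default)]
struct Graph {
    adjacency: BTreeMap<Uuid, BTreeSet<Uuid>>,
}

/// Knowledge layer: an assertion about a graph object, referenced by UUID.
#[derive(Clone)]
#[allow(dead_code)]
struct Assertion {
    subject_uuid: Uuid,
    confidence: f64,
}

/// Hypothetical project handle holding both layers.
#[allow(dead_code)] // `knowledge` is deliberately never read by graph execution
struct Project {
    graph: Graph,
    knowledge: Vec<Assertion>,
}

/// Stand-in for graph-native execution: nodes reachable by a two-step walk from
/// `start`. It must read only `project.graph`, never `project.knowledge`.
fn two_hop(project: &Project, start: Uuid) -> BTreeSet<Uuid> {
    let adj = &project.graph.adjacency;
    let mut out = BTreeSet::new();
    for mid in adj.get(&start).into_iter().flatten() {
        out.extend(adj.get(mid).into_iter().flatten().copied());
    }
    out
}

/// The boundary check: the same graph with and without knowledge-layer data
/// must produce identical results.
fn boundary_holds(graph: &Graph, knowledge: Vec<Assertion>, start: Uuid) -> bool {
    let bare = Project { graph: graph.clone(), knowledge: Vec::new() };
    let enriched = Project { graph: graph.clone(), knowledge };
    two_hop(&bare, start) == two_hop(&enriched, start)
}
```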

"Which layer does X belong to?" checklist

When adding a capability, ask in order:

1. Is it about nodes, edges, properties, traversal, pattern matching, or graph algorithms? → Graph layer. Keep it surrogate-keyed and graph-native. Do not add knowledge columns.
2. Is it about where a fact came from, how confident we are, what supports it, or how the analyst's understanding evolved? → Knowledge layer. Store it in its own table, referencing graph objects by UUID, never as a column on a topology table.
3. Is it about how the analyst works (a verb, a search, a workflow, an exploration, a result shape)? → Workbench layer. Implement it as a consumer of the layers below; persist no graph semantics.

If a capability seems to span layers, split it: the graph part stays in the graph layer, the meaning/evidence part goes to the knowledge layer, the experience part goes to the workbench.
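The checklist is an ordered decision procedure, so it can be written down directly. The Rust sketch below is illustrative only: `Layer`, `Capability`, and `placement` are hypothetical names, and the three booleans paraphrase the three questions. A capability that answers yes to more than one question is split, one part per layer, in checklist order.

```rust
/// The three layers from the decision above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Layer {
    Graph,
    Knowledge,
    Workbench,
}

/// Hypothetical description of a proposed capability: which concerns it touches.
struct Capability {
    /// Question 1: nodes, edges, properties, traversal, pattern matching, graph algorithms.
    graph_structure: bool,
    /// Question 2: origin, confidence, supporting evidence, evolution of understanding.
    meaning_or_evidence: bool,
    /// Question 3: verbs, search, workflows, exploration, result shapes.
    analyst_experience: bool,
}

/// Ask the questions in order; each "yes" contributes one part of the split.
fn placement(c: &Capability) -> Vec<Layer> {
    let mut parts = Vec::new();
    if c.graph_structure {
        parts.push(Layer::Graph);
    }
    if c.meaning_or_evidence {
        parts.push(Layer::Knowledge);
    }
    if c.analyst_experience {
        parts.push(Layer::Workbench);
    }
    parts
}
```

For example, a hypothetical confidence-weighted paths verb would set all three flags: the traversal goes to the graph layer, the confidence lookup to the knowledge layer, and the verb itself to the workbench.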

Consequences

Positive

Graph stays lean and fast. The hot traversal path reads only topology/ (and the adjacency index), never knowledge-layer tables. This protects the lightweight embedded model.
Knowledge becomes a real subsystem, not unowned columns — a prerequisite for making provenance, confidence, and the epistemic model (ADR 0007) real in v0.5.0.
Clear extension story. Future capabilities have an obvious home, reducing the risk of graph-semantic contamination.
Testable boundary. "Knowledge never changes graph results" is a concrete regression test.

Negative / Risks

More tables and a join (graph ↔ knowledge by UUID) where previously a column would have sufficed. Mitigated by: knowledge tables are capability-gated and optional; the graph hot path never reads them; UUID joins are bounded and local.
A discipline cost: contributors must classify each capability by layer. Mitigated by the checklist above and the boundary regression test.

Lightweight-embedded impact

No negative impact. The layering is conceptual and storage-organisational; it introduces no server, no service, and no infrastructure. The knowledge and workbench layers are local Parquet plus in-process code, the same as the graph layer. If anything, keeping knowledge out of the graph hot path protects embedded performance.

Alternatives Considered

| Alternative | Rejected because |
|---|---|
| No explicit layering (status quo) | Knowledge concerns drift onto graph tables as unowned columns; graph semantics get contaminated; the hot path bloats. The two concrete symptoms described in Context are the result. |
| Two layers (graph + everything else) | Conflates "what an analyst believes" (knowledge) with "how an analyst works" (workbench). Workbench verbs would end up owning persistence logic that belongs to the knowledge layer. |
| Knowledge as graph properties | Embeds confidence/provenance/epistemic status as node/edge columns. Violates the boundary rule, slows traversal, and collapses the graph to its current state, which rules out the epistemic model. |
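Why property storage collapses the graph to its current state can be shown with a minimal Rust sketch. Names are hypothetical, the property map holds only `f64` values for brevity, and the append-only log is deliberately simplified; the real assertion, supersession, and valid-time model is defined in ADR 0007. Writing confidence as a property overwrites the previous value, while knowledge-layer rows keyed by `node_uuid` keep every recorded value.

```rust
use std::collections::HashMap;

/// Stand-in for a real UUID type (assumption).
type Uuid = u128;

/// Rejected alternative: confidence stored as a node property (f64-only map for brevity).
struct NodeWithConfidence {
    properties: HashMap<String, f64>,
}

impl NodeWithConfidence {
    /// Overwrites the property: the previous confidence value is lost.
    fn set_confidence(&mut self, score: f64) {
        self.properties.insert("confidence".to_string(), score);
    }
}

/// Chosen direction (simplified): append-only knowledge rows referencing the node by UUID.
struct ConfidenceAssertion {
    node_uuid: Uuid,
    score: f64,
    recorded_at: i64,
}

/// Records a new assertion without touching earlier ones.
fn record(log: &mut Vec<ConfidenceAssertion>, node_uuid: Uuid, score: f64, recorded_at: i64) {
    log.push(ConfidenceAssertion { node_uuid, score, recorded_at });
}

/// Every recorded (time, score) pair for one node, oldest first.
fn history(log: &[ConfidenceAssertion], node_uuid: Uuid) -> Vec<(i64, f64)> {
    let mut h: Vec<(i64, f64)> = log
        .iter()
        .filter(|a| a.node_uuid == node_uuid)
        .map(|a| (a.recorded_at, a.score))
        .collect();
    h.sort_by_key(|entry| entry.0);
    h
}
```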

References

Architecture Overview — layer diagram and per-layer sections
Architecture Refactor v0.5 — layering subsection, §6 knowledge layer
Storage Architecture — provenance/ and knowledge/ tables
ADR 0002: Rust Core — crate layout
ADR 0005: Adjacency Index — a graph-layer derived accelerator
ADR 0007: Epistemic Model — the knowledge-layer epistemic extension