Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased

In Development — v0.5.0 (Rust Core, `rust-core` branch)

Clause-ordered mixed-write statements (#792 Step 2) — one statement may now combine CREATE/DELETE/SET/REMOVE clauses (same or different kinds): a single read prefix runs once and each write clause applies in clause order against that frontier, extended with the variables CREATE clauses mint so later clauses target created entities like matched ones. openCypher interactions hold: a plain DELETE sees edges created in-statement (error without DETACH; DETACH cancels the pending edge before it ever hits disk), created-then-deleted entities never reach disk but count in both counters, SET/REMOVE on a created entity edits the write buffer, writing to a deleted entity errors, and repeated deletes are no-ops. A value expression reading a property written earlier in the statement is rejected loudly (cross-clause property visibility is a follow-up), as is a read clause after a write clause (WITH execution is a pre-existing gap). Mixed statements return a one-row six-counter summary (nodes_created/edges_created/nodes_deleted/edges_deleted/properties_set/properties_removed); single-write statements keep their existing paths and result schemas byte-for-byte. Bonus fixes: multiple same-kind clauses (two SETs, two CREATEs) used to pass the old guard and die in lowering — they now sequence, and two CREATE clauses no longer mint colliding node_id surrogates (one shared writer per statement).
All-or-nothing Parquet writes (#790) — every write in gf-storage stages a sibling temp file and atomically renames it into place, so an I/O failure mid-write (disk full, encode error) can never leave a truncated file. The multi-file rewrites (DELETE across topology/property files, SET/REMOVE across stems, the CREATE append-merge flush, and a #792 mixed statement's whole effect set) stage into one RewriteBatch and commit once — topology/nodes.parquet renames last on the delete path so even a (rare) rename-phase failure leaves at worst orphaned-but-unreferenced rows, never a deleted node with surviving edges. Failures while building replacement files leave the prior on-disk state byte-identical; durability (fsync) remains out of scope for this non-production engine.
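The stage-then-rename pattern described for #790 can be sketched in a few lines. This is a minimal illustration in Python, not the gf-storage implementation: the function name `atomic_write` and the `.staged-` prefix are hypothetical, and, like the engine, the sketch does not fsync.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write data to path so readers see either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Stage next to the destination so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".staged-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # os.replace renames over the destination in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # A failure before the rename leaves the prior file byte-identical.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A multi-file commit would stage every replacement first and only then run the renames, which is the role the entry assigns to RewriteBatch.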
Named path variables over variable-length patterns: nodes(p) / relationships(p) / length(p) (#754, first slice) — MATCH p = (a)-[:KNOWS*1..2]->(b) now binds p and the three path functions execute end-to-end. The binder records the path's constituent variables and rewrites the calls at bind time (p itself never reaches the plan): relationships(p) is the #709 relationship-list column verbatim, length(p) is its element count (the hop count; 0-hop → 0), and nodes(p) lowers to a new cypher_path_nodes UDF that recovers the traversal node sequence per row by walking the edge list from the start node's uuid (edges store src/dst in storage orientation, but every BFS emission is a connected walk, so each next node is deterministically the edge's other endpoint — correct for In/Undirected traversals and self-loops). nodes(p) returns List<Struct{node_uuid}>, mirroring the edge-list element shape so node properties/labels can be added later without changing the container kind. Fixed single-hop paths (MATCH p = (a)-[:KNOWS]->(b), including an explicit *1..1) compose the same three functions from the join's scalar columns (length(p) → constant 1, UInt64 like the var-length form; nodes(p) → the two endpoint structs in traversal order; relationships(p) → a one-element list carrying the topology fields with the bind-time rel_type — no edge properties yet, a documented gap vs the var-length list). Bare RETURN p returns one Arrow Struct{nodes, relationships} column per path (named p when unaliased), assembled at projection from the same expressions. All path values null-propagate: an unmatched OPTIONAL MATCH row's p, nodes(p), relationships(p), and length(p) are Cypher null (the fixed-hop forms gate on the edge's edge_uuid; the var-length forms inherit it from the list column — whose registration now survives OPTIONAL MATCH, fixing a latent var-map gap). 
Deferred (rejected with a clear message): multi-segment path patterns; edge-property parity for the fixed-segment relationship struct and node property/label augmentation are follow-ups.
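The node-sequence recovery behind nodes(p) can be illustrated with a small Python model. It assumes edges are given as (src, dst) pairs in storage orientation and that the list is a connected walk; `path_nodes` is a hypothetical name, not the cypher_path_nodes UDF itself.

```python
def path_nodes(start, edges):
    """Recover the traversal node sequence of a connected walk.

    edges holds (src, dst) pairs in storage orientation, which may run
    against the traversal direction for In/Undirected patterns.
    """
    nodes = [start]
    current = start
    for src, dst in edges:
        if current == src:
            current = dst  # forward hop (also covers self-loops)
        elif current == dst:
            current = src  # hop taken against storage orientation
        else:
            raise ValueError("edge list is not a connected walk")
        nodes.append(current)
    return nodes
```

In this model length(p) is simply `len(edges)`, so a 0-hop path yields a single node and a length of 0.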
Property SET / REMOVE write execution with runtime-expression values (#791) — SET n.prop = <expr> and REMOVE n.prop now execute end-to-end for both node and edge targets (previously a bind error). The value is a full runtime expression evaluated per matched row (SET n.age = n.age + 1, SET a.x = b.y), not a baked literal: the lowerer turns it into a DataFusion Expr (resolving against the matched-row schema — the eager property join from #784/#789 makes var_<n>.<prop> available), and the new GraphSetExec evaluates it per batch via create_physical_expr(...).evaluate(...) (the UnwindExec pattern), converting each row's result to an IrLiteral (scalar_to_ir_literal). Writes rewrite the property files in place via new gf-storage primitives (set_node_properties / remove_node_properties / set_edge_properties_rewrite / remove_edge_properties) that reuse the writer's decode → merge → re-infer cycle — SET on a propertyless node mints a fresh row; REMOVE of an absent key/uuid is a no-op (openCypher). Node file stems resolve per row from type_id (entity name, or _untyped in Exploratory mode); edge stems from the row's rel_type_name. The #792 guard extends to SET/REMOVE (no mixing with CREATE; one rewrite kind per query for now), and execute_stream rejects them. Deferred (each rejected with a clear message): SET n:Label / REMOVE n:Label (needs a foundational multi-label storage model — a follow-up issue); SET n += {…} / n = {…} map merge/replace; SET/REMOVE on an untyped edge; mixed/multi-rewrite ordering; non-scalar stored values.
Edge properties on variable-length relationship-list elements (#755) — MATCH (a)-[r:KNOWS*1..2]->(b) RETURN r[0].since now reads the edge's persisted properties. The var-length edge-list struct (List<Struct<{edge_uuid, src_uuid, dst_uuid, rel_type}>>) gains the relation's property columns: the lowerer discovers them from edge_properties/<REL>.parquet (the same dynamic schema the fixed-hop join_edge_properties uses) and bakes the field list into the node schema (the single source of truth); VarLenExpandExec materialises the values by take-ing each property column at the row matching each hop's edge_uuid (NULL for an edge with no property row — LEFT-join semantics). The access lowering needed no change — r[i].<prop> already composes to get_field(array_element(rels, fixup(i)), "<prop>"). A wildcard (*) or a relation with no persisted properties keeps the topology-only 4-field struct (byte-identical — no plan-shape change). r[i].nonexistent_prop remains a plan error (not openCypher NULL) — out of scope.
Reject mixing CREATE/MERGE with DELETE in one statement (#792, step 1) — a query that combines a buffered-append write (CREATE/MERGE) with an in-place rewrite write (DELETE), e.g. MATCH (a) CREATE (b) DELETE a, was silently mis-handled: the gf-api write router classified the plan as a single write kind and ran only the DELETE side, dropping the CREATE. It now returns a clear GfError::Validation ("a single query may not mix CREATE/MERGE with DELETE yet; run them as separate statements") — a guard at the top of run_plan. Neither side runs on rejection; unmixed CREATE-only / MERGE-only / DELETE-only statements are unaffected. Proper clause-ordered sequencing of mixed read+write remains the larger follow-up (#792 step 2).
Inline relationship-property predicates filter in MATCH (#750) — MATCH (a)-[r:KNOWS {since:2020}]->(b) now matches only edges with since = 2020 (previously the inline map was silently dropped and every KNOWS edge matched). The binder lowers the relationship's inline property map to AND-ed equality predicates and emits a GraphOp::Filter after the Expand — the relationship analogue of the node case (#748), reusing the same variable-generic lower_inline_property_filter. The read side already materialises var_<rel>.<prop> for a fixed hop (#784's join_edge_properties), so this is a binder-only change. Guarded to fixed hops: a variable-length edge var binds to a list column, not scalar properties, so inline props there stay a no-op (that's #755). No logical-plan goldens change.
Properties on a fixed-expand destination node now resolve (#789) — MATCH (a:Person)-[:KNOWS]->(b:Person {name:'Bob'}) RETURN a.node_uuid (and WHERE b.age > 28, RETURN b.name, ORDER BY b.x) previously failed to plan with "No field named var_N.name": the destination of a fixed single-hop expand was lowered as an already-bound NodeScan that filtered by label but never joined its property table, so var_<dst>.<prop> was never materialised (only the leading node's properties were joined, #748/#704). join_node_properties is now append-preserving — it passes every existing input column through (so the source + edge columns of the joined Expand plan survive) and appends the destination's re-qualified property columns — and the already-bound NodeScan arm calls it after the type filter. Fixes the whole class of destination-property access, not only inline {prop:val} filters. No logical-plan goldens shift (the optimizer prunes the property join when no destination property is referenced).
DELETE / DETACH DELETE execution (#740) — MATCH (n) DELETE n and DETACH DELETE now run end-to-end. New GraphDeleteExec drains the matched rows, collects the target UUIDs (node vs edge by the resolved kind), and rewrites the affected Parquet files via the storage mutator. openCypher semantics are enforced: a node may be deleted without DETACH only if every relationship still incident to it is also deleted by the same statement — so MATCH (a)-[r]->(b) DELETE r, a is legal, while leaving an untargeted edge on a deleted node raises an ExecutionError ("Cannot delete node, because it still has relationships…"); DETACH DELETE folds the node's incident edges into the delete set and removes them too. DELETE of a NULL (e.g. an unmatched OPTIONAL MATCH row) is a no-op per openCypher, not an error. Wired through the ExtensionPlanner dispatch, a new ExecutionSession::execute_delete, and the gf-api write router (GraphOp::Delete → the write path; rejected on the streaming path like other writes). The errors.feature no-DETACH scenario now runs against a real connected graph (its given step built an empty forge before). SET/REMOVE remain deferred (still reject at bind).
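The DETACH rule above reduces to a set comparison. The following Python sketch assumes edges are available as a mapping from edge id to its two endpoint ids; `plan_delete` is a hypothetical helper, and the real executor works on UUID columns rather than Python sets.

```python
def plan_delete(node_targets, edge_targets, edges, detach=False):
    """Return the edge ids to delete, or raise if a node would keep an edge.

    edges maps edge id -> (src node id, dst node id). None targets, such as
    an unmatched OPTIONAL MATCH row, are ignored.
    """
    nodes = {n for n in node_targets if n is not None}
    doomed = {e for e in edge_targets if e is not None}
    incident = {
        edge_id
        for edge_id, (src, dst) in edges.items()
        if src in nodes or dst in nodes
    }
    if detach:
        doomed |= incident  # DETACH folds incident edges into the delete set
    elif incident - doomed:
        raise RuntimeError("Cannot delete node, because it still has relationships")
    return doomed
```

With edges `{"r1": ("a", "b")}`, deleting node `a` together with `r1` passes the check, while deleting `a` alone raises unless `detach=True`.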
DELETE — logical node + write-gated lowering (#740) — new GraphDeleteNode (gf-plan) mirroring GraphCreateNode: an input-driven write node carrying the resolved DeleteTargets (var + node/edge kind) and the detach flag, emitting a one-row nodes_deleted/edges_deleted summary. gf-rel lowers GraphOp::Delete beside Create, gated on the write target (new_for_writes) so a read-only session still rejects it. Each target's node-vs-edge kind is resolved from the input plan's schema (var_<n>.node_uuid vs var_<n>.edge_uuid), so the executor reads the right identity column. Physical execution follows.
DELETE — IR + binder lowering (#740) — DELETE/DETACH DELETE now lower to a new GraphOp::Delete { vars, detach } instead of failing at bind time with UnsupportedClause. The binder resolves each delete target to a bound variable (node or edge); a non-variable target (DELETE n.prop) is a new InvalidDeleteTarget bind error and an undeclared target stays UndeclaredVariable — only DELETE <var> is valid per openCypher. Whether a target is a node or edge is resolved downstream at execution time. (SET/REMOVE/CALL still reject at bind for now.) Physical execution follows.
DELETE — storage rewrite primitive (#740) — new gf-storage::mutator module with delete_nodes, delete_edges, and incident_edge_uuids. Unlike the append-only GraphWriter, these rewrite committed Parquet files in place: read the current rows, drop the targeted ones (by node_uuid/edge_uuid via a keep-mask + filter_record_batch), and write the survivors back; an untouched file is never rewritten. delete_nodes also drops the deleted nodes' rows from properties/*.parquet (and delete_edges from edge_properties/*.parquet). incident_edge_uuids maps target node UUIDs → surrogate node_ids and scans the edge files for edges touching them (either endpoint) — the basis for the openCypher "no relationships without DETACH" check and for DETACH's incident-edge cleanup. The binder/lowering/exec wiring lands in follow-ups.
Edge-property persistence — write path (#784) — CREATE (a)-[:KNOWS {since: 2020}]->(b) now persists edge properties instead of erroring with "CREATE edge properties are not yet supported". GraphWriter::set_edge_properties(edge_uuid, rel_type, props) mirrors the node-property path but is keyed by edge_uuid and written to a dedicated edge_properties/REL_TYPE.parquet directory (routed by relation name in every mode, so a relation type never collides with a same-named node label under properties/). The dynamic per-stem schema inference (first-seen column order, first-non-null type, mixed-type→Utf8 coercion, null-first inference) and the read-merge-rewrite append cycle are shared with the node path; the CREATE executor mints the edge UUID and persists the buffered props. On the read side, a fixed single-hop expand LEFT-joins the edge scan with edge_properties/<rel>.parquet on edge_uuid (new EdgePropertyTable provider, mirroring PropertyTable), so MATCH (a)-[r:KNOWS]->(b) RETURN r.since resolves the property to a real column — the edge analogue of inline node-property reads. (The variable-length edge-list var r in [r:KNOWS*1..2] still carries only topology, no properties — that's #755.) Two hardening guards: a CREATE edge that carries properties but no relation type is rejected (its props would route to _untyped where the relation-name-keyed read can never reach them); and a property whose name collides with a topology column (created_at, src_id, …) is dropped from the join rather than building a duplicate-qualified field that breaks the MATCH plan (the topology column stays authoritative; symmetric guard added to the node-property join).
Relationship-list access on the variable-length edge var (#743) — list/relationship functions now work on r (the List<Struct> edge list from #709), lowered to DataFusion in resolve_builtin: indexing r[i] (0-based, negative-from-end, null on out-of-range — via array_element with a CASE 0→1-based fixup), slicing r[i..j]/r[i..]/r[..j] (0-based, end-exclusive, null-unbounded → DataFusion's 1-based inclusive array_slice, with the unbounded end cast to Int64), head(r)/last(r), type(r[i]) (reads the element's rel_type), and size() as a new polymorphic cypher_size scalar UDF that dispatches on the argument's runtime type (list element count or string char count) rather than mis-mapping size("str") to array_length. These also work on plain list literals ([10,20,30][-1], size('hello')). Rust-core parity with the Python reference's list ops. Deferred (correctness-blocked, follow-ups filed): startNode/endNode (#753, need node materialization), relationships(path)/nodes(path) (#754, need named path variables), r[i].<prop> (#755, need edge-property persistence).
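The index fixup can be modelled in plain Python. This sketch assumes a 1-based element lookup that accepts negative indices and yields null when out of range, as the entry describes; `array_element` here is a toy stand-in, and `slice_bounds` covers only non-negative bounds.

```python
def fixup(i):
    """Map a Cypher 0-based index to 1-based; negatives pass through."""
    return i + 1 if i >= 0 else i


def array_element(items, k):
    """Toy 1-based lookup: negative k counts from the end, None if out of range."""
    if 0 < k <= len(items):
        return items[k - 1]
    if k < 0 and -k <= len(items):
        return items[k]
    return None


def cypher_index(items, i):
    """Evaluate items[i] with Cypher semantics on top of the 1-based lookup."""
    return array_element(items, fixup(i))


def slice_bounds(i, j):
    """Turn 0-based end-exclusive [i, j) into 1-based inclusive bounds."""
    return i + 1, j
```

For example, `cypher_index([10, 20, 30], -1)` returns 30, and the Cypher slice `[0..2]` becomes the 1-based inclusive range (1, 2).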
CREATE streams its input instead of materializing it (#747) — GraphCreateExec now drains its child plan batch-by-batch (via DataFusion's partition-coalescing execute_stream) and writes each batch as it arrives, instead of collect-ing the whole MATCH/UNWIND frontier first. A large mixed MATCH … CREATE no longer materializes the full frontier in memory. The single GraphWriter is opened once and flushed once, so per-row reference/mint semantics and the write summary are unchanged. (VarLenExpand/OptionalMatch/Unwind still collect — their kernels genuinely need the full input.)
Inline node-property predicates filter in MATCH (#748) — MATCH (a:Person {name:'Alice'}) now matches only Persons named Alice (previously the inline map was silently dropped and every Person matched). The binder lowers the property map to AND-ed equality predicates (a.k = v AND …, keys sorted for a stable plan) and emits a GraphOp::Filter after the node scan — the same shape a WHERE clause produces. Inline relationship properties (-[r:KNOWS {since:2020}]->) remain deferred (#750) until edge properties are persisted.
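As a rough illustration of the lowering, an inline map becomes a conjunction of equality tests with keys in sorted order. The helper names below are hypothetical and the predicate is rendered as text only for readability; the binder emits a GraphOp::Filter, not a string.

```python
def lower_inline_properties(var, props):
    """Render an inline property map as AND-ed equality predicates, keys sorted."""
    return " AND ".join(f"{var}.{key} = {props[key]!r}" for key in sorted(props))


def matches(row, props):
    """Apply the same conjunction to one row of property values."""
    return all(row.get(key) == value for key, value in props.items())
```

Sorting the keys means `{name: 'Alice', age: 30}` and `{age: 30, name: 'Alice'}` produce the same predicate text.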
Mixed MATCH … CREATE runs the write per matched row (#703) — MATCH (a:Person) CREATE (a)-[:KNOWS]->(b:Person) now creates one new b + one edge per matched a, referencing the matched node's identity rather than minting a duplicate (previously the MATCH was silently discarded and the CREATE ran once). GraphCreateNode/GraphCreateExec are now input-driven: the executor iterates input rows, reads MATCH-bound vars' node_uuid/node_id from the row (via GraphWriter::register_existing_node) and mints fresh nodes/edges per row; the binder marks MATCH-bound CREATE vars as references (CreateNodeSpec::is_reference) and the lowerer folds the preceding pipeline in as the create node's input. A standalone CREATE is driven by the implicit single unit row (creates once, unchanged); an empty MATCH creates nothing. RETURN-after-CREATE, edge properties, and input streaming (vs collect, #747) remain deferred.
Variable-length edge variable binds to the relationship list (#709) — the edge var r in (a)-[r:KNOWS*1..3]->(b) now binds to the openCypher list of relationships along each path, surfaced as a forward-stable Arrow List<Struct<{edge_uuid, src_uuid, dst_uuid, rel_type}>> column (public UUID identity only — no surrogate ids leak). VarLenExpandExec's BFS now tracks the ordered per-path edge sequence (not just a dedup set) and emits it as the trailing list column; the lowerer registers the edge var and length(r) lowers to DataFusion array_length (so MATCH (a)-[r:KNOWS*1..2]->(b) RETURN length(r) returns the hop count per path). size() stays deferred (string vs list polymorphism). Richer access (indexing r[i], type(r[i]), relationships(path), edge properties) is a follow-up (#743) that needs no column-type change.
Reject unimplemented writes + invalid CREATE patterns (#724) — two openCypher semantics that previously "passed" only because execute was a stub now fail loudly. A CREATE relationship with a type disjunction (CREATE (a)-[:KNOWS|LIKES]->(b)) is rejected as a ParseError (a created relationship must name exactly one type; the disjunction is only valid in MATCH, which is unchanged). Unimplemented write/side-effecting clauses — DELETE, SET, REMOVE, CALL — now surface an explicit bind error (GfError::Plan) instead of the binder's previous silent no-op (which made … DELETE n report success while doing nothing). The binder's catch-all arm is now an error rather than a no-op, so any future clause must be wired in deliberately. Full DELETE/DETACH DELETE (and SET/REMOVE) execution semantics are deferred to a follow-up.
Streaming results + cross-session catalog persistence (#725) — new GraphForge::execute_stream/execute_stream_with_params return a SendableRecordBatchStream (backed by ExecutionSession::execute_plan_stream) with the same UUID-only output shaping + schema metadata as execute, so the schema is available before the first batch is pulled (the RecordBatchReader contract the bindings rely on, #587). Writes are rejected (GfError::Validation) — only reads stream. A GraphForge now owns a long-lived multi-thread Tokio runtime so a stream's background tasks (repartition/coalesce) are not cancelled when the call that built it returns; it shuts down in the background on drop, so dropping an instance inside an async context (e.g. the BDD harness) no longer panics. The write path now flushes the shared RuntimeCatalog to topology/runtime_catalog.parquet, so a later GraphForge::new(path) reloads observed labels/relation-types/properties across sessions.
Query parameters + e2e gate complete (#584) — $name placeholders are now bound to values: execute_with_params("… WHERE n.age > $min …", {min: 28}) substitutes the placeholder via DataFusion's LogicalPlan::with_param_values before physical planning. ir_literal_to_scalar (gf-rel) converts IrLiteral → ScalarValue (one source of truth shared with literal lowering); the Parameter placeholder id now carries the $ so DataFusion's named-param binding resolves it. New ExecutionSession::execute_plan_with_params (reads only; writes carry no placeholders). A missing parameter value errors clearly. This was the last open scenario of the #584 end-to-end gate — the full parse → bind → lower → execute → Arrow pipeline is now exercised end-to-end through the public API, closing the #583 Execution-Baseline epic.
Incremental writes (#733) — GraphWriter::flush now appends/merges with existing on-disk data instead of overwriting, so separate GraphForge::execute("CREATE …") calls accumulate (e.g. CREATE A,B then CREATE C → 3 Person). open seeds surrogate node_id/edge_id from the on-disk maximum (monotonic across sessions); node/edge files concat their new rows with the existing ones; property files are decoded back to literals and re-inferred so the dynamic schema evolves across flushes (new columns, cross-flush type widening to Utf8, leading-null columns) using the same rules as a single flush. No cross-session dedup (pure CREATE mints fresh UUIDs; MATCH…CREATE upsert is #703). The merge is not atomic (acceptable for this small-graph engine). Unblocks the #584 incremental-CREATE scenario.
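The inference rules named here and in #784 (first-seen column order, first non-null value decides the type, mixed types widen to a string) can be sketched as follows. Python type names stand in for Arrow types, and leaving an all-null column as `None` is an assumption of this sketch, not documented engine behaviour.

```python
def infer_schema(rows):
    """Infer {column: type name} from a sequence of property dicts."""
    schema = {}
    for row in rows:
        for key, value in row.items():
            schema.setdefault(key, None)  # records first-seen column order
            if value is None:
                continue  # leading nulls do not decide the type
            kind = type(value).__name__
            if schema[key] is None:
                schema[key] = kind  # first non-null value decides
            elif schema[key] != kind:
                schema[key] = "str"  # mixed types widen to string (Utf8)
    return schema
```

Running the same function over the decoded existing rows plus the new rows is what lets a later flush add columns or widen a type.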
End-to-end execution baseline tests (#584) — a gf-api integration suite drives the full pipeline (parse → bind → lower → execute → Arrow) through the public GraphForge::execute against a fixture graph (5 Person, 4 KNOWS, 1 LIKES). Covers simple scans, WHERE, count(), single-/two-hop and variable-length traversal, OPTIONAL MATCH, UNWIND, ORDER BY/LIMIT, and the Arrow result contract (FixedSizeBinary(16) node_uuid, no surrogate node_id/edge_id, query metadata, unique query_id). Parameter injection ($min) and incremental CREATE (the writer overwrites on flush — #733) remain deferred. Milestone 13 read-path exit gate.
count() aggregation in RETURN (#729) — RETURN count(n) / count(*) now work. The binder detects top-level aggregate calls (count/sum/avg/min/max/collect) in a RETURN and emits a GraphOp::Aggregate (non-aggregate items become group-by keys) instead of a Project of an unknown function call. count(*) and count(<node/edge var>) count bound rows (the var is always bound in a MATCH), avoiding a reference to the non-existent bare var_N column; count(DISTINCT expr) maps to a distinct count. Part of the #583 read-path epic.
Exploratory traversal + OPTIONAL projection (#728, #730) — relationship queries now work end-to-end without an ontology. TypedEdgeScan/EdgeScan/var-length Expand resolve relation-type names from the RuntimeCatalog (exposed via GraphCatalog::rel_names()), and CREATE write-lowering does the same so an exploratory edge is written to _exploratory.parquet with its real rel_type_name (previously _UNKNOWN, which no MATCH ()-[:REL]->() filter matched). The catalog no longer registers a typed edges_<rel> table for a runtime relation unless its Parquet file actually exists on disk, so the read path correctly routes exploratory edges through the _exploratory catch-all. OPTIONAL MATCH now registers the optional-side variables it binds, so RETURN m.x over the optional side resolves (was “unbound variable”). Part of the #583 read-path epic; surfaced by the #584 e2e work.
List literals (#714) — IrExpr::ListLiteral now lowers instead of erroring, so UNWIND [1, 2, 3] AS x and UNWIND [] AS x work end-to-end. All-constant lists fold to a ScalarValue::List literal (the form UnwindExec consumes); lists with expression elements lower to a make_array(...) call. Also fixed the lowering base: a source-free pipeline (UNWIND […], RETURN 1) now starts from the single relational unit row rather than a zero-row relation, so it produces output (a pipeline with a scan still starts empty — the scan supplies its own rows). Part of the #583 read-path epic.
Public GraphForge::execute (#719) — the facade is wired into the real parse → bind → lower → execute pipeline and returns an Arrow-backed ExecutionResult. GraphForge::new(Some(path)) loads graphforge.yaml + ontology.yaml (resolving the OntologyMode) and seeds the RuntimeCatalog; new(None) is an in-memory exploratory instance over a temp dir. execute routes CREATE to the write path and reads to execute_plan, sharing one RuntimeCatalog across queries. Results expose UUID identity columns (node_uuid/edge_uuid) — never the surrogate node_id/edge_id — and carry schema metadata (graphforge.query_id, ir_version, ontology_mode, and ontology_version when present). Added ontology_mode() / runtime_catalog() accessors and empty-query validation. Closes the #583 read-path epic's facade PR; execute_stream + runtime-catalog persistence are tracked in #725, and unimplemented write-clause validation (DELETE, multi-type CREATE) in #724.
Property-table JOINs (#704) — RETURN n.name now reads real property values. A labelled node scan LEFT-joins its property table (properties/<Entity>.parquet in strict/advisory mode, properties/_untyped.parquet in exploratory) on node_uuid, projecting each property column re-qualified under the node's var_N so PropertyAccess resolves to the joined column. PropertyAccess IDs resolve to names via the RuntimeCatalog (the binder's PropId space), surfaced through GraphCatalog::prop_names(); the catalog now also registers the exploratory properties__untyped table with its on-disk schema. Part of the #583 read-path epic.
Fixed single-hop joins (#718) — a fixed (a)-[:R]->(b) pattern now lowers to a connected relational join instead of disconnected scans. The binder emits a single Expand { min_hops: 1, max_hops: Some(1) } (the only op carrying the src/dst node vars) for fixed hops, which the lowerer routes to the existing two-join chain; variable-length hops are unchanged. A destination label ((b:Label)) is now applied as a filter on the already-bound node rather than silently dropped (this also fixes the var-length destination label). Surfaced and fixed a latent OPTIONAL MATCH schema bug: now that the optional child is a real join it re-binds shared variables, so the node excludes all of a shared variable's columns from the inner output (not just the join key), avoiding duplicate var_<shared> fields. Part of the #583 read-path epic.
General read execution (#717) — ExecutionSession::execute_plan is wired (lower → physical plan → collect), and read scans now bind their real Parquet-backed catalog providers (TopologyNodeTable/TypedEdgeTable) instead of a schema-only source, so a MATCH (n:Label) reads the rows a prior CREATE wrote. (Schema-only lowering is retained for explain/golden paths that have no project directory.) Part of the #583 read-path epic.
New gf-api crate (#716) — the public GraphForge engine facade now lives in a top-level gf-api crate (depends on the full pipeline: gf-cypher/gf-ir/gf-rel/gf-exec/gf-storage/gf-ontology). It was relocated out of gf-core, which is the foundation crate every other crate depends on and therefore can't host the facade without a dependency cycle. gf-cli and the language bindings now depend on gf-api; the BDD API/TCK harness moved with it. (First step of the #583 read-path epic; facade methods remain stubs until #717–#719.)
UNWIND execution (#582) — UnwindExec evaluates the list expression per input row and explodes it into one output row per element (input columns + the element column), dropping rows whose list is null or empty (openCypher semantics). Routed from the UnwindNode Extension by the query planner. The full UNWIND … RETURN round-trip and UNWIND [literal] lowering await #583 / list-literal lowering.
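The explode step can be pictured with rows as dicts. This is a simplified model of the semantics only; `unwind` is a hypothetical function and the real UnwindExec operates on Arrow batches.

```python
def unwind(rows, list_key, alias):
    """Emit one output row per list element, keeping the input columns."""
    out = []
    for row in rows:
        values = row.get(list_key)
        if not values:  # a null or empty list drops the row
            continue
        for value in values:
            out.append({**row, alias: value})
    return out
```

A row whose list is `[1, 2]` produces two output rows, while rows with `None` or `[]` produce none.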
UNWIND schema plumbing (#582, foundation) — UnwindNode now declares an output schema that extends the input with the unwound element column (named after the alias, unqualified, nullable, so a bare RETURN x resolves). The lowerer resolves the element type from the list expression.
OPTIONAL MATCH execution (#581) — OptionalMatchExec left-joins the outer input against the optional sub-plan on the shared-variable join keys, preserving every outer row and setting the optional-side columns to null when there is no match (openCypher null-shaping, via take with null indices). Handles fan-out (one outer → many inner), multi-key tuples, empty inner, and the unconditional (no-key) case; routed from the OptionalMatchNode Extension by the query planner.
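The null-shaping left join can be modelled with lists of dicts. The sketch below is a nested-loop illustration under that assumption; `optional_match` and its parameters are hypothetical names, and the real operator builds null rows via take with null indices.

```python
def optional_match(outer, inner, keys, inner_columns):
    """Left-join outer against inner on keys, filling misses with None."""
    out = []
    for o in outer:
        # With no keys, all() is vacuously true and every inner row matches.
        hits = [i for i in inner if all(i[k] == o[k] for k in keys)]
        if hits:
            for hit in hits:  # fan-out: one outer row per matching inner row
                out.append({**o, **{c: hit[c] for c in inner_columns}})
        else:
            out.append({**o, **{c: None for c in inner_columns}})
    return out
```

Every outer row appears at least once in the output, which is the property that distinguishes OPTIONAL MATCH from a plain MATCH.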
OPTIONAL MATCH join-key plumbing (#581, foundation) — OptionalMatchNode now carries the shared-variable join_keys and declares an output schema that extends the outer columns with the optional sub-plan's columns made nullable (openCypher null-shaping). The lowerer computes the shared variables (outer ∩ optional) as join keys. Physical execution lands in a follow-up PR; non-empty join keys for real MATCH (a)-[:R]->(b) queries await fixed single-hop join lowering (#583).
Variable-length Expand execution (#580) — VarLenExpandExec performs an iterative BFS over the edge table for patterns like (a)-[:KNOWS*1..3]->(b), with openCypher relationship-isomorphism (no edge reused within a path, so unbounded * terminates on cycles). Handles Out/In/Undirected, bounded and unbounded ranges, and is routed from the VarLenExpandNode Extension by the query planner. The edge variable r (relationship list) is not yet bound — deferred to a follow-up issue.
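A compact model of the traversal shows why relationship isomorphism makes an unbounded range terminate. The sketch handles only the outgoing direction and takes edges as (edge_id, src, dst) tuples; `var_len_expand` is a hypothetical name and the code is not the VarLenExpandExec kernel.

```python
from collections import deque


def var_len_expand(edges, start, min_hops, max_hops=None):
    """Enumerate (end node, edge ids along the path) for outgoing paths.

    No edge id is reused within one path, so max_hops=None still terminates
    on cyclic graphs.
    """
    results = []
    queue = deque([(start, ())])
    while queue:
        node, path = queue.popleft()
        if len(path) >= min_hops:
            results.append((node, list(path)))
        if max_hops is not None and len(path) == max_hops:
            continue  # upper bound reached, do not extend this path
        for edge_id, src, dst in edges:
            if src == node and edge_id not in path:
                queue.append((dst, path + (edge_id,)))
    return results
```

On a three-edge cycle a to b to c to a, an unbounded `*1..` expansion from `a` yields exactly three paths, because the fourth hop would have to reuse an edge.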
Variable-length Expand lowering (#580, foundation) — VarLenExpandNode now carries the source/destination vars, direction, relation type, and baked project dir/mode, and declares an output schema that extends the input with the destination node's columns (so a downstream RETURN b.node_id resolves). Physical BFS execution and the edge-variable (relationship-list) binding land in follow-up PRs.
Catalog-free edge/node readers in gf-storage (#580) — read_edges(dir, rel_name, mode) and read_nodes(dir) read the topology Parquet files directly, mirroring the GraphWriter layout (typed vs _exploratory). Lets physical execution nodes (e.g. the upcoming VarLenExpandExec) read edges without a GraphCatalog, which the DataFusion TaskContext does not expose.
LALRPOP parser replacing Python LALR(1) executor
DataFusion-backed execution engine with Arrow result streams
execute_arrow() returning pyarrow.Table via Arrow C Data Interface
execute_stream() returning pyarrow.RecordBatchReader
Parquet storage provider (replaces SQLite in the Rust core)
PyO3 + maturin Python binding (gf-bindings-py)
napi-rs Node binding with Arrow IPC (gf-bindings-node)
See ADR 0002 and roadmap

[0.4.0] - 2026-05-07

Added

db.gds graph algorithm surface (#484, PRs #514–#516) — 8 compiled algorithms dispatched to igraph (preferred) or NetworkX: pagerank, betweenness_centrality, closeness_centrality, degree_centrality, louvain, connected_components, clustering_coefficient, triangle_count. Stream mode returns dict[node_id, scalar]; write mode persists results via set_node_properties(). Optional node_label, rel_type, directed filters.
db.search hybrid retrieval surface (#476, PRs #517 #533–#539) — FTS5 text search (db.search.text()), vector cosine similarity (db.search.vector()), and RRF-fused hybrid (db.search() / db.search.hybrid()). All methods return list[SearchHit] with .ref, .score, .sources, .text_score, .vector_score. Multi-space vector storage: set_node_vector(id, vec, space="model-name"). Bring-your-own-vectors; space can represent text embeddings, geo coordinates, temporal features, or any numeric feature array.
SearchHit result type — frozen dataclass with score provenance; every .ref.id is directly addressable in db.execute().
set_node_properties() bulk write-back (#480, PR #515) — batch-update node properties from a dict[node_id, dict[prop, value]].
directed and node_id_property export parameters (#478 #479, PR #514) — to_networkx(directed=True), to_igraph(node_id_property="name"), etc.
graphforge.recipes.neighbourhood() (#492, PR #518) — n-hop subgraph expansion returning list[dict] for LLM context building.
Benchmark suite (#388, PR #519) — make benchmark runs real-dataset benchmarks through M-tier (XS/S/M/L).
Integration tests for three-surface interoperability (PR #541) — GDS write-back readable via Cypher, search results addressable via id(), cross-surface PageRank threshold queries.
Use-case documentation (PR #542, closes #523 #524 #525 #526) — db.gds examples in network-analysis.md; db.search + recipes in llm-workflows.md; deduplication pattern in knowledge-graph-construction.md; hybrid tool selection in agent-grounding.md.
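
The RRF fusion named in the db.search entry above can be illustrated with a small standalone sketch. This is not GraphForge's implementation: the helper name rrf_fuse and the constant k=60 are assumptions chosen for illustration, and the inputs are plain lists of node ids ranked by each retriever.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked id lists with Reciprocal Rank Fusion.

    Hypothetical helper for illustration; `k` is an assumed smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every id it contains.
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


text_hits = ["n1", "n2", "n3"]    # e.g. ranked by full-text relevance
vector_hits = ["n3", "n1", "n4"]  # e.g. ranked by cosine similarity
fused = rrf_fuse([text_hits, vector_hits])
```

Ids that appear near the top of both lists (here n1 and n3) end up ahead of ids that appear in only one list.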

[0.3.10] - 2026-05-06¶

Added¶

Schema introspection API (#469, #499) — labels(), relationship_types(), node_count(label=None), relationship_count(type=None) on GraphForge; efficient set-union over storage backend
JSON export/import (#470, #500) — gf.to_json(path, metadata, indent) serialises the full graph to a portable JSON file; GraphForge.from_json(path) reconstructs it; Hypothesis roundtrip property test
merge_node() safe upsert (#471, #501) — merge_node(labels, match_on, on_create, on_match) with regex label validation, index-safe on_match updates, and create/match semantics matching MERGE clause behaviour
add_graph_documents() LangChain-compatible ingestion (#472, #502) — duck-typed batch ingestion accepting any object with .nodes/.relationships attributes; idempotent edges; rel type pre-check; no langchain-community import required
Parse/plan LRU cache (#464, #504) — GraphForge(cache_size=N) caches parsed+planned query pipelines; clear_cache() and cache_info() for introspection; thread-safe; default size 128
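
As a rough illustration of the parse/plan cache entry above, the sketch below memoises a stand-in planning function with the standard library's functools.lru_cache. The function parse_and_plan and the PLAN_CALLS list are hypothetical; only the default size of 128 comes from the entry itself.

```python
from functools import lru_cache

PLAN_CALLS = []  # records every time the expensive step really runs


@lru_cache(maxsize=128)  # 128 mirrors the default size stated above
def parse_and_plan(query: str) -> tuple:
    """Stand-in for an expensive parse+plan step (hypothetical)."""
    PLAN_CALLS.append(query)
    return tuple(query.split())


parse_and_plan("MATCH (n) RETURN n")
parse_and_plan("MATCH (n) RETURN n")  # identical text, served from the cache
info = parse_and_plan.cache_info()
```

The second call never reaches the function body, which is the effect a query-text keyed cache is meant to have for repeated queries.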

Fixed¶

ORDER BY on aliased RETURN DISTINCT property no longer raises UndefinedVariable (#481, #494) — planner now records (variable, property) paths for aliased PropertyAccess return items, so RETURN DISTINCT a.year AS year ORDER BY a.year resolves correctly
Variable reuse across WITH boundary no longer raises KeyError (#482, #495) — optimizer's redundant-traversal-elimination pass now treats Aggregate as a scope boundary (same as With/Union/Subquery), preventing fresh post-WITH variable bindings from being silently dropped
EXISTS {} subquery crash on anonymous inner node with outer-scope property (#474, #496) — _plan_match now passes var_name (always set, from pattern variable or generated) instead of node_pattern.variable (can be None) to _properties_to_predicate
shortestPath() / allShortestPaths() parse correctly and raise NotImplementedError (#468, #497) — previously caused unhelpful SyntaxError: Unexpected token; now parsed into ShortestPathExpression AST nodes; planner raises NotImplementedError with BFS workaround: MATCH path = (a)-[*1..N]-(b) RETURN length(path) ORDER BY length(path) LIMIT 1

[0.3.9] - 2026-05-04¶

Added¶

LALR(1) parser migration (#365, #430) — grammar refactored to LALR(1); linear-time parsing replaces Earley; unknown function names now detected at parse time; 586 new parser tests; OOM-safe local coverage via sharded make coverage
Property equality index for O(1) node lookup (#390, #427) — in-memory _prop_index (prop → value → {node_id}) on Graph; get_nodes_by_property and get_nodes_by_label_and_property methods; executor _extract_equality_hints replaces full scan with index lookup when a ScanNodes is followed by a simple equality Filter; index maintained on SET/REMOVE/DELETE
Bulk ingestion API (#389, #444) — create_node_bulk, create_relationship_bulk (skip per-call Pydantic validation); bulk_ingest() context manager defers statistics rebuilds until exit via _flush_deferred_stats(); CSV loader uses bulk path; significant throughput improvement for dataset loading
Real-dataset QA and performance profiling suite (#417, #419) — 5-tier perf suite (karate 34n, ego-facebook 4k, amazon 334k, livejournal 4M, orkut 117M edges); scripts/perf_report.py renders baseline/delta Markdown tables; make test-perf, make test-perf-xs, make test-perf-large, make test-perf-slow targets; L/XL tier benchmarks added; perf_slow marker applied consistently (#440, #441)
elementId() scalar function (#211, #447) — GQL-spec elementId(node|rel) returns the element id as CypherString; id() continues to return CypherInt; NULL-safe; arity-checked
Scale limits documentation — docs/reference/scale-limits.md explains LIMIT-respecting (~20M edges) vs full-scan (~1M edges) performance ceilings; README graph-size claim updated to reflect edge-count as binding constraint (#443)
make pre-push-fast (#429, #437) — fast-fail pre-push shortcut (format, lint, type, security, docstrings; no coverage); inline coverage thresholds (85% total, 90% patch)
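
The prop → value → {node_id} layout described in the property equality index entry above can be sketched as a nested dictionary. The class PropIndex and its method names are hypothetical and only show the shape of the idea; property values are assumed to be hashable.

```python
from collections import defaultdict


class PropIndex:
    """Toy prop -> value -> {node_id} index (hypothetical, not GraphForge's class)."""

    def __init__(self):
        self._index = defaultdict(lambda: defaultdict(set))

    def add(self, node_id, prop, value):
        self._index[prop][value].add(node_id)

    def remove(self, node_id, prop, value):
        # Called for the old value on SET, and on REMOVE or DELETE,
        # so the index stays in sync with the graph.
        bucket = self._index[prop][value]
        bucket.discard(node_id)
        if not bucket:
            del self._index[prop][value]

    def lookup(self, prop, value):
        # Two dictionary lookups instead of a scan over every node.
        return set(self._index.get(prop, {}).get(value, ()))


idx = PropIndex()
idx.add(1, "name", "Ada")
idx.add(2, "name", "Ada")
idx.add(3, "name", "Bob")
idx.remove(2, "name", "Ada")
```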

Fixed¶

Fixed-length hop ignores rel predicate (#431, #439) — fixed-length traversal now evaluates rel_pattern.predicate inline; parser predicate whitelist replaced with open accept
var-length relationship predicates (#381, #425) — _var_length_reachable now evaluates inline relationship predicates (e.g. [*1..3 {weight: 1}]) at each hop instead of ignoring them
Cross-hop edge uniqueness in multi-segment patterns (#382, #425) — initial_used_edge_ids seeds the visited-edge set from prior hops, preventing the same edge being traversed twice across pattern segments
SQLite delete persistence (#409, #421) — deleted nodes and edges now removed from SQLite backend; previously they resurrected on reload
SQLite bulk I/O (#392, #421) — save_nodes_bulk / save_edges_bulk use executemany; _save_graph_to_backend wrapped in rollback; resolves >20s hang on M-tier reload
sys.setrecursionlimit thread safety (#433, #445) — limit set once in CypherParser.__init__ instead of per-request get/set/restore, eliminating races in concurrent parser use
Transaction rollback restores ID counters (#412) — rollback() restores _next_node_id and _next_edge_id, preventing ID gaps after aborted transactions
Closed-instance guard (#411) — execute() and mutation methods raise RuntimeError after close()
DatasetInfo rejects ftp:// URLs (#414)
TCK metrics CI gate (#410) — tck_metrics.py exits non-zero on TCK failures

Performance¶

UNWIND + WITH LIMIT short-circuit (#393, #443) — _pipeline_limit reverse-scan treats pass-through With(limit_count=N, no sort, no filter) as transparent; _execute_unwind respects row_limit and exits early; UNWIND range(1M, 2M) AS i WITH i LIMIT 3000 RETURN sum(i) drops from ~4.5s to <10ms
LIMIT short-circuit for scan and expand operators (#422, #423) — _pipeline_limit look-ahead stops scan/expand at row budget; write operators and Filter block short-circuit to preserve correctness
SQLite PRAGMA tuning (#392, #446) — synchronous=NORMAL (eliminates fdatasync per commit), 64 MB page cache (cache_size=-65536), temp_store=MEMORY; safe with WAL mode for analytical workloads
Lazy statistics timestamp — _stats_last_updated stamped at read time in get_statistics() rather than on every add; _defer_stats flag eliminates per-edge stat overhead during bulk ingest
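
The PRAGMA values listed in the SQLite tuning entry above can be tried out with Python's built-in sqlite3 module. This sketch uses an in-memory database for self-containment (an assumption, so WAL mode is not shown) and reads each setting back to confirm it took effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory DB keeps the sketch self-contained
conn.execute("PRAGMA synchronous=NORMAL")
conn.execute("PRAGMA cache_size=-65536")  # a negative value is a size in KiB
conn.execute("PRAGMA temp_store=MEMORY")

# SQLite reports these settings back as integers.
synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]   # 1 means NORMAL
cache_kib = -conn.execute("PRAGMA cache_size").fetchone()[0]
temp_store = conn.execute("PRAGMA temp_store").fetchone()[0]     # 2 means MEMORY
cache_mib = cache_kib // 1024
conn.close()
```

The arithmetic shows why -65536 corresponds to the 64 MB figure: 65536 KiB divided by 1024 is 64 MiB.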

CI / Tooling¶

Test suite zero-base renovation (#428, #438) — fixture strategy audit, duplication removal, marker consistency, parametrization improvements across unit and integration suites
GHA actions updated — actions/upload-artifact v7, actions/download-artifact v8, codecov/codecov-action v6, actions/upload-pages-artifact v5, astral-sh/setup-uv v8.1.0

[0.3.8] - 2026-05-02¶

Fixed - TCK Feature Completeness (3885/3885 passing)¶

Nanosecond precision in temporal types — CypherDateTime and CypherTime now store a _ns_residue (0–999) for sub-microsecond precision; _components tuple extended to 8 elements with nanoseconds; construction, accessors, arithmetic, comparison, and serialization all updated (#224)
Statement clock caching — now(), datetime(), date(), time() (without arguments) now return consistent values within a single query by caching the start time via time.time_ns() (#224)
Extreme year support — dates/datetimes outside Python's 1–9999 range now handled via _WideDate / _WideDateTime classes; durations too large for timedelta use _BigDuration; Julian Day Number arithmetic for cross-extreme-year duration computation (#224)
IANA timezone name preservation — CypherDateTime stores the IANA zone name (e.g. Europe/Stockholm) separately; .timezone accessor returns the zone name rather than the numeric offset; utcoffset() now receives a concrete datetime for DST-aware lookup (#224)
Aggregate detection in QuantifierExpression — ALL(ok IN collect(...) WHERE ok) patterns now correctly detected as aggregate-containing; planner emits Aggregate instead of Project (#224)
OPTIONAL MATCH multi-WHERE placement — multiple WHERE clauses no longer overwrite each other; multi-hop OPTIONAL MATCH WHERE is emitted after all hops so all variables are bound (#224)
coalesce() type inference — return type is inferred from argument types so MATCH (x) after WITH coalesce(a, b) AS x accepts node-typed variables (#224)
Non-deterministic aggregate arguments — count(rand()) and similar expressions now raise AmbiguousAggregationExpression per the openCypher spec (#224)
Temporal accessor for extreme-year datetimes — year, month, day, hour, minute, second accessors work on _WideDateTime-backed CypherDateTime values (#224)
All 23 xfail markers removed — previously marked as expected failures; all were fixable (#224)
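
The nanosecond residue and statement clock entries above can be pictured with the following sketch. It is an assumption about the general technique, not the project's code: split_ns and StatementClock are hypothetical names, and the split simply keeps the part of a nanosecond timestamp that a microsecond-resolution datetime cannot hold.

```python
import time
from datetime import datetime, timezone


def split_ns(epoch_ns: int):
    """Split epoch nanoseconds into (UTC datetime, 0-999 ns residue)."""
    epoch_us, ns_residue = divmod(epoch_ns, 1_000)
    seconds, micros = divmod(epoch_us, 1_000_000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=micros)
    return dt, ns_residue


class StatementClock:
    """Reads time.time_ns() once so every call in a statement agrees (hypothetical)."""

    def __init__(self):
        self._start_ns = time.time_ns()

    def now_ns(self) -> int:
        return self._start_ns


dt, residue = split_ns(1_700_000_000_123_456_789)
clock = StatementClock()
```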

Added¶

75-test unit coverage for extreme temporal classes — tests/unit/types/test_extreme_temporal_values.py covers _WideDate, _WideDateTime, _BigDuration, extreme-year construction/accessors, IANA timezone preservation, and large duration handling

Fixed¶

WITH clause WHERE filtering and variable scoping (#362) — WITH n WHERE n.age > 30 now correctly filters post-projection; chained WITH propagates bindings properly
Triadic OPTIONAL MATCH semantics (#302 partial) — OPTIONAL MATCH (a)-[r]->(b) with pre-bound b now correctly filters to only edges reaching that specific destination (LEFT JOIN semantics)
Relationship type disjunction without colon prefix (#363) — [:KNOWS|FOLLOWS] (pipe without colon on subsequent types) now parses correctly
Multi-CREATE variable scoping (#364 partial) — variables created in one CREATE clause are now visible in subsequent CREATE clauses within the same query

Performance¶

O(n²) → O(1) graph statistics updates (#366) — _update_statistics_after_add_edge no longer scans all edges on every insert; replaced with incremental _unique_sources_by_type tracking. Eliminated 2,000+ Pydantic model_copy() calls per 1,000-node graph by switching to mutable counters with lazy GraphStatistics construction
Parser fast path for long CREATE sequences (#366) — queries with many consecutive CREATE clauses (e.g. the TCK movie graph with 971 CREATEs) are pre-split into batches of 5 before Earley parsing, reducing wall-clock time from 27 minutes to ~26 seconds. See docs/development/parser-performance-analysis.md for full analysis and the LALR(1) migration plan (issue #365)
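
The incremental statistics change described in the first Performance entry above can be sketched as follows. EdgeStats and avg_out_degree are hypothetical names; the point is only that each insert touches a counter and a set instead of rescanning all edges, and that derived figures are computed when requested.

```python
from collections import defaultdict


class EdgeStats:
    """Toy incremental edge statistics (hypothetical, for illustration)."""

    def __init__(self):
        self.edge_count_by_type = defaultdict(int)
        self.unique_sources_by_type = defaultdict(set)

    def on_add_edge(self, rel_type, src_id):
        # Amortised constant-time update; no scan over existing edges.
        self.edge_count_by_type[rel_type] += 1
        self.unique_sources_by_type[rel_type].add(src_id)

    def avg_out_degree(self, rel_type):
        # Derived lazily, only when a caller asks for it.
        sources = self.unique_sources_by_type.get(rel_type, set())
        if not sources:
            return 0.0
        return self.edge_count_by_type[rel_type] / len(sources)


stats = EdgeStats()
for src in (1, 1, 2):
    stats.on_add_edge("KNOWS", src)
```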

CI¶

TCK sharding across 4 parallel runners — TCK tests now split using pytest-split with duration-based balancing (~4x faster CI)
Coverage sharding — coverage collection also split across 4 shards and merged in report job
Python 3.14 experimental matrix — added as non-blocking continue-on-error entries
TCK performance reporting — new scripts/tck_perf_report.py runs after each TCK run, reporting tests over 5s and tracking use-case tests (movie graph, school graph) for performance regression detection

TCK Coverage¶

3,885 / 3,885 passing (100%) — first release with zero TCK failures and zero expected failures

[0.3.7] - 2026-04-07¶

Added - Functions¶

Math functions: e, pi, exp, log, log10, trig, degrees, radians, timestamp (#326) — full set of mathematical and trigonometric functions per openCypher spec
startNode() and endNode() graph functions (#318) — extract the start/end node from a relationship
properties() and keys() graph functions (#316) — inspect node/relationship property maps and key sets
List concatenation + operator (#315) — [1,2] + [3,4] → [1,2,3,4]
Map subscript access m['key'] (#314) — dynamic map property lookup
Exponent float notation and leading-dot floats (#311) — 1.5e3, .5 now parsed correctly

Fixed - Sorting & Aggregation¶

ORDER BY mixed-type sort ordering (#320) — values of different Cypher types now sort in the correct openCypher type hierarchy order
Aggregation in composite expressions and ORDER BY (#329) — count(n) + 1, ORDER BY count(n) and similar patterns now evaluate correctly
Validate ORDER BY aggregation expressions at compile time (#331) — non-aggregated variables in aggregating ORDER BY now raise a clear error
Detect non-projected ORDER BY aggregates at compile time (#332) — aggregates in ORDER BY that aren't in the projection are caught early

Fixed - Expression Evaluation¶

Operator precedence, null propagation, and type coercion (#321) — NOT, AND, OR precedence; null propagation in arithmetic; integer/float coercion edge cases
Type conversion TypeError/null per openCypher spec (#327) — toInteger(), toFloat(), toString() now return null on unconvertible input per spec
List equality with null uses three-valued logic (#330) — [1, null] = [1, null] now correctly returns null
NaN handling in comparisons, arithmetic, and sort (#323) — NaN propagation and sort position per openCypher spec
SKIP/LIMIT accept expression values (#319) — SKIP toInteger(n.offset) and similar dynamic skip/limit values now work
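
The three-valued list equality rule in the entries above ([1, null] = [1, null] yields null) can be modelled in a few lines, with Python's None standing in for Cypher null. The function cypher_eq is a hypothetical sketch of the rule, not the evaluator's code.

```python
def cypher_eq(a, b):
    """Three-valued equality: returns True, False, or None (Cypher null)."""
    if a is None or b is None:
        return None
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            return False
        saw_null = False
        for x, y in zip(a, b):
            result = cypher_eq(x, y)
            if result is False:
                return False  # one definite mismatch decides the comparison
            if result is None:
                saw_null = True
        # No mismatch found, but an unknown element keeps the answer unknown.
        return None if saw_null else True
    return a == b
```

A definite mismatch anywhere wins over null, which is why [1, null] = [2, null] is false rather than null in this model.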

Fixed - Parser & Clause Gaps¶

WITH keyword position gaps (#310) — WITH is now accepted after SET, REMOVE, DELETE, and in chained multi-part queries
ASCENDING/DESCENDING keywords (#312) — ORDER BY n.name ASCENDING / DESCENDING now parse correctly; removed stale xfail markers (#325)
NodeRef/EdgeRef comparison methods (#313) — nodes and relationships are now sortable and comparable by identity

Fixed - Graph Semantics¶

Relationship uniqueness in MATCH patterns (#333) — runtime enforcement that the same physical edge cannot fill two relationship variables in one MATCH pattern; extended to anonymous multi-hop rels and path-variable (ExpandMultiHop) patterns
Temporal select-into constructors (#334) — cross-type temporal projection (date({date: other}), localdatetime({datetime: dt}), etc.) with component overrides; null propagation, hour-overflow normalization, and timezone conversion all correct

TCK Coverage¶

3,235 / 3,885 passing (83.3%) (up from ~2,507 at v0.3.6) — +728 net new passing scenarios
Theme: ORDER BY correctness, aggregation, operator precedence, graph semantics, temporal completeness

[0.3.6] - 2026-02-28¶

Fixed - TCK Harness Correctness¶

Integer property comparison (#252) — TCK harness was comparing integer properties as strings, causing ~423 false failures; now coerced correctly
Map literal value coercion (#261) — _parse_map_literal in conftest now coerces numeric string values, fixing ~30 false failures

Fixed - Temporal Correctness¶

date({}) / datetime({}) / time({}) map constructors (#253, #268) — component validation, UTC defaults, and function-as-argument parsing
UnexpectedEOF on duration/datetime literals (#254) — parser now handles all ISO 8601 duration and datetime forms
Timezone offset +HH:MM parsing (#255) — +HH:MM and -HH:MM offsets now parsed correctly
Date map constructor component validation (#256) — out-of-range components raise ValueError
datetime(string) defaults to UTC (#276) — string datetimes without timezone now default to UTC per spec
Map constructor omitted components default to 1 (#278) — date({year: 2015}) → 2015-01-01, not error
TRUNCATE units: week, weekYear, quarter, decade, century, millennium (#269) — all truncation units now return correct values
Week-based and select-form datetime constructors (#270) — datetime({week: 30, ...}) and similar forms return correct values
Duration serialization, comparison, multiplication, and fractional arithmetic (#271) — full duration correctness including duration.between()
datetime(aDatetime) type coercion (#274) — coercing datetime subtypes no longer conflicts with :E label parser token
Compact ISO 8601 date/datetime string formats (#273) — ordinal (YYYY-DDD, YYYYDDD), week date (YYYY-Www[-D]), year-month (YYYY-MM, YYYYMM), and year-only (YYYY) all parsed correctly
IANA timezone names (#272) — [Region/City] suffix in datetime strings is parsed; IANA zone is authoritative over explicit offset; historical second-precision offsets (e.g. +00:53:28) handled correctly

TCK Coverage¶

2,507 passing (up from ~1,694 at v0.3.5) — +813 net new passing scenarios
Theme: all temporal TCK scenarios now pass; harness accuracy fully corrected

[0.3.5] - 2026-02-19¶

Added - Math Functions (#195, #196, #197)¶

sqrt(x) - Square root function
Returns CypherFloat; negative input returns null; null propagation
Example: RETURN sqrt(4) AS r → 2.0
rand() - Random float in [0.0, 1.0)
Takes no arguments; returns a new CypherFloat each call
Example: RETURN rand() * 100 AS r
pow(base, exponent) - Exponentiation function
Consistent with ^ operator: pow(2, 3) = 2^3 = 8
Null propagation for both arguments
Example: RETURN pow(2, 10) AS r → 1024

Fixed - Aggregation Function Tests (#201, #202, #203, #204)¶

Fixed ~24 syntax bugs in aggregation test files where the space after RETURN was missing (RETURNfunc(...) → RETURN func(...))
percentileDisc(), percentileCont(), stDev(), stDevP() were already implemented; tests now pass
Updated docs to reflect all four as COMPLETE

Added - TCK Step Definitions (#237)¶

Added 5 missing pytest-bdd step definitions unblocking ~129 previously failing TCK scenarios:
executing control query: — alias for executing query:
the result should be (ignoring element order for lists): — row comparison with sorted list values
the result should be, in order (ignoring element order for lists): — ordered variant
parameters are: — parses datatable, substitutes $param references in queries
there exists a procedure — marked xfail (CALL procedures tracked for v0.3.6 in #190)
Extended _parse_value() to handle list literals like ['Foo', 'Bar']
Extended _row_to_comparable() to handle CypherList values

Implementation Status Updates¶

Math functions: sqrt, rand, pow now COMPLETE (was NOT_IMPLEMENTED)
Aggregation functions: percentileDisc, percentileCont, stDev, stDevP now COMPLETE
TCK pass rate: +129 passing scenarios, +50 xfailed (procedure CALL)

[0.3.4] - 2026-02-18¶

Added - Power Arithmetic Operator (#213)¶

^ (power/exponentiation) operator - Full exponentiation support
Right-associative: 2^3^2 = 2^(3^2) = 512
Highest arithmetic precedence (above *, /)
Type handling: int^int returns int if whole, else float
Negative exponents: 2^-1 = 0.5
Fractional exponents: 4^0.5 = 2.0
NULL propagation for both operands
39 integration tests covering associativity, precedence, types, and edge cases
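
The associativity and typing rules listed above can be checked with a small model. Both function names are hypothetical, and the sketch deliberately leaves out cases the entry does not describe, such as a negative base with a fractional exponent or zero raised to a negative power.

```python
from functools import reduce


def cypher_pow(base, exponent):
    """Power with NULL propagation; int only for int inputs with a whole result."""
    if base is None or exponent is None:
        return None
    result = base ** exponent
    if isinstance(base, int) and isinstance(exponent, int) and result == int(result):
        return int(result)
    return float(result)


def eval_power_chain(operands):
    """Evaluate a ^ b ^ c ... right-associatively: a ^ (b ^ (c ...))."""
    # Walk from the right so the innermost exponent is computed first.
    return reduce(lambda acc, left: cypher_pow(left, acc), reversed(operands))


right_assoc = eval_power_chain([2, 3, 2])           # 2 ^ (3 ^ 2)
left_assoc = cypher_pow(cypher_pow(2, 3), 2)        # (2 ^ 3) ^ 2, for contrast
```

The contrast value shows why associativity matters: grouping from the left would give 64 instead of 512.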

Added - XOR Logical Operator (#212)¶

XOR operator - Exclusive OR with proper ternary logic
true XOR false = true, true XOR true = false
Precedence: NOT > AND > XOR > OR
NULL propagation: any NULL operand yields NULL
Left-associative chaining: a XOR b XOR c
Case-insensitive keyword (XOR, xor, Xor)
22 integration tests + 4 unit tests
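
The XOR truth rules above reduce to a short function when None stands in for NULL. The name cypher_xor is hypothetical; chaining is shown with functools.reduce, which folds from the left to match the left-associative rule.

```python
from functools import reduce


def cypher_xor(a, b):
    """Exclusive OR where any NULL (None) operand yields NULL."""
    if a is None or b is None:
        return None
    return a != b


# a XOR b XOR c is evaluated as (a XOR b) XOR c.
chained = reduce(cypher_xor, [True, True, True])
```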

Documentation - Feature Completion Updates¶

Already-Implemented Features Now Documented (#193, #214, #215)¶

length() function - Documented as COMPLETE for paths
length(path) returns relationship count in path
For strings/lists, use size() function instead
File: src/graphforge/executor/evaluator.py:2604
List slicing [start..end] - Documented as COMPLETE
Example: RETURN [1, 2, 3, 4, 5][1..3] → [2, 3]
Supports open-ended slicing: [..], [1..], [..3]
File: src/graphforge/executor/evaluator.py:1458
Negative list indexing - Documented as COMPLETE
Example: RETURN [1, 2, 3][-1] → 3 (last element)
Supports both index and slice operations: [1, 2, 3][-2..-1]
File: src/graphforge/executor/evaluator.py:1458
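
Python's own slices follow the same inclusive-start, exclusive-end convention as the [1..3] example above, so the documented behaviour can be mirrored directly. The helpers below are hypothetical; returning None for an out-of-range index is an assumption based on general openCypher behaviour, not something this changelog states.

```python
def cypher_slice(items, start=None, end=None):
    """Model of list[start..end]; either bound may be omitted."""
    if items is None:
        return None
    return items[slice(start, end)]


def cypher_index(items, i):
    """Model of list[i]; negative i counts from the end (None if out of range: assumed)."""
    if items is None or i is None:
        return None
    return items[i] if -len(items) <= i < len(items) else None
```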

Added - String Function Aliases¶

toUpper()/toLower() camelCase variants (#194)¶

toUpper(string) - CamelCase alias for UPPER(), converts string to uppercase
Example: RETURN toUpper('hello') AS result → 'HELLO'
toLower(string) - CamelCase alias for LOWER(), converts string to lowercase
Example: RETURN toLower('HELLO') AS result → 'hello'
Both camelCase and legacy forms (UPPER/LOWER) supported
Case-insensitive keywords: toUpper, TOUPPER, toLower, TOLOWER all work
18 new integration tests (7 toUpper + 7 toLower + 4 interop)

Implementation Status Updates¶

Functions: 58/72 (81%, +3%) - String functions now 13/13 (100%)
Operators: 32/34 (94%, +6%) - All list operators now COMPLETE
Documentation reflects actual implementation status

[0.3.3] - 2026-02-18¶

Added - Pattern & CALL Features (Feature Completion: 88%)¶

Pattern Predicates - COMPLETE (#216)¶

Inline WHERE in patterns - Filter relationships during pattern matching
Example: MATCH ()-[r:KNOWS WHERE r.since > 2020]->(f) RETURN f.name
Property comparisons, complex expressions (AND, OR, NOT)
Function calls in predicates, NULL handling
Variable-length paths with predicates
Works with undirected relationships
16 comprehensive integration tests

CALL Subqueries - COMPLETE (#189)¶

General CALL { } subquery support - Execute nested queries with full openCypher syntax
Example: CALL { MATCH (p:Person) RETURN p.name AS name } RETURN name
Correlated scoping: Access outer variables from parent query
UNION support: CALL can contain UNION and UNION ALL queries
Unit subqueries: Preserve 1:1 cardinality for side-effect queries
Nested CALL: Support for CALL within CALL
Multiple CALL: Cartesian product of multiple subqueries
13 comprehensive integration tests

Pattern Comprehension - COMPLETE (#217)¶

Pattern-based list operations - Transform graph patterns into lists
Example: [(p:Person) WHERE p.age > 18 | p.name]
Simple node patterns: [(p:Person) | p.name]
Relationship patterns: [(p)-[:KNOWS]->(f) | f.name]
Optional WHERE filters for pattern results
Complex expressions in map clause
Correlated variables from outer scope
Nested in RETURN, WHERE, WITH clauses
NULL property handling
15 comprehensive integration tests

Implementation¶

AST Nodes:
PatternComprehension (expression.py) - pattern, filter_expr, map_expr
CallClause (clause.py) - nested query support

Grammar (cypher.lark):
pattern_comprehension rule with optional WHERE
call_clause and call_query rules
Pattern WHERE predicates already existed, now fully documented

Parser (parser.py):
pattern_comprehension transformer
call_clause and call_query transformers
Enhanced list_literal to handle pattern comprehension

Planner (planner.py):
Call operator with UNION support
TypeContext preservation for nested queries

Executor (executor.py, evaluator.py):
_execute_call with correlated scoping and unit subquery detection
PatternComprehension evaluation via temporary MATCH execution
Pattern predicate evaluation in relationship matching (already existed)

Testing¶

44 new integration tests (16 + 13 + 15)
All tests passing with 100% coverage on new code
TCK pass rate: ~40% (1,530 / 3,837 tests)

Documentation¶

Updated patterns.md: 100% completion (8/8 pattern types)
Updated clauses.md: CALL marked as COMPLETE
All features documented with examples and implementation references
Feature completion: 85% → 88% (+3 features)

Performance¶

No performance regressions
Pattern comprehension uses existing pattern matching infrastructure
CALL subqueries properly isolated with TypeContext.copy()

[0.3.2] - 2026-02-17¶

Added - List Operations (100% Complete)¶

List Operation Functions (#198, #199, #200)¶

filter() - Filter lists by predicate
Example: RETURN filter(x IN [1,2,3,4,5] WHERE x > 3) AS result → [4, 5]
NULL list returns NULL, NULL items excluded
Variable binding with proper scoping
extract() - Map transformations over lists
Example: RETURN extract(x IN [1,2,3] | x * 2) AS result → [2, 4, 6]
NULL list returns NULL, NULL items processed normally
Supports complex expressions and property access
reduce() - Fold/reduce with accumulator
Example: RETURN reduce(sum = 0, x IN [1,2,3,4] | sum + x) AS result → 10
Dual variable binding (accumulator + loop variable)
Empty list returns initial value
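
The three list operations above map closely onto ordinary Python constructs, which gives a quick way to check the documented examples. The helper names are hypothetical, None stands in for NULL, and returning None from cypher_reduce for a NULL list is an assumption, since the entry only states that rule for filter() and extract().

```python
from functools import reduce


def cypher_filter(items, predicate):
    if items is None:
        return None
    # A NULL item cannot satisfy the predicate, so it is dropped.
    return [x for x in items if x is not None and predicate(x)]


def cypher_extract(items, transform):
    if items is None:
        return None
    return [transform(x) for x in items]


def cypher_reduce(initial, items, step):
    if items is None:
        return None  # assumed, by analogy with filter() and extract()
    # `step` receives (accumulator, element), like `sum + x` in the example.
    return reduce(step, items, initial)
```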

Implementation¶

Added three AST nodes: FilterExpression, ExtractExpression, ReduceExpression
Grammar rules for all three expressions in cypher.lark
Parser transformers with Pydantic validation
Evaluator handlers with proper NULL handling and variable scoping
Treated as special syntax (like list comprehensions) due to variable binding

Testing¶

42 comprehensive integration tests
Full coverage of edge cases (empty lists, NULL handling, variable shadowing)
Composition and nesting tests
All tests passing with 100% coverage on new code

Documentation¶

Updated implementation status: 58/72 functions complete (81%, +4%)
List Functions: 8/8 (100%)

[0.3.1] - 2026-02-17¶

Added - Predicate Functions (100% Complete)¶

Quantifier Functions (#205, #206, #207, #208)¶

all() - Tests if all elements in a list satisfy a predicate
Example: RETURN all(x IN [2, 4, 6] WHERE x % 2 = 0) AS result → true
Implements three-valued NULL logic per openCypher spec
Returns false if any element fails, true if all pass, NULL if indeterminate
any() - Tests if any element in a list satisfies a predicate
Example: RETURN any(x IN [1, 2, 3] WHERE x > 2) AS result → true
Returns true if any element passes, NULL if no true but some NULL
none() - Tests if no elements in a list satisfy a predicate
Example: RETURN none(x IN [1, 3, 5] WHERE x % 2 = 0) AS result → true
Inverse of any() with proper NULL handling
single() - Tests if exactly one element satisfies a predicate
Example: RETURN single(x IN [1, 2, 3] WHERE x = 2) AS result → true
Returns true only if exactly one match and no NULLs
Returns NULL if uniqueness cannot be determined (NULLs present)
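
The quantifier rules above can be modelled with predicates that return True, False, or None (for NULL). All names below are hypothetical. For single(), returning false when more than one element definitely matches, even if NULLs are present, is this sketch's reading of "uniqueness cannot be determined".

```python
def cypher_all(items, pred):
    results = [pred(x) for x in items]
    if any(r is False for r in results):
        return False
    return None if any(r is None for r in results) else True


def cypher_any(items, pred):
    results = [pred(x) for x in items]
    if any(r is True for r in results):
        return True
    return None if any(r is None for r in results) else False


def cypher_none(items, pred):
    result = cypher_any(items, pred)
    return None if result is None else not result


def cypher_single(items, pred):
    results = [pred(x) for x in items]
    trues = sum(r is True for r in results)
    if trues > 1:
        return False  # already more than one definite match
    if any(r is None for r in results):
        return None
    return trues == 1


def is_even(x):
    # NULL input makes the predicate itself NULL.
    return None if x is None else x % 2 == 0
```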

Property and Collection Testing (#209, #210)¶

exists() - Tests if a property exists or expression is not NULL
Example: MATCH (p:Person) WHERE exists(p.age) RETURN p.name
Evaluates before NULL propagation for accurate property checking
Returns false for missing properties (NULL values are not stored)
isEmpty() - Tests if a list, string, or map is empty
Example: RETURN isEmpty([]) AS result → true
Works with lists, strings, and maps
Returns NULL for NULL input (three-valued logic)

Testing¶

57 comprehensive integration tests (34 quantifier + 23 exists/isEmpty)
Parametrized tests for better maintainability
Full coverage of NULL handling edge cases
All tests passing with 100% coverage on new code

Documentation¶

Updated implementation status: 55/72 functions complete (76%, +2%)
Predicate Functions: 6/6 (100%)
Detailed function signatures and examples in docs/reference/implementation-status/functions.md

Performance¶

No performance regressions
Efficient NULL propagation handling
Optimized list iteration for quantifiers

0.3.0 - 2026-02-09¶

Added - Major Cypher Features¶

OPTIONAL MATCH (Left Outer Joins)¶

Left outer join semantics with NULL preservation (#104)
Example: MATCH (p:Person) OPTIONAL MATCH (p)-[:KNOWS]->(f) RETURN p.name, f.name
6 integration tests, comprehensive NULL handling

UNION and UNION ALL¶

Combine query results with automatic deduplication (UNION) or preserve duplicates (UNION ALL) (#104)
Example: MATCH (p:Person) RETURN p.name UNION MATCH (c:Company) RETURN c.name
Tree-based operator structure for nested queries
9 integration tests

List Comprehensions¶

Transform and filter lists declaratively (#104)
Example: RETURN [x IN [1,2,3,4,5] WHERE x > 3 | x * 2]
Supports WHERE filtering, map expressions, and nested comprehensions
12 integration tests

EXISTS and COUNT Subquery Expressions¶

Correlated subqueries for existence checks and counting (#104)
Example: MATCH (p:Person) WHERE EXISTS { MATCH (p)-[:KNOWS]->() } RETURN p.name
Full operator pipeline execution for nested queries
13 integration tests

Variable-Length Path Patterns¶

Recursive traversal with cycle detection (#104)
Example: MATCH (a)-[:KNOWS*1..3]->(b) RETURN a.name, b.name
Depth-first search with per-path cycle prevention
Configurable min/max hop counts
2 integration tests
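
A depth-first traversal with hop bounds, as described above, might look like the following sketch. It is not the project's executor: var_length_paths and the adjacency format are hypothetical, and "per-path cycle prevention" is interpreted here as never reusing an edge within one path, which is an assumption.

```python
def var_length_paths(adjacency, start, min_hops, max_hops):
    """Yield node paths from `start` that use min_hops..max_hops edges.

    `adjacency` maps node -> list of (edge_id, neighbour) pairs.
    """

    def dfs(node, path, used_edges):
        hops = len(path) - 1
        if hops >= min_hops:
            yield list(path)
        if hops == max_hops:
            return
        for edge_id, neighbour in adjacency.get(node, []):
            if edge_id in used_edges:
                continue  # this edge is already part of the current path
            used_edges.add(edge_id)
            path.append(neighbour)
            yield from dfs(neighbour, path, used_edges)
            path.pop()
            used_edges.remove(edge_id)  # other paths may still use this edge

    yield from dfs(start, [start], set())


# A three-node directed cycle: a -> b -> c -> a
cycle = {"a": [("e1", "b")], "b": [("e2", "c")], "c": [("e3", "a")]}
paths = list(var_length_paths(cycle, "a", 1, 5))
```

Even with an upper bound of five hops the traversal stops after three, because every edge of the cycle has been used once on that path.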

IS NULL / IS NOT NULL Operators¶

Boolean NULL checking (distinct from = NULL ternary logic) (#104)
Example: MATCH (p:Person) WHERE p.age IS NULL RETURN p.name
Always returns boolean (never NULL)

Added - Dataset Integration¶

NetworkRepository Datasets (#110, #113)¶

10 new graph datasets from NetworkRepository
GraphML loader for complex graph formats
Comprehensive metadata with node/edge counts, categories, licenses
Examples: Polblogs, Polbooks, Karate club, Dolphin social network, C. elegans, Les Miserables
All datasets validated with comprehensive test suite

Spatial and Temporal Types¶

Point type for geographic coordinates
Distance function for spatial queries
Date, DateTime, Time, Duration types
Full openCypher compatibility for type system

Dataset Validation Infrastructure (#112, #113)¶

Comprehensive validation script (scripts/validate_datasets.py)
Validates downloads, caching, node/edge counts, query functionality
Performance benchmarking for all datasets
100% validation success rate (13/13 datasets tested)

Fixed¶

make coverage-diff command now works correctly (#111)
Dataset validation script handles missing query results (#112)
NetworkRepository dataset URLs and metadata corrections (#113)
Exception handling improvements in validation infrastructure (#113)
Resource cleanup with proper try/finally blocks

Architecture Improvements¶

Tree-based operator structure for nested queries
Dual serialization: SQLite+MessagePack (data) + Pydantic+JSON (metadata)
Enhanced expression evaluator with recursive execution
Operator pipeline supports nested query planning

Testing¶

767 integration tests passing (42+ new tests for v0.3.0)
91.96% code coverage maintained
Comprehensive dataset validation suite
Property-based testing with Hypothesis

TCK Compatibility¶

Progress from 16.6% to ~29% openCypher TCK coverage
312+ additional scenarios passing
Foundation for continued TCK improvements toward 39% target

Documentation¶

Complete v0.3.0 feature documentation (CHANGELOG_v0.3.0.md)
Dataset integration guides (docs/datasets/)
Performance benchmarks and optimization tips
Updated openCypher compatibility matrix

Breaking Changes¶

None. All changes maintain backward compatibility with v0.2.0 and v0.2.1.

Known Limitations¶

Variable-length paths: no configurable max depth limit in unbounded queries
UNION: no post-UNION ORDER BY (must be in each branch)
Pattern predicates (WHERE inside patterns) not yet supported

0.2.1 - 2026-02-03¶

Added¶

Dataset loading infrastructure (#68)
Automatic dataset download and caching system
Dataset registry with metadata (nodes, edges, size, category, license)
Built-in support for HTTP downloads with retry logic and timeout
Local cache directory (~/.graphforge/datasets/) with TTL-based expiration
Public API: load_dataset(), list_datasets(), get_dataset_info(), clear_cache()
GraphForge.from_dataset() convenience method for loading datasets
Example: gf = GraphForge.from_dataset("snap-ego-facebook")
CSV edge-list loader (#69)
Load edge-list datasets in CSV/TSV/space-delimited formats
Auto-delimiter detection (tab, comma, space)
Gzip compression support for .gz files
Comment line handling (lines starting with #)
Weighted and unweighted edge support
Node deduplication via caching
Consecutive whitespace handling
Example: Load SNAP datasets with simple edge-list format
5 SNAP datasets available (#69)
snap-ego-facebook - Facebook social circles (4K nodes, 88K edges, 0.5 MB)
snap-email-enron - Enron email network (37K nodes, 184K edges, 2.5 MB)
snap-ca-astroph - Astrophysics collaboration (19K nodes, 198K edges, 1.8 MB)
snap-web-google - Google web graph (876K nodes, 5.1M edges, 75 MB)
snap-twitter-combined - Twitter social circles (81K nodes, 1.8M edges, 25 MB)
Auto-registered on module import
Filterable by source, category, and size
MERGE ON CREATE SET syntax (#65)
Conditional property setting when creating nodes: MERGE (n:Person {id: 1}) ON CREATE SET n.created = timestamp()
Parser support in Lark grammar
Executor tracks whether MERGE created or matched nodes
Comprehensive test coverage (parser, executor, integration)
MERGE ON MATCH SET syntax (#66)
Conditional property setting when matching existing nodes: MERGE (n:Person {id: 1}) ON MATCH SET n.updated = timestamp()
Supports both ON CREATE and ON MATCH in same statement
openCypher-compliant semantics
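The ON CREATE / ON MATCH behaviour described in #65 and #66 can be modelled in a few lines of plain Python. This is only a sketch of the semantics, not GraphForge's executor: `merge_node`, the list-of-dicts `store`, and the integer stand-ins for timestamp() are all hypothetical names chosen for illustration.

```python
def merge_node(store, label, key, on_create=None, on_match=None):
    """Toy model of: MERGE (n:label {key}) ON CREATE SET ... ON MATCH SET ...

    Returns (node, created) so the caller can tell which branch ran.
    """
    for node in store:
        same_label = node["label"] == label
        same_key = all(node["props"].get(k) == v for k, v in key.items())
        if same_label and same_key:
            # Matched an existing node: only the ON MATCH assignments apply.
            node["props"].update(on_match or {})
            return node, False
    # Nothing matched: create the node and apply only ON CREATE assignments.
    node = {"label": label, "props": {**key, **(on_create or {})}}
    store.append(node)
    return node, True


store = []
# First MERGE creates the node; the ON MATCH assignment is ignored.
first, created_first = merge_node(
    store, "Person", {"id": 1},
    on_create={"created": 100}, on_match={"updated": 200},
)
# Second MERGE matches it; the ON CREATE assignment is ignored.
second, created_second = merge_node(
    store, "Person", {"id": 1},
    on_create={"created": 300}, on_match={"updated": 400},
)
print(store)
```

Running the same MERGE twice leaves a single node whose `created` value comes from the first call and whose `updated` value comes from the second.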

Fixed¶

WITH clause variable passing in aggregation (#67)
Fixed variable scoping issues when using WITH after aggregation
Correctly passes aggregated values to subsequent clauses
Example: MATCH (n) WITH count(n) AS cnt RETURN cnt now works correctly

Documentation¶

Complete dataset documentation (#69)
New dataset overview guide with usage examples
SNAP dataset documentation with 5 available datasets
Updated quick start with dataset loading examples
Added dataset examples to main README
Performance tips for large datasets
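As a rough illustration of the edge-list handling listed for #69 (delimiter auto-detection, comment lines, gzip input, optional weights, node deduplication), here is a standalone sketch using only the standard library. It is not the loader shipped in GraphForge: `detect_delimiter`, `parse_edge_list`, and the detection order (tab, then comma, then whitespace) are assumptions made for this example.

```python
import gzip
import io

_UNSET = object()  # sentinel: delimiter not detected yet


def detect_delimiter(line):
    """Guess the delimiter from the first data line (assumed order: tab, comma, space)."""
    if "\t" in line:
        return "\t"
    if "," in line:
        return ","
    return None  # str.split(None) splits on runs of whitespace


def parse_edge_list(lines):
    """Return (node_index, edges) from an iterable of text lines."""
    node_index = {}  # node name -> dense integer id, doubles as a dedup cache
    edges = []
    delimiter = _UNSET
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if delimiter is _UNSET:
            delimiter = detect_delimiter(line)
        fields = [f.strip() for f in line.split(delimiter) if f.strip()]
        if len(fields) < 2:
            raise ValueError(f"expected at least two fields: {raw!r}")
        src, dst = fields[0], fields[1]
        # A third column, when present, is treated as the edge weight.
        weight = float(fields[2]) if len(fields) > 2 else None
        for name in (src, dst):
            node_index.setdefault(name, len(node_index))
        edges.append((src, dst, weight))
    return node_index, edges


# Unweighted, tab-delimited input with a comment header.
plain = io.StringIO("# SNAP-style header\n0\t1\n0\t2\n1\t2\n")
nodes, edges = parse_edge_list(plain)

# Weighted, space-delimited input read through gzip.
compressed = gzip.compress(b"a  b 0.5\nb c 2\n")
with gzip.open(io.BytesIO(compressed), "rt") as handle:
    w_nodes, w_edges = parse_edge_list(handle)

print(nodes, edges)
print(w_nodes, w_edges)
```

Detecting the delimiter once, from the first data line, keeps the parser single-pass; a file that mixes delimiters would need a per-line check instead.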

Known Limitations¶

Only SNAP datasets available in this release (5 datasets)
Neo4j example datasets, LDBC, and NetworkRepository planned for v0.3.0
See Issue #70 for roadmap to 100+ SNAP datasets

0.2.0 - 2026-02-03¶

Added¶

CASE expressions for conditional logic (#49)
Full openCypher CASE expression support with WHEN/THEN/ELSE/END syntax
Simple CASE (CASE expr WHEN value) and searched CASE (CASE WHEN condition)
NULL-safe semantics following openCypher specification
Example: RETURN CASE WHEN n.age < 18 THEN 'minor' ELSE 'adult' END
COLLECT aggregation function (#46, #48)
Aggregate values into lists with COLLECT() function
DISTINCT support: COLLECT(DISTINCT n.name) removes duplicates
Handles complex types (CypherList, CypherMap) correctly in DISTINCT mode
NULL filtering: NULL values excluded from collected lists
Example: MATCH (n) RETURN COLLECT(n.name)
Arithmetic operators (#44)
Binary operators: +, -, *, /, % (modulo)
Unary minus: -n.value
NULL propagation: operations with NULL return NULL
Type coercion for mixed integer/float operations
Division by zero returns NULL (openCypher-compliant)
Example: RETURN n.price * 1.1 AS price_with_tax
String matching operators (#43)
STARTS WITH: Prefix matching
ENDS WITH: Suffix matching
CONTAINS: Substring matching
Case-sensitive matching following openCypher specification
NULL handling: returns NULL if either operand is NULL
Example: MATCH (n) WHERE n.email ENDS WITH '@example.com'
REMOVE clause (#42)
Remove properties: REMOVE n.property
Remove labels: REMOVE n:Label
Multi-target support: REMOVE n.prop1, n.prop2, n:Label
Idempotent: removing non-existent properties/labels is a no-op
Example: MATCH (n:Person) REMOVE n.age, n:Temporary
UNWIND clause (#40)
Unwind lists into rows: UNWIND [1, 2, 3] AS x RETURN x
Supports nested lists, empty lists, NULL values
Can be used with MATCH, WHERE, and other clauses
Example: UNWIND $ids AS id MATCH (n) WHERE n.id = id RETURN n
NOT logical operator (#30)
Unary negation operator for boolean expressions
NULL-safe semantics: NOT NULL returns NULL
Example: MATCH (n) WHERE NOT n.active RETURN n
DETACH DELETE clause (#33)
openCypher-compliant DELETE semantics
DELETE raises error if node has relationships
DETACH DELETE removes all connected edges first, then the node
Example: MATCH (n:Person) DETACH DELETE n
Comprehensive MATCH-CREATE combination tests (#41)
12 integration tests for MATCH followed by CREATE patterns
Validates correctness of mixed read-write operations
Complete documentation reorganization (#56)
Restructured docs into logical sections (getting-started, user-guide, reference, development)
New datasets documentation section with examples
Improved navigation and discoverability
Code of Conduct - Added Contributor Covenant Code of Conduct
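The NULL rules stated in the arithmetic (#44), string matching (#43), and NOT (#30) entries above can be summarised with a small Python model in which None stands for NULL. This is a sketch of the documented behaviour only; the function names are hypothetical, and result typing for division is simplified to Python's true division.

```python
import operator

_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,  # simplification: always true division
}


def arith(op, left, right):
    """Binary arithmetic with NULL propagation."""
    if left is None or right is None:
        return None  # any NULL operand yields NULL
    if op == "/" and right == 0:
        return None  # division by zero yields NULL, per the #44 entry
    return _OPS[op](left, right)


def string_match(op, text, pattern):
    """STARTS WITH / ENDS WITH / CONTAINS, case-sensitive, NULL-aware."""
    if text is None or pattern is None:
        return None
    if op == "STARTS WITH":
        return text.startswith(pattern)
    if op == "ENDS WITH":
        return text.endswith(pattern)
    if op == "CONTAINS":
        return pattern in text
    raise ValueError(f"unknown operator: {op}")


def logical_not(value):
    """NOT with three-valued logic: NOT NULL is NULL."""
    return None if value is None else not value


print(arith("*", 10, None))                        # NULL operand
print(arith("/", 1, 0))                            # division by zero
print(string_match("STARTS WITH", "Alice", "al"))  # case-sensitive
print(logical_not(None))
```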

Fixed¶

ORDER BY after aggregation (#39)
ORDER BY now correctly finds aliased variables after aggregation
Example: MATCH (n) RETURN COUNT(n) AS cnt ORDER BY cnt now works
RETURN DISTINCT after projection (#38)
RETURN DISTINCT now works correctly after projection expressions
Fixes issue where DISTINCT was applied to wrong columns

Changed¶

Test coverage improved (#37)
Coverage increased from 88.69% to 93.76% (+5.07 percentage points)
Added 50+ new tests across parser, planner, and executor
GitHub Pages deployment modernization (#32)
Migrated from legacy mkdocs gh-deploy to GitHub Actions native deployment
Uses actions/upload-pages-artifact@v3 and actions/deploy-pages@v4
Simpler, faster, more secure deployment with id-token authentication
Issue workflow documentation (#45)
Updated ISSUE_WORKFLOW.md to reflect current development process

0.1.4 - 2026-02-02¶

Added¶

Local coverage validation workflow (#15, #16)
make pre-push now validates 85% combined line+branch coverage locally before pushing
make coverage - Run tests with coverage measurement and generate reports
make check-coverage - Validate 85% combined coverage threshold
make coverage-strict - Strict 90% threshold validation for new features
make coverage-report - Open HTML coverage report in browser
make coverage-diff - Show coverage for changed files only
Catches 90% of coverage issues before CI, eliminating codecov patch failures
Current coverage: 88.69% (92.41% line + 81.19% branch)
Codecov Test Analytics integration (#17, #18)
JUnit XML generation for all test runs across 8,203 tests (481 unit/integration + 7,722 TCK)
Test performance monitoring with execution time trends
Flaky test detection for intermittent failures
Failure rate tracking and reliability pattern analysis
Cross-platform analytics tracking (12 OS/Python combinations)
make test-analytics - Generate JUnit XML locally for analysis
Analytics dashboard at https://app.codecov.io/gh/DecisionNerd/graphforge
List and map literal support in CREATE statements (#15)
CREATE now accepts complex property types: lists, maps, and nested structures
Proper bidirectional CypherValue ↔ Python type conversion
10 new integration tests covering complex property edge cases
Codecov integration for automated coverage tracking
Coverage reports uploaded from GitHub Actions
Component-level coverage tracking (parser, planner, executor, storage, ast, types)
PR comments with coverage changes
Branch coverage analysis
Configuration file (.codecov.yml) with 85% project target, 80% patch target
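For readers unfamiliar with a "combined line+branch" figure, the sketch below shows one way such a number can be derived and checked against the 85% and 90% thresholds mentioned above. It assumes the combined value pools covered lines and covered branches over their joint total; whether the project's make targets compute it exactly this way is not stated here, and the counts used are made up for illustration.

```python
def combined_coverage(lines_total, lines_hit, branches_total, branches_hit):
    """Pooled coverage percentage over lines and branches together."""
    total = lines_total + branches_total
    if total == 0:
        return 100.0  # nothing to cover
    return 100.0 * (lines_hit + branches_hit) / total


def meets_threshold(percent, threshold=85.0):
    """True when the combined percentage reaches the required threshold."""
    return percent >= threshold


# Hypothetical counts: 92.0% line coverage, 80.0% branch coverage.
pct = combined_coverage(lines_total=1000, lines_hit=920,
                        branches_total=400, branches_hit=320)
print(f"combined: {pct:.2f}%")
print("passes 85%:", meets_threshold(pct))
print("passes 90%:", meets_threshold(pct, threshold=90.0))
```

Because lines usually outnumber branches, the pooled figure sits closer to the line percentage than a simple average of the two would.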

Changed¶

Development workflow modernization (#16)
Updated CONTRIBUTING.md with comprehensive make-based workflow documentation
Single command for all pre-push validation: make pre-push (format-check, lint, type-check, coverage, check-coverage)
Clear documentation of coverage requirements (85% project, 80% patch)
Complete guidance on all available make targets with examples
Test suite significantly expanded to 479 unit/integration tests + 7,722 TCK compliance tests

Fixed¶

Codecov test analytics deprecation warning (#18)
Migrated from deprecated test-results-action@v1 to codecov-action@v5
Uses report_type: test_results parameter for future-proof compatibility

0.1.3 - 2026-02-01¶

Changed¶

Column naming now uses variable names for simple variable references (openCypher TCK compliance)
RETURN n now produces column name "n" (previously "col_0")
RETURN n AS alias produces column name "alias" (unchanged)
RETURN n.property produces column name "col_0" (unchanged - complex expression)
This aligns GraphForge with the openCypher specification and improves Neo4j compatibility
Note: This is a breaking change from v0.1.2 but necessary for WITH clause correctness
Rationale: WITH clause requires preserving variable names through query pipeline
Test suite expanded with WITH clause coverage (17 comprehensive test cases)
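The column naming rule above can be expressed as a short function. This is a sketch rather than the planner's actual code: `column_names` is a hypothetical helper, a "simple variable reference" is approximated with Python's `str.isidentifier()`, and the N in col_N is assumed to be the item's position in the RETURN list.

```python
def column_names(items):
    """Name result columns for a list of (expression, alias_or_None) pairs."""
    names = []
    for position, (expression, alias) in enumerate(items):
        if alias is not None:
            names.append(alias)            # RETURN n AS alias -> "alias"
        elif expression.isidentifier():
            names.append(expression)       # RETURN n -> "n"
        else:
            names.append(f"col_{position}")  # RETURN n.property -> "col_0"
    return names


print(column_names([("n", None)]))
print(column_names([("n", "alias")]))
print(column_names([("n.property", None)]))
```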

Added¶

Comprehensive WITH clause integration tests covering:
Basic projection and variable renaming
WHERE filtering on intermediate results
Aggregation with GROUP BY semantics
ORDER BY, SKIP, LIMIT on intermediate results
Multi-part query chaining
Edge cases and null handling

Fixed¶

WITH clause bugs: column naming, aggregations, and DISTINCT behavior
CodeRabbit configuration file to use only valid schema properties
Removed unused pytest import from WITH clause tests

0.1.2 - 2026-02-01¶

Added¶

Professional versioning and release management system
Comprehensive CHANGELOG.md following Keep a Changelog format
Automated version bumping script (scripts/bump_version.py)
Release process documentation (RELEASING.md, docs/RELEASE_PROCESS.md, docs/RELEASE_STRATEGY.md)
Weekly automated release check with GitHub issue reminders
Release tracking workflow with auto-labeling
MkDocs Material documentation site
Auto-generated API documentation from docstrings
Complete user guide (installation, quickstart, Cypher guide)
Auto-deploy to GitHub Pages on every push
CI/CD enhancements
CHANGELOG validation workflow (ensures PRs update changelog)
Automated PR labeling based on changed files
Labels for component tracking (parser, planner, executor)
.editorconfig for consistent editor settings across IDEs

Changed¶

Updated GitHub Actions to Node.js 24 (actions/checkout v6, actions/setup-python v6, astral-sh/setup-uv v7)
Enhanced PR guidelines to enforce small PRs and proper fixes
Updated README badges to professional numpy-style flat badges

Fixed¶

Integration test regression from WITH clause implementation
Column naming now correctly uses col_N for unnamed return items
SKIP/LIMIT queries no longer return empty results
TCK test collection error resolved
API documentation now references actual modules (api, ast, parser, planner, executor, storage, types)

0.1.1 - 2026-01-31¶

Added¶

WITH clause for query chaining and subqueries
Production-grade CI/CD infrastructure
Pre-commit hooks (ruff, mypy, bandit)
CodeRabbit integration
Dependabot configuration
PR and issue templates
Comprehensive documentation (30+ docs)
TCK compliance at 16.6% (638/3,837 scenarios)

Changed¶

Updated project URLs to reflect new organization
Enhanced README with additional badges

Fixed¶

Critical integration test failures (20 tests)
TCK test collection error

0.1.0 - 2026-01-30¶

Added¶

Initial release of GraphForge
Core data model (nodes, edges, properties, labels)
Python builder API (create_node, create_relationship)
SQLite persistence with ACID transactions
openCypher query execution
MATCH, WHERE, CREATE, SET, DELETE, MERGE, RETURN
ORDER BY, LIMIT, SKIP
Aggregations (COUNT, SUM, AVG, MIN, MAX)
Parser and AST for openCypher subset
Query planner and executor
351 tests (215 unit + 136 integration)
81% code coverage
Multi-OS, multi-Python CI/CD (3 OS × 4 Python versions)
