Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Unreleased
In Development — v0.5.0 (Rust Core, rust-core branch)
- Clause-ordered mixed-write statements (#792 Step 2) — one statement may now
combine CREATE/DELETE/SET/REMOVE clauses (same or different kinds): a single
read prefix runs once and each write clause applies in clause order against
that frontier, extended with the variables CREATE clauses mint so later
clauses target created entities like matched ones. openCypher interactions
hold: a plain DELETE sees edges created in-statement (error without
DETACH; DETACH cancels the pending edge before it ever hits disk),
created-then-deleted entities never reach disk but count in both counters,
SET/REMOVE on a created entity edits the write buffer, writing to a deleted
entity errors, and repeated deletes are no-ops. A value expression reading a
property written earlier in the statement is rejected loudly (cross-clause
property visibility is a follow-up), as is a read clause after a write
clause (WITH execution is a pre-existing gap). Mixed statements return a one-row six-counter summary (nodes_created/edges_created/nodes_deleted/edges_deleted/properties_set/properties_removed); single-write statements keep their existing paths and result schemas byte-for-byte. Bonus fixes: multiple same-kind clauses (two SETs, two CREATEs) used to pass the old guard and die in lowering — they now sequence, and two CREATE clauses no longer mint colliding node_id surrogates (one shared writer per statement).
- All-or-nothing Parquet writes (#790) — every write in gf-storage stages a
sibling temp file and atomically renames it into place, so an I/O failure
mid-write (disk full, encode error) can never leave a truncated file. The
multi-file rewrites (DELETE across topology/property files, SET/REMOVE
across stems, the CREATE append-merge flush, and a #792 mixed statement's
whole effect set) stage into one RewriteBatch and commit once — topology/nodes.parquet renames last on the delete path so even a (rare) rename-phase failure leaves at worst orphaned-but-unreferenced rows, never a deleted node with surviving edges. Failures while building replacement files leave the prior on-disk state byte-identical; durability (fsync) remains out of scope for this non-production engine.
- Named path variables over variable-length patterns:
nodes(p)/relationships(p)/length(p) (#754, first slice) — MATCH p = (a)-[:KNOWS*1..2]->(b) now binds p and the three path functions execute end-to-end. The binder records the path's constituent variables and rewrites the calls at bind time (p itself never reaches the plan): relationships(p) is the #709 relationship-list column verbatim, length(p) is its element count (the hop count; 0-hop → 0), and nodes(p) lowers to a new cypher_path_nodes UDF that recovers the traversal node sequence per row by walking the edge list from the start node's uuid (edges store src/dst in storage orientation, but every BFS emission is a connected walk, so each next node is deterministically the edge's other endpoint — correct for In/Undirected traversals and self-loops). nodes(p) returns List<Struct{node_uuid}>, mirroring the edge-list element shape so node properties/labels can be added later without changing the container kind. Fixed single-hop paths (MATCH p = (a)-[:KNOWS]->(b), including an explicit *1..1) compose the same three functions from the join's scalar columns (length(p) → constant 1, UInt64 like the var-length form; nodes(p) → the two endpoint structs in traversal order; relationships(p) → a one-element list carrying the topology fields with the bind-time rel_type — no edge properties yet, a documented gap vs the var-length list). Bare RETURN p returns one Arrow Struct{nodes, relationships} column per path (named p when unaliased), assembled at projection from the same expressions. All path values null-propagate: an unmatched OPTIONAL MATCH row's p, nodes(p), relationships(p), and length(p) are Cypher null (the fixed-hop forms gate on the edge's edge_uuid; the var-length forms inherit it from the list column — whose registration now survives OPTIONAL MATCH, fixing a latent var-map gap). Deferred (rejected with a clear message): multi-segment path patterns; edge-property parity for the fixed-segment relationship struct and node property/label augmentation are follow-ups.
- Property
SET/REMOVE write execution with runtime-expression values (#791) — SET n.prop = <expr> and REMOVE n.prop now execute end-to-end for both node and edge targets (previously a bind error). The value is a full runtime expression evaluated per matched row (SET n.age = n.age + 1, SET a.x = b.y), not a baked literal: the lowerer turns it into a DataFusion Expr (resolving against the matched-row schema — the eager property join from #784/#789 makes var_<n>.<prop> available), and the new GraphSetExec evaluates it per batch via create_physical_expr(...).evaluate(...) (the UnwindExec pattern), converting each row's result to an IrLiteral (scalar_to_ir_literal). Writes rewrite the property files in place via new gf-storage primitives (set_node_properties/remove_node_properties/set_edge_properties_rewrite/remove_edge_properties) that reuse the writer's decode → merge → re-infer cycle — SET on a propertyless node mints a fresh row; REMOVE of an absent key/uuid is a no-op (openCypher). Node file stems resolve per row from type_id (entity name, or _untyped in Exploratory mode); edge stems from the row's rel_type_name. The #792 guard extends to SET/REMOVE (no mixing with CREATE; one rewrite kind per query for now), and execute_stream rejects them. Deferred (each rejected with a clear message): SET n:Label / REMOVE n:Label (needs a foundational multi-label storage model — a follow-up issue); SET n += {…} / n = {…} map merge/replace; SET/REMOVE on an untyped edge; mixed/multi-rewrite ordering; non-scalar stored values.
- Edge properties on variable-length relationship-list elements (#755) —
MATCH (a)-[r:KNOWS*1..2]->(b) RETURN r[0].since now reads the edge's persisted properties. The var-length edge-list struct (List<Struct<{edge_uuid, src_uuid, dst_uuid, rel_type}>>) gains the relation's property columns: the lowerer discovers them from edge_properties/<REL>.parquet (the same dynamic schema the fixed-hop join_edge_properties uses) and bakes the field list into the node schema (the single source of truth); VarLenExpandExec materialises the values by take-ing each property column at the row matching each hop's edge_uuid (NULL for an edge with no property row — LEFT-join semantics). The access lowering needed no change — r[i].<prop> already composes to get_field(array_element(rels, fixup(i)), "<prop>"). A wildcard (*) or a relation with no persisted properties keeps the topology-only 4-field struct (byte-identical — no plan-shape change). r[i].nonexistent_prop remains a plan error (not openCypher NULL) — out of scope.
- Reject mixing CREATE/MERGE with DELETE in one statement (#792, step 1) — a query
that combines a buffered-append write (CREATE/MERGE) with an in-place rewrite write (DELETE), e.g. MATCH (a) CREATE (b) DELETE a, was silently mis-handled: the gf-api write router classified the plan as a single write kind and ran only the DELETE side, dropping the CREATE. It now returns a clear GfError::Validation("a single query may not mix CREATE/MERGE with DELETE yet; run them as separate statements") — a guard at the top of run_plan. Neither side runs on rejection; unmixed CREATE-only / MERGE-only / DELETE-only statements are unaffected. Proper clause-ordered sequencing of mixed read+write remains the larger follow-up (#792 step 2).
- Inline relationship-property predicates filter in MATCH (#750) —
MATCH (a)-[r:KNOWS {since:2020}]->(b) now matches only edges with since = 2020 (previously the inline map was silently dropped and every KNOWS edge matched). The binder lowers the relationship's inline property map to AND-ed equality predicates and emits a GraphOp::Filter after the Expand — the relationship analogue of the node case (#748), reusing the same variable-generic lower_inline_property_filter. The read side already materialises var_<rel>.<prop> for a fixed hop (#784's join_edge_properties), so this is a binder-only change. Guarded to fixed hops: a variable-length edge var binds to a list column, not scalar properties, so inline props there stay a no-op (that's #755). No logical-plan goldens change.
- Properties on a fixed-expand destination node now resolve (#789) —
MATCH (a:Person)-[:KNOWS]->(b:Person {name:'Bob'}) RETURN a.node_uuid (and WHERE b.age > 28, RETURN b.name, ORDER BY b.x) previously failed to plan with "No field named var_N.name": the destination of a fixed single-hop expand was lowered as an already-bound NodeScan that filtered by label but never joined its property table, so var_<dst>.<prop> was never materialised (only the leading node's properties were joined, #748/#704). join_node_properties is now append-preserving — it passes every existing input column through (so the source + edge columns of the joined Expand plan survive) and appends the destination's re-qualified property columns — and the already-bound NodeScan arm calls it after the type filter. Fixes the whole class of destination-property access, not only inline {prop:val} filters. No logical-plan goldens shift (the optimizer prunes the property join when no destination property is referenced).
- DELETE / DETACH DELETE execution (#740) —
MATCH (n) DELETE n and DETACH DELETE now run end-to-end. New GraphDeleteExec drains the matched rows, collects the target UUIDs (node vs edge by the resolved kind), and rewrites the affected Parquet files via the storage mutator. openCypher semantics are enforced: a node may be deleted without DETACH only if every relationship still incident to it is also deleted by the same statement — so MATCH (a)-[r]->(b) DELETE r, a is legal, while leaving an untargeted edge on a deleted node raises an ExecutionError ("Cannot delete node, because it still has relationships…"); DETACH DELETE folds the node's incident edges into the delete set and removes them too. DELETE of a NULL (e.g. an unmatched OPTIONAL MATCH row) is a no-op per openCypher, not an error. Wired through the ExtensionPlanner dispatch, a new ExecutionSession::execute_delete, and the gf-api write router (GraphOp::Delete → the write path; rejected on the streaming path like other writes). The errors.feature no-DETACH scenario now runs against a real connected graph (its given step built an empty forge before). SET/REMOVE remain deferred (still reject at bind).
- DELETE — logical node + write-gated lowering (#740) — new
GraphDeleteNode (gf-plan) mirroring GraphCreateNode: an input-driven write node carrying the resolved DeleteTargets (var + node/edge kind) and the detach flag, emitting a one-row nodes_deleted/edges_deleted summary. gf-rel lowers GraphOp::Delete beside Create, gated on the write target (new_for_writes) so a read-only session still rejects it. Each target's node-vs-edge kind is resolved from the input plan's schema (var_<n>.node_uuid vs var_<n>.edge_uuid), so the executor reads the right identity column. Physical execution follows.
- DELETE — IR + binder lowering (#740) —
DELETE/DETACH DELETE now lower to a new GraphOp::Delete { vars, detach } instead of failing at bind time with UnsupportedClause. The binder resolves each delete target to a bound variable (node or edge); a non-variable target (DELETE n.prop) is a new InvalidDeleteTarget bind error and an undeclared target stays UndeclaredVariable — only DELETE <var> is valid per openCypher. Whether a target is a node or edge is resolved downstream at execution time. (SET/REMOVE/CALL still reject at bind for now.) Physical execution follows.
- DELETE — storage rewrite primitive (#740) — new
gf-storage::mutator module with delete_nodes, delete_edges, and incident_edge_uuids. Unlike the append-only GraphWriter, these rewrite committed Parquet files in place: read the current rows, drop the targeted ones (by node_uuid/edge_uuid via a keep-mask + filter_record_batch), and write the survivors back; an untouched file is never rewritten. delete_nodes also drops the deleted nodes' rows from properties/*.parquet (and delete_edges from edge_properties/*.parquet). incident_edge_uuids maps target node UUIDs → surrogate node_ids and scans the edge files for edges touching them (either endpoint) — the basis for the openCypher "no relationships without DETACH" check and for DETACH's incident-edge cleanup. The binder/lowering/exec wiring lands in follow-ups.
- Edge-property persistence — write path (#784) —
CREATE (a)-[:KNOWS {since: 2020}]->(b) now persists edge properties instead of erroring with "CREATE edge properties are not yet supported". GraphWriter::set_edge_properties(edge_uuid, rel_type, props) mirrors the node-property path but is keyed by edge_uuid and written to a dedicated edge_properties/REL_TYPE.parquet directory (routed by relation name in every mode, so a relation type never collides with a same-named node label under properties/). The dynamic per-stem schema inference (first-seen column order, first-non-null type, mixed-type→Utf8 coercion, null-first inference) and the read-merge-rewrite append cycle are shared with the node path; the CREATE executor mints the edge UUID and persists the buffered props. On the read side, a fixed single-hop expand LEFT-joins the edge scan with edge_properties/<rel>.parquet on edge_uuid (new EdgePropertyTable provider, mirroring PropertyTable), so MATCH (a)-[r:KNOWS]->(b) RETURN r.since resolves the property to a real column — the edge analogue of inline node-property reads. (The variable-length edge-list var r in [r:KNOWS*1..2] still carries only topology, no properties — that's #755.) Two hardening guards: a CREATE edge that carries properties but no relation type is rejected (its props would route to _untyped where the relation-name-keyed read can never reach them); and a property whose name collides with a topology column (created_at, src_id, …) is dropped from the join rather than building a duplicate-qualified field that breaks the MATCH plan (the topology column stays authoritative; symmetric guard added to the node-property join).
- Relationship-list access on the variable-length edge var (#743) — list/
relationship functions now work on r (the List<Struct> edge list from #709), lowered to DataFusion in resolve_builtin: indexing r[i] (0-based, negative-from-end, null on out-of-range — via array_element with a CASE 0→1-based fixup), slicing r[i..j] / r[i..] / r[..j] (0-based, end-exclusive, null-unbounded → DataFusion's 1-based inclusive array_slice, with the unbounded end cast to Int64), head(r) / last(r), type(r[i]) (reads the element's rel_type), and size() as a new polymorphic cypher_size scalar UDF that dispatches on the argument's runtime type (list element count or string char count) rather than mis-mapping size("str") to array_length. These also work on plain list literals ([10,20,30][-1], size('hello')). Rust-core parity with the Python reference's list ops. Deferred (correctness-blocked, follow-ups filed): startNode / endNode (#753, need node materialization), relationships(path) / nodes(path) (#754, need named path variables), r[i].<prop> (#755, need edge-property persistence).
- CREATE streams its input instead of materializing it (#747) —
GraphCreateExec now drains its child plan batch-by-batch (via DataFusion's partition-coalescing execute_stream) and writes each batch as it arrives, instead of collect-ing the whole MATCH/UNWIND frontier first. A large mixed MATCH … CREATE no longer materializes the full frontier in memory. The single GraphWriter is opened once and flushed once, so per-row reference/mint semantics and the write summary are unchanged. (VarLenExpand/OptionalMatch/Unwind still collect — their kernels genuinely need the full input.)
- Inline node-property predicates filter in MATCH (#748) —
MATCH (a:Person {name:'Alice'}) now matches only Persons named Alice (previously the inline map was silently dropped and every Person matched). The binder lowers the property map to AND-ed equality predicates (a.k = v AND …, keys sorted for a stable plan) and emits a GraphOp::Filter after the node scan — the same shape a WHERE clause produces. Inline relationship properties (-[r:KNOWS {since:2020}]->) remain deferred (#750) until edge properties are persisted.
- Mixed
MATCH … CREATE runs the write per matched row (#703) — MATCH (a:Person) CREATE (a)-[:KNOWS]->(b:Person) now creates one new b + one edge per matched a, referencing the matched node's identity rather than minting a duplicate (previously the MATCH was silently discarded and the CREATE ran once). GraphCreateNode/GraphCreateExec are now input-driven: the executor iterates input rows, reads MATCH-bound vars' node_uuid/node_id from the row (via GraphWriter::register_existing_node) and mints fresh nodes/edges per row; the binder marks MATCH-bound CREATE vars as references (CreateNodeSpec::is_reference) and the lowerer folds the preceding pipeline in as the create node's input. A standalone CREATE is driven by the implicit single unit row (creates once, unchanged); an empty MATCH creates nothing. RETURN-after-CREATE, edge properties, and input streaming (vs collect, #747) remain deferred.
- Variable-length edge variable binds to the relationship list (#709) — the edge
var r in (a)-[r:KNOWS*1..3]->(b) now binds to the openCypher list of relationships along each path, surfaced as a forward-stable Arrow List<Struct<{edge_uuid, src_uuid, dst_uuid, rel_type}>> column (public UUID identity only — no surrogate ids leak). VarLenExpandExec's BFS now tracks the ordered per-path edge sequence (not just a dedup set) and emits it as the trailing list column; the lowerer registers the edge var and length(r) lowers to DataFusion array_length (so MATCH (a)-[r:KNOWS*1..2]->(b) RETURN length(r) returns the hop count per path). size() stays deferred (string vs list polymorphism). Richer access (indexing r[i], type(r[i]), relationships(path), edge properties) is a follow-up (#743) that needs no column-type change.
- Reject unimplemented writes + invalid CREATE patterns (#724) — two openCypher
semantics that previously "passed" only because execute was a stub now fail loudly. A CREATE relationship with a type disjunction (CREATE (a)-[:KNOWS|LIKES]->(b)) is rejected as a ParseError (a created relationship must name exactly one type; the disjunction is only valid in MATCH, which is unchanged). Unimplemented write/side-effecting clauses — DELETE, SET, REMOVE, CALL — now surface an explicit bind error (GfError::Plan) instead of the binder's previous silent no-op (which made … DELETE n report success while doing nothing). The binder's catch-all arm is now an error rather than a no-op, so any future clause must be wired in deliberately. Full DELETE/DETACH DELETE (and SET/REMOVE) execution semantics are deferred to a follow-up.
- Streaming results + cross-session catalog persistence (#725) — new
GraphForge::execute_stream/execute_stream_with_params return a SendableRecordBatchStream (backed by ExecutionSession::execute_plan_stream) with the same UUID-only output shaping + schema metadata as execute, so the schema is available before the first batch is pulled (the RecordBatchReader contract the bindings rely on, #587). Writes are rejected (GfError::Validation) — only reads stream. A GraphForge now owns a long-lived multi-thread Tokio runtime so a stream's background tasks (repartition/coalesce) are not cancelled when the call that built it returns; it shuts down in the background on drop, so dropping an instance inside an async context (e.g. the BDD harness) no longer panics. The write path now flushes the shared RuntimeCatalog to topology/runtime_catalog.parquet, so a later GraphForge::new(path) reloads observed labels/relation-types/properties across sessions.
- Query parameters + e2e gate complete (#584) —
$name placeholders are now bound to values: execute_with_params("… WHERE n.age > $min …", {min: 28}) substitutes the placeholder via DataFusion's LogicalPlan::with_param_values before physical planning. ir_literal_to_scalar (gf-rel) converts IrLiteral → ScalarValue (one source of truth shared with literal lowering); the Parameter placeholder id now carries the $ so DataFusion's named-param binding resolves it. New ExecutionSession::execute_plan_with_params (reads only; writes carry no placeholders). A missing parameter value errors clearly. This was the last open scenario of the #584 end-to-end gate — the full parse → bind → lower → execute → Arrow pipeline is now exercised end-to-end through the public API, closing the #583 Execution-Baseline epic.
- Incremental writes (#733) —
GraphWriter::flush now appends/merges with existing on-disk data instead of overwriting, so separate GraphForge::execute("CREATE …") calls accumulate (e.g. CREATE A,B then CREATE C → 3 Person). open seeds surrogate node_id/edge_id from the on-disk maximum (monotonic across sessions); node/edge files concat their new rows with the existing ones; property files are decoded back to literals and re-inferred so the dynamic schema evolves across flushes (new columns, cross-flush type widening to Utf8, leading-null columns) using the same rules as a single flush. No cross-session dedup (pure CREATE mints fresh UUIDs; MATCH…CREATE upsert is #703). The merge is not atomic (acceptable for this small-graph engine). Unblocks the #584 incremental-CREATE scenario.
- End-to-end execution baseline tests (#584) — a
gf-api integration suite drives the full pipeline (parse → bind → lower → execute → Arrow) through the public GraphForge::execute against a fixture graph (5 Person, 4 KNOWS, 1 LIKES). Covers simple scans, WHERE, count(), single-/two-hop and variable-length traversal, OPTIONAL MATCH, UNWIND, ORDER BY/LIMIT, and the Arrow result contract (FixedSizeBinary(16) node_uuid, no surrogate node_id/edge_id, query metadata, unique query_id). Parameter injection ($min) and incremental CREATE (the writer overwrites on flush — #733) remain deferred. Milestone 13 read-path exit gate.
- count() aggregation in RETURN (#729) — RETURN count(n) / count(*) now work. The binder detects top-level aggregate calls (count/sum/avg/min/max/collect) in a RETURN and emits a GraphOp::Aggregate (non-aggregate items become group-by keys) instead of a Project of an unknown function call. count(*) and count(<node/edge var>) count bound rows (the var is always bound in a MATCH), avoiding a reference to the non-existent bare var_N column; count(DISTINCT expr) maps to a distinct count. Part of the #583 read-path epic.
- Exploratory traversal + OPTIONAL projection (#728, #730) — relationship queries
now work end-to-end without an ontology.
TypedEdgeScan/EdgeScan/var-length Expand resolve relation-type names from the RuntimeCatalog (exposed via GraphCatalog::rel_names()), and CREATE write-lowering does the same so an exploratory edge is written to _exploratory.parquet with its real rel_type_name (previously _UNKNOWN, which no MATCH ()-[:REL]->() filter matched). The catalog no longer registers a typed edges_<rel> table for a runtime relation unless its Parquet file actually exists on disk, so the read path correctly routes exploratory edges through the _exploratory catch-all. OPTIONAL MATCH now registers the optional-side variables it binds, so RETURN m.x over the optional side resolves (was “unbound variable”). Part of the #583 read-path epic; surfaced by the #584 e2e work.
- List literals (#714) —
IrExpr::ListLiteral now lowers instead of erroring, so UNWIND [1, 2, 3] AS x and UNWIND [] AS x work end-to-end. All-constant lists fold to a ScalarValue::List literal (the form UnwindExec consumes); lists with expression elements lower to a make_array(...) call. Also fixed the lowering base: a source-free pipeline (UNWIND […], RETURN 1) now starts from the single relational unit row rather than a zero-row relation, so it produces output (a pipeline with a scan still starts empty — the scan supplies its own rows). Part of the #583 read-path epic.
- Public
GraphForge::execute (#719) — the facade is wired into the real parse → bind → lower → execute pipeline and returns an Arrow-backed ExecutionResult. GraphForge::new(Some(path)) loads graphforge.yaml + ontology.yaml (resolving the OntologyMode) and seeds the RuntimeCatalog; new(None) is an in-memory exploratory instance over a temp dir. execute routes CREATE to the write path and reads to execute_plan, sharing one RuntimeCatalog across queries. Results expose UUID identity columns (node_uuid/edge_uuid) — never the surrogate node_id/edge_id — and carry schema metadata (graphforge.query_id, ir_version, ontology_mode, and ontology_version when present). Added ontology_mode()/runtime_catalog() accessors and empty-query validation. Closes the #583 read-path epic's facade PR; execute_stream + runtime-catalog persistence are tracked in #725, and unimplemented write-clause validation (DELETE, multi-type CREATE) in #724.
- Property-table JOINs (#704) —
RETURN n.name now reads real property values. A labelled node scan LEFT-joins its property table (properties/<Entity>.parquet in strict/advisory mode, properties/_untyped.parquet in exploratory) on node_uuid, projecting each property column re-qualified under the node's var_N so PropertyAccess resolves to the joined column. PropertyAccess IDs resolve to names via the RuntimeCatalog (the binder's PropId space), surfaced through GraphCatalog::prop_names(); the catalog now also registers the exploratory properties__untyped table with its on-disk schema. Part of the #583 read-path epic.
- Fixed single-hop joins (#718) — a fixed
(a)-[:R]->(b) pattern now lowers to a connected relational join instead of disconnected scans. The binder emits a single Expand { min_hops: 1, max_hops: Some(1) } (the only op carrying the src/dst node vars) for fixed hops, which the lowerer routes to the existing two-join chain; variable-length hops are unchanged. A destination label ((b:Label)) is now applied as a filter on the already-bound node rather than silently dropped (this also fixes the var-length destination label). Surfaced and fixed a latent OPTIONAL MATCH schema bug: now that the optional child is a real join it re-binds shared variables, so the node excludes all of a shared variable's columns from the inner output (not just the join key), avoiding duplicate var_<shared> fields. Part of the #583 read-path epic.
- General read execution (#717) —
ExecutionSession::execute_plan is wired (lower → physical plan → collect), and read scans now bind their real Parquet-backed catalog providers (TopologyNodeTable/TypedEdgeTable) instead of a schema-only source, so a MATCH (n:Label) reads the rows a prior CREATE wrote. (Schema-only lowering is retained for explain/golden paths that have no project directory.) Part of the #583 read-path epic.
- New
gf-api crate (#716) — the public GraphForge engine facade now lives in a top-level gf-api crate (depends on the full pipeline: gf-cypher/gf-ir/gf-rel/gf-exec/gf-storage/gf-ontology). It was relocated out of gf-core, which is the foundation crate every other crate depends on and therefore can't host the facade without a dependency cycle. gf-cli and the language bindings now depend on gf-api; the BDD API/TCK harness moved with it. (First step of the #583 read-path epic; facade methods remain stubs until #717–#719.)
- UNWIND execution (#582) — UnwindExec evaluates the list expression per input row and explodes it into one output row per element (input columns + the element column), dropping rows whose list is null or empty (openCypher semantics). Routed from the UnwindNode Extension by the query planner. The full UNWIND … RETURN round-trip and UNWIND [literal] lowering await #583 / list-literal lowering.
- UNWIND schema plumbing (#582, foundation) — UnwindNode now declares an output schema that extends the input with the unwound element column (named after the alias, unqualified, nullable, so a bare RETURN x resolves). The lowerer resolves the element type from the list expression.
- OPTIONAL MATCH execution (#581) — OptionalMatchExec left-joins the outer input against the optional sub-plan on the shared-variable join keys, preserving every outer row and setting the optional-side columns to null when there is no match (openCypher null-shaping, via take with null indices). Handles fan-out (one outer → many inner), multi-key tuples, empty inner, and the unconditional (no-key) case; routed from the OptionalMatchNode Extension by the query planner.
- OPTIONAL MATCH join-key plumbing (#581, foundation) — OptionalMatchNode now carries the shared-variable join_keys and declares an output schema that extends the outer columns with the optional sub-plan's columns made nullable (openCypher null-shaping). The lowerer computes the shared variables (outer ∩ optional) as join keys. Physical execution lands in a follow-up PR; non-empty join keys for real MATCH (a)-[:R]->(b) queries await fixed single-hop join lowering (#583).
- Variable-length
Expand execution (#580) — VarLenExpandExec performs an iterative BFS over the edge table for patterns like (a)-[:KNOWS*1..3]->(b), with openCypher relationship-isomorphism (no edge reused within a path, so unbounded * terminates on cycles). Handles Out/In/Undirected, bounded and unbounded ranges, and is routed from the VarLenExpandNode Extension by the query planner. The edge variable r (relationship list) is not yet bound — deferred to a follow-up issue.
- Variable-length
Expand lowering (#580, foundation) — VarLenExpandNode now carries the source/destination vars, direction, relation type, and baked project dir/mode, and declares an output schema that extends the input with the destination node's columns (so a downstream RETURN b.node_id resolves). Physical BFS execution and the edge-variable (relationship-list) binding land in follow-up PRs.
- Catalog-free edge/node readers in
gf-storage (#580) — read_edges(dir, rel_name, mode) and read_nodes(dir) read the topology Parquet files directly, mirroring the GraphWriter layout (typed vs _exploratory). Lets physical execution nodes (e.g. the upcoming VarLenExpandExec) read edges without a GraphCatalog, which the DataFusion TaskContext does not expose.
- LALRPOP parser replacing Python LALR(1) executor
- DataFusion-backed execution engine with Arrow result streams
- execute_arrow() returning pyarrow.Table via Arrow C Data Interface
- execute_stream() returning pyarrow.RecordBatchReader
- Parquet storage provider (replaces SQLite in the Rust core)
- PyO3 + maturin Python binding (gf-bindings-py)
- napi-rs Node binding with Arrow IPC (gf-bindings-node)
- See ADR 0002 and roadmap
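
The all-or-nothing write entry (#790) above describes staging a sibling temp file and renaming it into place. A minimal Python sketch of that general pattern follows; it is not the gf-storage implementation (which is Rust), and the function name atomic_write is hypothetical. Like the entry, it makes no durability (fsync) promise.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write data so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Stage in the same directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".partial")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # os.replace renames over the target; on POSIX the rename is atomic.
        os.replace(tmp_path, path)
    except BaseException:
        # A failure before the rename leaves the prior file untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the write raises part-way, the temp file is removed and the original file keeps its previous bytes, which is the property the entry calls "byte-identical".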
[0.4.0] - 2026-05-07
Added
- db.gds graph algorithm surface (#484, PRs #514–#516) — 8 compiled algorithms dispatched to igraph (preferred) or NetworkX: pagerank, betweenness_centrality, closeness_centrality, degree_centrality, louvain, connected_components, clustering_coefficient, triangle_count. Stream mode returns dict[node_id, scalar]; write mode persists results via set_node_properties(). Optional node_label, rel_type, directed filters.
- db.search hybrid retrieval surface (#476, PRs #517 #533–#539) — FTS5 text search (db.search.text()), vector cosine similarity (db.search.vector()), and RRF-fused hybrid (db.search() / db.search.hybrid()). All methods return list[SearchHit] with .ref, .score, .sources, .text_score, .vector_score. Multi-space vector storage: set_node_vector(id, vec, space="model-name"). Bring-your-own-vectors; space can represent text embeddings, geo coordinates, temporal features, or any numeric feature array.
- SearchHit result type — frozen dataclass with score provenance; every .ref.id is directly addressable in db.execute().
- set_node_properties() bulk write-back (#480, PR #515) — batch-update node properties from a dict[node_id, dict[prop, value]].
- directed and node_id_property export parameters (#478 #479, PR #514) — to_networkx(directed=True), to_igraph(node_id_property="name"), etc.
- graphforge.recipes.neighbourhood() (#492, PR #518) — n-hop subgraph expansion returning list[dict] for LLM context building.
- Benchmark suite (#388, PR #519) — make benchmark runs real-dataset benchmarks through M-tier (XS/S/M/L).
- Integration tests for three-surface interoperability (PR #541) — GDS write-back readable via Cypher, search results addressable via id(), cross-surface PageRank threshold queries.
- Use-case documentation (PR #542, closes #523 #524 #525 #526) — db.gds examples in network-analysis.md; db.search + recipes in llm-workflows.md; deduplication pattern in knowledge-graph-construction.md; hybrid tool selection in agent-grounding.md.
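The RRF-fused hybrid search listed above combines a text ranking with a vector ranking. As a minimal sketch of reciprocal rank fusion in general, not GraphForge's implementation: the function name rrf_fuse, the sample ids, and the constant k=60 are assumptions made for illustration. Each id scores the sum of 1/(k + rank) over the rankings it appears in.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of ids with reciprocal rank fusion.

    `rankings` is a list of lists, best hit first. `k` is a smoothing
    constant; 60 is a commonly used value, assumed here for illustration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


text_hits = ["a", "b", "c"]    # hypothetical text-search ranking
vector_hits = ["b", "d", "a"]  # hypothetical vector-search ranking
fused = rrf_fuse([text_hits, vector_hits])
```

Ids that rank well in both lists ("b", then "a") rise above ids found by only one retriever, which is the usual motivation for fusing by rank rather than by raw score.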
[0.3.10] - 2026-05-06¶
Added¶
- Schema introspection API (#469, #499) — labels(), relationship_types(), node_count(label=None), relationship_count(type=None) on GraphForge; efficient set-union over storage backend
- JSON export/import (#470, #500) — gf.to_json(path, metadata, indent) serialises the full graph to a portable JSON file; GraphForge.from_json(path) reconstructs it; Hypothesis roundtrip property test
- merge_node() safe upsert (#471, #501) — merge_node(labels, match_on, on_create, on_match) with regex label validation, index-safe on_match updates, and create/match semantics matching MERGE clause behaviour
- add_graph_documents() LangChain-compatible ingestion (#472, #502) — duck-typed batch ingestion accepting any object with .nodes/.relationships attributes; idempotent edges; rel type pre-check; no langchain-community import required
- Parse/plan LRU cache (#464, #504) — GraphForge(cache_size=N) caches parsed+planned query pipelines; clear_cache() and cache_info() for introspection; thread-safe; default size 128
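To illustrate what a parse/plan LRU cache buys, here is a small sketch built on the standard library's functools.lru_cache. It is not GraphForge's cache: plan_query is a hypothetical stand-in for the parse+plan step, and only the size of 128 is taken from the entry above.

```python
from functools import lru_cache


@lru_cache(maxsize=128)  # 128 mirrors the default size quoted above
def plan_query(query: str) -> tuple:
    """Hypothetical stand-in for an expensive parse+plan step."""
    return tuple(query.split())


plan_query("MATCH (n) RETURN n")  # miss: planned, then cached
plan_query("MATCH (n) RETURN n")  # hit: served from the cache
info = plan_query.cache_info()    # hit/miss counters for introspection
plan_query.cache_clear()          # empties the cache and resets counters
```

Repeating an identical query string skips the planning work entirely, which is where a cache of this shape pays off for workloads that issue the same queries many times.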
Fixed¶
- ORDER BY on aliased RETURN DISTINCT property no longer raises UndefinedVariable (#481, #494) — planner now records (variable, property) paths for aliased PropertyAccess return items, so RETURN DISTINCT a.year AS year ORDER BY a.year resolves correctly
- Variable reuse across WITH boundary no longer raises KeyError (#482, #495) — optimizer's redundant-traversal-elimination pass now treats Aggregate as a scope boundary (same as With/Union/Subquery), preventing fresh post-WITH variable bindings from being silently dropped
- EXISTS {} subquery crash on anonymous inner node with outer-scope property (#474, #496) — _plan_match now passes var_name (always set, from pattern variable or generated) instead of node_pattern.variable (can be None) to _properties_to_predicate
- shortestPath()/allShortestPaths() parse correctly and raise NotImplementedError (#468, #497) — previously caused unhelpful SyntaxError: Unexpected token; now parsed into ShortestPathExpression AST nodes; planner raises NotImplementedError with BFS workaround: MATCH path = (a)-[*1..N]-(b) RETURN length(path) ORDER BY length(path) LIMIT 1
[0.3.9] - 2026-05-04¶
Added¶
- LALR(1) parser migration (#365, #430) — grammar refactored to LALR(1); linear-time parsing replaces Earley; unknown function names now detected at parse time; 586 new parser tests; OOM-safe local coverage via sharded make coverage
- Property equality index for O(1) node lookup (#390, #427) — in-memory _prop_index (prop → value → {node_id}) on Graph; get_nodes_by_property and get_nodes_by_label_and_property methods; executor _extract_equality_hints replaces full scan with index lookup when a ScanNodes is followed by a simple equality Filter; index maintained on SET/REMOVE/DELETE
- Bulk ingestion API (#389, #444) — create_node_bulk, create_relationship_bulk (skip per-call Pydantic validation); bulk_ingest() context manager defers statistics rebuilds until exit via _flush_deferred_stats(); CSV loader uses bulk path; significant throughput improvement for dataset loading
- Real-dataset QA and performance profiling suite (#417, #419) — 5-tier perf suite (karate 34n, ego-facebook 4k, amazon 334k, livejournal 4M, orkut 117M edges); scripts/perf_report.py renders baseline/delta Markdown tables; make test-perf, make test-perf-xs, make test-perf-large, make test-perf-slow targets; L/XL tier benchmarks added; perf_slow marker applied consistently (#440, #441)
- elementId() scalar function (#211, #447) — GQL-spec elementId(node|rel) returns the element id as CypherString; id() continues to return CypherInt; NULL-safe; arity-checked
- Scale limits documentation — docs/reference/scale-limits.md explains LIMIT-respecting (~20M edges) vs full-scan (~1M edges) performance ceilings; README graph-size claim updated to reflect edge-count as binding constraint (#443)
- make pre-push-fast (#429, #437) — fast-fail pre-push shortcut (format, lint, type, security, docstrings; no coverage); inline coverage thresholds (85% total, 90% patch)
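The prop → value → {node_id} shape of the property equality index can be pictured with nested dictionaries. The class below is a toy sketch under that assumption; PropIndex and its method names are hypothetical and are not the Graph internals.

```python
from collections import defaultdict


class PropIndex:
    """Toy equality index: prop -> value -> {node_id} (illustrative only)."""

    def __init__(self):
        self._index = defaultdict(lambda: defaultdict(set))

    def add(self, node_id, prop, value):
        self._index[prop][value].add(node_id)

    def remove(self, node_id, prop, value):
        # Needed whenever a property is overwritten or removed, or the
        # node itself is deleted, so the index never returns stale ids.
        self._index[prop][value].discard(node_id)

    def lookup(self, prop, value):
        # Two hash lookups on average instead of a scan over every node.
        return set(self._index.get(prop, {}).get(value, set()))


idx = PropIndex()
idx.add(1, "name", "Alice")
idx.add(2, "name", "Bob")
idx.add(3, "name", "Alice")
idx.remove(3, "name", "Alice")  # e.g. node 3 was deleted
```

The maintenance calls are the price of the fast lookup: every write path that changes a property has to keep the index in step.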
Fixed¶
- Fixed-length hop ignores rel predicate (#431, #439) — fixed-length traversal now evaluates rel_pattern.predicate inline; parser predicate whitelist replaced with open accept
- Var-length relationship predicates (#381, #425) — _var_length_reachable now evaluates inline relationship predicates (e.g. [*1..3 {weight: 1}]) at each hop instead of ignoring them
- Cross-hop edge uniqueness in multi-segment patterns (#382, #425) — initial_used_edge_ids seeds the visited-edge set from prior hops, preventing the same edge being traversed twice across pattern segments
- SQLite delete persistence (#409, #421) — deleted nodes and edges now removed from SQLite backend; previously they resurrected on reload
- SQLite bulk I/O (#392, #421) — save_nodes_bulk/save_edges_bulk use executemany; _save_graph_to_backend wrapped in rollback; resolves >20s hang on M-tier reload
- sys.setrecursionlimit thread safety (#433, #445) — limit set once in CypherParser.__init__ instead of per-request get/set/restore, eliminating races in concurrent parser use
- Transaction rollback restores ID counters (#412) — rollback() restores _next_node_id and _next_edge_id, preventing ID gaps after aborted transactions
- Closed-instance guard (#411) — execute() and mutation methods raise RuntimeError after close()
- DatasetInfo rejects ftp:// URLs (#414)
- TCK metrics CI gate (#410) — tck_metrics.py exits non-zero on TCK failures
Performance¶
- UNWIND + WITH LIMIT short-circuit (#393, #443) — _pipeline_limit reverse-scan treats pass-through With(limit_count=N, no sort, no filter) as transparent; _execute_unwind respects row_limit and exits early; UNWIND range(1M, 2M) AS i WITH i LIMIT 3000 RETURN sum(i) drops from ~4.5s to <10ms
- LIMIT short-circuit for scan and expand operators (#422, #423) — _pipeline_limit look-ahead stops scan/expand at row budget; write operators and Filter block the short-circuit to preserve correctness
- SQLite PRAGMA tuning (#392, #446) — synchronous=NORMAL (eliminates fdatasync per commit), 64 MB page cache (cache_size=-65536), temp_store=MEMORY; safe with WAL mode for analytical workloads
- Lazy statistics timestamp — _stats_last_updated stamped at read time in get_statistics() rather than on every add; _defer_stats flag eliminates per-edge stat overhead during bulk ingest
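The effect of the LIMIT short-circuit can be reproduced in miniature with lazy generators. This sketch is an analogy rather than the executor's code: unwind is a hypothetical helper and the produced counter exists only to show how many rows the source emitted.

```python
from itertools import islice


def unwind(values, produced):
    """Yield rows lazily, counting how many the source actually produced."""
    for v in values:
        produced[0] += 1
        yield v


produced = [0]
rows = unwind(range(1_000_000, 2_000_001), produced)
# islice stops pulling from the generator once 3000 rows are taken,
# so the remaining values in the range are never produced.
total = sum(islice(rows, 3000))
```

Here the source emits exactly as many rows as the limit asks for, which is the behaviour the row budget gives the UNWIND operator in the example query above.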
CI / Tooling¶
- Test suite zero-base renovation (#428, #438) — fixture strategy audit, duplication removal, marker consistency, parametrization improvements across unit and integration suites
- GHA actions updated — actions/upload-artifact v7, actions/download-artifact v8, codecov/codecov-action v6, actions/upload-pages-artifact v5, astral-sh/setup-uv v8.1.0
[0.3.8] - 2026-05-02¶
Fixed - TCK Feature Completeness (3885/3885 passing)¶
- Nanosecond precision in temporal types — CypherDateTime and CypherTime now store a _ns_residue (0–999) for sub-microsecond precision; _components tuple extended to 8 elements with nanoseconds; construction, accessors, arithmetic, comparison, and serialization all updated (#224)
- Statement clock caching — now(), datetime(), date(), time() (without arguments) now return consistent values within a single query by caching the start time via time.time_ns() (#224)
- Extreme year support — dates/datetimes outside Python's 1–9999 range now handled via _WideDate/_WideDateTime classes; durations too large for timedelta use _BigDuration; Julian Day Number arithmetic for cross-extreme-year duration computation (#224)
- IANA timezone name preservation — CypherDateTime stores the IANA zone name (e.g. Europe/Stockholm) separately; .timezone accessor returns the zone name rather than the numeric offset; utcoffset() now receives a concrete datetime for DST-aware lookup (#224)
- Aggregate detection in QuantifierExpression — ALL(ok IN collect(...) WHERE ok) patterns now correctly detected as aggregate-containing; planner emits Aggregate instead of Project (#224)
- OPTIONAL MATCH multi-WHERE placement — multiple WHERE clauses no longer overwrite each other; multi-hop OPTIONAL MATCH WHERE is emitted after all hops so all variables are bound (#224)
- coalesce() type inference — return type is inferred from argument types so MATCH (x) after WITH coalesce(a, b) AS x accepts node-typed variables (#224)
- Non-deterministic aggregate arguments — count(rand()) and similar expressions now raise AmbiguousAggregationExpression per the openCypher spec (#224)
- Temporal accessor for extreme-year datetimes — year, month, day, hour, minute, second accessors work on _WideDateTime-backed CypherDateTime values (#224)
- All 23 xfail markers removed — previously marked as expected failures; all were fixable (#224)
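Python's datetime stops at microseconds, which is why a separate 0–999 nanosecond residue is needed. The sketch below shows one way such a split could work; split_ns is a hypothetical helper and the timestamp is an arbitrary sample, so treat it as an illustration of the idea rather than the CypherDateTime code.

```python
import time
from datetime import datetime, timezone


def split_ns(epoch_ns):
    """Split epoch nanoseconds into a microsecond datetime plus a 0-999 ns residue."""
    epoch_us, ns_residue = divmod(epoch_ns, 1_000)
    seconds, micros = divmod(epoch_us, 1_000_000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=micros)
    return dt, ns_residue


dt, residue = split_ns(1_700_000_000_123_456_789)

# Statement clock caching in miniature: read the clock once, then reuse
# the same value for every argument-less temporal call in the query.
statement_clock = time.time_ns()
first_now, _ = split_ns(statement_clock)
second_now, _ = split_ns(statement_clock)
```

Because both calls are derived from the same cached reading, they cannot disagree, however long the query takes between them.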
Added¶
- 75-test unit coverage for extreme temporal classes — tests/unit/types/test_extreme_temporal_values.py covers _WideDate, _WideDateTime, _BigDuration, extreme-year construction/accessors, IANA timezone preservation, and large duration handling
Fixed¶
- WITH clause WHERE filtering and variable scoping (#362) — WITH n WHERE n.age > 30 now correctly filters post-projection; chained WITH propagates bindings properly
- Triadic OPTIONAL MATCH semantics (#302 partial) — OPTIONAL MATCH (a)-[r]->(b) with pre-bound b now correctly filters to only edges reaching that specific destination (LEFT JOIN semantics)
- Relationship type disjunction without colon prefix (#363) — [:KNOWS|FOLLOWS] (pipe without colon on subsequent types) now parses correctly
- Multi-CREATE variable scoping (#364 partial) — variables created in one CREATE clause are now visible in subsequent CREATE clauses within the same query
Performance¶
- O(n²) → O(1) graph statistics updates (#366) — _update_statistics_after_add_edge no longer scans all edges on every insert; replaced with incremental _unique_sources_by_type tracking. Eliminated 2,000+ Pydantic model_copy() calls per 1,000-node graph by switching to mutable counters with lazy GraphStatistics construction
- Parser fast path for long CREATE sequences (#366) — queries with many consecutive CREATE clauses (e.g. the TCK movie graph with 971 CREATEs) are pre-split into batches of 5 before Earley parsing, reducing wall-clock time from 27 minutes to ~26 seconds. See docs/development/parser-performance-analysis.md for full analysis and the LALR(1) migration plan (issue #365)
CI¶
- TCK sharding across 4 parallel runners — TCK tests now split using pytest-split with duration-based balancing (~4x faster CI)
- Coverage sharding — coverage collection also split across 4 shards and merged in report job
- Python 3.14 experimental matrix — added as non-blocking continue-on-error entries
- TCK performance reporting — new scripts/tck_perf_report.py runs after each TCK run, reporting tests over 5s and tracking use-case tests (movie graph, school graph) for performance regression detection
TCK Coverage¶
- 3,885 / 3,885 passing (100%) — first release with zero TCK failures and zero expected failures
[0.3.7] - 2026-04-07¶
Added - Functions¶
- Math functions: e, pi, exp, log, log10, trig, degrees, radians, timestamp (#326) — full set of mathematical and trigonometric functions per openCypher spec
- startNode() and endNode() graph functions (#318) — extract the start/end node from a relationship
- properties() and keys() graph functions (#316) — inspect node/relationship property maps and key sets
- List concatenation + operator (#315) — [1,2] + [3,4] → [1,2,3,4]
- Map subscript access m['key'] (#314) — dynamic map property lookup
- Exponent float notation and leading-dot floats (#311) — 1.5e3, .5 now parsed correctly
Fixed - Sorting & Aggregation¶
- ORDER BY mixed-type sort ordering (#320) — values of different Cypher types now sort in the correct openCypher type hierarchy order
- Aggregation in composite expressions and ORDER BY (#329) — count(n) + 1, ORDER BY count(n) and similar patterns now evaluate correctly
- Validate ORDER BY aggregation expressions at compile time (#331) — non-aggregated variables in aggregating ORDER BY now raise a clear error
- Detect non-projected ORDER BY aggregates at compile time (#332) — aggregates in ORDER BY that aren't in the projection are caught early
Fixed - Expression Evaluation¶
- Operator precedence, null propagation, and type coercion (#321) — NOT, AND, OR precedence; null propagation in arithmetic; integer/float coercion edge cases
- Type conversion TypeError/null per openCypher spec (#327) — toInteger(), toFloat(), toString() now return null on unconvertible input per spec
- List equality with null uses three-valued logic (#330) — [1, null] = [1, null] now correctly returns null
- NaN handling in comparisons, arithmetic, and sort (#323) — NaN propagation and sort position per openCypher spec
- SKIP/LIMIT accept expression values (#319) — SKIP toInteger(n.offset) and similar dynamic skip/limit values now work
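The list-equality fix above is easier to see in code. This is a minimal sketch of three-valued equality with Python's None standing in for null; eq3 is a hypothetical helper, not the evaluator's function, and it ignores Cypher typing rules beyond lists and nulls.

```python
def eq3(a, b):
    """Three-valued equality: True, False, or None (unknown)."""
    if a is None or b is None:
        return None
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            return False
        unknown = False
        for x, y in zip(a, b):
            result = eq3(x, y)
            if result is False:
                return False  # a definite mismatch wins over unknowns
            if result is None:
                unknown = True
        return None if unknown else True
    return a == b


same_with_null = eq3([1, None], [1, None])  # unknown, so None
```

A null element makes the comparison unknown only while every other position still matches; one definite mismatch settles the answer as false.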
Fixed - Parser & Clause Gaps¶
- WITH keyword position gaps (#310) — WITH is now accepted after SET, REMOVE, DELETE, and in chained multi-part queries
- ASCENDING/DESCENDING keywords (#312) — ORDER BY n.name ASCENDING/DESCENDING now parse correctly; removed stale xfail markers (#325)
- NodeRef/EdgeRef comparison methods (#313) — nodes and relationships are now sortable and comparable by identity
Fixed - Graph Semantics¶
- Relationship uniqueness in MATCH patterns (#333) — runtime enforcement that the same physical edge cannot fill two relationship variables in one MATCH pattern; extended to anonymous multi-hop rels and path-variable (ExpandMultiHop) patterns
- Temporal select-into constructors (#334) — cross-type temporal projection (date({date: other}), localdatetime({datetime: dt}), etc.) with component overrides; null propagation, hour-overflow normalization, and timezone conversion all correct
TCK Coverage¶
- 3,235 / 3,885 passing (83.3%) (up from ~2,507 at v0.3.6) — +728 net new passing scenarios
- Theme: ORDER BY correctness, aggregation, operator precedence, graph semantics, temporal completeness
[0.3.6] - 2026-02-28¶
Fixed - TCK Harness Correctness¶
- Integer property comparison (#252) — TCK harness was comparing integer properties as strings, causing ~423 false failures; now coerced correctly
- Map literal value coercion (#261) — _parse_map_literal in conftest now coerces numeric string values, fixing ~30 false failures
Fixed - Temporal Correctness¶
- date({})/datetime({})/time({}) map constructors (#253, #268) — component validation, UTC defaults, and function-as-argument parsing
- UnexpectedEOF on duration/datetime literals (#254) — parser now handles all ISO 8601 duration and datetime forms
- Timezone offset +HH:MM parsing (#255) — +HH:MM and -HH:MM offsets now parsed correctly
- Date map constructor component validation (#256) — out-of-range components raise ValueError
- datetime(string) defaults to UTC (#276) — string datetimes without timezone now default to UTC per spec
- Map constructor omitted components default to 1 (#278) — date({year: 2015}) → 2015-01-01, not error
- TRUNCATE units: week, weekYear, quarter, decade, century, millennium (#269) — all truncation units now return correct values
- Week-based and select-form datetime constructors (#270) — datetime({week: 30, ...}) and similar forms return correct values
- Duration serialization, comparison, multiplication, and fractional arithmetic (#271) — full duration correctness including duration.between()
- datetime(aDatetime) type coercion (#274) — coercing datetime subtypes no longer conflicts with :E label parser token
- Compact ISO 8601 date/datetime string formats (#273) — ordinal (YYYY-DDD, YYYYDDD), week date (YYYY-Www[-D]), year-month (YYYY-MM, YYYYMM), and year-only (YYYY) all parsed correctly
- IANA timezone names (#272) — [Region/City] suffix in datetime strings is parsed; IANA zone is authoritative over explicit offset; historical second-precision offsets (e.g. +00:53:28) handled correctly
TCK Coverage¶
- 2,507 passing (up from ~1,694 at v0.3.5) — +813 net new passing scenarios
- Theme: all temporal TCK scenarios now pass; harness accuracy fully corrected
[0.3.5] - 2026-02-19¶
Added - Math Functions (#195, #196, #197)¶
- sqrt(x) - Square root function
- Returns CypherFloat; negative input returns null; null propagation
- Example: RETURN sqrt(4) AS r → 2.0
- rand() - Random float in [0.0, 1.0)
- Takes no arguments; returns a new CypherFloat each call
- Example: RETURN rand() * 100 AS r
- pow(base, exponent) - Exponentiation function
- Consistent with ^ operator: pow(2, 3) = 2^3 = 8
- Null propagation for both arguments
- Example: RETURN pow(2, 10) AS r → 1024
Fixed - Aggregation Function Tests (#201, #202, #203, #204)¶
- Fixed ~24 syntax bugs ("RETURNfunc(" → "RETURN func(") in aggregation test files
- percentileDisc(), percentileCont(), stDev(), stDevP() were already implemented; tests now pass
- Updated docs to reflect all four as COMPLETE
Added - TCK Step Definitions (#237)¶
- Added 5 missing pytest-bdd step definitions unblocking ~129 previously failing TCK scenarios:
- executing control query: — alias for executing query:
- the result should be (ignoring element order for lists): — row comparison with sorted list values
- the result should be, in order (ignoring element order for lists): — ordered variant
- parameters are: — parses datatable, substitutes $param references in queries
- there exists a procedure — marked xfail (CALL procedures tracked for v0.3.6 in #190)
- Extended _parse_value() to handle list literals like ['Foo', 'Bar']
- Extended _row_to_comparable() to handle CypherList values
Implementation Status Updates¶
- Math functions: sqrt, rand, pow now COMPLETE (was NOT_IMPLEMENTED)
- Aggregation functions: percentileDisc, percentileCont, stDev, stDevP now COMPLETE
- TCK pass rate: +129 passing scenarios, +50 xfailed (procedure CALL)
[0.3.4] - 2026-02-18¶
Added - Power Arithmetic Operator (#213)¶
- ^ (power/exponentiation) operator - Full exponentiation support
- Right-associative: 2^3^2 = 2^(3^2) = 512
- Highest arithmetic precedence (above *, /)
- Type handling: int^int returns int if whole, else float
- Negative exponents: 2^-1 = 0.5
- Fractional exponents: 4^0.5 = 2.0
- NULL propagation for both operands
- 39 integration tests covering associativity, precedence, types, and edge cases
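The typing and NULL rules listed for ^ can be approximated in a few lines. This is a sketch under stated assumptions, not the evaluator's code: cypher_pow is a hypothetical helper, None stands in for NULL, and edge cases such as a zero base with a negative exponent are not handled.

```python
def cypher_pow(base, exponent):
    """Sketch of ^ on Python numbers, with None standing in for NULL."""
    if base is None or exponent is None:
        return None  # NULL propagation for either operand
    result = base ** exponent
    both_int = isinstance(base, int) and isinstance(exponent, int)
    if both_int and float(result).is_integer():
        return int(result)  # int^int stays int when the result is whole
    return float(result)


# Right-associativity: 2^3^2 groups as 2^(3^2).
nested = cypher_pow(2, cypher_pow(3, 2))
```

Python's own ** operator groups the same way, so 2 ** 3 ** 2 gives the same value as the nested call.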
Added - XOR Logical Operator (#212)¶
- XOR operator - Exclusive OR with proper ternary logic
- true XOR false = true, true XOR true = false
- Precedence: NOT > AND > XOR > OR
- NULL propagation: any NULL operand yields NULL
- Left-associative chaining: a XOR b XOR c
- Case-insensitive keyword (XOR, xor, Xor)
- 22 integration tests + 4 unit tests
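As a rough sketch of the XOR behaviour listed above, with Python's None standing in for NULL (the helper name xor3 is hypothetical and is not the evaluator's function):

```python
def xor3(a, b):
    """Exclusive OR where any unknown (None) operand yields None."""
    if a is None or b is None:
        return None
    return a != b


# Left-associative chaining: a XOR b XOR c evaluates as (a XOR b) XOR c.
chained = xor3(xor3(True, True), True)
```

Unlike AND and OR, XOR can never decide its result from a single known operand, which is why any NULL operand makes the whole expression NULL.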
Documentation - Feature Completion Updates¶
Already-Implemented Features Now Documented (#193, #214, #215)¶
- length() function - Documented as COMPLETE for paths
- length(path) returns relationship count in path
- For strings/lists, use size() function instead
- File: src/graphforge/executor/evaluator.py:2604
- List slicing [start..end] - Documented as COMPLETE
- Example: RETURN [1, 2, 3, 4, 5][1..3] → [2, 3]
- Supports open-ended slicing: [..], [1..], [..3]
- File: src/graphforge/executor/evaluator.py:1458
- Negative list indexing - Documented as COMPLETE
- Example: RETURN [1, 2, 3][-1] → 3 (last element)
- Supports both index and slice operations: [1, 2, 3][-2..-1]
- File: src/graphforge/executor/evaluator.py:1458
Added - String Function Aliases¶
toUpper()/toLower() camelCase variants (#194)¶
- toUpper(string) - CamelCase alias for UPPER(), converts string to uppercase
- Example: RETURN toUpper('hello') AS result → 'HELLO'
- toLower(string) - CamelCase alias for LOWER(), converts string to lowercase
- Example: RETURN toLower('HELLO') AS result → 'hello'
- Both camelCase and legacy forms (UPPER/LOWER) supported
- Case-insensitive keywords: toUpper, TOUPPER, toLower, TOLOWER all work
- 18 new integration tests (7 toUpper + 7 toLower + 4 interop)
Implementation Status Updates¶
- Functions: 58/72 (81%, +3%) - String functions now 13/13 (100%)
- Operators: 32/34 (94%, +6%) - All list operators now COMPLETE
- Documentation reflects actual implementation status
[0.3.3] - 2026-02-18¶
Added - Pattern & CALL Features (Feature Completion: 88%)¶
Pattern Predicates - COMPLETE (#216)¶
- Inline WHERE in patterns - Filter relationships during pattern matching
- Example: MATCH ()-[r:KNOWS WHERE r.since > 2020]->(f) RETURN f.name
- Property comparisons, complex expressions (AND, OR, NOT)
- Function calls in predicates, NULL handling
- Variable-length paths with predicates
- Works with undirected relationships
- 16 comprehensive integration tests
CALL Subqueries - COMPLETE (#189)¶
- General CALL { } subquery support - Execute nested queries with full openCypher syntax
- Example: CALL { MATCH (p:Person) RETURN p.name AS name } RETURN name
- Correlated scoping: Access outer variables from parent query
- UNION support: CALL can contain UNION and UNION ALL queries
- Unit subqueries: Preserve 1:1 cardinality for side-effect queries
- Nested CALL: Support for CALL within CALL
- Multiple CALL: Cartesian product of multiple subqueries
- 13 comprehensive integration tests
Pattern Comprehension - COMPLETE (#217)¶
- Pattern-based list operations - Transform graph patterns into lists
- Example: [(p:Person) WHERE p.age > 18 | p.name]
- Simple node patterns: [(p:Person) | p.name]
- Relationship patterns: [(p)-[:KNOWS]->(f) | f.name]
- Optional WHERE filters for pattern results
- Complex expressions in map clause
- Correlated variables from outer scope
- Nested in RETURN, WHERE, WITH clauses
- NULL property handling
- 15 comprehensive integration tests
Implementation¶
AST Nodes:
- PatternComprehension (expression.py) - pattern, filter_expr, map_expr
- CallClause (clause.py) - nested query support
Grammar (cypher.lark):
- pattern_comprehension rule with optional WHERE
- call_clause and call_query rules
- Pattern WHERE predicates already existed, now fully documented
Parser (parser.py):
- pattern_comprehension transformer
- call_clause and call_query transformers
- Enhanced list_literal to handle pattern comprehension
Planner (planner.py):
- Call operator with UNION support
- TypeContext preservation for nested queries
Executor (executor.py, evaluator.py):
- _execute_call with correlated scoping and unit subquery detection
- PatternComprehension evaluation via temporary MATCH execution
- Pattern predicate evaluation in relationship matching (already existed)
Testing¶
- 44 new integration tests (16 + 13 + 15)
- All tests passing with 100% coverage on new code
- TCK pass rate: ~40% (1,530 / 3,837 tests)
Documentation¶
- Updated patterns.md: 100% completion (8/8 pattern types)
- Updated clauses.md: CALL marked as COMPLETE
- All features documented with examples and implementation references
- Feature completion: 85% → 88% (+3 features)
Performance¶
- No performance regressions
- Pattern comprehension uses existing pattern matching infrastructure
- CALL subqueries properly isolated with TypeContext.copy()
[0.3.2] - 2026-02-17¶
Added - List Operations (100% Complete)¶
List Operation Functions (#198, #199, #200)¶
- filter() - Filter lists by predicate
- Example: RETURN filter(x IN [1,2,3,4,5] WHERE x > 3) AS result → [4, 5]
- NULL list returns NULL, NULL items excluded
- Variable binding with proper scoping
- extract() - Map transformations over lists
- Example: RETURN extract(x IN [1,2,3] | x * 2) AS result → [2, 4, 6]
- NULL list returns NULL, NULL items processed normally
- Supports complex expressions and property access
- reduce() - Fold/reduce with accumulator
- Example: RETURN reduce(sum = 0, x IN [1,2,3,4] | sum + x) AS result → 10
- Dual variable binding (accumulator + loop variable)
- Empty list returns initial value
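For readers more at home in Python, the three list operations map onto familiar constructs. The helpers below are hypothetical analogues written for illustration, with None standing in for NULL; they are not the evaluator's handlers, and the mapping function passed to cy_extract has to cope with None items itself.

```python
from functools import reduce


def cy_filter(items, pred):
    if items is None:
        return None  # NULL list returns NULL
    # NULL items are excluded before the predicate is consulted.
    return [x for x in items if x is not None and pred(x)]


def cy_extract(items, fn):
    if items is None:
        return None  # NULL list returns NULL
    return [fn(x) for x in items]  # every item, None included, is mapped


def cy_reduce(initial, items, step):
    # An empty list leaves the accumulator at its initial value.
    return reduce(step, items, initial)


filtered = cy_filter([1, 2, 3, 4, 5], lambda x: x > 3)
doubled = cy_extract([1, 2, 3], lambda x: x * 2)
summed = cy_reduce(0, [1, 2, 3, 4], lambda acc, x: acc + x)
```

The three results match the changelog's own examples: the filtered list, the doubled list, and the running sum.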
Implementation¶
- Added three AST nodes: FilterExpression, ExtractExpression, ReduceExpression
- Grammar rules for all three expressions in cypher.lark
- Parser transformers with Pydantic validation
- Evaluator handlers with proper NULL handling and variable scoping
- Treated as special syntax (like list comprehensions) due to variable binding
Testing¶
- 42 comprehensive integration tests
- Full coverage of edge cases (empty lists, NULL handling, variable shadowing)
- Composition and nesting tests
- All tests passing with 100% coverage on new code
Documentation¶
- Updated implementation status: 58/72 functions complete (81%, +4%)
- List Functions: 8/8 (100%)
[0.3.1] - 2026-02-17¶
Added - Predicate Functions (100% Complete)¶
Quantifier Functions (#205, #206, #207, #208)¶
- all() - Tests if all elements in a list satisfy a predicate
- Example: RETURN all(x IN [2, 4, 6] WHERE x % 2 = 0) AS result → true
- Implements three-valued NULL logic per openCypher spec
- Returns false if any element fails, true if all pass, NULL if indeterminate
- any() - Tests if any element in a list satisfies a predicate
- Example: RETURN any(x IN [1, 2, 3] WHERE x > 2) AS result → true
- Returns true if any element passes, NULL if no element passes but some evaluate to NULL
- none() - Tests if no elements in a list satisfy a predicate
- Example: RETURN none(x IN [1, 3, 5] WHERE x % 2 = 0) AS result → true
- Inverse of any() with proper NULL handling
- single() - Tests if exactly one element satisfies a predicate
- Example: RETURN single(x IN [1, 2, 3] WHERE x = 2) AS result → true
- Returns true only if exactly one match and no NULLs
- Returns NULL if uniqueness cannot be determined (NULLs present)
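The quantifier rules above can be sketched by tallying how many elements evaluate to true, false, or unknown. The helper names are hypothetical and None stands in for NULL. One assumption goes beyond the text: cy_single returns false as soon as two definite matches exist, even if NULLs are also present.

```python
def _tally(items, pred):
    """Count predicate outcomes as (true, false, unknown)."""
    results = [pred(x) for x in items]
    return (
        sum(r is True for r in results),
        sum(r is False for r in results),
        sum(r is None for r in results),
    )


def cy_all(items, pred):
    _, false_n, unknown_n = _tally(items, pred)
    if false_n:
        return False
    return None if unknown_n else True


def cy_any(items, pred):
    true_n, _, unknown_n = _tally(items, pred)
    if true_n:
        return True
    return None if unknown_n else False


def cy_none(items, pred):
    result = cy_any(items, pred)
    return None if result is None else not result


def cy_single(items, pred):
    true_n, _, unknown_n = _tally(items, pred)
    if true_n > 1:
        return False  # assumption: two definite matches settle the answer
    if unknown_n:
        return None   # uniqueness cannot be determined
    return true_n == 1


def gt2(x):
    """Predicate x > 2 that yields None (unknown) for a None element."""
    return None if x is None else x > 2


undetermined = cy_any([1, None], gt2)
```

With no element passing and one unknown, any() cannot commit to false, so the result is unknown.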
Property and Collection Testing (#209, #210)¶
- exists() - Tests if a property exists or expression is not NULL
- Example: MATCH (p:Person) WHERE exists(p.age) RETURN p.name
- Evaluates before NULL propagation for accurate property checking
- Returns false for missing properties (NULL values are not stored)
- isEmpty() - Tests if a list, string, or map is empty
- Example: RETURN isEmpty([]) AS result → true
- Works with lists, strings, and maps
- Returns NULL for NULL input (three-valued logic)
Testing¶
- 57 comprehensive integration tests (34 quantifier + 23 exists/isEmpty)
- Parametrized tests for better maintainability
- Full coverage of NULL handling edge cases
- All tests passing with 100% coverage on new code
Documentation¶
- Updated implementation status: 55/72 functions complete (76%, +2%)
- Predicate Functions: 6/6 (100%)
- Detailed function signatures and examples in docs/reference/implementation-status/functions.md
Performance¶
- No performance regressions
- Efficient NULL propagation handling
- Optimized list iteration for quantifiers
0.3.0 - 2026-02-09¶
Added - Major Cypher Features¶
OPTIONAL MATCH (Left Outer Joins)¶
- Left outer join semantics with NULL preservation (#104)
- Example: MATCH (p:Person) OPTIONAL MATCH (p)-[:KNOWS]->(f) RETURN p.name, f.name
- 6 integration tests, comprehensive NULL handling
UNION and UNION ALL¶
- Combine query results with automatic deduplication (UNION) or preserve duplicates (UNION ALL) (#104)
- Example: MATCH (p:Person) RETURN p.name UNION MATCH (c:Company) RETURN c.name
- Tree-based operator structure for nested queries
- 9 integration tests
List Comprehensions¶
- Transform and filter lists declaratively (#104)
- Example: RETURN [x IN [1,2,3,4,5] WHERE x > 3 | x * 2]
- Supports WHERE filtering, map expressions, and nested comprehensions
- 12 integration tests
EXISTS and COUNT Subquery Expressions¶
- Correlated subqueries for existence checks and counting (#104)
- Example: MATCH (p:Person) WHERE EXISTS { MATCH (p)-[:KNOWS]->() } RETURN p.name
- Full operator pipeline execution for nested queries
- 13 integration tests
Variable-Length Path Patterns¶
- Recursive traversal with cycle detection (#104)
- Example: MATCH (a)-[:KNOWS*1..3]->(b) RETURN a.name, b.name
- Depth-first search with per-path cycle prevention
- Configurable min/max hop counts
- 2 integration tests
IS NULL / IS NOT NULL Operators¶
- Boolean NULL checking (distinct from = NULL ternary logic) (#104)
- Example: MATCH (p:Person) WHERE p.age IS NULL RETURN p.name
- Always returns boolean (never NULL)
Added - Dataset Integration¶
NetworkRepository Datasets (#110, #113)¶
- 10 new graph datasets from NetworkRepository
- GraphML loader for complex graph formats
- Comprehensive metadata with node/edge counts, categories, licenses
- Examples: Polblogs, Polbooks, Karate club, Dolphin social network, C. elegans, Les Miserables
- All datasets validated with comprehensive test suite
Spatial and Temporal Types¶
- Point type for geographic coordinates
- Distance function for spatial queries
- Date, DateTime, Time, Duration types
- Full openCypher compatibility for type system
Dataset Validation Infrastructure (#112, #113)¶
- Comprehensive validation script (scripts/validate_datasets.py)
- Validates downloads, caching, node/edge counts, query functionality
- Performance benchmarking for all datasets
- 100% validation success rate (13/13 datasets tested)
Fixed¶
- make coverage-diff command now works correctly (#111)
- Dataset validation script handles missing query results (#112)
- NetworkRepository dataset URLs and metadata corrections (#113)
- Exception handling improvements in validation infrastructure (#113)
- Resource cleanup with proper try/finally blocks
Architecture Improvements¶
- Tree-based operator structure for nested queries
- Dual serialization: SQLite+MessagePack (data) + Pydantic+JSON (metadata)
- Enhanced expression evaluator with recursive execution
- Operator pipeline supports nested query planning
Testing¶
- 767 integration tests passing (42+ new tests for v0.3.0)
- 91.96% code coverage maintained
- Comprehensive dataset validation suite
- Property-based testing with Hypothesis
TCK Compatibility¶
- Progress from 16.6% to ~29% openCypher TCK coverage
- 312+ additional scenarios passing
- Foundation for continued TCK improvements toward 39% target
Documentation¶
- Complete v0.3.0 feature documentation (CHANGELOG_v0.3.0.md)
- Dataset integration guides (docs/datasets/)
- Performance benchmarks and optimization tips
- Updated openCypher compatibility matrix
Breaking Changes¶
None. All changes maintain backward compatibility with v0.2.0 and v0.2.1.
Known Limitations¶
- Variable-length paths: no configurable max depth limit in unbounded queries
- UNION: no post-UNION ORDER BY (must be in each branch)
- Pattern predicates (WHERE inside patterns) not yet supported
0.2.1 - 2026-02-03¶
Added¶
- Dataset loading infrastructure (#68)
- Automatic dataset download and caching system
- Dataset registry with metadata (nodes, edges, size, category, license)
- Built-in support for HTTP downloads with retry logic and timeout
- Local cache directory (~/.graphforge/datasets/) with TTL-based expiration
- Public API: load_dataset(), list_datasets(), get_dataset_info(), clear_cache()
- GraphForge.from_dataset() convenience method for loading datasets
- Example: gf = GraphForge.from_dataset("snap-ego-facebook")
- CSV edge-list loader (#69)
- Load edge-list datasets in CSV/TSV/space-delimited formats
- Auto-delimiter detection (tab, comma, space)
- Gzip compression support for .gz files
- Comment line handling (lines starting with #)
- Weighted and unweighted edge support
- Node deduplication via caching
- Consecutive whitespace handling
- Example: Load SNAP datasets with simple edge-list format
- 5 SNAP datasets available (#69)
- snap-ego-facebook - Facebook social circles (4K nodes, 88K edges, 0.5 MB)
- snap-email-enron - Enron email network (37K nodes, 184K edges, 2.5 MB)
- snap-ca-astroph - Astrophysics collaboration (19K nodes, 198K edges, 1.8 MB)
- snap-web-google - Google web graph (876K nodes, 5.1M edges, 75 MB)
- snap-twitter-combined - Twitter social circles (81K nodes, 1.8M edges, 25 MB)
- Auto-registered on module import
- Filterable by source, category, and size
- MERGE ON CREATE SET syntax (#65)
- Conditional property setting when creating nodes: MERGE (n:Person {id: 1}) ON CREATE SET n.created = timestamp()
- Parser support in Lark grammar
- Executor tracks whether MERGE created or matched nodes
- Comprehensive test coverage (parser, executor, integration)
- MERGE ON MATCH SET syntax (#66)
- Conditional property setting when matching existing nodes: MERGE (n:Person {id: 1}) ON MATCH SET n.updated = timestamp()
- Supports both ON CREATE and ON MATCH in same statement
- openCypher-compliant semantics
Fixed¶
- WITH clause variable passing in aggregation (#67)
- Fixed variable scoping issues when using WITH after aggregation
- Correctly passes aggregated values to subsequent clauses
- Example: MATCH (n) WITH count(n) AS cnt RETURN cnt now works correctly
Documentation¶
- Complete dataset documentation (#69)
- New dataset overview guide with usage examples
- SNAP dataset documentation with 5 available datasets
- Updated quick start with dataset loading examples
- Added dataset examples to main README
- Performance tips for large datasets
Known Limitations¶
- Only SNAP datasets available in this release (5 datasets)
- Neo4j example datasets, LDBC, and NetworkRepository planned for v0.3.0
- See Issue #70 for roadmap to 100+ SNAP datasets
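As a rough illustration of the CSV edge-list loader entry (#69) in this release, the sketch below parses an edge list with delimiter detection, comment skipping, optional weights, and gzip input. It uses only the standard library and is not GraphForge's loader: `detect_delimiter` and `parse_edge_list` are hypothetical names, and the detection order (tab, then comma, then whitespace) is an assumption.

```python
import gzip
import io


def detect_delimiter(line):
    """Return a tab or comma if the line contains one, else None (any whitespace)."""
    for candidate in ("\t", ","):
        if candidate in line:
            return candidate
    return None  # str.split(None) collapses consecutive whitespace


def parse_edge_list(lines):
    """Return (edges, nodes) parsed from an iterable of text lines."""
    edges, nodes = [], set()
    delimiter, detected = None, False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if not detected:
            # Detect the delimiter once, from the first data line.
            delimiter, detected = detect_delimiter(line), True
        fields = [f.strip() for f in line.split(delimiter) if f.strip()]
        if len(fields) < 2:
            continue  # not an edge row
        source, target = fields[0], fields[1]
        # A third column, when present, is treated as the edge weight.
        weight = float(fields[2]) if len(fields) > 2 else None
        nodes.update((source, target))  # the set deduplicates node ids
        edges.append((source, target, weight))
    return edges, nodes


# Space-delimited input with a comment line and one weighted edge.
edges, nodes = parse_edge_list(io.StringIO("# Nodes: 3\n1 2\n2   3 0.5\n"))

# Gzip-compressed, tab-delimited input read in text mode from memory.
packed = gzip.compress(b"# header\n1\t2\n1\t3\n")
with gzip.open(io.BytesIO(packed), "rt") as handle:
    gz_edges, gz_nodes = parse_edge_list(handle)
```

For a file on disk, the same function could be fed from `gzip.open(path, "rt")` when the name ends in .gz and from `open(path)` otherwise.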
0.2.0 - 2026-02-03¶
Added¶
- CASE expressions for conditional logic (#49)
- Full openCypher CASE expression support with WHEN/THEN/ELSE/END syntax
- Simple CASE (CASE expr WHEN value) and searched CASE (CASE WHEN condition)
- NULL-safe semantics following openCypher specification
- Example: RETURN CASE WHEN n.age < 18 THEN 'minor' ELSE 'adult' END
- COLLECT aggregation function (#46, #48)
- Aggregate values into lists with COLLECT() function
- DISTINCT support: COLLECT(DISTINCT n.name) removes duplicates
- Handles complex types (CypherList, CypherMap) correctly in DISTINCT mode
- NULL filtering: NULL values excluded from collected lists
- Example: MATCH (n) RETURN COLLECT(n.name)
- Arithmetic operators (#44)
- Binary operators: +, -, *, /, % (modulo)
- Unary minus: -n.value
- NULL propagation: operations with NULL return NULL
- Type coercion for mixed integer/float operations
- Division by zero returns NULL (openCypher-compliant)
- Example: RETURN n.price * 1.1 AS price_with_tax
- String matching operators (#43)
- STARTS WITH: Prefix matching
- ENDS WITH: Suffix matching
- CONTAINS: Substring matching
- Case-sensitive matching following openCypher specification
- NULL handling: returns NULL if either operand is NULL
- Example: MATCH (n) WHERE n.email ENDS WITH '@example.com'
- REMOVE clause (#42)
- Remove properties: REMOVE n.property
- Remove labels: REMOVE n:Label
- Multi-target support: REMOVE n.prop1, n.prop2, n:Label
- Idempotent: removing non-existent properties/labels is a no-op
- Example: MATCH (n:Person) REMOVE n.age, n:Temporary
- UNWIND clause (#40)
- Unwind lists into rows: UNWIND [1, 2, 3] AS x RETURN x
- Supports nested lists, empty lists, NULL values
- Can be used with MATCH, WHERE, and other clauses
- Example: UNWIND $ids AS id MATCH (n) WHERE n.id = id RETURN n
- NOT logical operator (#30)
- Unary negation operator for boolean expressions
- NULL-safe semantics: NOT NULL returns NULL
- Example: MATCH (n) WHERE NOT n.active RETURN n
- DETACH DELETE clause (#33)
- OpenCypher-compliant DELETE semantics
- DELETE raises error if node has relationships
- DETACH DELETE removes all connected edges first, then the node
- Example: MATCH (n:Person) DETACH DELETE n
- Comprehensive MATCH-CREATE combination tests (#41)
- 12 integration tests for MATCH followed by CREATE patterns
- Validates correctness of mixed read-write operations
- Complete documentation reorganization (#56)
- Restructured docs into logical sections (getting-started, user-guide, reference, development)
- New datasets documentation section with examples
- Improved navigation and discoverability
- Code of Conduct - Added Contributor Covenant Code of Conduct
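The NULL-handling rules listed for this release (NULL propagation in arithmetic and string matching, NOT NULL returning NULL, and COLLECT dropping NULLs) can be sketched in plain Python with `None` standing in for NULL. This is an illustration of the stated semantics only, not GraphForge's evaluator; `null_safe`, `cypher_not`, and `collect` are hypothetical helper names.

```python
import operator


def null_safe(op):
    """Wrap a binary Python operation so that a None (NULL) operand yields None."""
    def wrapped(left, right):
        if left is None or right is None:
            return None
        return op(left, right)
    return wrapped


multiply = null_safe(operator.mul)
ends_with = null_safe(str.endswith)      # case-sensitive suffix match
contains = null_safe(operator.contains)  # case-sensitive substring match


def cypher_not(value):
    """Three-valued NOT: NULL stays NULL, booleans are negated."""
    return None if value is None else (not value)


def collect(values, distinct=False):
    """Gather values into a list, excluding NULLs; optionally drop duplicates."""
    kept = [v for v in values if v is not None]
    if not distinct:
        return kept
    unique = []
    for v in kept:
        # Equality-based check so unhashable values (lists, dicts) also work.
        if v not in unique:
            unique.append(v)
    return unique


price_with_tax = multiply(None, 1.1)
matches = ends_with("ada@example.com", "@example.com")
names = collect(["Ada", None, "Ada", "Bob"], distinct=True)
```

The list-based duplicate check in `collect` is quadratic; it is used here only because list and map values are not hashable in Python.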
Fixed¶
- ORDER BY after aggregation (#39)
- ORDER BY now correctly finds aliased variables after aggregation
- Example: MATCH (n) RETURN COUNT(n) AS cnt ORDER BY cnt now works
- RETURN DISTINCT after projection (#38)
- RETURN DISTINCT now works correctly after projection expressions
- Fixes issue where DISTINCT was applied to wrong columns
Changed¶
- Test coverage improved (#37)
- Coverage increased from 88.69% to 93.76% (+5.07 percentage points)
- Added 50+ new tests across parser, planner, and executor
- GitHub Pages deployment modernization (#32)
- Migrated from legacy mkdocs gh-deploy to GitHub Actions native deployment
- Uses actions/upload-pages-artifact@v3 and actions/deploy-pages@v4
- Simpler, faster, more secure deployment with id-token authentication
- Issue workflow documentation (#45)
- Updated ISSUE_WORKFLOW.md to reflect current development process
0.1.4 - 2026-02-02¶
Added¶
- Local coverage validation workflow (#15, #16)
- make pre-push now validates 85% combined line+branch coverage locally before pushing
- make coverage - Run tests with coverage measurement and generate reports
- make check-coverage - Validate 85% combined coverage threshold
- make coverage-strict - Strict 90% threshold validation for new features
- make coverage-report - Open HTML coverage report in browser
- make coverage-diff - Show coverage for changed files only
- Catches 90% of coverage issues before CI, eliminating codecov patch failures
- Current coverage: 88.69% (92.41% line + 81.19% branch)
- Codecov Test Analytics integration (#17, #18)
- JUnit XML generation for all test runs across 8,203 tests (481 unit/integration + 7,722 TCK)
- Test performance monitoring with execution time trends
- Flaky test detection for intermittent failures
- Failure rate tracking and reliability pattern analysis
- Cross-platform analytics tracking (12 OS/Python combinations)
- make test-analytics - Generate JUnit XML locally for analysis
- Analytics dashboard at https://app.codecov.io/gh/DecisionNerd/graphforge
- List and map literal support in CREATE statements (#15)
- CREATE now accepts complex property types: lists, maps, and nested structures
- Proper bidirectional CypherValue ↔ Python type conversion
- 10 new integration tests covering complex property edge cases
- Codecov integration for automated coverage tracking
- Coverage reports uploaded from GitHub Actions
- Component-level coverage tracking (parser, planner, executor, storage, ast, types)
- PR comments with coverage changes
- Branch coverage analysis
- Configuration file (.codecov.yml) with 85% project target, 80% patch target
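The "combined line+branch coverage" figure used by the thresholds above is a pooled ratio, not a sum or a simple mean of the two percentages. The sketch below assumes the convention coverage.py uses when branch measurement is on (covered lines plus covered branches, over total lines plus total branches); the counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def combined_coverage(lines_hit, lines_total, branches_hit, branches_total):
    """Pooled line+branch coverage as a percentage."""
    return 100.0 * (lines_hit + branches_hit) / (lines_total + branches_total)


def meets_threshold(percent, threshold=85.0):
    """True when the combined figure reaches the required threshold."""
    return percent >= threshold


# Hypothetical counts: 90% line coverage and 75% branch coverage.
example = combined_coverage(lines_hit=900, lines_total=1000,
                            branches_hit=300, branches_total=400)
passes_default = meets_threshold(example)       # 85% gate
passes_strict = meets_threshold(example, 90.0)  # 90% strict gate
```

Because lines usually outnumber branches, the combined figure sits closer to the line percentage than to the branch percentage.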
Changed¶
- Development workflow modernization (#16)
- Updated CONTRIBUTING.md with comprehensive make-based workflow documentation
- Single command for all pre-push validation: make pre-push (format-check, lint, type-check, coverage, check-coverage)
- Clear documentation of coverage requirements (85% project, 80% patch)
- Complete guidance on all available make targets with examples
- Test suite significantly expanded to 479 unit/integration tests + 7,722 TCK compliance tests
Fixed¶
- Codecov test analytics deprecation warning (#18)
- Migrated from deprecated test-results-action@v1 to codecov-action@v5
- Uses report_type: test_results parameter for future-proof compatibility
0.1.3 - 2026-02-01¶
Changed¶
- Column naming now uses variable names for simple variable references (openCypher TCK compliance)
- RETURN n now produces column name "n" (previously "col_0")
- RETURN n AS alias produces column name "alias" (unchanged)
- RETURN n.property produces column name "col_0" (unchanged - complex expression)
- This aligns GraphForge with the openCypher specification and improves Neo4j compatibility
- Note: This is a breaking change from v0.1.2 but necessary for WITH clause correctness
- Rationale: WITH clause requires preserving variable names through query pipeline
- Test suite expanded with WITH clause coverage (17 comprehensive test cases)
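The column-naming rule described above can be summarised in a short sketch. It works on the expression text for simplicity, whereas a real implementation would presumably inspect the AST; `column_name` and the regular expression are hypothetical and not taken from GraphForge's source.

```python
import re

# A bare identifier such as "n" or "person_1" counts as a simple variable reference.
_SIMPLE_VARIABLE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def column_name(expression, alias=None, position=0):
    """Pick the output column name for one RETURN item."""
    if alias is not None:
        return alias              # RETURN n AS alias -> "alias"
    if _SIMPLE_VARIABLE.match(expression):
        return expression         # RETURN n -> "n"
    return f"col_{position}"      # RETURN n.property -> "col_0"


simple = column_name("n")
aliased = column_name("n", alias="alias")
complex_expr = column_name("n.property")
```

Keeping the variable name for simple references is what lets a later WITH or ORDER BY clause refer to the same name.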
Added¶
- Comprehensive WITH clause integration tests covering:
- Basic projection and variable renaming
- WHERE filtering on intermediate results
- Aggregation with GROUP BY semantics
- ORDER BY, SKIP, LIMIT on intermediate results
- Multi-part query chaining
- Edge cases and null handling
Fixed¶
- WITH clause bugs: column naming, aggregations, and DISTINCT behavior
- CodeRabbit configuration file to use only valid schema properties
- Removed unused pytest import from WITH clause tests
0.1.2 - 2026-02-01¶
Added¶
- Professional versioning and release management system
- Comprehensive CHANGELOG.md following Keep a Changelog format
- Automated version bumping script (scripts/bump_version.py)
- Release process documentation (RELEASING.md, docs/RELEASE_PROCESS.md, docs/RELEASE_STRATEGY.md)
- Weekly automated release check with GitHub issue reminders
- Release tracking workflow with auto-labeling
- MkDocs Material documentation site
- Auto-generated API documentation from docstrings
- Complete user guide (installation, quickstart, Cypher guide)
- Auto-deploy to GitHub Pages on every push
- CI/CD enhancements
- CHANGELOG validation workflow (ensures PRs update changelog)
- Automated PR labeling based on changed files
- Labels for component tracking (parser, planner, executor)
- .editorconfig for consistent editor settings across IDEs
Changed¶
- Updated GitHub Actions to Node.js 24 (actions/checkout v6, actions/setup-python v6, astral-sh/setup-uv v7)
- Enhanced PR guidelines to enforce small PRs and proper fixes
- Updated README badges to professional numpy-style flat badges
Fixed¶
- Integration test regression from WITH clause implementation
- Column naming now correctly uses col_N for unnamed return items
- SKIP/LIMIT queries no longer return empty results
- TCK test collection error resolved
- API documentation now references actual modules (api, ast, parser, planner, executor, storage, types)
0.1.1 - 2026-01-31¶
Added¶
- WITH clause for query chaining and subqueries
- Production-grade CI/CD infrastructure
- Pre-commit hooks (ruff, mypy, bandit)
- CodeRabbit integration
- Dependabot configuration
- PR and issue templates
- Comprehensive documentation (30+ docs)
- TCK compliance at 16.6% (638/3,837 scenarios)
Changed¶
- Updated project URLs to reflect new organization
- Enhanced README with additional badges
Fixed¶
- Critical integration test failures (20 tests)
- TCK test collection error
0.1.0 - 2026-01-30¶
Added¶
- Initial release of GraphForge
- Core data model (nodes, edges, properties, labels)
- Python builder API (create_node, create_relationship)
- SQLite persistence with ACID transactions
- openCypher query execution
- MATCH, WHERE, CREATE, SET, DELETE, MERGE, RETURN
- ORDER BY, LIMIT, SKIP
- Aggregations (COUNT, SUM, AVG, MIN, MAX)
- Parser and AST for openCypher subset
- Query planner and executor
- 351 tests (215 unit + 136 integration)
- 81% code coverage
- Multi-OS, multi-Python CI/CD (3 OS × 4 Python versions)