GraphForge Roadmap
Last Updated: 2026-06-07
Current Version: 0.4.0
Released
v0.3.8 — Full TCK Compliance
- 3,885/3,885 openCypher TCK scenarios passing (zero failures, zero expected failures)
- First embedded Python graph database with complete openCypher TCK compliance
- Full Cypher language support: MATCH, WHERE, RETURN, CREATE, SET, REMOVE, DELETE, MERGE, UNWIND, WITH, ORDER BY, SKIP, LIMIT, OPTIONAL MATCH, variable-length paths, pattern comprehension, temporal types, all standard functions
v0.3.9 — Performance
Theme: Maximize the performance of the v0.3.x feature set — O(n) parsing, node/edge indexes, bulk ingestion, SQLite durability tuning, and LIMIT short-circuit.
| Area | Work | PR |
|---|---|---|
| LALR(1) parser migration | Linear-time parsing, unknown function detection | #430 |
| Property equality index | O(1) node lookup by property value | #427 |
| LIMIT short-circuit | Traversal stops at demand; UNWIND + WITH LIMIT optimised | #423, #443 |
| Bulk ingestion API | create_node_bulk, bulk_ingest() context manager | #444 |
| SQLite PRAGMA tuning | synchronous=NORMAL, 64 MB cache, temp_store=MEMORY | #446 |
| Recursion limit fix | sys.setrecursionlimit thread-safe, set once at init | #445 |
| elementId() function | GQL-spec string form of element identity | #447 |
| Test suite renovation | Zero-base fixture audit, marker consistency, parametrization | #438 |
| Perf baseline suite | Real-dataset benchmarks (S/M/L/XL tiers), v0.3.8→v0.3.9 delta | #441 |
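The SQLite PRAGMA row above can be illustrated with Python's standard sqlite3 module. This is a minimal sketch, not GraphForge's actual connection code: the helper name open_tuned is hypothetical, and the 64 MB cache is expressed as -65536 on the assumption that it means 64 MiB (SQLite reads a negative cache_size as a size in KiB).

```python
import sqlite3


def open_tuned(path: str) -> sqlite3.Connection:
    """Open a SQLite connection with the three PRAGMAs named in the table.

    Hypothetical helper for illustration; not part of GraphForge's API.
    """
    conn = sqlite3.connect(path)
    # NORMAL syncs less often than the default FULL setting.
    conn.execute("PRAGMA synchronous=NORMAL")
    # A negative cache_size is a size in KiB, so -65536 requests 64 MiB.
    conn.execute("PRAGMA cache_size=-65536")
    # Keep temporary tables and indexes in memory rather than on disk.
    conn.execute("PRAGMA temp_store=MEMORY")
    return conn


conn = open_tuned(":memory:")
# Read the settings back to confirm they were applied to this connection.
for name in ("synchronous", "cache_size", "temp_store"):
    value = conn.execute(f"PRAGMA {name}").fetchone()[0]
    print(name, value)
```

These settings are per-connection, so a helper like this would need to run every time a connection is opened.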
v0.3.10 — Analytics Integration
Theme: Bridge GraphForge to the Python analytics ecosystem. Export graphs directly to NetworkX and igraph; add a parse/plan cache to eliminate repeated parse overhead in notebook loops.
| Issue | Scope |
|---|---|
| #391 | gf.to_networkx() and gf.to_igraph() with optional subgraph filtering |
| #504 | Parse/plan LRU cache, GraphForge(cache_size=128) |
| #502 | add_graph_documents() LangChain-compatible ingestion |
Release tracker: #448
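The idea behind the parse/plan cache can be sketched with functools.lru_cache from the standard library. The parse function below is a hypothetical stand-in that only tokenises the query; GraphForge's real parser and cache internals are not shown in this roadmap, so treat this as an illustration of the technique only.

```python
from functools import lru_cache

parse_calls = 0  # counts real parses so the cache effect is visible


@lru_cache(maxsize=128)  # same size as the cache_size=128 shown in the table
def parse(query: str) -> tuple[str, ...]:
    """Hypothetical stand-in for a parser: splits the query into tokens."""
    global parse_calls
    parse_calls += 1
    return tuple(query.split())


# A notebook-style loop that re-issues the same query text many times.
for _ in range(1000):
    parse("MATCH (n:Person) RETURN n.name")

print(parse_calls)         # 1: only the first call did any parsing
print(parse.cache_info())  # reports 999 hits and 1 miss
```

Keying on the exact query text means two queries that differ only in whitespace are cached separately; whether GraphForge normalises queries before caching is not stated here.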
v0.4.0 — Three-Surface API (Algorithms + Search)
Released: 2026-05-07
Theme: Extend GraphForge beyond Cypher with two new API surfaces for graph algorithms and hybrid retrieval — without adding Cypher extensions.
db.execute(...) # Cypher — openCypher query engine
db.gds.pagerank(write_property="rank") # Algorithms — igraph / NetworkX backends
db.search("query", vector=embedding) # Retrieval — FTS5 + vector cosine, RRF fusion
db.gds — 8 compiled algorithms: pagerank, betweenness_centrality, closeness_centrality, degree_centrality, louvain, connected_components, clustering_coefficient, triangle_count.
db.search — FTS5 text search, vector cosine similarity, and RRF-fused hybrid. Returns list[SearchHit] with score provenance.
graphforge.recipes — neighbourhood() n-hop context builder for LLM prompts.
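The RRF fusion that db.search uses for hybrid results can be sketched in a few lines. This is a generic Reciprocal Rank Fusion implementation, not GraphForge's code: the function name rrf_fuse, the node IDs, and the constant k=60 (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists with Reciprocal Rank Fusion.

    Each item scores sum(1 / (k + rank)) over the lists that contain it,
    with rank starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


text_hits = ["n1", "n2", "n3"]    # hypothetical full-text ranking
vector_hits = ["n3", "n1", "n4"]  # hypothetical cosine-similarity ranking
fused = rrf_fuse([text_hits, vector_hits])
print([item_id for item_id, _ in fused])  # ['n1', 'n3', 'n2', 'n4']
```

Because RRF works on ranks rather than raw scores, the text and vector scores do not need to be on the same scale before fusion.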
Release tracker: #394
Planned
v0.5.0 — Rust Core
Theme: Replace the Python executor with a Rust core and redesign the API around analyst intent. The three namespaced Python surfaces (db.gds, db.search, db.execute) are replaced by seven analyst-intent verbs — all returning Apache Arrow Tables:
forge.execute(cypher) # openCypher → Arrow Table
forge.rank(label, by=...) # centrality, structural scoring → Arrow Table + score
forge.cluster(label, by=...) # community detection, components → Arrow Table + community_id
forge.paths(source, target, by=…) # shortest paths, flow, reachability → Arrow Table
forge.analyze(label, by=…) # spanning trees, DAG, coloring, matching, embeddings → Arrow Table
forge.similar(label, by=…) # pairwise node similarity → Arrow Table
forge.find(query, …) # text/vector/hybrid search → Arrow Table + score + matched_on
Full algorithm catalog: Algorithm Verbs
Architecture decisions: ADR 0002, ADR 0003
Milestones
The milestone sequence below was renumbered when the adjacency index (ADR 0005) was adopted as a first-class layer and the Architecture Baseline moved to its chronological slot. GitHub milestone IDs are immutable; the M## prefix in each milestone title is the canonical sequence shown here. See refactor-v0.5 §8.
| # | Milestone | State | Deliverable | Exit criteria |
|---|---|---|---|---|
| 9 | Compiler skeleton | closed | AST, spans, token model, LALRPOP parser harness | Differential parse tests passing on corpus |
| 10 | Ontology runtime | closed | YAML/JSON loader, normalized Arrow tables, validator | Load/validate/migrate round-trips passing |
| 11 | Architecture baseline | closed | UUID identity, typed edge tables, topology/properties split, project structure (docs) | Architecture reference adopted |
| 12 | Graph IR | closed | Typed graph IR + serde envelope + explain output | AST→IR golden tests stable |
| 13 | Relational lowering | closed | Lower core MATCH/WHERE/RETURN subset to DataFusion | Logical-plan tests stable |
| 14 | Execution baseline | open | DataFusion-backed execution on Parquet provider | End-to-end query tests passing; Arrow RecordBatch streams returned |
| 15 | Adjacency index baseline | open | Consolidated derived CSR adjacency index under indexes/adjacency/ | No IR change; never alters results; gates M18 |
| 16 | Bindings baseline | open | forge.execute() → PyArrow Table; Node IPC results | Smoke tests and packaging passing — Python + Node |
| 17 | Conformance hardening | open | openCypher TCK subset, fuzzing, provenance/confidence rules | TCK threshold met; all merge gates pass |
| 18 | Rank and cluster | open | forge.rank/cluster → Arrow Tables | All by= values tested; write-back working |
| 19 | Find | open | forge.find() + forge.index() → Arrow Tables | Text, vector, hybrid; lazy indexing working |
| 20 | Layering & boundary reconciliation | open | Layer docs/ADRs (0006/0007); boundary gate regression test | Graph-native query results independent of knowledge layer |
| 21 | Knowledge layer foundation | open | Write provenance events/lineage; propagate confidence | Knowledge attaches by UUID reference only; boundary gate green |
| 22 | Epistemic model | open | Assertions/status, supersession, evidence, bitemporal valid-time | Preservation-over-deletion; boundary gate green |
| 23 | v0.5.0 release | open | Agent-skills work + comprehensive close-out (#742) | M11–M22 complete; boundary gate green |
| 24 | Swift + Kotlin bindings (v0.5.1) | open | UniFFI-generated Swift Package + Kotlin JAR | Round-trip tests + CI packaging green for both languages |
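For readers unfamiliar with the CSR adjacency index named in milestone 15, the layout can be sketched as two flat arrays derived from an edge list. This is a generic compressed-sparse-row construction in Python, not the planned Rust implementation; it assumes dense integer node IDs (the roadmap specifies UUID identity, so a real index would need an ID mapping), and the function names are hypothetical.

```python
def build_csr(num_nodes: int, edges: list[tuple[int, int]]) -> tuple[list[int], list[int]]:
    """Build CSR (offsets, targets) arrays for a directed edge list."""
    # Count the out-degree of each source node, shifted by one slot.
    offsets = [0] * (num_nodes + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    # Prefix-sum so offsets[i]..offsets[i + 1] brackets node i's neighbours.
    for i in range(num_nodes):
        offsets[i + 1] += offsets[i]
    # Scatter each target into the next free slot of its source's range.
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbours(offsets: list[int], targets: list[int], node: int) -> list[int]:
    """Return the out-neighbours of `node` as a slice of the targets array."""
    return targets[offsets[node]:offsets[node + 1]]


edges = [(0, 1), (0, 2), (1, 2), (3, 0)]
offsets, targets = build_csr(4, edges)
print(offsets)                          # [0, 2, 3, 3, 4]
print(targets)                          # [1, 2, 2, 0]
print(neighbours(offsets, targets, 0))  # [1, 2]
```

Since both arrays are computed entirely from the edge list, the index can be dropped and rebuilt at any time, which fits the "derived" and "never alters results" wording in the exit criteria.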
Merge gates (all required before merging to main)
- Parser parity — RD+Pratt corpus + syntax goldens pass
- openCypher TCK subset — agreed compliance threshold met
- Ontology round-trips — load/validate/migrate stable
- Arrow/IPC round-trips — data contract stable across all five language bindings
- Parquet provider — core semantics verified
- Python + Node bindings — packaging and smoke tests pass
- Swift + Kotlin bindings — UniFFI packaging and smoke tests pass
- Observability — explain, query IDs, provenance IDs, structured errors
- All seven analyst verbs — Arrow Tables, write-back, via/directed filters
- forge.find()/forge.index() — lazy indexing, text + vector + hybrid
Language Binding Matrix
| Language | Mechanism | Result | Milestone |
|---|---|---|---|
| Python | PyO3 + maturin | pyarrow.Table | 16 |
| Node / TypeScript | napi-rs | Arrow IPC Buffer | 16 |
| Swift | UniFFI | Arrow IPC Data → GraphForgeResult | 24 |
| Kotlin / JVM | UniFFI | Arrow IPC ByteArray → GraphForgeResult | 24 |
| Rust | Native crate | ExecutionResult | 14 |
Version Numbering
GraphForge follows Semantic Versioning and is pre-v1.0. The 0.x version series signals that the API is still maturing.
- Patch (0.x.y): Bug fixes, small improvements, no API changes
- Minor (0.x.0): New features; backwards-compatible where practical
v0.5.0 is the Rust core release. A v1.0 release will happen when the API is stable enough to commit to long-term.