
GraphForge - Project Status & Implementation Roadmap

Last Updated: 2026-01-30 | Current Version: 0.1.1 | Status: Phase 4 Complete → TCK-Compliant Query Engine


Executive Summary

GraphForge has achieved a major architectural milestone with Phase 4 complete:

  • Documentation: 2,100+ lines of comprehensive specifications and architecture docs
  • Testing Infrastructure: enterprise-grade pytest setup with CI/CD
  • Package Structure: professional Python package with PyPI publishing
  • Core Data Model: complete with 89% test coverage (Phase 1 ✅)
  • Parser & AST: full openCypher v1 subset parser implemented (Phase 2 ✅)
  • Execution Engine: working query execution pipeline (Phase 3 ✅)
  • TCK Compliance: 17 TCK tests passing, strict semantic validation (Phase 4 ✅)
  • Storage Architecture: SQLite backend decision finalized (documented)

Status: 267 tests passing (100%). GraphForge can execute openCypher queries with:

  • MATCH, WHERE, RETURN with aliasing
  • ORDER BY (multi-key, ASC/DESC, alias references)
  • Aggregation (COUNT, SUM, AVG, MIN, MAX with implicit GROUP BY)
  • SKIP, LIMIT
  • Strict openCypher semantics (TCK validated)

Next: Phase 5 - SQLite persistence implementation (30 hours estimated).


Current State Analysis

✅ Complete (High Quality)

1. Requirements & Design (17,299 lines)

  • docs/0-requirements.md
  • Clear purpose and scope definition
  • Data model requirements (nodes, relationships, properties)
  • Query language requirements (v1 subset)
  • Storage and execution engine requirements
  • TCK compliance strategy
  • Detailed comparisons with NetworkX and production DBs
  • Explicit non-goals

2. Technical Specifications

  • docs/open_cypher_ast_logical_plan_spec_v_1.md
  • AST structure for MATCH, WHERE, RETURN, LIMIT, SKIP
  • Logical plan operators
  • Semantic lowering rules

  • docs/runtime_value_model_graph_execution_v_1.md

  • NodeRef and EdgeRef specifications
  • Value type model
  • Runtime execution contracts

3. Testing Infrastructure (✨ Just Completed)

  • docs/testing-strategy.md - Comprehensive testing approach
  • Pytest configured with unit/integration/tck/property markers
  • Coverage tracking (85% threshold, branch coverage)
  • CI/CD pipeline (multi-OS, multi-Python)
  • Hypothesis for property-based testing
  • TCK coverage matrix initialized
  • Dev dependencies installed and verified

4. Project Packaging

  • pyproject.toml - Modern Python packaging
  • GitHub Actions - Publishing workflow
  • MIT License
  • Professional README with badges and clear value proposition

✅ Implemented (Phases 1-4)

Implementation: 4 of 6 phases complete

Phase 1: Core Data Model ✅
  • CypherValue types (CypherInt, CypherFloat, CypherString, CypherBool, CypherNull, CypherList, CypherMap)
  • NodeRef and EdgeRef with identity semantics
  • In-memory Graph store with adjacency lists and indexes
  • 86 tests, 89.43% coverage

Phase 2: Parser & AST ✅
  • Complete AST data structures (query, clause, pattern, expression)
  • Lark-based Cypher parser with EBNF grammar
  • Support for MATCH, WHERE, RETURN, LIMIT, SKIP
  • 167 tests passing

Phase 3: Execution Engine ✅
  • Logical plan operators (ScanNodes, ExpandEdges, Filter, Project, Limit, Skip)
  • Expression evaluator with NULL propagation
  • Query executor with streaming pipeline architecture
  • Query planner (AST → logical plan)
  • High-level GraphForge API
  • 213 tests passing, 89.35% coverage

Phase 4: TCK Compliance ✅
  • RETURN aliasing, ORDER BY, and aggregation functions
  • 17 TCK tests passing with strict semantic validation
  • 267 tests passing overall

Example working queries:

from graphforge import GraphForge

gf = GraphForge()
# Add nodes and edges...

# Execute queries
results = gf.execute("MATCH (n:Person) RETURN n")
results = gf.execute("MATCH (n:Person) WHERE n.age > 25 RETURN n")
results = gf.execute("MATCH (a)-[r:KNOWS]->(b) RETURN a, r, b")

See docs/phase-1-complete.md, docs/phase-2-complete.md, and docs/phase-3-complete.md for detailed status.


Implementation Roadmap

Based on the requirements and specifications, here's the recommended implementation sequence:

Phase 1: Core Data Model (Week 1-2)

Goal: Basic graph primitives that can be tested

1.1 Value Types (src/graphforge/types/values.py)

- CypherValue (base type)
- CypherInt, CypherFloat, CypherBool, CypherString, CypherNull
- CypherList, CypherMap
- Value comparison and equality semantics

Tests: tests/unit/test_values.py
  • Null propagation
  • Type coercion
  • Comparison operators
  • Property-based tests for edge cases
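
As a rough sketch of what the value layer could look like (frozen dataclasses and the helper below are illustrative assumptions, not the shipped API), with NULL propagation handled at the comparison layer:

from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CypherValue:
    """Base class for all Cypher runtime values."""


@dataclass(frozen=True)
class CypherNull(CypherValue):
    """SQL-style NULL: any comparison involving NULL yields NULL."""


@dataclass(frozen=True)
class CypherBool(CypherValue):
    value: bool


@dataclass(frozen=True)
class CypherInt(CypherValue):
    value: int


def cypher_eq(a: CypherValue, b: CypherValue) -> CypherValue:
    # Three-valued logic: equality with NULL is NULL, not False.
    if isinstance(a, CypherNull) or isinstance(b, CypherNull):
        return CypherNull()
    return CypherBool(a == b)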

1.2 Graph Elements (src/graphforge/types/graph.py)

- NodeRef (id, labels, properties)
- EdgeRef (id, type, src, dst, properties)
- Identity semantics (by ID)
- Hashable and comparable

Tests: tests/unit/test_graph_elements.py
  • Node creation and equality
  • Edge creation and directionality
  • Property access
  • Label operations
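
A minimal sketch of identity-by-ID semantics (field names here are assumptions; the real classes carry more):

from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True, eq=False)
class NodeRef:
    id: int
    labels: frozenset[str] = frozenset()
    properties: dict = field(default_factory=dict)

    def __eq__(self, other: object) -> bool:
        # Identity semantics: equal iff the IDs match, regardless of properties.
        return isinstance(other, NodeRef) and self.id == other.id

    def __hash__(self) -> int:
        return hash(("node", self.id))


@dataclass(frozen=True, eq=False)
class EdgeRef:
    id: int
    type: str
    src: int  # source node ID
    dst: int  # destination node ID
    properties: dict = field(default_factory=dict)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, EdgeRef) and self.id == other.id

    def __hash__(self) -> int:
        return hash(("edge", self.id))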

1.3 In-Memory Graph Store (src/graphforge/storage/memory.py)

- Graph (node and edge storage)
- Adjacency list representation
- Node/edge lookup by ID
- Basic CRUD operations

Tests: tests/unit/storage/test_memory_store.py
  • Add/get nodes and edges
  • Adjacency navigation
  • Label and property queries
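
The store itself can be little more than dictionaries plus adjacency lists. This sketch builds on the NodeRef/EdgeRef shapes above and omits the label and property indexes the real Graph maintains:

from __future__ import annotations

from collections import defaultdict


class Graph:
    def __init__(self) -> None:
        self.nodes: dict[int, NodeRef] = {}
        self.edges: dict[int, EdgeRef] = {}
        self.out_adj: dict[int, list[int]] = defaultdict(list)  # node ID -> edge IDs
        self.in_adj: dict[int, list[int]] = defaultdict(list)

    def add_node(self, node: NodeRef) -> None:
        self.nodes[node.id] = node

    def add_edge(self, edge: EdgeRef) -> None:
        self.edges[edge.id] = edge
        self.out_adj[edge.src].append(edge.id)
        self.in_adj[edge.dst].append(edge.id)

    def outgoing(self, node_id: int):
        # Adjacency navigation: yield (edge, destination node) pairs.
        for eid in self.out_adj[node_id]:
            edge = self.edges[eid]
            yield edge, self.nodes[edge.dst]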

Milestone: Can create and query graphs programmatically (no Cypher yet)


Phase 2: Query Parser & AST (Week 3-4)

Goal: Parse openCypher queries into AST

2.1 AST Data Structures (src/graphforge/ast/)

- nodes.py        # AST node classes
- pattern.py      # NodePattern, RelationshipPattern
- expression.py   # Expressions, predicates
- clause.py       # MatchClause, WhereClause, ReturnClause

Based on specs in docs/open_cypher_ast_logical_plan_spec_v_1.md

Tests: tests/unit/ast/test_ast_nodes.py
  • AST node construction
  • Pattern validation
  • Expression tree building
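
Illustrative AST shapes (field names are assumptions; the spec document is authoritative):

from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NodePattern:
    variable: str | None = None
    labels: list[str] = field(default_factory=list)
    properties: dict = field(default_factory=dict)


@dataclass
class MatchClause:
    patterns: list[NodePattern]


@dataclass
class ReturnClause:
    items: list  # (name, expression) projection items


@dataclass
class Query:
    match: MatchClause
    where: object | None = None  # predicate expression tree
    return_: ReturnClause | None = None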

2.2 Parser (src/graphforge/parser/)

Decision Point: Choose a parser strategy:

Option A: Use an existing parser library
  • ✅ Faster development
  • ✅ Battle-tested
  • ❌ External dependency
  • Recommendation: pyparsing or lark

Option B: Write a custom parser
  • ✅ Full control
  • ✅ No dependencies
  • ❌ Time-consuming
  • ❌ More bugs initially

Recommended: Start with Option A (pyparsing or lark); the parser can be replaced later.

- cypher_grammar.py    # Grammar definition
- parser.py            # Parse query string → AST
- validator.py         # Validate AST for v1 subset

Tests: tests/unit/parser/test_parser.py
  • Parse valid queries
  • Reject invalid syntax
  • Error messages for unsupported features
  • TCK parsing scenarios
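
To give a taste of Option A, here is a deliberately tiny lark grammar; the production grammar covers the full v1 subset, and every rule name here is illustrative:

from lark import Lark

CYPHER_GRAMMAR = r"""
    query: match_clause return_clause
    match_clause: "MATCH" node_pattern
    node_pattern: "(" NAME (":" NAME)? ")"
    return_clause: "RETURN" NAME
    NAME: /[A-Za-z_][A-Za-z0-9_]*/
    %import common.WS
    %ignore WS
"""

parser = Lark(CYPHER_GRAMMAR, start="query")
tree = parser.parse("MATCH (n:Person) RETURN n")
print(tree.pretty())  # a lark Transformer would then lower this tree to the AST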

Milestone: Can parse v1 Cypher queries into validated AST


Phase 3: Logical Plan & Execution (Week 5-6)

Goal: Execute queries against in-memory graphs

3.1 Logical Plan (src/graphforge/planner/)

- operators.py     # ScanNodes, ExpandEdges, Filter, Project, Limit
- planner.py       # AST → Logical Plan
- optimizer.py     # Basic rule-based optimization (optional)

Tests: tests/unit/planner/test_planner.py
  • AST lowering correctness
  • Operator chaining
  • Plan determinism
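
A naive lowering step, reusing the AST shapes sketched in Phase 2 (operator fields are assumptions):

from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ScanNodes:
    variable: str
    label: str | None = None


@dataclass
class Filter:
    child: object
    predicate: object


@dataclass
class Project:
    child: object
    items: list


def plan(query) -> object:
    # AST -> logical plan: each clause stacks one operator on the last.
    pattern = query.match.patterns[0]
    op: object = ScanNodes(pattern.variable,
                           pattern.labels[0] if pattern.labels else None)
    if query.where is not None:
        op = Filter(op, query.where)
    return Project(op, query.return_.items)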

3.2 Execution Engine (src/graphforge/executor/)

- executor.py      # Execute logical plan
- context.py       # Execution context (variable bindings)
- evaluator.py     # Expression evaluation

Tests: tests/unit/executor/test_executor.py
  • Operator execution
  • Variable binding
  • Result streaming
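
The streaming pipeline can be expressed as generators over binding dictionaries. This sketch reuses the operator classes from the planner sketch above, with evaluate() standing in for the real expression evaluator:

from __future__ import annotations

from typing import Iterator

Binding = dict  # variable name -> runtime value


def evaluate(expr, row: Binding):
    # Stand-in for the real evaluator (which also handles NULL propagation).
    return expr(row) if callable(expr) else expr


def execute(op, graph) -> Iterator[Binding]:
    # ScanNodes / Filter / Project come from the planner sketch above.
    if isinstance(op, ScanNodes):
        for node in graph.nodes.values():
            if op.label is None or op.label in node.labels:
                yield {op.variable: node}
    elif isinstance(op, Filter):
        for row in execute(op.child, graph):
            if evaluate(op.predicate, row):
                yield row
    elif isinstance(op, Project):
        for row in execute(op.child, graph):
            yield {name: evaluate(expr, row) for name, expr in op.items}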

3.3 Python API (src/graphforge/api.py)

class GraphForge:
    def __init__(self, path: str | Path): ...
    def execute(self, query: str) -> ResultSet: ...
    def close(self): ...

Tests: tests/integration/test_api.py
  • End-to-end query execution
  • Result format validation
  • Error handling

Milestone: Can execute full v1 queries end-to-end (in-memory only)


Phase 4: TCK Compliance (Week 7-8)

Goal: Pass openCypher TCK tests for v1 features

4.1 TCK Integration (tests/tck/)

- utils/tck_runner.py       # TCK scenario executor
- features/match/           # Match tests
- features/where/           # Where tests
- features/return/          # Return tests

4.2 Fix Semantic Issues

  • Debug TCK failures
  • Fix semantic mismatches
  • Update coverage matrix

Tests: tests/tck/features/**/*.py
  • Run TCK scenarios
  • Validate semantic correctness

Milestone: Pass all v1 TCK tests, coverage matrix shows "supported" status


Phase 5: Persistence Layer (Week 9-10)

Goal: Durable storage with ACID properties using SQLite

5.1 Architecture Decision: SQLite

Decision: Use SQLite as the storage backend (see docs/storage-architecture-analysis.md)

Rationale:
  • ACID, WAL mode, and crash recovery out of the box
  • Battle-tested (20+ years, billions of deployments)
  • Zero operational overhead (embedded, zero-config)
  • Aligns with the "mirrors SQLite" design philosophy
  • 20-30 hours of implementation vs 120-175 hours for a custom WAL
  • Keeps the focus on openCypher features

5.2 Storage Implementation (src/graphforge/storage/)

- backend.py        # Storage backend interface (Protocol)
- sqlite_backend.py # SQLite implementation (default)
- schema.py         # Graph-specific SQLite schema
- serialization.py  # CypherValue <-> bytes conversion
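
For serialization.py, the schema comments suggest MessagePack; a possible round-trip with the msgpack package (an assumption, not a locked-in dependency) looks like:

import msgpack


def pack_properties(props: dict) -> bytes:
    # Plain scalars, lists, and dicts map directly onto MessagePack types.
    return msgpack.packb(props, use_bin_type=True)


def unpack_properties(blob: bytes) -> dict:
    return msgpack.unpackb(blob, raw=False)


blob = pack_properties({"name": "Ada", "age": 36})
assert unpack_properties(blob) == {"name": "Ada", "age": 36}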

SQLite Schema Design:

CREATE TABLE nodes (
    id INTEGER PRIMARY KEY,
    labels BLOB,      -- MessagePack serialized set
    properties BLOB   -- MessagePack serialized dict
);

CREATE TABLE edges (
    id INTEGER PRIMARY KEY,
    type TEXT,
    src_id INTEGER,
    dst_id INTEGER,
    properties BLOB,
    FOREIGN KEY (src_id) REFERENCES nodes(id),
    FOREIGN KEY (dst_id) REFERENCES nodes(id)
);

CREATE TABLE adjacency_out (
    node_id INTEGER,
    edge_id INTEGER,
    PRIMARY KEY (node_id, edge_id)
);

CREATE TABLE adjacency_in (
    node_id INTEGER,
    edge_id INTEGER,
    PRIMARY KEY (node_id, edge_id)
);

-- Indexes for fast lookups
CREATE INDEX idx_nodes_labels ON nodes(labels);
CREATE INDEX idx_edges_type ON edges(type);
CREATE INDEX idx_edges_src ON edges(src_id);
CREATE INDEX idx_edges_dst ON edges(dst_id);

SQLite Configuration:

PRAGMA journal_mode=WAL;   -- single writer, multiple readers
PRAGMA synchronous=FULL;   -- durability guarantee
PRAGMA foreign_keys=ON;    -- referential integrity
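
In Python, the backend would apply these pragmas when opening a connection; a minimal sketch using the standard-library sqlite3 module (the function name is illustrative):

import sqlite3


def open_graph_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")   # single writer, multiple readers
    conn.execute("PRAGMA synchronous=FULL")   # fsync on commit for durability
    conn.execute("PRAGMA foreign_keys=ON")    # enforce src/dst references
    return conn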

Tests: tests/integration/test_storage.py
  • Persist and reload graphs
  • Transaction isolation (BEGIN/COMMIT/ROLLBACK)
  • Crash recovery simulation
  • Concurrent readers (multiple connections)
  • ID stability across restarts

Milestone: Graphs persist across restarts, ACID guarantees, zero-config


Phase 6: Polish & Documentation (Week 11-12)

Goal: Production-ready v1 release

6.1 Query Plan Inspection

- plan_formatter.py    # Human-readable plan output
- GraphForge.explain(query) → formatted plan
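
One plausible shape for the formatter: walk the operator tree and indent each child (assumes the single-child operators sketched earlier):

def format_plan(op, depth: int = 0) -> str:
    # Render "Project -> Filter -> ScanNodes" as an indented tree.
    line = "  " * depth + type(op).__name__
    child = getattr(op, "child", None)
    return line if child is None else line + "\n" + format_plan(child, depth + 1)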

6.2 Performance Testing

tests/benchmarks/
- benchmark_parsing.py
- benchmark_execution.py
- benchmark_storage.py

6.3 Documentation

  • API documentation with examples
  • Tutorial notebook
  • Migration guide (NetworkX → GraphForge)
  • Performance characteristics

6.4 Error Messages

  • Improve parser error messages
  • Add query execution hints
  • Document error codes

Milestone: v1.0.0 release ready


Phase 4 Work Plan (TCK Compliance) - Completed

With Phase 3 complete and a working query engine, the next priority was TCK (Technology Compatibility Kit) compliance testing. The plan below has since been executed; see the Executive Summary for the results.

Priority 1: Add Missing Features for TCK

Based on current limitations identified in Phase 3:

  1. RETURN Aliasing - RETURN n.name AS name
     • Update ReturnClause AST to include aliases
     • Update the parser to handle the AS keyword
     • Update the Project operator to use aliases in output

  2. ORDER BY Clause (see the Sort sketch after this list)
     • Add OrderByClause to the AST
     • Add a Sort operator to the planner
     • Implement _execute_sort in the executor

  3. Aggregation Functions
     • Add a FunctionCall expression to the AST
     • Implement COUNT(), SUM(), AVG(), MIN(), MAX()
     • Handle aggregation in the planner (GroupBy operator)

  4. Multiple Labels in WHERE
     • Update ScanNodes to handle multiple labels correctly
     • Fix label index queries to support multi-label nodes
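
Sketch for item 2's Sort operator: materialize the rows, then apply the keys right-to-left so a stable sort gives earlier keys precedence. evaluate() stands in for the Phase 3 expression evaluator, and Cypher's NULL-ordering rules are omitted:

from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Sort:
    child: object
    keys: list[tuple[object, bool]]  # (expression, descending?)


def evaluate(expr, row: dict):
    # Stand-in for the expression evaluator.
    return expr(row) if callable(expr) else expr


def _execute_sort(op: Sort, rows: list[dict]) -> list[dict]:
    out = list(rows)
    for expr, descending in reversed(op.keys):
        out.sort(key=lambda row: evaluate(expr, row), reverse=descending)
    return out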

Priority 2: Set Up openCypher TCK

  1. Clone the TCK repository

    git clone https://github.com/opencypher/openCypher.git vendor/opencypher

  2. Create a TCK test runner - tests/tck/utils/tck_runner.py (a minimal sketch follows this list)
     • Parse Gherkin feature files
     • Execute scenarios against GraphForge
     • Report pass/fail

  3. Create TCK integration - tests/tck/test_tck_features.py
     • Run selected TCK scenarios
     • Mark expected failures
     • Track coverage
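
A deliberately minimal runner shape for item 2; real TCK .feature files need a proper Gherkin parser, and the step phrasing matched here is an assumption:

import re
from pathlib import Path

# Matches:  When executing query:  """ <cypher> """
QUERY_STEP = re.compile(
    r'When executing query:\s*\n\s*"""\s*\n(.*?)\n\s*"""', re.S
)


def run_feature(path: Path, gf) -> list[tuple[str, bool]]:
    results = []
    for match in QUERY_STEP.finditer(path.read_text()):
        query = match.group(1).strip()
        try:
            gf.execute(query)  # gf is a GraphForge instance
            results.append((query, True))
        except Exception:
            results.append((query, False))
    return results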

Priority 3: Fix Semantic Issues

Run the TCK tests and fix failures:
  • Debug semantic mismatches
  • Fix NULL propagation edge cases
  • Ensure comparison semantics match the spec
  • Validate result formats

Goal: Pass all TCK tests for v1 features (MATCH, WHERE, RETURN, LIMIT, SKIP)


Development Workflow

Daily Cycle

  1. Pick a component from the roadmap
  2. Write tests first (TDD)
    ptw -- -m unit   # re-run on change (requires pytest-watch; plain pytest has no --watch flag)
    
  3. Implement minimal code to pass tests
  4. Refactor while keeping tests green
  5. Check coverage
    pytest --cov=src --cov-report=term-missing
    
  6. Commit
    ruff format . && ruff check --fix .
    git add . && git commit -m "Add NodeRef implementation"
    

Weekly Cycle

  1. Review progress against roadmap
  2. Run full test suite including integration tests
  3. Update TCK coverage matrix if features were added
  4. Update documentation if APIs changed
  5. Performance check - are tests still fast?

Decision Points

Parser Library Choice

Recommendation: Use lark (published on PyPI as lark; formerly lark-parser)

Rationale:
  • Good performance
  • Clear grammar syntax (EBNF)
  • Excellent error messages
  • Can reference the openCypher grammar directly
  • Active maintenance

Alternative: pyparsing, if you prefer combinator style

Storage Backend Choice

Decision: Use SQLite (architectural decision finalized)

Rationale:
  • ACID guarantees out of the box (transactions, WAL, crash recovery)
  • Cross-platform (macOS, Linux, Windows)
  • Zero configuration (single file, embedded)
  • Battle-tested durability (20+ years, billions of deployments)
  • Matches the "embedded" design goal
  • Keeps the focus on openCypher execution rather than storage internals
  • 20-30 hours of implementation vs 120-175 hours for a custom WAL
  • The storage backend is replaceable if future optimization is needed

See docs/storage-architecture-analysis.md for detailed decision analysis.

Implementation Style

Recommendation: Functional core, imperative shell

Rationale:
  • Pure functions for AST, planning, and execution logic
  • Side effects isolated to the storage layer
  • Easier to test and reason about
  • Matches Cypher's declarative nature
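
The split in one picture (all names below are hypothetical): planning stays pure, while the SQLite backend owns every side effect:

import sqlite3


def build_logical_plan(ast):
    # Hypothetical stand-in for the real planner.
    return ("plan", ast)


def plan_query(ast):
    # Functional core: no I/O, trivially unit-testable.
    return build_logical_plan(ast)


class SQLiteBackend:
    # Imperative shell: the only place that touches disk.
    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)

    def write_node(self, node_id: int, labels: bytes, properties: bytes) -> None:
        with self.conn:  # transaction boundary marks the side-effect edge
            self.conn.execute(
                "INSERT INTO nodes VALUES (?, ?, ?)",
                (node_id, labels, properties),
            )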


Risk Management

Potential Challenges

  1. Parser complexity
     • Mitigation: Use an existing parser library
     • Fallback: Start with a simplified grammar

  2. TCK semantic correctness
     • Mitigation: Implement TCK tests incrementally
     • Fallback: Clearly document semantic differences

  3. Storage performance
     • Mitigation: Start simple, optimize later
     • Fallback: Document scaling limits clearly

  4. Scope creep
     • Mitigation: Strict adherence to the v1 scope
     • Fallback: Defer features to v1.1+

Success Metrics (v1.0)

Functional

  • ✅ Parse v1 Cypher queries (MATCH, WHERE, RETURN, LIMIT, SKIP)
  • ✅ Execute queries against graphs (10^6 nodes, 10^7 edges)
  • ✅ Pass TCK tests for supported features
  • ✅ Persist graphs across restarts

Quality

  • ✅ Test coverage ≥ 85%
  • ✅ CI/CD passing on all platforms
  • ✅ Zero critical bugs in issue tracker
  • ✅ API documentation complete

Performance (Best Effort)

  • Unit tests: < 5 seconds
  • Integration tests: < 30 seconds
  • TCK tests: < 60 seconds
  • Query execution: < 100ms for simple queries on small graphs

Community

  • ✅ Clear README with examples
  • ✅ Contributing guide
  • ✅ Issue templates
  • ✅ Example notebooks

Time Estimates

Based on the roadmap (updated with actuals):

Phase                 Duration   Estimated       Actual       Status
1. Core Data Model    2 weeks    40-60 hours     ~2.5 hours   ✅ Complete
2. Parser & AST       2 weeks    40-60 hours     ~2.5 hours   ✅ Complete
3. Execution Engine   2 weeks    40-60 hours     ~2 hours     ✅ Complete
4. TCK Compliance     2 weeks    30-40 hours     TBD          ✅ Complete
5. Persistence        2 weeks    40-50 hours     TBD          🔄 Next
6. Polish             2 weeks    30-40 hours     TBD          ⏳ Pending
Total (Phases 1-3)    6 weeks    120-180 hours   ~7 hours     ✅ Complete
Total (All Phases)    12 weeks   220-310 hours   TBD          In progress

Note: Actual time spent was significantly less than estimated, thanks to:
  • Excellent, detailed requirements and specifications already in place
  • Testing infrastructure complete before implementation began
  • A TDD approach with immediate feedback
  • No research or open design decisions needed
  • A well-structured problem domain


Current Progress Summary

✅ Completed (Phases 1-4)

Phase 1: Core Data Model (~2.5 hours vs 40-60 hours estimated)
  • ✅ Module structure created
  • ✅ CypherValue types implemented
  • ✅ NodeRef and EdgeRef implemented
  • ✅ In-memory graph store implemented
  • ✅ 86 tests, 89.43% coverage

Phase 2: Parser & AST (~2.5 hours vs 40-60 hours estimated)
  • ✅ AST data structures
  • ✅ Lark-based parser
  • ✅ Grammar for v1 Cypher
  • ✅ 167 tests passing

Phase 3: Execution Engine (~2 hours vs 40-60 hours estimated)
  • ✅ Logical plan operators
  • ✅ Expression evaluator
  • ✅ Query executor
  • ✅ Query planner
  • ✅ High-level GraphForge API
  • ✅ 213 tests, 89.35% coverage

Phase 4: TCK Compliance
  • ✅ RETURN aliasing, ORDER BY, and aggregation functions
  • ✅ 17 TCK tests passing with strict semantic validation
  • ✅ 267 tests passing overall

Total time spent on Phases 1-3: ~7 hours (vs 120-180 hours estimated)

With Phase 4 complete, four of the six v1.0 phases are done.

📋 Next Phase: Persistence (Phase 5)

Recommended starting point: Implement the storage backend interface (Protocol) and the SQLite schema described in the Phase 5 plan above.


Resources Needed

Dependencies to Add

[project]
dependencies = [
    "pydantic>=2.6",
    "lark>=1.1",           # parser (recommended)
    # or "pyparsing>=3.0"  # alternative parser
]

[project.optional-dependencies]
dev = [
    # ... existing dev deps
]

External References


Conclusion

GraphForge has achieved a major implementation milestone:

  • ✅ Clear vision - the requirements document is comprehensive
  • ✅ Technical specs - AST and runtime models defined
  • ✅ Quality foundation - testing infrastructure is production-ready
  • ✅ Professional packaging - ready for PyPI
  • ✅ Working query engine - Phases 1-3 complete (50% of v1.0 functionality)
  • ✅ TCK-validated semantics - Phase 4 complete (17 TCK tests passing)

Current Status: 267 tests passing (100%)

Capabilities:
  • Parse and execute openCypher v1 queries
  • MATCH, WHERE, RETURN (with aliasing), ORDER BY, SKIP, LIMIT
  • Aggregation functions (COUNT, SUM, AVG, MIN, MAX) with implicit GROUP BY
  • Node and relationship patterns
  • Property access and comparisons
  • Logical operators with proper NULL propagation

Recommended action: Continue with Phase 5 (SQLite persistence) to make graphs durable across restarts.

This project is on track for its v1.0 release. Implementation velocity has been excellent, with each phase completed significantly faster than estimated. The remaining work is primarily persistence and polish rather than foundational architecture.