
OpenCypher Compatibility Status

Last Updated: 2026-05-07
GraphForge Version: v0.4.0


Executive Summary

As of v0.4.0, GraphForge passes all 3,885 openCypher TCK scenarios — 100% compliance with zero failures and zero expected failures. This is the first embedded Python graph database to achieve complete openCypher TCK compliance.

Current Status

| Version | TCK Scenarios | Feature Completeness | Status |
|---------|---------------|----------------------|--------|
| v0.1.4 | 638/3,837 | ~30% | Released |
| v0.2.0 | 638/3,837 | ~40% | Released |
| v0.2.1 | 638/3,837 | ~45% | Released |
| v0.3.0 | 1,303/1,626 (34%) | ~78% | Released (February 2026) |
| v0.3.6 | 2,507/3,837 (65%) | ~85% | Released (March 2026) |
| v0.3.7 | 3,235/3,885 (83.3%) | ~88% | Released (April 2026) |
| v0.3.8 | 3,885/3,885 (100%) | 100% | Released (May 2026) |
| v0.3.10 | 3,885/3,885 (100%) | 100% | Released (May 2026) |
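The percentages in the status table are plain pass rates (passing scenarios divided by total scenarios). A minimal sketch of that arithmetic, using three rows copied from the table; the helper name `pass_rate` is illustrative and not part of GraphForge:

```python
def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * passed / total, 1)

# (passed, total) pairs copied from the status table.
rows = {
    "v0.3.6": (2507, 3837),
    "v0.3.7": (3235, 3885),
    "v0.3.8": (3885, 3885),
}

for version, (passed, total) in rows.items():
    print(f"{version}: {pass_rate(passed, total)}%")
```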

Design Philosophy

GraphForge prioritizes:

  • ✅ Full openCypher language - all clauses, functions, operators, patterns
  • ✅ SQLite-backed persistence with ACID transactions
  • ✅ Zero-configuration embedded usage
  • ✅ Temporal/spatial types - nanosecond precision, IANA timezones, extreme years
  • ✅ Python-first - results are Python objects; integrates with pandas, NetworkX, LLM libraries
  • ❌ Full-text search (use CONTAINS or external FTS)
  • ❌ Multi-database / distributed features (single-node embedded design)
  • ❌ High-concurrency write workloads (SQLite single-writer)
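To illustrate what SQLite-backed ACID transactions mean in practice, here is a minimal sketch using Python's standard `sqlite3` module directly. The `nodes` table and the `insert_nodes` helper are hypothetical and chosen only for illustration; they are not GraphForge's actual storage layout or API.

```python
import sqlite3

# Hypothetical single-table schema for illustration only; this is not
# GraphForge's actual storage layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, label TEXT NOT NULL)")
conn.commit()

def insert_nodes(connection, rows):
    """Insert all rows in one transaction; True on commit, False on rollback."""
    try:
        with connection:  # commits on success, rolls back if the block raises
            connection.executemany(
                "INSERT INTO nodes (id, label) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        return False

def node_count(connection):
    """Return the number of rows currently in the nodes table."""
    return connection.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]

ok = insert_nodes(conn, [(1, "Person"), (2, "Person")])
# The second batch reuses id 1, so the whole batch (including id 3) is undone.
failed = insert_nodes(conn, [(3, "City"), (1, "City")])
print(ok, failed, node_count(conn))
```

The second batch fails on a duplicate primary key, and the row with id 3 that preceded the failure is rolled back with it, which is the all-or-nothing behavior the design list refers to.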


Feature Matrix

All features below are fully implemented as of v0.3.9.

✅ Fully Supported (v0.1.4+)

Reading Clauses

  • MATCH - Basic pattern matching with node and relationship patterns
      • Single patterns: MATCH (n:Label)
      • Relationship patterns: MATCH (a)-[r:TYPE]->(b)
      • Multi-pattern: MATCH (a), (b)
      • Property filtering: MATCH (n {key: value})
  • WHERE - Predicate filtering with comparisons and logical operators
      • Comparisons: =, <>, <, >, <=, >=
      • Logical operators: AND, OR
      • Property access: n.property
      • NULL handling with ternary logic
  • RETURN - Projection with aliasing
      • Property projection: RETURN n.name AS name
      • Expressions: RETURN n.age + 5
      • DISTINCT: RETURN DISTINCT n.city
  • WITH - Query chaining and variable passing
      • Pipeline queries: MATCH ... WITH ... MATCH ...
      • Filtering: WITH n WHERE n.age > 18
  • ORDER BY - Sorting with multiple keys
      • Multi-key: ORDER BY n.age DESC, n.name ASC
      • NULL ordering: NULLs last by default
  • LIMIT / SKIP - Result pagination
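The multi-key ordering with NULLs last can be pictured with plain Python sorting. This is a sketch of the semantics for the ascending case only (`ORDER BY n.age ASC, n.name ASC`), not GraphForge's implementation; the sample rows and the `order_key` helper are made up for illustration, and `None` stands in for NULL.

```python
# Sample rows standing in for matched nodes; None plays the role of NULL.
rows = [
    {"name": "Dee", "age": 30},
    {"name": "Bob", "age": None},
    {"name": "Cid", "age": 25},
    {"name": "Ann", "age": 30},
]

def order_key(row):
    """Key for ORDER BY n.age ASC, n.name ASC with NULL ages placed last."""
    age = row["age"]
    # 1st element: False sorts before True, so rows with a NULL age go last.
    # 2nd element: the age itself (a placeholder 0 for NULL, never compared
    #              against real ages because the 1st element already differs).
    # 3rd element: name breaks ties between equal ages.
    return (age is None, 0 if age is None else age, row["name"])

ordered = sorted(rows, key=order_key)
names = [row["name"] for row in ordered]
print(names)
```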

Writing Clauses

  • CREATE - Node and relationship creation
      • Nodes: CREATE (n:Label {key: value})
      • Relationships: CREATE (a)-[r:TYPE {key: value}]->(b)
      • Multi-create: CREATE (a), (b), (a)-[:KNOWS]->(b)
  • SET - Property updates
      • Set property: SET n.key = value
      • Set multiple: SET n.key1 = value1, n.key2 = value2
      • Copy properties: SET n = m
  • DELETE - Node and relationship deletion
      • Delete nodes: DELETE n
      • Delete relationships: DELETE r
      • Constraint: Cannot delete node with relationships (use DETACH DELETE)
  • MERGE - Create-or-match patterns
      • Basic: MERGE (n:Label {key: value})
      • ON CREATE: MERGE (n) ON CREATE SET n.created = timestamp()
      • ON MATCH: MERGE (n) ON MATCH SET n.accessed = timestamp()
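MERGE's create-or-match behavior can be sketched with a small in-memory store. `TinyStore` and its `merge` method are hypothetical names invented for this illustration; they model the semantics described above (match existing nodes and apply ON MATCH, otherwise create one and apply ON CREATE) and are not GraphForge's API.

```python
class TinyStore:
    """Hypothetical in-memory node store used only to illustrate MERGE."""

    def __init__(self):
        self.nodes = []  # each node is {"label": str, "props": dict}

    def merge(self, label, key_props, on_create=None, on_match=None):
        """Return (nodes, created): matched nodes, or one newly created node."""
        matches = [
            node for node in self.nodes
            if node["label"] == label
            and all(node["props"].get(k) == v for k, v in key_props.items())
        ]
        if matches:
            # ON MATCH SET is applied to every node the pattern matched.
            for node in matches:
                node["props"].update(on_match or {})
            return matches, False
        # Nothing matched: create the node, then apply ON CREATE SET.
        node = {"label": label, "props": {**key_props, **(on_create or {})}}
        self.nodes.append(node)
        return [node], True

store = TinyStore()
first, created_first = store.merge("Person", {"name": "Ada"}, on_create={"visits": 1})
second, created_second = store.merge("Person", {"name": "Ada"}, on_match={"seen": True})
print(created_first, created_second, len(store.nodes))
```

Running the same merge twice leaves a single node in the store: the first call creates it, the second call only updates it.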

Aggregations

  • COUNT - Row counting
      • COUNT(*) - Count all rows
      • COUNT(expr) - Count non-NULL values
      • COUNT(DISTINCT expr) - Count distinct values
  • SUM - Numeric summation
  • AVG - Numeric average
  • MIN - Minimum value
  • MAX - Maximum value
  • Implicit GROUP BY - Non-aggregated columns become grouping keys
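Implicit grouping means a query such as `RETURN p.city, count(*) AS n, avg(p.age) AS avg_age` groups by `p.city` because it is the only non-aggregated column. A rough Python equivalent of that behavior, with made-up sample rows:

```python
from collections import defaultdict

# Sample rows standing in for matched Person nodes.
rows = [
    {"city": "Oslo", "name": "Ann", "age": 30},
    {"city": "Oslo", "name": "Bob", "age": 40},
    {"city": "Rome", "name": "Cid", "age": 20},
]

# The non-aggregated column (city) becomes the grouping key.
groups = defaultdict(list)
for row in rows:
    groups[(row["city"],)].append(row)

# Each group yields one output row with its aggregates computed per group.
result = [
    {
        "city": key[0],
        "n": len(members),
        "avg_age": sum(m["age"] for m in members) / len(members),
    }
    for key, members in groups.items()
]
print(result)
```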

Scalar Functions

  • String Functions
      • length(str) - String length
      • substring(str, start, length) - Extract substring
      • toUpper(str) / toLower(str) - Case conversion
      • trim(str) - Remove whitespace
  • Type Conversion
      • toInteger(value) - Convert to integer
      • toFloat(value) - Convert to float
      • toString(value) - Convert to string
  • Utility Functions
      • coalesce(expr1, expr2, ...) - Return first non-NULL
      • type(relationship) - Get relationship type

Expressions & Operators

  • Comparison Operators: =, <>, <, >, <=, >=
  • Logical Operators: AND, OR (with NULL propagation)
  • Property Access: n.property, r.property
  • Literals: Integers, floats, strings, booleans, NULL, lists, maps
  • List Literals: [1, 2, 3], ['a', 'b', 'c']
  • Map Literals: {key: value, nested: {k: v}}
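The NULL propagation noted for AND and OR follows three-valued logic: a NULL operand yields NULL unless the other operand already decides the result. A small Python sketch of those truth tables, with `None` standing in for NULL; the function names are illustrative and the inputs are assumed to be `True`, `False`, or `None`.

```python
def cypher_and(a, b):
    """Three-valued AND: False dominates, otherwise NULL propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def cypher_or(a, b):
    """Three-valued OR: True dominates, otherwise NULL propagates."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def cypher_not(a):
    """Three-valued NOT: NULL stays NULL."""
    return None if a is None else not a

for left in (True, False, None):
    for right in (True, False, None):
        print(left, right, cypher_and(left, right), cypher_or(left, right))
```

Because WHERE keeps only rows whose predicate is true, a predicate that evaluates to NULL filters the row out just as a false one does.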

Data Types

  • CypherInt - 64-bit signed integers
  • CypherFloat - 64-bit floating point
  • CypherString - UTF-8 strings
  • CypherBool - Boolean (true/false)
  • CypherNull - NULL value
  • CypherList - Ordered lists (heterogeneous)
  • CypherMap - Key-value maps (nested structures)
  • NodeRef - Node references in query context
  • EdgeRef - Relationship references in query context

✅ Completed in v0.2.0 and v0.2.1

Released: February 2026

| Feature | Version | Status |
|---------|---------|--------|
| CASE expressions | v0.2.0 | ✅ Complete |
| COLLECT aggregation | v0.2.0 | ✅ Complete |
| Arithmetic operators (+, -, *, /, %) | v0.2.0 | ✅ Complete |
| String matching (STARTS WITH, ENDS WITH, CONTAINS) | v0.2.0 | ✅ Complete |
| REMOVE clause | v0.2.0 | ✅ Complete |
| NOT operator | v0.2.0 | ✅ Complete |
| UNWIND clause | v0.2.0 | ✅ Complete |
| DETACH DELETE | v0.2.0 | ✅ Complete |
| MERGE ON CREATE SET | v0.2.1 | ✅ Complete |
| MERGE ON MATCH SET | v0.2.1 | ✅ Complete |
| Dataset loading infrastructure | v0.2.1 | ✅ Complete |
| CSV edge-list loader | v0.2.1 | ✅ Complete |
| 5 SNAP datasets | v0.2.1 | ✅ Complete |

What v0.2.0 Enables

// UNWIND: Iterate over lists
UNWIND [1, 2, 3] AS num
RETURN num

// DETACH DELETE: Cascading deletion
MATCH (n:Temporary)
DETACH DELETE n

// CASE: Conditional logic
MATCH (p:Person)
RETURN p.name,
       CASE
           WHEN p.age < 18 THEN 'minor'
           WHEN p.age < 65 THEN 'adult'
           ELSE 'senior'
       END AS category

// REMOVE: Property/label removal
MATCH (n:Person)
REMOVE n.temporaryField, n:OldLabel

// Arithmetic: Computations
MATCH (p:Person)
RETURN p.name, p.salary * 1.1 AS new_salary

// COLLECT: Aggregate into lists
MATCH (p:Person)
RETURN p.city, COLLECT(p.name) AS residents

// String matching: Text filtering
MATCH (p:Person)
WHERE p.email ENDS WITH '@example.com'
RETURN p

// NOT: Logical negation
MATCH (p:Person)
WHERE NOT p.archived
RETURN p

✅ Completed in v0.3.0

Released: February 2026 | Release Notes

Major Cypher Features

| Feature | Status | TCK Impact |
|---------|--------|------------|
| OPTIONAL MATCH | ✅ Complete | ~150 scenarios |
| Variable-length patterns (-[:TYPE*1..3]->) | ✅ Complete | ~100 scenarios |
| List comprehensions | ✅ Complete | ~100 scenarios |
| Subqueries (EXISTS, COUNT) | ✅ Complete | ~100 scenarios |
| UNION / UNION ALL | ✅ Complete | ~30 scenarios |
| IS NULL / IS NOT NULL | ✅ Complete | Integrated |
| Spatial types (Point, Distance) | ✅ Complete | ~50 scenarios |
| Temporal types (Date, DateTime, Time, Duration) | ✅ Complete | ~50 scenarios |
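A variable-length pattern such as `(a)-[:TYPE*1..3]->(b)` matches every path of one to three hops in which no relationship is used twice. The sketch below enumerates such paths over a plain edge list; `var_length_paths` is a hypothetical helper written for illustration and says nothing about how GraphForge evaluates these patterns internally.

```python
def var_length_paths(edges, start, min_hops, max_hops):
    """Enumerate directed paths from start with min_hops..max_hops edges.

    edges is a list of (source, target) tuples; each edge may appear at
    most once within a single path.
    """
    adjacency = {}
    for index, (source, target) in enumerate(edges):
        adjacency.setdefault(source, []).append((index, target))

    paths = []

    def walk(node, used, path):
        hops = len(path) - 1
        if min_hops <= hops <= max_hops:
            paths.append(list(path))
        if hops == max_hops:
            return  # upper bound reached, do not expand further
        for index, neighbor in adjacency.get(node, []):
            if index in used:
                continue  # never reuse a relationship within one path
            walk(neighbor, used | {index}, path + [neighbor])

    walk(start, frozenset(), [start])
    return paths

chain = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
cycle = [("a", "b"), ("b", "a")]
print(var_length_paths(chain, "a", 1, 3))
print(var_length_paths(cycle, "a", 1, 3))
```

The cycle example terminates after two hops because both relationships have been consumed, which is why an unbounded `[*]` pattern still finishes on cyclic graphs.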

Dataset Integration

| Feature | Status |
|---------|--------|
| 95 SNAP datasets | ✅ Complete |
| 10 LDBC datasets | ✅ Complete |
| 10 NetworkRepository datasets | ✅ Complete |
| GraphML loader | ✅ Complete |
| Cypher script loader | ✅ Complete |
| Zip compression support | ✅ Complete |
| Zstandard compression support | ✅ Complete |

Actual TCK Compliance: ~29% (950+ scenarios)
Total Datasets: 109+ validated datasets


❌ Not Supported (Out of Scope)

These features are out of scope for GraphForge's design goals:

  • ❌ Full-Text Search - db.index.fulltext.*
      • Reason: SQLite FTS could be added, but not core priority
      • Workaround: Use string matching (CONTAINS) or external FTS

Enterprise Features

  • User Management - CREATE USER, GRANT, REVOKE, roles
      • Reason: Embedded design, no multi-user access
  • Multi-Database - USE database, database switching
      • Reason: Single-database design, create multiple GraphForge instances if needed
  • Constraints (advanced) - UNIQUE, EXISTS, KEY constraints
      • Reason: Validation can be done in Python, limited benefit for analysis
  • Indexes (advanced) - CREATE INDEX, BTREE, HASH
      • Reason: SQLite provides indexing, but explicit index creation not exposed

Distributed Features

  • Sharding / Replication - Multi-node clusters
      • Reason: Single-node embedded design
  • Distributed Transactions - Cross-database ACID
      • Reason: SQLite ACID within single database only

Advanced Query Features

  • CALL Procedures - User-defined procedures, built-in procedures. Note: graph algorithms are not planned as Cypher procedures; they are exposed as db.gds.* Python methods instead (v0.4.0)
      • Reason: Could add in future, but Python functions are more natural
      • Workaround: Write Python functions, call from builder API
  • Label Expressions - :A|B (union), !:A (negation)
      • Reason: Low priority, can filter in WHERE
  • Map Projections - node {.property1, .property2}
      • Reason: Syntax sugar, not essential
  • FOREACH - Iterative updates
      • Reason: Low usage, can use UNWIND + SET

Graph Algorithms

  • Built-in Algorithms - PageRank, community detection, centrality
      • Reason: User can implement in Python or use NetworkX
      • v0.4.0: Use db.gds.* Python methods (e.g. db.gds.pagerank(write_property="pr")). For custom algorithms: to_networkx() + set_node_properties()

Comparison with Neo4j

| Feature Category | GraphForge v0.3.9 | Neo4j |
|------------------|-------------------|-------|
| Core Clauses | ✅ 100% | ✅ 100% |
| Pattern Matching | ✅ 100% | ✅ 100% |
| Aggregations | ✅ 100% | ✅ 100%+ |
| Scalar Functions | ✅ 100% | ✅ 100%+ |
| Temporal Types | ✅ Full (nanoseconds, IANA tz) | ✅ Full support |
| Spatial Types | ✅ Complete | ✅ Full support |
| TCK Compliance | ✅ 3,885/3,885 (100%) | ~100% |
| Indexes | ⚠️ SQLite automatic | ✅ Explicit control |
| Constraints | ❌ None | ✅ Full support |
| Procedures | ❌ None | ✅ CALL + APOC |
| Deployment | ✅ Embedded (pip install) | ⚠️ Service (Docker/VM) |
| Setup Complexity | ✅ Zero config | ⚠️ Configuration needed |
| ACID Transactions | ✅ SQLite | ✅ Native |
| Scale | ⚠️ < 10M nodes | ✅ Billions of nodes |
| Multi-user | ❌ Single process | ✅ Full auth/RBAC |

Summary: GraphForge is to Neo4j as SQLite is to PostgreSQL — a lightweight, embedded alternative for single-user analytical workflows, not a production database replacement.


TCK Compliance Details

The Technology Compatibility Kit (TCK) is the official openCypher test suite with 3,885 scenarios.

See TCK Compliance for the full v0.3.9 compliance report.

Current Coverage (v0.3.9)

3,885/3,885 scenarios passing (100%)

Zero failures. Zero expected failures. All 3,885 scenarios pass on every supported Python version (3.10–3.13).

All Passing Categories (v0.3.9)

  • ✅ MATCH, OPTIONAL MATCH, WHERE, RETURN, WITH, ORDER BY, LIMIT, SKIP
  • ✅ CREATE, SET, REMOVE, DELETE, DETACH DELETE, MERGE (ON CREATE/ON MATCH)
  • ✅ UNWIND, UNION, UNION ALL
  • ✅ CASE expressions (simple and generic)
  • ✅ Variable-length patterns ([*1..3], [*])
  • ✅ Path variables and path functions
  • ✅ EXISTS { } and COUNT { } subqueries
  • ✅ List comprehensions, pattern comprehensions
  • ✅ Predicate functions (all, any, none, single, exists, isEmpty)
  • ✅ All string, math, list, aggregation, graph, conversion functions
  • ✅ Temporal types with nanosecond precision and IANA timezone names
  • ✅ Extreme year dates (outside Python's 1–9999 range)
  • ✅ Spatial types (point, distance)
  • ✅ NULL propagation (three-valued logic throughout)
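The extreme-year item matters because Python's `datetime.date` only accepts years 1 through 9999. The sketch below shows that limit and one possible way to represent dates outside it using plain integers and the proleptic Gregorian leap-year rule. `WideDate` is a hypothetical type made up for this illustration; it is not GraphForge's temporal implementation.

```python
import datetime
from dataclasses import dataclass

# The standard library rejects years outside 1..9999.
try:
    datetime.date(10000, 1, 1)
    stdlib_accepts_year_10000 = True
except ValueError:
    stdlib_accepts_year_10000 = False

def is_leap(year: int) -> bool:
    """Proleptic Gregorian leap-year rule, valid for any integer year."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def days_in_month(year: int, month: int) -> int:
    """Number of days in the given month of the given year."""
    if month == 2:
        return 29 if is_leap(year) else 28
    return 30 if month in (4, 6, 9, 11) else 31

@dataclass(frozen=True, order=True)
class WideDate:
    """Hypothetical date type with an unbounded integer year."""
    year: int
    month: int
    day: int

    def __post_init__(self):
        if not 1 <= self.month <= 12:
            raise ValueError("month must be in 1..12")
        if not 1 <= self.day <= days_in_month(self.year, self.month):
            raise ValueError("day is out of range for month")

far_future = WideDate(12000, 2, 29)  # 12000 is divisible by 400, so leap
print(stdlib_accepts_year_10000, far_future < WideDate(12000, 3, 1))
```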

Usage Recommendations

✅ Good Use Cases for GraphForge

  • Notebook-based analysis - Jupyter, IPython, exploratory data analysis
  • Knowledge graph prototyping - Build and refine graph structures iteratively
  • LLM-powered graph generation - Store entity-relationship extractions
  • Data lineage tracking - Model data transformation pipelines
  • Small to medium graphs - 100k-1M nodes, 1M-10M relationships
  • Single-user workflows - No concurrent write access needed
  • Embedded applications - Package graph database with Python app
  • Teaching and learning - Learn Cypher without database setup

⚠️ Limited Use Cases

  • Full-text search - Use CONTAINS for simple matching, or external FTS
  • Very large graphs - Full-scan queries practical up to ~1M edges; LIMIT-based traversal up to ~20M edges
  • Concurrent writes - SQLite single-writer limitation
  • High-throughput ingestion - Use bulk_ingest() context manager for best throughput
  • Production web applications - Use Neo4j, Memgraph, or similar
  • Multi-tenant systems - No user management or security
  • Distributed queries - Single-node only
  • Real-time analytics - Limited optimization for high-throughput workloads
  • Complex graph algorithms - Use db.gds.* for common algorithms; to_networkx() + set_node_properties() for custom ones
  • Mission-critical systems - Embedded design, no HA/replication

Roadmap

v0.4.0 is the current release.

| Version | Focus | Status |
|---------|-------|--------|
| v0.3.8 | Full TCK compliance (3,885/3,885) | Released (May 2026) |
| v0.3.9 | Performance: LALR parser, indexes, bulk ingest, LIMIT short-circuit | Released (May 2026) |
| v0.3.10 | Analytics integration: NetworkX/igraph export, parse cache, add_graph_documents() | Released (May 2026) |
| v0.4.0 | Three-Surface API: db.gds.* algorithms + db.search.* hybrid retrieval | Released (May 2026) |
| v0.5.0 | Rust core: recursive-descent + Pratt parser, DataFusion execution, Arrow result streams, Parquet storage, Python + Node + Swift + Kotlin bindings | In development (main branch) |

Enterprise features remain permanently out of scope:

  • ❌ User management, multi-DB (incompatible with embedded design)
  • ❌ Distributed features (single-node architecture)


Contributing

Help build GraphForge! See:

  • GitHub Milestones
  • Contributing Guide
  • Issue Workflow

High-Impact Contributions

Check the GitHub issue tracker for open issues tagged good first issue or enhancement.


References

External Resources

  • openCypher Specification: https://opencypher.org/resources/
  • Neo4j Cypher Manual: https://neo4j.com/docs/cypher-manual/
  • openCypher TCK: https://github.com/opencypher/openCypher/tree/master/tck
  • GraphForge Issues: https://github.com/DecisionNerd/graphforge/issues



Maintained by: @DecisionNerd