OpenCypher Compatibility Status¶
Last Updated: 2026-05-07 | GraphForge Version: v0.4.0
Executive Summary¶
As of v0.4.0, GraphForge passes all 3,885 openCypher TCK scenarios — 100% compliance with zero failures and zero expected failures. This is the first embedded Python graph database to achieve complete openCypher TCK compliance.
Current Status¶
| Version | TCK Scenarios | Feature Completeness | Status |
|---|---|---|---|
| v0.1.4 | 638/3,837 (17%) | ~30% | Released |
| v0.2.0 | 638/3,837 (17%) | ~40% | Released |
| v0.2.1 | 638/3,837 (17%) | ~45% | Released |
| v0.3.0 | 1,303/3,837 (34%) | ~78% | Released (February 2026) |
| v0.3.6 | 2,507/3,837 (65%) | ~85% | Released (March 2026) |
| v0.3.7 | 3,235/3,885 (83.3%) | ~88% | Released (April 2026) |
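| v0.3.8 | 3,885/3,885 (100%) | 100% | Released (May 2026) |
| v0.3.9 | 3,885/3,885 (100%) | 100% | Released (May 2026) |
| v0.3.10 | 3,885/3,885 (100%) | 100% | Released (May 2026) |
| v0.4.0 | 3,885/3,885 (100%) | 100% | Released (May 2026) |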
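The percentages in the table are the passing count divided by the total scenario count. A minimal sketch of that arithmetic, using counts copied from the table; the helper name `pass_rate` is hypothetical and not part of GraphForge:

```python
def pass_rate(passed: int, total: int) -> float:
    """Return the TCK pass rate as a percentage, rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * passed / total, 1)


# Counts copied from the status table.
v036 = pass_rate(2507, 3837)  # v0.3.6
v037 = pass_rate(3235, 3885)  # v0.3.7
v038 = pass_rate(3885, 3885)  # v0.3.8
print(v036, v037, v038)
```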
Design Philosophy¶
GraphForge prioritizes the ✅ items and deliberately leaves out the ❌ items:
- ✅ Full openCypher language: all clauses, functions, operators, patterns
- ✅ SQLite-backed persistence with ACID transactions
- ✅ Zero-configuration embedded usage
- ✅ Temporal/spatial types: nanosecond precision, IANA timezones, extreme years
- ✅ Python-first: results are Python objects; integrates with pandas, NetworkX, LLM libraries
- ❌ Full-text search (use CONTAINS or external FTS)
- ❌ Multi-database / distributed features (single-node embedded design)
- ❌ High-concurrency write workloads (SQLite single-writer)
Feature Matrix¶
All features below are fully implemented as of v0.3.9.
✅ Fully Supported (v0.1.4+)¶
Reading Clauses¶
- MATCH - Basic pattern matching with node and relationship patterns
  - Single patterns: `MATCH (n:Label)`
  - Relationship patterns: `MATCH (a)-[r:TYPE]->(b)`
  - Multi-pattern: `MATCH (a), (b)`
  - Property filtering: `MATCH (n {key: value})`
- WHERE - Predicate filtering with comparisons and logical operators
  - Comparisons: `=`, `<>`, `<`, `>`, `<=`, `>=`
  - Logical operators: `AND`, `OR`
  - Property access: `n.property`
  - NULL handling with ternary logic
- RETURN - Projection with aliasing
  - Property projection: `RETURN n.name AS name`
  - Expressions: `RETURN n.age + 5`
  - DISTINCT: `RETURN DISTINCT n.city`
- WITH - Query chaining and variable passing
  - Pipeline queries: `MATCH ... WITH ... MATCH ...`
  - Filtering: `WITH n WHERE n.age > 18`
- ORDER BY - Sorting with multiple keys
  - Multi-key: `ORDER BY n.age DESC, n.name ASC`
  - NULL ordering: NULLs last by default
- LIMIT / SKIP - Result pagination
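The multi-key ordering and NULLs-last behavior listed under ORDER BY can be illustrated in plain Python. This is a sketch of the semantics only, assuming `None` stands in for NULL; it is not GraphForge's sorting code, and the sample rows and the `nulls_last` helper are hypothetical:

```python
rows = [
    {"name": "Bo", "age": 30},
    {"name": None, "age": 30},
    {"name": "Al", "age": 30},
    {"name": "Cy", "age": 41},
]


def nulls_last(value):
    # Tuples compare element-wise, so (False, x) sorts before (True, ...).
    return (value is None, "" if value is None else value)


# Emulates: ORDER BY age DESC, name ASC
# Python's sort is stable, so sort by the secondary key first,
# then by the primary key.
ordered = sorted(rows, key=lambda r: nulls_last(r["name"]))
ordered = sorted(ordered, key=lambda r: r["age"], reverse=True)

print([(r["age"], r["name"]) for r in ordered])
```

Sorting in two stable passes keeps each key's direction independent, which is simpler than building one composite key when directions differ.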
Writing Clauses¶
- CREATE - Node and relationship creation
  - Nodes: `CREATE (n:Label {key: value})`
  - Relationships: `CREATE (a)-[r:TYPE {key: value}]->(b)`
  - Multi-create: `CREATE (a), (b), (a)-[:KNOWS]->(b)`
- SET - Property updates
  - Set property: `SET n.key = value`
  - Set multiple: `SET n.key1 = value1, n.key2 = value2`
  - Copy properties: `SET n = m`
- DELETE - Node and relationship deletion
  - Delete nodes: `DELETE n`
  - Delete relationships: `DELETE r`
  - Constraint: Cannot delete a node that still has relationships (use DETACH DELETE)
- MERGE - Create-or-match patterns
  - Basic: `MERGE (n:Label {key: value})`
  - ON CREATE: `MERGE (n) ON CREATE SET n.created = timestamp()`
  - ON MATCH: `MERGE (n) ON MATCH SET n.accessed = timestamp()`
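The create-or-match behavior of MERGE, including the ON CREATE and ON MATCH branches, can be sketched with an in-memory dictionary. The `TinyGraph` class and its `merge` method below are hypothetical illustrations of the semantics, not GraphForge's API or storage layer:

```python
import itertools


class TinyGraph:
    """Toy node store used only to illustrate MERGE semantics."""

    def __init__(self):
        self.nodes = {}  # node id -> {"label": str, "props": dict}
        self._ids = itertools.count(1)

    def merge(self, label, key_props, on_create=None, on_match=None):
        """Return (node_id, created). Match on label + key_props, else create."""
        for node_id, node in self.nodes.items():
            same_label = node["label"] == label
            same_keys = all(node["props"].get(k) == v for k, v in key_props.items())
            if same_label and same_keys:
                node["props"].update(on_match or {})  # ON MATCH SET ...
                return node_id, False
        node_id = next(self._ids)
        props = {**key_props, **(on_create or {})}  # ON CREATE SET ...
        self.nodes[node_id] = {"label": label, "props": props}
        return node_id, True


graph = TinyGraph()
first_id, first_created = graph.merge("Person", {"name": "Ada"}, on_create={"created": 1})
second_id, second_created = graph.merge("Person", {"name": "Ada"}, on_match={"accessed": 2})
print(first_created, second_created, graph.nodes[first_id]["props"])
```

Running the same merge twice yields one node: the first call takes the create branch, the second takes the match branch.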
Aggregations¶
- COUNT - Row counting
  - `COUNT(*)` - Count all rows
  - `COUNT(expr)` - Count non-NULL values
  - `COUNT(DISTINCT expr)` - Count distinct values
- SUM - Numeric summation
- AVG - Numeric average
- MIN - Minimum value
- MAX - Maximum value
- Implicit GROUP BY - Non-aggregated columns become grouping keys
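The difference between the three COUNT forms, and the way a non-aggregated column acts as the grouping key, can be shown with a short plain-Python sketch. It emulates a query shaped like `RETURN p.city, COUNT(*), COUNT(p.name), COUNT(DISTINCT p.name)`; the sample rows and the `count_by_city` helper are hypothetical, and `None` stands in for NULL:

```python
from collections import defaultdict

rows = [
    {"city": "Oslo", "name": "Ada"},
    {"city": "Oslo", "name": None},
    {"city": "Oslo", "name": "Ada"},
    {"city": "Rome", "name": "Bo"},
]


def count_by_city(rows):
    """Group by the non-aggregated column (city) and compute three counts."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["city"]].append(row["name"])

    result = {}
    for city, names in groups.items():
        non_null = [n for n in names if n is not None]
        result[city] = {
            "count_star": len(names),                   # COUNT(*)
            "count_name": len(non_null),                # COUNT(name) skips NULL
            "count_distinct_name": len(set(non_null)),  # COUNT(DISTINCT name)
        }
    return result


counts = count_by_city(rows)
print(counts)
```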
Scalar Functions¶
- String Functions
  - `length(str)` - String length
  - `substring(str, start, length)` - Extract substring
  - `toUpper(str)` / `toLower(str)` - Case conversion
  - `trim(str)` - Remove whitespace
- Type Conversion
  - `toInteger(value)` - Convert to integer
  - `toFloat(value)` - Convert to float
  - `toString(value)` - Convert to string
- Utility Functions
  - `coalesce(expr1, expr2, ...)` - Return first non-NULL
  - `type(relationship)` - Get relationship type
Expressions & Operators¶
- Comparison Operators: `=`, `<>`, `<`, `>`, `<=`, `>=`
- Logical Operators: `AND`, `OR` (with NULL propagation)
- Property Access: `n.property`, `r.property`
- Literals: Integers, floats, strings, booleans, NULL, lists, maps
- List Literals: `[1, 2, 3]`, `['a', 'b', 'c']`
- Map Literals: `{key: value, nested: {k: v}}`
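NULL propagation through `AND` and `OR` follows three-valued logic: a NULL operand yields NULL unless the other operand already decides the result. A minimal sketch with `None` standing in for NULL; the function names `and3` and `or3` are hypothetical and this is not GraphForge's evaluator:

```python
def and3(a, b):
    """Three-valued AND over True, False and None (NULL)."""
    if a is False or b is False:
        return False  # one False decides the result, even next to NULL
    if a is None or b is None:
        return None
    return True


def or3(a, b):
    """Three-valued OR over True, False and None (NULL)."""
    if a is True or b is True:
        return True  # one True decides the result, even next to NULL
    if a is None or b is None:
        return None
    return False


# A WHERE clause keeps a row only when the predicate is exactly True.
predicates = [and3(True, None), or3(None, True), and3(None, False)]
kept = [p for p in predicates if p is True]
print(predicates, kept)
```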
Data Types¶
- CypherInt - 64-bit signed integers
- CypherFloat - 64-bit floating point
- CypherString - UTF-8 strings
- CypherBool - Boolean (true/false)
- CypherNull - NULL value
- CypherList - Ordered lists (heterogeneous)
- CypherMap - Key-value maps (nested structures)
- NodeRef - Node references in query context
- EdgeRef - Relationship references in query context
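To make the value types concrete, the sketch below maps plain Python values to the type names in the list. The helper `cypher_type_name` is hypothetical; how GraphForge itself converts values is not described here, so treat the mapping as an assumption. Two details matter: `bool` must be checked before `int` (in Python `bool` is a subclass of `int`), and integers must fit the 64-bit signed range.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def cypher_type_name(value):
    """Return the listed type name that a Python value would correspond to."""
    if value is None:
        return "CypherNull"
    if isinstance(value, bool):  # must precede the int check
        return "CypherBool"
    if isinstance(value, int):
        if not INT64_MIN <= value <= INT64_MAX:
            raise OverflowError("integer does not fit in 64 signed bits")
        return "CypherInt"
    if isinstance(value, float):
        return "CypherFloat"
    if isinstance(value, str):
        return "CypherString"
    if isinstance(value, (list, tuple)):
        return "CypherList"
    if isinstance(value, dict):
        return "CypherMap"
    raise TypeError(f"no mapping for {type(value).__name__}")


names = [cypher_type_name(v) for v in (1, True, 1.5, "a", None, [1, "x"], {"k": 1})]
print(names)
```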
✅ Completed in v0.2.0 and v0.2.1¶
Released: February 2026
| Feature | Version | Status |
|---|---|---|
| CASE expressions | v0.2.0 | ✅ Complete |
| COLLECT aggregation | v0.2.0 | ✅ Complete |
| Arithmetic operators (+, -, *, /, %) | v0.2.0 | ✅ Complete |
| String matching (STARTS WITH, ENDS WITH, CONTAINS) | v0.2.0 | ✅ Complete |
| REMOVE clause | v0.2.0 | ✅ Complete |
| NOT operator | v0.2.0 | ✅ Complete |
| UNWIND clause | v0.2.0 | ✅ Complete |
| DETACH DELETE | v0.2.0 | ✅ Complete |
| MERGE ON CREATE SET | v0.2.1 | ✅ Complete |
| MERGE ON MATCH SET | v0.2.1 | ✅ Complete |
| Dataset loading infrastructure | v0.2.1 | ✅ Complete |
| CSV edge-list loader | v0.2.1 | ✅ Complete |
| 5 SNAP datasets | v0.2.1 | ✅ Complete |
What v0.2.0 Enables¶
// UNWIND: Iterate over lists
UNWIND [1, 2, 3] AS num
RETURN num

// DETACH DELETE: Cascading deletion
MATCH (n:Temporary)
DETACH DELETE n

// CASE: Conditional logic
MATCH (p:Person)
RETURN p.name,
       CASE
         WHEN p.age < 18 THEN 'minor'
         WHEN p.age < 65 THEN 'adult'
         ELSE 'senior'
       END AS category

// REMOVE: Property/label removal
MATCH (n:Person)
REMOVE n.temporaryField, n:OldLabel

// Arithmetic: Computations
MATCH (p:Person)
RETURN p.name, p.salary * 1.1 AS new_salary

// COLLECT: Aggregate into lists
MATCH (p:Person)
RETURN p.city, COLLECT(p.name) AS residents

// String matching: Text filtering
MATCH (p:Person)
WHERE p.email ENDS WITH '@example.com'
RETURN p

// NOT: Logical negation
MATCH (p:Person)
WHERE NOT p.archived
RETURN p
✅ Completed in v0.3.0¶
Released: February 2026 | Release Notes
Major Cypher Features¶
| Feature | Status | TCK Impact |
|---|---|---|
| OPTIONAL MATCH | ✅ Complete | ~150 scenarios |
| Variable-length patterns (-[:TYPE*1..3]->) | ✅ Complete | ~100 scenarios |
| List comprehensions | ✅ Complete | ~100 scenarios |
| Subqueries (EXISTS, COUNT) | ✅ Complete | ~100 scenarios |
| UNION / UNION ALL | ✅ Complete | ~30 scenarios |
| IS NULL / IS NOT NULL | ✅ Complete | Integrated |
| Spatial types (Point, Distance) | ✅ Complete | ~50 scenarios |
| Temporal types (Date, DateTime, Time, Duration) | ✅ Complete | ~50 scenarios |
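A variable-length pattern such as `-[:TYPE*1..3]->` matches every directed path whose hop count falls within the bounds, with no relationship reused inside a single path. The sketch below enumerates such paths over an adjacency dictionary. It is an illustration of the matching rule, not GraphForge's planner; the `expand` function is hypothetical and assumes at most one edge per ordered node pair, so an edge can be identified by `(source, target)`.

```python
def expand(adjacency, start, min_hops, max_hops):
    """Yield node paths from start that use min_hops..max_hops directed edges."""

    def walk(path, used_edges):
        hops = len(path) - 1
        if hops >= min_hops:
            yield tuple(path)
        if hops == max_hops:
            return
        node = path[-1]
        for nxt in adjacency.get(node, ()):
            edge = (node, nxt)
            if edge not in used_edges:  # a relationship appears once per path
                yield from walk(path + [nxt], used_edges | {edge})

    yield from walk([start], frozenset())


adjacency = {"a": ["b"], "b": ["c"], "c": ["a", "d"]}
paths = list(expand(adjacency, "a", 1, 3))
print(paths)
```

Nodes may repeat within a path (the cycle back to `a` is returned); only relationships are required to be unique.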
Dataset Integration¶
| Feature | Status |
|---|---|
| 95 SNAP datasets | ✅ Complete |
| 10 LDBC datasets | ✅ Complete |
| 10 NetworkRepository datasets | ✅ Complete |
| GraphML loader | ✅ Complete |
| Cypher script loader | ✅ Complete |
| Zip compression support | ✅ Complete |
| Zstandard compression support | ✅ Complete |
Actual TCK Compliance: ~29% (950+ scenarios)
Total Datasets: 109+ validated datasets
❌ Not Supported (Out of Scope)¶
These features are out of scope for GraphForge's design goals:
- ❌ Full-Text Search - `db.index.fulltext.*`
  - Reason: SQLite FTS could be added, but it is not a core priority
  - Workaround: Use string matching (CONTAINS) or external FTS
Enterprise Features¶
- ❌ User Management - CREATE USER, GRANT, REVOKE, roles
  - Reason: Embedded design, no multi-user access
- ❌ Multi-Database - USE database, database switching
  - Reason: Single-database design; create multiple GraphForge instances if needed
- ❌ Constraints (advanced) - UNIQUE, EXISTS, KEY constraints
  - Reason: Validation can be done in Python, limited benefit for analysis
- ❌ Indexes (advanced) - CREATE INDEX, BTREE, HASH
  - Reason: SQLite provides indexing, but explicit index creation is not exposed
Distributed Features¶
- ❌ Sharding / Replication - Multi-node clusters
  - Reason: Single-node embedded design
- ❌ Distributed Transactions - Cross-database ACID
  - Reason: SQLite ACID within single database only
Advanced Query Features¶
- ❌ CALL Procedures - User-defined procedures, built-in procedures. Note: graph algorithms are not planned as Cypher procedures; they are exposed as `db.gds.*` Python methods instead (v0.4.0)
  - Reason: Could add in future, but Python functions are more natural
  - Workaround: Write Python functions, call from builder API
- ❌ Label Expressions - `:A|B` (union), `!:A` (negation)
  - Reason: Low priority, can filter in WHERE
- ❌ Map Projections - `node {.property1, .property2}`
  - Reason: Syntax sugar, not essential
- ❌ FOREACH - Iterative updates
  - Reason: Low usage, can use UNWIND + SET
Graph Algorithms¶
- ❌ Built-in Algorithms - PageRank, community detection, centrality
  - Reason: User can implement in Python or use NetworkX
  - v0.4.0: Use `db.gds.*` Python methods (e.g. `db.gds.pagerank(write_property="pr")`). For custom algorithms: `to_networkx()` + `set_node_properties()`
Comparison with Neo4j¶
| Feature Category | GraphForge v0.3.9 | Neo4j |
|---|---|---|
| Core Clauses | ✅ 100% | ✅ 100% |
| Pattern Matching | ✅ 100% | ✅ 100% |
| Aggregations | ✅ 100% | ✅ 100%+ |
| Scalar Functions | ✅ 100% | ✅ 100%+ |
| Temporal Types | ✅ Full (nanoseconds, IANA tz) | ✅ Full support |
| Spatial Types | ✅ Complete | ✅ Full support |
| TCK Compliance | ✅ 3,885/3,885 (100%) | ~100% |
| Indexes | ⚠️ SQLite automatic | ✅ Explicit control |
| Constraints | ❌ None | ✅ Full support |
| Procedures | ❌ None | ✅ CALL + APOC |
| Deployment | ✅ Embedded (pip install) | ⚠️ Service (Docker/VM) |
| Setup Complexity | ✅ Zero config | ⚠️ Configuration needed |
| ACID Transactions | ✅ SQLite | ✅ Native |
| Scale | ⚠️ < 10M nodes | ✅ Billions of nodes |
| Multi-user | ❌ Single process | ✅ Full auth/RBAC |
Summary: GraphForge is to Neo4j as SQLite is to PostgreSQL — a lightweight, embedded alternative for single-user analytical workflows, not a production database replacement.
TCK Compliance Details¶
The Technology Compatibility Kit (TCK) is the official openCypher test suite with 3,885 scenarios.
See TCK Compliance for the full v0.3.9 compliance report.
Current Coverage (v0.3.9)¶
3,885/3,885 scenarios passing (100%)
Zero failures. Zero expected failures. All 3,885 scenarios pass on every supported Python version (3.10–3.13).
All Passing Categories (v0.3.9)¶
- ✅ MATCH, OPTIONAL MATCH, WHERE, RETURN, WITH, ORDER BY, LIMIT, SKIP
- ✅ CREATE, SET, REMOVE, DELETE, DETACH DELETE, MERGE (ON CREATE/ON MATCH)
- ✅ UNWIND, UNION, UNION ALL
- ✅ CASE expressions (simple and generic)
- ✅ Variable-length patterns (`[*1..3]`, `[*]`)
- ✅ Path variables and path functions
- ✅ EXISTS { } and COUNT { } subqueries
- ✅ List comprehensions, pattern comprehensions
- ✅ Predicate functions (all, any, none, single, exists, isEmpty)
- ✅ All string, math, list, aggregation, graph, conversion functions
- ✅ Temporal types with nanosecond precision and IANA timezone names
- ✅ Extreme year dates (outside Python's 1–9999 range)
- ✅ Spatial types (point, distance)
- ✅ NULL propagation (three-valued logic throughout)
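The temporal items above go beyond what Python's standard `datetime` module can represent on its own: its years are limited to 1–9999 and its sub-second resolution is microseconds. The sketch below only demonstrates those two standard-library limits; the helper names are hypothetical and nothing here describes how GraphForge stores temporal values internally.

```python
import datetime


def fits_python_date(year: int) -> bool:
    """True when the year is representable by datetime.date."""
    return datetime.MINYEAR <= year <= datetime.MAXYEAR


def split_nanoseconds(nanos: int):
    """Split a nanosecond-of-second value into (microseconds, leftover nanoseconds)."""
    if not 0 <= nanos < 1_000_000_000:
        raise ValueError("nanos must be within one second")
    return divmod(nanos, 1000)


# datetime.time can hold the microsecond part; the remainder needs extra storage.
micros, leftover = split_nanoseconds(123_456_789)
moment = datetime.time(12, 0, 0, micros)
print(fits_python_date(10000), micros, leftover, moment)
```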
Usage Recommendations¶
✅ Good Use Cases for GraphForge¶
- Notebook-based analysis - Jupyter, IPython, exploratory data analysis
- Knowledge graph prototyping - Build and refine graph structures iteratively
- LLM-powered graph generation - Store entity-relationship extractions
- Data lineage tracking - Model data transformation pipelines
- Small to medium graphs - 100k-1M nodes, 1M-10M relationships
- Single-user workflows - No concurrent write access needed
- Embedded applications - Package graph database with Python app
- Teaching and learning - Learn Cypher without database setup
⚠️ Limited Use Cases¶
- Full-text search — Use CONTAINS for simple matching, or external FTS
- Very large graphs — full-scan queries practical up to ~1M edges; LIMIT-based traversal up to ~20M edges
- Concurrent writes — SQLite single-writer limitation
- High-throughput ingestion - use the `bulk_ingest()` context manager for best throughput
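The ingestion and concurrent-write limits both trace back to SQLite, where each committed transaction has a fixed cost and only one writer proceeds at a time. The sketch below uses Python's standard `sqlite3` module to show the general principle of batching many inserts into a single transaction. The `edges` table and the `insert_edges` helper are hypothetical; this is not GraphForge's schema, and it does not show what `bulk_ingest()` does internally.

```python
import sqlite3


def insert_edges(conn, edges, batched=True):
    """Insert (src, dst) pairs either in one transaction or one commit per row."""
    sql = "INSERT INTO edges (src, dst) VALUES (?, ?)"
    if batched:
        with conn:  # a single transaction, committed once on exit
            conn.executemany(sql, edges)
    else:
        for edge in edges:
            with conn:  # one transaction and one commit per row
                conn.execute(sql, edge)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
insert_edges(conn, [(i, i + 1) for i in range(1000)])
edge_count = conn.execute("SELECT COUNT(*) FROM edges").fetchone()[0]
print(edge_count)
```

On an on-disk database the batched path avoids a commit per row, which is usually where most of the ingestion time goes.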
❌ Not Recommended¶
- Production web applications - Use Neo4j, Memgraph, or similar
- Multi-tenant systems - No user management or security
- Distributed queries - Single-node only
- Real-time analytics - Limited optimization for high-throughput
- Complex graph algorithms - Use `db.gds.*` for common algorithms; `to_networkx()` + `set_node_properties()` for custom ones
- Mission-critical systems - Embedded design, no HA/replication
Roadmap¶
v0.4.0 is the current release.
| Version | Focus | Status |
|---|---|---|
| v0.3.8 | Full TCK compliance (3,885/3,885) | Released (May 2026) |
| v0.3.9 | Performance: LALR parser, indexes, bulk ingest, LIMIT short-circuit | Released (May 2026) |
| v0.3.10 | Analytics integration: NetworkX/igraph export, parse cache, add_graph_documents() | Released (May 2026) |
| v0.4.0 | Three-Surface API: db.gds.* algorithms + db.search.* hybrid retrieval | Released (May 2026) |
| v0.5.0 | Rust core: recursive-descent + Pratt parser, DataFusion execution, Arrow result streams, Parquet storage, Python + Node + Swift + Kotlin bindings | In development (main branch) |
Enterprise features remain permanently out of scope:
- ❌ User management, multi-DB (incompatible with embedded design)
- ❌ Distributed features (single-node architecture)
Contributing¶
Help build GraphForge! See:
- GitHub Milestones
- Contributing Guide
- Issue Workflow
High-Impact Contributions¶
Check the GitHub issue tracker for open issues tagged `good first issue` or `enhancement`.
References¶
External Resources¶
- openCypher Specification: https://opencypher.org/resources/
- Neo4j Cypher Manual: https://neo4j.com/docs/cypher-manual/
- openCypher TCK: https://github.com/opencypher/openCypher/tree/master/tck
- GraphForge Issues: https://github.com/DecisionNerd/graphforge/issues
GraphForge Documentation¶
- TCK Compliance - Full v0.3.10 TCK compliance report
- API Reference - Complete Python API documentation
Last Updated: 2026-05-07 | Maintained by: @DecisionNerd