OpenCypher Compatibility Status¶

Last Updated: 2026-05-07 GraphForge Version: v0.4.0

Executive Summary¶

As of v0.4.0, GraphForge passes all 3,885 openCypher TCK scenarios: 100% compliance with zero failures and zero expected failures, a level first reached in v0.3.8. GraphForge is the first embedded Python graph database to achieve complete openCypher TCK compliance.

Current Status¶

Version	TCK Scenarios	Feature Completeness	Status
v0.1.4	638/3,837	~30%	Released
v0.2.0	638/3,837	~40%	Released
v0.2.1	638/3,837	~45%	Released
v0.3.0	1,303/3,837 (34%)	~78%	Released (February 2026)
v0.3.6	2,507/3,837 (65%)	~85%	Released (March 2026)
v0.3.7	3,235/3,885 (83.3%)	~88%	Released (April 2026)
v0.3.8	3,885/3,885 (100%)	100%	Released (May 2026)
v0.3.10	3,885/3,885 (100%)	100%	Released (May 2026)
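
The percentages in the TCK Scenarios column are passed scenarios divided by total scenarios. A minimal sketch of that arithmetic follows; `pass_rate` is a hypothetical helper (not part of GraphForge), and the counts are copied from the table above.

```python
def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * passed / total, 1)


# Counts copied from the table above.
releases = {
    "v0.3.6": (2507, 3837),
    "v0.3.7": (3235, 3885),
    "v0.3.8": (3885, 3885),
}

for version, (passed, total) in releases.items():
    print(f"{version}: {passed:,}/{total:,} = {pass_rate(passed, total)}%")
```

Note that the total changes between rows (3,837 versus 3,885), so percentages are only directly comparable between releases measured against the same total.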

Design Philosophy¶

GraphForge prioritizes:
- ✅ Full openCypher language: all clauses, functions, operators, patterns
- ✅ SQLite-backed persistence with ACID transactions
- ✅ Zero-configuration embedded usage
- ✅ Temporal/spatial types: nanosecond precision, IANA timezones, extreme years
- ✅ Python-first: results are Python objects; integrates with pandas, NetworkX, LLM libraries

GraphForge deliberately leaves out:
- ❌ Full-text search (use CONTAINS or external FTS)
- ❌ Multi-database / distributed features (single-node embedded design)
- ❌ High-concurrency write workloads (SQLite single-writer)

Feature Matrix¶

All features listed in the ✅ sections below are fully implemented as of v0.3.9.

✅ Fully Supported (v0.1.4+)¶

Reading Clauses¶

MATCH - Basic pattern matching with node and relationship patterns
Single patterns: MATCH (n:Label)
Relationship patterns: MATCH (a)-[r:TYPE]->(b)
Multi-pattern: MATCH (a), (b)
Property filtering: MATCH (n {key: value})
WHERE - Predicate filtering with comparisons and logical operators
Comparisons: =, <>, <, >, <=, >=
Logical operators: AND, OR
Property access: n.property
NULL handling with ternary logic
RETURN - Projection with aliasing
Property projection: RETURN n.name AS name
Expressions: RETURN n.age + 5
DISTINCT: RETURN DISTINCT n.city
WITH - Query chaining and variable passing
Pipeline queries: MATCH ... WITH ... MATCH ...
Filtering: WITH n WHERE n.age > 18
ORDER BY - Sorting with multiple keys
Multi-key: ORDER BY n.age DESC, n.name ASC
NULL ordering: NULLs last by default
LIMIT / SKIP - Result pagination
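
The "NULL handling with ternary logic" item under WHERE can be illustrated outside the database. The sketch below models Cypher NULL as Python `None` and shows why a row whose predicate evaluates to NULL is dropped by WHERE. All function names are hypothetical and this is not GraphForge's evaluator, only an illustration of three-valued logic.

```python
from typing import Optional

Tri = Optional[bool]  # True, False, or None (standing in for Cypher NULL)


def tri_and(a: Tri, b: Tri) -> Tri:
    """AND: False dominates; otherwise any NULL makes the result NULL."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def tri_or(a: Tri, b: Tri) -> Tri:
    """OR: True dominates; otherwise any NULL makes the result NULL."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def greater_than(value, threshold) -> Tri:
    """Comparison with a NULL operand yields NULL."""
    if value is None or threshold is None:
        return None
    return value > threshold


def where(rows, predicate):
    """Keep only rows whose predicate is exactly True (False and NULL are dropped)."""
    return [row for row in rows if predicate(row) is True]


people = [
    {"name": "Ada", "age": 36},
    {"name": "Bob", "age": None},  # age unknown, so age > 18 is NULL
    {"name": "Cy", "age": 12},
]
adults = where(people, lambda p: greater_than(p["age"], 18))
print([p["name"] for p in adults])
```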

Writing Clauses¶

CREATE - Node and relationship creation
Nodes: CREATE (n:Label {key: value})
Relationships: CREATE (a)-[r:TYPE {key: value}]->(b)
Multi-create: CREATE (a), (b), (a)-[:KNOWS]->(b)
SET - Property updates
Set property: SET n.key = value
Set multiple: SET n.key1 = value1, n.key2 = value2
Copy properties: SET n = m
DELETE - Node and relationship deletion
Delete nodes: DELETE n
Delete relationships: DELETE r
Constraint: Cannot delete node with relationships (use DETACH DELETE)
MERGE - Create-or-match patterns
Basic: MERGE (n:Label {key: value})
ON CREATE: MERGE (n) ON CREATE SET n.created = timestamp()
ON MATCH: MERGE (n) ON MATCH SET n.accessed = timestamp()
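
The create-or-match behavior of MERGE, including the ON CREATE and ON MATCH branches, can be sketched with a plain Python list as the node store. This is an illustration of the semantics under simplifying assumptions (one label per node, exact property equality); `merge_node` and the node layout are hypothetical and do not reflect GraphForge internals.

```python
from typing import Any, Callable, Optional

Node = dict[str, Any]


def merge_node(
    store: list[Node],
    label: str,
    key_props: dict[str, Any],
    on_create: Optional[Callable[[Node], None]] = None,
    on_match: Optional[Callable[[Node], None]] = None,
) -> Node:
    """Return the first node matching label and key_props, creating it if absent."""
    for node in store:
        same_label = node["label"] == label
        same_props = all(node["props"].get(k) == v for k, v in key_props.items())
        if same_label and same_props:
            if on_match:
                on_match(node)  # ON MATCH branch: node already existed
            return node
    node = {"label": label, "props": dict(key_props)}
    if on_create:
        on_create(node)  # ON CREATE branch: node was just created
    store.append(node)
    return node


store: list[Node] = []
for _ in range(2):
    ada = merge_node(
        store,
        "Person",
        {"name": "Ada"},
        on_create=lambda n: n["props"].update(visits=1),
        on_match=lambda n: n["props"].update(visits=n["props"]["visits"] + 1),
    )
print(len(store), ada["props"])
```

Running the same merge twice leaves one node in the store; only the second call takes the ON MATCH branch.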

Aggregations¶

COUNT - Row counting
COUNT(*) - Count all rows
COUNT(expr) - Count non-NULL values
COUNT(DISTINCT expr) - Count distinct values
SUM - Numeric summation
AVG - Numeric average
MIN - Minimum value
MAX - Maximum value
Implicit GROUP BY - Non-aggregated columns become grouping keys
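
Implicit grouping means a query such as `RETURN p.city, COUNT(p.name)` groups by `p.city` without a GROUP BY keyword, and COUNT(expr) skips NULL values. A minimal sketch of that behavior over plain dictionaries follows; the helper names are hypothetical and the code is an illustration, not GraphForge's aggregation operator.

```python
from collections import defaultdict


def group_and_aggregate(rows, key_columns, aggregates):
    """Group rows by the non-aggregated columns, then apply each aggregate per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in key_columns)].append(row)

    result = []
    for key, members in groups.items():
        out = dict(zip(key_columns, key))
        for alias, aggregate in aggregates.items():
            out[alias] = aggregate(members)
        result.append(out)
    return result


def count_non_null(column):
    """COUNT(expr): count only the rows where the column is not NULL (None)."""
    return lambda members: sum(1 for m in members if m[column] is not None)


rows = [
    {"city": "Oslo", "name": "Ada"},
    {"city": "Oslo", "name": None},
    {"city": "Rome", "name": "Cy"},
]
# Roughly: RETURN p.city AS city, COUNT(*) AS rows, COUNT(p.name) AS named
summary = group_and_aggregate(
    rows, ["city"], {"rows": len, "named": count_non_null("name")}
)
print(summary)
```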

Scalar Functions¶

String Functions
length(str) - String length
substring(str, start, length) - Extract substring
toUpper(str) / toLower(str) - Case conversion
trim(str) - Remove whitespace
Type Conversion
toInteger(value) - Convert to integer
toFloat(value) - Convert to float
toString(value) - Convert to string
Utility Functions
coalesce(expr1, expr2, ...) - Return first non-NULL
type(relationship) - Get relationship type

Expressions & Operators¶

Comparison Operators: =, <>, <, >, <=, >=
Logical Operators: AND, OR (with NULL propagation)
Property Access: n.property, r.property
Literals: Integers, floats, strings, booleans, NULL, lists, maps
List Literals: [1, 2, 3], ['a', 'b', 'c']
Map Literals: {key: value, nested: {k: v}}

Data Types¶

CypherInt - 64-bit signed integers
CypherFloat - 64-bit floating point
CypherString - UTF-8 strings
CypherBool - Boolean (true/false)
CypherNull - NULL value
CypherList - Ordered lists (heterogeneous)
CypherMap - Key-value maps (nested structures)
NodeRef - Node references in query context
EdgeRef - Relationship references in query context
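
To make the type list concrete, the sketch below classifies plain Python values by the type names above. The mapping itself is an assumption made for illustration (the actual conversion logic is not described here), and `cypher_type_name` is a hypothetical helper. It does show two details worth remembering: `bool` must be checked before `int` in Python, and integers must fit in 64 signed bits.

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def cypher_type_name(value) -> str:
    """Return the name of the type from the list above that would hold this value."""
    if value is None:
        return "CypherNull"
    if isinstance(value, bool):  # bool is a subclass of int, so test it first
        return "CypherBool"
    if isinstance(value, int):
        if not INT64_MIN <= value <= INT64_MAX:
            raise OverflowError(f"{value} does not fit in a 64-bit signed integer")
        return "CypherInt"
    if isinstance(value, float):
        return "CypherFloat"
    if isinstance(value, str):
        return "CypherString"
    if isinstance(value, (list, tuple)):
        return "CypherList"
    if isinstance(value, dict):
        return "CypherMap"
    raise TypeError(f"no type in the list above for {type(value).__name__}")


for sample in (None, True, 42, 3.14, "hi", [1, "a"], {"k": {"nested": 1}}):
    print(repr(sample), "->", cypher_type_name(sample))
```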

✅ Completed in v0.2.0 and v0.2.1¶

Released: February 2026

Feature	Version	Status
CASE expressions	v0.2.0	✅ Complete
COLLECT aggregation	v0.2.0	✅ Complete
Arithmetic operators (+, -, *, /, %)	v0.2.0	✅ Complete
String matching (STARTS WITH, ENDS WITH, CONTAINS)	v0.2.0	✅ Complete
REMOVE clause	v0.2.0	✅ Complete
NOT operator	v0.2.0	✅ Complete
UNWIND clause	v0.2.0	✅ Complete
DETACH DELETE	v0.2.0	✅ Complete
MERGE ON CREATE SET	v0.2.1	✅ Complete
MERGE ON MATCH SET	v0.2.1	✅ Complete
Dataset loading infrastructure	v0.2.1	✅ Complete
CSV edge-list loader	v0.2.1	✅ Complete
5 SNAP datasets	v0.2.1	✅ Complete

What v0.2.0 Enables¶

// UNWIND: Iterate over lists
UNWIND [1, 2, 3] AS num
RETURN num;

// DETACH DELETE: Cascading deletion
MATCH (n:Temporary)
DETACH DELETE n;

// CASE: Conditional logic
MATCH (p:Person)
RETURN p.name,
       CASE
           WHEN p.age < 18 THEN 'minor'
           WHEN p.age < 65 THEN 'adult'
           ELSE 'senior'
       END AS category;

// REMOVE: Property/label removal
MATCH (n:Person)
REMOVE n.temporaryField, n:OldLabel;

// Arithmetic: Computations
MATCH (p:Person)
RETURN p.name, p.salary * 1.1 AS new_salary;

// COLLECT: Aggregate into lists
MATCH (p:Person)
RETURN p.city, COLLECT(p.name) AS residents;

// String matching: Text filtering
MATCH (p:Person)
WHERE p.email ENDS WITH '@example.com'
RETURN p;

// NOT: Logical negation
MATCH (p:Person)
WHERE NOT p.archived
RETURN p;
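
The difference between DELETE and DETACH DELETE (a plain DELETE refuses to remove a node that still has relationships) can be sketched with a tiny in-memory graph. `TinyGraph` is a hypothetical class written for this illustration only; it is not how GraphForge stores data.

```python
class TinyGraph:
    """Minimal graph: node ids plus (source, type, target) relationship triples."""

    def __init__(self):
        self.nodes: set[str] = set()
        self.edges: set[tuple[str, str, str]] = set()

    def _incident(self, node: str) -> set[tuple[str, str, str]]:
        """Relationships that start or end at the given node."""
        return {edge for edge in self.edges if node in (edge[0], edge[2])}

    def delete(self, node: str) -> None:
        """DELETE: fails while the node still has relationships."""
        if self._incident(node):
            raise ValueError(f"cannot delete {node!r}: it still has relationships")
        self.nodes.discard(node)

    def detach_delete(self, node: str) -> None:
        """DETACH DELETE: remove the node's relationships first, then the node."""
        self.edges -= self._incident(node)
        self.nodes.discard(node)


graph = TinyGraph()
graph.nodes |= {"a", "b"}
graph.edges.add(("a", "KNOWS", "b"))

try:
    graph.delete("a")
except ValueError as error:
    print(error)

graph.detach_delete("a")
print(graph.nodes, graph.edges)
```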

✅ Completed in v0.3.0¶

Released: February 2026 | Release Notes

Major Cypher Features¶

Feature	Status	TCK Impact
OPTIONAL MATCH	✅ Complete	~150 scenarios
Variable-length patterns (`-[:TYPE*1..3]->`)	✅ Complete	~100 scenarios
List comprehensions	✅ Complete	~100 scenarios
Subqueries (EXISTS, COUNT)	✅ Complete	~100 scenarios
UNION / UNION ALL	✅ Complete	~30 scenarios
IS NULL / IS NOT NULL	✅ Complete	Integrated
Spatial types (Point, Distance)	✅ Complete	~50 scenarios
Temporal types (Date, DateTime, Time, Duration)	✅ Complete	~50 scenarios
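
A variable-length pattern such as `-[:TYPE*1..3]->` in the table above matches every path that uses between one and three relationships. The sketch below enumerates such paths over a plain edge list, assuming a single relationship type and assuming that no relationship is reused within a path. `expand_paths` is a hypothetical helper, not GraphForge's traversal code.

```python
def expand_paths(edges, start, min_hops, max_hops):
    """List node sequences reachable from start using min_hops..max_hops edges.

    edges is a list of (source, target) pairs; each edge is used at most
    once per path.
    """
    found = []

    def walk(node, used, visited_nodes):
        hops = len(used)
        if hops >= min_hops:
            found.append(visited_nodes)
        if hops == max_hops:
            return  # upper bound reached, stop expanding this path
        for index, (source, target) in enumerate(edges):
            if source == node and index not in used:
                walk(target, used + [index], visited_nodes + [target])

    walk(start, [], [start])
    return found


edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
# Roughly: MATCH p = (:Node {id: 'a'})-[*1..3]->() RETURN p
for path in expand_paths(edges, "a", 1, 3):
    print(" -> ".join(path))
```

Because the number of paths can grow quickly with the upper bound, tight bounds such as `*1..3` keep this kind of expansion manageable.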

Dataset Integration¶

Feature	Status
95 SNAP datasets	✅ Complete
10 LDBC datasets	✅ Complete
10 NetworkRepository datasets	✅ Complete
GraphML loader	✅ Complete
Cypher script loader	✅ Complete
Zip compression support	✅ Complete
Zstandard compression support	✅ Complete

Actual TCK Compliance: ~29% (950+ scenarios)
Total Datasets: 109+ validated datasets

❌ Not Supported (Out of Scope)¶

These features are out of scope for GraphForge's design goals:

❌ Full-Text Search - db.index.fulltext.*
Reason: SQLite FTS could be added, but it is not a core priority
Workaround: Use string matching (CONTAINS) or external FTS

Enterprise Features¶

❌ User Management - CREATE USER, GRANT, REVOKE, roles
Reason: Embedded design, no multi-user access
❌ Multi-Database - USE database, database switching
Reason: Single-database design, create multiple GraphForge instances if needed
❌ Constraints (advanced) - UNIQUE, EXISTS, KEY constraints
Reason: Validation can be done in Python, limited benefit for analysis
❌ Indexes (advanced) - CREATE INDEX, BTREE, HASH
Reason: SQLite provides indexing, but explicit index creation not exposed
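
Since uniqueness constraints are not enforced by the database, the "validation can be done in Python" approach might look like the sketch below. It assumes the rows are dictionaries, for example the result of a query like `MATCH (p:Person) RETURN p.email AS email`; how rows are actually obtained and shaped is an assumption, and both helper names are hypothetical.

```python
from collections import Counter


def find_duplicates(rows, key):
    """Return the sorted values of key that occur in more than one row (NULLs ignored)."""
    counts = Counter(row.get(key) for row in rows if row.get(key) is not None)
    return sorted(value for value, seen in counts.items() if seen > 1)


def assert_unique(rows, key):
    """Raise ValueError if any non-NULL value of key is repeated."""
    duplicates = find_duplicates(rows, key)
    if duplicates:
        raise ValueError(f"duplicate values for {key!r}: {duplicates}")


rows = [
    {"email": "ada@example.com"},
    {"email": "bob@example.com"},
    {"email": "ada@example.com"},
    {"email": None},
]
print(find_duplicates(rows, "email"))
```

A check like this runs after the fact, so it detects violations rather than preventing them; running it before a bulk write is one way to approximate a UNIQUE constraint.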

Distributed Features¶

❌ Sharding / Replication - Multi-node clusters
Reason: Single-node embedded design
❌ Distributed Transactions - Cross-database ACID
Reason: SQLite ACID within single database only

Advanced Query Features¶

❌ CALL Procedures - User-defined procedures, built-in procedures. Note: graph algorithms are not planned as Cypher procedures — they are exposed as db.gds.* Python methods instead (v0.4.0)
Reason: Could add in future, but Python functions are more natural
Workaround: Write Python functions, call from builder API
❌ Label Expressions - :A|B (union), :!A (negation)
Reason: Low priority, can filter in WHERE
❌ Map Projections - node {.property1, .property2}
Reason: Syntax sugar, not essential
❌ FOREACH - Iterative updates
Reason: Low usage, can use UNWIND + SET

Graph Algorithms¶

❌ Built-in Algorithms - PageRank, community detection, centrality
Reason: User can implement in Python or use NetworkX
v0.4.0: Use db.gds.* Python methods (e.g. db.gds.pagerank(write_property="pr")). For custom algorithms: to_networkx() + set_node_properties()
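
For the custom-algorithm route, the computation itself is ordinary Python. The sketch below runs a PageRank power iteration over a plain adjacency dictionary and returns a node-to-score mapping, which is the kind of result one might then write back as node properties. The signatures of to_networkx() and set_node_properties() are not given here, so they are deliberately not called; `pagerank` below is a hypothetical standalone function, not `db.gds.pagerank`.

```python
def pagerank(adjacency, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    adjacency maps each node to a list of its out-neighbors; every
    neighbor must also appear as a key. Rank held by nodes without
    out-links is spread evenly over all nodes.
    """
    nodes = list(adjacency)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not adjacency[node])
        base = (1 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source, targets in adjacency.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Two pages link to "a"; "a" links nowhere.
scores = pagerank({"a": [], "b": ["a"], "c": ["a"]})
for node, score in sorted(scores.items()):
    print(node, round(score, 3))
```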

Comparison with Neo4j¶

Feature Category	GraphForge v0.3.9	Neo4j
Core Clauses	✅ 100%	✅ 100%
Pattern Matching	✅ 100%	✅ 100%
Aggregations	✅ 100%	✅ 100%+
Scalar Functions	✅ 100%	✅ 100%+
Temporal Types	✅ Full (nanoseconds, IANA tz)	✅ Full support
Spatial Types	✅ Complete	✅ Full support
TCK Compliance	✅ 3,885/3,885 (100%)	~100%
Indexes	⚠️ SQLite automatic	✅ Explicit control
Constraints	❌ None	✅ Full support
Procedures	❌ None	✅ CALL + APOC
Deployment	✅ Embedded (pip install)	⚠️ Service (Docker/VM)
Setup Complexity	✅ Zero config	⚠️ Configuration needed
ACID Transactions	✅ SQLite	✅ Native
Scale	⚠️ < 10M nodes	✅ Billions of nodes
Multi-user	❌ Single process	✅ Full auth/RBAC

Summary: GraphForge is to Neo4j as SQLite is to PostgreSQL — a lightweight, embedded alternative for single-user analytical workflows, not a production database replacement.

TCK Compliance Details¶

The Technology Compatibility Kit (TCK) is the official openCypher test suite with 3,885 scenarios.

See TCK Compliance for the full v0.3.9 compliance report.

Current Coverage (v0.3.9)¶

3,885/3,885 scenarios passing (100%)

Zero failures. Zero expected failures. All 3,885 scenarios pass on every supported Python version (3.10–3.13).

All Passing Categories (v0.3.9)¶

✅ MATCH, OPTIONAL MATCH, WHERE, RETURN, WITH, ORDER BY, LIMIT, SKIP
✅ CREATE, SET, REMOVE, DELETE, DETACH DELETE, MERGE (ON CREATE/ON MATCH)
✅ UNWIND, UNION, UNION ALL
✅ CASE expressions (simple and generic)
✅ Variable-length patterns ([*1..3], [*])
✅ Path variables and path functions
✅ EXISTS { } and COUNT { } subqueries
✅ List comprehensions, pattern comprehensions
✅ Predicate functions (all, any, none, single, exists, isEmpty)
✅ All string, math, list, aggregation, graph, conversion functions
✅ Temporal types with nanosecond precision and IANA timezone names
✅ Extreme year dates (outside Python's 1–9999 range)
✅ Spatial types (point, distance)
✅ NULL propagation (three-valued logic throughout)

Usage Recommendations¶

✅ Good Use Cases for GraphForge¶

Notebook-based analysis - Jupyter, IPython, exploratory data analysis
Knowledge graph prototyping - Build and refine graph structures iteratively
LLM-powered graph generation - Store entity-relationship extractions
Data lineage tracking - Model data transformation pipelines
Small to medium graphs - 100k-1M nodes, 1M-10M relationships
Single-user workflows - No concurrent write access needed
Embedded applications - Package graph database with Python app
Teaching and learning - Learn Cypher without database setup

⚠️ Limited Use Cases¶

Full-text search — Use CONTAINS for simple matching, or external FTS
Very large graphs — full-scan queries practical up to ~1M edges; LIMIT-based traversal up to ~20M edges
Concurrent writes — SQLite single-writer limitation
High-throughput ingestion — use bulk_ingest() context manager for best throughput
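
The single-writer limitation comes from SQLite itself and can be observed with Python's built-in `sqlite3` module. The sketch below opens two connections to the same file and shows that the second cannot start a write transaction while the first holds one. It talks to SQLite directly rather than through GraphForge; the table, file name, and function name are made up for the demonstration.

```python
import os
import sqlite3
import tempfile


def second_writer_blocked() -> bool:
    """Return True if a second connection is refused a write lock."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "demo.db")
        # timeout=0: fail immediately instead of waiting for the lock.
        # isolation_level=None: transactions are controlled explicitly.
        first = sqlite3.connect(path, timeout=0, isolation_level=None)
        second = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            first.execute("CREATE TABLE t (x INTEGER)")
            first.execute("BEGIN IMMEDIATE")  # first connection takes the write lock
            first.execute("INSERT INTO t VALUES (1)")
            try:
                second.execute("BEGIN IMMEDIATE")  # second writer asks for the lock
            except sqlite3.OperationalError:
                blocked = True  # SQLite reports that the database is locked
            else:
                second.execute("ROLLBACK")
                blocked = False
            first.execute("COMMIT")
            return blocked
        finally:
            first.close()
            second.close()


print("second writer blocked:", second_writer_blocked())
```

In practice this means write-heavy workloads should be funneled through one writer, which is consistent with the single-user workflows recommended above.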

❌ Not Recommended¶

Production web applications - Use Neo4j, Memgraph, or similar
Multi-tenant systems - No user management or security
Distributed queries - Single-node only
Real-time analytics - Limited optimization for high-throughput workloads
Complex graph algorithms - Use db.gds.* for common algorithms; to_networkx() + set_node_properties() for custom ones
Mission-critical systems - Embedded design, no HA/replication

Roadmap¶

v0.4.0 is the current release.

Version	Focus	Status
v0.3.8	Full TCK compliance (3,885/3,885)	Released (May 2026)
v0.3.9	Performance: LALR parser, indexes, bulk ingest, LIMIT short-circuit	Released (May 2026)
v0.3.10	Analytics integration: NetworkX/igraph export, parse cache, `add_graph_documents()`	Released (May 2026)
v0.4.0	Three-Surface API: `db.gds.` algorithms + `db.search.` hybrid retrieval	Released (May 2026)
v0.5.0	Rust core: recursive-descent + Pratt parser, DataFusion execution, Arrow result streams, Parquet storage, Python + Node + Swift + Kotlin bindings	In development (`main` branch)

Enterprise features remain permanently out of scope:
- ❌ User management, multi-DB (incompatible with embedded design)
- ❌ Distributed features (single-node architecture)

Contributing¶

Help build GraphForge! See:
- GitHub Milestones
- Contributing Guide
- Issue Workflow

High-Impact Contributions¶

Check the GitHub issue tracker for open issues tagged good first issue or enhancement.

References¶

External Resources¶

openCypher Specification: https://opencypher.org/resources/
Neo4j Cypher Manual: https://neo4j.com/docs/cypher-manual/
openCypher TCK: https://github.com/opencypher/openCypher/tree/master/tck
GraphForge Issues: https://github.com/DecisionNerd/graphforge/issues

GraphForge Documentation¶

TCK Compliance - Full v0.3.10 TCK compliance report
API Reference - Complete Python API documentation

Last Updated: 2026-05-07 Maintained by: @DecisionNerd