openCypher AST & Logical Plan Spec¶
v1 (TCK-Aligned Subset)¶
1. Scope¶
This spec defines: - The internal AST representation for a declared openCypher subset - The semantic lowering rules from AST → Logical Plan - The Logical Plan operator set used by the execution engine
This is not a full openCypher spec. It targets the v1 supported surface area:
- MATCH
- WHERE
- RETURN
- SKIP
- LIMIT
Unsupported constructs MUST be rejected during validation.
2. Guiding Invariants¶
- Semantics first: If a feature is supported, its semantics MUST match openCypher expectations.
- Planner purity: The planner MUST NOT depend on storage implementation details.
- Deterministic lowering: The AST → Plan transform MUST be deterministic.
- Minimal operator set: Prefer a small, orthogonal set of operators.
- Schema orthogonality: Pydantic/JSON Schema models MUST NOT affect query semantics.
3. AST Model (Internal)¶
3.1 Top-Level¶
CypherQuery¶
clauses: list[Clause]
Clause (sum type)¶
MatchClauseWhereClauseReturnClauseSkipClauseLimitClause
Note: Although openCypher allows flexible clause ordering and
WITHpipelines, v1 enforces a simplified, valid subset ordering:MATCH→ optionalWHERE→RETURN→ optionalSKIP/LIMIT.
3.2 MATCH¶
MatchClause¶
patterns: list[Pattern]
Pattern¶
parts: list[PatternPart]
PatternPart¶
NodePattern|RelationshipPattern
NodePattern¶
var: VarName | None(e.g.,nin(n:Person))labels: set[str](e.g.,Person)properties: dict[str, Expr] | None
Property Map Semantics (v1)
- Property maps in node patterns (e.g. (n:Person {age: 30})) ARE SUPPORTED
- Property keys MUST be static identifiers
- Property values MUST be expressions evaluable at match time
- Property maps are semantic sugar for conjunctions of equality predicates
RelationshipPattern¶
var: VarName | Nonetypes: set[str]direction: Directionproperties: dict[str, Expr] | None
Property Map Semantics (v1) - Property maps in relationship patterns ARE SUPPORTED - Semantics mirror node property maps
Direction¶
OUT(e.g.-[:R]->)IN(e.g.<-[:R]-)UNDIRECTED(e.g.-[:R]-)
Explicit v1 Constraints
- Variable-length relationships (*) are NOT supported
- Path variables (e.g. p = (a)-[:R]->(b)) are NOT supported
3.3 WHERE¶
WhereClause¶
predicate: Expr
3.4 RETURN¶
ReturnClause¶
items: list[ReturnItem]distinct: bool
DISTINCT Semantics (v1)
- RETURN DISTINCT IS SUPPORTED
- DISTINCT applies to the full projected row
- DISTINCT is implemented as a logical operator after projection
ReturnItem¶
expr: Expralias: str | None
3.5 Expressions¶
Expressions are a typed tree.
Expr (sum type)¶
LiteralVarRefPropertyAccessBinaryOpUnaryOpFunctionCall(optional; initially very limited)
Literal¶
value: int | float | bool | str | None | list | dict
VarRef¶
name: str
PropertyAccess¶
base: Expr(typicallyVarRef)key: str(e.g.,n.age)
BinaryOp¶
op: strone of:= != < <= > >= AND ORleft: Exprright: Expr
UnaryOp¶
op: strone of:NOTexpr: Expr
FunctionCall (v1 minimal)¶
name: strargs: list[Expr]
Explicit v1 constraint - No aggregations - No list comprehensions - No pattern expressions in WHERE
4. Validation Rules (AST-Level)¶
Validation MUST reject queries that:
- Use unsupported clauses (WITH, OPTIONAL MATCH, MERGE, etc.)
- Use variable-length relationships (*)
- Use write clauses (CREATE, SET, DELETE)
- Use aggregations
- Use features not covered by the declared subset
Validation MUST also ensure: - Variables referenced in WHERE/RETURN are introduced in MATCH (per v1 scope rules) - Relationship directionality and types are well-formed
5. Logical Plan Model¶
The logical plan is a small operator graph (pipeline) that the execution engine can run.
5.1 Core Operator Set (v1)¶
NodeScan¶
- Inputs: none
- Outputs: rows with bound variable for node (e.g.,
{a: Node}) - Params:
var: strlabels: set[str] | None
Expand¶
- Inputs: rows with bound node var (e.g.,
{a: Node}) - Outputs: rows with added bindings (e.g.,
{a: Node, b: Node}) - Params:
from_var: strrel_var: str | Noneto_var: strtypes: set[str] | Nonedirection: Direction
UNDIRECTED Expansion Semantics (v1)
- For UNDIRECTED, expansion MUST consider both outgoing and incoming edges
- The same relationship MUST NOT be returned twice for a single expansion
Filter¶
- Inputs: rows
- Outputs: rows where predicate evaluates to TRUE
- Params:
predicate: CompiledExpr
Project¶
- Inputs: rows
- Outputs: projected rows
- Params:
items: list[ProjectedItem]
Distinct¶
- Inputs: projected rows
- Outputs: rows with duplicate projections removed
- Params: none
Skip¶
- Inputs: rows
- Outputs: rows after skipping N
- Params:
count: int
Limit¶
- Inputs: rows
- Outputs: first N rows
- Params:
count: int
6. Lowering Rules: AST → Logical Plan¶
Given a supported query of the form:
Lowering MUST produce a plan in the following shape:
- Choose an anchoring
NodeScanfor the first node variable in the first pattern (v1 heuristic) - Apply
Expandsteps following relationship direction for each hop in the pattern - Apply label constraints as early as possible:
- Prefer pushing node label checks into
NodeScanwhen the node is the scan anchor - Otherwise apply a
Filterimmediately after the binding is introduced - Compile
WHEREinto aFilter(if present) - Compile
RETURNintoProject - Apply
SKIPthenLIMITif present
6.1 Example Lowering¶
Cypher:
MATCH (a:Person)-[:KNOWS]->(b:Person)
WHERE a.age > 30 AND b.city = "Lehi"
RETURN b.name AS name
LIMIT 10
Plan:
- NodeScan(var="a", labels={"Person"})
- Expand(from_var="a", to_var="b", types={"KNOWS"}, direction=OUT)
- Filter(predicate = (label(b) == "Person")) (if labels not enforced elsewhere)
- Filter(predicate = (a.age > 30 AND b.city = "Lehi"))
- Project(items=[b.name AS name])
- Limit(10)
7. Expression Compilation¶
Expressions in WHERE and RETURN MUST be compiled into an evaluable form (CompiledExpr) that:
- Correctly implements openCypher null semantics (3-valued logic)
- Supports property access on node/relationship refs
- Avoids Python eval
A simple approach is an expression VM or a compiled callable tree.
8. Extension Points¶
8.1 DISTINCT¶
- If supported, implement as a
Distinctoperator afterProject. - If unsupported in v1, reject during validation.
8.2 Multiple MATCH Patterns¶
- v1 may initially support single linear patterns.
- Future: introduce
Join/CartesianProductoperators.
8.3 Optional MATCH¶
- Future: introduce
LeftJoinsemantics.
9. Deliverables for Implementation Phase¶
- AST dataclasses / Pydantic models for internal representation
- Parser adapter: openCypher parse tree → AST
- Validator for v1 subset
- Lowering: AST → Logical Plan
- Expression compiler with null semantics
- Golden tests + initial TCK subset wiring