Column Naming Behavior in GraphForge¶

Last Updated: 2026-02-01 Version: 0.1.3+

Summary¶

GraphForge follows the openCypher TCK specification for column naming in query results. This means that simple variable references preserve their variable names as column names.

Behavior¶

Simple Variable References¶

When you return a simple variable without an alias, the column name is the variable name:

MATCH (n:Person)
RETURN n

Result: Column name is "n" (not "col_0")

results = db.execute("MATCH (n:Person) RETURN n")
for row in results:
    node = row['n']  # Access using variable name

Explicit Aliases¶

When you provide an explicit alias, the column name is the alias:

MATCH (n:Person)
RETURN n AS person

Result: Column name is "person"

results = db.execute("MATCH (n:Person) RETURN n AS person")
for row in results:
    node = row['person']  # Access using alias

Complex Expressions¶

Complex expressions without aliases use col_N naming (where N is the 0-based index):

MATCH (n:Person)
RETURN n.name

Result: Column name is "col_0"

MATCH (n:Person)
RETURN n.name, n.age

Result: Column names are "col_0" and "col_1"

results = db.execute("MATCH (n:Person) RETURN n.name, n.age")
for row in results:
    name = row['col_0']
    age = row['col_1']

To avoid col_N naming, use explicit aliases:

MATCH (n:Person)
RETURN n.name AS name, n.age AS age

Result: Column names are "name" and "age"

Summary Table¶

Query	Column Name	Reasoning
`RETURN n`	`"n"`	Simple variable - use variable name
`RETURN n AS person`	`"person"`	Explicit alias - use alias
`RETURN n.name`	`"col_0"`	Complex expression - use col_N
`RETURN n.name AS name`	`"name"`	Explicit alias - use alias
`RETURN count(*)`	`"col_0"`	Aggregation - use col_N
`RETURN count(*) AS total`	`"total"`	Explicit alias - use alias

Why This Design?¶

openCypher TCK Compliance¶

This behavior aligns with the openCypher specification and matches how Neo4j and other Cypher implementations work:

// Neo4j behavior
MATCH (n) RETURN n
// Column name: "n"

GraphForge aims to be compatible with openCypher TCK (Technology Compatibility Kit) tests, which expect variable names to be preserved.

WITH Clause Correctness¶

The WITH clause requires variable names to be preserved through the query pipeline:

WITH p.name AS name
RETURN name

In this example: 1. WITH binds "name" to the value of p.name 2. RETURN sees a simple variable name 3. The result column must be "name" (not "col_0")

If we used col_0 for all unnamed items, the WITH clause would break.

Consistency with Neo4j¶

Users familiar with Neo4j expect this behavior. Maintaining compatibility makes GraphForge easier to learn and use.

Historical Context¶

Version History¶

v0.1.0 - v0.1.1: Used col_N for all unnamed return items
v0.1.2: Briefly changed to variable names, then reverted to col_N due to test failures
v0.1.3+: Permanently changed to variable names for openCypher TCK compliance

Why Did It Change?¶

The original col_N behavior was GraphForge's initial design decision. However:

WITH clause implementation revealed the issue: WITH requires variable names to flow through the pipeline
openCypher TCK compliance: The TCK expects variable names, not col_N
Neo4j compatibility: Matching Neo4j behavior improves user experience
Future-proofing: Aligning with standards now prevents migration pain later

Breaking Change¶

This is a breaking change from v0.1.2. If you have code that accesses columns by name:

Before (v0.1.2):

results = db.execute("MATCH (n) RETURN n")
node = results[0]['col_0']  # Old behavior

After (v0.1.3+):

results = db.execute("MATCH (n) RETURN n")
node = results[0]['n']  # New behavior

Migration: Update code that accesses result columns to use variable names instead of col_0.

Best Practices¶

Always Use Explicit Aliases¶

For production code, use explicit aliases to make column names clear and avoid ambiguity:

MATCH (p:Person)-[:KNOWS]->(friend:Person)
RETURN p.name AS person_name, friend.name AS friend_name

This makes your code: - More readable - clear what each column represents - More maintainable - column names won't change if variable names change - More portable - works across different Cypher implementations

Access by Variable Name¶

For simple queries, access columns by variable name:

results = db.execute("MATCH (n:Person) RETURN n")
for row in results:
    node = row['n']  # Use variable name

Use List Access for Complex Expressions¶

If you don't want to use aliases, remember that col_N indices are 0-based:

results = db.execute("MATCH (n:Person) RETURN n.name, n.age")
for row in results:
    name = row['col_0']  # First expression
    age = row['col_1']   # Second expression

But explicit aliases are strongly recommended for clarity:

results = db.execute("MATCH (n:Person) RETURN n.name AS name, n.age AS age")
for row in results:
    name = row['name']  # Much clearer!
    age = row['age']

Implementation Details¶

Code Location¶

The column naming logic is implemented in src/graphforge/executor/executor.py in the _execute_project method:

if return_item.alias:
    # Explicit alias provided - use it
    key = return_item.alias
elif isinstance(return_item.expression, Variable):
    # Simple variable reference - use variable name as column name
    # This preserves names from WITH clauses
    key = return_item.expression.name
else:
    # Complex expression without alias - use default column naming
    key = f"col_{i}"

Testing¶

This behavior is tested in: - tests/integration/test_e2e_queries.py - End-to-end query tests - tests/integration/test_with_clause.py - WITH clause tests - tests/integration/test_persistence.py - Persistence tests

All 153 integration tests verify this behavior.