Testing Strategy & Infrastructure
Overview
GraphForge has two test suites that must both pass:
| Suite | Location | What it tests |
|---|---|---|
| Rust tests (cargo test) | crates/*/src/ | Each Rust crate in isolation and integration |
| Python tests (pytest) | tests/ | Python binding, end-to-end queries, TCK compliance |
The testing principles are the same for both:
- Spec-driven correctness — openCypher semantics verified via TCK
- Fast feedback loops — unit tests run in milliseconds
- Hermetic tests — no shared state between tests
- Deterministic behavior — tests pass or fail consistently
Rust Tests
Structure
Each crate contains unit tests inline with the source and integration tests in tests/:
crates/gf-cypher/
├── src/
│   ├── lexer.rs          # #[cfg(test)] inline unit tests
│   ├── parser.rs         # #[cfg(test)] inline unit tests
│   └── lib.rs
└── tests/
    └── parse_corpus.rs   # end-to-end parse tests against golden corpus
Running
# All crates
cargo test --workspace
# One crate
cargo test -p gf-cypher
# With output
cargo test --workspace -- --nocapture
# Only doctests
cargo test --doc --workspace
Rust test example
#[cfg(test)]
mod tests {
    use super::*;

    // Serializing an operator to JSON and back must yield an equal value.
    #[test]
    fn node_scan_roundtrip() {
        let op = GraphOp::NodeScan { var: VarId(0), ty: TypeId(1) };
        let json = serde_json::to_string(&op).unwrap();
        let back: GraphOp = serde_json::from_str(&json).unwrap();
        assert_eq!(op, back);
    }
}
Parser differential corpus
The parser migration strategy requires differential testing between the Python LALR(1) parser and the LALRPOP Rust parser. The corpus lives in tests/parser_corpus/ and includes:
- Valid queries (from the TCK and real-world examples)
- Invalid queries (error recovery cases)
- Precedence edge cases
- Unicode identifiers
- Parameter syntax
- Comments
Differential tests run both parsers on the same input and assert AST parity:
cargo test -p gf-cypher -- differential
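To illustrate the idea of asserting parity, here is a minimal sketch of a differential harness. It is not the project's actual test code: `reference_parser` and `candidate_parser` are toy stand-ins for the two real parsers, and the sketch assumes each parser either returns a comparable AST value or raises on invalid input. Rejections are compared only as "both rejected", since error types would differ between implementations.

```python
from typing import Callable, Iterable, Optional

# Hypothetical parser signature: query text in, comparable AST value out.
Parser = Callable[[str], object]


def parse_outcome(parser: Parser, query: str) -> tuple[str, Optional[object]]:
    """Normalize a parser run to ("ok", ast) or ("error", None)."""
    try:
        return ("ok", parser(query))
    except Exception:  # invalid-query corpus entries are expected to raise
        return ("error", None)


def differential_mismatches(
    corpus: Iterable[str], reference: Parser, candidate: Parser
) -> list[str]:
    """Return the queries on which the two parsers disagree."""
    return [
        query
        for query in corpus
        if parse_outcome(reference, query) != parse_outcome(candidate, query)
    ]


# Toy stand-ins for the two real parsers (assumptions for illustration only).
def reference_parser(query: str) -> tuple[str, ...]:
    if not query.strip():
        raise ValueError("empty query")
    return tuple(query.split())


def candidate_parser(query: str) -> tuple[str, ...]:
    if not query.strip():
        raise ValueError("empty query")
    # Deliberate divergence: lowercase keywords are upper-cased here.
    if query.startswith("match"):
        return tuple(query.upper().split())
    return tuple(query.split())


corpus = ["MATCH (n) RETURN n", "", "match (n) return n"]
mismatches = differential_mismatches(corpus, reference_parser, candidate_parser)
print(mismatches)
```

Returning the list of disagreeing queries, rather than failing on the first one, makes it easier to see whether a divergence is isolated or systematic.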
Python Tests
Test Categories
1. Unit Tests (tests/unit/)
Test individual components in isolation.
tests/unit/
├── parser/
├── planner/
├── executor/
├── storage/
├── algorithms/
├── search/
└── recipes/
Characteristics: no I/O, < 1 ms per test, ≥90% coverage target.
2. Integration Tests (tests/integration/)
Test the full query pipeline (parse → plan → execute), persistence, transactions, and the Python API surface.
Characteristics: may use temporary databases, < 100 ms per test.
3. openCypher TCK Tests (tests/tck/)
The official openCypher Technology Compatibility Kit (TCK): 3,885 scenarios, 100% passing
on main. The TCK is also a hard merge gate for the rust-core branch.
tests/tck/
├── conftest.py
├── coverage_matrix.json
└── features/
4. Property-Based Tests (tests/property/)
Hypothesis-driven generative tests for value semantics, expression evaluation, and storage consistency invariants.
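As a rough illustration of a value-semantics invariant, the sketch below checks properties of Cypher-style three-valued logic, where null is modelled as `None`. To stay self-contained it swaps in a seeded `random` generator for Hypothesis, so it lacks Hypothesis features such as shrinking of failing inputs. The helper names (`tri_and`, `tri_or`, `tri_not`, `check_properties`) are hypothetical and not part of GraphForge.

```python
import random
from typing import Optional

Tri = Optional[bool]  # None models Cypher null


def tri_not(a: Tri) -> Tri:
    return None if a is None else not a


def tri_and(a: Tri, b: Tri) -> Tri:
    if a is False or b is False:
        return False  # false dominates, even against null
    if a is None or b is None:
        return None
    return True


def tri_or(a: Tri, b: Tri) -> Tri:
    if a is True or b is True:
        return True  # true dominates, even against null
    if a is None or b is None:
        return None
    return False


def check_properties(trials: int = 500, seed: int = 0) -> int:
    """Check invariants on randomly drawn operands; return the trial count."""
    rng = random.Random(seed)  # fixed seed keeps the run deterministic
    values = [True, False, None]
    for _ in range(trials):
        a, b = rng.choice(values), rng.choice(values)
        # AND is commutative.
        assert tri_and(a, b) == tri_and(b, a)
        # De Morgan's law holds in three-valued logic.
        assert tri_not(tri_and(a, b)) == tri_or(tri_not(a), tri_not(b))
        # On non-null operands the result matches ordinary boolean AND.
        if a is not None and b is not None:
            assert tri_and(a, b) == (a and b)
    return trials


checked = check_properties()
```

Each property is stated over all inputs rather than hand-picked cases, which is the main point of the generative approach.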
5. Performance Benchmarks (tests/benchmarks/)
Real-dataset benchmarks tracked over time. Not part of the standard CI run.
Pytest Configuration
[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = ["-ra", "--strict-markers", "--tb=short", "-v"]
markers = [
    "unit: unit tests (fast, isolated)",
    "integration: integration tests (may use I/O)",
    "tck: openCypher TCK compliance tests",
    "property: property-based tests",
    "benchmark: performance benchmarks",
    "slow: tests that take >1s",
]
Running Python Tests
# All tests
make test
# By category
make test-unit
make test-integration
make test-tck
# With coverage
make coverage # run + validate thresholds (≥85% total, ≥90% patch)
make coverage-report # open HTML report
make coverage-diff # changed files only
# Parallel (4× faster for TCK)
pytest tests/ -n auto
Core Fixtures (tests/conftest.py)
import pytest


@pytest.fixture
def db():
    """Fresh in-memory GraphForge instance."""
    return GraphForge()


@pytest.fixture
def tmp_db(tmp_path):
    """GraphForge instance backed by a temporary Parquet directory."""
    return GraphForge(str(tmp_path / "graph"))
Quality Gates
Coverage Requirements
| Scope | Threshold |
|---|---|
| Total codebase | ≥85% |
| Patch (new/changed lines) | ≥90% |
| Core modules (executor, planner, parser) | ≥90% |
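The total and core-module thresholds can be checked against the Cobertura-style XML that `--cov-report=xml` produces. The following is a sketch under stated assumptions, not the logic behind `make coverage`: the package names in `SAMPLE` are hypothetical, only line rate is considered, and patch coverage is out of scope because it needs a diff against the base branch.

```python
import xml.etree.ElementTree as ET

TOTAL_MIN = 0.85
CORE_MIN = 0.90
CORE_MODULES = ("executor", "planner", "parser")


def threshold_failures(xml_text: str) -> list[str]:
    """Return one message per threshold that the report fails to meet."""
    root = ET.fromstring(xml_text)
    failures = []

    # The root <coverage> element carries the overall line rate.
    total = float(root.get("line-rate", "0"))
    if total < TOTAL_MIN:
        failures.append(f"total {total:.0%} < {TOTAL_MIN:.0%}")

    # Each <package> element carries its own line rate.
    for package in root.iter("package"):
        name = package.get("name", "")
        rate = float(package.get("line-rate", "0"))
        if name.split(".")[-1] in CORE_MODULES and rate < CORE_MIN:
            failures.append(f"{name} {rate:.0%} < {CORE_MIN:.0%}")
    return failures


# Hypothetical report excerpt; real package names depend on the source layout.
SAMPLE = """<coverage line-rate="0.88">
  <packages>
    <package name="graphforge.parser" line-rate="0.93"/>
    <package name="graphforge.planner" line-rate="0.87"/>
    <package name="graphforge.storage" line-rate="0.70"/>
  </packages>
</coverage>"""

failures = threshold_failures(SAMPLE)
print(failures)
```

In this sample the storage package sits at 70% but is not reported, because only the total and the three core modules have thresholds in the table above.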
Required Checks (all PRs)
- cargo clippy --workspace -- -D warnings: zero warnings
- cargo test --workspace: all Rust tests pass
- pytest -m unit: all Python unit tests pass
- pytest -m integration: all Python integration tests pass
- pytest -m tck: all non-skipped TCK scenarios pass
- make coverage: coverage thresholds met
- make lint and make type-check: zero issues
TCK Coverage Matrix
Maintain tests/tck/coverage_matrix.json:
{
  "tck_version": "2024.2",
  "features": {
    "Match1_Nodes": {
      "status": "supported",
      "scenarios": {
        "Match single node": "pass",
        "Match node with label": "pass"
      }
    },
    "Match3_VariableLength": {
      "status": "supported"
    }
  }
}
When the Rust core implements a feature, verify the corresponding TCK scenarios
pass end-to-end before marking "status": "supported".
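A consistency check along these lines could guard the matrix against features marked "supported" while a listed scenario is not passing. This is a sketch, not an existing project script: the function name `inconsistent_features` is hypothetical, and the `"fail"` result value is an assumption, since the example above only shows `"pass"`.

```python
import json


def inconsistent_features(matrix: dict) -> dict[str, list[str]]:
    """Map each "supported" feature to its scenarios that are not "pass"."""
    problems = {}
    for name, feature in matrix.get("features", {}).items():
        if feature.get("status") != "supported":
            continue
        # Features without a "scenarios" map have nothing to contradict.
        failing = [
            scenario
            for scenario, result in feature.get("scenarios", {}).items()
            if result != "pass"
        ]
        if failing:
            problems[name] = failing
    return problems


# Sample matrix; the "fail" value is assumed for illustration.
SAMPLE = json.loads("""
{
  "tck_version": "2024.2",
  "features": {
    "Match1_Nodes": {
      "status": "supported",
      "scenarios": {
        "Match single node": "pass",
        "Match node with label": "fail"
      }
    },
    "Match3_VariableLength": {"status": "supported"}
  }
}
""")

problems = inconsistent_features(SAMPLE)
print(problems)
```

Note that a feature with no per-scenario entries, like `Match3_VariableLength` here, passes this check trivially, so the check complements the TCK run itself and does not replace it.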
CI/CD
GitHub Actions runs the full suite on every PR:
jobs:
  rust:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - run: cargo clippy --workspace -- -D warnings
      - run: cargo test --workspace
  python:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        python-version: ["3.10", "3.11", "3.12", "3.13"]
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install uv && uv sync --all-extras
      - run: maturin develop --release
      - run: make pre-push
Known Issues
pytest-xdist + pytest-cov deadlock on macOS / Python 3.13
Symptom: make pre-push hangs at the end of the test run. Progress reaches
~100%, then freezes, and CPU usage drops to 0%. The only way out is to kill the process.
Root cause: pytest-cov collects coverage data from xdist workers via IPC
sockets. When workers close their sockets, a coverage collection thread in the
main process blocks on read(), deadlocking with the main thread. Reproduced on
macOS (Darwin 25.x) + Python 3.13 + pytest-cov 7.0.0 + pytest-xdist 3.x.
Solution (current Makefile): Run coverage serially, skipping SNAP tests:
coverage:
	uv run pytest tests/unit tests/integration -m "not snap" \
		--cov=src --cov-branch \
		--cov-report=term-missing --cov-report=xml
The serial run is ~60 s slower than the parallel baseline but avoids the deadlock.
Re-evaluate parallel coverage runs if an upstream fix lands in pytest-cov or pytest-xdist.