Contributing to GraphForge¶

Thank you for your interest in contributing to GraphForge! This document provides guidelines and instructions for contributing.

Development Setup¶

Prerequisites¶

Python 3.10 or newer
uv (recommended) or pip

Getting Started¶

Clone the repository

git clone https://github.com/DecisionNerd/graphforge.git
cd graphforge

Install dependencies

# Using uv (recommended)
uv sync --all-extras

# Or using pip
pip install -e ".[dev]"

Verify installation
```
pytest -m unit
```

Development Workflow¶

Before Pushing Code¶

Always run this before pushing:

make pre-push

This runs the complete validation suite:

- ✅ Code formatting checks (ruff format --check)
- ✅ Linting (ruff check)
- ✅ Type checking (mypy)
- ✅ Tests with coverage measurement
- ✅ Coverage threshold validation (≥85%)

If all checks pass, you're ready to push!

Running Tests¶

# Run all tests (use make pre-push for full validation)
make test

# Run specific categories
make test-unit           # Fast unit tests only
make test-integration    # Integration tests only
make test-tck            # TCK compliance tests
make test-tck-parallel   # TCK tests in parallel (faster, uses all CPU cores)

# Run with coverage reporting
make coverage            # Run tests + generate coverage reports
make coverage-report     # Open HTML coverage report in browser
make coverage-diff       # Show coverage for changed files only

# For new features, check against stricter threshold
make coverage-strict     # Requires ≥90% coverage

Direct pytest usage (if needed):

pytest -m unit           # Fast unit tests
pytest -m integration    # Integration tests
pytest -m tck            # TCK compliance tests
pytest -n auto           # Run in parallel

Code Quality¶

# Format code
make format              # Format all code with ruff

# Check without modifying
make format-check        # Verify formatting
make lint                # Run linting checks
make type-check          # Run mypy type checking

# View all available targets
make help

Direct ruff usage (if needed):

ruff format .            # Format code
ruff check .             # Lint code
ruff check --fix .       # Auto-fix issues

Coverage Requirements¶

GraphForge maintains high test coverage standards:

Project coverage: ≥85% of entire codebase (checked by make pre-push)
Patch coverage: ≥80% of new/changed lines (checked by codecov in CI)

Best practice: Aim for 100% coverage of new code to ensure both thresholds pass.

Coverage workflow:

1. Write your code with tests
2. Run make coverage to see the coverage report
3. Check missing lines with make coverage-report (opens HTML)
4. Add tests for uncovered lines
5. Run make pre-push to validate before pushing
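
The two thresholds measure different things, so a change can pass one and fail the other. The sketch below works through the arithmetic with made-up line numbers for a single hypothetical file; `coverage_percent` is an illustrative helper, not part of GraphForge or of the coverage tooling.

```python
def coverage_percent(covered: set[int], executable: set[int]) -> float:
    """Return the percentage of executable lines that were covered."""
    if not executable:
        return 100.0  # Nothing to cover counts as fully covered.
    return 100.0 * len(covered & executable) / len(executable)


# Hypothetical numbers: lines 1-100 are executable, tests hit
# lines 1-90, and the PR touched lines 81-100.
executable = set(range(1, 101))
covered = set(range(1, 91))
changed = set(range(81, 101))

project = coverage_percent(covered, executable)
patch = coverage_percent(covered, executable & changed)

print(f"project coverage: {project:.1f}%")  # 90.0%
print(f"patch coverage:   {patch:.1f}%")    # 50.0%
```

In this example the project figure clears 85%, but only half of the changed lines are exercised, so the 80% patch threshold would fail. This is why aiming for full coverage of new code is the safer habit.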

Project Structure¶

graphforge/
├── src/graphforge/          # Main package code
│   ├── __init__.py
│   └── main.py
├── tests/                   # Test suite
│   ├── unit/               # Unit tests
│   ├── integration/        # Integration tests
│   ├── tck/                # TCK compliance tests
│   └── property/           # Property-based tests
├── docs/                    # Documentation
│   ├── 0-requirements.md
│   └── testing-strategy.md
└── pyproject.toml          # Project configuration

Testing Guidelines¶

Writing Tests¶

Unit tests - Test components in isolation

import pytest

@pytest.mark.unit
def test_node_creation():
    node = Node(labels={"Person"})
    assert "Person" in node.labels

Integration tests - Test component interactions

import pytest

@pytest.mark.integration
def test_query_execution(db):
    result = db.execute("MATCH (n) RETURN n")
    assert result is not None

Use fixtures - Leverage existing fixtures for common setup

def test_with_temp_db(tmp_db_path):
    db = GraphForge(tmp_db_path)
    # Test logic

Test Quality Standards¶

Fast: Unit tests should run in < 1ms
Isolated: No shared state between tests
Deterministic: Same input = same output
Clear: Test names describe what is being tested
Maintainable: Easy to update when requirements change

See testing.md for comprehensive testing documentation.
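
As a rough illustration of the Isolated and Deterministic standards, the sketch below uses only the standard library. `make_graph` and `shuffled_labels` are hypothetical helpers invented for this example, and the `@pytest.mark.unit` marker is left out so the snippet runs without pytest installed.

```python
import random


def make_graph() -> dict[str, set[str]]:
    """Hypothetical stand-in for a fixture: a fresh adjacency map per call."""
    return {"a": set(), "b": set()}


def shuffled_labels(labels: set[str], seed: int) -> list[str]:
    """Shuffle labels reproducibly using a locally seeded generator."""
    rng = random.Random(seed)  # Local RNG, so the global random state is untouched.
    result = sorted(labels)    # Sort first: set iteration order is not stable.
    rng.shuffle(result)
    return result


def test_graphs_do_not_share_state():
    first = make_graph()
    first["a"].add("b")
    second = make_graph()
    # Mutating one graph must not leak into the next test's graph.
    assert second["a"] == set()


def test_shuffle_is_deterministic_for_fixed_seed():
    labels = {"Person", "Company", "City"}
    assert shuffled_labels(labels, seed=42) == shuffled_labels(labels, seed=42)
```

Building the object inside each test (or in a function-scoped fixture) avoids shared state, and passing the seed explicitly keeps the result the same across runs.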

Code Style¶

General Guidelines¶

Follow PEP 8 conventions
Use type hints for function signatures
Keep functions focused and small
Write docstrings for public APIs
Prefer explicit over implicit

Example¶

from typing import Any, Optional


def create_node(
    labels: set[str],
    properties: Optional[dict[str, Any]] = None,
) -> Node:
    """Create a new node with labels and properties.

    Args:
        labels: Set of node labels
        properties: Optional property map

    Returns:
        A new Node instance

    Raises:
        ValueError: If labels is empty
    """
    if not labels:
        raise ValueError("Node must have at least one label")

    return Node(labels=labels, properties=properties or {})

Pull Request Process¶

PR Size Guidelines¶

IMPORTANT: Keep PRs small and focused.

CI/CD tools (CodeRabbit, GitHub Actions) work best with small, reviewable PRs. Large PRs are:

- Harder to review thoroughly
- More likely to introduce bugs
- Slower to get merged
- More difficult for CI/CD tools to process

Good PR size:

- ✅ Single feature or bug fix
- ✅ 50-300 lines of code changed
- ✅ 1-5 files modified
- ✅ Reviewable in < 30 minutes
- ✅ Clear, focused purpose

Too large:

- ❌ Multiple unrelated changes
- ❌ 1,000+ lines changed
- ❌ Refactoring + new feature + bug fixes combined
- ❌ Takes > 1 hour to review

How to keep PRs small:

1. Break large features into smaller PRs
2. Submit infrastructure changes separately from features
3. Refactor in one PR, add features in another
4. Use feature flags for incomplete features

Example breakdown:

❌ Bad: "Add WITH clause support" (2,000 lines, 20 files)

✅ Good: Break into multiple PRs
  PR 1: "Add WITH AST nodes and parser support" (200 lines)
  PR 2: "Add WITH planner operators" (150 lines)
  PR 3: "Add WITH executor logic" (180 lines)
  PR 4: "Add WITH integration tests" (120 lines)
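
One way to check a branch against the size guidelines before opening a PR is to total the output of `git diff --numstat`. The sketch below is an illustrative helper, not a project script: `DiffSize`, `measure_numstat`, and `within_guidelines` are hypothetical names, the sample paths are invented, and the default limits simply mirror the 300-line and 5-file figures above.

```python
from dataclasses import dataclass


@dataclass
class DiffSize:
    files: int
    lines: int


def measure_numstat(numstat: str) -> DiffSize:
    """Sum added and deleted lines from `git diff --numstat` output."""
    files = 0
    lines = 0
    for row in numstat.splitlines():
        if not row.strip():
            continue
        # Each row is "<added>\t<deleted>\t<path>".
        added, deleted, _path = row.split("\t", 2)
        files += 1
        # Binary files report "-" instead of counts; count the file only.
        if added != "-":
            lines += int(added)
        if deleted != "-":
            lines += int(deleted)
    return DiffSize(files=files, lines=lines)


def within_guidelines(size: DiffSize, max_files: int = 5, max_lines: int = 300) -> bool:
    """Return True when the diff fits the suggested PR size."""
    return size.files <= max_files and size.lines <= max_lines


# Invented sample rows standing in for real git output.
sample = "120\t15\tsrc/graphforge/parser.py\n40\t0\ttests/unit/test_parser.py\n"
size = measure_numstat(sample)
print(size.files, size.lines, within_guidelines(size))  # 2 175 True
```

To try it on a real branch, the text could come from something like `git diff --numstat main...HEAD`, assuming `main` is the base branch.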

No Bandaid Fixes¶

Fix problems properly, not with temporary workarounds.

When you encounter an issue:

❌ Bad approach (bandaids):

- Add # type: ignore without understanding the issue
- Comment out failing tests
- Add workarounds instead of fixing root causes
- Use try/except to hide errors
- Skip CI checks temporarily

✅ Good approach (proper fixes):

- Investigate the root cause
- Fix the underlying problem
- Add tests to prevent regression
- Document why the fix is correct
- Update related code to be consistent

Example:

# ❌ BANDAID - hides the real issue
try:
    result = process_data(input)
except Exception:
    result = None  # Hope this works...

# ✅ PROPER FIX - addresses root cause
def process_data(input: str | None) -> Result | None:
    """Process data with proper null handling."""
    if input is None:
        return None  # Explicitly handle null case

    try:
        return _parse_and_validate(input)
    except ValidationError as e:
        raise ValueError(f"Invalid input: {e}") from e

When you're tempted to add a bandaid, ask:

1. Why is this failing?
2. What's the root cause?
3. How can I fix it properly?
4. What tests will prevent this from happening again?

Creating a PR¶

Create a feature branch

git checkout -b feature/your-feature-name

Make your changes

- Write tests first (TDD)
- Implement the feature
- Update documentation
- Add tests to verify behavior

Ensure quality
```
make pre-push
```
This validates everything: formatting, linting, types, tests, and coverage.

Commit your changes

git add .
git commit -m "Add feature: brief description"

Push and create PR
```
git push origin feature/your-feature-name
```
Then create a pull request on GitHub.

PR Requirements¶

All PRs must:

- ✅ Pass all CI checks (no exceptions)
- ✅ Include tests for new functionality
- ✅ Maintain or improve code coverage (≥85%)
- ✅ Update relevant documentation
- ✅ Follow the code style guidelines
- ✅ Have a clear description of changes
- ✅ Be small and focused (< 300 lines preferred)
- ✅ Fix issues properly, not with bandaids
- ✅ No failing tests (fix or remove them)
- ✅ No # type: ignore without explanation
- ✅ No skipped CI checks

Design Principles¶

When contributing, keep these principles in mind:

Spec-driven correctness - openCypher semantics over performance
Deterministic behavior - Stable results across runs
Inspectable - Observable query plans and execution
Minimal dependencies - Keep the dependency tree small
Python-first - Optimize for Python workflows

See requirements.md for complete requirements.

openCypher TCK Compliance¶

When implementing openCypher features:

Check the TCK coverage matrix: tests/tck/coverage_matrix.json
Mark features as "supported", "planned", or "unsupported"
Add corresponding TCK tests
Ensure semantic correctness per the openCypher specification
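
The layout of tests/tck/coverage_matrix.json is not described here, so the following is only a sketch under a loud assumption: that the matrix can be read as a flat mapping from feature name to one of the three status strings listed above. The real file may be nested or use different keys, and `invalid_entries` is a hypothetical helper.

```python
import json

# The three statuses named in the steps above.
ALLOWED_STATUSES = {"supported", "planned", "unsupported"}


def invalid_entries(matrix: dict[str, str]) -> dict[str, str]:
    """Return the features whose status is not one of the allowed values."""
    return {
        feature: status
        for feature, status in matrix.items()
        if status not in ALLOWED_STATUSES
    }


# Invented sample content; not taken from the real coverage matrix.
sample = json.loads('{"MATCH": "supported", "WITH": "planned", "MERGE": "todo"}')
print(invalid_entries(sample))  # {'MERGE': 'todo'}
```

A check along these lines could catch a mistyped status before review, whatever the actual schema turns out to be.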

Documentation¶

Code Documentation¶

Public APIs: Comprehensive docstrings with examples
Internal functions: Brief docstrings explaining purpose
Complex logic: Inline comments explaining the "why"

Project Documentation¶

Update relevant docs when adding features:

- README.md - User-facing features
- requirements.md - Requirement changes
- testing.md - Testing approach changes

Getting Help¶

Questions: Open a GitHub Discussion
Bugs: Open a GitHub Issue
Security: Email security concerns privately (see SECURITY.md if available)

Releases and Versioning¶

GraphForge follows Semantic Versioning and maintains a detailed CHANGELOG.md (repository root).

For Contributors¶

When submitting PRs, update the [Unreleased] section of CHANGELOG.md:

## [Unreleased]

### Added
- New feature you implemented

### Fixed
- Bug you fixed
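
Before pushing, it can help to confirm that the [Unreleased] section actually gained an entry. The sketch below assumes the heading layout shown above (a `## [Unreleased]` heading followed by `###` subsections with `- ` bullets); `unreleased_entries` is a hypothetical helper and the `[0.1.0]` section in the sample is invented.

```python
def unreleased_entries(changelog: str) -> list[str]:
    """Collect bullet lines found under the `## [Unreleased]` heading."""
    entries: list[str] = []
    in_unreleased = False
    for line in changelog.splitlines():
        if line.startswith("## "):
            # A second-level heading either opens or closes the section.
            in_unreleased = line.strip() == "## [Unreleased]"
        elif in_unreleased and line.startswith("- "):
            entries.append(line[2:].strip())
    return entries


# Invented sample text standing in for CHANGELOG.md.
sample = """\
## [Unreleased]

### Added
- New feature you implemented

### Fixed
- Bug you fixed

## [0.1.0]

### Added
- Initial release
"""
print(unreleased_entries(sample))
# ['New feature you implemented', 'Bug you fixed']
```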

For Maintainers¶

See RELEASING.md (repository root) for the release process, or release-process.md for comprehensive documentation.

Quick release:

python scripts/bump_version.py minor
# Edit CHANGELOG.md
git commit -am "chore(release): bump version to X.Y.Z"
git push origin main
git tag -a vX.Y.Z -m "Release version X.Y.Z"
git push origin vX.Y.Z
gh release create vX.Y.Z --title "GraphForge vX.Y.Z"
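
For readers unfamiliar with what a minor bump means under Semantic Versioning, here is a minimal sketch of the rule. It is not the implementation of scripts/bump_version.py, which may behave differently; `bump` is a hypothetical function that handles plain MAJOR.MINOR.PATCH strings only (no pre-release or build suffixes).

```python
def bump(version: str, part: str) -> str:
    """Return the next version after bumping the given part."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"  # Breaking change: reset minor and patch.
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # New feature: reset patch.
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"  # Bug fix only.
    raise ValueError(f"Unknown version part: {part!r}")


print(bump("1.4.2", "minor"))  # 1.5.0
```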

License¶

By contributing, you agree that your contributions will be licensed under the MIT License.

Thank You!¶

Your contributions help make GraphForge better for everyone. We appreciate your time and effort!