API Reference

v0.5.0 unified API

This page documents the v0.5.0 API. Query, analysis, and search methods return Arrow Tables (pyarrow.Table); there are no CypherValue wrappers or SearchHit objects. The conventional instance name is forge, not db. The Rust crate API is documented separately below.

Main API

`GraphForge` constructor

forge = GraphForge()            # in-memory, ephemeral
forge = GraphForge("path/")     # Parquet-backed, persistent

`execute(query, params=None)` → `pyarrow.Table`

The openCypher entry point. Runs through the full compiler pipeline (recursive-descent parser → binder → Graph IR → DataFusion) and returns an Arrow Table.

table = forge.execute("MATCH (p:Person)-[:KNOWS]->(b) RETURN p.name, b.name")

params is an optional dict mapping parameter names to values; the query string references them as $name placeholders.
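A forgotten key is easy to catch before calling forge.execute(). The helper below is a hypothetical client-side check, not part of GraphForge: it assumes placeholders take the `$name` form described above and uses a plain regex scan rather than a Cypher parser, so a `$` inside a string literal would also be counted.

```python
import re

# A dollar sign followed by an identifier, e.g. $min_age.
_PLACEHOLDER = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def missing_params(query, params=None):
    """Return placeholder names used in `query` but absent from `params`.

    Hypothetical helper: a regex scan over the query text, not a parser.
    """
    used = set(_PLACEHOLDER.findall(query))
    supplied = set(params or {})
    return sorted(used - supplied)


query = "MATCH (p:Person) WHERE p.age > $min_age AND p.city = $city RETURN p.name"
print(missing_params(query, {"min_age": 30}))  # ['city']
```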

Construction

`add_node(label, **props)` → `NodeHandle`

Add a single node. Returns a NodeHandle that prints as Person(id=1, name='Alice', age=30) and exposes .id for use in Cypher parameters.

alice = forge.add_node("Person", name="Alice", age=30)
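The printed form of a NodeHandle can be mimicked with a small dataclass. `NodeHandleSketch` is a hypothetical stand-in that assumes a handle carries the label, the assigned id, and the properties passed to add_node; GraphForge's actual class may be built differently.

```python
from dataclasses import dataclass, field


@dataclass
class NodeHandleSketch:
    """Hypothetical stand-in for NodeHandle: label, id, and properties."""

    label: str
    id: int
    props: dict = field(default_factory=dict)

    def __repr__(self):
        # Label(id=..., key=value, ...) with repr() of each value, which
        # reproduces the Person(id=1, name='Alice', age=30) format.
        parts = [f"id={self.id}"] + [f"{k}={v!r}" for k, v in self.props.items()]
        return f"{self.label}({', '.join(parts)})"


alice = NodeHandleSketch("Person", 1, {"name": "Alice", "age": 30})
print(alice)     # Person(id=1, name='Alice', age=30)
print(alice.id)  # 1
```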

`add_nodes(label, data)` → `None`

Bulk-insert nodes. data may be an Arrow Table, a pandas or Polars DataFrame, or a list[dict]. No return value.

forge.add_nodes("Person", [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}])
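Row-oriented input such as list[dict] has to be pivoted into columns before it resembles an Arrow Table. The sketch below illustrates that pivot under one assumption: a key missing from a row becomes None (a null). `rows_to_columns` is hypothetical and says nothing about how GraphForge converts input internally.

```python
def rows_to_columns(rows):
    """Pivot a list of dicts into a dict of equal-length column lists.

    Column order follows first appearance across rows; a key absent from
    a row becomes None, the equivalent of a null in a columnar table.
    """
    names = list(dict.fromkeys(key for row in rows for key in row))
    return {name: [row.get(name) for row in rows] for name in names}


rows = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}, {"name": "Carol"}]
print(rows_to_columns(rows))
# {'name': ['Alice', 'Bob', 'Carol'], 'age': [30, 25, None]}
```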

`add_edge(src, rel_type, dst, **props)` → `EdgeHandle`

Add a single directed relationship. Returns an EdgeHandle.

bob = forge.add_node("Person", name="Bob", age=25)
forge.add_edge(alice, "KNOWS", bob, since=2020)

`add_edges(rel_type, data, src, dst)` → `None`

Bulk-insert relationships. src and dst name the columns holding source and destination node IDs.

forge.add_edges("KNOWS", edges_df, src="src_id", dst="dst_id")
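The src and dst arguments decide which two columns pair up into a directed edge. A sketch, assuming the edge data is given as list[dict] and that every column other than the two ID columns is an edge property (like `since` above); `edges_to_adjacency` is hypothetical.

```python
from collections import defaultdict


def edges_to_adjacency(rows, src="src", dst="dst"):
    """Group edge rows into {source_id: [(target_id, props), ...]}.

    `src` and `dst` name the ID columns; all other columns become a
    per-edge property dict. Direction runs src -> dst.
    """
    adjacency = defaultdict(list)
    for row in rows:
        props = {k: v for k, v in row.items() if k not in (src, dst)}
        adjacency[row[src]].append((row[dst], props))
    return dict(adjacency)


edges = [
    {"src_id": 1, "dst_id": 2, "since": 2020},
    {"src_id": 1, "dst_id": 3, "since": 2018},
    {"src_id": 2, "dst_id": 3, "since": 2021},
]
adj = edges_to_adjacency(edges, src="src_id", dst="dst_id")
print(adj[1])  # [(2, {'since': 2020}), (3, {'since': 2018})]
```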

Analyst Verbs

All analyst verbs bypass the Cypher compiler. They export the relevant subgraph from Parquet, dispatch to the algorithm backend, and return Arrow Tables. All are read-only by default; write_property opts into graph mutation.

Common parameters (availability varies by verb):

Parameter	Available on	Description
`via`	`rank`, `cluster`, `paths`, `analyze`, `similar`	Relationship type to traverse — `None` means all types
`directed`	`rank`, `cluster`, `paths`, `analyze`	Treat edges as directed (default: `True`, except `cluster`, which defaults to `False`)
`write_property`	`rank`, `cluster`	If set, persist the result column back to each node under this name

Full algorithm catalog: Algorithm Verbs

`rank(label, *, by, via=None, directed=True, write_property=None)` → `pyarrow.Table`

Score every node. Returns all node properties + score: Float64.

Centrality (by=): pagerank, betweenness, closeness, harmonic_closeness, degree, eigenvector, article_rank, hits_hub, hits_authority, celf

Structural (by=): clustering_coefficient, triangles, local_clustering_coefficient, k_core

Link prediction (by=): preferential_attachment, adamic_adar, common_neighbors, resource_allocation, total_neighbors

table = forge.rank("Person", by="pagerank")
forge.rank("Person", by="pagerank", write_property="rank")   # opt-in write-back
forge.rank("Person", by="betweenness", via="KNOWS", directed=False)
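For intuition about what by="pagerank" scores, here is textbook power-iteration PageRank in plain Python. It assumes a damping factor of 0.85 and spreads the rank of dangling nodes (no outgoing edges) uniformly; the backend's damping, iteration count, and tolerance are not documented here and may differ, so the numbers are illustrative only.

```python
def pagerank(nodes, edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over directed edges (src, dst).

    Returns {node: score}; scores sum to 1. Dangling nodes redistribute
    their rank uniformly across all nodes.
    """
    n = len(nodes)
    out = {v: [] for v in nodes}
    for s, d in edges:
        out[s].append(d)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if out[v]:
                share = damping * rank[v] / len(out[v])
                for w in out[v]:
                    new[w] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


scores = pagerank(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")])
```

Node c receives links from both a and b, so it ends up with the highest score; sorting `scores` in descending order gives the kind of ordering a pagerank score column expresses.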

`cluster(label, *, by, via=None, directed=False, write_property=None)` → `pyarrow.Table`

Assign community membership. Returns all node properties + community_id: Int64.

Community detection defaults to undirected (directed=False).

Community (by=): louvain, leiden, label_propagation, speaker_listener, girvan_newman, modularity_optimization, fastgreedy, infomap, leading_eigenvector, walktrap, spinglass, hdbscan, k_means, approximate_max_k_cut

Components (by=): components, strongly_connected, biconnected

Decomposition (by=): k_core_decomposition

table = forge.cluster("Person", by="louvain")
forge.cluster("Person", by="louvain", write_property="community")
forge.cluster("Person", by="strongly_connected", directed=True)
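On an undirected view of the graph (the cluster default), by="components" corresponds to connected components. A union-find sketch that yields one community_id per node, assuming ids are numbered 0, 1, 2, ... in order of first appearance; GraphForge's actual numbering is not specified here.

```python
def connected_components(nodes, edges):
    """Label connected components of an undirected graph.

    Returns {node: community_id}, numbering components 0, 1, 2, ...
    in the order their first node appears in `nodes`.
    """
    parent = {v: v for v in nodes}

    def find(v):
        # Path halving keeps the union-find trees shallow.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    ids, result = {}, {}
    for v in nodes:
        root = find(v)
        ids.setdefault(root, len(ids))
        result[v] = ids[root]
    return result


labels = connected_components(
    ["alice", "bob", "carol", "dave", "erin"],
    [("alice", "bob"), ("bob", "carol"), ("dave", "erin")],
)
print(labels)
# {'alice': 0, 'bob': 0, 'carol': 0, 'dave': 1, 'erin': 1}
```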

`paths(source, target=None, *, by, via=None, directed=True, k=1, weight=None)` → `pyarrow.Table`

Find paths or compute flows. Schema varies by algorithm:

Shortest path: source_id, target_id, cost: Float64, path: List<Int64>
All-pairs: one row per reachable (source, target) pair
Flow: source_id, sink_id, flow: Float64 + per-edge flow columns
Traversal: node_id, depth: Int64
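The Traversal schema (node_id, depth) is what a breadth-first search naturally produces. A sketch assuming an adjacency dict and hop-count depth with the source itself reported at depth 0; whether GraphForge emits a row for the source is not stated above, so that detail is an assumption.

```python
from collections import deque


def bfs_depths(adjacency, source):
    """Breadth-first search from `source`.

    Returns (node_id, depth) rows in visit order. Each node appears once,
    at its minimum hop count; the source has depth 0.
    """
    depth = {source: 0}
    order = [source]
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adjacency.get(v, ()):
            if w not in depth:
                depth[w] = depth[v] + 1
                order.append(w)
                queue.append(w)
    return [(v, depth[v]) for v in order]


graph = {1: [2, 3], 2: [4], 3: [4], 4: [5]}
print(bfs_depths(graph, 1))  # [(1, 0), (2, 1), (3, 1), (4, 2), (5, 3)]
```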

source and target accept node IDs, NodeHandle values, or {"label": "Person", "name": "Alice"} match dicts. Pass target=None for single-source algorithms.

Shortest paths (by=): bfs, dijkstra, dijkstra_all_pairs, astar, bellman_ford, floyd_warshall, delta_stepping, yens

Flow (by=): max_flow, min_cut, min_cost_max_flow, gomory_hu_tree

Steiner trees (by=): min_steiner_tree, prize_collecting_steiner_tree

Traversal / sampling (by=): dfs, random_walk

Reachability (by=): transitive_closure

# Hop-count distance from alice to every reachable node
forge.paths(alice, by="bfs")

# Weighted shortest path between two nodes
forge.paths(alice, bob, by="dijkstra", weight="distance")

# Top-3 shortest paths (Yen's)
forge.paths(alice, bob, by="yens", k=3, weight="distance")

# Max flow between two nodes
forge.paths(alice, bob, by="max_flow", weight="capacity")
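The Shortest path schema (source_id, target_id, cost, path) maps onto Dijkstra's algorithm with predecessor tracking. A heapq-based sketch, assuming non-negative weights read from an edge property such as distance; returning None for an unreachable target is a choice of this sketch, not documented behaviour.

```python
import heapq


def dijkstra_path(adjacency, source, target):
    """Weighted shortest path with non-negative edge weights.

    `adjacency` maps node -> list of (neighbour, weight). Returns the row
    (source_id, target_id, cost, path), or None if target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if v == target:
            break
        if d > dist[v]:
            continue  # stale heap entry; a shorter route was already found
        for w, weight in adjacency.get(v, ()):
            nd = d + weight
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                prev[w] = v
                heapq.heappush(heap, (nd, w))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return (source, target, dist[target], path[::-1])


roads = {1: [(2, 4.0), (3, 1.0)], 3: [(2, 2.0), (4, 7.0)], 2: [(4, 1.0)]}
print(dijkstra_path(roads, 1, 4))  # (1, 4, 4.0, [1, 3, 2, 4])
```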

`analyze(label=None, *, by, via=None, directed=True, **kwargs)` → `pyarrow.Table`

Graph-level and structural metrics. Schema varies by algorithm.

Spanning trees (by=): minimum_spanning_tree, maximum_spanning_tree, minimum_k_spanning_tree

DAG (by=): topological_sort, is_dag, find_cycles, dag_longest_path, dag_longest_path_weighted

Coloring (by=): node_coloring, edge_coloring, chromatic_number, k1_coloring

Matching (by=): max_weight_matching, max_cardinality_matching, max_bipartite_matching

Eulerian (by=): euler_circuit, euler_path, has_euler_circuit, has_euler_path

Structure (by=): is_planar, articulation_points, bridges, triangle_count, conductance, modularity, transitivity, triad_census, dyad_census, count_automorphisms

Node embeddings (by=): node2vec, graphsage, fast_random_projection, hashgnn (returns node_id + embedding: List<Float32>)

# Check for cycles
result = forge.analyze(by="is_dag")

# Topological ordering
order = forge.analyze(by="topological_sort")

# Minimum spanning tree
edges = forge.analyze(by="minimum_spanning_tree", via="ROAD", weight="distance")

# Node embeddings for downstream ML
emb = forge.analyze("Person", by="node2vec", directed=False)
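by="is_dag" and by="topological_sort" are two views of one computation. A Kahn's-algorithm sketch over a directed edge list: if every node can be emitted once its in-degree reaches zero, the graph is acyclic and the emitted order is a topological sort; any node left over lies on a cycle or downstream of one. The (order, is_dag) tuple is a convention of this sketch; the documented verbs return Arrow Tables.

```python
from collections import deque


def topological_sort(nodes, edges):
    """Kahn's algorithm. Returns (order, is_dag).

    With a cycle present, `order` holds only the nodes that could be
    scheduled before the cycle blocked progress.
    """
    indegree = {v: 0 for v in nodes}
    out = {v: [] for v in nodes}
    for s, d in edges:
        out[s].append(d)
        indegree[d] += 1
    queue = deque(v for v in nodes if indegree[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in out[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    return order, len(order) == len(nodes)


tasks = ["fetch", "parse", "index", "report"]
deps = [("fetch", "parse"), ("parse", "index"), ("parse", "report"), ("index", "report")]
print(topological_sort(tasks, deps))
# (['fetch', 'parse', 'index', 'report'], True)
```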

`similar(label, *, by, k=10, vector_property=None, via=None)` → `pyarrow.Table`

Pairwise node similarity. Returns node1_id, node2_id, similarity: Float64.

Topology-based (by=): node_similarity, knn, filtered_knn, filtered_node_similarity

Vector-based (by=): cosine (requires vector_property)

forge.similar("Paper", by="node_similarity", k=5)
forge.similar("Paper", by="cosine", vector_property="embedding", k=10)
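For by="cosine", similarity is the cosine of the angle between two nodes' vectors. A sketch of the (node1_id, node2_id, similarity) rows, assuming vectors keyed by node id and reading k as "the k most similar neighbours per node"; both that reading of k and the tie handling (first seen wins) are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_k_similar(vectors, k=10):
    """Return (node1_id, node2_id, similarity) rows, top-k per node1_id."""
    rows = []
    for a, va in vectors.items():
        scored = [(b, cosine(va, vb)) for b, vb in vectors.items() if b != a]
        # Stable sort: among equal scores, the earlier node is kept first.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        rows.extend((a, b, s) for b, s in scored[:k])
    return rows


papers = {1: [1.0, 0.0], 2: [1.0, 1.0], 3: [0.0, 1.0]}
rows = top_k_similar(papers, k=1)
```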

`find(query=None, *, label=None, vector=None, limit=10, space=None)` → `pyarrow.Table`

Search nodes by text, vector, or hybrid. Returns node properties + score: Float64 + matched_on: Utf8 ("text" | "vector" | "text+vector").

The search index for a label is built lazily on the first find() call for that label. Call forge.index() for explicit control.

GraphForge stores and queries vectors but does not generate them.

forge.find("graph neural networks", label="Paper", limit=20)
forge.find(vector=embedding, label="Paper")
forge.find("graph neural networks", label="Paper", vector=embedding)
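To make the matched_on column concrete, here is a toy hybrid search. Its scoring is entirely hypothetical, since GraphForge's ranking function is not documented here: the text score is the fraction of query tokens found in a node's text, the vector score is cosine similarity, the final score is the mean of whichever signals matched, and matched_on names those signals.

```python
import math


def _cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hybrid_search(nodes, query=None, vector=None, limit=10):
    """Toy search over {node_id: {"text": str, "vec": list[float]}}.

    Returns (node_id, score, matched_on) rows, best first, where
    matched_on is "text", "vector", or "text+vector".
    """
    tokens = set(query.lower().split()) if query else set()
    results = []
    for node_id, node in nodes.items():
        signals = {}
        if tokens:
            hit = len(tokens & set(node["text"].lower().split())) / len(tokens)
            if hit > 0:
                signals["text"] = hit
        if vector is not None:
            sim = _cosine(vector, node["vec"])
            if sim > 0:
                signals["vector"] = sim
        if signals:
            score = sum(signals.values()) / len(signals)
            results.append((node_id, score, "+".join(signals)))
    results.sort(key=lambda row: row[1], reverse=True)
    return results[:limit]


papers = {
    1: {"text": "graph neural networks survey", "vec": [1.0, 0.0]},
    2: {"text": "protein folding", "vec": [0.9, 0.1]},
    3: {"text": "graph databases", "vec": [0.0, 1.0]},
}
hits = hybrid_search(papers, "graph neural networks", vector=[1.0, 0.0])
```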

`index(label, *, properties=None, node_id=None, vector=None, space=None)` → `None`

Explicitly build or update a search index.

forge.index("Paper", properties=["title", "abstract"])
forge.index("Paper", node_id=42, vector=embedding, space="sbert")

Graph Inspection

`schema()` → `pyarrow.Table`

Columns: label, node_count, rel_type, rel_count.

`labels()` → `list[str]`

All node labels present in the graph.

`relationship_types()` → `list[str]`

All relationship types present in the graph.

`node_count(label=None)` → `int`

Total node count, or count for a specific label.

`explain(query, stage=None)` → `str`

Returns the compiler plan for a Cypher query. stage selects a single stage ("ast", "ir", "logical", or "physical"); omit it for a full multi-stage summary.

`load_ontology(path)` → `None`

Load an ontology YAML. Governs label validation, property types, relationship constraints, and inference rules used during query planning.

Recipes

`graphforge.recipes`

graphforge.recipes — standalone helper functions for common graph patterns.

Each recipe composes forge.execute() into a reusable pattern. All results are pyarrow.Table.

`neighbourhood(db, canonical, hops=2, *, label='Entity', canonical_prop='canonical')`

Return the n-hop neighbourhood of a seed node as an Arrow Table.

Parameters:

Name	Type	Description	Default
`db`	`GraphForge`	GraphForge instance to query.	required
`canonical`	`str`	Value of the `canonical_prop` property on the seed node.	required
`hops`	`int`	Maximum traversal depth (default 2).	`2`
`label`	`str`	Node label to match on seed and neighbours (default `"Entity"`).	`'Entity'`
`canonical_prop`	`str`	Property name used as the node identifier.	`'canonical'`

Returns:

Type	Description
`Table`	pyarrow.Table with columns `canonical_prop`, `name`, `labels`.
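Because each recipe composes forge.execute(), neighbourhood() plausibly reduces to one variable-length Cypher match. The builder below is hypothetical (the query text GraphForge actually generates is not shown on this page). It interpolates the label and property key, because openCypher parameters cannot stand in for labels or property keys, and validates them first; the seed value stays a $canonical parameter.

```python
import re

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def build_neighbourhood_query(hops=2, label="Entity", canonical_prop="canonical"):
    """Build an n-hop neighbourhood Cypher query (hypothetical sketch).

    Label and property key are validated, then interpolated; the seed
    value is left as the $canonical parameter.
    """
    if not isinstance(hops, int) or hops < 1:
        raise ValueError("hops must be a positive integer")
    for name in (label, canonical_prop):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        f"MATCH (seed:{label} {{{canonical_prop}: $canonical}})"
        f"-[*1..{hops}]-(n:{label}) "
        f"RETURN DISTINCT n.{canonical_prop} AS {canonical_prop}, "
        f"n.name AS name, labels(n) AS labels"
    )


query = build_neighbourhood_query(hops=2)
print(query)
```

The resulting string would then be passed to db.execute(query, {"canonical": seed_value}), where seed_value is the seed node's identifier.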

Rust Crate API

Generate the full Rust API reference with:

cargo doc --workspace --no-deps --open

Key public types:

Type	Crate	Description
`GraphForge`	`gf-core`	Main engine facade — builder pattern
`ExecutionResult`	`gf-core`	`{ schema, batches, stats }`
`GraphPlan`	`gf-ir`	Versioned graph IR envelope
`OntologyHandle`	`gf-ontology`	Loaded ontology reference
`GfError`	`gf-core`	Typed public error enum

Python documentation is automatically generated from source code docstrings.
