API Reference¶
v0.5.0 unified API
This page documents the v0.5.0 API. All methods return Arrow Tables — there are no
CypherValue wrappers or SearchHit objects. The conventional instance name is forge,
not db. The Rust crate API is documented separately below.
Main API¶
GraphForge — constructor¶
forge = GraphForge() # in-memory, ephemeral
forge = GraphForge("path/") # Parquet-backed, persistent
execute(query, params=None) → pyarrow.Table¶
The openCypher entry point. Runs through the full compiler pipeline (recursive-descent parser → binder → Graph IR → DataFusion) and returns an Arrow Table.
table = forge.execute("MATCH (p:Person)-[:KNOWS]->(b) RETURN p.name, b.name")
params is an optional dict of query parameters ($name placeholders in the query string).
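As a minimal sketch of parameter passing, assuming `$name` placeholders are bound by key from the `params` dict: the `people_named` helper and the `_RecordingForge` stand-in below are hypothetical and not part of GraphForge. The stand-in only records the call so the snippet runs without a GraphForge install.

```python
def people_named(forge, name):
    """Run a parameterised query; `forge` is any object exposing execute(query, params)."""
    query = "MATCH (p:Person) WHERE p.name = $name RETURN p.name, p.age"
    # Values travel separately from the query text, so no string formatting is needed.
    return forge.execute(query, params={"name": name})


class _RecordingForge:
    """Hypothetical stand-in for GraphForge that only records what it was asked to run."""

    def __init__(self):
        self.calls = []

    def execute(self, query, params=None):
        self.calls.append((query, params))
        return None  # a real instance would return a pyarrow.Table


stub = _RecordingForge()
people_named(stub, "Alice")
```

With a real instance, the same helper would be called as `people_named(forge, "Alice")`.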
Construction¶
add_node(label, **props) → NodeHandle¶
Add a single node. Returns a NodeHandle that prints as Person(id=1, name='Alice', age=30)
and exposes .id for use in Cypher parameters.
alice = forge.add_node("Person", name="Alice", age=30)
add_nodes(label, data) → None¶
Bulk-insert nodes. data may be an Arrow Table, a pandas or Polars DataFrame, or a
list[dict]. No return value.
forge.add_nodes("Person", [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}])
add_edge(src, rel_type, dst, **props) → EdgeHandle¶
Add a single directed relationship. Returns an EdgeHandle.
bob = forge.add_node("Person", name="Bob", age=25)
forge.add_edge(alice, "KNOWS", bob, since=2020)
add_edges(rel_type, data, src, dst) → None¶
Bulk-insert relationships. src and dst name the columns holding source and destination
node IDs.
forge.add_edges("KNOWS", edges_df, src="src_id", dst="dst_id")
Analyst Verbs¶
All analyst verbs bypass the Cypher compiler. They export the relevant subgraph from Parquet,
dispatch to the algorithm backend, and return Arrow Tables. All are read-only by default;
write_property opts into graph mutation.
Common parameters (availability varies by verb):
| Parameter | Available on | Description |
|---|---|---|
| via | rank, cluster, paths, analyze, similar | Relationship type to traverse; None means all types |
| directed | rank, cluster, paths, analyze | Treat edges as directed (default: True, except cluster, which defaults to False) |
| write_property | rank, cluster | If set, persist the result column back to each node under this name |
Full algorithm catalog: Algorithm Verbs
rank(label, *, by, via=None, directed=True, write_property=None) → pyarrow.Table¶
Score every node. Returns all node properties + score: Float64.
Centrality (by=):
pagerank, betweenness, closeness, harmonic_closeness, degree,
eigenvector, article_rank, hits_hub, hits_authority, celf
Structural (by=):
clustering_coefficient, triangles, local_clustering_coefficient, k_core
Link prediction (by=):
preferential_attachment, adamic_adar, common_neighbors,
resource_allocation, total_neighbors
table = forge.rank("Person", by="pagerank")
forge.rank("Person", by="pagerank", write_property="rank") # opt-in write-back
forge.rank("Person", by="betweenness", via="KNOWS", directed=False)
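To give a feel for what a `score` column from `by="pagerank"` represents, here is a pure-Python power-iteration sketch. It is not GraphForge's implementation; the `pagerank` function, the damping factor of 0.85, and the fixed iteration count are assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of directed (src, dst) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(score[v] for v in nodes if not out_links[v])
        nxt = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            if out_links[src]:
                share = damping * score[src] / len(out_links[src])
                for dst in out_links[src]:
                    nxt[dst] += share
        score = nxt
    return score


# Node 1 is pointed at by nodes 3 and 4; node 4 has no incoming edges.
scores = pagerank([(1, 2), (2, 3), (3, 1), (4, 1)])
```

The scores sum to 1, and a node with no incoming edges keeps only the baseline `(1 - damping) / n`.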
cluster(label, *, by, via=None, directed=False, write_property=None) → pyarrow.Table¶
Assign community membership. Returns all node properties + community_id: Int64.
Community detection defaults to undirected (directed=False).
Community (by=):
louvain, leiden, label_propagation, speaker_listener, girvan_newman,
modularity_optimization, fastgreedy, infomap, leading_eigenvector,
walktrap, spinglass, hdbscan, k_means, approximate_max_k_cut
Components (by=):
components, strongly_connected, biconnected
Decomposition (by=):
k_core_decomposition
table = forge.cluster("Person", by="louvain")
forge.cluster("Person", by="louvain", write_property="community")
forge.cluster("Person", by="strongly_connected", directed=True)
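The idea behind `by="components"` can be sketched with a small union-find in plain Python. This is an illustration only, not the backend GraphForge dispatches to; the `connected_components` function and its choice of the smallest node id as the component label are assumptions.

```python
def connected_components(nodes, edges):
    """Label each node with a component id, ignoring edge direction."""
    parent = {n: n for n in nodes}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller id as the root so labels are deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    return {n: find(n) for n in nodes}


community_id = connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
```

Each node maps to one integer label, which is the role the `community_id` column plays in the returned table.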
paths(source, target=None, *, by, via=None, directed=True, k=1, weight=None) → pyarrow.Table¶
Find paths or compute flows. Schema varies by algorithm:
- Shortest path: source_id, target_id, cost: Float64, path: List<Int64>
- All-pairs: one row per reachable (source, target) pair
- Flow: source_id, sink_id, flow: Float64 + per-edge flow columns
- Traversal: node_id, depth: Int64
source and target accept node IDs, NodeHandle values, or {"label": "Person", "name": "Alice"} match dicts. Pass target=None for single-source algorithms.
Shortest paths (by=):
bfs, dijkstra, dijkstra_all_pairs, astar, bellman_ford,
floyd_warshall, delta_stepping, yens
Flow (by=):
max_flow, min_cut, min_cost_max_flow, gomory_hu_tree
Steiner trees (by=):
min_steiner_tree, prize_collecting_steiner_tree
Traversal / sampling (by=):
dfs, random_walk
Reachability (by=):
transitive_closure
# Hop-count distance, all reachable nodes from alice
forge.paths(alice, by="bfs")
# Weighted shortest path between two nodes
forge.paths(alice, bob, by="dijkstra", weight="distance")
# Top-3 shortest paths (Yen's)
forge.paths(alice, bob, by="yens", k=3, weight="distance")
# Max flow between two nodes
forge.paths(alice, bob, by="max_flow", weight="capacity")
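For readers who want to see where the `cost` and `path` values of a weighted shortest-path row come from, the following is a plain-Python Dijkstra sketch. It is not GraphForge code; the `dijkstra` function and its edge-triple input format are assumptions made for the example.

```python
import heapq


def dijkstra(edges, source, target):
    """Return (cost, path) for the cheapest route, or None if target is unreachable.

    `edges` is a list of directed (src, dst, weight) triples with non-negative weights.
    """
    adjacency = {}
    for src, dst, weight in edges:
        adjacency.setdefault(src, []).append((dst, weight))

    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, weight in adjacency.get(node, []):
            candidate = cost + weight
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))

    if target not in best:
        return None
    # Rebuild the route by following predecessor links back to the source.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[target], path[::-1]


result = dijkstra([(1, 2, 4.0), (1, 3, 1.0), (3, 2, 2.0), (2, 4, 5.0)], 1, 4)
```

Here the direct edge 1→2 costs 4.0, but the detour through node 3 costs 3.0, so the cheapest route to node 4 is `[1, 3, 2, 4]` at a total cost of 8.0.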
analyze(label=None, *, by, via=None, directed=True, **kwargs) → pyarrow.Table¶
Graph-level and structural metrics. Schema varies by algorithm.
Spanning trees (by=):
minimum_spanning_tree, maximum_spanning_tree, minimum_k_spanning_tree
DAG (by=):
topological_sort, is_dag, find_cycles,
dag_longest_path, dag_longest_path_weighted
Coloring (by=):
node_coloring, edge_coloring, chromatic_number, k1_coloring
Matching (by=):
max_weight_matching, max_cardinality_matching, max_bipartite_matching
Eulerian (by=):
euler_circuit, euler_path, has_euler_circuit, has_euler_path
Structure (by=):
is_planar, articulation_points, bridges, triangle_count,
conductance, modularity, transitivity, triad_census, dyad_census,
count_automorphisms
Node embeddings (by=):
node2vec, graphsage, fast_random_projection, hashgnn
(returns node_id + embedding: List<Float32>)
# Check for cycles
result = forge.analyze(by="is_dag")
# Topological ordering
order = forge.analyze(by="topological_sort")
# Minimum spanning tree
edges = forge.analyze(by="minimum_spanning_tree", via="ROAD", weight="distance")
# Node embeddings for downstream ML
emb = forge.analyze("Person", by="node2vec", directed=False)
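The relationship between `is_dag` and `topological_sort` can be illustrated with Kahn's algorithm in plain Python. This is a sketch under the assumption that a graph is a DAG exactly when every node can be ordered; the function names mirror the `by=` values but are otherwise hypothetical and unrelated to GraphForge's internals.

```python
from collections import deque


def topological_sort(nodes, edges):
    """Kahn's algorithm: return an ordering, or None when the graph has a cycle."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # If any node was never released, a cycle blocked it, so the graph is not a DAG.
    return order if len(order) == len(nodes) else None


def is_dag(nodes, edges):
    """A graph is a DAG exactly when a full topological ordering exists."""
    return topological_sort(nodes, edges) is not None


order = topological_sort(
    ["a", "b", "c", "d"],
    [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")],
)
```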
similar(label, *, by, k=10, vector_property=None, via=None) → pyarrow.Table¶
Pairwise node similarity. Returns node1_id, node2_id, similarity: Float64.
Topology-based (by=):
node_similarity, knn, filtered_knn, filtered_node_similarity
Vector-based (by=):
cosine (requires vector_property)
forge.similar("Paper", by="node_similarity", k=5)
forge.similar("Paper", by="cosine", vector_property="embedding", k=10)
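As a rough illustration of the vector-based case, the sketch below computes cosine similarity for every pair of nodes and keeps the highest-scoring rows. It is an assumption-laden example: the `cosine` and `top_pairs` helpers are hypothetical, and `k` here caps the total number of rows, which may differ from how GraphForge interprets `k`.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_pairs(vectors, k):
    """Return the k most similar (node1_id, node2_id, similarity) rows overall."""
    rows = [
        (a, b, cosine(vectors[a], vectors[b]))
        for a, b in combinations(sorted(vectors), 2)
    ]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:k]


# Nodes 1 and 3 are orthogonal; node 2 sits halfway between them.
pairs = top_pairs({1: [1.0, 0.0], 2: [1.0, 1.0], 3: [0.0, 1.0]}, k=2)
```

The row shape matches the documented columns: `node1_id`, `node2_id`, `similarity`.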
find(query=None, *, label=None, vector=None, limit=10, space=None) → pyarrow.Table¶
Search nodes by text, vector, or hybrid. Returns node properties + score: Float64 +
matched_on: Utf8 ("text" | "vector" | "text+vector").
The index is built lazily on the first call for each label. Call forge.index() for explicit control.
GraphForge stores and queries vectors but does not generate them.
forge.find("graph neural networks", label="Paper", limit=20)
forge.find(vector=embedding, label="Paper")
forge.find("graph neural networks", label="Paper", vector=embedding)
index(label, *, properties=None, node_id=None, vector=None, space=None) → None¶
Explicitly build or update a search index.
forge.index("Paper", properties=["title", "abstract"])
forge.index("Paper", node_id=42, vector=embedding, space="sbert")
Graph Inspection¶
schema() → pyarrow.Table¶
Columns: label, node_count, rel_type, rel_count.
labels() → list[str]¶
All node labels present in the graph.
relationship_types() → list[str]¶
All relationship types present in the graph.
node_count(label=None) → int¶
Total node count, or count for a specific label.
explain(query, stage=None) → str¶
Returns the compiler plan for a Cypher query. stage selects one of "ast", "ir", "logical",
or "physical"; omit it for a full multi-stage summary.
load_ontology(path) → None¶
Load an ontology YAML. Governs label validation, property types, relationship constraints, and inference rules used during query planning.
Recipes¶
graphforge.recipes¶
graphforge.recipes — standalone helper functions for common graph patterns.
Each recipe composes forge.execute() into a reusable pattern. All results are pyarrow.Table.
neighbourhood(db, canonical, hops=2, *, label='Entity', canonical_prop='canonical')¶
Return the n-hop neighbourhood of a seed node as an Arrow Table.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| db | GraphForge | GraphForge instance to query. | required |
| canonical | str | Value of the | required |
| hops | int | Maximum traversal depth (default 2). | 2 |
| label | str | Node label to match on seed and neighbours (default 'Entity'). | 'Entity' |
| canonical_prop | str | Property name used as the node identifier. | 'canonical' |
Returns:
| Type | Description |
|---|---|
| Table | pyarrow.Table with columns |
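The notion of an n-hop neighbourhood bounded by `hops` can be sketched as a depth-limited breadth-first search. This is a pure-Python illustration, not the recipe's implementation (which composes forge.execute()); the `neighbourhood_ids` function and the adjacency-dict input are assumptions.

```python
from collections import deque


def neighbourhood_ids(adjacency, seed, hops=2):
    """Return {node: hop_distance} for every node within `hops` steps of seed."""
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if depth[node] == hops:
            continue  # do not expand past the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return depth


# A simple chain a - b - c - d, stored with edges in both directions.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
within_two = neighbourhood_ids(chain, "a", hops=2)
```

With `hops=2`, node `d` is three steps from the seed and is therefore excluded.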
Rust Crate API¶
Generate the full Rust API reference with:
cargo doc --workspace --no-deps --open
Key public types:
| Type | Crate | Description |
|---|---|---|
| GraphForge | gf-core | Main engine facade (builder pattern) |
| ExecutionResult | gf-core | { schema, batches, stats } |
| GraphPlan | gf-ir | Versioned graph IR envelope |
| OntologyHandle | gf-ontology | Loaded ontology reference |
| GfError | gf-core | Typed public error enum |
Python documentation is automatically generated from source code docstrings.