API Reference

v0.5.0 unified API

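This page documents the v0.5.0 API. Query, analyst, and search methods return Arrow Tables (pyarrow.Table); there are no CypherValue wrappers or SearchHit objects. Construction and inspection methods return handles or plain Python values, as listed with each signature. The conventional instance name is forge, not db. The Rust crate API is documented separately below.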


Main API

GraphForge — constructor

forge = GraphForge()            # in-memory, ephemeral
forge = GraphForge("path/")     # Parquet-backed, persistent

execute(query, params=None) → pyarrow.Table

The openCypher entry point. Runs through the full compiler pipeline (recursive-descent parser → binder → Graph IR → DataFusion) and returns an Arrow Table.

table = forge.execute("MATCH (p:Person)-[:KNOWS]->(b) RETURN p.name, b.name")

params is an optional dict of query parameters ($name placeholders in the query string).
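As a minimal sketch of the params mechanism described above, the helper below keeps the query text constant and passes the user-supplied value separately. The helper name friends_query is hypothetical and not part of GraphForge; only execute(query, params) and the $name placeholder syntax come from this page.

```python
def friends_query(name):
    """Return (query, params) for the people that `name` KNOWS.

    The value travels in the params dict, never in the query string,
    so the same query text can be reused for any name.
    """
    query = (
        "MATCH (p:Person {name: $name})-[:KNOWS]->(b) "
        "RETURN p.name, b.name"
    )
    return query, {"name": name}


query, params = friends_query("Alice")
# With a GraphForge instance available:
# table = forge.execute(query, params)
```

Keeping values out of the query string avoids quoting mistakes when a name contains characters such as apostrophes.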


Construction

add_node(label, **props) → NodeHandle

Add a single node. Returns a NodeHandle that prints as Person(id=1, name='Alice', age=30) and exposes .id for use in Cypher parameters.

alice = forge.add_node("Person", name="Alice", age=30)

add_nodes(label, data) → None

Bulk-insert nodes. data may be an Arrow Table, a pandas or Polars DataFrame, or a list[dict]. No return value.

forge.add_nodes("Person", [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}])
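When the source data is not already a DataFrame, the list[dict] form is the easiest to build by hand. The sketch below turns CSV text into typed records using only the standard library; the helper name person_rows and the column names are illustrative assumptions.

```python
import csv
import io


def person_rows(csv_text):
    """Parse CSV text into list[dict] records with typed values."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        # csv yields strings, so convert numeric columns explicitly.
        rows.append({"name": record["name"], "age": int(record["age"])})
    return rows


rows = person_rows("name,age\nAlice,30\nBob,25\n")
# With a GraphForge instance available:
# forge.add_nodes("Person", rows)
```

Converting types before insertion keeps age numeric instead of a string; how GraphForge infers property types from list[dict] input is not specified on this page.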

add_edge(src, rel_type, dst, **props) → EdgeHandle

Add a single directed relationship. Returns an EdgeHandle.

bob = forge.add_node("Person", name="Bob", age=25)
forge.add_edge(alice, "KNOWS", bob, since=2020)

add_edges(rel_type, data, src, dst) → None

Bulk-insert relationships. src and dst name the columns holding source and destination node IDs.

forge.add_edges("KNOWS", edges_df, src="src_id", dst="dst_id")
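The edge table needs one column of source IDs and one of destination IDs. The sketch below builds such records from node handles using their .id attribute. The helper edge_rows is hypothetical, and passing a list[dict] to add_edges is an assumption carried over from add_nodes; this page only shows a DataFrame being passed.

```python
from types import SimpleNamespace


def edge_rows(pairs, **props):
    """Turn (src_handle, dst_handle) pairs into records keyed by node ID."""
    return [
        {"src_id": src.id, "dst_id": dst.id, **props}
        for src, dst in pairs
    ]


# Stand-ins for NodeHandle objects; only the documented .id attribute is used.
alice = SimpleNamespace(id=1)
bob = SimpleNamespace(id=2)

rows = edge_rows([(alice, bob)], since=2020)
# Assuming list[dict] input is accepted:
# forge.add_edges("KNOWS", rows, src="src_id", dst="dst_id")
```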

Analyst Verbs

All analyst verbs bypass the Cypher compiler. They export the relevant subgraph from Parquet, dispatch to the algorithm backend, and return Arrow Tables. All are read-only by default; write_property opts into graph mutation.

Common parameters (availability varies by verb):

Parameter | Available on | Description
via | rank, cluster, paths, analyze, similar | Relationship type to traverse; None means all types
directed | rank, cluster, paths, analyze | Treat edges as directed (default: True, except cluster, which defaults to False)
write_property | rank, cluster | If set, persist the result column back to each node under this name

Full algorithm catalog: Algorithm Verbs


rank(label, *, by, via=None, directed=True, write_property=None) → pyarrow.Table

Score every node. Returns all node properties + score: Float64.

Centrality (by=): pagerank, betweenness, closeness, harmonic_closeness, degree, eigenvector, article_rank, hits_hub, hits_authority, celf

Structural (by=): clustering_coefficient, triangles, local_clustering_coefficient, k_core

Link prediction (by=): preferential_attachment, adamic_adar, common_neighbors, resource_allocation, total_neighbors

table = forge.rank("Person", by="pagerank")
forge.rank("Person", by="pagerank", write_property="rank")   # opt-in write-back
forge.rank("Person", by="betweenness", via="KNOWS", directed=False)
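For intuition about what a by="pagerank" score represents, here is a minimal pure-Python power-iteration sketch. It is not GraphForge's implementation: the damping factor of 0.85, the fixed iteration count, and the function name are assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (src, dst) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(score[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        nxt = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                nxt[dst] += damping * score[src] / len(out_links[src])
        score = nxt
    return score


scores = pagerank([("alice", "bob"), ("carol", "bob"), ("bob", "alice")])
```

In this toy graph bob receives links from both other nodes and ends up with the highest score, while carol, who has no incoming links, keeps only the baseline share.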

cluster(label, *, by, via=None, directed=False, write_property=None) → pyarrow.Table

Assign community membership. Returns all node properties + community_id: Int64.

Community detection defaults to undirected (directed=False).

Community (by=): louvain, leiden, label_propagation, speaker_listener, girvan_newman, modularity_optimization, fastgreedy, infomap, leading_eigenvector, walktrap, spinglass, hdbscan, k_means, approximate_max_k_cut

Components (by=): components, strongly_connected, biconnected

Decomposition (by=): k_core_decomposition

table = forge.cluster("Person", by="louvain")
forge.cluster("Person", by="louvain", write_property="community")
forge.cluster("Person", by="strongly_connected", directed=True)
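To illustrate the shape of a community_id column, the sketch below labels undirected connected components with a small union-find. It only illustrates the idea behind by="components"; using the smallest node ID as the community label is an assumption, and this page does not specify which values GraphForge assigns.

```python
def connected_components(nodes, edges):
    """Label each node with the smallest node ID in its undirected component."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller ID as root so labels are deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    return [{"id": n, "community_id": find(n)} for n in nodes]


rows = connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
```

Nodes 1 to 3 share one label and nodes 4 and 5 share another, which is the grouping a community_id column expresses.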

paths(source, target=None, *, by, via=None, directed=True, k=1, weight=None) → pyarrow.Table

Find paths or compute flows. Schema varies by algorithm:

  • Shortest path: source_id, target_id, cost: Float64, path: List<Int64>
  • All-pairs: one row per reachable (source, target) pair
  • Flow: source_id, sink_id, flow: Float64 + per-edge flow columns
  • Traversal: node_id, depth: Int64

source and target accept node IDs, NodeHandle values, or {"label": "Person", "name": "Alice"} match dicts. Pass target=None for single-source algorithms.

Shortest paths (by=): bfs, dijkstra, dijkstra_all_pairs, astar, bellman_ford, floyd_warshall, delta_stepping, yens

Flow (by=): max_flow, min_cut, min_cost_max_flow, gomory_hu_tree

Steiner trees (by=): min_steiner_tree, prize_collecting_steiner_tree

Traversal / sampling (by=): dfs, random_walk

Reachability (by=): transitive_closure

# Hop-count distance, all reachable nodes from alice
forge.paths(alice, by="bfs")

# Weighted shortest path between two nodes
forge.paths(alice, bob, by="dijkstra", weight="distance")

# Top-3 shortest paths (Yen's)
forge.paths(alice, bob, by="yens", k=3, weight="distance")

# Max flow between two nodes
forge.paths(alice, bob, by="max_flow", weight="capacity")
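The traversal schema (node_id, depth) can be pictured with a plain breadth-first search. The sketch below is a standalone illustration in pure Python, not the GraphForge backend; whether the source node itself appears with depth 0 in GraphForge's output is an assumption.

```python
from collections import deque


def bfs_depths(adjacency, source):
    """Return rows of {node_id, depth} for every node reachable from source."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in depth:
                # First visit is along a shortest hop-count path.
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return [{"node_id": n, "depth": d} for n, d in depth.items()]


adjacency = {1: [2, 3], 2: [4], 3: [4], 4: []}
rows = bfs_depths(adjacency, source=1)
```

Node 4 is reachable through both 2 and 3 but is reported once, at depth 2.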

analyze(label=None, *, by, via=None, directed=True, **kwargs) → pyarrow.Table

Graph-level and structural metrics. Schema varies by algorithm.

Spanning trees (by=): minimum_spanning_tree, maximum_spanning_tree, minimum_k_spanning_tree

DAG (by=): topological_sort, is_dag, find_cycles, dag_longest_path, dag_longest_path_weighted

Coloring (by=): node_coloring, edge_coloring, chromatic_number, k1_coloring

Matching (by=): max_weight_matching, max_cardinality_matching, max_bipartite_matching

Eulerian (by=): euler_circuit, euler_path, has_euler_circuit, has_euler_path

Structure (by=): is_planar, articulation_points, bridges, triangle_count, conductance, modularity, transitivity, triad_census, dyad_census, count_automorphisms

Node embeddings (by=): node2vec, graphsage, fast_random_projection, hashgnn (returns node_id + embedding: List<Float32>)

# Check for cycles
result = forge.analyze(by="is_dag")

# Topological ordering
order = forge.analyze(by="topological_sort")

# Minimum spanning tree
edges = forge.analyze(by="minimum_spanning_tree", via="ROAD", weight="distance")

# Node embeddings for downstream ML
emb = forge.analyze("Person", by="node2vec", directed=False)
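The is_dag and topological_sort metrics are closely related: a directed graph has a topological ordering exactly when it contains no cycle. The sketch below shows this with Kahn's algorithm in pure Python. It is an illustration of the concept, not the code GraphForge runs, and the function name is hypothetical.

```python
from collections import deque


def topological_sort(nodes, edges):
    """Kahn's algorithm: return an ordering, or None if the graph has a cycle."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    # Nodes on a cycle never reach indegree 0, so they are missing from order.
    return order if len(order) == len(nodes) else None


order = topological_sort(["a", "b", "c"], [("a", "b"), ("b", "c")])
cyclic = topological_sort(["a", "b"], [("a", "b"), ("b", "a")])
```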

similar(label, *, by, k=10, vector_property=None, via=None) → pyarrow.Table

Pairwise node similarity. Returns node1_id, node2_id, similarity: Float64.

Topology-based (by=): node_similarity, knn, filtered_knn, filtered_node_similarity

Vector-based (by=): cosine (requires vector_property)

forge.similar("Paper", by="node_similarity", k=5)
forge.similar("Paper", by="cosine", vector_property="embedding", k=10)
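To make the node1_id, node2_id, similarity output concrete, here is a brute-force cosine sketch in pure Python. Treating k as "the k most similar other nodes per node" is an assumption; this page does not define how GraphForge applies k, and a real backend would likely avoid the all-pairs comparison used here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_similar(vectors, k=10):
    """For each node, return its k most cosine-similar other nodes."""
    rows = []
    for node1, u in vectors.items():
        scored = [
            (cosine(u, v), node2)
            for node2, v in vectors.items()
            if node2 != node1
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        for similarity, node2 in scored[:k]:
            rows.append(
                {"node1_id": node1, "node2_id": node2, "similarity": similarity}
            )
    return rows


vectors = {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.0, 1.0]}
rows = top_k_similar(vectors, k=1)
```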

find(query=None, *, label=None, vector=None, limit=10, space=None) → pyarrow.Table

Search nodes by text, vector, or hybrid. Returns node properties + score: Float64 + matched_on: Utf8 ("text" | "vector" | "text+vector").

The index is built lazily on the first call for each label. Call forge.index() for explicit control.

GraphForge stores and queries vectors but does not generate them.

forge.find("graph neural networks", label="Paper", limit=20)
forge.find(vector=embedding, label="Paper")
forge.find("graph neural networks", label="Paper", vector=embedding)

index(label, *, properties=None, node_id=None, vector=None, space=None) → None

Explicitly build or update a search index.

forge.index("Paper", properties=["title", "abstract"])
forge.index("Paper", node_id=42, vector=embedding, space="sbert")

Graph Inspection

schema() → pyarrow.Table

Columns: label, node_count, rel_type, rel_count.

labels() → list[str]

All node labels present in the graph.

relationship_types() → list[str]

All relationship types present in the graph.

node_count(label=None) → int

Total node count, or count for a specific label.

explain(query, stage=None) → str

Returns the compiler plan for a Cypher query. stage is one of "ast", "ir", "logical", or "physical"; omit it for a full multi-stage summary.

load_ontology(path) → None

Load an ontology from a YAML file. The ontology governs label validation, property types, relationship constraints, and the inference rules used during query planning.


Recipes

graphforge.recipes

graphforge.recipes — standalone helper functions for common graph patterns.

Each recipe composes forge.execute() into a reusable pattern. All results are pyarrow.Table.

neighbourhood(db, canonical, hops=2, *, label='Entity', canonical_prop='canonical')

Return the n-hop neighbourhood of a seed node as an Arrow Table.

Parameters:

Name | Type | Description | Default
db | GraphForge | GraphForge instance to query. | required
canonical | str | Value of the canonical_prop property on the seed node. | required
hops | int | Maximum traversal depth (default 2). | 2
label | str | Node label to match on seed and neighbours (default "Entity"). | 'Entity'
canonical_prop | str | Property name used as the node identifier. | 'canonical'

Returns:

Type | Description
Table | pyarrow.Table with columns canonical_prop, name, labels.
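To show how a recipe of this kind might compose forge.execute(), the sketch below builds a Cypher string for an n-hop lookup. It is not the library's implementation: the helper name neighbourhood_query, the variable-length pattern, and the use of labels(n) are assumptions, and whether GraphForge's Cypher subset supports them is not confirmed by this page. In standard openCypher, labels, property names, and hop bounds cannot be passed as parameters, so they are validated and interpolated, while the seed value stays a $canonical parameter.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def neighbourhood_query(hops=2, *, label="Entity", canonical_prop="canonical"):
    """Build a Cypher string for an n-hop neighbourhood lookup."""
    if not isinstance(hops, int) or hops < 1:
        raise ValueError("hops must be a positive integer")
    for name in (label, canonical_prop):
        # Reject anything that is not a plain identifier before interpolating.
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a valid identifier: {name!r}")
    return (
        f"MATCH (seed:{label} {{{canonical_prop}: $canonical}})"
        f"-[*1..{hops}]-(n:{label}) "
        f"RETURN DISTINCT n.{canonical_prop} AS {canonical_prop}, "
        f"n.name AS name, labels(n) AS labels"
    )


query = neighbourhood_query(hops=2)
# With a GraphForge instance available ("acme" is a placeholder seed value):
# table = forge.execute(query, {"canonical": "acme"})
```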


Rust Crate API

Generate the full Rust API reference with:

cargo doc --workspace --no-deps --open

Key public types:

Type | Crate | Description
GraphForge | gf-core | Main engine facade (builder pattern)
ExecutionResult | gf-core | { schema, batches, stats }
GraphPlan | gf-ir | Versioned graph IR envelope
OntologyHandle | gf-ontology | Loaded ontology reference
GfError | gf-core | Typed public error enum

Python documentation is automatically generated from source code docstrings.