Analytics Integration
The GraphForge query and analytics methods (execute(), rank(), cluster(), and find()) each return an Apache Arrow Table. Arrow is the common currency: you can hand the result to pandas, Polars, or any other Arrow-aware library with a single call, and to NetworkX by way of a pandas edge list.
Arrow Results
forge.execute() returns a PyArrow Table. The same is true for rank(), cluster(),
and find().
from graphforge import GraphForge
forge = GraphForge()
forge.add_node("Person", name="Alice", age=30)
forge.add_node("Person", name="Bob", age=25)
table = forge.execute("MATCH (p:Person) RETURN p.name AS name, p.age AS age")
# table is a pyarrow.Table
# Convert to pandas
import pandas as pd
df = table.to_pandas()
# Convert to Polars
import polars as pl
df = pl.from_arrow(table)
# Iterate rows as plain Python dicts
for row in table.to_pylist():
    print(row["name"], row["age"])
# Pass a single column to NumPy
ages = table.column("age").to_numpy()
There are no CypherValue wrappers or .value calls in v0.5.0. Column values are
standard Python types (str, int, float, bool, None) when accessed via to_pandas(),
to_pylist(), or .as_py().
NetworkX from Arrow
Convert an Arrow Table to a NetworkX graph for path algorithms and custom analysis:
import networkx as nx
table = forge.execute("""
MATCH (a:Person)-[:KNOWS]->(b:Person)
RETURN a.name AS src, b.name AS dst
""")
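# KNOWS is directed, so build a DiGraph (the default would be an undirected Graph)
G = nx.from_pandas_edgelist(table.to_pandas(), source="src", target="dst", create_using=nx.DiGraph)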
print(f"Nodes: {G.number_of_nodes()}, Edges: {G.number_of_edges()}")
# Run any NetworkX algorithm on the result
pr = nx.pagerank(G)
cc = nx.average_clustering(G.to_undirected())
print(f"Avg clustering: {cc:.4f}")
forge.rank(): Centrality Analysis
forge.rank() scores every node of a given label and returns an Arrow Table with all
node properties plus a score column. It dispatches to an optimised algorithm backend —
no Cypher needed, no exporting the graph first.
# PageRank across all Person nodes
table = forge.rank("Person", by="pagerank")
df = table.to_pandas().sort_values("score", ascending=False)
print(df[["name", "score"]].head(10))
# Restrict to a specific relationship type and direction
table = forge.rank("Person", by="betweenness", via="KNOWS", directed=True)
# Degree centrality — good proxy for raw connectivity
table = forge.rank("Person", by="degree")
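To make the pagerank score more concrete, here is a minimal pure-Python sketch of power-iteration PageRank over a directed edge list. It illustrates the general technique only: the function name, damping factor, and stopping rule are assumptions for this example, not GraphForge's backend.

```python
def pagerank(edges, damping=0.85, iterations=200, tol=1e-10):
    """Power-iteration PageRank over directed (src, dst) edges (illustrative sketch)."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(score[v] for v in nodes if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_score = {node: base for node in nodes}
        for src in nodes:
            if out_links[src]:
                # Each node passes an equal share of its rank to every target.
                share = damping * score[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_score[dst] += share
        delta = sum(abs(new_score[v] - score[v]) for v in nodes)
        score = new_score
        if delta < tol:
            break
    return score


edges = [("Alice", "Bob"), ("Carol", "Bob"), ("Bob", "Dave"), ("Dave", "Bob")]
scores = pagerank(edges)
for name, value in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.4f}")
```

Bob ends up with the highest score because three nodes link to him, while Alice and Carol, who have no incoming edges, keep only the baseline share.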
Available algorithms for by:
| Value | Description |
|---|---|
| pagerank | Eigenvector-based global influence |
| betweenness | Fraction of shortest paths passing through a node |
| closeness | Average inverse distance to all other nodes |
| degree | Number of direct connections |
| clustering_coefficient | Density of a node's local neighbourhood |
| triangles | Count of closed triangles the node participates in |
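The three local measures in the table (degree, triangles, clustering_coefficient) are closely related. The following sketch computes them on a small undirected edge list using only the standard library. The helper name local_metrics and the sample data are hypothetical, and this is not how GraphForge computes them internally.

```python
from itertools import combinations


def local_metrics(edges):
    """Return {node: (degree, triangles, clustering_coefficient)} for an undirected graph."""
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    metrics = {}
    for node, adjacent in neighbours.items():
        degree = len(adjacent)
        # A triangle through `node` is a pair of its neighbours that are also linked.
        triangles = sum(1 for u, v in combinations(adjacent, 2) if v in neighbours[u])
        # Clustering coefficient: closed neighbour pairs over all possible neighbour pairs.
        possible = degree * (degree - 1) / 2
        coefficient = triangles / possible if possible else 0.0
        metrics[node] = (degree, triangles, coefficient)
    return metrics


edges = [("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Alice"), ("Carol", "Dave")]
metrics = local_metrics(edges)
for name, (degree, triangles, coefficient) in sorted(metrics.items()):
    print(f"{name}: degree={degree} triangles={triangles} clustering={coefficient:.2f}")
```

Carol has the highest degree but a lower clustering coefficient than Alice or Bob, because only one of her three possible neighbour pairs is connected.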
Write-back (opt-in)
By default rank() is read-only. Pass write_property to store the score as a node
property, which you can then query with Cypher:
forge.rank("Person", by="pagerank", write_property="rank")
top = forge.execute("""
MATCH (n:Person)
RETURN n.name AS name, n.rank AS rank
ORDER BY n.rank DESC LIMIT 10
""")
print(top.to_pandas())
forge.cluster(): Community Detection
forge.cluster() assigns every node of a given label to a community and returns an Arrow
Table with node properties plus a community_id column.
# Louvain community detection
table = forge.cluster("Person", by="louvain")
df = table.to_pandas()
# See community sizes
print(df.groupby("community_id").size().sort_values(ascending=False))
# Restrict to one relationship type
table = forge.cluster("Person", by="louvain", via="KNOWS")
# Connected components — useful for finding isolated subgraphs
table = forge.cluster("Person", by="components")
Available algorithms for by:
| Value | Description |
|---|---|
| louvain | Modularity-maximising community detection |
| components | Weakly connected components |
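As a rough illustration of the two options, the sketch below finds weakly connected components with a union-find structure and computes the modularity score that Louvain-style methods try to maximise. Both functions and the sample graph are assumptions written for this example. It does not implement the Louvain search itself, and it is not GraphForge's backend.

```python
def connected_components(nodes, edges):
    """Weakly connected components: edge direction is ignored."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = {}    # edges with both endpoints inside the community
    degree_sum = {}  # total degree of the community's nodes
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        degree_sum[ca] = degree_sum.get(ca, 0) + 1
        degree_sum[cb] = degree_sum.get(cb, 0) + 1
        if ca == cb:
            internal[ca] = internal.get(ca, 0) + 1
    return sum(
        internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Two triangles joined by one bridge edge, plus an isolated node "g".
nodes = ["a", "b", "c", "d", "e", "f", "g"]
edges = [("a", "b"), ("b", "c"), ("c", "a"),
         ("d", "e"), ("e", "f"), ("f", "d"), ("c", "d")]

components = connected_components(nodes, edges)
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {n: 0 for n in "abcdef"}

print(sorted(len(c) for c in components))
print(round(modularity(edges, two_groups), 4), round(modularity(edges, one_group), 4))
```

Splitting the graph at the bridge gives a higher modularity than putting every node in one community, which is the kind of improvement a modularity-maximising method searches for.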
Write-back (opt-in)
forge.cluster("Person", by="louvain", write_property="community")
# Now query the community structure with Cypher
forge.execute("""
MATCH (n:Person)
RETURN n.community AS community, count(*) AS size
ORDER BY size DESC LIMIT 5
""")
forge.find(): Search and Retrieval
forge.find() combines full-text search and vector cosine similarity in a single call.
It returns an Arrow Table with node properties plus score and matched_on columns.
The index is built automatically on the first find() call — no explicit indexing step
is required for a standard workflow.
# Text search
table = forge.find("graph neural networks")
df = table.to_pandas()
print(df[["title", "score", "matched_on"]])
# Limit to a label and top-N results
table = forge.find("graph neural networks", label="Paper", limit=20)
# Hybrid text + vector search (bring your own embeddings)
import openai
client = openai.OpenAI()
def embed(text: str) -> list[float]:
    return client.embeddings.create(
        input=text, model="text-embedding-3-small"
    ).data[0].embedding
query_vec = embed("scalable graph representation learning")
table = forge.find("scalable graph learning", label="Paper", vector=query_vec)
# Vector-only search
table = forge.find(vector=query_vec, label="Paper")
Result columns:
| Column | Type | Description |
|---|---|---|
| (node properties) | varies | All properties of the matched node |
| score | float | Combined relevance score |
| matched_on | str | "text", "vector", or "text+vector" |
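One way to picture how a combined score and a matched_on label could arise is sketched below. This page does not document how GraphForge blends the text and vector signals, so the toy keyword score, the equal weighting, and the hybrid_match helper are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def text_score(query, text):
    """Toy keyword score: fraction of query terms that occur in the text."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def hybrid_match(query, query_vec, doc_text, doc_vec, text_weight=0.5):
    """Return (score, matched_on) for one document. The weighting is an assumption."""
    t = text_score(query, doc_text) if query else 0.0
    v = max(cosine_similarity(query_vec, doc_vec), 0.0) if query_vec and doc_vec else 0.0
    if t > 0 and v > 0:
        return text_weight * t + (1 - text_weight) * v, "text+vector"
    if t > 0:
        return t, "text"
    if v > 0:
        return v, "vector"
    return 0.0, None


both = hybrid_match("graph learning", [1.0, 0.0], "Scalable graph learning", [1.0, 0.0])
text_only = hybrid_match("graph learning", None, "Graph databases", None)
vector_only = hybrid_match(None, [3.0, 4.0], "Unrelated title", [3.0, 4.0])
print(both, text_only, vector_only)
```

The label simply records which signals contributed a non-zero score, mirroring the three matched_on values listed in the table.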
Explicit indexing
find() builds its index automatically, but you can control the timing explicitly —
useful when you want to index a large batch before the first search call:
# Index selected properties for text search
forge.index("Paper", properties=["title", "abstract"])
# Store a vector for a specific node
nid = forge.execute(
"MATCH (n:Paper {title: 'GNN Survey'}) RETURN id(n) AS nid"
).column("nid")[0].as_py()
forge.index("Paper", node_id=nid, vector=embed("GNN Survey overview"), space="sbert")
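The trade-off between automatic and explicit indexing can be sketched with a tiny inverted index that builds itself on first use but also exposes a build step. The TinyTextIndex class is hypothetical and exists only to show the timing difference; it says nothing about how GraphForge stores its index.

```python
class TinyTextIndex:
    """Hypothetical inverted index that builds itself on the first search."""

    def __init__(self, documents):
        self._documents = documents  # {doc_id: text}
        self._postings = None        # built lazily, or explicitly via build()
        self.build_count = 0

    def build(self):
        """Explicit indexing: pay the build cost now instead of on the first search."""
        postings = {}
        for doc_id, text in self._documents.items():
            for word in set(text.lower().split()):
                postings.setdefault(word, set()).add(doc_id)
        self._postings = postings
        self.build_count += 1

    def search(self, query):
        if self._postings is None:   # automatic build on first use
            self.build()
        terms = query.lower().split()
        if not terms:
            return set()
        # A document matches only if it contains every query term.
        hits = [self._postings.get(term, set()) for term in terms]
        return set.intersection(*hits)


index = TinyTextIndex({1: "GNN Survey", 2: "Graph neural networks", 3: "Neural networks"})
first = index.search("neural networks")   # triggers the one-off build
second = index.search("survey")           # reuses the existing index
print(first, second, index.build_count)
```

Calling build() before the first search moves the one-off cost to a moment of your choosing, which is the same motivation given above for indexing a large batch up front.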
Using find() results in Cypher
The id column in the result table is a live node ID — pass it to execute() for
follow-up graph traversals:
table = forge.find("graph neural networks", label="Paper", limit=5)
top_id = table.column("id")[0].as_py()
neighbours = forge.execute("""
MATCH (p:Paper)-[:CITES]->(cited:Paper)
WHERE id(p) = $nid
RETURN cited.title AS title, cited.year AS year
ORDER BY cited.year DESC
""", {"nid": top_id})
print(neighbours.to_pandas())
Choosing Between Methods
| Goal | Method |
|---|---|
| Declarative query — patterns, filters, aggregations | forge.execute() |
| Score nodes by graph influence | forge.rank() |
| Group nodes into communities | forge.cluster() |
| Search by keywords or semantic similarity | forge.find() |
| Custom graph algorithms via NetworkX | forge.execute() → Arrow → nx.from_pandas_edgelist() |
Schema Introspection
print(forge.labels()) # ['Author', 'Paper', 'Person']
print(forge.relationship_types()) # ['AUTHORED', 'CITES', 'KNOWS']
print(forge.node_count("Person")) # 42
print(forge.schema()) # Arrow Table — labels, property names, types
Next Steps
- Network Analysis use case — worked examples on the SNAP ego-Facebook dataset
- Knowledge Graph Construction — LangChain integration and MERGE patterns
- API Reference — full method signatures