Quick Start
Get a graph running in five minutes.
Install
pip install graphforge
# or
uv add graphforge
Create and Query a Graph
from graphforge import GraphForge
forge = GraphForge() # in-memory; use GraphForge("my-graph/") for persistence
# Add nodes — returns a NodeHandle
alice = forge.add_node("Person", name="Alice", age=30)
bob = forge.add_node("Person", name="Bob", age=25)
# Add a relationship
forge.add_edge(alice, "KNOWS", bob, since=2020)
# Query with openCypher — returns an Arrow Table
table = forge.execute("""
MATCH (p:Person)-[:KNOWS]->(friend:Person)
WHERE p.age > 25
RETURN p.name AS person, friend.name AS friend, p.age AS age
ORDER BY p.age DESC
""")
# Consume the result
df = table.to_pandas()
print(df)
#   person friend  age
# 0  Alice    Bob   30
forge.execute() always returns a PyArrow Table. Use table.to_pandas() for pandas,
pl.from_arrow(table) for Polars (after import polars as pl), or table.to_pylist() to iterate rows as plain dicts.
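As a rough illustration of the row-wise option: to_pylist() on a PyArrow Table yields a list of dicts, one per row, keyed by column name. The sketch below uses a hand-written stand-in list shaped like the result of the query above (it is not real GraphForge output), so it runs without GraphForge installed. The describe() helper is hypothetical.

```python
# Stand-in for table.to_pylist(): one dict per row, keyed by the RETURN aliases.
# These rows are written by hand for illustration, not produced by GraphForge.
rows = [
    {"person": "Alice", "friend": "Bob", "age": 30},
]


def describe(row):
    """Format one result row as a readable sentence."""
    return f"{row['person']} (age {row['age']}) knows {row['friend']}"


lines = [describe(row) for row in rows]
for line in lines:
    print(line)
```

Plain dicts are convenient for small results; for larger ones, staying in Arrow, pandas, or Polars avoids materializing every row as a Python object.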
Persist a Graph
Pass a directory path to GraphForge() instead of calling it with no arguments. GraphForge stores the graph as Parquet files in that directory and reloads it automatically the next time that path is opened.
forge = GraphForge("research/")
forge.add_node("Paper", title="Graph Neural Networks", year=2024)
forge.close()
# Reload in a later session
forge = GraphForge("research/")
table = forge.execute("MATCH (p:Paper) RETURN p.title AS title")
print(table.column("title")[0].as_py()) # Graph Neural Networks
Bulk Load
For loading many nodes or edges at once, use add_nodes() and add_edges(). Both accept a
list of dicts, a pandas DataFrame, or an Arrow Table.
import pandas as pd
# List of dicts
forge.add_nodes("Paper", [
{"title": "Graph Neural Networks", "year": 2021, "citations": 150},
{"title": "Deep Learning Fundamentals", "year": 2019, "citations": 500},
{"title": "Attention Is All You Need", "year": 2017, "citations": 2000},
])
# From a DataFrame
edges_df = pd.DataFrame({
"src_id": [1, 2],
"dst_id": [3, 3],
"weight": [0.8, 0.6],
})
forge.add_edges("CITES", edges_df, src="src_id", dst="dst_id")
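Source data often starts as CSV rather than as Python literals. The following is a minimal sketch, using only the standard library, of turning CSV text into the list-of-dicts shape shown above. The CSV content and the load_records() helper are hypothetical, and the final add_nodes() call is left as a comment so the snippet runs without GraphForge.

```python
import csv
import io

# Hypothetical CSV content; in practice this would be read from a file.
CSV_TEXT = """title,year,citations
Graph Neural Networks,2021,150
Deep Learning Fundamentals,2019,500
"""


def load_records(text):
    """Parse CSV text into a list of dicts, converting numeric columns to int."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            "title": row["title"],
            "year": int(row["year"]),
            "citations": int(row["citations"]),
        })
    return records


records = load_records(CSV_TEXT)
print(records[0])
# The list has the same shape as the list-of-dicts example above, so it could
# be passed along as: forge.add_nodes("Paper", records)
```

Converting types up front matters because csv.DictReader returns every field as a string; without the int() calls, year and citations would be stored as text.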
Rank Nodes
forge.rank() scores every node of a given label and returns an Arrow Table containing all
node properties plus a score column. No mutation happens unless you pass write_property.
# Read-only — just get the scores back as a table
table = forge.rank("Person", by="pagerank")
df = table.to_pandas()
print(df[["name", "score"]].sort_values("score", ascending=False))
# Restrict to a specific relationship type
table = forge.rank("Person", by="betweenness", via="KNOWS", directed=False)
# Opt-in write-back — stores the score as a node property
forge.rank("Person", by="pagerank", write_property="rank")
forge.execute("MATCH (n:Person) RETURN n.name, n.rank ORDER BY n.rank DESC LIMIT 5")
Supported values for by: pagerank, betweenness, closeness, degree,
clustering_coefficient, triangles.
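To give a feel for what a pagerank score represents, here is a small pure-Python power-iteration sketch on a toy graph. It is an illustration of the general algorithm only, not GraphForge's implementation; the damping factor, iteration count, and dangling-node handling are assumptions chosen for the example, and the pagerank() function here is hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of directed (src, dst) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the teleport share.
        new_scores = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                share = damping * scores[node] / len(targets)
                for dst in targets:
                    new_scores[dst] += share
            else:
                # Dangling node: spread its score evenly over all nodes.
                share = damping * scores[node] / n
                for other in nodes:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Toy KNOWS graph: Alice and Carol both point at Bob, and Bob points at Alice.
edges = [("Alice", "Bob"), ("Carol", "Bob"), ("Bob", "Alice")]
scores = pagerank(edges)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Bob ends up on top because two nodes link to him, and Carol ends up last because nothing links to her. The scores sum to 1, so each one can be read as a share of total importance.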
Find Relevant Content
forge.find() runs a hybrid text + vector search and returns an Arrow Table with node
properties alongside score and matched_on columns. The index is built automatically on
the first call — no setup step required.
# Text search — index built lazily on first call
table = forge.find("graph neural networks")
df = table.to_pandas()
print(df[["title", "score", "matched_on"]])
#                      title  score matched_on
# 0    Graph Neural Networks  0.924       text
# 1  GNN Applications in NLP  0.781       text
# Restrict to a label and limit results
table = forge.find("graph neural networks", label="Paper", limit=20)
# Hybrid search — pass a vector alongside the text query
import openai
client = openai.OpenAI()
query_vec = client.embeddings.create(
input="graph neural networks", model="text-embedding-3-small"
).data[0].embedding
table = forge.find("graph neural networks", label="Paper", vector=query_vec)
# Vector-only search
table = forge.find(vector=query_vec, label="Paper")
matched_on is "text", "vector", or "text+vector". GraphForge stores and queries
vectors but does not generate them — bring your own embeddings from any model.
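Conceptually, vector search compares the query embedding against stored embeddings and ranks by similarity. The sketch below assumes cosine similarity purely for illustration; this page does not say which metric GraphForge uses. The three-dimensional vectors are invented stand-ins for real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional embeddings; real models produce far larger vectors.
stored = {
    "Graph Neural Networks": [0.9, 0.1, 0.0],
    "Deep Learning Fundamentals": [0.5, 0.5, 0.1],
    "Attention Is All You Need": [0.1, 0.9, 0.2],
}
query_vec = [1.0, 0.0, 0.0]

# Rank titles from most to least similar to the query vector.
ranked = sorted(
    stored,
    key=lambda title: cosine_similarity(query_vec, stored[title]),
    reverse=True,
)
print(ranked)
```

Whatever model produces the embeddings, the query vector and the stored vectors need to come from the same model so that they live in the same space and are comparable.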
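For explicit control over index timing (e.g. building the index right after a batch ingestion rather than on the first search), call forge.index() directly:
forge.index("Paper", properties=["title", "abstract"])
# embedding is a vector you computed yourself for node 42
forge.index("Paper", node_id=42, vector=embedding, space="sbert")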
Group into Communities
forge.cluster() assigns every node of a given label to a community and returns an Arrow
Table with node properties plus a community_id column.
# Read-only community detection
table = forge.cluster("Person", by="louvain")
df = table.to_pandas()
print(df.groupby("community_id")["name"].apply(list))
# Restrict to a relationship type and write the result back
forge.cluster("Person", by="louvain", via="KNOWS", write_property="community")
forge.execute("""
MATCH (n:Person)
RETURN n.community AS community, count(*) AS size
ORDER BY size DESC LIMIT 5
""")
Supported values for by: louvain, components.
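Of the two options, components is the simpler to picture: nodes share a community exactly when some path of relationships connects them. The sketch below shows that idea with a small union-find in pure Python. It is an illustration of the general technique, not GraphForge's implementation, and it assumes edges are treated as undirected; the connected_components() function is hypothetical.

```python
def connected_components(nodes, edges):
    """Assign each node a community id via union-find over undirected edges."""
    parent = {node: node for node in nodes}

    def find(node):
        # Walk to the root, halving the path along the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Number the components in order of first appearance.
    ids = {}
    return {node: ids.setdefault(find(node), len(ids)) for node in nodes}


nodes = ["Alice", "Bob", "Carol", "Dave"]
edges = [("Alice", "Bob"), ("Carol", "Dave")]
communities = connected_components(nodes, edges)
print(communities)
```

Louvain differs in that it can split a single connected component into several communities by optimizing modularity, so it usually produces finer-grained groups on well-connected graphs.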
Next Steps
- Tutorial — guided walkthrough with a full citation network example
- Analytics Integration — Arrow, pandas, Polars, rank, cluster, find
- Cypher Reference — complete query language documentation
- Datasets — 100+ real-world networks
- API Reference — full Python API