
Quick Start

Get a graph running in five minutes.


Install

pip install graphforge
# or
uv add graphforge

Create and Query a Graph

from graphforge import GraphForge

forge = GraphForge()   # in-memory; use GraphForge("my-graph/") for persistence

# Add nodes — returns a NodeHandle
alice = forge.add_node("Person", name="Alice", age=30)
bob   = forge.add_node("Person", name="Bob",   age=25)

# Add a relationship
forge.add_edge(alice, "KNOWS", bob, since=2020)

# Query with openCypher — returns an Arrow Table
table = forge.execute("""
    MATCH (p:Person)-[:KNOWS]->(friend:Person)
    WHERE p.age > 25
    RETURN p.name AS person, friend.name AS friend, p.age AS age
    ORDER BY p.age DESC
""")

# Consume the result
df = table.to_pandas()
print(df)
#   person friend  age
# 0  Alice    Bob   30

forge.execute() always returns a PyArrow Table. Convert it with table.to_pandas() for pandas or pl.from_arrow(table) for Polars (after import polars as pl), or call table.to_pylist() to get a list of row dicts.
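To make the pattern semantics concrete, here is a rough plain-Python equivalent of the query above. It runs over a hypothetical dict-and-tuple representation of the same two nodes and one edge, and only illustrates which rows MATCH, WHERE and ORDER BY select; it is not how GraphForge evaluates Cypher. The row dicts it returns have the same shape as the output of table.to_pylist().

```python
# Hypothetical in-memory representation, for illustration only (not GraphForge's storage).
nodes = {
    "alice": {"label": "Person", "name": "Alice", "age": 30},
    "bob": {"label": "Person", "name": "Bob", "age": 25},
}
edges = [("alice", "KNOWS", "bob", {"since": 2020})]


def knows_older_than(min_age):
    """Mimic: MATCH (p:Person)-[:KNOWS]->(friend:Person)
    WHERE p.age > min_age RETURN ... ORDER BY p.age DESC."""
    rows = []
    for src, rel_type, dst, _props in edges:
        p, friend = nodes[src], nodes[dst]
        # Pattern: edge typed KNOWS, both endpoints labelled Person, direction src -> dst
        if rel_type != "KNOWS" or p["label"] != "Person" or friend["label"] != "Person":
            continue
        # WHERE clause
        if p["age"] > min_age:
            rows.append({"person": p["name"], "friend": friend["name"], "age": p["age"]})
    # ORDER BY p.age DESC
    rows.sort(key=lambda r: r["age"], reverse=True)
    return rows


print(knows_older_than(25))
# [{'person': 'Alice', 'friend': 'Bob', 'age': 30}]
```

Bob is excluded as a source because 25 > 25 is false, which is why the Cypher result has a single row.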


Persist a Graph

Pass a directory path to the constructor instead of calling GraphForge() with no arguments. GraphForge stores the graph as Parquet files in that directory and reloads it automatically the next time you open the same path.

forge = GraphForge("research/")
forge.add_node("Paper", title="Graph Neural Networks", year=2024)
forge.close()

# Reload in a later session
forge = GraphForge("research/")
table = forge.execute("MATCH (p:Paper) RETURN p.title AS title")
print(table.column("title")[0].as_py())   # Graph Neural Networks

Bulk Load

For loading many nodes or edges at once, use add_nodes() and add_edges(). Both accept a list of dicts, a pandas DataFrame, or an Arrow Table.

import pandas as pd

# List of dicts
forge.add_nodes("Paper", [
    {"title": "Graph Neural Networks",       "year": 2021, "citations": 150},
    {"title": "Deep Learning Fundamentals",  "year": 2019, "citations": 500},
    {"title": "Attention Is All You Need",   "year": 2017, "citations": 2000},
])

# From a DataFrame: src and dst name the columns holding the source and destination node IDs
edges_df = pd.DataFrame({
    "src_id": [1, 2],
    "dst_id": [3, 3],
    "weight": [0.8, 0.6],
})
forge.add_edges("CITES", edges_df, src="src_id", dst="dst_id")
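Before a large add_edges() call it can be worth checking that every endpoint refers to a node you actually loaded. A minimal stdlib sketch, assuming you already have the set of valid node IDs on hand (how to obtain them from GraphForge is not covered on this page); the helper find_dangling_edges is hypothetical and not part of GraphForge.

```python
from typing import Iterable, List, Mapping


def find_dangling_edges(
    edges: Iterable[Mapping[str, object]],
    known_ids: set,
    src: str = "src_id",
    dst: str = "dst_id",
) -> List[Mapping[str, object]]:
    """Return the edge rows whose source or destination ID is not in known_ids.

    Hypothetical pre-flight check; not a GraphForge API.
    """
    return [row for row in edges if row[src] not in known_ids or row[dst] not in known_ids]


# The same edges as edges_df above, as a list of dicts
# (edges_df.to_dict("records") produces this shape).
edges = [
    {"src_id": 1, "dst_id": 3, "weight": 0.8},
    {"src_id": 2, "dst_id": 3, "weight": 0.6},
]
print(find_dangling_edges(edges, known_ids={1, 2, 3}))  # []
print(find_dangling_edges(edges, known_ids={1, 3}))     # [{'src_id': 2, 'dst_id': 3, 'weight': 0.6}]
```

The src and dst parameters mirror the column-name arguments of add_edges(), so the same column names can be passed to both.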

Rank Nodes

forge.rank() scores every node of a given label and returns an Arrow Table containing all node properties plus a score column. No mutation happens unless you pass write_property.

# Read-only — just get the scores back as a table
table = forge.rank("Person", by="pagerank")
df = table.to_pandas()
print(df[["name", "score"]].sort_values("score", ascending=False))

# Restrict to a specific relationship type
table = forge.rank("Person", by="betweenness", via="KNOWS", directed=False)

# Opt-in write-back — stores the score as a node property
forge.rank("Person", by="pagerank", write_property="rank")
forge.execute("MATCH (n:Person) RETURN n.name, n.rank ORDER BY n.rank DESC LIMIT 5")

Supported values for by: pagerank, betweenness, closeness, degree, clustering_coefficient, triangles.
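For intuition about what by="pagerank" computes, here is textbook power-iteration PageRank in plain Python. It is a sketch of the standard algorithm (damping factor 0.85, rank from nodes without outgoing edges spread evenly over all nodes), not GraphForge's implementation, whose parameters and normalization may differ.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over a directed edge list of (src, dst) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is redistributed uniformly
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        # Each node splits its damped rank equally among its outgoing edges
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank  # scores sum to 1.0


scores = pagerank([("Alice", "Bob"), ("Carol", "Bob"), ("Bob", "Alice")])
print(max(scores, key=scores.get))  # Bob
```

Bob ranks highest because he is the only node with two incoming edges; Carol, with none, keeps only the (1 - damping) / n baseline.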


Find Relevant Content

forge.find() runs a hybrid text + vector search and returns an Arrow Table with node properties alongside score and matched_on columns. The index is built automatically on the first call — no setup step required.

# Text search — index built lazily on first call
table = forge.find("graph neural networks")
df = table.to_pandas()
print(df[["title", "score", "matched_on"]])
#                       title     score matched_on
# 0     Graph Neural Networks  0.924       text
# 1   GNN Applications in NLP  0.781       text

# Restrict to a label and limit results
table = forge.find("graph neural networks", label="Paper", limit=20)

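# Hybrid search: pass a vector alongside the text query.
# This example uses OpenAI embeddings (needs the openai package and an OPENAI_API_KEY environment variable).
import openai
client = openai.OpenAI()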
query_vec = client.embeddings.create(
    input="graph neural networks", model="text-embedding-3-small"
).data[0].embedding

table = forge.find("graph neural networks", label="Paper", vector=query_vec)

# Vector-only search
table = forge.find(vector=query_vec, label="Paper")

matched_on is "text", "vector", or "text+vector". GraphForge stores and queries vectors but does not generate them — bring your own embeddings from any model.
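This page does not say how find() combines text and vector relevance. As an illustration only, one common approach scores each candidate by cosine similarity to the query vector and by a text-match measure, then blends the two. The sketch below assumes a crude term-overlap ratio for text, a blend weight alpha = 0.5, and toy 2-dimensional embeddings; the function names and all of these choices are hypothetical, not GraphForge's ranking. It does derive matched_on with the same three labels the page describes.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def term_overlap(query, text):
    """Fraction of query terms found in text: a crude stand-in for a real text score."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def hybrid_search(docs, query=None, vector=None, alpha=0.5, limit=10):
    """docs: dicts with 'title' and 'embedding'. Returns rows ranked by blended score."""
    results = []
    for doc in docs:
        text_score = term_overlap(query, doc["title"]) if query else 0.0
        # Negative similarity is treated as no vector match
        vec_score = max(0.0, cosine(vector, doc["embedding"])) if vector is not None else 0.0
        matched = [name for name, s in (("text", text_score), ("vector", vec_score)) if s > 0]
        if not matched:
            continue
        if query and vector is not None:
            score = alpha * text_score + (1 - alpha) * vec_score
        else:
            score = text_score if query else vec_score
        results.append({"title": doc["title"], "score": score, "matched_on": "+".join(matched)})
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:limit]


docs = [
    {"title": "Graph Neural Networks", "embedding": [1.0, 0.0]},
    {"title": "Deep Learning Fundamentals", "embedding": [0.6, 0.8]},
    {"title": "Attention Is All You Need", "embedding": [0.0, 1.0]},
]
for row in hybrid_search(docs, "graph neural networks", vector=[1.0, 0.0]):
    print(row["title"], round(row["score"], 3), row["matched_on"])
# Graph Neural Networks 1.0 text+vector
# Deep Learning Fundamentals 0.3 vector
```

A linear blend like this is only one option; reciprocal rank fusion, which combines the two rankings rather than the raw scores, is a common alternative when text and vector scores live on different scales.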

For explicit control over index timing (e.g. batch ingestion before first search):

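# Index the title and abstract properties of all Paper nodes up front
forge.index("Paper", properties=["title", "abstract"])
# Attach a precomputed vector to one node; embedding is a list of floats from your own model
forge.index("Paper", node_id=42, vector=embedding, space="sbert")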

Group into Communities

forge.cluster() assigns every node of a given label to a community and returns an Arrow Table with node properties plus a community_id column.

# Read-only community detection
table = forge.cluster("Person", by="louvain")
df = table.to_pandas()
print(df.groupby("community_id")["name"].apply(list))

# Restrict to a relationship type and write the result back
forge.cluster("Person", by="louvain", via="KNOWS", write_property="community")
forge.execute("""
    MATCH (n:Person)
    RETURN n.community AS community, count(*) AS size
    ORDER BY size DESC LIMIT 5
""")

Supported values for by: louvain, components.
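Assuming by="components" means connected components (the usual reading, though this page does not define it), the idea can be sketched with a union-find over an undirected edge list. This illustrates the algorithm only; GraphForge's own community_id numbering may differ. Louvain, which greedily optimizes modularity, is considerably more involved and is not sketched here.

```python
def connected_components(nodes, edges):
    """Map every node to a community_id; edges are treated as undirected."""
    parent = {node: node for node in nodes}

    def find(x):
        # Walk to the root, halving the path as we go to keep trees shallow
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Union the two endpoints of every edge
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Renumber roots as 0, 1, 2, ... in order of first appearance
    ids = {}
    return {node: ids.setdefault(find(node), len(ids)) for node in nodes}


people = ["Alice", "Bob", "Carol", "Dave", "Eve"]
knows = [("Alice", "Bob"), ("Bob", "Carol"), ("Dave", "Eve")]
print(connected_components(people, knows))
# {'Alice': 0, 'Bob': 0, 'Carol': 0, 'Dave': 1, 'Eve': 1}
```

Nodes with no edges at all each end up in a community of their own.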


Next Steps