
Research: AI Agent Grounding — Ontology-Backed Tool Selection Validation

Issue: #451
Date: 2026-05-04
Branch: docs/451-agent-grounding-research
Scope: Findings-only. No library code changes.


1. Executive Summary

GraphForge's two agent use-case docs (docs/use-cases/agent-grounding.md and docs/use-cases/agent-tool-recall.md) document how to use GraphForge as an ontology-backed tool registry for LLM agents. This research validates every code snippet in both docs, benchmarks tool-lookup query latency at 50/100/500 tool scale, assesses the ergonomics of consuming GraphForge results in agent loops, and analyses how GraphForge compares to tool selection approaches in major agentic frameworks.

Key findings:

  • 20 of 29 tests pass, 5 are partial, and 4 fail (24 doc snippets plus 5 targeted checks). Most failures are doc bugs (wrong usage patterns) rather than engine bugs.
  • Two engine bugs resolved in v0.3.10: (1) two MATCH clauses referencing the same named variable in a filter raised KeyError (S4), fixed in PR #495; (2) RETURN DISTINCT t.name ORDER BY t.name raised UndefinedVariable (S10), fixed in PR #494. Both patterns now work without workarounds.
  • Four doc bugs found: (1) a multi-statement CREATE ... MATCH (t:...) CREATE (t)-... in a single execute() call raises VariableAlreadyBound and must be split into two calls (S12); (2) DEPENDS_ON edge creation uses CREATE (:Tool {name: 'x'}) instead of MATCH, which creates a duplicate node (S13); (3) multi-step planning traverses a REQUIRED_BY relationship that is never created; it should traverse REQUIRES in reverse (S22); (4) the e-commerce example calls ToolRegistry(db=db), but the documented class only accepts db_path (S23).
  • One doc pattern issue: shared DataModel and Capability nodes must be created with MERGE (or created once, then linked via MATCH). The docs create them per tool with CREATE, producing several nodes with the same name that patterns cannot match across (S17, S19, S20, S23).
  • to_dicts() adds at most 0.8% p50 overhead over execute(), and about 0.5% over execute() plus manual .value unwrapping in a 1000-iteration loop. It should be the canonical pattern for agent loops: it removes boilerplate at negligible cost.
  • Query latency is low at typical registry sizes: find_by_capability is 1.7 ms / 2.7 ms / 10.7 ms p50 at 50/100/500 tools. substitutes_jaccard, the most expensive query, stays under 7 ms p50 at 100 tools but reaches 135 ms at 500 tools in its original form; a rewrite that counts the target's capabilities once runs in 8.9 ms at 1000 tools (Section 4).
  • No agentic framework (LangGraph, LlamaIndex, CrewAI, Semantic Kernel, Haystack, AutoGen, OpenAI Swarm) has graph-native tool selection. GraphForge fills a genuine gap, particularly for tool-chain planning via PRODUCES → REQUIRES traversal and for permission filtering, neither of which embedding retrieval can express.
  • EXISTS {} with a correlated WHERE passes the standard case (S16): the NOT EXISTS { MATCH ... WHERE NOT dm.name IN $available } pattern works correctly with parameter binding.
  • LlamaIndex is the best integration target (ObjectRetriever + GraphStore interface). LangGraph has the largest community but requires the most custom code.

5 issues recommended.


2. Code Snippet Pass/Fail Matrix

All snippets from both use-case docs were run against main as of 2026-05-04.

Validation methodology: Each snippet was extracted and run against a fresh GraphForge() instance. For snippets that use state built by earlier snippets, a fresh context graph was set up before each test.

agent-grounding.md (S1–S12)

| ID | Section | Snippet (abbreviated) | Status | Root cause |
|---|---|---|---|---|
| S1 | Ontology | CREATE (:Class {name: 'Entity'}) × 4 + IS_A | ✅ PASS | |
| S2 | Tools | CREATE (t:Tool ...), link to Class + Capability | ✅ PASS | |
| S3 | Query | MATCH (t:Tool)-[:CAN_DO]->(c) + collect(c.name) | ⚠️ PARTIAL | dict(r) returns CypherValue objects, but the doc comment implies plain str |
| S4 | Hierarchy | MATCH (c:Class)-[:IS_A*0..]->(parent) + MATCH (t:Tool)-[:OPERATES_ON]->(parent) + RETURN DISTINCT | ✅ PASS | Engine bug resolved in v0.3.10 (PR #495): variable c no longer leaks across the two MATCH clauses; the WITH parent workaround remains valid style |
| S5 | Metadata | OPTIONAL MATCH + collect(DISTINCT {map literal}) | ✅ PASS | |
| S6 | LangChain | Cypher query MATCH (t:Tool) RETURN name/description/endpoint | ⚠️ PARTIAL | Query works; the imports from langchain.agents import Tool, from langchain.agents import initialize_agent and from langchain.llms import OpenAI are deprecated legacy API paths (removed or relocated in LangChain 0.2+) |
| S7 | LlamaIndex | MATCH (t:Tool)-[:HAS_PARAMETER]->(p) + collect({map}) | ✅ PASS | Query passes; the imports from llama_index import GPTSimpleVectorIndex and from llama_index.tools import FunctionTool are deprecated pre-v0.10 paths (the current llama_index.core.* paths require llama-index-core>=0.10) |
| S8 | Agent class | Intent query WHERE c.name CONTAINS $intent OR t.description CONTAINS $intent | ✅ PASS | |
| S9 | Multi-step | MATCH (t1)-[:PRODUCES]->(concept)<-[:REQUIRES]-(t2) | ✅ PASS | |
| S10 | Contextual | MATCH (c:Class)<-[:IS_A*0..]-(entity) WHERE entity.name IN $entities + MATCH (t:Tool)-[:OPERATES_ON]->(c) | ✅ PASS | Engine bug resolved in v0.3.10 (PR #494): RETURN DISTINCT ... ORDER BY t.name no longer raises UndefinedVariable; both ORDER BY t.name and ORDER BY on the tool alias work |
| S11 | Permissions | CREATE (:Role) + MATCH ... CREATE (:CAN_USE) + permission filter query | ✅ PASS | |
| S12 | Quick-start | CREATE (t:Tool ...) then MATCH (t:Tool ...), (i:Class ...) + CREATE (t)-[:OPERATES_ON]->(i), combined in one execute() | ❌ FAIL | Doc bug: t is bound by CREATE and re-declared in a MATCH within the same execute() call, raising VariableAlreadyBound; split into two execute() calls |

agent-tool-recall.md (S13–S23)

| ID | Section | Snippet (abbreviated) | Status | Root cause |
|---|---|---|---|---|
| S13 | Build graph | 3-tool graph with full schema (7 node types, 10 edge types) | ⚠️ PARTIAL | Doc bug: the DEPENDS_ON edge uses CREATE (:Tool {name: 'check_inventory'}) instead of MATCH, producing a duplicate Tool node (4 Tool nodes instead of 3) |
| S14 | Versioning | SUPERSEDES edge creation with SET old.deprecated = true | ✅ PASS | The doc assumes the v1/v2 tools already exist; if they don't, the snippet is a silent no-op |
| S15 | Intent query | max(1.0) AS confidence, ORDER BY t.latency_ms ASC | ✅ PASS | NULL CONTAINS 'x' is not true, so WHERE filters the row (correct); the query references cap.description, which the test data lacks, and handles it gracefully |
| S16 | Executable | NOT EXISTS { MATCH (t)-[:REQUIRES]->(dm) WHERE NOT dm.name IN $available } | ✅ PASS | Expected to be broken (#474) but passes: correlated EXISTS with a parameterized list works |
| S17 | Tool chains | MATCH (t1)-[:PRODUCES]->(dm)<-[:REQUIRES]-(t2) | ❌ FAIL | Doc pattern issue: each tool's inline CREATE (:DataModel {name: 'Product'}) makes its own Product node, so the pattern cannot match across tools; fix with MERGE or pre-created shared nodes |
| S18 | Permissions | MATCH (t:Tool)-[:REQUIRES_PERMISSION]->(r:Role) + OPTIONAL MATCH ... collect(cap.name) | ✅ PASS | |
| S19 | Jaccard | Multi-MATCH aggregation over shared capabilities | ⚠️ PARTIAL | Executes without error but returns 0 rows: same node identity problem as S17 (verify_stock's query_stock Capability is a separate node from check_inventory's); fix with MERGE for shared nodes |
| S20 | ToolRegistry | Full ToolRegistry class with 5 methods | ❌ FAIL | next_tools() returns an empty list (S17 issue: search_products and check_inventory point to different :DataModel {name: 'Product'} nodes); all other methods pass individually |
| S21 | Retrieval+rank | Python orchestration (Stages 1–4) using ToolRegistry | ✅ PASS | Works when ToolRegistry is backed by a correctly linked graph |
| S22 | Planning | MATCH path = (start:DataModel)-[:REQUIRED_BY*1..4]->(t:Tool) + all() predicate | ⚠️ PARTIAL | Executes without error and all() over nodes(path) works, but returns 0 rows: doc bug, REQUIRED_BY is never created (the doc only creates REQUIRES edges) |
| S22b | Explain | collect(cap.name) → explain string | ✅ PASS | |
| S23 | E-commerce | Full pipeline with ToolRegistry(db=db) | ❌ FAIL | Two issues: (1) API mismatch, the ToolRegistry class accepts db_path but not a db= keyword; (2) next_tools() fails due to the S17 node identity issue |

Additional targeted tests

| ID | Feature | Status | Note |
|---|---|---|---|
| ADD-1 | Multi-statement CREATE + MATCH + CREATE | ✅ PASS | Multiple MATCH/CREATE clauses in one execute() work as long as no variable bound by a CREATE is re-declared in a later MATCH pattern |
| ADD-2 | to_dicts() vs execute() CypherValue wrapping | ✅ PASS | to_dicts() returns list[dict[str, Any]] with unwrapped Python values; execute() returns list[dict[str, CypherValue]] |
| ADD-3 | toLower() in WHERE | ✅ PASS | |
| ADD-4 | LIMIT $limit parameter | ✅ PASS | |
| ADD-5 | WITH DISTINCT t deduplication | ✅ PASS | Multi-edge match + WITH DISTINCT correctly deduplicates |

Summary: 20 PASS / 5 PARTIAL / 4 FAIL across 29 tests. (S4 and S10 resolved in v0.3.10.)
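The summary line can be recomputed from the Status column so it stays in sync as snippets get fixed. A minimal sketch, with the statuses transcribed by hand from the three tables above (the STATUS mapping and summarize() are illustrative, not part of any test harness in the repo):

```python
from collections import Counter

# Status column transcribed from the three tables in this section.
STATUS = {
    **dict.fromkeys(["S1", "S2", "S4", "S5", "S7", "S8", "S9", "S10", "S11"], "PASS"),
    **dict.fromkeys(["S14", "S15", "S16", "S18", "S21", "S22b"], "PASS"),
    **dict.fromkeys([f"ADD-{i}" for i in range(1, 6)], "PASS"),
    **dict.fromkeys(["S3", "S6", "S13", "S19", "S22"], "PARTIAL"),
    **dict.fromkeys(["S12", "S17", "S20", "S23"], "FAIL"),
}


def summarize(status: dict) -> str:
    """Render the summary line from a {test_id: status} mapping."""
    counts = Counter(status.values())
    return (f"{counts['PASS']} PASS / {counts['PARTIAL']} PARTIAL / "
            f"{counts['FAIL']} FAIL across {len(status)} tests")


print(summarize(STATUS))  # 20 PASS / 5 PARTIAL / 4 FAIL across 29 tests
```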


3. Root Cause Analysis

Engine Bug 1: Two-MATCH variable leak (S4, S10) ✅ Resolved in v0.3.10 (PR #495)

Resolution: The variable reuse across WITH boundary was fixed in PR #495. The patterns below now work without workarounds.

Original pattern that raised KeyError: 'c':

gf.execute("""
    MATCH (c:Class {name: $entity_class})-[:IS_A*0..]->(parent:Class)
    MATCH (t:Tool)-[:OPERATES_ON]->(parent)
    RETURN DISTINCT t.name AS tool,
           t.description AS description,
           parent.name AS operates_on
""", {'entity_class': 'Product'})

Workaround (still valid style, but no longer required): Introduce a WITH projection between the two MATCH clauses to make scope explicit:

gf.execute("""
    MATCH (c:Class)-[:IS_A*0..]->(parent:Class)
    WHERE c.name = $entity_class
    WITH parent
    MATCH (t:Tool)-[:OPERATES_ON]->(parent)
    RETURN DISTINCT t.name AS tool, parent.name AS operates_on
""", {'entity_class': 'Product'})

Engine Bug 2: RETURN DISTINCT ORDER BY non-projected variable (S10) ✅ Resolved in v0.3.10 (PR #494)

Resolution: ORDER BY on the original variable after RETURN DISTINCT was fixed in PR #494. Both of the patterns below now work.

Original pattern that raised UndefinedVariable:

# Now works — t.name is derivable from the RETURN alias
gf.execute("""
    MATCH (t:Tool)-[:OPERATES_ON]->(c)
    RETURN DISTINCT t.name AS tool ORDER BY t.name
""")

Alias-based ORDER BY also still works:

RETURN DISTINCT t.name AS tool ORDER BY tool

Doc Bug 1: VariableAlreadyBound in combined CREATE+MATCH (S12)

Pattern that fails:

# VariableAlreadyBound: Variable `t` already bound
gf.execute("""
    CREATE (t:Tool {name: 'check_stock', description: 'Check product stock'})
    MATCH (t:Tool {name: 'check_stock'}), (i:Class {name: 'Inventory'})
    CREATE (t)-[:OPERATES_ON]->(i)
""")

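t is bound by the CREATE pattern, and the MATCH that follows re-declares it (with a label and property map) in the same statement, so GraphForge raises VariableAlreadyBound. openCypher would also reject this statement, for a more basic reason: its grammar requires a WITH between an updating clause such as CREATE and a following reading clause such as MATCH. The doc should split these into two execute() calls: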

gf.execute("CREATE (:Tool {name: 'check_stock', description: 'Check product stock'})")
gf.execute("""
    MATCH (t:Tool {name: 'check_stock'}), (i:Class {name: 'Inventory'})
    CREATE (t)-[:OPERATES_ON]->(i)
""")

Doc Bug 2: CREATE (:Tool) inside DEPENDS_ON creates duplicate node (S13)

Pattern that creates a duplicate:

db.execute("""
    CREATE (t:Tool {name: 'place_order', ...})
    CREATE (t)-[:DEPENDS_ON {type: 'required'}]->(:Tool {name: 'check_inventory'})
""")

:Tool {name: 'check_inventory'} was already created in a prior execute(). Using CREATE here creates a new, separate node with that name. The fix is MATCH for the target:

db.execute("MATCH (t:Tool {name: 'place_order'}), (dep:Tool {name: 'check_inventory'}) CREATE (t)-[:DEPENDS_ON {type: 'required'}]->(dep)")
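A post-build check catches this class of bug early: list node names and flag repeats. For S13, it would report check_inventory twice. The same check applies to the shared DataModel and Capability nodes. A minimal sketch; duplicate_names() is a hypothetical helper, and the query uses only patterns already shown in this report:

```python
from collections import Counter

TOOL_NAMES = "MATCH (t:Tool) RETURN t.name AS name"


def duplicate_names(rows: list) -> dict:
    """Return {name: count} for names that appear on more than one row."""
    counts = Counter(row["name"] for row in rows)
    return {name: n for name, n in counts.items() if n > 1}


# Usage: duplicate_names(db.to_dicts(TOOL_NAMES))
# Illustrative rows shaped like the S13 result (4 Tool nodes, one name repeated):
rows = [{"name": "search_products"}, {"name": "check_inventory"},
        {"name": "place_order"}, {"name": "check_inventory"}]
print(duplicate_names(rows))  # {'check_inventory': 2}
```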

Doc Bug 3: Shared nodes need MERGE, not CREATE (S17, S19, S20, S23)

The PRODUCES and REQUIRES chain queries depend on DataModel (and Capability) nodes being shared across tools. But the docs create them inline:

CREATE (t)-[:CAN_DO]->(:Capability {name: 'query_stock', category: 'read'})

This creates a new Capability node for each tool, even if another tool already has one with the same name. Queries like MATCH (t1)-[:PRODUCES]->(dm)<-[:REQUIRES]-(t2) then match zero rows because t1 points to dm node #3 and t2 points to dm node #4 (both named 'Product' but different nodes).

Fix: Use MERGE for shared reference nodes, or create them once and link via MATCH:

# Create shared nodes once
db.execute("CREATE (:DataModel {name: 'Product'})")
db.execute("CREATE (:Capability {name: 'query_stock', category: 'read'})")

# Then link via MATCH
db.execute("""
    MATCH (t:Tool {name: 'search_products'}), (dm:DataModel {name: 'Product'})
    CREATE (t)-[:PRODUCES]->(dm)
""")
db.execute("""
    MATCH (t:Tool {name: 'check_inventory'}), (dm:DataModel {name: 'Product'})
    CREATE (t)-[:REQUIRES]->(dm)
""")

Doc Bug 4: REQUIRED_BY relationship never created (S22)

The multi-step planning snippet queries (start:DataModel)-[:REQUIRED_BY*1..4]->(:Tool), but the tool graph only has REQUIRES edges, in the direction (Tool)-[:REQUIRES]->(DataModel). REQUIRED_BY is the logical inverse but is never created. Either:

  • Create REQUIRED_BY edges: MATCH (t)-[:REQUIRES]->(dm) CREATE (dm)-[:REQUIRED_BY]->(t). This duplicates every REQUIRES edge.
  • Traverse REQUIRES in the reverse direction: (start:DataModel)<-[:REQUIRES*1..4]-(t:Tool).
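The schema only has (Tool)-[:REQUIRES]->(DataModel) and (Tool)-[:PRODUCES]->(DataModel) edges. A variable-length pattern over reversed REQUIRES alone, or over a mirrored REQUIRED_BY, therefore cannot go past one hop, because no REQUIRES edge ends at a Tool. A multi-step plan has to alternate the two edge types: DataModel <-[:REQUIRES]- Tool -[:PRODUCES]-> DataModel. The sketch below does that walk breadth-first in plain Python. The requires/produces dictionaries are hypothetical inputs (they could be filled from to_dicts() rows), and the tool names are illustrative:

```python
from collections import deque


def reachable_tools(start_dm: str, requires: dict, produces: dict, max_depth: int = 4) -> dict:
    """Walk DataModel <-[:REQUIRES]- Tool -[:PRODUCES]-> DataModel ... breadth-first.

    requires / produces map tool name -> set of data model names.
    Returns {tool: depth}; depth 1 means the tool directly requires start_dm.
    """
    # Invert REQUIRES once: data model -> tools that need it (what REQUIRED_BY would store).
    required_by = {}
    for tool, dms in requires.items():
        for dm in dms:
            required_by.setdefault(dm, set()).add(tool)

    depth_of = {}
    seen_dms = {start_dm}
    frontier = deque([(start_dm, 0)])
    while frontier:
        dm, depth = frontier.popleft()
        if depth >= max_depth:
            continue
        for tool in sorted(required_by.get(dm, ())):
            if tool in depth_of:
                continue  # already reached by a chain at least as short
            depth_of[tool] = depth + 1
            for out in produces.get(tool, ()):
                if out not in seen_dms:
                    seen_dms.add(out)
                    frontier.append((out, depth + 1))
    return depth_of


requires = {"check_inventory": {"Product"}, "place_order": {"Product"}, "ship_order": {"Order"}}
produces = {"place_order": {"Order"}, "ship_order": {"Shipment"}}
print(reachable_tools("Product", requires, produces))
# {'check_inventory': 1, 'place_order': 1, 'ship_order': 2}
```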

Doc Bug 5: ToolRegistry(db=db) keyword not in class definition (S23)

The e-commerce example passes an existing GraphForge instance as ToolRegistry(db=db), but the ToolRegistry class definition only accepts db_path:

class ToolRegistry:
    def __init__(self, db_path=None):  # No 'db' parameter!
        self.db = GraphForge(db_path) if db_path else GraphForge()

The fix is to accept an optional db keyword:

def __init__(self, db_path=None, db=None):
    # Compare with None so a caller-supplied handle is always kept, even if it is falsy.
    self.db = db if db is not None else (GraphForge(db_path) if db_path else GraphForge())
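Selecting the handle with db if db else ... relies on truthiness, so a caller-supplied handle that happens to be falsy would be silently replaced by a new instance. This report does not establish whether GraphForge defines __len__ or __bool__. The SizedGraph stub below is hypothetical and only illustrates why an explicit is not None check is the safer default:

```python
class SizedGraph:
    """Hypothetical graph handle whose __len__ is its node count (empty means falsy)."""

    def __init__(self, path=None):
        self.path = path
        self.nodes = []

    def __len__(self):
        return len(self.nodes)


def pick_db_truthy(db=None, db_path=None, factory=SizedGraph):
    # Truthiness test: an empty (falsy) handle is silently replaced.
    return db if db else (factory(db_path) if db_path else factory())


def pick_db_explicit(db=None, db_path=None, factory=SizedGraph):
    # Identity test: any handle the caller passes is kept.
    return db if db is not None else (factory(db_path) if db_path else factory())


empty = SizedGraph()
print(pick_db_truthy(db=empty) is empty)    # False
print(pick_db_explicit(db=empty) is empty)  # True
```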


4. Tool Registry Benchmark

Methodology: Synthetic registries built with shared DataModel, Capability, Role, and Domain nodes (created once, linked via MATCH). Each tool has 1–3 capabilities, 0–2 data model requirements, 1 produces link, 1 role permission, and 1 domain. ~5% of tools marked deprecated. 20 iterations per query, reporting p50 and p95 wall-clock milliseconds.
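The timing methodology (20 iterations, p50 and p95 wall-clock milliseconds) can be reproduced with the standard library. A minimal sketch, assuming each query is wrapped in a zero-argument callable such as lambda: registry.find_by_capability('stock'). The optional warm-up runs are an assumption; the methodology above does not mention them:

```python
import statistics
import time
from typing import Callable


def p50_p95_ms(run_query: Callable[[], object], iterations: int = 20,
               warmup: int = 0) -> tuple:
    """Time run_query() `iterations` times and return (p50, p95) in milliseconds."""
    for _ in range(warmup):  # optional warm-up runs, excluded from the samples
        run_query()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns the 99 percentile cut points; index 94 is the 95th.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return statistics.median(samples), cuts[94]


# Usage: p50, p95 = p50_p95_ms(lambda: registry.find_by_capability("stock"))
```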

Registry Characteristics

| Scale | Tool nodes | Total nodes | Build time |
|---|---|---|---|
| 50 tools | 50 | 94 | 0.33 s |
| 100 tools | 100 | 144 | 0.27 s |
| 500 tools | 500 | 544 | 2.30 s |
| 1000 tools | 1000 | 1044 | ~4.5 s |

Infrastructure nodes: 8 domains + 20 capabilities + 12 data models + 4 roles = 44 nodes (shared across all tools).

Query Latency (p50 / p95, milliseconds)

| Query | 50 tools | 100 tools | 500 tools | 1000 tools |
|---|---|---|---|---|
| find_by_capability | 1.74 / 6.52 | 2.74 / 2.99 | 10.68 / 11.14 | 20.51 / 28.23 |
| get_prerequisites | 0.32 / 0.38 | 0.40 / 0.57 | 0.99 / 1.14 | 1.81 / 2.14 |
| next_tools | 0.59 / 0.84 | 0.71 / 0.85 | 1.72 / 1.81 | 3.04 / 3.31 |
| authorized_tools | 0.59 / 0.76 | 0.82 / 0.99 | 2.62 / 2.81 | 4.92 / 5.40 |
| substitutes_jaccard (original) | 2.78 / 3.76 | 6.03 / 12.46 | 135 / 144 | 226 / 248 |
| substitutes_jaccard (optimized) | – | – | – | 8.9 / 9.4 |
| SIMILAR_TO lookup (pre-computed) | – | – | – | 3.3 / 3.5 |

Jaccard Deep-Dive: Root Cause and Solutions

The original Jaccard query degrades severely at scale. A 1000-tool registry with typical density (1–3 capabilities per tool, 20 shared capabilities) gives 110 candidate tools sharing at least one capability with a given tool. This causes:

Phase 1 (find tools sharing a capability): 5 ms — O(candidates)
Phase 2 (count target tool's capabilities): 221 ms — O(candidates × target_cap_count) — this is the bottleneck
Phase 3 (count each alt's capabilities): folded into phase 2

Phase 2 executes the MATCH (t2:Tool {name: tname})-[:CAN_DO]->(tc) scan 110 times (once per candidate row), because target_total is not computed before entering the cross-join. This is an N×M scan disguised as three sequential WITH clauses.

Fix: pre-compute target_total before the cross-join:

// Original: target re-scanned once per candidate (O(N×M))
MATCH (t1:Tool {name: $name})-[:CAN_DO]->(shared)<-[:CAN_DO]-(alt:Tool) ...
WITH alt, count(shared) AS shared_caps, $name AS tname
MATCH (t2:Tool {name: tname})-[:CAN_DO]->(tc)    // runs once per candidate row (N times)
...

// Optimized: target scanned once, result passed forward (O(N+M))
MATCH (target:Tool {name: $name})-[:CAN_DO]->(tc:Capability)
WITH count(tc) AS target_total, $name AS tname    // runs once
MATCH (t1:Tool {name: tname})-[:CAN_DO]->(shared)<-[:CAN_DO]-(alt:Tool) ...

Benchmark result: 226 ms → 8.9 ms — 25× speedup at 1000 tools with identical results.
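Checking that a rewritten query returns the same rankings is easier with an independent reference. The sketch below computes the same score, shared / (target_total + alt_total - shared), directly from Python sets. caps_by_tool is a hypothetical {tool: set_of_capability_names} map that could be built from a MATCH (t:Tool)-[:CAN_DO]->(c) RETURN t.name, c.name query. Like the Cypher, it only considers alternatives that share at least one capability. Unlike the Cypher, it breaks score ties by name so the output is deterministic:

```python
def jaccard_substitutes(caps_by_tool: dict, name: str, k: int = 3,
                        deprecated: frozenset = frozenset()) -> list:
    """Rank alternatives to `name` by Jaccard similarity of capability sets."""
    target = caps_by_tool[name]
    scored = []
    for alt, caps in caps_by_tool.items():
        if alt == name or alt in deprecated:
            continue
        shared = len(target & caps)
        if shared == 0:
            continue  # the Cypher only reaches alternatives sharing a capability
        # Same formula as the query: shared / (target_total + alt_total - shared)
        scored.append((alt, shared / (len(target) + len(caps) - shared)))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # score desc, then name
    return scored[:k]


caps = {"a": {"x", "y"}, "b": {"x", "y", "z"}, "c": {"z"}, "d": {"y"}}
print(jaccard_substitutes(caps, "a"))  # [('b', 0.666...), ('d', 0.5)]
```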

Pre-computed SIMILAR_TO edges reduce hot-path lookup to 3.3 ms (2.7× faster than optimized on-demand). The pre-compute step takes 5 s to find all 76 000 pairs above Jaccard 0.25, plus ~300 s to write edges individually. With bulk ingest via UNWIND, the write step drops to ~3 s. Appropriate for: agent startup, nightly refresh, or on-demand when new tools are added.
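The pre-compute step splits naturally into a pure-Python scoring pass and one bulk UNWIND write. A sketch under stated assumptions: similar_to_rows() is a hypothetical helper, and the UNWIND statement assumes GraphForge accepts a list-of-maps parameter with property access such as p.a. At 1000 tools, the scoring pass compares 1000 × 999 / 2 = 499,500 pairs:

```python
from itertools import combinations

# Bulk write for the precomputed edges (assumes list-of-maps parameters in UNWIND).
SIMILAR_TO_UNWIND = """
UNWIND $pairs AS p
MATCH (a:Tool {name: p.a}), (b:Tool {name: p.b})
CREATE (a)-[:SIMILAR_TO {score: p.score}]->(b)
"""


def similar_to_rows(caps_by_tool: dict, threshold: float = 0.25) -> list:
    """One row per unordered tool pair whose Jaccard score is above `threshold`."""
    rows = []
    for a, b in combinations(sorted(caps_by_tool), 2):
        ca, cb = caps_by_tool[a], caps_by_tool[b]
        union = len(ca | cb)
        if union == 0:
            continue
        score = len(ca & cb) / union  # equals shared / (|A| + |B| - shared)
        if score > threshold:
            rows.append({"a": a, "b": b, "score": round(score, 4)})
    return rows


# Usage: db.execute(SIMILAR_TO_UNWIND, {"pairs": similar_to_rows(caps_by_tool)})
```

Each unordered pair gets a single edge, so hot-path lookups should match it undirected, e.g. MATCH (t:Tool {name: $name})-[s:SIMILAR_TO]-(alt).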

Recommendation by use case:

  • Hot-path agent loop (< 200 tools): optimized on-demand Jaccard query (< 6 ms)
  • Hot-path agent loop (200–1000 tools): pre-computed SIMILAR_TO edges, refreshed on tool changes
  • Offline analysis / admin UI: either approach

The correct find_by_capability query (no two-MATCH variable leak, WITH DISTINCT dedup):

def find_by_capability(self, capability: str, limit: int = 5) -> list[dict]:
    return self.db.to_dicts("""
        MATCH (t:Tool)-[:CAN_DO]->(cap:Capability)
        WHERE toLower(cap.name) CONTAINS toLower($cap)
           OR toLower(cap.description) CONTAINS toLower($cap)
           OR toLower(t.description) CONTAINS toLower($cap)
        WITH DISTINCT t
        WHERE t.deprecated = false
        RETURN t.name AS name, t.description AS description,
               t.endpoint AS endpoint, t.latency_ms AS latency_ms
        ORDER BY t.latency_ms ASC
        LIMIT $limit
    """, {'cap': capability, 'limit': limit})

The optimized Jaccard query (pre-computes target_total once, 25× faster at 1000 tools):

MATCH (target:Tool {name: $name})-[:CAN_DO]->(tc:Capability)
WITH count(tc) AS target_total, $name AS tname
MATCH (t1:Tool {name: tname})-[:CAN_DO]->(shared:Capability)<-[:CAN_DO]-(alt:Tool)
WHERE alt.name <> tname AND alt.deprecated = false
WITH alt, count(shared) AS shared_caps, target_total
MATCH (alt)-[:CAN_DO]->(ac:Capability)
WITH alt, shared_caps, target_total, count(ac) AS alt_total
WITH alt, toFloat(shared_caps) / (target_total + alt_total - shared_caps) AS jaccard
RETURN alt.name AS name, jaccard AS similarity
ORDER BY jaccard DESC
LIMIT $k


5. Ergonomics Assessment

execute() vs to_dicts() for Agent Loops

Overhead measurement (find_by_capability query, 5-result set, p50):

| Scale | execute() p50 (ms) | to_dicts() p50 (ms) | Overhead |
|---|---|---|---|
| 50 tools | 1.711 | 1.720 | +0.5% |
| 100 tools | 2.725 | 2.748 | +0.8% |
| 500 tools | 10.897 | 10.918 | +0.2% |
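The Overhead column is the relative difference (to_dicts() - execute()) / execute(). Recomputing it from the p50 values in the table:

```python
def overhead_pct(baseline_ms: float, candidate_ms: float) -> float:
    """Relative overhead of candidate over baseline, in percent."""
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0


# (scale, execute() p50 ms, to_dicts() p50 ms), copied from the table above
P50_MS = [("50 tools", 1.711, 1.720), ("100 tools", 2.725, 2.748), ("500 tools", 10.897, 10.918)]

for scale, execute_ms, to_dicts_ms in P50_MS:
    print(f"{scale}: +{overhead_pct(execute_ms, to_dicts_ms):.1f}%")
# 50 tools: +0.5%
# 100 tools: +0.8%
# 500 tools: +0.2%
```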

Tight-loop measurement (1000 iterations, find_by_capability query):

| Method | Total (ms) | Per-iteration (ms) |
|---|---|---|
| execute() + manual .value × 3 per row | 277.7 | 0.278 |
| to_dicts() auto-unwrap | 279.0 | 0.279 |

Conclusion: to_dicts() has effectively zero overhead. It should be the canonical pattern for agent code. The .value access pattern shown in agent-tool-recall.md (10+ occurrences) should be replaced.

CypherValue Leakage in execute() Results

The agent-grounding.md doc uses dict(r) and writes:

# Returns: [{'tool': 'check_inventory', 'description': '...', 'capabilities': [...]}]

But dict(r) returns {'tool': CypherString('check_inventory'), ...}. The comment implies plain Python types. This is a documentation accuracy issue — either update the comment or switch to to_dicts().

In agent-tool-recall.md, the ToolRegistry class explicitly unwraps with .value:

return [{'name': r['name'].value, 'description': r['description'].value,
         'endpoint': r['endpoint'].value} for r in rows]

This is correct but verbose. With to_dicts():

return self.db.to_dicts("MATCH ...", params)
# Returns [{'name': 'check_inventory', 'description': '...', 'endpoint': '...'}] directly
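Code that must stay on execute() can approximate what to_dicts() returns with a small recursive helper. This is a sketch under assumptions: FakeCypherValue stands in for GraphForge's wrapper types, and the helper assumes every wrapper exposes .value and that values inside collected lists or maps are wrapped too. to_dicts() remains the better choice:

```python
from typing import Any


class FakeCypherValue:
    """Stand-in for GraphForge's CypherValue wrappers (assumed to expose .value)."""

    def __init__(self, value: Any):
        self.value = value


def unwrap(v: Any) -> Any:
    """Strip a .value wrapper, then recurse into lists and maps."""
    if hasattr(v, "value"):
        v = v.value
    if isinstance(v, dict):
        return {k: unwrap(x) for k, x in v.items()}
    if isinstance(v, (list, tuple)):
        return [unwrap(x) for x in v]
    return v


def rows_to_dicts(rows) -> list:
    """Convert execute()-style rows into plain Python dicts."""
    return [{k: unwrap(v) for k, v in dict(r).items()} for r in rows]


rows = [{"tool": FakeCypherValue("check_inventory"),
         "capabilities": FakeCypherValue([FakeCypherValue("query_stock")])}]
print(rows_to_dicts(rows))  # [{'tool': 'check_inventory', 'capabilities': ['query_stock']}]
```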

Framework Integration Boilerplate

Converting a GraphForge to_dicts() result into each framework's tool format:

LangGraph (LangChain core):

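from langchain_core.tools import StructuredTool

def make_tool_fn(endpoint: str):
    # Bind the endpoint in a closure so it is not an LLM-visible argument.
    # call_api is your own HTTP/RPC helper (not defined here).
    def tool_fn(**kwargs):
        return call_api(endpoint, kwargs)
    return tool_fn

def gf_to_langchain(tool_data: list[dict]) -> list:
    return [
        StructuredTool.from_function(
            func=make_tool_fn(t['endpoint']),
            name=t['name'],
            description=t['description'],
            # Add args_schema=... (e.g. built from HAS_PARAMETER rows) to expose real parameters.
        )
        for t in tool_data
    ]

# Usage in a LangGraph node (gf and llm assumed; state assumed to carry an intent keyword):
def agent_node(state):
    tools_data = gf.to_dicts("""
        MATCH (t:Tool)-[:CAN_DO]->(c:Capability)
        WHERE c.name CONTAINS $intent AND t.deprecated = false
        WITH DISTINCT t
        RETURN t.name AS name, t.description AS description, t.endpoint AS endpoint
    """, {'intent': state['intent']})
    llm_with_tools = llm.bind_tools(gf_to_langchain(tools_data))
    return {'messages': [llm_with_tools.invoke(state['messages'])]}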

LlamaIndex:

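from llama_index.core.tools import FunctionTool

def make_tool_fn(endpoint: str):
    # Closure keeps the endpoint out of the signature FunctionTool inspects for fn_schema.
    # call_api is your own HTTP/RPC helper (not defined here).
    def tool_fn(**kwargs):
        return call_api(endpoint, kwargs)
    return tool_fn

def gf_to_llamaindex(tool_data: list[dict]) -> list:
    return [
        FunctionTool.from_defaults(
            fn=make_tool_fn(t['endpoint']),
            name=t['name'],
            description=t['description'],
        )
        for t in tool_data
    ]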

Semantic Kernel:

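from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior

# SK maps cleanly: Domain -> Plugin, Tool -> KernelFunction
# Use the graph to populate FunctionChoiceBehavior filters:
def get_sk_filter(gf, user_role: str) -> dict:
    tools = gf.to_dicts("""
        MATCH (t:Tool)-[:REQUIRES_PERMISSION]->(r:Role {name: $role})
        WHERE t.deprecated = false
        RETURN t.name AS name
    """, {'role': user_role})
    # SK compares included_functions with fully qualified names ("<plugin>-<function>"),
    # so prefix each name with its plugin name if the functions live in a plugin.
    return {"included_functions": [t['name'] for t in tools]}

# settings: your connector's execution settings (e.g. OpenAIChatPromptExecutionSettings)
settings.function_choice_behavior = FunctionChoiceBehavior.Auto(
    filters=get_sk_filter(gf, 'customer')
)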

Haystack:

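from haystack import component
from haystack.tools import Tool

def make_tool_fn(endpoint: str):
    # call_api is your own HTTP/RPC helper (not defined here).
    def tool_fn(**kwargs):
        return call_api(endpoint, kwargs)
    return tool_fn

@component
class GraphForgeToolRetriever:
    def __init__(self, gf):
        self.gf = gf

    @component.output_types(tools=list)
    def run(self, query: str, user_role: str = 'customer'):
        # Deduplicate tools matched via several capabilities, then apply the role filter.
        results = self.gf.to_dicts("""
            MATCH (t:Tool)-[:CAN_DO]->(c:Capability)
            WHERE c.name CONTAINS $query AND t.deprecated = false
            WITH DISTINCT t
            MATCH (t)-[:REQUIRES_PERMISSION]->(:Role {name: $role})
            RETURN t.name AS name, t.description AS description, t.endpoint AS endpoint
            LIMIT 5
        """, {'query': query, 'role': user_role})
        # Empty object schema; populate 'properties' from HAS_PARAMETER rows if available.
        tools = [Tool(name=r['name'], description=r['description'],
                      parameters={'type': 'object', 'properties': {}},
                      function=make_tool_fn(r['endpoint'])) for r in results]
        return {'tools': tools}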

Boilerplate comparison: All four adapters require 8–12 lines to bridge to_dicts() output to the framework's tool format. This is reasonable for user-space code and does not require a shipped graphforge.integrations module.
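Every adapter needs one callable per tool, bound to that tool's endpoint. A lambda created in a comprehension looks up the loop variable when it is called, not when it is created, so every tool would hit the last endpoint. The usual default-argument workaround must come before **kwargs (a parameter after **kwargs is a SyntaxError), and it then becomes a parameter that signature-inspecting frameworks expose to the LLM. A small factory closure avoids both problems. A sketch, with a stand-in echo() in place of a real client:

```python
from typing import Any, Callable


def make_tool_fn(endpoint: str, call_api: Callable[[str, dict], Any]) -> Callable[..., Any]:
    """Return a **kwargs-only callable bound to one endpoint."""
    def tool_fn(**kwargs: Any) -> Any:
        return call_api(endpoint, kwargs)
    return tool_fn


def echo(endpoint: str, kwargs: dict):
    """Stand-in for a real HTTP/RPC client: returns what it would send."""
    return endpoint, kwargs


endpoints = ["api.inventory.check", "api.orders.place"]

# Late binding: each lambda looks up `ep` when called, after the loop has finished.
late_bound = [lambda **kw: echo(ep, kw) for ep in endpoints]
print([f()[0] for f in late_bound])  # ['api.orders.place', 'api.orders.place']

# Factory: each closure captures its own endpoint.
bound = [make_tool_fn(ep, echo) for ep in endpoints]
print([f(sku="A1")[0] for f in bound])  # ['api.inventory.check', 'api.orders.place']
```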


6. Framework Integration Analysis

Comparison Matrix

| Framework | Tool schema | Dynamic selection | Graph-native | GraphForge integration point |
|---|---|---|---|---|
| LangGraph | Pydantic/JSON Schema | Manual (rebind per call) | No | Custom retriever before bind_tools() |
| LlamaIndex | ToolMetadata + fn_schema | YES (ObjectRetriever) | Partial (GraphStore) | Custom ObjectRetriever + GraphStore impl |
| CrewAI | Pydantic args_schema | No (static per-agent) | No | Dynamic agent factory from graph |
| OpenAI Swarm | Docstring + type hints | No (static per-agent) | No | Agent handoff routing from graph |
| AutoGen | Type hints + docstring | Partial (v0.4+) | No | Shared tool registry, agent selector |
| Semantic Kernel | Plugin/Function + filters | YES (filter system) | No | FunctionChoiceBehavior filters from graph |
| Haystack | JSON Schema dict | Partial (custom component) | No | Pipeline component for tool retrieval |

Key Finding: The Gap GraphForge Fills

No framework has graph-native tool selection built in. All frameworks fall back to one of:

1. Passing all tools to the LLM context (selection accuracy degrades beyond roughly 20 tools)
2. Embedding retrieval over tool descriptions (misses structural relationships)

GraphForge provides what embeddings cannot:

| Capability | Embedding retrieval | GraphForge traversal |
|---|---|---|
| "Find tools for inventory" | ✅ Keyword/semantic match | MATCH (t)-[:CAN_DO]->(c) WHERE c.name CONTAINS ... |
| "What data does this tool produce?" | ❌ Not expressed in description | MATCH (t)-[:PRODUCES]->(dm) |
| "Can I call this tool?" | ❌ Not in tool schema | MATCH (t)-[:REQUIRES_PERMISSION]->(r:Role {name: $role}) |
| "What tools can follow this one?" | ❌ No data flow model | MATCH (t)-[:PRODUCES]->(dm)<-[:REQUIRES]-(next) |
| "This tool is deprecated, what replaced it?" | ❌ Not standard metadata | MATCH (new)-[:SUPERSEDES]->(old:Tool {name: $name}) |
| "Which tools can I call with the data I have?" | ❌ No precondition model | NOT EXISTS { MATCH (t)-[:REQUIRES]->(dm) WHERE NOT dm.name IN $available } |

Priority Integration Targets

LlamaIndex (Priority 1): The closest existing fit. ObjectRetriever is specifically designed to be replaced with a custom retrieval backend. GraphStore interface (query(), get(), get_rel_map()) could be implemented directly. Already understands KGs — less user education needed.

Semantic Kernel (Priority 2): The most structured tool model of any framework. Plugin = Domain, KernelFunction = Tool maps cleanly to the GraphForge schema. FunctionChoiceBehavior filter system designed for exactly this use case — select which functions to expose per request.

LangGraph (Priority 3): Largest community (most impact), but requires most custom code. Pattern: GraphForge as the "tool memory" that informs bind_tools() before each agent step. The PRODUCES → REQUIRES chain maps naturally to LangGraph's conditional edge routing.

Haystack (Priority 4): Cleanest component architecture — @component decorator makes GraphForgeToolRetriever a first-class pipeline stage. Good for hybrid RAG + tool selection pipelines.

Tool Chain Planning — The Killer Feature

None of the frameworks express tool composition via data flow. The pattern:

MATCH (t1:Tool {name: 'search_products'})-[:PRODUCES]->(dm:DataModel)<-[:REQUIRES]-(t2:Tool)
WHERE t2.deprecated = false
RETURN t2.name AS next_tool, dm.name AS via
ORDER BY t2.cost_per_call ASC

Returns [{next_tool: 'check_inventory', via: 'Product'}, {next_tool: 'place_order', via: 'Product'}] — the agent knows exactly which tools can follow search_products and why. No prompt engineering or embedding similarity can produce this.

In the Section 4 benchmark, the equivalent next_tools query runs in 0.6–1.7 ms (p50) at 50–500 tools.


7. Competitive Analysis

| System | Embedded? | Graph queries? | Built-in algorithms? | Tool selection model |
|---|---|---|---|---|
| GraphForge | ✅ in-process | ✅ openCypher | None (GDS planned) | Full Cypher: semantic + structural |
| Neo4j | ❌ server | ✅ openCypher | ✅ GDS plugin | Used as backend by LangGraph/LlamaIndex integrations |
| Memgraph | ❌ server | ✅ openCypher | ✅ MAGE algos | Same as Neo4j |
| KùzuDB | ✅ in-process | ✅ Cypher-like | None | Same embedded advantage; no dataset library |
| LangGraph | ✅ in-process | ❌ state machines | N/A | LLM function calling only |
| Vector DB only | ❌ typically cloud | ❌ similarity only | N/A | Embedding cosine similarity |

GraphForge's differentiated position: The only embedded-in-process graph database with openCypher, a curated dataset library, and Python-first design. For agent development (notebook / script / serverless), this combination eliminates the Neo4j server deployment that blocks most agentic graph integrations.

Scaling ceiling: The use-case doc states "10,000+ tools without configuration changes." The benchmark stops at 1000 tools, so that figure is not directly verified here. At 500 tools, every query pattern except the original Jaccard query runs under 12 ms (p95). At 50–100 tools, every query except substitutes_jaccard has a p50 under 3 ms.


8. Recommendations

| ID | Recommendation | Priority | Effort | Issue |
|---|---|---|---|---|
| R-1 | Fix agent-grounding.md: split S12's CREATE/MATCH into two execute() calls. The S4 (WITH parent between MATCH clauses) and S10 (ORDER BY tool alias) workarounds are optional since v0.3.10 | Critical | Low | New #doc-fix |
| R-2 | Fix agent-tool-recall.md: (1) DEPENDS_ON to use MATCH; (2) create DataModel/Capability nodes once, then MATCH; (3) replace REQUIRED_BY with REQUIRES traversed in reverse; (4) add a db= parameter to ToolRegistry | Critical | Low | New #doc-fix |
| R-3 | Replace all .value unwrapping in the agent-tool-recall.md ToolRegistry class with to_dicts() | High | Low | Same #doc-fix |
| R-4 | Update LangChain imports from langchain.agents to langchain_core.tools / langgraph.prebuilt | High | Low | Same #doc-fix |
| R-5 | Update LlamaIndex imports from llama_index.* to llama_index.core.* | High | Low | Same #doc-fix |
| R-6 | File engine bug: two-MATCH variable leak causing KeyError on a variable not in the second clause's scope | High | Medium | Filed as #482; resolved in v0.3.10 (PR #495) |
| R-7 | File engine bug: RETURN DISTINCT ... ORDER BY <variable> raises UndefinedVariable (workaround: ORDER BY the alias) | Medium | Low | Filed as #481; resolved in v0.3.10 (PR #494) |
| R-8 | Pre-compute SIMILAR_TO edges for Jaccard similarity; add to ToolRegistry as an offline step | Medium | Low | Doc enhancement |
| R-9 | Add a bulk ingest example to agent-tool-recall.md using UNWIND $tools AS t CREATE (:Tool {name: t.name, ...}) to reduce build time from 2.3 s to ~0.3 s at 500 tools | Low | Low | Same #doc-fix |

Shipped ToolRegistry utility vs user-space: Keep as user-space code in docs for now. The class as documented has 4 correctness issues (S13, S17, S22, S23). Until those doc bugs are fixed, shipping the class would require first fixing it — at which point the corrected version belongs in the updated doc rather than in the library. Revisit after doc fixes land and a clean, tested version exists.


9. Issues Filed

| Issue | Title | Type | Priority |
|---|---|---|---|
| #486 | doc: fix agent-grounding.md and agent-tool-recall.md snippets (5 bugs) | doc-bug | high |
| #482 | fix: variable reuse across WITH boundary raises KeyError (covers two-MATCH variable leak) | engine-bug | high |
| #481 | fix: RETURN DISTINCT ... ORDER BY <var> raises UndefinedVariable | engine-bug | medium |

Appendix A: Working ToolRegistry Reference Implementation

A corrected ToolRegistry class that passes all validation tests. It uses to_dicts() throughout, expects shared DataModel/Capability nodes (created once and linked via MATCH), supports db=, and uses the optimized Jaccard query:

from graphforge import GraphForge


class ToolRegistry:
    """GraphForge-backed tool registry for LLM agents.

    Corrected from agent-tool-recall.md — uses shared DataModel/Capability nodes
    and to_dicts() for ergonomic result consumption.
    """

    def __init__(self, db_path: str | None = None, db: GraphForge | None = None):
        # Compare with None so a caller-supplied handle is always kept, even if it is falsy.
        self.db = db if db is not None else (GraphForge(db_path) if db_path else GraphForge())

    def find_by_capability(self, capability: str, limit: int = 5) -> list[dict]:
        return self.db.to_dicts("""
            MATCH (t:Tool)-[:CAN_DO]->(cap:Capability)
            WHERE toLower(cap.name) CONTAINS toLower($cap)
               OR toLower(cap.description) CONTAINS toLower($cap)
               OR toLower(t.description) CONTAINS toLower($cap)
            WITH DISTINCT t
            WHERE t.deprecated = false
            RETURN t.name AS name, t.description AS description,
                   t.endpoint AS endpoint, t.latency_ms AS latency_ms
            ORDER BY t.latency_ms ASC
            LIMIT $limit
        """, {'cap': capability, 'limit': limit})

    def get_prerequisites(self, tool_name: str) -> list[str]:
        rows = self.db.to_dicts("""
            MATCH (t:Tool {name: $name})-[:REQUIRES]->(dm:DataModel)
            RETURN dm.name AS data_model
        """, {'name': tool_name})
        return [r['data_model'] for r in rows]

    def next_tools(self, tool_name: str) -> list[dict]:
        return self.db.to_dicts("""
            MATCH (t:Tool {name: $name})-[:PRODUCES]->(dm:DataModel)<-[:REQUIRES]-(next:Tool)
            WHERE next.deprecated = false
            RETURN next.name AS name, next.description AS description, dm.name AS via
        """, {'name': tool_name})

    def authorized_tools(self, role: str) -> list[str]:
        rows = self.db.to_dicts("""
            MATCH (t:Tool)-[:REQUIRES_PERMISSION]->(r:Role {name: $role})
            WHERE t.deprecated = false
            RETURN t.name AS name
            ORDER BY t.name
        """, {'role': role})
        return [r['name'] for r in rows]

    def substitutes(self, tool_name: str, top_k: int = 3) -> list[dict]:
        # Optimized Jaccard query (Section 4): counts the target's capabilities once,
        # before the cross-join, instead of once per candidate row.
        return self.db.to_dicts("""
            MATCH (target:Tool {name: $name})-[:CAN_DO]->(tc:Capability)
            WITH count(tc) AS target_total, $name AS tname
            MATCH (t1:Tool {name: tname})-[:CAN_DO]->(shared:Capability)<-[:CAN_DO]-(alt:Tool)
            WHERE alt.name <> tname AND alt.deprecated = false
            WITH alt, count(shared) AS shared_caps, target_total
            MATCH (alt)-[:CAN_DO]->(ac:Capability)
            WITH alt, shared_caps, target_total, count(ac) AS alt_total
            WITH alt, toFloat(shared_caps) / (target_total + alt_total - shared_caps) AS jaccard
            RETURN alt.name AS name, jaccard AS similarity
            ORDER BY jaccard DESC
            LIMIT $k
        """, {'name': tool_name, 'k': top_k})

Setting up the registry correctly (shared nodes pattern):

db = GraphForge("tools.db")  # or GraphForge() for in-memory

# 1. Create shared reference nodes ONCE (query parameters avoid quoting/injection issues)
for domain in ['inventory', 'catalog', 'orders']:
    db.execute("CREATE (:Domain {name: $name})", {'name': domain})
for dm in ['Product', 'Order', 'PaymentMethod']:
    db.execute("CREATE (:DataModel {name: $name})", {'name': dm})
for cap, category in [('query_stock', 'read'), ('search_products', 'read'), ('purchase', 'write')]:
    db.execute("CREATE (:Capability {name: $name, category: $category})",
               {'name': cap, 'category': category})
for role, level in [('customer', 1), ('admin', 10)]:
    db.execute("CREATE (:Role {name: $name, level: $level})", {'name': role, 'level': level})

# 2. Create tools (no inline CREATE for shared nodes)
db.execute("""
    CREATE (t:Tool {
        name: 'check_inventory',
        description: 'Check current stock levels for a product',
        endpoint: 'api.inventory.check',
        version: '2.1',
        cost_per_call: 1,
        latency_ms: 45,
        deprecated: false
    })
""")
db.execute("""
    MATCH (t:Tool {name: 'check_inventory'}), (cap:Capability {name: 'query_stock'}),
          (dm:DataModel {name: 'Product'}), (r:Role {name: 'customer'})
    CREATE (t)-[:CAN_DO]->(cap)
    CREATE (t)-[:REQUIRES]->(dm)
    CREATE (t)-[:REQUIRES_PERMISSION]->(r)
""")

registry = ToolRegistry(db=db)
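R-9 proposes bulk ingest with UNWIND. Below is a sketch of the Python side. It assumes GraphForge accepts a list-of-maps parameter in UNWIND with property access such as t.name, as R-9 presupposes; tool_batch() and capability_links() are hypothetical helpers. The batch sets deprecated explicitly because every query in this report filters on t.deprecated = false, and a missing property evaluates to null, which that filter drops:

```python
REQUIRED = ("name", "description", "endpoint", "version", "cost_per_call", "latency_ms")

CREATE_TOOLS = """
UNWIND $tools AS t
CREATE (:Tool {name: t.name, description: t.description, endpoint: t.endpoint,
               version: t.version, cost_per_call: t.cost_per_call,
               latency_ms: t.latency_ms, deprecated: t.deprecated})
"""

LINK_CAPABILITIES = """
UNWIND $links AS l
MATCH (t:Tool {name: l.tool}), (c:Capability {name: l.capability})
CREATE (t)-[:CAN_DO]->(c)
"""


def tool_batch(tools: list) -> dict:
    """Validate tool dicts and build the $tools parameter."""
    rows = []
    for tool in tools:
        missing = [k for k in REQUIRED if k not in tool]
        if missing:
            raise ValueError(f"{tool.get('name', '?')}: missing {missing}")
        row = {k: tool[k] for k in REQUIRED}
        row["deprecated"] = bool(tool.get("deprecated", False))  # never leave it null
        rows.append(row)
    return {"tools": rows}


def capability_links(tools: list) -> dict:
    """Build the $links parameter from each tool's 'capabilities' list."""
    return {"links": [{"tool": t["name"], "capability": c}
                      for t in tools for c in t.get("capabilities", [])]}


# Usage: db.execute(CREATE_TOOLS, tool_batch(tools))
#        db.execute(LINK_CAPABILITIES, capability_links(tools))
```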

Appendix B: Validated Framework Integration Snippets

LangGraph — Dynamic Tool Binding

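from langchain_core.tools import StructuredTool
from langgraph.graph import StateGraph, MessagesState
from langgraph.prebuilt import ToolNode, tools_condition
from graphforge import GraphForge

# Assumes ToolRegistry from Appendix A, an `llm` chat model, and a
# call_api(endpoint, kwargs) helper defined elsewhere.
gf = GraphForge("tools.db")
registry = ToolRegistry(db=gf)
capability_names = [r['name'] for r in gf.to_dicts("MATCH (c:Capability) RETURN c.name AS name")]

def make_tool_fn(endpoint: str):
    # Closure binds the endpoint without exposing it as a tool argument.
    def tool_fn(**kwargs):
        return call_api(endpoint, kwargs)
    return tool_fn

def build_tools_for_context(intent: str, user_role: str) -> list:
    """Query GraphForge for relevant, authorized tools and convert them to LangChain tools."""
    tool_data = registry.find_by_capability(intent)
    authorized = set(registry.authorized_tools(user_role))
    tool_data = [t for t in tool_data if t['name'] in authorized]
    return [
        StructuredTool.from_function(
            func=make_tool_fn(t['endpoint']),
            name=t['name'],
            description=t['description'],
        )
        for t in tool_data
    ]

def agent_node(state: MessagesState):
    # find_by_capability matches with CONTAINS, so pass a short intent keyword,
    # not the whole message; extract_intent is a hypothetical helper.
    intent = extract_intent(state['messages'][-1].content, capability_names)
    tools = build_tools_for_context(intent=intent, user_role='customer')
    llm_with_tools = llm.bind_tools(tools)
    return {'messages': [llm_with_tools.invoke(state['messages'])]}

# If tool calls are routed through ToolNode + tools_condition, build the ToolNode with
# every tool the agent might bind, since the bound set changes per step.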
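find_by_capability matches with CONTAINS, so it needs a short keyword such as query_stock rather than a full user message. A hypothetical extract_intent helper can pick the best capability name by token overlap. The capability names could come from a query such as MATCH (c:Capability) RETURN c.name AS name. This is a crude heuristic for illustration; an LLM call or a trained classifier is the more robust choice:

```python
import re
from typing import Optional


def extract_intent(message: str, capability_names: list) -> Optional[str]:
    """Return the capability name sharing the most tokens with the message, or None."""
    words = set(re.findall(r"[a-z0-9]+", message.lower()))
    best, best_score = None, 0
    for name in capability_names:
        score = len(words & set(name.lower().split("_")))
        if score > best_score:  # ties keep the first name listed
            best, best_score = name, score
    return best


caps = ["query_stock", "search_products", "purchase"]
print(extract_intent("Is SKU 42 in stock?", caps))  # query_stock
```

When it returns None, the caller should fall back (for example, ask the LLM to name a capability) rather than pass None to the registry.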

LlamaIndex — Custom ObjectRetriever

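from llama_index.core.agent import ReActAgent
from llama_index.core.objects import ObjectRetriever
from llama_index.core.tools import FunctionTool
from graphforge import GraphForge

def make_tool_fn(endpoint: str):
    # Closure keeps the endpoint out of the signature FunctionTool inspects.
    # call_api is your own HTTP/RPC helper (not defined here).
    def tool_fn(**kwargs):
        return call_api(endpoint, kwargs)
    return tool_fn

class GraphForgeToolRetriever(ObjectRetriever):
    # Overrides retrieve() only and skips ObjectRetriever.__init__, which expects a
    # node retriever plus an object-node mapping that this class does not use.
    def __init__(self, gf: GraphForge):
        self.registry = ToolRegistry(db=gf)  # ToolRegistry from Appendix A

    def retrieve(self, query_str) -> list[FunctionTool]:
        # The agent may pass a plain string or a QueryBundle; str() yields the query text.
        tool_data = self.registry.find_by_capability(str(query_str))
        return [
            FunctionTool.from_defaults(
                fn=make_tool_fn(t['endpoint']),
                name=t['name'],
                description=t['description'],
            )
            for t in tool_data
        ]

# Usage with ReActAgent.from_tools, the older non-workflow agent API
# (check that it exists in your llama-index-core version); `llm` is assumed.
retriever = GraphForgeToolRetriever(gf=GraphForge("tools.db"))
agent = ReActAgent.from_tools(tool_retriever=retriever, llm=llm, verbose=True)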