Research: AI Agent Grounding — Ontology-Backed Tool Selection Validation
Issue: #451
Date: 2026-05-04
Branch: docs/451-agent-grounding-research
Scope: Findings-only. No library code changes.
1. Executive Summary
GraphForge's two agent use-case docs (docs/use-cases/agent-grounding.md and docs/use-cases/agent-tool-recall.md) document how to use GraphForge as an ontology-backed tool registry for LLM agents. This research validates every code snippet in both docs, benchmarks tool-lookup query latency at 50/100/500 tool scale, assesses the ergonomics of consuming GraphForge results in agent loops, and analyses how GraphForge compares to tool selection approaches in major agentic frameworks.
Key findings:
- 20 of 29 snippets pass, 5 are partial, 4 fail. The majority of failures are doc bugs (wrong usage patterns) rather than engine bugs. Both engine bugs were fixed in v0.3.10.
- Two engine bugs resolved in v0.3.10: (1) two MATCH clauses referencing the same named variable in a filter caused KeyError (S4), fixed in PR #495; (2) RETURN DISTINCT t.name ORDER BY t.name raised UndefinedVariable (S10), fixed in PR #494. Both patterns now work without workarounds.
- Three doc bugs found: (1) multi-statement CREATE ... MATCH (t:...) CREATE (t)-... in a single execute() call raises VariableAlreadyBound and must be split into two calls (S12); (2) DEPENDS_ON edge creation uses CREATE (:Tool {name: 'x'}) instead of MATCH, which creates a duplicate node (S13); (3) multi-step planning uses the REQUIRED_BY relationship type, which is never created; it should be REQUIRES traversed in reverse (S22).
- One doc pattern issue: DataModel nodes must be created with MERGE (or created once, then linked via MATCH). The docs create them per-tool with CREATE, producing multiple nodes with the same name that cannot be pattern-matched across (S17, S19).
- to_dicts() adds ≤0.8% overhead versus execute() + manual .value unwrapping. It should be the canonical pattern for agent loops: it saves boilerplate at effectively zero cost.
- Query latency is excellent: find_by_capability is 1.7 ms / 2.7 ms / 10.7 ms p50 at 50/100/500 tools. Even substitutes_jaccard (the most expensive query) is under 7 ms p50 at 100 tools.
- No agentic framework (LangGraph, LlamaIndex, CrewAI, Semantic Kernel, Haystack, AutoGen, OpenAI Swarm) has graph-native tool selection. GraphForge fills a genuine gap, particularly for tool-chain planning via PRODUCES → REQUIRES traversal and permission filtering, which embedding retrieval cannot express.
- EXISTS {} with correlated WHERE passes the standard case (S16): the NOT EXISTS { MATCH ... WHERE NOT dm.name IN $available } pattern works correctly with parameter binding.
- LlamaIndex is the best integration target (ObjectRetriever + GraphStore interface). LangGraph has the largest community but requires the most custom code.
5 issues recommended.
2. Code Snippet Pass/Fail Matrix
All snippets from both use-case docs were run against main as of 2026-05-04.
Validation methodology: Each snippet was extracted and run against a fresh GraphForge() instance. For snippets that use state built by earlier snippets, a fresh context graph was set up before each test.
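The methodology above can be sketched as a small harness. This is a minimal sketch, not the script used for this research: `make_db`, `FakeDB`, and the three-way status rule (error means FAIL, clean run with an unexpected result means PARTIAL) are illustrative assumptions. In practice `make_db` would be the GraphForge constructor.

```python
from typing import Any, Callable


def validate_snippet(make_db: Callable[[], Any],
                     setup: Callable[[Any], None],
                     snippet: Callable[[Any], Any],
                     check: Callable[[Any], bool]) -> str:
    """Run one snippet against a fresh database and classify the outcome."""
    db = make_db()              # fresh instance per test, no shared state
    setup(db)                   # rebuild any state earlier snippets would have created
    try:
        result = snippet(db)
    except Exception as exc:    # any raised error counts as a failure
        return f"FAIL ({type(exc).__name__})"
    # Ran without error: PASS only if the result also matches what the doc claims
    return "PASS" if check(result) else "PARTIAL"


class FakeDB:
    """Hypothetical stand-in so the sketch runs without GraphForge installed."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def execute(self, query: str) -> list[dict]:
        if "REQUIRED_BY" in query:      # mimic a query that matches nothing
            return []
        if "BOOM" in query:             # mimic an engine error
            raise KeyError("c")
        return self.rows


def seed(db: FakeDB) -> None:
    db.rows = [{"tool": "check_inventory"}]


statuses = [
    validate_snippet(FakeDB, seed, lambda db: db.execute("MATCH (t:Tool) RETURN t"), bool),
    validate_snippet(FakeDB, seed, lambda db: db.execute("MATCH ()-[:REQUIRED_BY]->() RETURN 1"), bool),
    validate_snippet(FakeDB, seed, lambda db: db.execute("BOOM"), bool),
]
print(statuses)  # ['PASS', 'PARTIAL', 'FAIL (KeyError)']
```

Creating the database inside `validate_snippet` is what keeps one snippet's leftover nodes from masking a failure in the next.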
agent-grounding.md (S1–S12)
| ID | Section | Snippet (abbreviated) | Status | Root Cause |
|---|---|---|---|---|
| S1 | Ontology | CREATE (:Class {name: 'Entity'}) × 4 + IS_A | ✅ PASS | — |
| S2 | Tools | CREATE (t:Tool ...), link to Class + Capability | ✅ PASS | — |
| S3 | Query | MATCH (t:Tool)-[:CAN_DO]->(c) + collect(c.name) | ⚠️ PARTIAL | dict(r) returns CypherValue objects — doc comment implies plain str |
| S4 | Hierarchy | MATCH (c:Class)-[:IS_A*0..]->(parent) + MATCH (t:Tool)-[:OPERATES_ON]->(parent) + RETURN DISTINCT | ✅ PASS | Engine bug resolved in v0.3.10 (PR #495): two-MATCH variable c no longer leaked; workaround WITH parent still valid style |
| S5 | Metadata | OPTIONAL MATCH + collect(DISTINCT {map literal}) | ✅ PASS | — |
| S6 | LangChain | Cypher query MATCH (t:Tool) RETURN name/description/endpoint | ⚠️ PARTIAL | Query works; imports from langchain.agents import Tool / from langchain.agents import initialize_agent / from langchain.llms import OpenAI are all deprecated API paths (removed in LangChain 0.2+) |
| S7 | LlamaIndex | MATCH (t:Tool)-[:HAS_PARAMETER]->(p) + collect({map}) | ✅ PASS | Imports from llama_index import GPTSimpleVectorIndex / from llama_index.tools import FunctionTool are pre-v0.10 deprecated paths (requires llama-index-core>=0.10) |
| S8 | Agent class | Intent query WHERE c.name CONTAINS $intent OR t.description CONTAINS $intent | ✅ PASS | — |
| S9 | Multi-step | MATCH (t1)-[:PRODUCES]->(concept)<-[:REQUIRES]-(t2) | ✅ PASS | — |
| S10 | Contextual | MATCH (c:Class)<-[:IS_A*0..]-(entity) WHERE entity.name IN $entities + MATCH (t:Tool)-[:OPERATES_ON]->(c) | ✅ PASS | Engine bug resolved in v0.3.10 (PR #494): RETURN DISTINCT ... ORDER BY t.name no longer raises UndefinedVariable; both ORDER BY t.name and ORDER BY tool alias now work |
| S11 | Permissions | CREATE (:Role) + MATCH ... CREATE (:CAN_USE) + permission filter query | ✅ PASS | — |
| S12 | Quick-start | CREATE (t:Tool ...) then MATCH (t:Tool ...), (i:Class ...) + CREATE (t)-[:OPERATES_ON]->(i) combined in one execute() | ❌ FAIL | Doc bug: variable t bound in CREATE, then re-used in subsequent MATCH within same execute() call — VariableAlreadyBound; must split into two execute() calls |
agent-tool-recall.md (S13–S23)
| ID | Section | Snippet (abbreviated) | Status | Root Cause |
|---|---|---|---|---|
| S13 | Build graph | 3-tool graph with full schema (7 node types, 10 edge types) | ⚠️ PARTIAL | Doc bug: DEPENDS_ON edge creation uses CREATE (:Tool {name: 'check_inventory'}) instead of MATCH — produces a duplicate Tool node; results in 4 Tool nodes instead of 3 |
| S14 | Versioning | SUPERSEDES edge creation with SET old.deprecated = true | ✅ PASS | Doc assumes v1/v2 tools were pre-created — silent no-op if they don't exist |
| S15 | Intent query | max(1.0) AS confidence, ORDER BY t.latency_ms ASC | ✅ PASS | NULL CONTAINS 'x' → false (correct); query requires cap.description which is absent on test data — gracefully handled |
| S16 | Executable | NOT EXISTS { MATCH (t)-[:REQUIRES]->(dm) WHERE NOT dm.name IN $available } | ✅ PASS | Expected broken (#474) but passes — correlated EXISTS with parameterized list works |
| S17 | Tool chains | MATCH (t1)-[:PRODUCES]->(dm)<-[:REQUIRES]-(t2) | ❌ FAIL | Doc pattern issue: tools created with inline CREATE (:DataModel {name: 'Product'}) produce two separate nodes with name Product — they cannot match across; fix with MERGE or pre-create shared nodes |
| S18 | Permissions | MATCH (t:Tool)-[:REQUIRES_PERMISSION]->(r:Role) + OPTIONAL MATCH ... collect(cap.name) | ✅ PASS | — |
| S19 | Jaccard | Multi-MATCH aggregation over shared capabilities | ⚠️ PARTIAL | Query executes without error but returns 0 results — same DataModel/Capability node identity problem as S17: verify_stock's query_stock Capability is a separate node from check_inventory's; fix with MERGE for shared nodes |
| S20 | ToolRegistry | Full ToolRegistry class with 5 methods | ❌ FAIL | next_tools() returns empty list — same S17 issue: search_products and check_inventory point to different :DataModel {name: 'Product'} nodes; all other methods pass individually |
| S21 | Retrieval+rank | Python orchestration (Stage 1–4) using ToolRegistry | ✅ PASS | Works when ToolRegistry is backed by correctly-linked graph |
| S22 | Planning | MATCH path = (start:DataModel)-[:REQUIRED_BY*1..4]->(t:Tool) + all() predicate | ⚠️ PARTIAL | Query executes without error; all() on nodes(path) works; but doc bug: REQUIRED_BY relationship is never created (doc creates REQUIRES edges, not REQUIRED_BY) — returns 0 rows |
| S22b | Explain | collect(cap.name) → explain string | ✅ PASS | — |
| S23 | E-commerce | Full pipeline with ToolRegistry(db=db) | ❌ FAIL | Two issues: (1) ToolRegistry class accepts db_path but not db= keyword (doc shows ToolRegistry(db=db) — API mismatch); (2) next_tools() fails due to S17 DataModel node identity issue |
Additional targeted tests
| ID | Feature | Status | Note |
|---|---|---|---|
| ADD-1 | Multi-statement CREATE + MATCH + CREATE | ✅ PASS | Multiple MATCH/CREATE clauses in one execute() are fine as long as no variable appears in both a CREATE-binding and a subsequent MATCH pattern |
| ADD-2 | to_dicts() vs execute() CypherValue wrapping | ✅ PASS | to_dicts() returns list[dict[str, Any]] with unwrapped Python values; execute() returns list[dict[str, CypherValue]] |
| ADD-3 | toLower() in WHERE | ✅ PASS | — |
| ADD-4 | LIMIT $limit parameter | ✅ PASS | — |
| ADD-5 | WITH DISTINCT t deduplication | ✅ PASS | Multi-edge match + WITH DISTINCT correctly deduplicates |
Summary: 20 PASS / 5 PARTIAL / 4 FAIL across 29 tests. (S4 and S10 resolved in v0.3.10.)
3. Root Cause Analysis
Engine Bug 1: Two-MATCH variable leak (S4, S10) ✅ Resolved in v0.3.10 (PR #495)
Resolution: The variable reuse across WITH boundary was fixed in PR #495.
The patterns below now work without workarounds.
Original pattern that raised KeyError: 'c':
gf.execute("""
MATCH (c:Class {name: $entity_class})-[:IS_A*0..]->(parent:Class)
MATCH (t:Tool)-[:OPERATES_ON]->(parent)
RETURN DISTINCT t.name AS tool,
t.description AS description,
parent.name AS operates_on
""", {'entity_class': 'Product'})
Workaround (still valid style, but no longer required): Introduce a WITH
projection between the two MATCH clauses to make scope explicit:
gf.execute("""
MATCH (c:Class)-[:IS_A*0..]->(parent:Class)
WHERE c.name = $entity_class
WITH parent
MATCH (t:Tool)-[:OPERATES_ON]->(parent)
RETURN DISTINCT t.name AS tool, parent.name AS operates_on
""", {'entity_class': 'Product'})
Engine Bug 2: RETURN DISTINCT ORDER BY non-projected variable (S10) ✅ Resolved in v0.3.10 (PR #494)
Resolution: ORDER BY on the original variable after RETURN DISTINCT was
fixed in PR #494. Both of the patterns below now work.
Original pattern that raised UndefinedVariable:
# Now works — t.name is derivable from the RETURN alias
gf.execute("""
MATCH (t:Tool)-[:OPERATES_ON]->(c)
RETURN DISTINCT t.name AS tool ORDER BY t.name
""")
Alias-based ORDER BY also still works:
RETURN DISTINCT t.name AS tool ORDER BY tool
Doc Bug 1: VariableAlreadyBound in combined CREATE+MATCH (S12)
Pattern that fails:
# VariableAlreadyBound: Variable `t` already bound
gf.execute("""
CREATE (t:Tool {name: 'check_stock', description: 'Check product stock'})
MATCH (t:Tool {name: 'check_stock'}), (i:Class {name: 'Inventory'})
CREATE (t)-[:OPERATES_ON]->(i)
""")
t is bound by the CREATE pattern. Using t again in a MATCH pattern in the same statement tries to re-declare a variable that is already bound, so the engine raises VariableAlreadyBound. This is correct Neo4j/openCypher behavior, not an engine bug. The doc should split the snippet into two execute() calls:
gf.execute("CREATE (:Tool {name: 'check_stock', description: 'Check product stock'})")
gf.execute("""
MATCH (t:Tool {name: 'check_stock'}), (i:Class {name: 'Inventory'})
CREATE (t)-[:OPERATES_ON]->(i)
""")
Doc Bug 2: CREATE (:Tool) inside DEPENDS_ON creates duplicate node (S13)
Pattern that creates a duplicate:
db.execute("""
CREATE (t:Tool {name: 'place_order', ...})
CREATE (t)-[:DEPENDS_ON {type: 'required'}]->(:Tool {name: 'check_inventory'})
""")
:Tool {name: 'check_inventory'} was already created in a prior execute(). Using CREATE here creates a new, separate node with that name. The fix is MATCH for the target:
db.execute("MATCH (t:Tool {name: 'place_order'}), (dep:Tool {name: 'check_inventory'}) CREATE (t)-[:DEPENDS_ON {type: 'required'}]->(dep)")
Doc Bug 3: Shared nodes need MERGE, not CREATE (S17, S19, S20, S23)
The PRODUCES and REQUIRES chain queries depend on DataModel (and Capability) nodes being shared across tools. But the docs create them inline:
CREATE (t)-[:CAN_DO]->(:Capability {name: 'query_stock', category: 'read'})
This creates a new Capability node for each tool, even if another tool already has one with the same name. Queries like MATCH (t1)-[:PRODUCES]->(dm)<-[:REQUIRES]-(t2) then match zero rows because t1 points to dm node #3 and t2 points to dm node #4 (both named 'Product' but different nodes).
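To see why same-named nodes do not join, the following minimal sketch models nodes as Python objects, where matching on a shared pattern variable means object identity rather than equal names. The `Node` class and `next_tools` helper are illustrative stand-ins, not GraphForge APIs.

```python
from dataclasses import dataclass


@dataclass(eq=False)            # eq=False: nodes compare by identity, like graph nodes
class Node:
    label: str
    name: str


def next_tools(produces, requires, tool):
    """Mimic (tool)-[:PRODUCES]->(dm)<-[:REQUIRES]-(next): dm must be the SAME node."""
    produced = [dm for t, dm in produces if t is tool]
    return [t.name for t, dm in requires if any(dm is p for p in produced)]


search = Node("Tool", "search_products")
check = Node("Tool", "check_inventory")

# Doc pattern: each tool gets its own inline-created DataModel node
dm_a = Node("DataModel", "Product")
dm_b = Node("DataModel", "Product")
broken = next_tools([(search, dm_a)], [(check, dm_b)], search)

# Shared pattern: one DataModel node, linked from both tools
shared = Node("DataModel", "Product")
fixed = next_tools([(search, shared)], [(check, shared)], search)

print(broken)  # []
print(fixed)   # ['check_inventory']
```

Both DataModel objects in the first case carry the name Product, yet the join finds nothing because the two edges end at different objects.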
Fix: Use MERGE for shared reference nodes, or create them once and link via MATCH:
# Create shared nodes once
db.execute("CREATE (:DataModel {name: 'Product'})")
db.execute("CREATE (:Capability {name: 'query_stock', category: 'read'})")
# Then link via MATCH
db.execute("""
MATCH (t:Tool {name: 'search_products'}), (dm:DataModel {name: 'Product'})
CREATE (t)-[:PRODUCES]->(dm)
""")
db.execute("""
MATCH (t:Tool {name: 'check_inventory'}), (dm:DataModel {name: 'Product'})
CREATE (t)-[:REQUIRES]->(dm)
""")
Doc Bug 4: REQUIRED_BY relationship never created (S22)
The multi-step planning snippet queries (start:DataModel)-[:REQUIRED_BY*1..4]->(:Tool) but the tool graph only has REQUIRES edges in the direction (Tool)-[:REQUIRES]->(DataModel). REQUIRED_BY is the logical inverse but is never created. Either:
- Create REQUIRED_BY edges: MATCH (t)-[:REQUIRES]->(dm) CREATE (dm)-[:REQUIRED_BY]->(t) — but this adds edge duplication.
- Traverse REQUIRES in reverse direction: (start:DataModel)<-[:REQUIRES*1..4]-(t:Tool).
Doc Bug 5: ToolRegistry(db=db) keyword not in class definition (S23)
The e-commerce example passes an existing GraphForge instance as ToolRegistry(db=db), but the ToolRegistry class definition only accepts db_path:
class ToolRegistry:
    def __init__(self, db_path=None):  # No 'db' parameter!
        self.db = GraphForge(db_path) if db_path else GraphForge()
The fix is to accept an optional db keyword:
    def __init__(self, db_path=None, db=None):
        # Compare against None so a passed-in instance is never dropped by a truthiness check
        self.db = db if db is not None else (GraphForge(db_path) if db_path else GraphForge())
4. Tool Registry Benchmark
Methodology: Synthetic registries built with shared DataModel, Capability, Role, and Domain nodes (created once, linked via MATCH). Each tool has 1–3 capabilities, 0–2 data model requirements, 1 produces link, 1 role permission, and 1 domain. ~5% of tools marked deprecated. 20 iterations per query, reporting p50 and p95 wall-clock milliseconds.
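A timing loop of the kind described could look like the sketch below. It is an assumption about the measurement approach, not the benchmark script itself: the source does not say which percentile method was used, so nearest-rank is chosen here, and `bench` and `percentile` are hypothetical helper names.

```python
import math
import time


def percentile(sorted_values: list[float], q: float) -> float:
    """Nearest-rank percentile on pre-sorted data; q is in (0, 100]."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(len(sorted_values) * q / 100))
    return sorted_values[rank - 1]


def bench(fn, iterations: int = 20) -> tuple[float, float]:
    """Return (p50, p95) wall-clock latency of fn() in milliseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return percentile(samples, 50), percentile(samples, 95)


p50, p95 = bench(lambda: sum(range(1000)))
print(f"p50={p50:.3f} ms  p95={p95:.3f} ms")
```

With 20 samples, nearest-rank p95 is the 19th-smallest value, so a single slow outlier does not move it but two do. That is worth keeping in mind when reading p95 columns from short runs.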
Registry Characteristics
| Scale | Tool nodes | Total nodes | Build time |
|---|---|---|---|
| 50 tools | 50 | 94 | 0.33 s |
| 100 tools | 100 | 144 | 0.27 s |
| 500 tools | 500 | 544 | 2.30 s |
| 1000 tools | 1000 | 1044 | ~4.5 s |
Infrastructure nodes: 8 domains + 20 capabilities + 12 data models + 4 roles = 44 nodes (shared across all tools).
Query Latency (p50 / p95, milliseconds)
| Query | 50 tools | 100 tools | 500 tools | 1000 tools |
|---|---|---|---|---|
| find_by_capability | 1.74 / 6.52 | 2.74 / 2.99 | 10.68 / 11.14 | 20.51 / 28.23 |
| get_prerequisites | 0.32 / 0.38 | 0.40 / 0.57 | 0.99 / 1.14 | 1.81 / 2.14 |
| next_tools | 0.59 / 0.84 | 0.71 / 0.85 | 1.72 / 1.81 | 3.04 / 3.31 |
| authorized_tools | 0.59 / 0.76 | 0.82 / 0.99 | 2.62 / 2.81 | 4.92 / 5.40 |
| substitutes_jaccard (original) | 2.78 / 3.76 | 6.03 / 12.46 | 135 / 144 | 226 / 248 |
| substitutes_jaccard (optimized) | — | — | — | 8.9 / 9.4 |
| SIMILAR_TO lookup (pre-computed) | — | — | — | 3.3 / 3.5 |
Jaccard Deep-Dive: Root Cause and Solutions
The original Jaccard query degrades severely at scale. A 1000-tool registry with typical density (1–3 capabilities per tool, 20 shared capabilities) gives 110 candidate tools sharing at least one capability with a given tool. This causes:
Phase 1 (find tools sharing a capability): 5 ms — O(candidates)
Phase 2 (count target tool's capabilities): 221 ms — O(candidates × target_cap_count) — this is the bottleneck
Phase 3 (count each alt's capabilities): folded into phase 2
Phase 2 executes the MATCH (t2:Tool {name: tname})-[:CAN_DO]->(tc) scan 110 times (once per candidate row), because target_total is not computed before entering the cross-join. This is an N×M scan disguised as three sequential WITH clauses.
Fix: pre-compute target_total before the cross-join:
// Original: target re-scanned once per candidate (O(N×M))
MATCH (t1:Tool {name: $name})-[:CAN_DO]->(shared)<-[:CAN_DO]-(alt:Tool) ...
WITH alt, count(shared) AS shared_caps, $name AS tname
MATCH (t2:Tool {name: tname})-[:CAN_DO]->(tc)   // runs N times
...
// Optimized: target scanned once, result passed forward (O(N+M))
MATCH (target:Tool {name: $name})-[:CAN_DO]->(tc:Capability)
WITH count(tc) AS target_total, $name AS tname   // runs once
MATCH (t1:Tool {name: tname})-[:CAN_DO]->(shared)<-[:CAN_DO]-(alt:Tool) ...
Benchmark result: 226 ms → 8.9 ms — 25× speedup at 1000 tools with identical results.
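The same restructuring can be shown outside Cypher. The sketch below is plain Python over a hypothetical `caps` mapping (tool name to capability set), not the GraphForge query itself. The first function recounts the target's capabilities for every candidate; the second counts them once. Both use the same formula as the query: shared / (target_total + alt_total - shared). The capability name read_catalog is invented for the example.

```python
def jaccard_rescan(caps: dict[str, set[str]], name: str) -> dict[str, float]:
    """Original shape: the target's capability count is recomputed per candidate."""
    scores = {}
    for alt, alt_caps in caps.items():
        shared = len(caps[name] & alt_caps)
        if alt == name or shared == 0:
            continue
        target_total = sum(1 for _ in caps[name])      # repeated for every candidate
        scores[alt] = shared / (target_total + len(alt_caps) - shared)
    return scores


def jaccard_precomputed(caps: dict[str, set[str]], name: str) -> dict[str, float]:
    """Optimized shape: count the target's capabilities once, then reuse the value."""
    target = caps[name]
    target_total = len(target)                          # computed once
    scores = {}
    for alt, alt_caps in caps.items():
        shared = len(target & alt_caps)
        if alt != name and shared:
            scores[alt] = shared / (target_total + len(alt_caps) - shared)
    return scores


caps = {
    "check_inventory": {"query_stock", "read_catalog"},
    "verify_stock": {"query_stock"},
    "place_order": {"purchase"},
}
print(jaccard_precomputed(caps, "check_inventory"))  # {'verify_stock': 0.5}
```

The two functions return the same scores; only the amount of repeated work differs, which mirrors the "identical results" observation for the two query forms.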
Pre-computed SIMILAR_TO edges reduce hot-path lookup to 3.3 ms (2.7× faster than optimized on-demand). The pre-compute step takes 5 s to find all 76 000 pairs above Jaccard 0.25, plus ~300 s to write edges individually. With bulk ingest via UNWIND, the write step drops to ~3 s. Appropriate for: agent startup, nightly refresh, or on-demand when new tools are added.
Recommendation by use case:
- Hot-path agent loop (< 200 tools): optimized on-demand Jaccard query (< 6 ms)
- Hot-path agent loop (200–1000 tools): pre-computed SIMILAR_TO edges, refreshed on tool changes
- Offline analysis / admin UI: either approach
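The offline pre-compute step for SIMILAR_TO edges could look like the following sketch. It shows one reasonable approach (an inverted index from capability to tools, so only tools sharing at least one capability are ever compared) and is not the script behind the timings above. `caps`, `similar_pairs`, and the example capability names are illustrative; the resulting pairs would then be written to the graph as SIMILAR_TO edges.

```python
from collections import defaultdict
from itertools import combinations


def similar_pairs(caps: dict[str, set[str]], threshold: float = 0.25):
    """Return {(a, b): jaccard} for tool pairs with similarity >= threshold."""
    by_cap = defaultdict(set)                 # capability -> tools offering it
    for tool, tool_caps in caps.items():
        for cap in tool_caps:
            by_cap[cap].add(tool)

    shared = defaultdict(int)                 # (a, b) -> number of shared capabilities
    for tools in by_cap.values():
        for a, b in combinations(sorted(tools), 2):
            shared[(a, b)] += 1

    pairs = {}
    for (a, b), n in shared.items():
        score = n / (len(caps[a]) + len(caps[b]) - n)
        if score >= threshold:
            pairs[(a, b)] = score
    return pairs


caps = {
    "check_inventory": {"query_stock", "read_catalog"},
    "verify_stock": {"query_stock"},
    "search_products": {"read_catalog", "search", "rank", "filter"},
}
print(similar_pairs(caps))  # {('check_inventory', 'verify_stock'): 0.5}
```

The pair (check_inventory, search_products) shares one capability but scores 1 / (2 + 4 - 1) = 0.2, so it falls below the 0.25 threshold and is dropped.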
Recommended ToolRegistry Query Patterns
The correct find_by_capability query (no two-MATCH variable leak, WITH DISTINCT dedup):
def find_by_capability(self, capability: str, limit: int = 5) -> list[dict]:
    return self.db.to_dicts("""
        MATCH (t:Tool)-[:CAN_DO]->(cap:Capability)
        WHERE toLower(cap.name) CONTAINS toLower($cap)
           OR toLower(cap.description) CONTAINS toLower($cap)
           OR toLower(t.description) CONTAINS toLower($cap)
        WITH DISTINCT t
        WHERE t.deprecated = false
        RETURN t.name AS name, t.description AS description,
               t.endpoint AS endpoint, t.latency_ms AS latency_ms
        ORDER BY t.latency_ms ASC
        LIMIT $limit
    """, {'cap': capability, 'limit': limit})
The optimized Jaccard query (pre-computes target_total once, 25× faster at 1000 tools):
MATCH (target:Tool {name: $name})-[:CAN_DO]->(tc:Capability)
WITH count(tc) AS target_total, $name AS tname
MATCH (t1:Tool {name: tname})-[:CAN_DO]->(shared:Capability)<-[:CAN_DO]-(alt:Tool)
WHERE alt.name <> tname AND alt.deprecated = false
WITH alt, count(shared) AS shared_caps, target_total
MATCH (alt)-[:CAN_DO]->(ac:Capability)
WITH alt, shared_caps, target_total, count(ac) AS alt_total
WITH alt, toFloat(shared_caps) / (target_total + alt_total - shared_caps) AS jaccard
RETURN alt.name AS name, jaccard AS similarity
ORDER BY jaccard DESC
LIMIT $k
5. Ergonomics Assessment
execute() vs to_dicts() for Agent Loops
Overhead measurement (find_by_capability query, 5-result set, p50):
| Scale | execute() p50 (ms) | to_dicts() p50 (ms) | Overhead |
|---|---|---|---|
| 50 tools | 1.711 | 1.720 | +0.5% |
| 100 tools | 2.725 | 2.748 | +0.8% |
| 500 tools | 10.897 | 10.918 | +0.2% |
Tight-loop measurement (1000 iterations, find_by_capability query):
| Method | Total (ms) | Per-iter (ms) |
|---|---|---|
| execute() + manual .value × 3 per row | 277.7 | 0.278 |
| to_dicts() auto-unwrap | 279.0 | 0.279 |
Conclusion: to_dicts() has effectively zero overhead. It should be the canonical pattern for agent code. The .value access pattern shown in agent-tool-recall.md (10+ occurrences) should be replaced.
CypherValue Leakage in execute() Results
The agent-grounding.md doc uses dict(r) and writes:
# Returns: [{'tool': 'check_inventory', 'description': '...', 'capabilities': [...]}]
But dict(r) returns {'tool': CypherString('check_inventory'), ...}. The comment implies plain Python types. This is a documentation accuracy issue — either update the comment or switch to to_dicts().
In agent-tool-recall.md, the ToolRegistry class explicitly unwraps with .value:
return [{'name': r['name'].value, 'description': r['description'].value,
'endpoint': r['endpoint'].value} for r in rows]
This is correct but verbose. With to_dicts():
return self.db.to_dicts("MATCH ...", params)
# Returns [{'name': 'check_inventory', 'description': '...', 'endpoint': '...'}] directly
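Conceptually, the unwrapping that to_dicts() saves the caller looks like the sketch below. `Wrapped` is a hypothetical stand-in for GraphForge's CypherValue types (only the `.value` attribute is taken from the docs), and `unwrap_rows` is illustrative rather than the library's implementation.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Wrapped:
    """Hypothetical stand-in for a CypherValue: holds a plain Python value in .value."""
    value: Any


def unwrap(v: Any) -> Any:
    """Recursively replace wrappers with the plain values they hold."""
    if isinstance(v, Wrapped):
        return unwrap(v.value)
    if isinstance(v, list):
        return [unwrap(x) for x in v]
    if isinstance(v, dict):
        return {k: unwrap(x) for k, x in v.items()}
    return v


def unwrap_rows(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Convert wrapped result rows into plain dicts that serialize cleanly."""
    return [unwrap(row) for row in rows]


rows = [{"tool": Wrapped("check_inventory"),
         "capabilities": Wrapped([Wrapped("query_stock")])}]
print(unwrap_rows(rows))
# [{'tool': 'check_inventory', 'capabilities': ['query_stock']}]
```

The recursion matters for collect() results, where a wrapped list may itself contain wrapped elements. A flat `.value` on each column would leave those inner wrappers in place.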
Framework Integration Boilerplate
Converting a GraphForge to_dicts() result into each framework's tool format:
LangGraph (LangChain core):
from langchain_core.tools import StructuredTool

# call_api, gf, llm, and params are assumed to be defined elsewhere
def gf_to_langchain(tool_data: list[dict]) -> list:
    return [
        StructuredTool.from_function(
            name=t['name'],
            description=t['description'],
            # The default argument must come before **kwargs to be valid Python
            func=lambda endpoint=t['endpoint'], **kwargs: call_api(endpoint, kwargs)
        )
        for t in tool_data
    ]

# Usage in LangGraph node:
def agent_node(state):
    tools_data = gf.to_dicts("MATCH (t:Tool)-[:CAN_DO]->(c) WHERE ...", params)
    llm_with_tools = llm.bind_tools(gf_to_langchain(tools_data))
    return {'messages': [llm_with_tools.invoke(state['messages'])]}
LlamaIndex:
from llama_index.core.tools import FunctionTool

def gf_to_llamaindex(tool_data: list[dict]) -> list:
    return [
        FunctionTool.from_defaults(
            name=t['name'],
            description=t['description'],
            fn=lambda endpoint=t['endpoint'], **kwargs: call_api(endpoint, kwargs)
        )
        for t in tool_data
    ]
Semantic Kernel:
from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior

# SK maps cleanly: Domain -> Plugin, Tool -> KernelFunction
# Use graph to populate FunctionChoiceBehavior filters:
def get_sk_filter(gf, user_role: str) -> dict:
    tools = gf.to_dicts("""
        MATCH (t:Tool)-[:REQUIRES_PERMISSION]->(r:Role {name: $role})
        WHERE t.deprecated = false
        RETURN t.name AS name
    """, {'role': user_role})
    return {"included_functions": [t['name'] for t in tools]}

# settings is assumed to be an existing prompt execution settings object
settings.function_choice_behavior = FunctionChoiceBehavior.Auto(
    filters=get_sk_filter(gf, 'customer')
)
Haystack:
from haystack import component
from haystack.tools import Tool

@component
class GraphForgeToolRetriever:
    def __init__(self, gf):
        self.gf = gf

    @component.output_types(tools=list)
    def run(self, query: str, user_role: str = 'user'):
        results = self.gf.to_dicts("""
            MATCH (t:Tool)-[:CAN_DO]->(c:Capability)
            WHERE c.name CONTAINS $query AND t.deprecated = false
            RETURN t.name AS name, t.description AS description, t.endpoint AS endpoint
            LIMIT 5
        """, {'query': query})
        tools = [Tool(name=r['name'], description=r['description'],
                      parameters={}, function=lambda: None) for r in results]
        return {'tools': tools}
Boilerplate comparison: All four adapters require 8–12 lines to bridge to_dicts() output to the framework's tool format. This is reasonable for user-space code and does not require a shipped graphforge.integrations module.
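The adapters bind each tool's endpoint through a lambda default argument rather than referring to the loop variable directly. The reason is Python's late-binding closures: a lambda that reads the loop variable looks it up when called, by which time the loop has finished. A minimal sketch, with `call_api` as a hypothetical stub:

```python
def call_api(endpoint: str, kwargs: dict) -> str:
    """Hypothetical stub standing in for a real HTTP call."""
    return f"{endpoint}({sorted(kwargs)})"


tool_data = [
    {"name": "check_inventory", "endpoint": "api.inventory.check"},
    {"name": "place_order", "endpoint": "api.orders.place"},
]

# Late binding: every lambda looks up t when called, after the loop has finished
late = [lambda **kwargs: call_api(t["endpoint"], kwargs) for t in tool_data]

# Default-argument binding: each lambda captures its own endpoint at creation time
bound = [lambda endpoint=t["endpoint"], **kwargs: call_api(endpoint, kwargs)
         for t in tool_data]

late_results = [f(sku=1) for f in late]
bound_results = [f(sku=1) for f in bound]
print(late_results)   # both entries point at api.orders.place
print(bound_results)  # one entry per endpoint
```

Note the parameter order: a defaulted parameter has to appear before `**kwargs`, since Python rejects any parameter listed after it. `functools.partial(call_api, endpoint)` is an alternative, provided the callee accepts the remaining arguments in that shape.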
6. Framework Integration Analysis
Comparison Matrix
| Framework | Tool Schema | Dynamic Selection | Graph-Native | GraphForge Integration Point |
|---|---|---|---|---|
| LangGraph | Pydantic/JSON Schema | Manual (rebind per call) | No | Custom retriever before bind_tools() |
| LlamaIndex | ToolMetadata + fn_schema | YES (ObjectRetriever) | Partial (GraphStore) | Custom ObjectRetriever + GraphStore impl |
| CrewAI | Pydantic args_schema | No (static per-agent) | No | Dynamic agent factory from graph |
| OpenAI Swarm | Docstring + type hints | No (static per-agent) | No | Agent handoff routing from graph |
| AutoGen | Type hints + docstring | Partial (v0.4+) | No | Shared tool registry, agent selector |
| Semantic Kernel | Plugin/Function + filters | YES (filter system) | No | FunctionChoiceBehavior filters from graph |
| Haystack | JSON Schema dict | Partial (custom component) | No | Pipeline component for tool retrieval |
Key Finding: The Gap GraphForge Fills
No framework has graph-native tool selection built in. All frameworks fall back to one of:
1. Pass all tools to the LLM context (breaks beyond ~20 tools, where accuracy degrades)
2. Embedding retrieval of tool descriptions (misses structural relationships)
GraphForge provides what embeddings cannot:
| Capability | Embedding Retrieval | GraphForge Traversal |
|---|---|---|
| "Find tools for inventory" | ✅ Keyword/semantic match | ✅ MATCH (t)-[:CAN_DO]->(c) WHERE c.name CONTAINS ... |
| "What data does this tool produce?" | ❌ Not expressed in description | ✅ MATCH (t)-[:PRODUCES]->(dm) |
| "Can I call this tool?" | ❌ Not in tool schema | ✅ MATCH (t)-[:REQUIRES_PERMISSION]->(r:Role {name: $role}) |
| "What tools can follow this one?" | ❌ No data flow model | ✅ MATCH (t)-[:PRODUCES]->(dm)<-[:REQUIRES]-(next) |
| "This tool is deprecated, what replaced it?" | ❌ Not standard metadata | ✅ MATCH (new)-[:SUPERSEDES]->(old:Tool {name: $name}) |
| "What tools are not yet callable?" | ❌ No precondition model | ✅ NOT EXISTS { MATCH (t)-[:REQUIRES]->(dm) WHERE NOT dm.name IN $available } |
Priority Integration Targets
LlamaIndex (Priority 1): The closest existing fit. ObjectRetriever is specifically designed to be replaced with a custom retrieval backend. GraphStore interface (query(), get(), get_rel_map()) could be implemented directly. Already understands KGs — less user education needed.
Semantic Kernel (Priority 2): The most structured tool model of any framework. Plugin = Domain, KernelFunction = Tool maps cleanly to the GraphForge schema. FunctionChoiceBehavior filter system designed for exactly this use case — select which functions to expose per request.
LangGraph (Priority 3): Largest community (most impact), but requires most custom code. Pattern: GraphForge as the "tool memory" that informs bind_tools() before each agent step. The PRODUCES → REQUIRES chain maps naturally to LangGraph's conditional edge routing.
Haystack (Priority 4): Cleanest component architecture — @component decorator makes GraphForgeToolRetriever a first-class pipeline stage. Good for hybrid RAG + tool selection pipelines.
Tool Chain Planning — The Killer Feature
None of the frameworks express tool composition via data flow. The pattern:
MATCH (t1:Tool {name: 'search_products'})-[:PRODUCES]->(dm:DataModel)<-[:REQUIRES]-(t2:Tool)
WHERE t2.deprecated = false
RETURN t2.name AS next_tool, dm.name AS via
ORDER BY t2.cost_per_call ASC
Returns [{next_tool: 'check_inventory', via: 'Product'}, {next_tool: 'place_order', via: 'Product'}] — the agent knows exactly which tools can follow search_products and why. No prompt engineering or embedding similarity can produce this.
This is demonstrated at benchmark latency of 0.6–1.7 ms at 50–500 tools.
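The same data-flow idea extends to multi-step planning. The sketch below is plain Python over hypothetical `requires` and `produces` mappings, not a GraphForge query. A tool is callable when every data model it requires is available (the set-based equivalent of the NOT EXISTS check in S16), and each call makes its outputs available for the next round. The data model names Query and StockLevel are invented for the example.

```python
def plan(requires: dict[str, set[str]],
         produces: dict[str, set[str]],
         available: set[str]) -> list[str]:
    """Return tools in an order where each tool's inputs are available when it runs."""
    have = set(available)
    order: list[str] = []
    remaining = set(requires)
    progress = True
    while remaining and progress:
        progress = False
        for tool in sorted(remaining):            # sorted copy: deterministic, safe to mutate
            if requires[tool] <= have:            # all prerequisites satisfied
                order.append(tool)
                have |= produces.get(tool, set())
                remaining.discard(tool)
                progress = True
    return order


requires = {
    "search_products": {"Query"},
    "check_inventory": {"Product"},
    "place_order": {"Product", "PaymentMethod"},
}
produces = {
    "search_products": {"Product"},
    "check_inventory": {"StockLevel"},
    "place_order": {"Order"},
}
steps = plan(requires, produces, {"Query"})
print(steps)  # ['search_products', 'check_inventory']
```

place_order never appears because nothing in this registry produces PaymentMethod. Surfacing that gap is as useful to an agent as the plan itself.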
7. Competitive Analysis
| System | Embedded? | Graph queries? | Built-in algorithms? | Tool selection model |
|---|---|---|---|---|
| GraphForge | ✅ in-process | ✅ openCypher | None (GDS planned) | Full Cypher — semantic + structural |
| Neo4j | ❌ server | ✅ openCypher | ✅ GDS plugin | Used as backend by LangGraph/LlamaIndex integrations |
| Memgraph | ❌ server | ✅ openCypher | ✅ MAGE algos | Same as Neo4j |
| KùzuDB | ✅ in-process | ✅ Cypher-like | None | Same embedded advantage; no dataset library |
| LangGraph | ✅ in-process | ❌ state machines | N/A | LLM function calling only |
| Vector DB only | ❌ typically cloud | ❌ similarity only | N/A | Embedding cosine similarity |
GraphForge's differentiated position: The only embedded-in-process graph database with openCypher, a curated dataset library, and Python-first design. For agent development (notebook / script / serverless), this combination eliminates the Neo4j server deployment that blocks most agentic graph integrations.
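Scaling ceiling: The use-case doc states "10,000+ tools without configuration changes." The benchmark tops out at 1000 tools, so it is consistent with that claim but does not test it directly. At 500 tools, every query pattern except the original Jaccard query stays under 12 ms p50; at 1000 tools, every pattern except the original Jaccard query stays under 21 ms p50. For the typical tool library (50–200 tools), the measured 50- and 100-tool registries keep every query except Jaccard under 3 ms p50; Jaccard is 2.8 ms at 50 tools and 6.0 ms at 100 tools.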
8. Recommendations
| ID | Recommendation | Priority | Effort | Issue |
|---|---|---|---|---|
| R-1 | Fix agent-grounding.md S4 workaround: add WITH parent between MATCH clauses; fix S10 to use ORDER BY tool alias; fix S12 to split CREATE/MATCH into two execute() calls | Critical | Low | New #doc-fix |
| R-2 | Fix agent-tool-recall.md: (1) DEPENDS_ON to use MATCH; (2) create DataModel/Capability nodes once then MATCH; (3) fix REQUIRED_BY to REQUIRES (reversed); (4) add db= parameter to ToolRegistry | Critical | Low | New #doc-fix |
| R-3 | Replace all .value unwrapping in agent-tool-recall.md ToolRegistry class with to_dicts() | High | Low | Same #doc-fix |
| R-4 | Update LangChain imports from langchain.agents to langchain_core.tools / langgraph.prebuilt | High | Low | Same #doc-fix |
| R-5 | Update LlamaIndex imports from llama_index.* to llama_index.core.* | High | Low | Same #doc-fix |
| R-6 | File engine bug: two-MATCH variable leak causing KeyError on variable not in second clause's scope | High | Medium | New engine bug |
| R-7 | File engine bug: RETURN DISTINCT ... ORDER BY <variable> raises UndefinedVariable — ORDER BY alias should be used | Medium | Low | New engine bug or existing |
| R-8 | Pre-compute SIMILAR_TO edges for Jaccard similarity — add to ToolRegistry as offline step | Medium | Low | Doc enhancement |
| R-9 | Add bulk ingest example to agent-tool-recall.md using UNWIND $tools AS t CREATE (:Tool {name: t.name, ...}) to reduce build time from 2.3 s to ~0.3 s at 500 tools | Low | Low | Same #doc-fix |
Shipped ToolRegistry utility vs user-space: Keep as user-space code in docs for now. The class as documented has 4 correctness issues (S13, S17, S22, S23). Until those doc bugs are fixed, shipping the class would require first fixing it — at which point the corrected version belongs in the updated doc rather than in the library. Revisit after doc fixes land and a clean, tested version exists.
9. Issues Filed
| Issue | Title | Type | Priority |
|---|---|---|---|
| #486 | doc: fix agent-grounding.md and agent-tool-recall.md snippets (5 bugs) | doc-bug | high |
| #482 | fix: variable reuse across WITH boundary raises KeyError (covers two-MATCH variable leak) | engine-bug | high |
| #481 | fix: RETURN DISTINCT ... ORDER BY <var> raises UndefinedVariable | engine-bug | medium |
Appendix A: Working ToolRegistry Reference Implementation
A corrected ToolRegistry class that passes all validation tests, using to_dicts() throughout, with MERGE-style shared nodes and proper db= support:
from graphforge import GraphForge


class ToolRegistry:
    """GraphForge-backed tool registry for LLM agents.

    Corrected from agent-tool-recall.md — uses shared DataModel/Capability nodes
    and to_dicts() for ergonomic result consumption.
    """

    def __init__(self, db_path: str | None = None, db: GraphForge | None = None):
        # Compare against None so a passed-in instance is never dropped by a truthiness check
        self.db = db if db is not None else (GraphForge(db_path) if db_path else GraphForge())

    def find_by_capability(self, capability: str, limit: int = 5) -> list[dict]:
        return self.db.to_dicts("""
            MATCH (t:Tool)-[:CAN_DO]->(cap:Capability)
            WHERE toLower(cap.name) CONTAINS toLower($cap)
               OR toLower(cap.description) CONTAINS toLower($cap)
               OR toLower(t.description) CONTAINS toLower($cap)
            WITH DISTINCT t
            WHERE t.deprecated = false
            RETURN t.name AS name, t.description AS description,
                   t.endpoint AS endpoint, t.latency_ms AS latency_ms
            ORDER BY t.latency_ms ASC
            LIMIT $limit
        """, {'cap': capability, 'limit': limit})

    def get_prerequisites(self, tool_name: str) -> list[str]:
        rows = self.db.to_dicts("""
            MATCH (t:Tool {name: $name})-[:REQUIRES]->(dm:DataModel)
            RETURN dm.name AS data_model
        """, {'name': tool_name})
        return [r['data_model'] for r in rows]

    def next_tools(self, tool_name: str) -> list[dict]:
        return self.db.to_dicts("""
            MATCH (t:Tool {name: $name})-[:PRODUCES]->(dm:DataModel)<-[:REQUIRES]-(next:Tool)
            WHERE next.deprecated = false
            RETURN next.name AS name, next.description AS description, dm.name AS via
        """, {'name': tool_name})

    def authorized_tools(self, role: str) -> list[str]:
        rows = self.db.to_dicts("""
            MATCH (t:Tool)-[:REQUIRES_PERMISSION]->(r:Role {name: $role})
            WHERE t.deprecated = false
            RETURN t.name AS name
            ORDER BY t.name
        """, {'role': role})
        return [r['name'] for r in rows]

    def substitutes(self, tool_name: str, top_k: int = 3) -> list[dict]:
        # Passes the tool name through WITH across MATCH boundaries (a workaround for
        # the KeyError bug fixed in v0.3.10; still valid style).
        # This is the original query shape; section 4 lists an optimized form that
        # computes target_total before the join.
        return self.db.to_dicts("""
            MATCH (t1:Tool {name: $name})-[:CAN_DO]->(shared:Capability)<-[:CAN_DO]-(alt:Tool)
            WHERE alt.name <> $name AND alt.deprecated = false
            WITH alt, count(shared) AS shared_caps, $name AS tname
            MATCH (t2:Tool {name: tname})-[:CAN_DO]->(tc:Capability)
            WITH alt, shared_caps, count(tc) AS target_total
            MATCH (alt)-[:CAN_DO]->(ac:Capability)
            WITH alt, shared_caps, target_total, count(ac) AS alt_total
            WITH alt, toFloat(shared_caps) / (target_total + alt_total - shared_caps) AS jaccard
            RETURN alt.name AS name, jaccard AS similarity
            ORDER BY jaccard DESC
            LIMIT $k
        """, {'name': tool_name, 'k': top_k})
Setting up the registry correctly (shared nodes pattern):
db = GraphForge("tools.db")  # or GraphForge() for in-memory

# 1. Create shared reference nodes ONCE
for domain in ['inventory', 'catalog', 'orders']:
    db.execute(f"CREATE (:Domain {{name: '{domain}'}})")
for dm in ['Product', 'Order', 'PaymentMethod']:
    db.execute(f"CREATE (:DataModel {{name: '{dm}'}})")
for cap in ['query_stock', 'search_products', 'purchase']:
    db.execute(f"CREATE (:Capability {{name: '{cap}', category: 'read'}})")
for role, level in [('customer', 1), ('admin', 10)]:
    db.execute(f"CREATE (:Role {{name: '{role}', level: {level}}})")

# 2. Create tools (no inline CREATE for shared nodes)
db.execute("""
    CREATE (t:Tool {
        name: 'check_inventory',
        description: 'Check current stock levels for a product',
        endpoint: 'api.inventory.check',
        version: '2.1',
        cost_per_call: 1,
        latency_ms: 45,
        deprecated: false
    })
""")

# 3. Link the tool to the shared nodes via MATCH
db.execute("""
    MATCH (t:Tool {name: 'check_inventory'}), (cap:Capability {name: 'query_stock'}),
          (dm:DataModel {name: 'Product'}), (r:Role {name: 'customer'})
    CREATE (t)-[:CAN_DO]->(cap)
    CREATE (t)-[:REQUIRES]->(dm)
    CREATE (t)-[:REQUIRES_PERMISSION]->(r)
""")

registry = ToolRegistry(db=db)
Appendix B: Validated Framework Integration Snippets
LangGraph — Dynamic Tool Binding
from langchain_core.tools import StructuredTool
from langgraph.graph import StateGraph, MessagesState
from langgraph.prebuilt import ToolNode, tools_condition
from graphforge import GraphForge

# ToolRegistry is the class from Appendix A; llm and call_api are assumed to be defined elsewhere
gf = GraphForge("tools.db")
registry = ToolRegistry(db=gf)

def build_tools_for_context(intent: str, user_role: str) -> list:
    """Query GraphForge to get relevant tools, convert to LangChain format."""
    tool_data = registry.find_by_capability(intent)
    authorized = set(registry.authorized_tools(user_role))
    tool_data = [t for t in tool_data if t['name'] in authorized]
    return [
        StructuredTool.from_function(
            name=t['name'],
            description=t['description'],
            # The default argument must come before **kwargs to be valid Python
            func=lambda ep=t['endpoint'], **kwargs: call_api(ep, kwargs)
        )
        for t in tool_data
    ]

def agent_node(state: MessagesState):
    last_msg = state['messages'][-1].content
    tools = build_tools_for_context(intent=last_msg, user_role='customer')
    llm_with_tools = llm.bind_tools(tools)
    return {'messages': [llm_with_tools.invoke(state['messages'])]}
LlamaIndex — Custom ObjectRetriever
from llama_index.core.objects import ObjectRetriever
from llama_index.core.tools import FunctionTool
from graphforge import GraphForge

# ToolRegistry is the class from Appendix A; llm and call_api are assumed to be defined elsewhere
class GraphForgeToolRetriever(ObjectRetriever):
    def __init__(self, gf: GraphForge):
        self.registry = ToolRegistry(db=gf)

    def retrieve(self, query_str: str) -> list[FunctionTool]:
        tool_data = self.registry.find_by_capability(query_str)
        return [
            FunctionTool.from_defaults(
                name=t['name'],
                description=t['description'],
                # The default argument must come before **kwargs to be valid Python
                fn=lambda ep=t['endpoint'], **kwargs: call_api(ep, kwargs)
            )
            for t in tool_data
        ]

# Usage in ReActAgent:
from llama_index.core.agent import ReActAgent

retriever = GraphForgeToolRetriever(gf=GraphForge("tools.db"))
agent = ReActAgent.from_tools(tool_retriever=retriever, llm=llm, verbose=True)