Architecture /// LOG_016 /// NOV 17, 2025 /// 8 min read

The Meta-MCP Architecture: Reducing Token Usage by 97%

We moved from monolithic agent prompts to a decentralized tool orchestration engine. Here is the technical breakdown of how we achieved massive parallelization and cost reduction.

Danial Hasan
Chief Architect @ Squad

When we started building Squad, our "Director" agent was a monolith. It had a massive system prompt containing definitions for every tool in our ecosystem: GitHub, Slack, Jira, and our internal APIs.

The problem? Context pollution. Every time the Director needed to make a simple decision, it was burning 150k tokens just to remember how to use the Jira API. Latency was 10s+, and costs were astronomical.

We realized we didn't need a smarter model. We needed a better tool interface.

The "Code Execution" Pattern

Inspired by Anthropic's research, we shifted from "Text-Based Tool Calling" (where the LLM outputs JSON to be parsed) to "Code Execution" (where the LLM writes Python scripts).
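
To make the contrast concrete, here is a rough sketch. The tool name and arguments are illustrative, and call_tool is the helper our workbench (below) injects:

# Text-based tool calling: the model's whole turn is a JSON blob
# that a harness must parse and dispatch, e.g.
#   {"name": "jira.create_issue", "arguments": {"project": "SQD", "summary": "..."}}
# Code execution: the model writes the dispatch itself, so loops,
# branching, and retries come along for free.
issue = call_tool("jira.create_issue", project="SQD", summary="Fix flaky login test")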

Instead of giving the agent 500 tool definitions, we gave it one tool: a Python Workbench.

def execute_workbench(code: str):
    """The single tool exposed to the agent: run its generated Python."""
    # 1. Sandbox the execution environment
    sandbox = create_isolated_env()

    # 2. Inject helper functions (Meta-Tools)
    sandbox.inject({
        'search_tools': search_registry,
        'call_tool': execute_mcp_tool,
        'create_receipt': sign_evidence,
    })

    # 3. Run the agent's code
    return sandbox.run(code)
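
The helpers above (create_isolated_env, search_registry, execute_mcp_tool, sign_evidence) are pseudocode. For intuition, here is a minimal, self-contained sketch of the same idea built on exec with a whitelisted namespace; note that exec is not a real security boundary, and a production workbench runs agent code in an isolated process or container:

import builtins

# Whitelist only the builtins agent scripts are allowed to touch.
SAFE_BUILTINS = {name: getattr(builtins, name)
                 for name in ("len", "range", "min", "max", "sorted", "print")}

def run_sandboxed(code: str, helpers: dict):
    """Run agent-written code with restricted builtins plus injected helpers.
    Convention: the script stores its output in a variable named `result`."""
    namespace = {"__builtins__": SAFE_BUILTINS, **helpers}
    exec(code, namespace)
    return namespace.get("result")

# Usage: inject a trivial helper and run an agent-style snippet.
output = run_sandboxed("result = greet('squad')",
                       helpers={"greet": lambda name: f"hello, {name}"})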

This shift was profound. The agent no longer needed to know the schema of every tool upfront. It could discover tools at runtime.

Lazy Loading the Tool Registry

We built "Meta-MCP"—a Model Context Protocol server that acts as a registry.

  • The Agent asks: "I need to check a PR status."
  • It calls search_tools("github pr").
  • Meta-MCP returns only the schemas for github.get_pr and github.list_prs.
  • The Agent writes code to call those specific tools (see the sketch below).
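
Concretely, the script the agent writes after discovery might look like the following sketch. The repo, PR number, and return shape are illustrative; search_tools and call_tool are the injected workbench helpers:

# Agent-generated script: discover the tool, then call it.
schemas = search_tools("github pr")  # only the matching schemas come back
pr = call_tool("github.get_pr", repo="squad/core", pr_number=42)
result = {"state": pr["state"], "title": pr["title"]}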

This reduced our persistent context from 150k tokens to just 5k, a ~97% reduction. The agent only loads what it needs, when it needs it.

Architecture Diagram

[Figure: the Director Agent (5k context) drives a Python Workbench exposing search_tools(), execute_code(), and emit_receipt(), which fans out to GitHub, Linear, and Postgres MCP servers.]
Fig 1. The Meta-MCP Discovery Loop

The Impact on Reliability

Beyond token savings, this architecture improved reliability. When an agent writes code to call a tool, it can handle errors programmatically.

If github.get_pr fails, the Python script can catch the exception and try a different search query, with no extra LLM round-trip for the retry logic. The "Thought-Action" loop becomes a tight code-execution loop.
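
For instance, the agent's script can absorb the failure locally. This is a sketch: ToolError and the fallback query are illustrative, while call_tool and create_receipt are the injected helpers:

# Agent-generated script: retry inside the sandbox, not via another LLM call.
try:
    pr = call_tool("github.get_pr", repo="squad/core", pr_number=42)
except ToolError:
    # Broaden the query instead of surfacing the failure to the Director.
    open_prs = call_tool("github.list_prs", repo="squad/core", state="open")
    pr = open_prs[0] if open_prs else None
create_receipt(action="pr_lookup", evidence=pr)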

"We treat Agent actions not as text generation, but as remote procedure calls with error handling."

What's Next

We are currently rolling out Meta-MCP v2, which includes "Recipe Caching." If the Director successfully plans a feature, we cache the Python script. Next time a similar request comes in, we skip the planning phase entirely and just execute the cached recipe.
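
As a sketch of how recipe caching could work: the exact-hash fingerprint below is illustrative (a fuzzier matcher, such as embedding similarity, would catch "similar" requests), and plan_with_llm stands in for the Director's planning pass:

import hashlib

recipe_cache: dict[str, str] = {}  # request fingerprint -> cached Python script

def fingerprint(request: str) -> str:
    # Normalize case and whitespace so trivially different phrasings collide.
    normalized = " ".join(request.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()

def plan_or_replay(request: str, plan_with_llm, execute_workbench):
    key = fingerprint(request)
    script = recipe_cache.get(key)
    if script is None:
        script = plan_with_llm(request)  # the expensive planning pass
        recipe_cache[key] = script       # cache the successful recipe
    return execute_workbench(script)     # cache hit: skip planning entirely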

This is how we achieve the "Squad Builds Squad" velocity.