The DailyVibe
Guides · March 26, 2026 · 8 min read

Wiring up Claude agents to MCP tool servers in production

By Sage Thornton
AI-Generated · Guide · Auto-published · 15 sources · 3 primary

Anthropic launched the Model Context Protocol in November 2024. Since then, the community has built thousands of MCP servers, SDKs exist for every major language, and OpenAI, VS Code, Cursor, and ChatGPT all support the protocol. MCP is the de facto standard for connecting AI agents to external tools and data.

But most tutorials stop at "add this to your Claude Desktop config." That is not production. This guide walks you through setting up MCP tool servers that a Claude agent can call programmatically, from first tool call to deployment checklist.

What MCP actually is (and is not)

MCP is a client-server protocol built on JSON-RPC 2.0. An MCP host (your application) creates an MCP client for each MCP server it connects to. Each server exposes tools, resources, and prompts through a standardized interface.

According to the official architecture docs, MCP follows a client-server architecture where "an MCP host, an AI application like Claude Code or Claude Desktop, establishes connections to one or more MCP servers."

Three capabilities a server can expose:

  1. Tools: Functions the LLM can call (with user approval in interactive contexts)
  2. Resources: File-like data the client can read, such as API responses or documents
  3. Prompts: Pre-written templates for specific tasks

Two transport mechanisms exist:

  • stdio: Standard input/output for local processes on the same machine. No network overhead. This is what Claude Desktop uses when it spawns a local server.
  • Streamable HTTP: Introduced in the March 2025 spec update, replacing the older SSE transport. This is what you use for remote, multi-client production deployments.
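
Under the hood, every exchange on either transport is a JSON-RPC 2.0 message. As a rough illustration (not a working client), here is what a tools/list request and a typical response look like; the exact result fields are defined by the MCP spec, so treat the payloads as a sketch:

```python
import json

# A client asks a server which tools it exposes (JSON-RPC 2.0 request).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

# Over stdio, each message travels as one line of serialized JSON.
wire = json.dumps(request)

# A typical response: the server describes each tool with a name,
# a description, and a JSON Schema for its inputs.
response = json.loads("""{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "tools": [
      {
        "name": "lookup_user",
        "description": "Look up a user by their ID in the company directory.",
        "inputSchema": {
          "type": "object",
          "properties": {"user_id": {"type": "string"}},
          "required": ["user_id"]
        }
      }
    ]
  }
}""")

tool_names = [t["name"] for t in response["result"]["tools"]]
print(tool_names)
```

The inputSchema is what FastMCP generates for you from type hints and docstrings, which is the whole appeal of the Python SDK in the next section.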

Your first MCP server in Python

You are going to build a minimal MCP server using the official Python SDK. I am picking Python because the FastMCP class uses type hints and docstrings to auto-generate tool definitions, which saves you from writing JSON schemas by hand.

Requirements: Python 3.10+, MCP SDK 1.2.0+

Install the tooling:

curl -LsSf https://astral.sh/uv/install.sh | sh
uv init my-tools && cd my-tools
uv venv && source .venv/bin/activate
uv add "mcp[cli]" httpx

Expected output after uv add:

Resolved 12 packages in 1.2s
Installed 12 packages in 0.8s
 + mcp[cli] 1.2.x
 + httpx 0.27.x

Create server.py:

from mcp.server.fastmcp import FastMCP
import httpx

mcp = FastMCP("my-tools")

@mcp.tool()
async def lookup_user(user_id: str) -> str:
    """Look up a user by their ID in the company directory.

    Args:
        user_id: The employee ID (e.g. EMP-1234)
    """
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            f"https://api.internal.example.com/users/{user_id}",
            timeout=10.0
        )
        resp.raise_for_status()
        data = resp.json()
        return f"Name: {data['name']}, Dept: {data['department']}"

def main():
    mcp.run(transport="stdio")

if __name__ == "__main__":
    main()

Two things to note. First, the docstring becomes the tool description that Claude sees, so write it like you are explaining the tool to a colleague. Second, for stdio servers, never write to stdout. The MCP SDK uses stdout for JSON-RPC messages. Use logging or print(..., file=sys.stderr) for debug output.

Run it: uv run server.py. The process will sit waiting for JSON-RPC messages on stdin. That is correct behavior.

Troubleshooting: If you see ModuleNotFoundError: No module named 'mcp', your virtual environment is not activated. Run source .venv/bin/activate and try again.

Connecting Claude to your server via the Agent SDK

Anthropic publishes the Claude Agent SDK as @anthropic-ai/claude-agent-sdk on npm and claude-agent-sdk on PyPI. This is the programmatic way to run Claude as an agent with MCP servers attached.

Here is the TypeScript version, because the Agent SDK's TypeScript ergonomics are stronger for MCP configuration:

import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({
  prompt: "Look up employee EMP-1234 and summarize their info",
  options: {
    mcpServers: {
      "my-tools": {
        command: "uv",
        args: ["--directory", "/absolute/path/to/my-tools", "run", "server.py"]
      }
    },
    allowedTools: ["mcp__my-tools__lookup_user"]
  }
})) {
  if (message.type === "result" && message.subtype === "success") {
    console.log(message.result);
  }
}

The allowedTools array is mandatory. Without it, Claude sees the tools but cannot call them. The naming convention is mcp__<server-name>__<tool-name>. You can use wildcards: mcp__my-tools__* allows all tools from that server.
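
The convention behaves like shell-style globbing, which this sketch demonstrates with Python's fnmatch. It illustrates the naming scheme only; it is not the SDK's actual matching code:

```python
from fnmatch import fnmatchcase

# An allowlist mixing an exact tool name and a wildcard pattern.
allowed = ["mcp__my-tools__lookup_user", "mcp__db__*"]

def is_allowed(tool_name: str) -> bool:
    """Return True if the tool name matches any allowed pattern."""
    return any(fnmatchcase(tool_name, pattern) for pattern in allowed)

print(is_allowed("mcp__my-tools__lookup_user"))  # True: exact match
print(is_allowed("mcp__db__query"))              # True: wildcard match
print(is_allowed("mcp__slack__post_message"))    # False: no pattern covers it
```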

For remote HTTP-based servers, swap the config:

mcpServers: {
  "my-tools": {
    type: "http",
    url: "https://my-mcp-server.example.com/mcp",
    headers: {
      Authorization: `Bearer ${process.env.MCP_TOKEN}`
    }
  }
}

You can also use a .mcp.json file at your project root instead of inline config. The SDK loads it automatically:

{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_TOKEN": "${GITHUB_TOKEN}"
      }
    }
  }
}
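
The ${GITHUB_TOKEN} placeholder is the shell-style "substitute from the environment" convention. The SDK performs its own substitution when it loads the file; this minimal sketch just shows what the config resolves to:

```python
import json
import os
import re

config_text = """{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {"GITHUB_TOKEN": "${GITHUB_TOKEN}"}
    }
  }
}"""

os.environ["GITHUB_TOKEN"] = "ghp_example"  # stand-in value for the demo

def expand(text: str) -> str:
    """Replace ${VAR} placeholders with values from the environment."""
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), text)

config = json.loads(expand(config_text))
print(config["mcpServers"]["github"]["env"]["GITHUB_TOKEN"])  # ghp_example
```

The upside of the file over inline config: the same server definitions work for Claude Code, Claude Desktop, and the Agent SDK without duplication.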

The token cost problem (and how to solve it)

Here is where production gets real. Anthropic's own engineering blog documents the problem: tool definitions can consume 50,000+ tokens before an agent reads a single user message. A five-server setup with GitHub, Slack, Sentry, Grafana, and Splunk eats approximately 55,000 tokens in definitions alone, according to Anthropic's advanced tool use post.

Internally, Anthropic reported tool definitions consuming 134,000 tokens before optimization.

Two features address this:

Tool Search Tool. Instead of loading all definitions upfront, mark tools with defer_loading: true. Claude gets a search tool (about 500 tokens) and discovers the rest on demand. Anthropic reports this preserves 95% of the context window and improves accuracy on MCP evaluations, with Opus 4 going from 49% to 74% and Opus 4.5 from 79.5% to 88.1%.

{
  "tools": [
    {"type": "tool_search_tool_regex_20251119", "name": "tool_search_tool_regex"},
    {
      "name": "github.createPullRequest",
      "description": "Create a pull request",
      "input_schema": {},
      "defer_loading": true
    }
  ]
}

Code execution with MCP. Rather than passing every intermediate result through the context window, Claude writes code that calls tools directly in an execution environment. Anthropic's engineering team documented a case where presenting MCP servers as code APIs reduced token usage from 150,000 tokens to 2,000, a 98.7% reduction. Cloudflare published similar findings, calling this pattern "Code Mode."

Pulumi's March 2026 DevOps guide reinforces this: code execution is "a massive token reduction, faster, and more flexible. You're giving the agent the ability to generate its own capabilities at runtime by writing code to interact with APIs." (Source)
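
The intuition behind the pattern: instead of a full tool result flowing back through the model's context verbatim, the agent writes code that filters it inside the execution environment and surfaces only a summary. A toy sketch, with a hypothetical fetch_issues() standing in for a code API that wraps an MCP tool:

```python
# Hypothetical code API wrapping an MCP tool; the payload is fabricated
# to mimic a large tool result (500 issues with long bodies).
def fetch_issues() -> list[dict]:
    return [
        {
            "id": i,
            "title": f"Issue {i}",
            "state": "open" if i % 3 else "closed",
            "body": "x" * 2000,  # bulk that would otherwise burn context tokens
        }
        for i in range(500)
    ]

# Agent-written code: process everything in the sandbox, return a
# tiny summary. Only this string reaches the model's context.
issues = fetch_issues()
open_count = sum(1 for issue in issues if issue["state"] == "open")
summary = f"{open_count} of {len(issues)} issues are open"
print(summary)  # 333 of 500 issues are open
```

The megabyte of issue bodies never touches the context window; the model sees one short sentence, which is the whole mechanism behind the reported 98.7% reduction.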

Production checklist

Before you deploy MCP servers that real users depend on, work through this list.

Transport choice. Use stdio for local development and single-user tools. Use Streamable HTTP for anything deployed to a server. The MCP spec version 2025-11-25 is the current release, per the 2026 MCP Roadmap. Streamable HTTP replaced the older SSE transport and supports horizontal scaling, though the roadmap notes that stateful sessions and load balancers remain active areas of work.

Authentication. For remote servers, MCP recommends OAuth for obtaining tokens. At minimum, use bearer tokens in the authorization_token field or custom headers. Never ship an MCP server that accepts unauthenticated connections to anything with write access.

Tool scoping. Use allowedTools with specific tool names rather than blanket wildcards in production. mcp__db__query is better than mcp__db__* when you do not want the agent calling mcp__db__drop_table.

Error handling. Your tool functions should catch exceptions and return meaningful error strings. If your tool throws an unhandled exception, the MCP server process may crash, and you lose the agent's entire context. Wrap external API calls in try/except blocks with timeouts.
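
One way to enforce this across every tool is a small decorator that converts exceptions into error strings the model can read. The names here are illustrative, not part of the MCP SDK:

```python
import asyncio
import functools

def safe_tool(fn):
    """Wrap an async tool so failures return an error string
    instead of crashing the server process."""
    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        try:
            return await fn(*args, **kwargs)
        except Exception as exc:  # broad on purpose: a tool must never crash
            return f"Tool error: {type(exc).__name__}: {exc}"
    return wrapper

@safe_tool
async def flaky_tool(user_id: str) -> str:
    # Simulate an upstream failure a real tool might hit.
    raise TimeoutError("upstream API timed out")

result = asyncio.run(flaky_tool("EMP-1234"))
print(result)  # Tool error: TimeoutError: upstream API timed out
```

Stack it under @mcp.tool() and the agent receives a readable failure it can retry or report, rather than a dead connection.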

Logging. For stdio servers: never write to stdout. This corrupts JSON-RPC messages. Use stderr or a logging library configured to write to files. For HTTP servers, standard output logging is fine.

Token budget. If you are connecting more than 10 tools, use Tool Search Tool with defer_loading: true. Do not load 50+ tool definitions into context. You are burning money and reducing accuracy.

Testing. Use the MCP Inspector to test your server before connecting it to an agent. It lets you send requests and see responses without involving an LLM.

Where MCP is headed

The 2026 MCP Roadmap identifies four priority areas: transport evolution and scalability, agent communication (including the experimental Tasks primitive), governance maturation, and enterprise readiness covering audit trails, SSO, and gateway behavior. The roadmap also notes that MCP now "runs in production at companies large and small" and is "shaped by a growing community through Working Groups, Spec Enhancement Proposals (SEPs), and a formal governance process."

LeadDev's March 2026 AI coding tools roundup highlights MCP-connected agents as the emerging standard for production AI coding workflows, with Claude Code's skills and MCP integration leading the category. (Source)

The protocol is moving fast, but the foundation is stable enough to build on. The spec, the SDKs, and the ecosystem of pre-built servers for GitHub, Slack, Postgres, filesystem access, and dozens of other integrations are all production-ready. The gap is not tooling. The gap is teams knowing how to wire it up properly and run it without hemorrhaging tokens.

That is what this guide is for.

Sage Thornton covers developer tools and infrastructure for The Daily Vibe.

This article was AI-generated. Learn more about our editorial standards.

Tags: Anthropic · ai-agents · mcp · model-context-protocol · claude · developer-tools · production

Related Articles

  • Guides: CNCF's platform radar hits Adopt for Helm, Backstage, and kro. Now what?
  • AI: Stop telling your AI it's an expert. It makes the answers worse.
  • Technology: Your AI Agents Are an Attack Surface Now. RSAC 2026 Is Proof.