6 min read · API Stronghold Team

AI Agents Should Never Hold Real API Keys: Use Phantom Tokens


There are two mental models for securing AI agents that have access to external APIs. Most teams use the wrong one. Understanding the difference is not a matter of tooling preference; it determines whether your credentials can be stolen at all.

The subtraction model (and why it fails)

The dominant approach to agent security goes like this: give the agent your real API key, then layer controls on top to limit what it can do with it. Sandbox the process. Add prompt guards. Audit outbound traffic. Put a firewall in front.

This is the subtraction model. You start with full access and subtract what you don’t want.

Tools like Claude’s computer-use sandbox, many agent orchestration frameworks, and most “secure agent” products work this way. They are not wrong to add controls. The problem is architectural. The key still lives in the execution context. The agent has it. The agent’s memory has it. Any tool call result the agent receives has potential access to it.

If something goes wrong at runtime, an attacker who compromises the agent session doesn’t need to bypass your firewall. The key is already there.

Prompt injection makes this especially bad. An attacker doesn’t need code execution. They need to get malicious content into the agent’s context, say, through a tool result or a web page the agent reads. If the key is in the environment, a sufficiently crafted injection can cause the agent to include it in an outbound call.

You can’t sandbox your way out of this. You’re trying to contain something the agent already has.

The capability model

The capability-based approach flips the architecture. The agent never receives the real credential. It receives a scoped proxy token that represents a specific capability: “you may call the Stripe API to read customer records, nothing else.”

The real key lives at the proxy boundary, not in the agent’s execution context. When the agent makes a call, the proxy intercepts it, validates that the request falls within the token’s scope, swaps in the real credential server-side, and forwards the call. The agent sees a 200 OK. It never sees a key.

The agent literally cannot exfiltrate what it never had.

This is not a new idea. It’s how capability-based security has worked in OS research for decades. The difference is that most modern web stacks were never built around it. Your AWS SDK assumes it has credentials. Your OpenAI client assumes it has a key. The default path is always “put the key in the environment.”

Phantom tokens change that default.

How phantom tokens work

When you create a phantom token in API Stronghold, you define its scope explicitly: which upstream API, which endpoints or methods are allowed, and how long the token lives. The default TTL is 24 hours.

At call time:

  1. The agent sends a request with the phantom token in the Authorization header (same format as a real bearer token, so the agent code doesn’t change).
  2. API Stronghold (AS) receives the request and validates the token against the scope definition.
  3. If the request falls within scope, AS swaps in the real upstream credential server-side and proxies the call.
  4. The response goes back to the agent. The real key never appears in any response body, header, or log visible to the agent.
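The steps above can be sketched in a few lines. This is a simplified illustration, not API Stronghold’s actual implementation: the token store, field names, and `proxy_request` function are all hypothetical, and the real proxy forwards the request upstream rather than returning a string.

```python
import time

# Hypothetical in-memory token store; the real proxy's internals are not public.
TOKENS = {
    "pt_abc123": {
        "upstream": "https://api.stripe.com",
        "allowed_methods": {"GET"},
        "allowed_path_prefix": "/v1/customers",
        "expires_at": time.time() + 24 * 3600,  # default 24-hour TTL
        "real_key": "sk_live_...",              # never leaves the proxy boundary
    }
}

def proxy_request(phantom_token, method, path):
    """Validate a request against the token's scope, then swap in the real key."""
    scope = TOKENS.get(phantom_token)
    if scope is None or time.time() > scope["expires_at"]:
        return 401, "token unknown or expired"
    if method not in scope["allowed_methods"]:
        return 403, "method not in scope"
    if not path.startswith(scope["allowed_path_prefix"]):
        return 403, "path not in scope"
    # In the real proxy, the call is forwarded upstream with the real key in the
    # Authorization header; the agent only ever sees the resulting response.
    return 200, f"forwarded {method} {scope['upstream']}{path}"

print(proxy_request("pt_abc123", "GET", "/v1/customers/cus_1"))  # in scope
print(proxy_request("pt_abc123", "POST", "/v1/customers"))       # rejected
```

The point of the sketch is the control flow: scope validation happens before the real credential is ever touched, and the credential itself stays inside the proxy process.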

Every call gets written to an audit log at the boundary. You see what the agent called, when, and what the scope was at that moment. You don’t have to trust the agent’s own logs.

Short TTL matters here. If a token is compromised, the attacker has a narrow window. There’s no way to refresh it without going back to AS, which requires real authentication. The token is also non-replayable by default; once used, the nonce is consumed.


What an attacker actually gets

Suppose an attacker fully compromises an agent session. They get access to the agent’s execution context, its memory, everything it has.

What do they find? A phantom token. Short-lived, scoped to one API, possibly already expired by the time they try to use it. They can attempt to replay it, but AS will reject it if the nonce is consumed or the TTL has passed. Even if they get one successful call through, they’re limited to exactly the operations the scope permits.

They cannot use it to call other APIs. They cannot use it to get the real upstream key. They cannot use it to generate new tokens. The blast radius is bounded by the scope definition you wrote when you created the token.

This is a fundamentally different security posture from “the agent has the real key but we added controls.”

Why this matters for MCP servers

Model Context Protocol (MCP) tools are worth treating separately, because they amplify the risk profile significantly.

In an MCP setup, the agent calls tools, which can return arbitrary content. A malicious or compromised tool can return a prompt injection payload in its result. If the agent holds a real API key, that payload can instruct the agent to include the key in a subsequent call. The agent is operating exactly as designed. The tool result is just data. But data can contain instructions.

With phantom tokens, there is nothing to exfiltrate. A prompt injection that tries to extract credentials gets nothing useful. The token it finds is scoped, short-lived, and bounded. The real key is not there.

This is one of the reasons phantom tokens are worth thinking about before your MCP server count grows. Each tool you add is a new surface for injection. The underlying credential architecture determines how bad a successful injection can be.

Setting this up

Three commands to get a working phantom token for an existing agent:

# 1. Create a route for your upstream API
as routes create --name stripe-customers --upstream https://api.stripe.com --allowed-paths "/v1/customers/**" --methods GET

# 2. Issue a phantom token scoped to that route
as tokens issue --route stripe-customers --ttl 24h --output env

# 3. Point your agent at the AS proxy endpoint instead of the real API
export STRIPE_BASE_URL=https://proxy.apistronghold.com
export STRIPE_API_KEY=<phantom_token_from_step_2>

Your agent code doesn’t change. The SDK doesn’t change. The base URL points at AS instead of Stripe. The phantom token goes where the real key used to go.
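For an agent that talks to the API directly rather than through an SDK, the call site looks the same before and after the switch. A sketch using Python’s standard library, assuming the two environment variables from step 3 (the path and response handling are illustrative):

```python
import os
import urllib.request

# The only change from calling Stripe directly: the base URL points at the
# AS proxy, and the "API key" slot holds a phantom token, not a real key.
base_url = os.environ.get("STRIPE_BASE_URL", "https://proxy.apistronghold.com")
phantom = os.environ.get("STRIPE_API_KEY", "pt_example")

req = urllib.request.Request(
    f"{base_url}/v1/customers",
    headers={"Authorization": f"Bearer {phantom}"},  # same bearer format as before
)
# urllib.request.urlopen(req) would send this through the AS proxy, which
# validates the scope and swaps in the real Stripe key server-side.
```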

When the agent makes a Stripe call, AS intercepts it, validates the scope, swaps in the real Stripe key, and proxies the request. The agent gets a normal response. The real key stays on AS’s side of the boundary.

The architecture question worth asking

Before adding another prompt guard or sandbox rule, it’s worth asking: if an attacker fully compromised your agent runtime, what credential damage could they do?

If the answer is “they’d have my real API key,” the subtraction model has a ceiling. You’re managing blast radius by adding controls on top of a credential that already exists in a vulnerable context.

If the answer is “they’d have a short-lived scoped phantom token, and it’s probably already expired,” you’re working with the capability model.

The architecture matters more than the controls built on top of it.

Change the architecture, not just the controls

API Stronghold gives every agent a scoped phantom token. Real keys stay at the proxy boundary. The phantom token setup takes about ten minutes.

No credit card required
