7 min read · API Stronghold Team

Zero Trust for AI Agents Starts at the Proxy Layer


Zero trust has been a security framework staple for over a decade. Never trust, always verify, least privilege, assume breach. Most organizations have internalized these principles for their human users. Then they deployed AI agents and quietly discarded all of them.

This is a structural problem. Zero trust was designed for a world where identities are people, devices are known, and access decisions are made by deterministic systems. AI agents don’t fit that model. Pretending they do is how you end up with a beautifully zero-trust network perimeter and a prompt-injected agent sitting inside it with a full-access API key.

Zero Trust Was Never Designed for This

The original zero trust model, as described by John Kindervag at Forrester and later formalized by NIST in SP 800-207, assumes a few things that seem obvious until you try to apply them to an agent runtime.

First, that an identity is a stable, verifiable thing. A human user has credentials, a device certificate, maybe an MFA token. Their identity is something you can authenticate and audit. An AI agent’s “identity” is a model invocation. It changes with every prompt. The same agent configuration that handles a legitimate task one second can be manipulated into something else by injected content the next.

Second, that the device is known and can be enrolled in a trust framework. MDM, certificate pinning, endpoint detection: these work because you control the hardware. An agent might run in a serverless function, a container that spins up on demand, or a third-party orchestration platform. The runtime environment is not always yours to control.

Third, that access decisions are made by something deterministic. A policy engine evaluates a request against defined rules and returns allow or deny. An agent makes access decisions based on probabilities baked into a model and shaped by whatever it read most recently. That includes whatever an attacker put in the document it just processed.

Where Agents Fail the Three Principles

Never trust, always verify. In practice, agents get a static API key injected at startup, either via environment variable or secrets manager. That key is valid for the lifetime of the session, sometimes much longer. There’s no ongoing verification. The agent authenticates once, and after that, every call it makes is trusted implicitly. If the agent gets hijacked mid-session, the hijacker inherits that trust.

Revocation exists in theory. In practice, revoking a key breaks the agent, and teams rarely notice the compromise in time. Manual revocation as a safety net is not verification; it’s cleanup.
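The static-key pattern described above looks like this in practice; a minimal sketch, with the environment variable name and helper function invented for illustration:

```python
import os

# Anti-pattern: one long-lived, full-access key read once at startup.
# Every call the process makes for its whole lifetime inherits this trust.
API_KEY = os.environ.get("PROVIDER_API_KEY", "sk-live-full-access")

def call_provider(endpoint: str) -> dict:
    # The same static credential rides along on every request:
    # no per-call verification, no scope check, no expiry.
    return {"endpoint": endpoint, "auth": f"Bearer {API_KEY}"}

request = call_provider("/v1/chat/completions")
```

Nothing in this flow re-checks anything after startup, which is exactly the gap a hijacked session exploits.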

Least privilege. This one fails at the developer ergonomics level. Scoped API keys are tedious to configure. Most provider SDKs don’t make it easy to create keys with endpoint-level restrictions. So developers take the path of least resistance: they create a full-access key and move on. The agent gets permission to do everything the key allows, which is usually far more than any single workflow requires.

A customer-support agent that only needs to read order status ends up holding a key that can also issue refunds, update records, and pull account history. The surface area isn’t determined by the task; it’s determined by whatever key someone had available.

Assume breach. The assume-breach principle requires designing systems with the expectation that something inside the perimeter is already compromised. For agents, that means designing with the expectation that the agent can be prompt-injected. Almost nobody is doing this.

The implicit assumption in most deployments is that the agent is safe because it came from a trusted model provider and was given a well-crafted system prompt. Prompt injection breaks that assumption at the input layer, before any policy can evaluate it.

Why IAM and OAuth Don’t Patch This

The natural response is to reach for IAM roles or OAuth flows. Both are better than a static API key, but neither solves the core problem.

IAM, whether AWS roles, GCP service accounts, or Azure managed identities, authenticates the execution environment and grants it credentials. If an agent in that environment gets prompt-injected, the injected instructions run with the same permissions the agent was granted legitimately. IAM verified the environment. It said nothing about the agent’s behavior inside it.

OAuth adds a consent layer. A user grants the agent permission to act on their behalf. But that grant happens up front, before the agent reads any external content. An injection attack that happens after the OAuth handshake operates within the granted scope. The token is valid. The user consented. The framework has nothing to object to.

The problem isn’t authentication. It’s that authenticated entities can still be manipulated. Zero trust for agents can’t stop at “who is this?” It has to extend to “what is this allowed to do, on this call, right now?”

The Proxy as Enforcement Layer

A credential injection proxy moves zero trust enforcement to the infrastructure layer, where it doesn’t depend on the agent behaving correctly.

The architecture is straightforward. The agent starts a session and receives two things: a fake API key (a session token with no intrinsic value) and a base URL pointing to the proxy instead of the real provider. Every API call the agent makes goes through the proxy. The proxy validates the session token, checks the call against the session’s scope policy, and either forwards the request to the real provider with the real credential or rejects it.
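The per-request decision path can be sketched as follows. This is illustrative only; Session, REAL_KEY, and handle() are hypothetical names, not a real proxy API:

```python
import time
from dataclasses import dataclass

@dataclass
class Session:
    token: str
    allowed: set        # (HTTP method, path prefix) pairs in scope
    expires_at: float   # unix timestamp; the session-bound TTL
    revoked: bool = False

REAL_KEY = "sk-real-credential"  # held by the proxy; never sent to the agent

def handle(session: Session, method: str, path: str):
    # 1. Verify the session token on every call, not just at startup.
    if session.revoked or time.time() > session.expires_at:
        return 401, None
    # 2. Enforce least privilege: anything outside the scope policy is
    #    rejected, no matter what instructions the agent was given.
    if not any(method == m and path.startswith(p) for m, p in session.allowed):
        return 403, None
    # 3. Only now is the real credential attached and the request forwarded.
    return 200, {"Authorization": f"Bearer {REAL_KEY}"}
```

Note that the real credential only appears in step 3, after both checks have passed.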

This implements each zero trust principle at the call level, not the session level.

Verification happens on every request, not just at startup. The proxy validates the session token each time. A token can be revoked between calls without redeploying anything. If a session looks anomalous, it gets terminated. The agent’s next call will fail.

Least privilege is enforced by the proxy’s scope rules, not by whatever key the developer had available. The session policy specifies which providers, which endpoints, and which HTTP methods the agent is allowed to use. The agent literally cannot call out-of-scope endpoints, regardless of what it’s been told to do. A prompt injection that instructs the agent to call POST /admin/users against an endpoint not in the session scope gets a 403 from the proxy. The instruction is irrelevant; the proxy doesn’t know about it and doesn’t care.
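A scope policy for the customer-support agent from earlier might carry providers, endpoints, and methods like this. The schema is illustrative, not a real format:

```python
# Illustrative session scope policy: one provider, two read-only
# endpoints, and nothing else resolvable from inside the session.
session_policy = {
    "provider": "payments-api",
    "endpoints": [
        {"method": "GET", "path": "/v1/orders/*"},
        {"method": "GET", "path": "/v1/orders/*/status"},
    ],
    "ttl_seconds": 900,  # the session, and its fake key, expire together
}

# The proxy answers one question per request: is (method, path) declared?
# (A real proxy would glob-match the path patterns; this checks literally.)
def in_scope(method: str, path_pattern: str) -> bool:
    return any(e["method"] == method and e["path"] == path_pattern
               for e in session_policy["endpoints"])
```

The refund, record-update, and account-history endpoints simply do not appear in the policy, so no instruction can reach them.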

Assume breach becomes an architectural property rather than a design intention. If the agent is compromised, the blast radius is bounded by the session. Session-bound credentials expire. The fake key the agent holds is worthless outside the proxy, and worthless after the session ends. An attacker who exfiltrates it gets nothing they can use.

What This Looks Like in Practice

An agent starts. The orchestrator creates a session with a defined scope: read-only access to the payments API, no admin endpoints, 15-minute TTL. The agent receives OPENAI_API_KEY=sess_abc123 and OPENAI_BASE_URL=https://proxy.internal. It has no idea it’s talking to a proxy.
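The orchestrator side of that handoff could be sketched like this, with create_session and the scope contents assumed for illustration:

```python
import secrets
import time

# Hypothetical orchestrator step: mint the session described above and hand
# the agent a fake key plus a proxy base URL. No real credential is involved.
def create_session(ttl_seconds: int = 900) -> dict:
    return {
        "token": "sess_" + secrets.token_hex(8),
        "scope": [("GET", "/v1/payments")],  # read-only, no admin endpoints
        "expires_at": time.time() + ttl_seconds,
    }

session = create_session()

# From the agent's point of view this is a normal provider configuration.
agent_env = {
    "OPENAI_API_KEY": session["token"],           # worthless outside the proxy
    "OPENAI_BASE_URL": "https://proxy.internal",  # every call routes here
}
```

Because SDKs read the key and base URL from the environment, the agent code itself needs no changes.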

The agent makes a call. The proxy validates the session token, confirms the endpoint is in scope, logs the call with an HMAC signature, and forwards the request. The real credential never leaves the proxy.
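The HMAC-signed log entry mentioned above can be as simple as a keyed hash over the serialized entry; the signing key and field names here are illustrative:

```python
import hashlib
import hmac
import json

LOG_SIGNING_KEY = b"proxy-log-signing-key"  # illustrative; keep in a KMS in practice

def signed_log_entry(session_token: str, method: str, path: str, status: int) -> dict:
    entry = {"session": session_token, "method": method, "path": path, "status": status}
    payload = json.dumps(entry, sort_keys=True).encode()
    # The HMAC lets an auditor detect any after-the-fact edit to the entry.
    entry["sig"] = hmac.new(LOG_SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return entry

def verify(entry: dict) -> bool:
    body = {k: v for k, v in entry.items() if k != "sig"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(LOG_SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(entry["sig"], expected)
```

Any tampering with a logged field changes the payload and breaks verification, which makes the audit trail tamper-evident as long as the signing key stays server-side.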

A malicious document tells the agent to exfiltrate its API key. The agent sends sess_abc123 to an external server. That token only works through the proxy, which only forwards to allowed endpoints. The exfiltration call returns 403.

The same document instructs the agent to hit a billing endpoint and trigger a refund. Not in scope. 403.

The session expires in 15 minutes. sess_abc123 is now worthless.

The Gap Nobody Talks About

The proxy closes the credential exposure vector. It does not solve AI agent security.

An agent that’s been prompt-injected can still take destructive actions within its legitimate scope. If the agent is allowed to delete records, a clever injection can tell it to delete records. The proxy will forward that call, because it’s in scope.

Zero trust for agents is still an open problem. Behavioral monitoring, input sanitization, output filtering, conservative scope design: these are all part of a complete answer. Most teams aren’t there yet.

But credential hygiene is the part you can fix today, with no model changes, no SDK patches, and no rewrite of your agent architecture. A proxy that sits between your agents and their API keys, enforces scope at the call level, and issues time-limited session tokens is not a complete solution. It is, however, the foundation that everything else has to build on.

Most deployments don’t have even that. Fix the foundation first.

Secure your API keys today

Stop storing credentials in Slack and .env files. API Stronghold provides enterprise-grade security with zero-knowledge encryption.

View Pricing →