Agency Is Not Autonomy
Introduction
As of April 2026, every major cloud and foundation-model vendor has shipped a production agent platform. Google released the Gemini Enterprise Agent Platform. Microsoft shipped Agent Framework 1.0. AWS released Strands Agents 1.0 with Bedrock AgentCore. Anthropic released Claude Managed Agents. OpenAI upgraded the Agents SDK with a sandboxed durable harness. Vercel shipped AI SDK 6. LangChain promoted LangGraph to its flagship offering. These systems embed autonomous decision-making into production software at scale.
The design question underneath all of them is the same: where inside a system a decision is made, and what is responsible for making it. Every other architectural choice follows from that answer.
This piece argues that the field's current vocabulary makes the question hard to answer cleanly. The word agent is doing two jobs at once. One is autonomy: the scope and authority an environment grants a participant to act on its own behalf. The other is agency: the capacity an internal mechanism has to generate an action from a principle, in a way that cannot be fully predicted from the mechanism's structure. Treating the two as one produces three concrete failures. Systems cannot be audited, because the parts that reason are not named. Systems accumulate more granted authority than the task requires, because the cost of granting it is not visible. And systems spend inference on problem surface that the environment already constrained, because the intelligence they invoke was not placed where it was needed.
The rule that follows has two halves. Grant no more autonomy than necessary. Respect the full agency of what you have granted autonomy to. Maximal capacity deployed to the minimally necessary surface.
The current state of the field
The consensus architecture across these platforms has converged on a similar shape. A leaf agent runs a loop in which a language model decides what tool to call next; the framework manages iteration, retries, streaming, and context. Composition of multiple agents uses either implicit handoffs or explicit graphs. Governance (identity, durability, audit, policy enforcement) is moving out of SDK code and into a managed runtime. MCP is the standard tool surface. A2A is the emerging cross-runtime coordination protocol.
The architecture itself is fine. The vocabulary describing it is not. The word agent is used for at least four different things across and within these frameworks: a participant in a multi-party system, a mechanism that chooses the next action, a workflow with some reasoning added at a few points, and the composite that contains any of the above. The overloading is not cosmetic. It shapes which tools the frameworks expose and which ones teams reach for.
Autonomy and agency
Two distinct properties share the name agent. One is about where a decision is allowed to be made. The other is about what makes it.
Autonomy is a property of a participant. It is the scope the environment grants: what data can be read, what side effects can be produced, what accounts can be acted under, what approvals are pre-cleared. Autonomy is external-facing. It can be specified in advance, audited, and revoked. Its natural measurement is permission: how much authority has been handed over.
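Because autonomy is external and declarative, it can literally be written as data. A minimal sketch of the idea in Python; every name here is illustrative, not drawn from any shipping framework:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AutonomyScope:
    """What the environment grants a participant: specified in advance,
    auditable as plain data, revocable by substituting a narrower scope."""
    readable: frozenset[str] = frozenset()        # data it may read
    writable: frozenset[str] = frozenset()        # side effects it may produce
    callable_tools: frozenset[str] = frozenset()  # tools it may invoke

    def permits_call(self, tool: str) -> bool:
        return tool in self.callable_tools

def invoke_tool(scope: AutonomyScope, tool: str, run):
    # Enforcement lives at the participant boundary, outside the mechanism.
    if not scope.permits_call(tool):
        raise PermissionError(f"tool {tool!r} is outside the granted scope")
    return run()

refund_scope = AutonomyScope(
    readable=frozenset({"order", "payment_history"}),
    callable_tools=frozenset({"issue_refund"}),
)
```

The enforcement point matters: nothing inside the mechanism is trusted to self-limit, which is what makes the scope auditable without reading the mechanism.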
Agency is a property of a mechanism. It is the capacity to generate an action from a principle in a way that is not enumerable from the mechanism's structure. Agency is internal. You cannot read it off a spec. You have to run the mechanism to see what it produces. Its natural measurement is interpretability: how much is known about the action before the mechanism generates it. Agency is, by construction, the part of the system you cannot fully predict. That is what it is for.
The two are orthogonal. A participant can be granted generous autonomy and have a rule-engine mechanism inside it that does no reasoning at all. A small, narrowly scoped participant can have a frontier-model loop inside it doing genuine open-ended reasoning. A participant's autonomy budget and its mechanism's agency vary independently.
The existing literature on "least agency" reads the term through a security lens: agents should have only the permissions required for their task. That is a claim about autonomy, and it is correct as far as it goes. It leaves the agency side unaddressed. A system can have tightly scoped participants running mechanisms that are badly suited to the work those participants do: either over-capacity (a frontier model making decisions that are deterministic rules) or under-capacity (a rule engine making decisions that require judgment). The second half of the principle, respecting the agency of the mechanism you have invoked, is what the current framing does not reach.
The distinction itself is not new. Shoham's Agent-Oriented Programming (1993) separated these two concerns. Hewitt's actor model (1973) separated actor from topology. BDI (1991) separated the deliberative interior of an agent from the coordination substrate around it. The distinction went dormant during the 2010s because the mechanism side of the divide was thin; there was no mass-deployable substrate for real reasoning. LLMs make it thick again.
Workflow and reasoning
A workflow is a system whose behavior is enumerable from its structure. Every reachable path is visible at design time; given any input, you can trace the graph and know what will happen. Reasoning is different. It follows a principle (a goal, a deliberative procedure, a utility function) to generate actions that are not enumerable from the structure. You have to run the mechanism to see what it does.
Agency is the presence of reasoning. A workflow has none. A ReAct loop has it in abundance. A workflow with an LLM call at one node has reasoning at that node and none elsewhere. The composite is not a reasoning agent; it is a workflow that invokes reasoning at one point.
Both are legitimate computational artifacts and they solve different problems. The tools for building them differ. Workflows want graphs, state machines, retries, schema validation. Reasoners want context, tools, memory, evaluation. When a composite is called "an agent," teams mix the two tool sets. They put deterministic schema validation inside reasoning loops, where it clips the reasoning. They put open-ended LLM calls inside workflow nodes whose contracts did not budget for unbounded output.
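The enumerability claim is mechanical, not metaphorical. In the sketch below (stub names, no real model call), every path through the workflow can be listed without executing anything, while the reasoner's output cannot be derived from the file that defines it:

```python
# A workflow: behavior enumerable from structure. Each node maps
# outcomes to next nodes; every reachable path is visible at design time.
WORKFLOW = {
    "validate": {"ok": "charge", "bad": "reject"},
    "charge":   {"ok": "done",   "bad": "retry"},
    "retry":    {"ok": "done",   "bad": "reject"},
}
TERMINAL = {"done", "reject"}

def enumerate_paths(node="validate", path=()):
    """List every design-time path without running the workflow."""
    path = path + (node,)
    if node in TERMINAL:
        return [path]
    return [p for nxt in WORKFLOW[node].values()
            for p in enumerate_paths(nxt, path)]

# A reasoner: you must run the mechanism to see what it produces.
# `ask_model` stands in for an LLM call; its output is not derivable
# from anything written in this file.
def reasoner(facts, ask_model):
    return ask_model(f"Given {facts}, choose the next action.")

print(len(enumerate_paths()))  # prints 4
```

No analogous `enumerate_paths` exists for the reasoner; that asymmetry is the whole distinction.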
The spectrum is not real
The most influential current framing is Harrison Chase's: systems are more or less agentic depending on how much an LLM drives control flow. A simple router with one LLM classification step is mildly agentic. A state graph with several conditional LLM edges is moderately agentic. A pure ReAct loop is highly agentic. The framing sounds like epistemic humility. It is a category error.
Agency is not a continuous property of a system. It is a discrete property of parts. Each decision point is either made by a reasoner or by a predetermined rule. A composite has some of each. The ratio varies. The parts do not.
A workflow with one LLM classifier node is not a mildly agentic system. It is a workflow that invokes reasoning at one point. The workflow stays a workflow. The reasoner stays a reasoner. The composite stays a composite. If the view says there are no discrete agents, only degrees of agenticness, the constraint problem becomes impossible. You cannot point at the agents to bound them, because the framing denies they exist as nameable things.
The design rule
Two rules, one per axis.
Grant no more autonomy than necessary. Respect the full agency of what you have granted autonomy to.
The first half is the Principle of Least Authority (the capability-security principle rooted in Dennis & Van Horn, 1966, and developed by Miller, 2006) applied to agent systems: start with zero scope, justify each expansion, enforce scope at the participant boundary. A participant's autonomy budget is what it can read, what it can write, what it can call, and what side effects it can produce. This half is what the existing "least agency" literature already describes, under a slightly different name.
The second half is the one the field misses. Agency is, by definition, the capacity to produce actions that are not enumerable from structure. A system that demands predictability from a reasoning mechanism is asking for a workflow. A system that grants reasoning-capacity to a mechanism and then clips it (through over-specified prompts, rigid output schemas used as control surfaces, or orchestration that constrains every intermediate step) is paying for reasoning and getting a workflow, badly.
The pragmatic form: maximal capacity deployed to the minimally necessary problem surface. Narrow scope, full agency inside it.
Three independent arguments support the rule. None requires accepting the others.
Interpretability. You can only constrain what you can name. Discrete agent-parts give you referents: countable units with explicit scope and explicit goals. Smeared agency across a composite does not. An audit that asks "what reasons in this system" must get back a list, not a quantity.
Safety. Unbounded autonomy is the failure mode OWASP's Excessive Agency risk names. Every bounded participant is a place the system becomes inspectable. Every clipped mechanism is a place where capacity has been paid for without being used, which hides the real decisions elsewhere.
Efficiency. Intelligence deployed to problem surface the environment has already constrained is waste. A reasoning loop that could have been a workflow is an inference bill that did not need to be paid. A reasoning loop with surface too wide produces worse output than the same loop with narrower surface, because capacity spent on questions the environment already answered is capacity not spent on the open one. Teams that do not care about safety or interpretability should still prefer this rule because it lowers their unit cost per correct decision.
The three arguments converge on the same prescription. That convergence is the rule's strongest warrant.
Agency migrates
The level of agency required for a function is not fixed. A task that needed reasoning at one point in time may stabilize as the pattern of correct behavior becomes writable, at which point it can be compressed into a workflow-part. A task that was a workflow may need reasoning again when the environment shifts and the old rules begin to miss cases. The boundary between agent-part and workflow-part is not drawn once at design time; it is redrawn as certainty about the function changes.
Common law develops this way: novel disputes are judged, patterns crystallize into precedent and then statute, conditions shift and new reasoning becomes necessary again. Organizations move work between judgment and process along the same lines, and skill acquisition moves between conscious attention and habit along the same lines. No current agent framework treats the migration between agent-part and workflow-part as a first-class operation. The closest approximations are runtime observability plus manual refactor, which means the migration costs an engineering project every time the system should have supported it with a tool.
Consequences
Six consequences follow from the two-part rule.
- Default is workflow; both autonomy and agency are justified case by case. Start with zero scope and no reasoning loops. Add each only where the deterministic alternative cannot do the job.
- Every agent-part has a named autonomy scope and a precise agency goal. "Handle refund requests" fails on both axes. "Given these facts, choose one of {approve, deny, escalate}" passes: the scope is fixed (read these facts, emit one of these actions) and the goal the reasoner is optimizing for is writable.
- Agent-parts are swappable at the mechanism layer. Claude, GPT-5, a human, a small classifier: any of them fits the same slot. Swapping the mechanism does not reconstitute the participant. A system that confuses the two layers treats the swap as a rewrite.
- Audit is an enumeration. Hand the auditor a numbered list of agent-parts, each with scope (autonomy granted) and goal (agency invoked). "The system is agentic to some degree" is not a specification.
- Governance attaches to autonomy, not to agency. Policy and approval gates belong at participant boundaries, where inputs arrive and actions emit. They do not belong distributed through the reasoning process. Clipping reasoning is how you pay for capacity you do not use.
- Parts migrate over time. When an agent-part's outputs stabilize, extract the workflow-part and retire the reasoning there. When a workflow-part begins missing cases, promote it to an agent-part. Both transitions should be cheap.
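The second and third consequences can be sketched together. The refund part below has a fixed action scope and a writable goal, and its mechanism slot accepts a rule engine, a stubbed model loop, or anything else with the same shape. Illustrative code; `ask` stands in for a real model call:

```python
from typing import Protocol

ACTIONS = ("approve", "deny", "escalate")   # the fixed action scope

class Mechanism(Protocol):
    """The slot an agent-part exposes. Claude, GPT-5, a human reviewer,
    or a small classifier all fit here; swapping one in is not a rewrite."""
    def choose(self, facts: dict) -> str: ...

class RuleEngine:
    def choose(self, facts: dict) -> str:
        return "approve" if facts["amount"] < 50 else "escalate"

class ModelLoop:
    def __init__(self, ask):                # `ask` stands in for an LLM call
        self.ask = ask
    def choose(self, facts: dict) -> str:
        return self.ask(f"Given {facts}, choose one of {ACTIONS}.")

def refund_part(mechanism: Mechanism, facts: dict) -> str:
    action = mechanism.choose(facts)
    if action not in ACTIONS:               # scope enforced at the boundary
        return "escalate"
    return action
```

Note where the constraint sits: the mechanism's reasoning is untouched, but an out-of-scope action never leaves the participant.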
Current frameworks measured against the rule
Frameworks vary in whether they separate the two axes cleanly. The dividing question is whether a participant's autonomy scope and a mechanism's reasoning are represented as distinct things the developer can see, or whether one is hidden inside the other.
Claude Agent SDK, OpenAI Agents SDK, AWS Strands, Vercel AI SDK 6, and Pydantic AI take a similar approach. Each participant is built around a single reasoning loop: an LLM calls tools, the framework handles iteration and context. Autonomy is specified declaratively: the developer lists which tools the participant can call, and the tool signatures define what the reasoning is allowed to touch. Reasoning runs inside that scope. These frameworks keep autonomy and agency as separate concerns at the level of a single participant. None of them treats migration between reasoning and workflow as a first-class operation; if a reasoning loop stabilizes into a deterministic pattern, extracting that pattern into a workflow-part is a refactor the developer does by hand.
Google ADK 2.0, Microsoft Agent Framework 1.0, and Mastra combine the same single-participant loop with an explicit graph for composing multiple participants. The graph expresses deterministic orchestration cleanly. The problem is that the graph also accepts nodes that run reasoning loops, and the framework does not distinguish those from deterministic nodes. A developer reading a graph cannot tell which nodes reason and which do not without opening each one.
LangGraph is the clearest example of the confusion. StateGraph is the single abstraction for both workflow and agent construction. A node can be a pure function, or it can be an LLM call, and the type signature is the same either way. A developer swapping one for the other does not change the graph's shape. The consequence: the graph carries no structural record of which nodes reason. Finding the agent-parts of a LangGraph system requires reading the source of every node. Scoping autonomy correctly requires first finding where the agency lives; the framework has made those the same task. The community calls the compiled graph "the agent," which is accurate only if agency is a property of systems rather than parts. It is not.
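For contrast, a sketch of a graph that does carry a structural record, so that finding the agent-parts is a query rather than a source read. This is an illustrative API, not LangGraph's:

```python
class TypedGraph:
    """A graph whose nodes declare whether they reason. The declaration
    is part of the structure, so an auditor can enumerate agent-parts
    without opening any node. Illustrative only."""
    def __init__(self):
        self.nodes = {}
    def add_rule(self, name, fn):
        self.nodes[name] = ("rule", fn)
    def add_reasoner(self, name, fn):
        self.nodes[name] = ("reasoner", fn)
    def agent_parts(self):
        return [n for n, (kind, _) in self.nodes.items() if kind == "reasoner"]

g = TypedGraph()
g.add_rule("parse", lambda s: s)
g.add_reasoner("triage", lambda s: s)   # an LLM loop would live here
g.add_rule("notify", lambda s: s)
```

The point is not the two-method API; it is that swapping a rule for a reasoner changes the graph's structure, so the change leaves a record.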
Copilot Studio and the broader low-code tier are deterministic throughout. There are no reasoning loops inside nodes. These tools are labeled as workflow infrastructure, and that is accurate. They are not confused; they are a different product category.
Hard cases
The rule reads stronger than it is on clean examples. Two classes of system stress it.
Deep research agents. Elicit, Open Deep Research, and the autonomous research tiers shipped by frontier labs operate under open-ended charters: "survey the literature on X," "find relevant prior art for Y." The reasoning path through the information is not enumerable at design time. Does the rule demand a narrow autonomy scope for something this open?
The rule still applies; it moves one level up. The autonomy scope is not the reasoning path; it is the boundary around the task. What sources can be accessed. What side effects are permitted. Whether the participant can make purchases, execute code against live systems, send communications on the user's behalf. Inside that scope, agency is unconstrained, and should be. The typical failure mode is systems that grant broad autonomy (full internet access, arbitrary tool use, write access to the user's environment) because the task feels open-ended, when the only thing that needed to be open was the reasoning path through read-only sources. Bounded autonomy, full agency. The rule is not about making the reasoning predictable; it is about making the reasoning's consequences bounded.
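A sketch of that boundary: the plan the reasoner produces may be anything, but every step passes through a gate that permits only read-only sources. Source names and functions are illustrative:

```python
READ_ONLY_SOURCES = {"arxiv", "patents_db", "web_cache"}   # illustrative names

def fetch(source: str, query: str) -> str:
    # The gate: reads from approved sources pass; everything else is refused.
    if source not in READ_ONLY_SOURCES:
        raise PermissionError(f"{source!r} is outside the research scope")
    return f"results for {query} from {source}"   # stand-in for a real fetch

def research(plan_steps):
    """plan_steps comes from the reasoner and is unconstrained; only the
    boundary decides what each step may touch. No writes, no purchases,
    no communications -- those capabilities simply do not exist here."""
    return [fetch(source, query) for source, query in plan_steps]
```

The reasoning path stays open-ended; only its consequences are bounded, which is the rule's whole demand for this class of system.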
Autonomous red-team. A red-team agent's whole purpose is to find attacks its defenders did not anticipate. The operational goal is writable ("given this target, produce candidate vulnerabilities with evidence") even when the space of vulnerabilities is not. What needs constraint is autonomy: the agent operates against a known target, in a sandboxed environment, producing reports rather than exploits against production. Agency itself has to be respected for the work to be worth doing. A red-team whose reasoning is clipped (told which attack classes to ignore, which assumptions to preserve, which lines of inquiry are off-limits) is one whose blind spots have been declared in advance. Those declared blind spots become the failure surface; attackers do not honor them. Systems fail here in two ways: by granting red-team agents production autonomy (the autonomy failure), or by clipping their agency in the name of safety (the agency failure). Both produce worse security, for different reasons.
The rule's honest limit: if a task's operational goal cannot be written either as a direct specification or as an evaluatable condition (not "find X" or "maximize Y" but something that only becomes specifiable after the system has run for a while), then the right move is research, not product. You are not yet building an agent-part; you are exploring whether one can exist. Treat the work as exploration until the goal stabilizes, then apply the rule.
Conclusion
Two confusions, one word. Autonomy and agency are different properties measured on different axes. Participant and mechanism are different things. Treating them as one collapses the design surface into a single abstraction, which is why systems built under that abstraction are hard to audit, accumulate more granted authority than their tasks require, and waste intelligence on problem surface the environment already constrained.
The discipline is two rules, not one. Grant no more autonomy than necessary. Respect the full agency of what you have granted autonomy to. Three independent arguments (interpretability, safety, efficiency) converge on the same prescription. A team that cares about only one of the three should still prefer this rule, because the argument for it holds without the others. That convergence is what makes the discipline worth taking seriously.
Sources
- Shoham, Y. "Agent-Oriented Programming." Artificial Intelligence 60 (1993).
- Hewitt, C., Bishop, P., Steiger, R. "A Universal Modular Actor Formalism for Artificial Intelligence." IJCAI (1973).
- Rao, A. S., Georgeff, M. P. "Modeling Rational Agents within a BDI-Architecture." (1991).
- Dennis, J., Van Horn, E. "Programming Semantics for Multiprogrammed Computations." Communications of the ACM (1966).
- Miller, M. Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control. (2006).
- Evans, E. Domain-Driven Design. (2003).
- Dennett, D. The Intentional Stance. (1987).
- Chase, H. "What is an AI agent?" LangChain blog.
- Chase, H. "Not Another Workflow Builder." LangChain blog.
- AWS Well-Architected Generative AI Lens: least privilege and permissions boundaries for agentic workflows.
- "Introducing Gemini Enterprise Agent Platform." Google Cloud Blog, April 2026.
- "Microsoft Agent Framework 1.0." Microsoft Foundry Blog, April 2026.
- "Introducing Strands Agents 1.0." AWS Open Source Blog.
- "AI SDK 6." Vercel.
- "Building agents with the Claude Agent SDK." Anthropic.