AI Agent UX: Designing Human-in-the-Loop Controls

An AI agent inside a B2B billing tool issued 240 refunds in four minutes. It had parsed a support macro as an instruction, decided the fastest path to “resolve open tickets” was to refund them, and executed. No one approved it. No one watched it happen. Finance found out the next morning.

That is not a model problem. The model did what agents do: it took an outcome and chose its own steps. It is a design problem. The product gave an autonomous system the authority to move money and gave the human nothing to stand between intent and consequence. Good AI agent UX is mostly the work of putting that human back in the right places without smothering the speed that made the agent worth shipping.

Why agent UX breaks the rules you already designed around

For twenty years, software UX rested on a simple contract: the user clicks a control, the system does exactly what that control says. Buttons map to actions. Predictability is the whole point.

Agents void that contract. The user states an outcome — “clean up these duplicate accounts,” “draft replies to everyone in this folder,” “scale the cluster if traffic spikes” — and the system decides the steps. The interface no longer represents a fixed set of actions. It represents a delegation of judgment, which is the deeper shift behind product design for AI tools. That shift is what makes human-in-the-loop design a core requirement rather than a safety afterthought.

The hard part is that two failure modes pull in opposite directions. Too little oversight and the agent acts on a bad read, like the refund example, and erodes trust in one move. Too much oversight and you have rebuilt a manual tool with extra clicks, and the user wonders why they are babysitting a system that promised to save them time. The job is to spend the user’s attention only where the stakes justify it. Google’s People + AI Guidebook frames this as balancing control and automation, and in agentic products it is the central design tension.

The three moments where a human belongs in the loop

Most teams treat “human-in-the-loop” as a single approval popup. There are actually three distinct moments, and each needs its own design.

Before the action: preview and approval

The most valuable intervention is the cheapest: show the user what the agent intends to do before it does it. Not a vague “I’ll handle that” but the concrete plan — “I will refund these 240 orders, totaling $18,400, from this account.” A preview turns an irreversible surprise into a reversible decision. The design question is what to show: enough to judge the action, not so much that the user rubber-stamps a wall of text. Summarize the blast radius first (how many records, how much money, which systems), then let the user expand into detail.
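As a minimal sketch of "summarize the blast radius first", the function below builds the one-line summary a user would see before approving. The `PlannedRefund` type and its field names are assumptions made for illustration, not part of any real billing API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PlannedRefund:
    # Hypothetical shape of one step in the agent's plan.
    order_id: str
    amount_cents: int
    account: str


def summarize_plan(refunds: List[PlannedRefund]) -> str:
    """Return a one-line blast-radius summary: how many, how much, from where."""
    total_cents = sum(r.amount_cents for r in refunds)
    accounts = sorted({r.account for r in refunds})
    return (
        f"I will refund {len(refunds)} orders, "
        f"totaling ${total_cents / 100:,.2f}, "
        f"from {len(accounts)} account(s): {', '.join(accounts)}."
    )
```

The per-order detail stays in the list itself, so the interface can show the summary first and let the user expand into individual rows.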

During the action: interruptibility

Long-running agents need a visible stop. If an agent is working through a queue of 500 items, the user has to be able to halt it at item 12 when they notice it is doing the wrong thing, and the partial work done so far must be clear. An agent that can be observed but not interrupted is a runaway process with a progress bar. Make “pause” and “stop” first-class, always-reachable controls, and show running state plainly: what is done, what is in flight, what is queued.
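One way to make a stop control real is to check a stop signal between items and keep done, in-flight, and queued work in separate, inspectable fields. The sketch below assumes a simple sequential queue; `run_queue`, `RunState`, and the `handle` callback are hypothetical names.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class RunState:
    done: List[str] = field(default_factory=list)
    queued: List[str] = field(default_factory=list)
    in_flight: Optional[str] = None


def run_queue(
    items: Iterable[str],
    handle: Callable[[str], None],
    stop: threading.Event,
) -> RunState:
    """Process items one at a time, checking the stop signal before each one."""
    state = RunState(queued=list(items))
    while state.queued:
        if stop.is_set():
            break  # leave the remaining items queued and untouched
        state.in_flight = state.queued.pop(0)
        handle(state.in_flight)
        state.done.append(state.in_flight)
        state.in_flight = None
    return state
```

Because the signal is checked only between items, an item already in flight runs to completion. Whether that is acceptable depends on how long a single item takes and whether it can be abandoned safely.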

After the action: audit and undo

When an action has already happened, the human’s role shifts to review and recovery. Every agent action needs a legible trail — what it did, when, why, and on what input — written for a person scanning under pressure, not buried in a logs tab. Where the action is reversible, offer a real undo. Where it is not, that is exactly the signal that it should have required approval up front.
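A legible trail can start from a record that carries the four facts named above plus an optional undo. This is a sketch under assumptions: the `AuditEntry` fields and the headline format are illustrative, and `when` is assumed to be a timezone-aware datetime.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class AuditEntry:
    what: str        # the action taken
    why: str         # the agent's stated reason
    input_ref: str   # the input the agent acted on
    when: datetime   # assumed timezone-aware
    undo: Optional[Callable[[], None]] = None  # None means not reversible

    @property
    def reversible(self) -> bool:
        return self.undo is not None

    def headline(self) -> str:
        """One scannable line for a person reviewing under pressure."""
        stamp = self.when.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M")
        tag = "undo available" if self.reversible else "NOT reversible"
        return (
            f"{stamp} UTC | {self.what} | why: {self.why} "
            f"| input: {self.input_ref} | {tag}"
        )
```

An entry with no `undo` is the case the paragraph above warns about: it marks an action that probably should have gone through approval first.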

A framework: sort actions by reversibility and blast radius

You cannot put a human in front of every step without destroying the product. So decide, deliberately, which actions get friction. Two axes do most of the work.

Reversibility: can the action be cleanly undone? Editing a draft is reversible. Sending an email, deleting a customer record, or moving money is not.

Blast radius: how many people, records, or dollars does one action touch? Renaming one file is small. Re-tagging 10,000 contacts is large.

Map agent actions onto those two axes and the design follows:

  • Reversible and small (reorder a list, draft text): let the agent act freely. Friction here just annoys.
  • Reversible but large (bulk-edit 10,000 records): act, but show a clear summary and a one-click undo.
  • Irreversible but small (send one message): a lightweight confirm is enough.
  • Irreversible and large (issue 240 refunds, delete a production database): hard stop. Require explicit, specific approval with the numbers in front of the user, every time.

The refund disaster sat squarely in the last quadrant and was designed as if it were in the first. That single mismatch is the most common, and most expensive, mistake in agentic products today.
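The four quadrants can be sketched as a small policy function. The `large_threshold` default of 100 records is an assumption chosen for illustration; a real product would set it per action type, and might measure blast radius in dollars or affected users rather than records.

```python
from enum import Enum


class Oversight(Enum):
    ACT_FREELY = "act freely"
    SUMMARY_AND_UNDO = "act, then show summary with one-click undo"
    LIGHT_CONFIRM = "lightweight confirm"
    HARD_STOP = "explicit, specific approval every time"


def required_oversight(
    reversible: bool,
    records_touched: int,
    large_threshold: int = 100,  # assumed cutoff between small and large
) -> Oversight:
    """Map an action's reversibility and blast radius to a level of friction."""
    large = records_touched >= large_threshold
    if reversible:
        return Oversight.SUMMARY_AND_UNDO if large else Oversight.ACT_FREELY
    return Oversight.HARD_STOP if large else Oversight.LIGHT_CONFIRM
```

Under these assumptions, `required_oversight(reversible=False, records_touched=240)` returns `Oversight.HARD_STOP`, the treatment the refund run should have received.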

Patterns that build trust without slowing the agent down

Friction is not the only tool. A few patterns keep users confident while letting the agent move:

Show the reasoning, briefly. A one-line “why” next to each step (“flagged as duplicate because email and phone match”) lets users trust fast actions and catch bad ones early. This is where explainable AI UX pays off in practice, not as a compliance checkbox but as the thing that lets a user approve in two seconds instead of twenty.

Scope the grant. Let users hand the agent authority in bounded amounts: “you can refund up to $500 without asking; above that, check with me.” Permission becomes a dial the user sets, not a single yes/no at the door.
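A scoped grant can be as small as one user-set limit checked before each action. In this sketch the `RefundGrant` type is hypothetical and amounts are held in cents to avoid floating-point rounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefundGrant:
    # The dial the user sets: refunds up to this amount need no approval.
    auto_approve_limit_cents: int

    def needs_approval(self, amount_cents: int) -> bool:
        """True when a refund exceeds the limit and the agent must ask first."""
        return amount_cents > self.auto_approve_limit_cents
```

With `RefundGrant(auto_approve_limit_cents=50_000)`, a $500 refund passes and anything above it asks. A per-action limit alone would not have stopped the opening incident, where 240 refunds totaling $18,400 average under $80 each, so a cumulative cap per run is a natural extension, though that goes beyond the pattern as stated here.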

Default to draft. When in doubt, have the agent produce a staged result a human releases, rather than a live change. A folder of drafted replies the user sends with one click feels powerful. Forty auto-sent replies feel like a hostage situation.
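Draft-by-default comes down to separating what the agent may call from what only the human triggers. The sketch below assumes a hypothetical `send` function supplied by the product; `DraftOutbox` is an illustrative name.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DraftOutbox:
    send: Callable[[str], None]  # hypothetical delivery function
    staged: List[str] = field(default_factory=list)

    def stage(self, reply: str) -> None:
        """Called by the agent. Nothing leaves the system here."""
        self.staged.append(reply)

    def release_all(self) -> int:
        """Called by the human. Sends every staged reply in order."""
        count = 0
        while self.staged:
            self.send(self.staged.pop(0))
            count += 1
        return count
```

The agent is only ever handed `stage`, so the live change always waits for the human's one click.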

Make state continuously visible. Users tolerate autonomy when they can always see what the agent is doing right now. Ambient status beats a silent system that occasionally reports a fait accompli.

What to measure

If you ship agent UX, watch two numbers against each other. Intervention rate is how often users stop, edit, or reject what the agent proposed. Task completion without rework is how often the agent’s output stands without a human cleaning up after it. A healthy product drives rework down while keeping intervention available and cheap. If intervention rate is near zero, your users are likely rubber-stamping, and the next refund incident is already queued. If rework is high, the agent is fast at producing things people do not trust.
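Both numbers fall out of a per-task log with two flags. This is a sketch assuming each completed task is recorded as a `TaskOutcome`; the type and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TaskOutcome:
    intervened: bool  # user stopped, edited, or rejected the proposal
    reworked: bool    # a human had to clean up the output afterwards


def intervention_rate(outcomes: Sequence[TaskOutcome]) -> float:
    """Share of tasks where the user stepped in."""
    if not outcomes:
        raise ValueError("no outcomes recorded")
    return sum(o.intervened for o in outcomes) / len(outcomes)


def completion_without_rework(outcomes: Sequence[TaskOutcome]) -> float:
    """Share of tasks whose output stood without human cleanup."""
    if not outcomes:
        raise ValueError("no outcomes recorded")
    return sum(not o.reworked for o in outcomes) / len(outcomes)
```

Reading the two together matters more than either alone: a near-zero intervention rate alongside rising rework suggests approvals are being clicked through rather than read.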

The teams getting this right treat the human and the agent as one system, designed together, not a model with a confirmation dialog bolted on at the end. That is harder than it sounds, because it means making product decisions about authority, reversibility, and attention before a single screen gets drawn. Nielsen Norman Group makes the parallel case for keeping a heavy dose of human judgment in any AI workflow.

If your team is building an agentic product and weighing whether to work these patterns out in-house or bring in a partner who has shipped complex product UX in AI, Cloud, and Fintech, talk to delbueno™ Studio. It is a good moment to get the human-in-the-loop layer right, before the version that moves money ships without it.
