The price gap is real

An agent that calls unicode-normalize pays $0.001. The same agent calling image-generate-ultra pays $0.30. That's a 300x spread, and the catalog has worse outliers. A naive agent that just retries on failure can burn through a $5 budget in 17 calls if it picks badly.

So you need rules. Not a single global cap. Layered caps.

Per-task cap

Every task an agent picks up should carry a budget. Not in tokens. In dollars.

The cap should match what the task is worth. Classifying a string of user input? $0.005 ceiling. Generating a finished poster image? $0.50 ceiling. The agent reads the cap before it picks an endpoint, and any tool call that exceeds it gets refused at the planner, not after the wallet has already signed a 402 quote.

type Task = {
  intent: string;
  maxSpendUsd: number;
};

function canAfford(endpoint: Endpoint, remaining: number) {
  return endpoint.priceUsd <= remaining;
}

Two lines. Most agent frameworks don't enforce this, which is why they burn money.

Per-session cap

Above the task is the session. A user pays for an agent run; that run has a total wallet allowance. When 80% is spent, the agent should refuse new optional work and finish only what the user explicitly asked for.

Hard numbers from our own dogfood logs: a research agent making 40 calls averages $0.04 per call, but the 95th percentile run hits $0.18 per call because one or two image or video endpoints sneak in. Without a session cap, that long tail eats lunch.

class Session {
  spent = 0;
  cap = 2.00;

  authorize(priceUsd: number, required: boolean) {
    if (this.spent + priceUsd > this.cap) {
      if (required) throw new BudgetExceeded();
      return false;
    }
    this.spent += priceUsd;
    return true;
  }
}

Notice the required flag. Optional calls get refused early. Required calls throw, and the agent has to renegotiate scope with the user.

Fallback chains

Now the fun part. Most tasks have a quality knob. Translation can run cheap or expensive. Image generation has draft and ultra tiers. Geo lookups can hit a coarse endpoint first and fall through to a precise one if confidence is low.

Build the chain explicitly. Cheapest first. Stop when the result clears a confidence bar.

const translateChain = [
  { endpoint: "translate-fast",  priceUsd: 0.002 },
  { endpoint: "translate-pro",   priceUsd: 0.02 },
  { endpoint: "translate-ultra", priceUsd: 0.15 },
];

async function translate(text: string, session: Session) {
  for (const step of translateChain) {
    if (!session.authorize(step.priceUsd, false)) continue;
    const r = await call402(step.endpoint, { text });
    if (r.confidence > 0.9) return r;
  }
  throw new NoAcceptableResult();
}

The cheap endpoint clears confidence maybe 70% of the time. You only pay for the expensive one when you have to. Blended cost across 1000 calls? About $0.008 each, not $0.15.

What not to do

Don't pick endpoints by name match. An agent that sees "image-generate" and grabs the first one in the catalog will end up on the ultra tier half the time. Sort by price ascending when intent matches multiple endpoints. Make the agent earn the right to spend more.

Don't trust user-supplied caps without sanity bounds. If a user says "spend up to $1000 on this", that's almost always a typo or a prompt injection. Cap the cap.

And log every authorize() call with intent, endpoint, and price. When a user asks why the bill hit $4.20, you want an answer in seconds, not hours grepping through tool-use traces.