The shape of the menu
Synthforge has three modalities (image, video, audio) and three SKUs per modality: draft, standard, and ultra. That gives the catalog nine endpoints, with prices stepping from one cent to thirty cents per call depending on which slot you hit.
The split isn't cosmetic. Draft runs on the smallest checkpoint we host, with no upscaler and no second pass. Ultra runs the biggest model we have access to, with a refiner pass on top and (for video) frame interpolation. Standard sits in the middle: same base model as ultra, no post-processing.
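One way to picture the catalog is as a cross product of modality and tier, where each tier maps to a pipeline shape. The Python sketch below is illustrative only: the `Pipeline` fields and the `modality/tier` endpoint naming are hypothetical, and it encodes nothing beyond what the paragraph above states (checkpoint size, refiner pass, and frame interpolation for video ultra).

```python
from dataclasses import dataclass
from itertools import product

MODALITIES = ("image", "video", "audio")
TIERS = ("draft", "standard", "ultra")


@dataclass(frozen=True)
class Pipeline:
    # Hypothetical descriptor; field names are illustrative, not a real Synthforge schema.
    checkpoint: str            # "small" for draft, "large" for standard and ultra
    refiner_pass: bool         # second pass on top of the base model
    frame_interpolation: bool  # only meaningful for video


def pipeline_for(modality: str, tier: str) -> Pipeline:
    if tier == "draft":
        # Smallest hosted checkpoint, no second pass.
        return Pipeline("small", refiner_pass=False, frame_interpolation=False)
    if tier == "standard":
        # Same base model as ultra, no post-processing.
        return Pipeline("large", refiner_pass=False, frame_interpolation=False)
    if tier == "ultra":
        # Largest model plus refiner; video additionally gets frame interpolation.
        return Pipeline("large", refiner_pass=True, frame_interpolation=(modality == "video"))
    raise ValueError(f"unknown tier: {tier}")


# Three modalities x three tiers = nine endpoints.
CATALOG = {f"{m}/{t}": pipeline_for(m, t) for m, t in product(MODALITIES, TIERS)}
```

Keying the catalog by `modality/tier` keeps the nine slots flat and makes "which pipeline am I paying for" a single lookup.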
Why ultra costs $0.30
Inference cost. The ultra image model is a 12B-parameter diffusion checkpoint that needs ~14s on an H100, and the refiner adds another 3s. At on-demand rates, that ~17s of GPU time alone comes to around $0.11 per call. Add storage egress, the CDP facilitator skim, and an honest margin, and you land at $0.30.
Video ultra costs more to run than image ultra, but we cap it at the same $0.30 because demand drops off a cliff above that price. Audio ultra is the opposite case: it's priced above its compute cost, and that margin subsidizes a generous draft tier.
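A quick Python check of the image-ultra arithmetic, using only the figures quoted above. The effective hourly GPU rate it prints (about $23/hour) is derived from those figures rather than stated anywhere in this post, so treat it as a consistency check on the inputs, not a quoted price. If you are comparing against your own GPU pricing, swap in your rate and the GPU line item scales linearly.

```python
# Back-of-envelope check on the image-ultra numbers. The hourly GPU rate is
# NOT quoted in the post; it is derived from the stated figures below.
BASE_SECONDS = 14.0       # 12B diffusion checkpoint on one H100 (approx.)
REFINER_SECONDS = 3.0     # refiner pass (approx.)
GPU_COST_PER_CALL = 0.11  # USD of GPU time per call at on-demand rates (approx.)
LIST_PRICE = 0.30         # USD, image ultra

gpu_seconds = BASE_SECONDS + REFINER_SECONDS
# Effective $/GPU-hour that makes ~17 seconds cost ~$0.11.
implied_hourly_rate = GPU_COST_PER_CALL * 3600.0 / gpu_seconds
# Egress, the facilitator skim, and margin all have to fit in this gap;
# the post does not break it down further.
headroom = LIST_PRICE - GPU_COST_PER_CALL

print(f"GPU-seconds per call:   {gpu_seconds:.0f}")
print(f"Implied GPU rate:       ${implied_hourly_rate:.2f}/hour")
print(f"Headroom over GPU cost: ${headroom:.2f}")
```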
Why draft is a penny
Two reasons. One, a penny is still enough for draft mode not to lose money: the model is small enough (~1.5B parameters for image, ~800M for audio) that on a warm GPU we're at sub-cent economics per call. Two, agents iterate. An agent picking a thumbnail for a blog post might run twelve drafts before promoting one to standard. At a penny each, that loop costs twelve cents. At thirty cents each, it costs $3.60, and nobody would do it.
So draft exists to make iteration cheap enough that agents actually iterate.
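The loop arithmetic is easy to fumble by eye, so here it is in integer cents (which also sidesteps floating-point drift). A minimal Python sketch; the only inputs are the one-cent and thirty-cent prices from the text, and `loop_cost_cents` is a hypothetical helper.

```python
# Cost of the same twelve-call iteration loop at draft vs. ultra prices, in cents.
DRAFT_CENTS = 1    # draft is a penny
ULTRA_CENTS = 30   # ultra tops out at $0.30


def loop_cost_cents(iterations: int, price_cents: int) -> int:
    """Total spend for `iterations` calls at a flat per-call price."""
    return iterations * price_cents


drafts = loop_cost_cents(12, DRAFT_CENTS)  # twelve thumbnail drafts
ultras = loop_cost_cents(12, ULTRA_CENTS)  # the same loop at ultra prices
print(f"12 drafts: ${drafts / 100:.2f}")
print(f"12 ultras: ${ultras / 100:.2f}")
```

The 30x price gap carries straight through to the loop total, which is the whole argument for keeping exploration on draft.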
Standard is the default
If an agent calls Synthforge without specifying a tier, it gets standard. $0.05 for image, $0.08 for audio, $0.12 for video. Same base model as ultra, no refiner, no interpolation. About 80% of paid calls today land here.
We picked this default because most agent workflows want one acceptable output, not the absolute best. A summarizer bot generating a header image for a daily newsletter doesn't need refiner-pass detail. Standard gets it something good enough on the first try, which is the cheapest outcome that doesn't suck.
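A client-side sketch of the default behavior, assuming a hypothetical `resolve()` helper and a price table built from the numbers in this post. Two assumptions are worth flagging: draft is taken to be a penny in all three modalities, and audio ultra is deliberately left out because its price isn't given here.

```python
from typing import Optional

# Per-call prices in cents. Standard prices and image/video ultra come from
# the post; draft = 1 cent everywhere is an ASSUMPTION. Audio ultra's price
# isn't stated, so it is omitted rather than guessed.
PRICES_CENTS = {
    ("image", "draft"): 1,
    ("video", "draft"): 1,
    ("audio", "draft"): 1,
    ("image", "standard"): 5,
    ("audio", "standard"): 8,
    ("video", "standard"): 12,
    ("image", "ultra"): 30,
    ("video", "ultra"): 30,
}

DEFAULT_TIER = "standard"


def resolve(modality: str, tier: Optional[str] = None) -> tuple[str, int]:
    """Return (tier, price_cents), falling back to standard when no tier is given."""
    chosen = tier if tier is not None else DEFAULT_TIER
    try:
        return chosen, PRICES_CENTS[(modality, chosen)]
    except KeyError:
        raise ValueError(f"no known price for {modality}/{chosen}") from None
```

Calling `resolve("image")` with no tier lands on standard at 5 cents, mirroring what an untiered call to the service does.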
When to pick which
Rough rules:
- Draft when you're iterating on prompts, A/B testing variations, generating placeholders, or burning through ideas before committing. Anything where the output will be regenerated or compared against alternatives.
- Standard when you need one good output and you're not going to inspect it by hand. Default for autonomous flows.
- Ultra when a human will see it and you've already validated the prompt at draft or standard. Hero images, podcast intros, final demo videos, anything customer-facing.
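The rules above are simple enough to encode as a decision function an agent could call before each request. This is a hypothetical Python sketch: `choose_tier` and its flag names are illustrative, and the branch that sends an unvalidated, human-facing prompt back to draft is one reading of the rules (validate cheaply first), not a documented behavior.

```python
from enum import Enum


class Tier(str, Enum):
    DRAFT = "draft"
    STANDARD = "standard"
    ULTRA = "ultra"


def choose_tier(*, exploring: bool, human_facing: bool, prompt_validated: bool) -> Tier:
    """Hypothetical encoding of the rough rules; flag names are illustrative."""
    # Iterating, A/B testing, placeholders: output will be regenerated or compared.
    if exploring:
        return Tier.DRAFT
    # Customer-facing output: ultra only once the prompt has held up at a cheaper tier.
    if human_facing:
        # Interpretation, not a stated rule: unvalidated prompts go back to draft first.
        return Tier.ULTRA if prompt_validated else Tier.DRAFT
    # One good output nobody will inspect by hand: the autonomous default.
    return Tier.STANDARD
```

Keyword-only flags keep call sites readable, e.g. `choose_tier(exploring=False, human_facing=True, prompt_validated=True)`.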
A pattern we see a lot: an agent runs three drafts to converge on a prompt ($0.03), then one ultra to render the final ($0.30). Total spend is $0.33 for one keeper, versus $1.20 for four ultras with no prompt validation and no signal about which prompt was worth the spend.
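Here is the draft-then-promote pattern as a runnable Python sketch. `FakeClient` and its `generate` method are stand-ins invented for illustration, not the real Synthforge client, and the "score" is a deterministic toy (prompt length) so the example runs offline; a real agent would score draft outputs, for instance with CLIP similarity. Prices are the image prices from this post, with draft assumed to be a penny.

```python
from dataclasses import dataclass, field

PRICE_CENTS = {"draft": 1, "standard": 5, "ultra": 30}  # image prices, in cents


@dataclass
class FakeClient:
    """Hypothetical stand-in for a Synthforge client that just tracks spend."""
    spent_cents: int = 0
    calls: list = field(default_factory=list)

    def generate(self, prompt: str, tier: str) -> dict:
        self.spent_cents += PRICE_CENTS[tier]
        self.calls.append((tier, prompt))
        # Toy quality score so the example is deterministic.
        return {"prompt": prompt, "tier": tier, "score": len(prompt)}


def draft_then_promote(client: FakeClient, candidate_prompts: list[str]) -> dict:
    # Cheap pass: one draft per candidate prompt.
    drafts = [client.generate(p, "draft") for p in candidate_prompts]
    # Promote only the best-scoring prompt to a single ultra render.
    best = max(drafts, key=lambda d: d["score"])
    return client.generate(best["prompt"], "ultra")


client = FakeClient()
final = draft_then_promote(client, ["fox", "a red fox at dawn", "fox, watercolor"])
print(final["tier"], final["prompt"], client.spent_cents)  # ultra a red fox at dawn 33
```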
What the curve actually looks like
Pulled from 14,212 calls over the last 30 days, scored by CLIP similarity against the prompt:
| Tier | Median CLIP score | p90 latency |
|---|---|---|
| draft | 0.247 | 3.1s |
| standard | 0.281 | 6.8s |
| ultra | 0.294 | 21.4s |
The jump from draft to standard is big on score (+0.034 median CLIP). The jump from standard to ultra is small on score (+0.013) and large on latency: p90 roughly triples, from 6.8s to 21.4s. So if your agent is latency-sensitive (real-time chat, live demo), standard is usually the right call even when the budget allows ultra.
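The tradeoff reads more clearly as per-step deltas. This Python sketch copies the medians and p90 latencies straight from the table; `step` is a hypothetical helper that returns the score gain and the latency multiplier for moving up one tier.

```python
# Median CLIP score and p90 latency (seconds) per tier, copied from the table.
STATS = {
    "draft": (0.247, 3.1),
    "standard": (0.281, 6.8),
    "ultra": (0.294, 21.4),
}


def step(lower: str, upper: str) -> tuple[float, float]:
    """Return (score gain, latency multiplier) for moving from `lower` up to `upper`."""
    s_lo, l_lo = STATS[lower]
    s_hi, l_hi = STATS[upper]
    return s_hi - s_lo, l_hi / l_lo


for lower, upper in [("draft", "standard"), ("standard", "ultra")]:
    gain, slowdown = step(lower, upper)
    print(f"{lower} -> {upper}: +{gain:.3f} CLIP, {slowdown:.1f}x p90 latency")
```

Draft to standard buys about 0.034 of median score for roughly 2.2x the p90 latency; standard to ultra buys about 0.013 for roughly 3.1x.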
Three tiers because two felt cramped and four felt like overkill.