50 endpoints, one cluster
We launched the Mediakit cluster with 50 endpoints live on Bazaar. PDF, image, audio, video, OCR, office docs, watermarking. Every endpoint is its own URL, its own price, its own input schema, its own listing. Total catalog cost if you ran each one once: $2.807.
The obvious question: why 50 endpoints instead of one /convert that takes a from and to parameter?
Why per-format beats a generic converter
A single /convert endpoint sounds clean until you write the docs.
PDF compression takes a target size or quality level. OCR takes a language hint and a layout mode. Audio loudnorm takes a target LUFS. Video trim takes start and end timestamps. You can't pack that into one input schema without it turning into a discriminated union with 50 branches, and at that point you already have 50 endpoints. You just hid them behind a router.
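To make the "discriminated union" point concrete, here is a minimal sketch of what a single /convert schema would look like for just the four operations named above. The field names and the `audio-loudnorm` handler name are assumptions for illustration, not the actual Mediakit schemas.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class PdfCompress:
    quality: int          # assumed field: quality level


@dataclass
class Ocr:
    language: str         # language hint
    layout_mode: str      # assumed field: layout mode name


@dataclass
class AudioLoudnorm:
    target_lufs: float    # target loudness in LUFS


@dataclass
class VideoTrim:
    start: str            # start timestamp
    end: str              # end timestamp


# One branch per operation. A full catalog would need 50 of these.
ConvertRequest = Union[PdfCompress, Ocr, AudioLoudnorm, VideoTrim]


def convert(req: ConvertRequest) -> str:
    """The 'single' endpoint still has to route on the branch type."""
    if isinstance(req, PdfCompress):
        return "pdf-compress"
    if isinstance(req, Ocr):
        return "ocr"
    if isinstance(req, AudioLoudnorm):
        return "audio-loudnorm"   # hypothetical handler name
    if isinstance(req, VideoTrim):
        return "video-trim"
    raise TypeError(f"unsupported request: {req!r}")
```

Every branch has its own required fields and its own handler, so the union is the endpoint list again with an extra dispatch step on top.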
Agents also don't search for /convert. They search the Bazaar for "pdf to markdown" or "extract tables from pdf". Each of those queries should hit a listing with its own slug, price, description, and usage stats. Bundle them and you lose every one of those search hits.
There's also the pricing problem. pdf-to-text is $0.20 because we're paying upstream for OCR fallback on scanned docs. json-yaml is $0.002 because it's twenty lines of pure parsing. A single endpoint with a format flag has to either charge the max ($0.20 for a YAML conversion) or eat the cost on OCR jobs. Neither works.
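The arithmetic behind that trade-off, as a small sketch using only the two prices quoted above. The function names are illustrative.

```python
from decimal import Decimal

# Prices quoted in the post.
PRICES = {
    "pdf-to-text": Decimal("0.20"),
    "json-yaml": Decimal("0.002"),
}


def flat_price(prices):
    """A single endpoint that never loses money has to charge the max."""
    return max(prices.values())


def overcharge_factor(prices, slug):
    """How many times its own price a call costs under flat pricing."""
    return flat_price(prices) / prices[slug]


def loss_per_call(prices, slug, flat):
    """What the operator eats per call if the flat price is set below cost."""
    return max(prices[slug] - flat, Decimal("0"))
```

With a flat price of $0.20, a json-yaml call costs 100 times its per-endpoint price. With a flat price of $0.002, every pdf-to-text call is undercharged by $0.198.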
The 50, grouped
PDF: 17 endpoints. pdf-to-text, pdf-to-markdown, pdf-to-jpg, pdf-extract-tables, pdf-split, pdf-merge, pdf-compress, pdf-watermark, office-to-pdf, html-to-pdf, plus aliased names like pdf2md and compress-pdf that route to the same workers (we kept both naming conventions because agents and human devs guess differently).
Image: 6 endpoints. image-convert (PNG/JPG/WEBP/AVIF), image-upscale at $0.02, image-translate for in-image OCR + replace, logo-detect.
Audio + video: 14 endpoints. audio-transcribe at $0.01 per minute, speaker-diarize at $0.10, video-trim, video-thumbnail, youtube-transcript at $0.01, subtitles, mp4-to-mp3.
Office + data: 10 endpoints. excel-to-csv, xlsx-to-csv, csv-to-jsonl, csv-to-ics, json-yaml, xml-to-word, html-to-markdown.
OCR + receipts: 3 endpoints. ocr at $0.20, receipt-ocr at $0.01, receipt-parser at $0.01. The receipt endpoints are tuned models, not general OCR with a prompt. Different price and output schema. Different endpoint.
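A quick sanity check on the grouping, plus a sketch of how alias slugs could resolve to one worker. The group keys and the alias targets are assumptions: the post names pdf2md and compress-pdf as aliases but does not say which canonical slugs they map to, and the count of 3 for the last group is read off the three slugs listed.

```python
# Endpoint counts per group, as listed above.
GROUPS = {
    "pdf": 17,
    "image": 6,
    "audio_video": 14,
    "office_data": 10,
    "ocr_receipts": 3,
}

# Assumed alias pairs, for illustration only.
ALIASES = {
    "pdf2md": "pdf-to-markdown",
    "compress-pdf": "pdf-compress",
}


def resolve(slug: str) -> str:
    """Map an alias to its canonical slug; canonical slugs pass through."""
    return ALIASES.get(slug, slug)


total = sum(GROUPS.values())  # 17 + 6 + 14 + 10 + 3
```

Under this shape each alias keeps its own listing and URL, while the worker only ever sees the canonical slug.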
Calling one
Same x402 shape as the rest of the portfolio: the first call returns a 402, then you settle and retry. The retry looks like this:
curl -X POST https://mediakit.agentutility.dev/pdf-to-markdown \
  -H "X-PAYMENT: $PAYMENT_HEADER" \
  -F "file=@invoice.pdf"
The 402 response includes the price ($0.20), the network (base), and the asset (USDC). Pay it with any x402 client, retry with the X-PAYMENT header, get markdown back.
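The call, settle, retry loop can be sketched without committing to any particular x402 client. In the sketch below, `send` and `pay` are hypothetical callables you would supply: `send` performs the HTTP request with the given extra headers, and `pay` turns the 402 body into an X-PAYMENT header value. The flat `price`/`network`/`asset` body used by the fake transport is a simplification; a real x402 payment-requirements payload has more structure.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, dict]                    # (status code, parsed body)
Send = Callable[[Dict[str, str]], Response]    # extra headers in, response out
Pay = Callable[[dict], str]                    # 402 body in, header value out


def call_with_payment(send: Send, pay: Pay) -> dict:
    """Call once, settle on 402, then retry with the X-PAYMENT header."""
    status, body = send({})                    # first call, no payment attached
    if status == 200:
        return body
    if status != 402:
        raise RuntimeError(f"unexpected status {status}")
    header = pay(body)                         # hypothetical settlement step
    status, body = send({"X-PAYMENT": header})
    if status != 200:
        raise RuntimeError(f"retry failed with status {status}")
    return body


def fake_send(headers: Dict[str, str]) -> Response:
    """Stand-in transport so the flow can be exercised offline."""
    if "X-PAYMENT" not in headers:
        return 402, {"price": "0.20", "network": "base", "asset": "USDC"}
    return 200, {"markdown": "# Invoice"}


result = call_with_payment(fake_send, lambda requirements: "signed-payload")
```

Keeping the transport and the payment step as injected functions means the retry logic stays the same whichever HTTP library or wallet you plug in.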
Cost breakdown
If you ran every Mediakit endpoint once today, you'd spend $2.807. Cheapest call: json-yaml at $0.002. Most expensive: pdf-to-markdown, pdf-to-text, pdf2md, convert-pdf, and ocr, all at $0.20. Average price across the 50: ~$0.056.
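The figures above can be checked in a few lines. This uses only the numbers quoted in this section: the $2.807 total, the count of 50, and the five endpoints priced at $0.20.

```python
from decimal import Decimal

TOTAL = Decimal("2.807")      # cost of calling every endpoint once
ENDPOINTS = 50
TOP_PRICE = Decimal("0.20")
TOP_COUNT = 5                 # pdf-to-markdown, pdf-to-text, pdf2md, convert-pdf, ocr

average = TOTAL / ENDPOINTS                                  # 0.05614
top_total = TOP_PRICE * TOP_COUNT                            # 1.00
top_share = top_total / TOTAL                                # share of spend on the top tier
rest_average = (TOTAL - top_total) / (ENDPOINTS - TOP_COUNT)  # average of the other 45
```

The five $0.20 endpoints account for $1.00 of the total, a bit over a third. The remaining 45 endpoints average about $0.04 each.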
For comparison, CloudConvert charges $0.008 per "conversion minute" with a $9/month minimum. Sejda's PDF API starts at $7.50/month for 50 calls. We have no minimum and no monthly fee. Pay per call. Settle on Base in seconds.
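As a rough back-of-envelope comparison, here is what those monthly figures buy at Mediakit's average price. It takes the competitor prices exactly as quoted above and assumes every call is average-priced, which a real workload will not be.

```python
from decimal import Decimal, ROUND_FLOOR

AVERAGE = Decimal("2.807") / 50   # average Mediakit price per call


def calls_for_budget(budget: Decimal, price: Decimal = AVERAGE) -> int:
    """Whole calls a budget covers at a given per-call price."""
    return int((budget / price).to_integral_value(rounding=ROUND_FLOOR))


sejda_per_call = Decimal("7.50") / 50               # if all 50 calls are used
cloudconvert_min_calls = calls_for_budget(Decimal("9"))
sejda_budget_calls = calls_for_budget(Decimal("7.50"))
```

Sejda's entry tier works out to $0.15 per call when fully used. At the average Mediakit price, the $9 minimum covers 160 calls and $7.50 covers 133.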
What's next
OCR languages beyond English. A /diff-pdf endpoint for invoice reconciliation. A batched transcribe that takes 10+ files in one settled payment so you save the per-call overhead. If you have a format the catalog doesn't cover, file an issue against the agentutility repo or ping the cluster on the Bazaar.