$ man video-to-text

/video-to-text(1)

PRICE / CALL
$0.10
USDC · base mainnet · scheme: exact
METHOD
POST
CLUSTER
mediakit
CATEGORY
ai
STATUS
live
NAME
video-to-text - video transcription / audio transcription via Whisper v3 large
SYNOPSIS
POST https://x402.org/v1/video-to-text
     Content-Type: application/json
     X-PAYMENT:    <signed-transferWithAuthorization>

     { ... }
↳ first call → 402 Payment Required. Sign a USDC transferWithAuthorization, then retry with the X-PAYMENT header.
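The two-step flow in the synopsis can be sketched in Python as follows. This is a minimal sketch, not the service's official client: `sign_payment` is a hypothetical callable that you supply (it receives whatever payment requirements the 402 response carried and returns the value for the X-PAYMENT header), and `post_json` and `call_with_payment` are illustrative names. The structure of the 402 body and of the signed header is not described on this page, so the sketch passes both through untouched.

```python
import json
import urllib.error
import urllib.request

ENDPOINT = "https://x402.org/v1/video-to-text"


def post_json(url, body, headers=None):
    """POST a JSON body and return (status_code, parsed_json_or_None)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", **(headers or {})},
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, json.loads(response.read() or b"null")
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; a 402 still carries a body we want to read.
        raw = error.read()
        try:
            payload = json.loads(raw) if raw else None
        except ValueError:
            payload = None
        return error.code, payload


def call_with_payment(body, sign_payment, post=post_json, url=ENDPOINT):
    """Send the request; on 402, sign the requirements and retry once."""
    status, payload = post(url, body)
    if status != 402:
        return status, payload
    # sign_payment is hypothetical: it must turn the payment requirements
    # into the signed value expected in the X-PAYMENT header.
    payment_header = sign_payment(payload)
    return post(url, body, {"X-PAYMENT": payment_header})
```

Keeping the signer and the transport as parameters lets you exercise the retry logic with stubs before spending real USDC.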
DESCRIPTION

Transcribes video or audio with Whisper v3 large. Auto-detects 90+ languages, can translate the transcript to English, and can optionally identify speakers (diarization). Maximum media length is 60 minutes.

INPUT  request schema
property   type     req?      description
media_url  string   required  Public URL of an audio or video file. Supports mp3, mp4, mpeg, mpga, m4a, wav, webm. Max 60 minutes.
language   string   optional  Optional ISO language code (e.g. 'en', 'fr', 'es'). If omitted, auto-detected.
task       string   optional  Either 'transcribe' (default) or 'translate' (translates to English). enum: transcribe · translate
diarize    boolean  optional  Whether to identify different speakers. Default false.
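As an illustration of the request schema above, the following sketch builds and validates a request body before it is sent. The helper name `build_request` and the example URL are assumptions made for illustration. Sending the default values for `task` and `diarize` explicitly is also an assumption; omitting them should be equivalent according to the table.

```python
import json

ALLOWED_TASKS = ("transcribe", "translate")  # enum from the INPUT table


def build_request(media_url, language=None, task="transcribe", diarize=False):
    """Return a JSON-serializable request body for video-to-text."""
    if not isinstance(media_url, str) or not media_url:
        raise ValueError("media_url is required and must be a non-empty string")
    if task not in ALLOWED_TASKS:
        raise ValueError(f"task must be one of {ALLOWED_TASKS}, got {task!r}")
    body = {"media_url": media_url, "task": task, "diarize": bool(diarize)}
    if language is not None:
        body["language"] = language  # leave out to let the service auto-detect
    return body


# Hypothetical input URL; replace with a publicly reachable media file.
print(json.dumps(build_request("https://example.com/talk.mp4", task="translate")))
```

Rejecting an unknown `task` locally avoids paying for a call that the service would likely refuse.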
OUTPUT  response shape
field               type    description
text                string  Full transcript text
chunks              array   Time-segmented chunks with timestamps
detected_languages  array   Languages auto-detected in the audio
duration_seconds    number  Source media duration in seconds
task                string  Echo of the task performed
source_url          string  Echo of the input URL
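One way to consume the response shape above is to load it into a small typed record. This is a sketch under stated assumptions: `Transcript` and `parse_transcript` are illustrative names, and the elements of `chunks` and `detected_languages` are kept opaque because this page does not describe their internal structure.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Transcript:
    text: str
    chunks: list[Any]              # element shape is not documented here
    detected_languages: list[Any]  # element shape is not documented here
    duration_seconds: float
    task: str
    source_url: str


def parse_transcript(payload: dict) -> Transcript:
    """Build a Transcript from a decoded JSON response, ignoring extra keys."""
    names = [f.name for f in fields(Transcript)]
    missing = [name for name in names if name not in payload]
    if missing:
        raise KeyError(f"response is missing fields: {missing}")
    return Transcript(**{name: payload[name] for name in names})
```

Ignoring unknown keys keeps the parser tolerant if the service adds fields later, while a missing documented field fails loudly.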
EXAMPLES  two ways to call
EXAMPLE 1 · curl
curl -X POST https://x402.org/v1/video-to-text \
  -H 'Content-Type: application/json' \
  -d '{ }'
The first response is 402 Payment Required and carries the payment requirements; sign them and retry with the X-PAYMENT header.
EXAMPLE 2 · mcp
# install once
claude mcp add x402 --command "npx x402-deployer-mcp"

# then ask Claude Code:
# "use the video-to-text tool to ..."
The MCP server handles payment automatically; your coding agent just calls the tool by name.
METADATA
tags
transcription · whisper · video · audio · subtitles
methods
POST
cluster
mediakit
price
$0.10 USDC per call
ADJACENT  other endpoints in mediakit
endpoint            price  description
extract-tables      $0.10  Extract tables from PDF / table extractor / PDF to CSV / spreadsheet from PDF.
mp4-to-mp3          $0.10  MP4 → MP3 audio extractor.
pdf-extract-tables  $0.10  PDF table extractor / table from PDF / scanned-table parsing / financial-table OCR / multi-page table consolidator / Datalab Marker tables.
pdf-to-jpg          $0.10  PDF to JPG / PNG / WEBP image converter.
speaker-diarize     $0.10  Speaker diarization / who-said-what transcription.
transcribe          $0.10  Video / audio transcription via Whisper v3.
upscale-image       $0.10  AI image upscaler / super-resolution / image enlarger.
video-summarize     $0.10  Video summarizer / podcast summarizer / lecture notes generator.
SEE ALSO
agentutility(7) · mediakit(7) · x402(7) · mcp(7) · llms.txt · registry.json · bazaar.x402.org