$ man speaker-diarize

/speaker-diarize(1)

PRICE / CALL  $0.10 (USDC · base mainnet · scheme: exact)
METHOD        POST
CLUSTER       mediakit
CATEGORY      ai
STATUS        live
NAME
speaker-diarize - speaker diarization / who-said-what transcription
SYNOPSIS
POST https://x402.org/v1/speaker-diarize
     Content-Type: application/json
     X-PAYMENT:    <signed-transferWithAuthorization>

     { ... }
↳ first call → 402 Payment Required. Sign USDC transferWithAuthorization, retry with the X-PAYMENT header.
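The two-step flow in the note above (an unpaid call, a 402 response, then a signed retry) can be sketched in Python as follows. This is a minimal illustration, not the service's client: `send` and `sign_payment` are hypothetical callables supplied by the caller, and the structure of the 402 body is not specified on this page, so it is passed to the signer untouched.

```python
import json
from typing import Callable, Dict, Tuple

# (status_code, parsed JSON body) as returned by the injected transport.
Response = Tuple[int, dict]
Transport = Callable[[str, Dict[str, str], bytes], Response]

ENDPOINT = "https://x402.org/v1/speaker-diarize"


def call_with_payment(
    payload: dict,
    send: Transport,
    sign_payment: Callable[[dict], str],
) -> dict:
    """POST once; on 402, sign the returned requirements and retry once."""
    headers = {"Content-Type": "application/json"}
    body = json.dumps(payload).encode("utf-8")

    status, data = send(ENDPOINT, headers, body)
    if status == 402:
        # `data` holds the payment requirements from the 402 response.
        # `sign_payment` is a hypothetical wallet hook (an assumption) that
        # returns the X-PAYMENT header value for those requirements.
        paid_headers = {**headers, "X-PAYMENT": sign_payment(data)}
        status, data = send(ENDPOINT, paid_headers, body)

    if status != 200:
        raise RuntimeError(f"speaker-diarize failed with HTTP {status}")
    return data
```

Injecting the transport and the signer keeps the sketch independent of any particular HTTP library or wallet SDK. In practice `send` would wrap a real HTTP client.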
DESCRIPTION

Speaker diarization / who-said-what transcription. Whisper v3 + speaker labels. Returns utterances grouped by speaker, plus per-speaker stats (count, seconds, words). 60 min max.
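Since each call is capped at 60 minutes and priced at $0.10, a rough budget for a longer recording can be estimated up front. The sketch below assumes you split the recording into pieces of at most 60 minutes yourself before submitting them. Nothing on this page says the endpoint splits media for you, or that speaker labels stay consistent across separate calls. `plan_calls` is a hypothetical helper name.

```python
import math
from decimal import Decimal

MAX_SECONDS = 60 * 60             # 60 min limit per call
PRICE_PER_CALL = Decimal("0.10")  # USDC per call


def plan_calls(duration_seconds: float) -> tuple:
    """Return (number of calls, total USDC) for a recording of this length."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    # Assumption: the caller splits the media into pieces of <= 60 minutes.
    calls = math.ceil(duration_seconds / MAX_SECONDS)
    return calls, PRICE_PER_CALL * calls


print(plan_calls(9000))  # a 2.5-hour recording needs three calls
```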

INPUT  request schema

media_url  (string, required)
    Public URL to the audio or video file to transcribe and diarize (mp3, mp4, wav, m4a, etc., up to 60 min).
language  (string, optional)
    ISO-639-1 language hint (e.g. 'en', 'es'); omit to auto-detect from the audio.
num_speakers  (number, optional)
    Hint for the number of speakers (1-20). Auto-detected if omitted.
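The request schema above can be checked client-side before paying for a call. The following is a sketch only: `build_request` is a hypothetical helper, and the validation rules are inferred from the field descriptions (a two-letter language code, a 1-20 speaker hint), not from any published validator.

```python
from typing import Optional


def build_request(
    media_url: str,
    language: Optional[str] = None,
    num_speakers: Optional[int] = None,
) -> dict:
    """Build a speaker-diarize request body, omitting unset optional fields."""
    if not media_url.startswith(("http://", "https://")):
        raise ValueError("media_url must be a public http(s) URL")
    body: dict = {"media_url": media_url}

    if language is not None:
        # ISO-639-1 codes are two letters, e.g. 'en' or 'es'.
        if len(language) != 2 or not language.isalpha():
            raise ValueError("language must be a two-letter ISO-639-1 code")
        body["language"] = language.lower()

    if num_speakers is not None:
        if not 1 <= num_speakers <= 20:
            raise ValueError("num_speakers must be between 1 and 20")
        body["num_speakers"] = num_speakers

    return body
```

Leaving out `language` and `num_speakers` keeps the documented auto-detect behaviour, so the helper only adds them when the caller sets them.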
OUTPUT  response shape

text  (string)
    Full transcript as a single string with all speaker turns concatenated in chronological order.
utterances  (array)
    Array of speaker turns, each with speaker label, start/end timestamps, and the spoken text.
speaker_count  (number)
    Number of distinct speakers detected in the audio.
speaker_stats  (array)
    Per-speaker rollup with speaker label, utterance count, total seconds spoken, and word count.
duration_seconds  (number)
    Total length of the input media in seconds.
detected_languages  (array)
    Array of ISO-639-1 language codes Whisper detected across the audio.
source_url  (string)
    Echo of the media_url that was transcribed, for request/response correlation.
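To make the relationship between `utterances` and `speaker_stats` concrete, the sketch below recomputes a per-speaker rollup from a list of utterances. The key names inside each utterance (`speaker`, `start`, `end`, `text`) and inside the rollup (`count`, `seconds`, `words`) are assumptions for illustration. This page describes the fields but does not give their exact names.

```python
from collections import defaultdict


def rollup_by_speaker(utterances: list) -> dict:
    """Recompute utterance count, seconds spoken, and words per speaker."""
    stats = defaultdict(lambda: {"count": 0, "seconds": 0.0, "words": 0})
    for u in utterances:
        s = stats[u["speaker"]]                   # assumed key names
        s["count"] += 1
        s["seconds"] += u["end"] - u["start"]     # timestamps in seconds
        s["words"] += len(u["text"].split())      # whitespace-delimited words
    return dict(stats)


# Hand-written sample data, not real service output.
sample = [
    {"speaker": "A", "start": 0.0, "end": 2.5, "text": "Hello there"},
    {"speaker": "B", "start": 2.5, "end": 4.0, "text": "Hi"},
    {"speaker": "A", "start": 4.0, "end": 5.0, "text": "How are you"},
]
print(rollup_by_speaker(sample))
```

A rollup like this can serve as a sanity check against the returned `speaker_stats`, though word counts may differ if the service tokenizes differently.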
EXAMPLES  two ways to call
EXAMPLE 1 · curl
curl -X POST https://x402.org/v1/speaker-diarize \
  -H 'Content-Type: application/json' \
  -d '{ }'
first response = 402 Payment Required with payment requirements; sign + retry with X-PAYMENT.
EXAMPLE 2 · mcp
# install once
claude mcp add x402 --command "npx x402-deployer-mcp"

# then ask Claude Code:
# "use the speaker-diarize tool to ..."
MCP server handles payment automatically — your coding agent just calls the tool by name.
METADATA
tags     transcription, diarization, speakers, whisper, podcast, meeting
env      FAL_KEY_TRANSCRIBE
methods  POST
cluster  mediakit
price    $0.10 USDC per call
ADJACENT  other endpoints in mediakit

extract-tables  ($0.10)
    Extract tables from PDF / table extractor / PDF to CSV / spreadsheet from PDF.
mp4-to-mp3  ($0.10)
    MP4 → MP3 audio extractor.
pdf-extract-tables  ($0.10)
    PDF table extractor / table from PDF / scanned-table parsing / financial-table OCR / multi-page table consolidator / Datalab Marker tables.
pdf-to-jpg  ($0.10)
    PDF to JPG / PNG / WEBP image converter.
transcribe  ($0.10)
    Video / audio transcription via Whisper v3.
upscale-image  ($0.10)
    AI image upscaler / super-resolution / image enlarger.
video-summarize  ($0.10)
    Video summarizer / podcast summarizer / lecture notes generator.
video-to-audio  ($0.10)
    Video → audio extractor / video to audio converter.
SEE ALSO
agentutility(7) · mediakit(7) · x402(7) · mcp(7) · llms.txt · registry.json · bazaar.x402.org