$ man pdf-extract-tables

/pdf-extract-tables(1)

PRICE / CALL
$0.10
USDC · base mainnet · scheme: exact
METHOD
POST
CLUSTER
mediakit
CATEGORY
uncategorized
STATUS
live
NAME
pdf-extract-tables - pdf table extractor / table from pdf / scanned-table parsing / financial-table ocr / multi-page table consolidator / datalab marker tables
SYNOPSIS
POST https://x402.org/v1/pdf-extract-tables
     Content-Type: application/json
     X-PAYMENT:    <signed-transferWithAuthorization>

     { ... }
↳ first call → 402 Payment Required. Sign a USDC transferWithAuthorization, then retry with the X-PAYMENT header.
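The two-step handshake above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the reference client: `post` and `sign_payment` are hypothetical callables supplied by the caller (an HTTP transport and a wallet signer), and the exact encoding of the X-PAYMENT value is left entirely to the signer.

```python
import json
from typing import Callable, Dict, Tuple

# Hypothetical transport: (url, headers, body_bytes) -> (status_code, response_bytes).
Post = Callable[[str, Dict[str, str], bytes], Tuple[int, bytes]]
# Hypothetical signer: takes the decoded 402 body, returns the X-PAYMENT header value.
Signer = Callable[[dict], str]

ENDPOINT = "https://x402.org/v1/pdf-extract-tables"


def call_with_payment(post: Post, sign_payment: Signer, body: dict,
                      url: str = ENDPOINT) -> dict:
    """POST once; if the server answers 402, sign and retry exactly once."""
    payload = json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"}

    status, raw = post(url, headers, payload)
    if status == 402:
        # The 402 body carries the payment requirements to be signed.
        requirements = json.loads(raw)
        paid_headers = dict(headers)
        paid_headers["X-PAYMENT"] = sign_payment(requirements)
        status, raw = post(url, paid_headers, payload)

    if status != 200:
        raise RuntimeError(f"request failed with HTTP {status}")
    return json.loads(raw)
```

Keeping the transport and the signer injectable means the retry logic can be exercised without a wallet or network access; a real client would plug in its own HTTP library and signing code.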
DESCRIPTION

AI + OCR pipeline that finds every table in a PDF (digital or scanned) and returns row × column text matrices, page by page. Optional cell bounding boxes support downstream layout reconstruction, and an optional page_range filter ('1-5', '3', '1,3,5') restricts the output to selected pages. Handles merged headers, multi-page financial statements, balance sheets, lab results, and scanned reports. Maximum 30 pages per PDF. Sibling of pdf-to-markdown: it uses the same Datalab backend, but the output is pre-parsed to tables only.

INPUT · request schema
property    type    req?      description
pdf_url     string  required  Public URL of a PDF file (http or https). Must be directly fetchable, not behind auth or a viewer redirect. Max 30 pages.
page_range  string  optional  Optional 1-indexed page filter applied after extraction. Accepts ranges, single pages, or comma-lists: '1-5', '3', '1,3,5'. Default: all pages.
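A client can validate both inputs before spending a paid call. The sketch below is illustrative only: `parse_page_range` and `build_request_body` are hypothetical helper names, and accepting mixed specs such as '1-3,7' is an assumption, since the schema only lists the three forms shown above.

```python
from typing import List, Optional
from urllib.parse import urlparse

MAX_PAGES = 30  # page limit stated in the schema


def parse_page_range(spec: str) -> List[int]:
    """Expand '1-5', '3', or '1,3,5' into a sorted list of 1-indexed pages."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))  # raises ValueError on empty or non-numeric input
    if min(pages) < 1 or max(pages) > MAX_PAGES:
        raise ValueError(f"pages must lie between 1 and {MAX_PAGES}")
    return sorted(pages)


def build_request_body(pdf_url: str, page_range: Optional[str] = None) -> dict:
    """Build the JSON body, rejecting URLs that are not plain http(s)."""
    parsed = urlparse(pdf_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("pdf_url must be a public http or https URL")
    body = {"pdf_url": pdf_url}
    if page_range is not None:
        parse_page_range(page_range)  # validate only; the string is sent as-is
        body["page_range"] = page_range
    return body
```

The page_range string is forwarded unchanged because the endpoint documents it as a string; the parsed list is only used for the local sanity check.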
OUTPUT · response shape
field       type    description
source_url  string  Echoes back the PDF URL that was extracted, for traceability.
page_count  string  Total number of pages in the input PDF (capped at 30).
tables      array   Array of detected tables with page number, row × column text matrix, and optional cell bounding boxes.
source      string  Backend identifier, typically 'datalab-marker', indicating the OCR/parsing engine used.
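Since each table arrives as a row × column text matrix, turning the response into CSV is mostly bookkeeping. The sketch below assumes each entry in `tables` is an object with a page number under `page` and the matrix under `rows`; those two key names are hypothetical (the schema above does not name them) and are exposed as parameters so they can be changed to match the real payload.

```python
import csv
import io
from typing import Dict, List, Sequence


def table_to_csv(rows: Sequence[Sequence[str]]) -> str:
    """Serialize one row x column text matrix as CSV, padding short rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    width = max((len(row) for row in rows), default=0)
    for row in rows:
        # Pad ragged rows so every CSV line has the same column count.
        writer.writerow(list(row) + [""] * (width - len(row)))
    return buffer.getvalue()


def tables_by_page(response: dict, page_key: str = "page",
                   rows_key: str = "rows") -> Dict[int, List[str]]:
    """Group CSV renderings by page. page_key / rows_key are assumed names."""
    grouped: Dict[int, List[str]] = {}
    for table in response.get("tables", []):
        grouped.setdefault(table[page_key], []).append(table_to_csv(table[rows_key]))
    return grouped
```

Padding ragged rows is a defensive choice for merged headers, where a header row may carry fewer cells than the body rows beneath it.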
EXAMPLES · two ways to call
EXAMPLE 1 · curl
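# the example.com URL is a placeholder; substitute a directly fetchable PDF
curl -X POST https://x402.org/v1/pdf-extract-tables \
  -H 'Content-Type: application/json' \
  -d '{ "pdf_url": "https://example.com/report.pdf", "page_range": "1-5" }'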
first response = 402 Payment Required with payment requirements; sign + retry with X-PAYMENT.
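Before signing, a cautious client may want to confirm that the payment requirements match the advertised price ($0.10 USDC, scheme exact, Base mainnet). The sketch below is an assumption-heavy illustration: the field names `accepts`, `scheme`, `network`, and `maxAmountRequired`, and the network identifier "base", follow the public x402 payment-requirements format as commonly described and may differ from what this endpoint actually returns. The only fixed fact used is that USDC has 6 decimal places.

```python
from decimal import Decimal

USDC_DECIMALS = 6  # USDC amounts are expressed in units of 10^-6


def pick_requirement(requirements: dict, max_usdc: str = "0.10",
                     scheme: str = "exact", network: str = "base") -> dict:
    """Return the first payment option matching scheme/network within budget.

    Field names ('accepts', 'scheme', 'network', 'maxAmountRequired') are
    assumptions about the 402 body, not confirmed by this page.
    """
    budget = int(Decimal(max_usdc) * 10 ** USDC_DECIMALS)
    for option in requirements.get("accepts", []):
        if option.get("scheme") != scheme or option.get("network") != network:
            continue
        if int(option["maxAmountRequired"]) <= budget:
            return option
    raise ValueError("no acceptable payment option within budget")
```

With the default budget, an option quoting 100000 atomic units (0.10 USDC) is accepted, while anything higher is rejected before a signature is ever produced.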
EXAMPLE 2 · mcp
# install once
claude mcp add x402 --command "npx x402-deployer-mcp"

# then ask Claude Code:
# "use the pdf-extract-tables tool to ..."
MCP server handles payment automatically — your coding agent just calls the tool by name.
METADATA
tags
pdf · table-extraction · ocr · mediakit · document-parsing · financial-tables · datalab · pdf-tables
methods
POST
cluster
mediakit
price
$0.10 USDC per call
ADJACENT · other endpoints in mediakit
endpoint         price  description
extract-tables   $0.10  Extract tables from PDF / table extractor / PDF to CSV / spreadsheet from PDF.
mp4-to-mp3       $0.10  MP4 → MP3 audio extractor.
pdf-to-jpg       $0.10  PDF to JPG / PNG / WEBP image converter.
speaker-diarize  $0.10  Speaker diarization / who-said-what transcription.
transcribe       $0.10  Video / audio transcription via Whisper v3.
upscale-image    $0.10  AI image upscaler / super-resolution / image enlarger.
video-summarize  $0.10  Video summarizer / podcast summarizer / lecture notes generator.
video-to-audio   $0.10  Video → audio extractor / video to audio converter.
SEE ALSO
agentutility(7) · mediakit(7) · x402(7) · mcp(7) · llms.txt · registry.json · bazaar.x402.org