The math has flipped
A junior engineer used to write 30 lines a day and a reviewer would catch the bugs in 5 minutes. Now a junior agent writes 3,000 lines an hour. Same reviewer, same 5 minutes per logical chunk. Pretty obvious where the queue forms.
GitHub's own data through 2025 showed PR open rates climbing roughly 2x year over year while merge times stayed flat. The reviewer is the bottleneck. So what does a sensible team do? They automate the reviewer. But the existing tools — Snyk, Socket.dev, Dependabot, the GitHub security scanner — were built for a different shape of problem. Each one runs as a CI step or a dashboard. None of them are a single HTTP call you can paste into an agent's tool list.
Prooflayer is the agent-callable rewrite of that surface.
What's in it
Thirteen endpoints right now. Each one answers a different "is this safe to ship" question. The aggregator endpoint, production-readiness-score, is the lead. It composes the others into one number from 0 to 100. Hit it before deploying. Cost: $0.10 USDC.
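For a sense of what "hit it before deploying" could look like from an agent's side, here is a minimal sketch. The base URL, the request body shape, and the `score` field in the response are all assumptions made for illustration, since none of them are specified here. Payment handling is left out for the same reason. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical base URL: a placeholder, not the real service address.
BASE_URL = "https://api.example.com/prooflayer"


def build_readiness_request(repo_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the aggregator endpoint.

    The JSON body shape {"repo": ...} is an assumption for illustration.
    """
    body = json.dumps({"repo": repo_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/production-readiness-score",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_score(response_body: bytes) -> int:
    """Pull a 0-100 score out of a response, assuming a top-level "score" field."""
    score = json.loads(response_body)["score"]
    if not isinstance(score, int) or not 0 <= score <= 100:
        raise ValueError(f"unexpected score: {score!r}")
    return score
```

An agent would pass the built request to `urllib.request.urlopen` and feed the response bytes to `parse_score`. Swap in the real URL and payload once you have them.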
The individual probes underneath:
secrets-exposure-check ($0.02): grep + entropy + path heuristics on top-level config files. Catches AWS keys committed to .env, private keys leaked into next.config.js, server-only env vars exposed via NEXT_PUBLIC_.
dep-risk-summary ($0.03): opens package.json + lockfile, counts unpinned deps, flags transitive risk, surfaces known-bad packages.
prompt-injection-surface ($0.03): scans the repo for places where user-controlled strings flow into LLM prompts without sanitization. The injection surface that most repos don't know they have.
db-migration-risk ($0.02): looks at a SQL migration and tells you which ALTER statements will lock the table, which DROP COLUMNs are silent foot-guns, whether the new index needs CONCURRENTLY on Postgres.
deploy-config-risk ($0.02): Dockerfile lint, wrangler.toml review, vercel.json hardening. The boring config files that ship the obvious mistakes.
package-risk-npm and pypi-package-risk ($0.03 / $0.01): supply-chain scanners with typosquat detection. Catches the colour vs color thing before it's in production.
cve-lookup ($0.005): the cheapest probe. Just a CVE database query.
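To make "grep + entropy" concrete, here is a minimal sketch of the general technique. It is not Prooflayer's implementation. The regexes, the 20-character token minimum, and the 4.0 bits-per-character threshold are all illustrative assumptions, and the path heuristics are left out.

```python
import math
import re
from collections import Counter

# Commonly used pattern for AWS access key IDs: "AKIA" plus 16 uppercase letters/digits.
AWS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
# Candidate tokens: runs of 20+ base64-ish characters (illustrative choice).
TOKEN_RE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, computed from character frequencies."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def find_suspect_lines(text: str, min_entropy: float = 4.0) -> list[tuple[int, str]]:
    """Return (line number, reason) pairs for lines that look like they hold secrets."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Pattern pass: a known key shape is flagged regardless of entropy.
        if AWS_KEY_RE.search(line):
            findings.append((lineno, "aws-access-key-id"))
            continue
        # Entropy pass: long tokens that look random get flagged once per line.
        for token in TOKEN_RE.findall(line):
            if shannon_entropy(token) >= min_entropy:
                findings.append((lineno, "high-entropy-string"))
                break
    return findings
```

The two passes cover different failure modes. The pattern pass catches keys with a recognizable prefix, and the entropy pass catches random-looking strings with no known shape. Expect false positives from the entropy pass on things like hashes and minified assets.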
There's also ai-content-detector and github-repo-health and a couple others. Full list at /prooflayer.
Why per-call beats SaaS here
Snyk's per-developer-per-month pricing makes sense if every dev is reviewing every commit. It doesn't make sense when an agent wants to spot-check 17 dependency graphs at 3am before deciding whether to merge. The same goes for Socket.dev and GitHub Advanced Security. They priced for human review pace.
At $0.02 per scan, an agent can run secrets-exposure-check on every PR open across an org for a few dollars a day. The math isn't even close.
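The arithmetic is easy to check. The per-call prices below come from the probe list. The 150 PRs a day is a made-up volume for illustration, so plug in your own.

```python
from decimal import Decimal

# Per-call prices in USDC, taken from the probe list.
SECRETS_CHECK_PRICE = Decimal("0.02")
DEP_RISK_PRICE = Decimal("0.03")


def total_cost(calls: int, price_per_call: Decimal) -> Decimal:
    """Total spend for a number of calls at a flat per-call price."""
    if calls < 0:
        raise ValueError("calls must be non-negative")
    return price_per_call * calls


# Hypothetical org opening 150 PRs a day, one secrets scan per PR.
daily_secrets_spend = total_cost(150, SECRETS_CHECK_PRICE)

# The 3am spot-check: 17 dependency graphs through dep-risk-summary.
spot_check_spend = total_cost(17, DEP_RISK_PRICE)

print(f"secrets scans per day: ${daily_secrets_spend}")
print(f"17 dependency checks:  ${spot_check_spend}")
```

`Decimal` keeps the cents exact, which floats won't do reliably once you start summing thousands of sub-cent calls.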
A pattern that already works
The composite endpoint, production-readiness-score, gets called way more than the individual probes. Agents like one number. They like a verdict. They don't want to run six scans, parse six outputs, then reason about which findings outweigh which. They want "score: 73, ship: probably yes, the high-severity findings are in dev-only paths".
So the lead endpoint became the busiest endpoint. Worth noting because that pattern probably generalizes. If you're building a verification cluster, build the aggregator first and the probes as backing parts. Don't ship 12 probes and hope the agent figures out how to combine them.
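Here is a rough sketch of the aggregator-first shape. Probes are plain functions that return findings, and one function folds them into a single score. The penalty weights and the dev-only discount are invented for illustration. They are not the published rubric.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    probe: str
    severity: str  # "low", "medium", or "high"
    dev_only: bool = False  # True if the finding sits in a dev-only path


# Hypothetical penalty points per severity (not the real scoring rubric).
PENALTY = {"low": 2, "medium": 8, "high": 25}

# A probe takes a target (say, a repo URL) and returns its findings.
Probe = Callable[[str], list[Finding]]


def readiness_score(target: str, probes: list[Probe]) -> dict:
    """Run every probe and fold the findings into one 0-100 score."""
    findings = [f for probe in probes for f in probe(target)]
    penalty = 0
    for f in findings:
        weight = PENALTY[f.severity]
        # Assumed rule: dev-only findings count for a quarter of their weight.
        penalty += weight // 4 if f.dev_only else weight
    return {"score": max(0, 100 - penalty), "findings": findings}
```

The caller gets one number plus the raw findings behind it, so the agent can act on the score and a human can still audit the details. Adding a probe later means appending one function to the list.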
What's missing
A few obvious gaps. There's no IaC scanner yet (Terraform, Pulumi). There's no runtime-behavior probe (sandbox the binary, watch syscalls). There's no PR-comment poster: Prooflayer answers questions but doesn't yet leave breadcrumbs on the PR thread for human reviewers. All of those are in the build queue.
The other thing worth saying: a 0-100 score is a starting point, not a verdict. If your CI is gating merges on score >= 80 from an endpoint that charges $0.10 a call, you've made a real-money commitment to that endpoint's calibration. We publish the scoring rubric so it's auditable. But the right move for most teams is to use the score as a signal, not a gate. At least until you've watched it correlate with actual incidents for a few months.
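One way to wire up "signal, not gate" is an advisory mode that warns on a low score but only blocks when you explicitly opt in. This is a sketch, not a prescribed integration. The function name, the `enforce` flag, and the default threshold of 80 are assumptions.

```python
def merge_decision(score: int, threshold: int = 80, enforce: bool = False) -> dict:
    """Decide how a 0-100 readiness score affects a merge.

    With enforce=False (the default) a low score only raises a warning.
    With enforce=True a score below the threshold blocks the merge.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    below = score < threshold
    return {
        "allow_merge": not (enforce and below),
        "warn": below,
        "score": score,  # keep the raw score so it can be compared with incidents later
    }
```

Running in advisory mode while logging every score gives you the data to decide later whether flipping `enforce` on is justified.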
Probes are the durable part. Scores are the convenience.