Forensic Audio Intelligence

The audio is the evidence.
Everything else must prove itself.

Evidence audio is faint, chaotic, and unforgiving — and a single AI pass will confidently hallucinate through it. VeriVox enhances in recipes, transcribes in passes, votes the passes into consensus, verifies the voices — and keeps the human ear as the final arbiter. What comes out is a transcript a courtroom can interrogate.

Request early access · See the pipeline

ingest 2030-01-14T09:14:03Z · sha256 7f3a91cc…4be081 · container m4a/aac 48k mono · custody manifest written · 0 bytes left this machine

The problem

One pass of AI is an opinion,
not a record.

Jail calls. Bodycam wind. Voicemails from a moving car. A threat overheard through a floor. On faint audio, a speech model doesn't say "inaudible" — it invents something polite and moves on. Presenting that as a transcript isn't just wrong. It's discoverable.

ASR confidence scores measure fluency, not truth
Diarization swaps speakers exactly when it matters
Enhancement chains go undocumented — and get challenged
"The AI transcribed it" survives no cross-examination

> What the model heard in the faintest minute, confidence 0.34:
>  "Thank you."  ← hallucinated filler on faint speech

> What five passes + an enrolled voice + a human ear produced:
>  a verified excerpt — with the machine's failure preserved
>  beside it, because that contrast IS the credibility.

A real pattern from a real matter: the gravest lines in a recording are usually the faintest. The tool that admits what the machine couldn't hear is the tool whose verified lines get believed.

The pipeline

Five stages. Every one leaves a trail.

Each stage writes its provenance into the record — reproducible down to the exact command. Read the full pipeline →

01 · INGEST

Chain of custody

SHA-256, container forensics, timestamps — a custody manifest before anything touches the audio.
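A minimal sketch of what an ingest step like this could record. The helper names, the field names and the JSON layout are hypothetical, not VeriVox's actual manifest format, and container forensics (codec, sample rate, embedded metadata) is left out. It shows one property: the digest is taken from the raw bytes before anything else touches them, so anyone can recompute it later.

```python
import hashlib
import json
import os
from datetime import datetime, timezone


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the raw file bytes in 1 MiB chunks so long recordings never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_manifest(path: str) -> dict:
    """Collect hash, size and timestamps for one evidence file (hypothetical field names)."""
    st = os.stat(path)
    return {
        "file": os.path.basename(path),
        "sha256": sha256_file(path),
        "size_bytes": st.st_size,
        # Filesystem modification time, as the operating system reports it.
        "mtime_utc": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
        # When this manifest was written.
        "ingested_utc": datetime.now(timezone.utc).isoformat(),
    }


def write_manifest(path: str, out_path: str) -> dict:
    """Write the manifest as sorted, indented JSON and return it."""
    manifest = custody_manifest(path)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2, sort_keys=True)
    return manifest
```

Re-running `sha256_file` on the same file at any later point should reproduce the same digest; a mismatch means the bytes changed.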

02 · ENHANCE ×N

Recipes, not tweaks

Denoise, voice-isolate, level — as named, scored, reproducible recipes. Nothing added, nothing invented.
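One way to make a recipe "named and reproducible" is to treat it as data: a frozen record whose exact command can be printed and whose fingerprint can go into the custody manifest. This sketch assumes ffmpeg as the enhancement tool and uses three of its real audio filters (`highpass`, `afftdn`, `loudnorm`); the `Recipe` class, the recipe name and the fingerprint scheme are hypothetical, and recipe scoring is not shown.

```python
from __future__ import annotations

import hashlib
import json
import shlex
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Recipe:
    """A named, versioned enhancement recipe (hypothetical structure)."""
    name: str
    version: int
    filters: tuple[str, ...]  # ffmpeg audio filters, applied left to right

    def command(self, src: str, dst: str) -> list[str]:
        # The exact argv to run; stored verbatim so the step can be replayed.
        return ["ffmpeg", "-i", src, "-af", ",".join(self.filters), dst]

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys, no whitespace) so identical recipes hash identically.
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


DENOISE_V1 = Recipe("denoise-v1", 1, ("highpass=f=80", "afftdn", "loudnorm"))

print(shlex.join(DENOISE_V1.command("exhibit-04.m4a", "exhibit-04.denoise-v1.wav")))
# -> ffmpeg -i exhibit-04.m4a -af highpass=f=80,afftdn,loudnorm exhibit-04.denoise-v1.wav
```

Changing any filter or bumping the version changes the fingerprint, so a transcript pass can always name exactly which rendition it heard.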

03 · TRANSCRIBE ×N

Passes, not a pass

Multiple models × multiple recipes, word-level timestamps. One opinion is noise; a population is data.
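A sketch of the "models × recipes" plan, assuming each pass is one transcription model run on one enhanced rendition. The `PassSpec` record and the model and recipe names used below are placeholders, not the models VeriVox ships; word-level timestamps would be attached to each pass's output and are not shown.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PassSpec:
    pass_id: str
    model: str   # transcription model identifier (placeholder)
    recipe: str  # enhancement recipe the model is run on


def plan_passes(models: list[str], recipes: list[str]) -> list[PassSpec]:
    """Every model on every enhanced rendition: len(models) * len(recipes) independent passes."""
    return [
        PassSpec(f"p{i:02d}", model, recipe)
        for i, (model, recipe) in enumerate(product(models, recipes), start=1)
    ]


plan = plan_passes(["model-a", "model-b"], ["raw", "denoise-v1", "isolate-v1"])
for spec in plan:
    print(spec.pass_id, spec.model, spec.recipe)
```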

04 · CONSENSUS

Agreement you can defend

Word-level voting across passes. High agreement earns trust; disagreement goes to the ear-queue — sorted, not hidden.
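Word-level voting can be sketched as a majority vote per aligned word slot, in the spirit of NIST's ROVER technique for combining ASR outputs. This assumes the hard part, aligning the passes by time and edit distance, has already happened, so every pass is a list of the same length with `None` where that pass heard nothing. The `queue_below` threshold and the rule that a line is only as strong as its weakest word are illustrative assumptions, not VeriVox's documented scoring.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SlotVote:
    word: Optional[str]  # winning token; None means most passes heard nothing in this slot
    agree: int           # number of passes that chose the winner
    total: int           # number of passes that voted


def vote_slots(aligned: Sequence[Sequence[Optional[str]]]) -> list[SlotVote]:
    """Majority vote per slot. Passes must already be aligned to equal length."""
    if not aligned:
        raise ValueError("need at least one pass")
    if len({len(p) for p in aligned}) != 1:
        raise ValueError("passes must be pre-aligned to the same number of slots")
    votes = []
    for slot in zip(*aligned):
        # Case-insensitive comparison; None (a gap) competes like any other token.
        tokens = [w.lower() if w is not None else None for w in slot]
        # Ties go to the token seen first (Counter.most_common keeps first-encountered order).
        word, agree = Counter(tokens).most_common(1)[0]
        votes.append(SlotVote(word, agree, len(aligned)))
    return votes


def line_consensus(aligned, queue_below: int = 4):
    """Return (text, agree, total, needs_ear); agreement is the weakest slot's vote count."""
    votes = vote_slots(aligned)
    text = " ".join(v.word for v in votes if v.word is not None)
    weakest = min((v.agree for v in votes), default=0)
    return text, weakest, len(aligned), weakest < queue_below
```

With five passes where two hear "thank you" and three hear nothing, the sketch returns empty text at agree 3/5 and sends the line to the ear-queue instead of printing the filler.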

05 · VERIFY

Voices & the ear

Enrolled-voice similarity flags who's who — and refuses to guess. The human ear certifies. The export says which was which.
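A sketch of "scores, and refuses to guess", assuming each segment and each enrolled reference has already been turned into a fixed-length voice embedding of equal size by some local speaker model. It compares by cosine similarity and returns no label unless the best match clears both an absolute floor and a margin over the runner-up. `suggest_speaker`, `min_score` and `min_margin` are hypothetical; real thresholds would need calibration per model and per recording condition.

```python
from __future__ import annotations

import math
from typing import Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_speaker(
    segment: Sequence[float],
    enrolled: dict[str, Sequence[float]],
    min_score: float = 0.75,   # hypothetical absolute floor
    min_margin: float = 0.10,  # hypothetical gap required over the runner-up
) -> tuple[Optional[str], list[tuple[float, str]]]:
    """Return (label or None, all scores). None means 'refuse to guess'; scores are kept either way."""
    scores = sorted(((cosine(segment, ref), name) for name, ref in enrolled.items()), reverse=True)
    if not scores:
        return None, scores
    best_score, best_name = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else -1.0
    if best_score < min_score or best_score - runner_up < min_margin:
        return None, scores
    return best_name, scores
```

The label is only a suggestion for the human attributing the line; the full score list travels with it so the basis for the suggestion stays inspectable.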

The honesty ledger

Every line carries its provenance.

This is the export. Not a wall of text — a ledger where each line declares how it earned its place.

# transcript · exhibit-04.m4a · sha256 7f3a91cc… · 5 passes · 2 voices enrolled · operator-certified

[03:12.4–03:16.1] ✓ SUBJECT-A: You were never supposed to be here today. agree 5/5 · verified by ear

[03:22.0–03:25.8] ✓ WITNESS-1: I'm telling you exactly what I saw. agree 4/5 · verified by ear

[07:41.8–07:44.0] ✎ SUBJECT-A: The paperwork was already gone by then. agree 3/5 · corrected

(ASR original: "the paper it was already gone been" — original preserved)

[22:03.1–22:04.6] ⚑ SUBJECT-A: [grave excerpt — operator-verified] agree 0/5 — machine could not resolve

[24:48.2–24:50.0] · UNATTRIBUTED: [flagged: background media audible] excluded from excerpts

✓ verified by ear

✎ corrected: machine original preserved

⚑ grave excerpt: flagged for counsel

agree n/N: cross-pass consensus

· machine-only: says so, out loud
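The ledger format is regular enough to generate mechanically. A minimal sketch, assuming a hypothetical `LedgerLine` record per transcript line; the status names and field layout are reconstructed from the sample lines, not taken from VeriVox's export code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

MARKERS = {"verified": "✓", "corrected": "✎", "grave": "⚑", "machine": "·"}
NOTES = {
    "verified": "verified by ear",
    "corrected": "corrected",
    "grave": "grave excerpt, flagged for counsel",
    "machine": "machine-only",
}


@dataclass
class LedgerLine:
    start_s: float
    end_s: float
    status: str                         # one of MARKERS' keys (hypothetical names)
    speaker: str
    text: str
    agree: int
    total: int
    asr_original: Optional[str] = None  # kept whenever a human corrected the machine


def fmt_ts(seconds: float) -> str:
    """mm:ss.d, computed in whole tenths so 59.96 s becomes 01:00.0, never 00:60.0."""
    tenths = round(seconds * 10)
    minutes, rem = divmod(tenths, 600)
    secs, tenth = divmod(rem, 10)
    return f"{minutes:02d}:{secs:02d}.{tenth}"


def render(line: LedgerLine) -> str:
    out = (
        f"[{fmt_ts(line.start_s)}–{fmt_ts(line.end_s)}] {MARKERS[line.status]} "
        f"{line.speaker}: {line.text} agree {line.agree}/{line.total} · {NOTES[line.status]}"
    )
    if line.asr_original is not None:
        out += f'\n(ASR original: "{line.asr_original}"; original preserved)'
    return out
```

Rendering from records rather than hand-editing text means a marker cannot drift away from the status behind it.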

Data sovereignty

Your evidence never leaves your machine.

No upload. No account. No vendor holding your client's worst day on their servers. The models run locally — on the laptop in the room where the privilege lives. For firms that want shared infrastructure, a private bridge routes — and never stores. Security & sovereignty →

▸ runs offline — full pipeline, air-gapped if you need it

▸ local models — transcription, diarization, voice embeddings

▸ custody manifest — hashes + commands, reproducible end-to-end

▸ open tools — auditable pipeline, no black box

▸ zero retention — we cannot lose what we never hold

▸ zero telemetry — the evidence doesn't phone home

Who it's for

Four professions. One standard of proof.

Law firms

Litigation

Turn awful evidence audio into defensible transcripts and excerpt sheets — with the provenance opposing counsel will ask for already attached.

Court reporters

Certification

A multi-pass draft with the disagreements pre-sorted. Your ear certifies; your certification stays the authority.

Investigators

Attribution

Enroll a voice once; score it across every recording in the file. "Same speaker?" becomes a number with a method behind it.

Government

Sovereignty

Local-first processing that satisfies the data-residency rules the cloud vendors ask you to waive.

Open source & free for personal use

Free for the people it was built for.

VeriVox exists because one of us needed it — a parent, alone with a recording that mattered, refusing to hand a courtroom a machine's guess. Personal use is free, forever. The core engines are MIT open source — the method is public. The firms that bill with it license it, and fund the promise. Open source & licensing →

▸ MIT core — consensus, voice-verify, auditor: forkable, auditable

▸ personal use — free forever, no feature gates on the truth

▸ commercial self-host — licensed: firms, reporters, experts, agencies

▸ the dashboard — routes and renders; never stores your audio

Questions

Asked before you ask.

Does VeriVox upload my audio to the cloud?

No. The full pipeline — enhancement, transcription, diarization, voice embeddings — runs locally. It can run fully offline, air-gapped if your matter requires it. Zero retention, zero telemetry.

How is this different from AI transcription services?

A single AI pass hallucinates on faint audio — confidently. VeriVox runs multiple enhancement recipes and multiple transcription passes, aligns them word-by-word, and reports agreement. Disagreement isn't hidden; it's queued for a human ear. The export labels every line: machine-only, corrected, or verified.

Can it tell me who is speaking?

Enroll a reference sample of a known voice and VeriVox scores every segment against it, flagging conflicts with the diarization. Scores are a labeling aid and corroboration for human attribution — never presented as machine identification. When the audio is too faint to score, it says so instead of guessing.

Will the output hold up in court?

The design goal is that nothing has to be taken on faith: custody manifest from ingest, reproducible enhancement commands, per-line provenance, machine originals preserved under every correction. Admissibility is case- and jurisdiction-specific — but every question a cross-examiner asks has an answer written into the record.

How much does it cost?

Personal use is free, forever — that's the mission. Commercial self-hosting (law firms, court-reporting practices, forensic experts, agencies) requires a paid license, which funds the free tier. The core engines are MIT open source. Details →

What do I need to run it?

A modern laptop. Apple-silicon Macs get hardware-accelerated local models; Windows and Linux are supported. Very long recordings benefit from more memory — nothing exotic.

Early access

The most advanced forensic
transcription stack in the world —
and it runs on your machine.

Built by people who needed it on a real matter — and refused to present a hallucination to a courtroom.

Request early access

verivoxai.com · pilot cohort: litigation · court reporting · investigations
