Forensic Audio Intelligence
The audio is the evidence.
Everything else must prove itself.
Evidence audio is faint, chaotic, and unforgiving — and a single AI pass will confidently hallucinate through it. VeriVox enhances in recipes, transcribes in passes, votes the passes into consensus, verifies the voices — and keeps the human ear as the final arbiter. What comes out is a transcript a courtroom can interrogate.
ingest 2030-01-14T09:14:03Z · sha256 7f3a91cc…4be081 · container m4a/aac 48k mono · custody manifest written · 0 bytes left this machine
The problem
One pass of AI is an opinion,
not a record.
Jail calls. Bodycam wind. Voicemails from a moving car. A threat overheard through a floor. On faint audio, a speech model doesn't say "inaudible" — it invents something polite and moves on. Presenting that as a transcript isn't just wrong. It's discoverable.
- ASR confidence scores measure fluency, not truth
- Diarization swaps speakers exactly when it matters
- Enhancement chains go undocumented — and get challenged
- "The AI transcribed it" survives no cross-examination
> "Thank you." ← hallucinated filler on faint speech
> What five passes + an enrolled voice + a human ear produced:
> a verified excerpt, with the machine's failure preserved
> beside it, because that contrast IS the credibility.
A real pattern from a real matter: the gravest lines in a recording are usually the faintest. The tool that admits what the machine couldn't hear is the tool whose verified lines get believed.
The pipeline
Five stages. Every one leaves a trail.
Each stage writes its provenance into the record — reproducible down to the exact command. Read the full pipeline →
01 · INGEST
Chain of custody
SHA-256, container forensics, timestamps — a custody manifest before anything touches the audio.
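A minimal sketch of what an ingest step like this could look like, assuming a plain JSON manifest. The function names and manifest fields below are illustrative, not VeriVox's actual schema, and container forensics is left out.

```python
import hashlib
import json
import os
from datetime import datetime, timezone


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a long recording never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_custody_manifest(path: str) -> dict:
    """Describe the file as it was at ingest, before any processing touches it.

    The field names are hypothetical; a real manifest would carry more.
    """
    return {
        "file_name": os.path.basename(path),
        "size_bytes": os.stat(path).st_size,
        "sha256": sha256_of_file(path),
        "ingested_at_utc": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def write_manifest(manifest: dict, manifest_path: str) -> None:
    """Write the manifest with sorted keys so two runs are easy to diff."""
    with open(manifest_path, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=2, sort_keys=True)
```

Re-hashing the original at any later date and comparing it to the manifest is what shows the source file was never altered.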
02 · ENHANCE ×N
Recipes, not tweaks
Denoise, voice-isolate, level — as named, scored, reproducible recipes. Nothing added, nothing invented.
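One way to picture a "recipe" is as a named, ordered filter chain that always expands to the same command. The sketch below assumes ffmpeg as the enhancement backend, which the page does not state. The `Recipe` class, the recipe name, and the filter settings are hypothetical; `highpass`, `afftdn`, and `loudnorm` are real ffmpeg audio filters.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Recipe:
    """A named, ordered chain of audio filters (hypothetical structure)."""
    name: str
    filters: Tuple[str, ...]

    def command(self, src: str, dst: str) -> List[str]:
        """Expand the recipe into the exact command to log beside the output."""
        return ["ffmpeg", "-nostdin", "-i", src, "-af", ",".join(self.filters), dst]

    def fingerprint(self) -> str:
        """Short stable ID, so two transcripts can prove they used the same recipe."""
        payload = json.dumps(
            {"name": self.name, "filters": list(self.filters)}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Example values only: cut low rumble, reduce broadband noise, normalize loudness.
denoise_and_level = Recipe(
    name="denoise-level-v1",
    filters=("highpass=f=80", "afftdn=nf=-25", "loudnorm=I=-16:TP=-1.5:LRA=11"),
)
```

The sketch only builds the command; running it is left to the caller, and the command plus fingerprint are what get written into the record.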
03 · TRANSCRIBE ×N
Passes, not a pass
Multiple models × multiple recipes, word-level timestamps. One opinion is noise; a population is data.
04 · CONSENSUS
Agreement you can defend
Word-level voting across passes. High agreement earns trust; disagreement goes to the ear-queue — sorted, not hidden.
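Word-level voting can be sketched in a few lines, assuming the passes have already been aligned into equal-length word slots, with `None` where a pass heard nothing. The alignment step, the function names, and the 0.6 threshold are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple


def vote_slot(candidates: Sequence[Optional[str]]) -> Tuple[Optional[str], float]:
    """Return the most common candidate and the share of passes that agreed."""
    winner, votes = Counter(candidates).most_common(1)[0]
    return winner, votes / len(candidates)


def consensus(
    passes: List[List[Optional[str]]], min_agreement: float = 0.6
) -> Tuple[List[Dict], List[Dict]]:
    """Split aligned word slots into accepted words and an ear-queue."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("passes must be aligned to the same number of slots")
    accepted, ear_queue = [], []
    for index, slot in enumerate(zip(*passes)):
        word, agreement = vote_slot(slot)
        entry = {
            "index": index,
            "word": word,
            "agreement": agreement,
            "candidates": list(slot),  # every pass's opinion stays on the record
        }
        (accepted if agreement >= min_agreement else ear_queue).append(entry)
    # Least-agreed slots first, so the human ear starts where the machines split.
    ear_queue.sort(key=lambda e: e["agreement"])
    return accepted, ear_queue
```

A slot where two passes heard one word, two heard nothing, and one heard filler never reaches the transcript by itself; it lands in the queue with all five candidates attached.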
05 · VERIFY
Voices & the ear
Enrolled-voice similarity flags who's who — and refuses to guess. The human ear certifies. The export says which was which.
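"Refuses to guess" can be made concrete with a sketch like the one below. It assumes speaker embeddings are plain lists of floats and that faintness is judged by a signal-to-noise estimate. The SNR gate, both thresholds, and the label strings are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cannot score a zero-length embedding")
    return dot / norm


def score_segment(
    enrolled: Sequence[float],
    segment: Sequence[float],
    segment_snr_db: float,
    min_snr_db: float = 5.0,       # assumed cutoff, not a VeriVox default
    match_threshold: float = 0.7,  # assumed cutoff, not a VeriVox default
) -> Dict:
    """Score a segment against the enrolled voice, or decline to score it."""
    if segment_snr_db < min_snr_db:
        # Too faint: report that, rather than a number nobody should trust.
        return {"label": "too_faint_to_score", "similarity": None}
    similarity = cosine_similarity(enrolled, segment)
    label = (
        "consistent_with_enrolled"
        if similarity >= match_threshold
        else "not_consistent"
    )
    return {"label": label, "similarity": similarity}
```

The label is an aid for the human doing the attribution. It is a similarity score with a stated method, not an identification.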
The honesty ledger
Every line carries its provenance.
This is the export. Not a wall of text — a ledger where each line declares how it earned its place.
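A ledger line can be sketched as a small immutable record, assuming the three labels the export uses (machine-only, corrected, verified) and that the machine's original text is kept under every correction. The class and field names are hypothetical, not the actual export schema.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class Status(Enum):
    MACHINE_ONLY = "machine-only"
    CORRECTED = "corrected"
    VERIFIED = "verified"


@dataclass(frozen=True)
class LedgerLine:
    """One transcript line plus how it earned its place (hypothetical fields)."""
    start_s: float
    end_s: float
    machine_text: str          # never overwritten, even after a correction
    agreement: float           # share of passes that agreed on this line
    status: Status = Status.MACHINE_ONLY
    certified_text: Optional[str] = None
    certified_by: Optional[str] = None

    def correct(self, text: str, reviewer: str) -> "LedgerLine":
        """A human heard something different; keep the machine original beside it."""
        return replace(
            self, status=Status.CORRECTED, certified_text=text, certified_by=reviewer
        )

    def verify(self, reviewer: str) -> "LedgerLine":
        """A human listened and confirmed the machine text as it stands."""
        return replace(
            self,
            status=Status.VERIFIED,
            certified_text=self.machine_text,
            certified_by=reviewer,
        )

    def export_text(self) -> str:
        """The text shown in the export: the certified one if it exists."""
        return self.machine_text if self.certified_text is None else self.certified_text
```

Because the record is frozen, a correction produces a new line rather than editing the old one, so the machine's original is always there to be compared.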
Data sovereignty
Your evidence never leaves your machine.
No upload. No account. No vendor holding your client's worst day on their servers. The models run locally — on the laptop in the room where the privilege lives. For firms that want shared infrastructure, a private bridge routes — and never stores. Security & sovereignty →
Who it's for
Four professions. One standard of proof.
Law firms
Litigation
Turn awful evidence audio into defensible transcripts and excerpt sheets — with the provenance opposing counsel will ask for already attached.
Court reporters
Certification
A multi-pass draft with the disagreements pre-sorted. Your ear certifies; your certification stays the authority.
Investigators
Attribution
Enroll a voice once; score it across every recording in the file. "Same speaker?" becomes a number with a method behind it.
Government
Sovereignty
Local-first processing that satisfies the data-residency rules the cloud vendors ask you to waive.
Open source & free for personal use
Free for the people it was built for.
VeriVox exists because one of us needed it — a parent, alone with a recording that mattered, refusing to hand a courtroom a machine's guess. Personal use is free, forever. The core engines are MIT open source — the method is public. The firms that bill with it license it, and fund the promise. Open source & licensing →
Questions
Asked before you ask.
Does VeriVox upload my audio to the cloud?
No. The full pipeline — enhancement, transcription, diarization, voice embeddings — runs locally. It can run fully offline, air-gapped if your matter requires it. Zero retention, zero telemetry.
How is this different from AI transcription services?
A single AI pass hallucinates on faint audio — confidently. VeriVox runs multiple enhancement recipes and multiple transcription passes, aligns them word-by-word, and reports agreement. Disagreement isn't hidden; it's queued for a human ear. The export labels every line: machine-only, corrected, or verified.
Can it tell me who is speaking?
Enroll a reference sample of a known voice and VeriVox scores every segment against it, flagging conflicts with the diarization. Scores are a labeling aid and corroboration for human attribution — never presented as machine identification. When the audio is too faint to score, it says so instead of guessing.
Will the output hold up in court?
The design goal is that nothing has to be taken on faith: custody manifest from ingest, reproducible enhancement commands, per-line provenance, machine originals preserved under every correction. Admissibility is case- and jurisdiction-specific — but every question a cross-examiner asks has an answer written into the record.
How much does it cost?
Personal use is free, forever — that's the mission. Commercial self-hosting (law firms, court-reporting practices, forensic experts, agencies) requires a paid license, which funds the free tier. The core engines are MIT open source. Details →
What do I need to run it?
A modern laptop. Apple-silicon Macs get hardware-accelerated local models; Windows and Linux are supported. Very long recordings benefit from more memory — nothing exotic.
Early access
The most advanced forensic
transcription stack in the world —
and it runs on your machine.
Built by people who needed it on a real matter — and refused to present a hallucination to a courtroom.
Request early access
verivoxai.com · pilot cohort: litigation · court reporting · investigations