manifesto — closing Edison's gap, 150 years late

the human voice perfectly reproduced slow or fast by a copyist and written down.

— Thomas Edison, lab notebook, July 1877

the year recording was invented, the spec already existed.

edison wrote that line for himself, in a lab notebook, the summer he invented the phonograph. it isn't marketing. it's the first written description of a product nobody had built yet. captured voice. perfectly reproduced. written down. one continuous workflow, no stenographer in the room, no hours of transcription afterwards.

the year was 1877. recording itself was three months old.

a hundred and fifty years of partial answers.

stenography stayed expensive. shorthand stayed hard. dictation machines arrived in offices and never left them. cassette recorders went into reporters' bags. minidisc gave way to digital recorders, gave way to phones. and at every step, the last mile — the one edison wrote down — got handed back to a human.

then in 2014, neural speech recognition reached human parity on read speech in english. by 2022, whisper made near-human-quality transcription a free download. by 2026, every serious commercial transcription tool lands in the same accuracy band: differences in word error rate between them are smaller than the variance you'd see between two human transcribers on the same file.

the model problem solved itself. the gap edison wrote about stayed open.

the gap is what happens after the file arrives.

every transcription tool you've used in the last five years has roughly the same model. the difference between them is what they do with the model's output. most of them treat the transcript as a deliverable. you get a .docx, the vendor calls the job done, you spend the next hour fixing speaker labels and proper nouns and verifying quotes against the audio in a separate program.

the cleanup tax across the category sits at 25% to 40% of audio length, by user-reported numbers. on a thirty-minute interview that's seven and a half to twelve minutes of editing for every file. multiply by a research project. multiply by a podcast season. multiply by a paralegal's quarterly deposition load. the tax is invisible because nobody invoices for it. it's also larger than the visible bill.
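
the arithmetic, as a minimal python sketch. the 25% and 40% bounds are the user-reported figures quoted above; the function name is made up for illustration and is not part of any product.

```python
def cleanup_minutes(audio_minutes, low=0.25, high=0.40):
    """Return (low, high) estimated cleanup time in minutes.

    low and high are the user-reported tax rates, expressed as a
    fraction of audio length. They are estimates, not measurements.
    """
    return audio_minutes * low, audio_minutes * high


# a thirty-minute interview
print(cleanup_minutes(30))  # (7.5, 12.0)

# the same tax expressed per hour of audio
print(cleanup_minutes(60))  # (15.0, 24.0)
```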

what we believe, in five sentences.

the file is not the deliverable. the usable record is.
every word in a transcript should remember the second of audio it came from.
fixing one speaker label should fix all of them.
audio you don't want uploaded shouldn't be uploaded — ever, by anyone, including us.
the next ten years of transcription is a workflow competition, not a model competition. we're competing there.
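
one way to picture the second and third beliefs, as a rough python sketch. every name here (`Word`, `Transcript`, `rename_speaker`) is hypothetical; it illustrates the idea and does not describe how the product is built. each word carries its own timestamps, and speaker names live in one table that words point into, so a single rename reaches every word.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float     # seconds into the audio where the word begins
    end: float       # seconds into the audio where the word ends
    speaker_id: int  # points into the speaker table, not a copied name


@dataclass
class Transcript:
    speakers: dict   # speaker_id -> display label
    words: list      # list of Word, in audio order

    def rename_speaker(self, speaker_id, label):
        # one write to the table; every word with this id picks it up
        self.speakers[speaker_id] = label

    def label_for(self, word):
        return self.speakers[word.speaker_id]

    def seek_time(self, index):
        # where an audio player would jump when word `index` is clicked
        return self.words[index].start


t = Transcript(
    speakers={0: "Speaker 1", 1: "Speaker 2"},
    words=[
        Word("hello", 0.00, 0.42, 0),
        Word("there", 0.42, 0.80, 0),
        Word("hi", 1.10, 1.35, 1),
    ],
)
t.rename_speaker(0, "Interviewer")
print([t.label_for(w) for w in t.words])
# ['Interviewer', 'Interviewer', 'Speaker 2']
print(t.seek_time(2))  # 1.1
```

storing an id on each word instead of the name is the design choice that makes the bulk fix free: there is only one copy of the label to correct.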

what the work looks like, concretely.

a transcript that arrives as a workspace, not a document. a browser editor where every word is an audio anchor — click it, hear it. speaker labels you fix in bulk and watch propagate. proper nouns the model learns once and remembers across every file in your account. exports shaped like the work the transcript is going into: deposition format for paralegals, NVivo CSV for qualitative researchers, Jefferson notation for conversation analysts, .srt and .vtt for captioners.
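
to show why timestamps on the text make exports cheap, here is a small python sketch that writes .srt cues. the cue shape `(start_seconds, end_seconds, text)` and both function names are assumptions made for the example, not the product's exporter.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (SubRip puts a comma before the ms)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """cues: list of (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for i, (start, end, text) in enumerate(cues, start=1):
        # each cue is: counter, time range, text, then a blank line
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


print(to_srt([(0.0, 2.5, "hello there"), (3661.25, 3663.0, "one hour in")]))
```

a .vtt export would follow the same pattern, with a `WEBVTT` header line and a period instead of a comma before the milliseconds.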

a private mode that runs the model in your browser, on your device, with no audio upload. for therapy, for legal interviews, for journalism source material — the audio stays on the same device that recorded it. there is no vendor in the chain. the audit takes five minutes; we wrote up how to run it.

a benchmark that measures the metric that actually predicts your time — minutes of cleanup per hour of audio — on a published corpus, with the methodology open. we ship the numbers, not the marketing. that's the benchmark page.

a price that fits how individuals actually buy transcription: $0.25 per minute, pay per file, no subscription, no minimum, all features included. private mode and cloud mode the same price. that's the pricing page.

what we won't do.

we won't be a meeting bot. no calendar OAuth, no auto-join, no live captioning of video calls. file uploads only. the buyers we serve refuse meeting bots — many of them are required to refuse, by HIPAA, by privilege, by source-protection ethics. file-upload-only is a feature, not a limitation.
we won't sell a tier upgrade for bulk speaker fix. features that move the cleanup tax should be in the product, not in a tier. the current category practice of gating workflow primitives behind a $30/month subscription tax is one of the things this manifesto is against.
we won't train on customer audio. cloud mode audio is processed for transcription and discarded per the retention policy. private mode audio doesn't reach us at all. if you don't trust the claim, the audit instructions are public.
we won't compete on word error rate. it's a useless metric for buyers. the WER differences between serious tools in 2026 sit inside the noise band of variance between human transcribers. if a vendor is leading with WER, the vendor is selling you the wrong number. we wrote a long version of this.

who this is for.

the journalist who pays for transcription out of pocket because the newsroom IT lags. the qualitative researcher whose IRB won't let interview audio leave the laptop. the paralegal who bills hours rebuilding deposition format from a generic .docx. the podcast producer trimming an off-the-record passage out of a working file. the therapist whose carrier won't reimburse the time she spends on session notes. the conversation analyst producing jefferson transcripts by hand at an hour-per-minute ratio.

all of them have been making the same compromise for the same reason — the tools delivered the file and called it finished. we're closing the gap edison wrote down a hundred and fifty years ago. the version he would recognize.

a hundred and fifty years late, but worth getting right.

