when this matters for authors
most transcription tools are built for short, disposable audio: a meeting, a podcast episode, a one-off interview. a book project doesn't look like that. the audio accumulates over years, the material is unpublished by definition, and a publisher contract or a source agreement may already constrain where the file can sit.
the concrete concerns:
- publisher contracts and confidentiality. many trade nonfiction contracts include confidentiality clauses about unpublished material. uploading manuscript audio or research interviews to a transcription vendor — even a reputable one — may quietly cross a line in the contract you signed.
- unannounced subjects. you're profiling a person who hasn't agreed to be profiled publicly yet, or you're writing about a company before any announcement. their name, in your interview audio, on a third-party server, is a leak surface.
- source protection on sensitive beats. oral histories with survivors. interviews with sources speaking on background. biographical research with people who agreed to talk to you and not to your vendors. the transcription vendor that holds the audio holds the source.
- "this is my work, i want it to stay mine." unpublished manuscript material is, in a real sense, the single most valuable artifact a working author produces. keeping it off third-party infrastructure is a defensible default.
two jobs this fits
authors send us audio for two recurring tasks. they look different, but the privacy argument lands the same way for both.
- (a) interview-based research. oral histories. expert interviews. biographical research with a subject's friends, family, and colleagues. these recordings are long, they accumulate, and the source agreements often assume the audio stays with you. on-device transcription means the file never moves.
- (b) dictating manuscript drafts. chapter drafts on a walk. plot work in the car. a memoir passage you can speak more naturally than you can type. record the file, transcribe it on your laptop, edit the transcript into chapter prose. the dictation-to-draft loop without the manuscript audio leaving your machine.
workflow
- capture the audio. record dictation in the browser with the voice recorder, or upload existing files from a phone, a field recorder, a zoom call with a source.
- open audiohighlight, select private mode. transcription runs locally in the browser via WebGPU + Whisper. nothing uploads.
- review the transcript. fix speaker labels for "me" and the interviewee once each, and they propagate across the file. click any word to replay the audio at that second — the verification loop you'll want when a quote goes into the book.
- export. .docx for scrivener, word, or google docs. .md for obsidian or a manuscript repo. the long-form export also works for chapter drafts: it collapses the speaker-by-speaker structure into clean prose you can edit into a chapter.
- close the tab. the audio file stays on your machine; the transcript is wherever you saved it. nothing on our side.
where on-device fits and where it doesn't
- fits: research interviews and dictated drafts. anything where the audio leaving your machine adds a risk you'd rather not own — and for unpublished manuscript material, that's most of the time.
- fits: long-form audio you process at your own pace. a 90-minute oral history transcribes in about 90 minutes on a current macbook. for one or two files at a time over the course of a writing day, the wait is fine.
- doesn't fit: bulk-processing a years-old archive overnight. if you have 80 hours of legacy interview tape from a previous project, cloud mode is the right tool for the bulk pass: it runs 5–10x faster. you can still keep the currently active material on-device.
- doesn't fit: live captioning during the interview. we transcribe finished files. for live captions while you talk to a source, use a dedicated tool — that's a different product.
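a rough way to see why the bulk pass belongs in cloud mode is to divide hours of audio by processing speed. this sketch assumes on-device runs at about real time (the macbook figure above) and takes the 5–10x cloud range at face value. both are approximations, and your own hardware will move the on-device number. the function names are illustrative, not part of any audiohighlight api.

```python
# Rough wall-clock estimate for transcribing a backlog of audio.
# Assumption: on-device runs at roughly real time (~90 min for a 90-minute file
# on a current MacBook); actual speed depends on your hardware.
ON_DEVICE_SPEEDUP = 1.0            # audio hours processed per wall-clock hour
CLOUD_SPEEDUP_RANGE = (5.0, 10.0)  # cloud mode: "5-10x faster" than on-device


def wall_clock_hours(audio_hours: float, speedup: float) -> float:
    """Hours of processing time for a given amount of audio at a given speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_hours / speedup


def cloud_hours_range(audio_hours: float) -> tuple[float, float]:
    """(fastest, slowest) cloud estimate, scaled from the on-device speed."""
    low, high = CLOUD_SPEEDUP_RANGE
    return (
        wall_clock_hours(audio_hours, ON_DEVICE_SPEEDUP * high),
        wall_clock_hours(audio_hours, ON_DEVICE_SPEEDUP * low),
    )


if __name__ == "__main__":
    archive = 80  # hours of legacy interview tape
    fastest, slowest = cloud_hours_range(archive)
    print(f"on-device: ~{wall_clock_hours(archive, ON_DEVICE_SPEEDUP):.0f} h")
    print(f"cloud:     ~{fastest:.0f}-{slowest:.0f} h")
```

under those assumptions, 80 hours of tape is about 80 hours of machine time on-device versus roughly 8 to 16 hours in the cloud, which is the gap the archive bullet is pointing at.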
what we don't claim
we don't claim verbatim accuracy. for direct quotes that go in the book, listen to the audio with the transcript open and verify the ones that matter. the editor is built for this — click the word, hear the second.
we don't claim the on-device model handles every accent and language equally. english works best. for non-english source interviews, cloud mode is more accurate; in private mode, accuracy drops on languages the on-device model handles less well.
pricing
$0.25 per minute. a 90-minute oral-history interview costs $22.50; an hour of dictation runs $15. private mode and cloud mode are the same price. no subscription, no minimum.
book-length projects accumulate fast: 10+ hours of audio is common, and 50+ hours is not unusual on biography or oral-history work. at that scale, a flat per-project rate often makes more sense than per-minute billing. write hello@audiohighlight.com with a rough total runtime and we'll quote a flat rate for the project.
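to budget a whole project at the per-minute rate, multiply runtime in minutes by $0.25. this sketch works in integer cents so totals don't pick up floating-point rounding. it assumes billing on whole minutes, which this page doesn't specify, and the function names are illustrative, not part of any audiohighlight api.

```python
# Per-minute pricing: $0.25/minute, same rate for private and cloud mode.
# Work in integer cents so money never goes through floating point.
RATE_CENTS_PER_MINUTE = 25


def estimate_cost_cents(runtime_minutes: int) -> int:
    """Per-minute cost in cents for a given runtime.

    Assumption: billing is on whole minutes; partial-minute handling is not
    stated, so treat the result as an estimate.
    """
    if runtime_minutes < 0:
        raise ValueError("runtime must be non-negative")
    return runtime_minutes * RATE_CENTS_PER_MINUTE


def format_dollars(cents: int) -> str:
    """Format a non-negative cent amount as $X,XXX.YY."""
    return f"${cents // 100:,}.{cents % 100:02d}"


if __name__ == "__main__":
    for label, minutes in [
        ("90-minute oral history", 90),
        ("1 hour of dictation", 60),
        ("10-hour project", 10 * 60),
        ("50-hour project", 50 * 60),
    ]:
        print(f"{label}: {format_dollars(estimate_cost_cents(minutes))}")
```

the first two lines reproduce the figures above ($22.50 and $15.00); at 10 hours the per-minute total is $150.00 and at 50 hours it is $750.00, the scale where asking for a flat project quote starts to be worth it.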