preview
this tool ships at launch. for now, this page exists so you know what's coming and so you can join the list to be told when it's live. nothing fake, nothing hidden: the working build is in progress and will be sent to people on the list once it can run a real file end-to-end.
how it will work
- you open this page. the transcription model loads as a one-time download (around 200MB on first visit, cached afterwards). you can audit the network tab — the only request fetches the model.
- you drop an audio file in (mp3, m4a, wav, flac, opus; anything ffmpeg can read). the file is decoded in the browser, the model runs locally, and the transcript appears. no upload happens at any point.
- you fix labels, click words to verify against audio, and export in the format you want (.docx, .srt, .vtt, plain text, json).
- you close the tab. the in-browser session is gone. the audio file stays where it was; the transcript exports are wherever you saved them.
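the load-once-then-cache step in the first bullet can be sketched roughly like this. this is a minimal illustration, not our actual code: `loadModel`, `cacheStore`, and `fetchFn` are illustrative names, and the real build may use the Cache API or IndexedDB differently.

```javascript
// illustrative sketch only: fetch the model on first visit, serve it from
// cache on every visit after that. cacheStore and fetchFn are injected here
// so the flow is visible; a real page would use window.caches and fetch.
async function loadModel(url, cacheStore, fetchFn) {
  const cached = await cacheStore.match(url);
  if (cached) {
    // cache hit: no network request at all
    return cached.arrayBuffer();
  }
  const res = await fetchFn(url);           // the one-time ~200MB download
  await cacheStore.put(url, res.clone());   // keep a copy for next visit
  return res.arrayBuffer();
}
```

on the second visit the `match` call succeeds and `fetchFn` is never called, which is why the network tab shows no model request after the first load.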
hardware floor
on-device transcription needs a reasonably modern device. our floor at launch:
- chrome, edge, or arc on a desktop or laptop with WebGPU support. (firefox is close, safari is on the roadmap.)
- a discrete GPU helps but is not required. a 2021-or-later mac (M1/M2/M3) or a current windows laptop with an integrated GPU runs the model at roughly real-time speed.
- 4GB+ of free RAM during transcription. the model takes ~2GB; the decoded audio takes more on top of that, depending on file length.
for older machines or unsupported browsers, fall back to cloud mode (which uploads the audio to our servers under standard terms) or wait for the on-device build to be smaller.
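the browser check behind this floor can be sketched roughly as follows; `pickMode` is an illustrative name and the real build may test more than WebGPU presence before choosing a mode.

```javascript
// illustrative only: route supported browsers to on-device transcription and
// everything else to the cloud fallback. supporting browsers (chrome, edge,
// arc) expose WebGPU as navigator.gpu; firefox and safari do not yet.
function pickMode(nav) {
  return nav && "gpu" in nav ? "on-device" : "cloud";
}
```

in a page this would be called as `pickMode(navigator)`.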
what runs locally, what doesn't
all audio decoding, model inference, and editing happen in your browser. the only network request the page makes during transcription is the one that fetches the model file (cached after first load). nothing about your audio is sent anywhere.
the page itself is loaded from our server (this domain). if you want to verify there's no hidden audio upload, use your browser's network tab; every request is visible there. we'll publish a how-to for this verification.
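one way such a check could look; this is a sketch under the assumption that the browser's standard Performance API is enough, with `auditRequests`, the allow-list, and the example URLs as illustrative names only.

```javascript
// illustrative only: list every resource request the page has made that is
// not on an expected allow-list. an empty result means nothing unexpected
// was fetched. `perf` is the page's performance object, injected for clarity.
function auditRequests(perf, allowedPrefixes) {
  return perf
    .getEntriesByType("resource")                        // every network-fetched resource
    .map((entry) => entry.name)                          // the requested URL
    .filter((url) => !allowedPrefixes.some((p) => url.startsWith(p)));
}
```

run in the devtools console as something like `auditRequests(performance, ["https://cdn.example/"])`; anything it returns is a request worth questioning.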
limits
- english first. on-device models are smaller than cloud models. they handle english near-fluently and degrade on most other languages. multilingual on-device support arrives after launch.
- real-time-ish. a 30-minute file transcribes in roughly 30 minutes on a modern laptop. cloud mode is faster (1–2 minutes for the same file) — the tradeoff is the upload.
- can't share a link. the transcript lives in your browser; exports are how you move it elsewhere. a future build may add encrypted sharing for on-device transcripts, but not at launch.
who this is for
three audiences whose work we've shaped this around:
- therapists — session audio that doesn't ride a network. see for therapists.
- lawyers — privileged audio without a vendor in the chain. see for lawyers.
- journalists — source-protected interviews. see for journalists.