why this audit matters
a browser-based transcription tool that runs the model locally and a browser-based transcription tool that uploads your audio to a server look identical from the outside. both have a drag target. both produce a transcript. both call themselves "private" or "secure." the difference — whether your audio leaves the device or doesn't — is invisible unless you check.
for some kinds of audio, the difference is the whole point. therapists with HIPAA-bound recordings, lawyers with privileged interviews, journalists with source material — these aren't buyers who can rely on a vendor's word. they need to verify.
the audit is straightforward. five minutes, no special tools, no developer skills required. here's how.
before you start
you'll need a desktop browser. chrome, firefox, safari, edge, and arc all support the developer tools we'll use. mobile browsers don't, so do this on a laptop.
you'll also need a small audio file you don't mind testing with. anything works — a 30-second voice memo, a public podcast clip, the audio off a public youtube video. you're not testing accuracy. you're testing data flow.
step 1 — open the network tab
on chrome, edge, or arc: right-click the page, choose inspect, then click the network tab in the developer tools panel that opens.
on firefox: same thing. inspect → network.
on safari: enable the develop menu first (preferences → advanced → "show develop menu in menu bar"), then develop → show web inspector → network.
you should see a table with columns for name, type, size, time, and others. this is every network request the page is making, in real time. it's the audit surface.
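if you're comfortable pasting a line into the console, the same data the table shows is available programmatically through the resource timing API. a minimal sketch — the `largeTransfers` helper and the 1MB cutoff are mine, not part of any browser API:

```typescript
// pure helper: given resource-timing-style entries, return the ones big
// enough to matter, largest first. in a browser console you would feed it
// performance.getEntriesByType("resource"); the entry shape here is reduced
// to the two fields we actually read.
interface ResourceEntry {
  name: string;          // request URL
  transferSize: number;  // bytes over the wire (0 when served from cache)
}

function largeTransfers(
  entries: ResourceEntry[],
  minBytes = 1_000_000
): ResourceEntry[] {
  return entries
    .filter((e) => e.transferSize >= minBytes)
    .sort((a, b) => b.transferSize - a.transferSize);
}

// browser usage (paste in the console on the page you're auditing):
//   largeTransfers(performance.getEntriesByType("resource"));
```

this won't replace watching the panel live — resource timing only covers completed fetches — but it's a quick way to list everything big the page has pulled down so far.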
step 2 — clear the network log and reload
there's a clear button in the network panel — a circle with a line through it, or just the word "clear." click it. then reload the page (cmd+r or ctrl+r).
you'll see the page's load requests come through: HTML, CSS, JavaScript, fonts, images. read the names. they should look like assets that belong to the site you're on, possibly with one or two CDN domains for fonts or analytics. nothing yet should be sending data out — these are all downloads, the page fetching itself.
this baseline matters. if there are already mysterious POST requests happening on page load, you have a separate problem (the page is doing something with your visit), but it's not directly relevant to the transcription audit. note them and continue.
step 3 — note the model file load
a real on-device tool has to load the speech-recognition model into your browser. these files are big — typically 50–200MB depending on which model the tool uses. you'll see one or two requests around that size.
the request URL usually points to a content-delivery network (cloudflare, jsdelivr, hugging face, the tool's own static assets domain). the file extension is often .bin, .onnx, or .gguf. the request method is GET. the request body is empty — you're downloading the model, not uploading anything.
this is the moment that distinguishes browser-based tools from cloud tools. a cloud tool doesn't need to download a 200MB model — it has the model on the server. so a model download is positive evidence that the tool is set up to run inference locally. it's not proof (the tool could download the model and still upload your audio anyway), but no model download is a red flag.
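the shape of that check can be written down. a sketch of the heuristic, with thresholds and names of my own choosing — `looksLikeModelDownload` is an illustration, not any tool's API:

```typescript
// a request looks like an on-device model download when it is a GET for a
// large file with a weights-style extension and an empty request body.
interface ObservedRequest {
  method: string;     // "GET", "POST", ...
  url: string;        // full request URL
  sizeBytes: number;  // response (download) size
  bodyBytes: number;  // request body size — must be 0 for a pure download
}

const MODEL_EXTENSIONS = [".bin", ".onnx", ".gguf"];
const MIN_MODEL_BYTES = 50 * 1024 * 1024; // ~50MB, the article's lower bound

function looksLikeModelDownload(req: ObservedRequest): boolean {
  const path = new URL(req.url).pathname.toLowerCase();
  return (
    req.method === "GET" &&
    req.bodyBytes === 0 &&
    req.sizeBytes >= MIN_MODEL_BYTES &&
    MODEL_EXTENSIONS.some((ext) => path.endsWith(ext))
  );
}
```

note that a 5MB `.bin` fails this check — that's the "suspiciously small model" red flag covered later, caught by the size threshold rather than the extension.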
step 4 — drop the audio in
now the actual test. drag your audio file into the tool's drop target, or click upload, or whatever the tool uses to accept the file. while you do this, watch the network tab.
if the tool is genuinely on-device, you should see no new network requests appear during the file drop. the file is being read locally by the browser; no upload is happening. zero new rows in the network tab.
if you see a new POST request appear with your audio file in the request body — typically multipart/form-data with the filename in the payload — your audio is being uploaded. the marketing claim was wrong. close the tab.
there's a middle case: the tool might send a small POST during transcription that contains usage telemetry but not the audio (file size, model selected, success/failure). this is common and not necessarily a privacy problem, but you should be able to inspect the request body and confirm the audio bytes aren't in it.
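one concrete way to confirm the audio isn't in a telemetry body is to scan it for audio container signatures. the magic bytes below are the real file headers for wav, ogg, mp3-with-ID3, and flac; the `findAudioSignature` helper is my illustration — the network panel won't run this for you, but you can eyeball the same bytes in the request preview:

```typescript
// scan a request body for well-known audio container magic bytes. a hit
// means the audio file itself is in the payload; no hit on a tiny JSON-ish
// body is consistent with harmless telemetry. raw mp3 frames without an
// ID3 tag (0xFF 0xFB...) are not covered by this short list.
const AUDIO_SIGNATURES: { format: string; magic: number[] }[] = [
  { format: "wav",  magic: [0x52, 0x49, 0x46, 0x46] }, // "RIFF"
  { format: "ogg",  magic: [0x4f, 0x67, 0x67, 0x53] }, // "OggS"
  { format: "mp3",  magic: [0x49, 0x44, 0x33] },       // "ID3"
  { format: "flac", magic: [0x66, 0x4c, 0x61, 0x43] }, // "fLaC"
];

function findAudioSignature(body: Uint8Array): string | null {
  for (const { format, magic } of AUDIO_SIGNATURES) {
    // multipart bodies carry text headers first, so scan every offset
    scan: for (let i = 0; i + magic.length <= body.length; i++) {
      for (let j = 0; j < magic.length; j++) {
        if (body[i + j] !== magic[j]) continue scan;
      }
      return format; // full magic sequence matched at offset i
    }
  }
  return null;
}
```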
step 5 — let it run, watch the requests
while transcription is running, the network tab should stay quiet (or get only the small telemetry request mentioned above). the model is doing its work locally — your laptop's GPU is busy, your CPU is busy, but the network is not. you can watch your activity monitor or task manager to confirm the work is happening locally.
when transcription finishes, the transcript should appear without another network request. if you see a large response come back over the wire just before the transcript appears, something's off — either the tool quietly uploaded the audio earlier (you may have missed the request) or it's pulling the result from a server.
red flags to watch for
- POST requests with binary bodies during file drop. this is the audio being uploaded. if you see one, the "on-device" claim is false.
- requests to third-party domains during transcription. even if your audio isn't visibly in the body, requests to domains you don't recognize are worth investigating — preview the response, check the size, see if any chunks look like audio data.
- WebSocket connections that stay open and exchange large frames. websockets sit in their own row in the network panel. click them and watch the message frames — if there's a steady stream of large binary messages going out, that's audio being streamed.
- vague "we don't store your audio" language. this language is compatible with "we receive your audio, transcribe it, then delete it." that's not on-device. on-device means the audio never reaches their servers in the first place.
- model file appears to be much smaller than expected. on-device speech recognition models are typically 50–500MB for English, larger for multilingual. a 5MB "model" download is not a real model — it's likely a thin client that talks to a server.
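the websocket case can be made concrete. a rough sketch of the "steady stream of large binary frames" test above, with thresholds that are mine and worth adjusting to taste:

```typescript
// a handful of small text frames is ordinary signalling; many large
// outgoing binary frames looks like audio being streamed to a server.
interface Frame {
  direction: "sent" | "received";
  binary: boolean;
  sizeBytes: number;
}

function looksLikeAudioStream(
  frames: Frame[],
  minFrameBytes = 16 * 1024, // ignore small control/signalling frames
  minFrames = 10             // one big frame could be anything; a stream can't
): boolean {
  const largeOutgoing = frames.filter(
    (f) => f.direction === "sent" && f.binary && f.sizeBytes >= minFrameBytes
  );
  return largeOutgoing.length >= minFrames;
}
```

the direction matters: large binary frames coming *in* could be legitimate (model shards, for instance); it's sustained large traffic going *out* while your audio is open that's damning.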
what good looks like
a tool that passes this audit will have:
- a substantial model file download on first visit (cached after that, so subsequent visits don't re-download)
- no new requests when you drop the audio file in
- no new requests during transcription, or at most a tiny telemetry request without the audio bytes
- visibly local CPU/GPU usage during transcription (your machine fan kicks on, the tab uses several GB of memory)
- a transcript that appears in the editor when transcription completes, with no large server response just before
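folded into code, the checklist reads as one predicate. the field names are mine and just mirror the bullets above:

```typescript
// one verdict over the whole audit. a passing tool downloaded a model,
// sent nothing when the file was dropped, never put audio bytes in any
// request body, and did the compute locally. telemetry without audio
// bytes is tolerated, matching the checklist.
interface AuditSignals {
  modelDownloadSeen: boolean;   // step 3: big GET for .bin/.onnx/.gguf
  requestsOnFileDrop: number;   // step 4: should be zero
  audioBytesInAnyBody: boolean; // steps 4–5: inspect every request body
  localComputeObserved: boolean; // step 5: fan kicks on, tab uses memory
}

function passesAudit(s: AuditSignals): boolean {
  return (
    s.modelDownloadSeen &&
    s.requestsOnFileDrop === 0 &&
    !s.audioBytesInAnyBody &&
    s.localComputeObserved
  );
}
```

every field has to hold at once — a model download alone proves nothing if the audio also went out over the wire.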
why this audit matters more than vendor claims
a vendor's word is a promise. the audit is verification. for sensitive audio — therapy, depositions, source-protected interviews — verification is what makes the tool actually usable. without it, you're trusting a marketing page.
this audit also gives you a clean answer to a question that's otherwise hard to settle: when something goes wrong (a subpoena, a breach, a leaked transcript), where could the audio possibly have ended up? if you've audited the tool and confirmed it never left your device, the answer is "nowhere it shouldn't have." that's a defense that works.
we publish this audit on our own tool because we think it should be the default expectation, not a courtesy. when you're ready to test ours, the steps above are exactly the ones that work. if you find anything that doesn't match this description, write us at hello@audiohighlight.com.