we are not your compliance counsel. this is not legal advice. this is a long-form unpacking of how HIPAA's requirements interact with the workflow of using an AI transcription tool. for binding interpretations, ask your compliance officer or counsel who specializes in HIPAA.
the BAA is necessary, not sufficient
a business-associate agreement is the document covered entities sign with their vendors that handle protected health information. if you're a clinician evaluating an AI transcription service, asking whether they sign a BAA is the right first question. but it's the first question, not the only one.
a BAA establishes a contractual obligation. it does not verify that the vendor's actual practices match the contract. it doesn't address the physical security of the device the audio was recorded on. it doesn't cover access controls inside your practice. it doesn't dictate the retention policies the vendor uses internally — those flow from the terms of the BAA but don't get audited automatically.
when a covered entity gets investigated by OCR (the HHS office for civil rights, which enforces HIPAA), the investigation looks at all of these. the BAA is one document in a stack. the rest of the stack is your responsibility, and some of it is structurally easier to handle if the audio never reaches a third party in the first place.
the four categories HIPAA actually cares about
HIPAA's security rule organizes safeguards into three categories — administrative, physical, and technical — plus a privacy rule that handles disclosure and consent. an AI transcription tool intersects all four.
1. administrative safeguards
your practice's policies about who can access PHI, how training is conducted, how breaches are responded to, how access is revoked when staff leave, and how risk assessments are conducted. when an AI transcription tool enters the picture, the administrative safeguards have to extend to:
- training for any staff member who will use the tool — what audio is appropriate, what isn't, how to handle accidental PHI recording
- access management — how user accounts on the transcription service map to your practice's authentication; what happens when a clinician leaves and their account needs to be revoked
- incident response — what happens if the vendor reports a breach; how the practice's notification obligations interact with the vendor's
- vendor management — annual review of the vendor's certifications, security posture, and any sub-processors they use (the "downstream BAA" question)
most of these don't go away with on-device transcription — your practice still needs them. but several of them get structurally easier when there's no vendor in the chain. you don't manage a vendor that doesn't exist.
2. physical safeguards
the actual hardware. who has physical access to the device the PHI lives on. how lost or stolen devices are handled. how disposed-of devices are wiped.
for AI transcription, physical safeguards apply to:
- the recording device — phone, laptop, dictation recorder. who has physical access. is it password-protected. does it auto-lock. is it encrypted at rest.
- the editing device — the computer where you review and edit the transcript. same questions. for cloud-mode transcription, this is also the device that displays the transcript pulled from the vendor.
- backup media — if you back up your devices, the audio and transcripts are on the backup. that backup needs the same physical controls.
on-device transcription doesn't change the physical safeguards story — the audio file still lives on a device, and you still have to control physical access to that device. but it does mean there's no vendor-side data center to add to the physical-safeguards inventory.
3. technical safeguards
encryption, access controls, audit logs. for AI transcription this is the most-discussed category and the one where vendor differences matter most.
the questions to ask any cloud-based transcription vendor:
- encryption in transit — TLS 1.2 or 1.3, certificate pinning, no fallback to weak ciphers
- encryption at rest — AES-256 for audio and transcripts, key management practices, key rotation
- access controls — who at the vendor can see customer audio and transcripts; audit logging of internal access; SSO and MFA on customer accounts
- retention — how long audio is kept after transcription completes; deletion timeline; whether deletion is verifiable
- training data isolation — whether customer audio enters any model training pipeline; what verification exists
- incident detection — how the vendor detects unauthorized access or exfiltration of customer data; how quickly they notify you when something happens
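the in-transit question is one of the few on this list you can partly check yourself. a minimal sketch, assuming Python's standard `ssl` module and a hypothetical vendor hostname: it builds a client context that refuses anything older than TLS 1.2 and reports the version the server actually negotiates. it says nothing about certificate pinning, the vendor's full cipher list, or anything that happens after the audio arrives.

```python
import socket
import ssl
from typing import Optional


def strict_tls_context() -> ssl.SSLContext:
    """Client context that refuses anything older than TLS 1.2."""
    # create_default_context() enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def negotiated_tls_version(host: str, port: int = 443,
                           timeout: float = 5.0) -> Optional[str]:
    """Connect to host and return the protocol version the server agreed to.

    Raises ssl.SSLError if the server cannot do TLS 1.2 or newer,
    or if its certificate does not verify.
    """
    ctx = strict_tls_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

calling `negotiated_tls_version("transcribe.example.com")` (the hostname is a placeholder, not a real endpoint) against the vendor's upload endpoint gives you one data point to set next to what their questionnaire claims.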
for on-device transcription, most of these become moot — the audio doesn't leave your device, so vendor-side encryption, access controls, retention, and exfiltration detection don't apply. your laptop's full-disk encryption (FileVault, BitLocker) handles the at-rest piece. there's no in-transit piece because there's no transit. there are no vendor employees who can see your audio because there's no vendor.
this is the single biggest reduction in compliance surface that on-device transcription provides. it's also the reason the marginal value of an additional vendor BAA is much lower when on-device is an option.
4. privacy rule (consent and disclosure)
the privacy rule handles when PHI can be disclosed and to whom. for AI transcription this comes up around:
- patient consent for recording — separate from HIPAA, but interacts with state recording-consent laws (one-party vs two-party). HIPAA doesn't require patient consent to use AI transcription on a recording made with appropriate consent; state law may.
- minimum necessary — the principle that PHI use should be limited to what's needed for the purpose. for AI transcription, this affects what you upload (the whole session vs just the parts you need to dictate notes from) and how long the transcribed output sits on the vendor's servers.
- de-identification — whether the transcripts you produce contain identifiers (name, DOB, address, etc.); whether the vendor's processing constitutes "use" of the identifiers vs mere transcription of words
where on-device transcription removes whole categories of work
when the speech-recognition model runs in your browser:
- no business associate exists — the BAA isn't necessary because there's no business associate. (this isn't a workaround; it's the structural answer.)
- technical safeguards collapse to your device — encryption, access control, retention all happen on hardware you already control under your existing HIPAA-compliant practice posture.
- vendor management goes away — there's no vendor to audit annually, no sub-processor chain to track, no SOC 2 to review.
- incident response gets simpler — there's no vendor-side breach scenario; the only breach scenario is your own device.
- minimum-necessary becomes structural — the vendor can't see PHI it doesn't have. minimum-necessary is enforced by physics, not policy.
what doesn't change: physical safeguards on your device, the rest of your practice's HIPAA posture (training, access management within your practice, patient consent for recording), state-specific recording-consent laws, and the minimum-necessary principle as it applies to what you record.
when cloud mode is the right answer despite this
on-device isn't always the right choice. cases where cloud mode (under a signed BAA) makes more sense:
- shared workstations where multiple clinicians need access to transcripts from a central account
- older hardware that can't run on-device transcription at usable speed
- workflows where the transcript needs to be available across devices (clinician's laptop, scribe's machine, EHR system) and the cloud vendor's syncing handles the cross-device availability cleanly
- multi-clinician collaboration on the same recording (peer review, case conferences) where local-only access doesn't fit the workflow
for these, the BAA is necessary. so are the technical safeguards questions above. our cloud mode is BAA-eligible — write us if your practice needs the document and the technical-safeguards questionnaire.
what the BAA from a transcription vendor should include
if you're evaluating a transcription vendor's BAA, these are the clauses worth looking for:
- permitted uses and disclosures — explicitly limited to providing transcription services; no separate "permitted to use for product improvement" clause
- training data prohibition — explicit clause that customer audio and transcripts will not be used for ML model training
- retention — explicit retention timeline for audio (target: deleted within 30 days of transcription completion, and sooner on request)
- sub-processors — list of any sub-processors used (CDN, hosting, third-party model APIs); commitment to BAAs with each
- incident response — notification timeline (24 hours from discovery is standard; some vendors will only commit to 60 days, which is the outer limit the HIPAA breach notification rule allows a business associate for notifying the covered entity); your right to audit
- termination — what happens to audio and transcripts on termination; your right to demand deletion or return
a BAA without these clauses is incomplete. some vendors will negotiate; some won't. the ones that won't are signaling something about their internal practices.
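one way to keep a review honest is to record what the document actually says and diff it against the list above. a minimal sketch: the field names, the `BaaTerms` type, and the `baa_gaps` helper are all hypothetical, and the 30-day and 24-hour thresholds are just the targets named in this section, not regulatory limits.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BaaTerms:
    """What a reviewer extracted from a vendor's BAA (hypothetical fields)."""
    product_improvement_use_allowed: bool
    training_on_customer_data_prohibited: bool
    audio_retention_days: Optional[int]   # None means no timeline is stated
    sub_processors_listed: bool
    sub_processor_baas_committed: bool
    breach_notice_hours: Optional[int]    # None means no timeline is stated
    audit_right: bool
    deletion_or_return_on_termination: bool


def baa_gaps(t: BaaTerms, max_retention_days: int = 30,
             max_notice_hours: int = 24) -> List[str]:
    """Return the clauses that are missing or weaker than the targets."""
    gaps: List[str] = []
    if t.product_improvement_use_allowed:
        gaps.append("permitted uses: allows use for product improvement")
    if not t.training_on_customer_data_prohibited:
        gaps.append("training data: no explicit prohibition on model training")
    if t.audio_retention_days is None:
        gaps.append("retention: no timeline stated")
    elif t.audio_retention_days > max_retention_days:
        gaps.append(f"retention: {t.audio_retention_days} days exceeds "
                    f"{max_retention_days}")
    if not (t.sub_processors_listed and t.sub_processor_baas_committed):
        gaps.append("sub-processors: missing list or downstream BAA commitment")
    if t.breach_notice_hours is None:
        gaps.append("incident response: no notification timeline stated")
    elif t.breach_notice_hours > max_notice_hours:
        gaps.append(f"incident response: {t.breach_notice_hours}h notice "
                    f"exceeds {max_notice_hours}h")
    if not t.audit_right:
        gaps.append("incident response: no right to audit")
    if not t.deletion_or_return_on_termination:
        gaps.append("termination: no right to deletion or return")
    return gaps
```

an empty list means the document has the shape described above. it does not mean the vendor's practices match the document, which is the part no checklist can verify.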
practical recommendation
for clinical audio specifically:
- solo practice, individual dictation: on-device. the compliance surface reduction is significant; the workflow fits.
- small practice with multiple clinicians sharing access to transcripts: cloud mode under a BAA, with the clauses above.
- large practice or hospital system: your EHR almost certainly already has integrated voice-to-text under existing BAAs. use that for routine dictation. on-device or cloud-mode AI transcription fills the gap for longer-form recordings (consultations, case conferences, peer reviews) that the EHR's dictation isn't designed for.
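the three cases above, plus the cloud-mode exceptions from earlier, reduce to a small decision rule. a rough sketch with hypothetical names (`Mode`, `recommend`, and the three practice labels are illustrative only); it encodes this article's rule of thumb, not a compliance determination.

```python
from enum import Enum


class Mode(Enum):
    ON_DEVICE = "on-device"
    CLOUD_WITH_BAA = "cloud mode under a BAA"
    EHR_DICTATION = "EHR-integrated dictation"


def recommend(practice: str, shared_transcripts: bool = False,
              long_form: bool = False, hardware_ok: bool = True) -> Mode:
    """Map a practice profile to the transcription mode suggested above.

    practice: "solo", "small", or "large" (labels assumed for this sketch).
    shared_transcripts: several people need access to the same transcripts.
    long_form: consultations, case conferences, peer reviews.
    hardware_ok: the device can run on-device transcription at usable speed.
    """
    if practice not in ("solo", "small", "large"):
        raise ValueError(f"unknown practice type: {practice!r}")
    # Large systems: routine dictation stays in the EHR under existing BAAs.
    if practice == "large" and not long_form:
        return Mode.EHR_DICTATION
    # Shared access or slow hardware are the cases where cloud mode fits.
    if shared_transcripts or not hardware_ok:
        return Mode.CLOUD_WITH_BAA
    return Mode.ON_DEVICE
```

for example, `recommend("small", shared_transcripts=True)` returns `Mode.CLOUD_WITH_BAA`, and `recommend("solo")` returns `Mode.ON_DEVICE`.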
if you're stuck on the BAA question, the underlying frame is: the BAA is a contract that depends on the vendor's practices matching the contract. on-device removes the dependency. for some workflows that's worth a lot. for others, the BAA is fine.