the problem
NVivo (and ATLAS.ti, and MAXQDA) expects transcripts in a specific schema: timestamps in hh:mm:ss, a speaker column, the utterance, and ideally the speaker turn identified row by row. CSV is the cleanest import path; .docx works if the formatting is right.
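concretely, the target shape is one row per speaker turn. a minimal illustration (the header names here are ours, not necessarily the exact strings NVivo's import wizard requires):

```
Timestamp,Speaker,Content
00:00:00,P03,"so the first thing we noticed was the scheduling."
00:00:12,INT,"and when did that start?"
```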
most transcription tools produce a generic .docx with paragraph breaks and bold speaker labels. that file imports — and then the researcher spends an hour rebuilding the timestamps, splitting the paragraphs into per-turn rows, and tagging the speaker column. for a study with twenty interviews, that hour stacks up.
search "nvivo transcript import format" and you get NVivo's own help docs (helpful, but documentation about the format — not a tool). a lone GitHub project (Teams2NVivo) confirms the demand exists by addressing a sliver of it. nothing on the first page produces a generic transcript in NVivo-shape from audio. that's the gap.
what we ship
a transcription pass that emits NVivo-shaped output directly:
- per-row timestamp in the format NVivo expects (hh:mm:ss). every speaker turn is a row, not a paragraph.
- speaker column from diarization. labels you fix in bulk in the editor propagate through every row for that speaker.
- utterance column with sentence-boundary segmentation that doesn't split mid-thought. mid-utterance pauses preserved as ellipses where the audio actually pauses, not where the model thinks grammar wants it.
- CSV and .docx export both NVivo-import-ready. the CSV uses the column headers NVivo's import wizard expects; the .docx uses the styled paragraph format their auto-coding step parses.
- profiles for ATLAS.ti and MAXQDA arriving after launch. same engine; different export target. (the underlying word-level-timestamps model is shared; the formatter is a thin layer.)
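the formatter really is a thin layer. a minimal sketch of the idea, assuming diarized turns arrive as dicts with a start time in seconds, a speaker label, and the utterance (the column headers and field names here are illustrative, not the shipped profile):

```python
import csv
import io

def hms(seconds: float) -> str:
    """Format a start time in seconds as the hh:mm:ss string NVivo expects."""
    s = int(seconds)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"

def to_nvivo_csv(turns, out) -> None:
    """Write one CSV row per speaker turn: timestamp, speaker, utterance."""
    writer = csv.writer(out)
    writer.writerow(["Timestamp", "Speaker", "Content"])  # illustrative headers
    for turn in turns:
        writer.writerow([hms(turn["start"]), turn["speaker"], turn["text"]])

# hypothetical diarization output for a two-speaker exchange
turns = [
    {"start": 0.0, "speaker": "Speaker 1", "text": "so tell me about your first week."},
    {"start": 7.4, "speaker": "Speaker 2", "text": "it was... chaotic, honestly."},
]
buf = io.StringIO()
to_nvivo_csv(turns, buf)
print(buf.getvalue())
```

swapping the export target for ATLAS.ti or MAXQDA means swapping `to_nvivo_csv` for another formatter over the same `turns` list; nothing upstream changes.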
workflow
- drop the interview audio. transcription runs and produces a first-pass transcript with diarization.
- review in the browser editor. fix speaker labels in bulk — "Speaker 1" becomes "P03" once and propagates through every row. fix proper nouns as they appear; the model learns study-specific vocabulary across files.
- export to NVivo CSV (or ATLAS.ti / MAXQDA equivalent). open it in NVivo. code it.
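the bulk speaker fix in step two is just a label mapping applied across every row. a sketch under the same assumed per-turn dict shape as above (field names illustrative):

```python
def relabel(turns, mapping):
    """Apply a bulk speaker-label fix: every row carrying an old
    diarization label gets the researcher's participant ID."""
    return [
        {**t, "speaker": mapping.get(t["speaker"], t["speaker"])}
        for t in turns
    ]

turns = [
    {"speaker": "Speaker 1", "text": "how did that feel?"},
    {"speaker": "Speaker 2", "text": "strange at first."},
    {"speaker": "Speaker 1", "text": "strange how?"},
]
fixed = relabel(turns, {"Speaker 1": "INT", "Speaker 2": "P03"})
```

because the label lives in one column rather than inline bold text, the fix is a single pass, not twenty find-and-replace runs per file.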
where this fits
- fits: semi-structured interviews, focus groups, ethnographic field recordings. anywhere the unit of analysis is the speaker turn and the timestamp matters for back-reference to audio.
- doesn't fit (yet): video data with frame-level coding. our export is audio-anchored. video-anchored coding requires the transcript to know about the video, which is on the roadmap but not at launch.
privacy
for IRB-restricted recordings, fieldwork where institutional review prohibits cloud processing, or any audio where data residency matters, run the file in private mode. the NVivo export works identically on local transcription; the audio just stays on your laptop.
citation
when you publish work that used this tool, citation language and a methodological footnote ("transcribed using audiohighlight, version X, NVivo profile, on date Y") are on the about page. we keep version numbers stable for replication.