what makes podium audio hard
a press conference is a hostile audio environment for any generic transcription tool:
- too many speakers. the principal, a press secretary or comms aide, and 6–15 reporters asking questions. most tools cap at 8 speakers or guess wildly past that.
- identification by role, not by name. you don't know the AP reporter's name, but you know it's the AP reporter. labels need to be "the AP reporter," "the FOX reporter," "the Reuters stringer" — not "speaker 7," "speaker 11."
- distance and overlap. reporters shouting over each other, the principal interrupting, partial questions cut off by the next one. the transcript will have errors. the question is how fast you can find and fix them.
- deadline. wire desks file within 30–45 minutes. the transcript is a tool, not a deliverable: it has to get you to publishable quotes faster than listening back and retyping would.
the workflow
- upload the audio. mp3 or wav from a recorder, a phone, the press pool feed, or the C-SPAN rip. up to 5 GB per file. for a 45-minute briefing the upload takes under a minute on hotel wifi.
- transcription runs. on a 45-minute conference, the first pass is ready in 2–3 minutes (cloud mode). on-device private mode is available for embargoed briefings: same speed on a recent MacBook, and the audio never leaves your laptop.
- bulk-fix speaker labels by role. this is the killer step. the editor groups every turn by speaker. you scrub the first 10 seconds of each cluster, decide who it is, type the role label once. "speaker 1" becomes "the senator." "speaker 2" becomes "the press secretary." "speaker 7" becomes "the AP reporter." every row in the transcript updates at once.
- fix proper nouns globally. committee names, agency acronyms, the principal's district, names of people mentioned. fix each one once and it propagates everywhere. fixed names persist for follow-up briefings, so the next conference with the same principal doesn't re-litigate "is it Stephen or Steven."
- pull quotes for the lede. highlight the lines you'll quote in the story. each highlighted quote exports with its timestamp.
- verify each quote before filing. click any word in the transcript to hear the principal say it. this is the fact-check step that prevents the misquote-and-correction cycle. on a 6-quote story the verification pass takes about 90 seconds.
- export. plain text for the CMS, .docx for the editor's track-changes pass, json if your newsroom has a custom story-builder tool.
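the "fix once, persist for the next briefing" step behaves like a per-principal glossary of corrections. a minimal sketch in python, assuming a hypothetical json file per principal that maps misheard spellings to the right ones; the file layout and function names are illustrative, not the product's storage format. replacement is whole-word only, so fixing "Steven" leaves "Stevenson" alone.

```python
import json
import re
from pathlib import Path


def apply_glossary(text, glossary):
    """Replace each misheard spelling with its correction, whole words only."""
    for wrong, right in glossary.items():
        # A function replacement keeps backslashes in `right` from being
        # read as regex group references.
        text = re.sub(rf"\b{re.escape(wrong)}\b", lambda _m, r=right: r, text)
    return text


def load_glossary(path):
    """Load a saved glossary (hypothetical per-principal json file), or start empty."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}


def save_glossary(path, glossary):
    """Persist corrections so the next briefing starts with them applied."""
    Path(path).write_text(json.dumps(glossary, indent=2, sort_keys=True))
```

corrections apply in insertion order, so a glossary that maps a to b and b to c would chain; keeping each entry pointing at a final spelling avoids that.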
bulk speaker labels: the killer feature
most transcription tools force you to fix speaker labels turn by turn. on a 45-minute briefing with 12 speakers and 80 turns, that's 80 individual edits, and the deadline doesn't allow it. our editor groups every turn by detected speaker first, lets you label each cluster once, and propagates the label to every turn in that cluster. 12 speakers means 12 edits, not 80.
when the auto-detection mis-clusters two reporters as one speaker (it happens — two voices with similar pitch and equal mic distance), the cluster view makes it obvious. split with one click, label the two halves separately, move on.
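the grouping-and-propagation idea is essentially a map from detected cluster to role label. a minimal sketch in python, assuming a hypothetical `Turn` record that carries the detected speaker id for each turn; it illustrates the idea and is not the editor's actual code. one label per cluster is one edit no matter how many turns the cluster holds, and a split just moves the selected turns to a new cluster id before you label it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str  # detected cluster id, e.g. "speaker 7" (hypothetical format)
    start: float  # seconds from the start of the recording
    text: str


def group_by_speaker(turns):
    """Group turns under their detected speaker, in order of first appearance."""
    clusters = defaultdict(list)
    for turn in turns:
        clusters[turn.speaker].append(turn)
    return dict(clusters)


def apply_labels(turns, labels):
    """Relabel every turn from a single cluster -> role mapping.

    Clusters missing from `labels` keep their detected id.
    """
    for turn in turns:
        turn.speaker = labels.get(turn.speaker, turn.speaker)
    return turns


def split_cluster(turns, speaker, new_speaker, belongs_to_new):
    """Move the turns of `speaker` selected by `belongs_to_new` into a new cluster."""
    for turn in turns:
        if turn.speaker == speaker and belongs_to_new(turn):
            turn.speaker = new_speaker
    return turns
```

usage: `apply_labels(turns, {"speaker 1": "the senator", "speaker 7": "the AP reporter"})` touches every turn in both clusters with two entries. in the sketch the split uses a predicate; in the editor the selection comes from you listening to the cluster.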
quote verification before publication
every word in the transcript is linked to the second of audio where it was spoken. before you file, work through the quoted lines in your story and click the first word of each. the audio plays from that exact second, so you hear the principal actually say the words you're attributing. this catches:
- misheard words. a homophone slip like "their" vs "there" is a typo you can live with; "I will" vs "I won't" ends a career.
- quotes where the model dropped or flattened a hedge ("we are considering" became "we are doing").
- attribution errors where the cluster split misfired and a reporter's question got tagged to the principal.
this verification step is non-negotiable for any quote that will appear in print. the editor makes it fast enough to do under deadline.
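finding where a quote starts is a lookup over word-level timestamps. a minimal sketch in python, assuming a hypothetical list of `Word` records (text plus start time in seconds); matching ignores case and punctuation and treats curly apostrophes as straight ones. a quote that doesn't match at all is itself a signal: the story text and the transcript disagree, so go listen.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording


def _norm(token):
    """Lowercase, unify apostrophes, and strip punctuation so "won’t," matches "won't"."""
    token = token.lower().replace("\u2019", "'")
    return re.sub(r"[^\w']", "", token)


def find_quote_start(words, quote):
    """Return the start time of the quote's first word, or None if the exact
    word sequence does not appear in the transcript."""
    target = [t for t in (_norm(w) for w in quote.split()) if t]
    if not target:
        return None
    tokens = [_norm(w.text) for w in words]
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            return words[i].start
    return None


def fmt_timestamp(seconds):
    """Format seconds as mm:ss for a quote log."""
    s = int(seconds)
    return f"{s // 60:02d}:{s % 60:02d}"
```

an exact-sequence match is deliberately strict: "we are doing it" should fail against a transcript that says "we are considering it".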
private mode for embargoed briefings
background briefings, embargoed lock-up sessions, off-camera gaggles where the audio is not for distribution: run those in private mode. audio stays on your laptop, the transcript stays on your laptop, the export is local, and no vendor is in the chain. same speed as cloud mode on a recent MacBook. see /private-transcription/journalism for the privacy posture in detail.
pricing for deadline work
$0.25 per minute. a 30-minute briefing is $7.50. a 60-minute conference is $15. private mode and cloud mode are the same price. no subscription, no minimum, no per-seat fee. for wire desks and political beats with steady volume, batch pricing arrives after launch: write hello@audiohighlight.com and tell us what your monthly volume looks like.
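the per-minute math is simple enough to sanity-check a beat's budget. a minimal sketch in python; the $0.25 rate is from the pricing above, but how partial minutes are billed isn't stated here, so this version multiplies the exact duration and rounds to the cent (an assumption). `Decimal` keeps money arithmetic free of float rounding.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rate from the pricing above; the same for private mode and cloud mode.
RATE_PER_MINUTE = Decimal("0.25")


def briefing_cost(minutes):
    """Dollar cost for `minutes` of audio, rounded to the cent.

    Assumption: billing on exact duration; partial-minute rules aren't stated.
    """
    cost = Decimal(str(minutes)) * RATE_PER_MINUTE
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def total_cost(durations):
    """Sum the cost of several briefings, e.g. a week on the beat."""
    return sum((briefing_cost(m) for m in durations), Decimal("0"))
```

usage: `briefing_cost(45)` gives `Decimal("11.25")`, and a week of two 30-minute gaggles plus one 60-minute presser comes to `total_cost([30, 30, 60])`, which is $30.00.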
waitlist signups get the lifetime deal: first month free, 50% off forever after.