Captions, transcripts & audio description

Time-based media carries meaning in two channels at once: what you hear and what you see. A person who can’t hear the audio needs the spoken words and meaningful sounds in text; a person who can’t see the screen needs the important visual information conveyed in audio; and many people simply prefer to read. When a video ships with no captions, a podcast with no transcript, or a documentary whose on-screen events are never spoken aloud, that content is closed to a large audience — and, increasingly, out of legal compliance.

This lesson works through the three core failures of inaccessible media and the standard, well-supported fix for each: synchronised captions for video, a full transcript for audio-only content, and audio description of key visuals. None of them requires exotic tooling. Captions ride on the <track> element the browser already understands, a transcript is ordinary HTML text, and a described version can be offered alongside the original video.

What you’ll learn

The difference between captions, subtitles, transcripts, and audio description, and when each is required; how to attach a synchronised caption <track> to a video; how to provide a complete, navigable transcript for audio-only content; and how to add audio description so blind users receive the key visual information a sighted viewer takes for granted.

Standards this lesson maps to
Standard | Criterion | Level | What it requires
WCAG 2.2 | 1.2.1 Audio-only and Video-only (Prerecorded) | A | Prerecorded audio-only content has a text transcript; prerecorded video-only content has a text or audio alternative.
WCAG 2.2 | 1.2.2 Captions (Prerecorded) | A | Synchronised captions are provided for all prerecorded audio in video content.
WCAG 2.2 | 1.2.3 Audio Description or Media Alternative (Prerecorded) | A | An audio description or full text alternative is provided for prerecorded video.
WCAG 2.2 | 1.2.4 Captions (Live) | AA | Captions are provided for live audio content in synchronised media.
WCAG 2.2 | 1.2.5 Audio Description (Prerecorded) | AA | Audio description is provided for all prerecorded video content.
EN 301 549 | 9 (web content); 7.1 / 7.2 / 7.3 | Incorporates WCAG 2.1 A & AA | European standard; clause 9 applies WCAG to web content, and clause 7 covers captioning, audio description and the controls to reach them.
Section 508 | E205.4 / 503.4 | Incorporates WCAG 2.0 A & AA | US federal ICT must meet WCAG 2.0 A and AA, including captions and audio description; 503.4 covers player controls for captions and audio description.
ADA Title II | WCAG 2.1 AA (DOJ rule) | AA | US state and local government web content must conform to WCAG 2.1 AA, including media.

The three problems we’ll fix

Each section below isolates one common media defect. For every issue you get a plain-language statement of the problem, a Bad example, a Good example, supporting Code, and an ordered list of fixes.

Video with no captions

WCAG 2.2 · 1.2.2 (A) · EN 301 549 · Section 508 · ADA Title II

A video with a soundtrack but no captions locks out anyone who is deaf or hard of hearing, anyone in a sound-off environment, and many people watching in a second language. Captions are not just the dialogue: they carry speaker identification and meaningful non-speech sounds — [door slams], [ominous music] — that a hearing viewer relies on. Auto-generated captions are a starting point, not compliance: their accuracy is rarely high enough, and they routinely miss punctuation, speaker turns, and sound effects. To satisfy 1.2.2 you need accurate, synchronised, human-checked captions.

Bad

The video element offers the media but no caption track. Deaf and hard-of-hearing viewers get the picture and nothing else (1.2.2).

bad-no-captions.html
<video src="intro.mp4" controls></video>

Good

A <track kind="captions"> pointing to a WebVTT file is attached and marked as the default. The srclang and label attributes let the player list it in its caption menu.

good-captions.html
<video controls>
  <source src="intro.mp4" type="video/mp4">
  <track kind="captions" src="intro.en.vtt"
         srclang="en" label="English" default>
</video>
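A build step can catch the Bad pattern across a whole site by scanning pages for <video> elements that have no captions track. The sketch below uses Python's standard html.parser. The class and function names are hypothetical. It counts only kind="captions" as passing, in line with the advice that subtitles alone don't satisfy 1.2.2. It checks markup only, so it cannot tell whether the referenced .vtt file exists or is accurate.

```python
from html.parser import HTMLParser
from typing import List


class CaptionTrackAudit(HTMLParser):
    """Records each <video> element that has no <track kind="captions"> child."""

    def __init__(self) -> None:
        super().__init__()
        self.inside_video = False
        self.has_captions = False
        self.video_line = 0
        self.problems: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "video":
            self.inside_video = True
            self.has_captions = False
            self.video_line = self.getpos()[0]  # line where <video> opens
        elif tag == "track" and self.inside_video:
            # A track with no kind attribute defaults to subtitles, which carry
            # dialogue only, so only kind="captions" counts toward 1.2.2 here.
            kind = (dict(attrs).get("kind") or "").lower()
            if kind == "captions":
                self.has_captions = True

    def handle_endtag(self, tag):
        if tag == "video" and self.inside_video:
            if not self.has_captions:
                self.problems.append(f"line {self.video_line}: <video> has no captions track")
            self.inside_video = False


def audit(html: str) -> List[str]:
    """Return one problem string per <video> that lacks a captions track."""
    parser = CaptionTrackAudit()
    parser.feed(html)
    parser.close()
    return parser.problems
```

Given the two snippets above, audit() reports one problem for the Bad version and returns an empty list for the Good one.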

Code

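A WebVTT caption file is plain text: a WEBVTT header, then cues with start and end timestamps. Identify speakers and include meaningful sounds in square brackets. These are part of the captions, not optional extras. The <v> voice tag marks who is speaking for styling and tooling, but most browsers do not display the voice name itself. When the speaker isn't obvious on screen, write the name into the cue text as well.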

intro.en.vtt
WEBVTT

00:00:01.000 --> 00:00:04.000
[upbeat music]

00:00:04.500 --> 00:00:07.200
<v Dana>Welcome to the team. Let me show you around.

00:00:07.500 --> 00:00:09.000
[door buzzes]
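A quick structural check before publishing can catch caption files that a browser would reject or play out of order. The following is a minimal sketch in Python. It assumes captions are stored as WebVTT text, and the function names are hypothetical. It checks only two things: the header, and the cue timings (each cue must end after it starts, and start times must not go backwards). It cannot tell whether the words actually match the speech, so human review is still needed.

```python
import re
from typing import List

# A WebVTT timing line, e.g. "00:00:04.500 --> 00:00:07.200". Hours are
# optional in WebVTT; any cue settings after the end time are ignored here.
TIMING = re.compile(
    r"((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})[ \t]+-->[ \t]+((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})"
)
HEADER = re.compile(r"WEBVTT(?:[ \t]|$)")


def to_seconds(stamp: str) -> float:
    """Convert "hh:mm:ss.ttt" or "mm:ss.ttt" to seconds."""
    parts = stamp.split(":")
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def check_vtt(text: str) -> List[str]:
    """Return structural problems found in WebVTT caption text."""
    lines = text.lstrip("\ufeff").splitlines()
    problems: List[str] = []
    if not lines or not HEADER.match(lines[0]):
        problems.append("missing WEBVTT header on line 1")
    previous_start = 0.0
    for number, line in enumerate(lines, start=1):
        if "-->" not in line:
            continue  # header, blank line, cue identifier or cue text
        match = TIMING.match(line.strip())
        if match is None:
            problems.append(f"line {number}: malformed timing line")
            continue
        start, end = (to_seconds(stamp) for stamp in match.groups())
        if end <= start:
            problems.append(f"line {number}: cue does not end after it starts")
        if start < previous_start:
            problems.append(f"line {number}: cue starts before the previous cue")
        previous_start = start
    return problems
```

Run against intro.en.vtt above, check_vtt() returns an empty list. An SRT-style timestamp such as 00:00:01,000 is reported as a malformed timing line.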

How to fix

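  1. Write or correct a WebVTT caption file, the format the HTML <track> element loads. Browsers don't load SRT natively, so convert any SRT export first. Start from auto-captions, but edit them to full accuracy.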
  2. Include speaker labels and meaningful non-speech sounds, not just spoken words.
  3. Attach it with <track kind="captions">, setting srclang, a human-readable label, and default where appropriate.
  4. Use kind="subtitles" only for translations of dialogue; captions (which include sounds) are what satisfies 1.2.2.
  5. Test that captions stay in sync and that the player’s caption toggle is keyboard reachable.
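Many captioning tools export SRT, which <track> won't load. The core conversion is small: add a WEBVTT header, then change the comma before the milliseconds in each timing line to a full stop. Below is a minimal Python sketch under those assumptions; the function name is hypothetical. It does not translate SRT-only extras such as <font> tags or position coordinates, which would need hand-editing.

```python
import re

# SRT timestamps put a comma before the milliseconds (00:00:01,000);
# WebVTT uses a full stop (00:00:01.000).
SRT_TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt: str) -> str:
    """Convert SRT caption text to WebVTT (a minimal sketch)."""
    # Normalise Windows line endings and drop a UTF-8 byte-order mark.
    text = srt.replace("\r\n", "\n").lstrip("\ufeff")
    out_lines = []
    for line in text.split("\n"):
        if "-->" in line:
            # Rewrite timing lines only, so commas in caption text survive.
            line = SRT_TIMING.sub(r"\1.\2", line)
        out_lines.append(line)
    # SRT's numeric cue counters are kept: WebVTT accepts them as cue identifiers.
    return "WEBVTT\n\n" + "\n".join(out_lines).strip() + "\n"
```

The converted file can then be attached with <track kind="captions"> like any other WebVTT file.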

Audio or podcast with no transcript

WCAG 2.2 · 1.2.1 (A) · EN 301 549 · Section 508

For audio-only content — a podcast, an interview, a recorded briefing — a full text transcript is the alternative, and at Level A it is required (1.2.1). Without it the content is unavailable to deaf and hard-of-hearing users, unreachable by anyone who can’t play sound, unsearchable, and unindexable. A good transcript identifies who is speaking and captures meaningful non-speech audio, so a reader gets the same information a listener does. It should live on the page (or behind a clearly labelled link beside the player), not in a file the user has to hunt for.

Bad

The audio player stands alone. There is no transcript and no link to one, so the spoken content reaches only people who can hear it (1.2.1).

bad-no-transcript.html
<h2>Episode 12: Accessibility in practice</h2>
<audio src="ep12.mp3" controls></audio>

Good

A complete transcript follows the player on the same page, with speaker names and meaningful sounds. It is real text, so it is selectable, searchable, and readable by assistive technology.

good-transcript.html
<h2>Episode 12: Accessibility in practice</h2>
<audio src="ep12.mp3" controls></audio>

<h3>Transcript</h3>
<p><strong>Host:</strong> Welcome back to the show.</p>
<p><strong>Guest:</strong> Thanks for having me.</p>
<p>[recording pauses]</p>

Code

If the transcript is long, you can place it in a <details> disclosure right beside the player, or link to a transcript page. Either way the link or summary must be clearly associated with the specific audio.

transcript-details.html
<audio src="ep12.mp3" controls></audio>
<details>
  <summary>Read the transcript for Episode 12</summary>
  <p><strong>Host:</strong> Welcome back to the show.</p>
  <!-- full transcript continues -->
</details>
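Transcripts sometimes arrive from a transcription tool as speaker-labelled segments. In that case a small build step can turn them into the <details> markup above, escaping the text so it stays real, selectable HTML. The (speaker, text) input shape below is an assumption, with None standing in for a meaningful non-speech sound. The function name is hypothetical.

```python
from html import escape
from typing import Optional, Sequence, Tuple

# Hypothetical input shape: (speaker, text) pairs, where speaker is None for
# a meaningful non-speech sound such as "recording pauses".
Segment = Tuple[Optional[str], str]


def render_transcript(summary: str, segments: Sequence[Segment]) -> str:
    """Build a <details> transcript block as real, escaped HTML text."""
    lines = ["<details>", f"  <summary>{escape(summary)}</summary>"]
    for speaker, text in segments:
        if speaker is None:
            # Non-speech audio goes in square brackets, as in the captions.
            lines.append(f"  <p>[{escape(text)}]</p>")
        else:
            lines.append(f"  <p><strong>{escape(speaker)}:</strong> {escape(text)}</p>")
    lines.append("</details>")
    return "\n".join(lines)
```

Escaping matters because transcript text can contain characters like & or <. Without it, the browser would parse those as markup and the words would not appear as readable text.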

How to fix

  1. Produce a complete transcript of every audio-only file — all speech plus meaningful non-speech sounds.
  2. Identify each speaker so a reader can follow the conversation.
  3. Put the transcript as real text on the page, or in a <details> disclosure, or behind a clearly labelled link next to the player.
  4. Don’t deliver the transcript as an image or a scanned PDF — it must be selectable, machine-readable text.
  5. Keep the transcript in sync whenever the audio is re-edited.

Video with no audio description of key visuals

WCAG 2.2 · 1.2.5 (AA), 1.2.3 (A) · EN 301 549 · Section 508

Captions cover what is heard; audio description covers what is seen. When important information appears only on screen — an action, a facial reaction, on-screen text, a chart, a scene change — a blind or low-vision viewer misses it entirely unless it is narrated. Audio description fills the natural pauses in the dialogue with a spoken account of those key visuals. At Level AA, 1.2.5 requires audio description for all prerecorded video; at Level A, 1.2.3 can be met with either audio description or a full text alternative (a description document) that conveys the same information. The best long-term fix is to write scripts and shoot with describable pauses in mind.

Bad

The video has captions for the dialogue but the on-screen action is never described. A blind viewer hears “…and that’s the one we’ll ship” with no idea which option was pointed at (1.2.5).

bad-no-description.html
<video controls>
  <source src="demo.mp4" type="video/mp4">
  <track kind="captions" src="demo.en.vtt" srclang="en" default>
</video>

Good

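A described version is offered. The cleanest approach is a separate audio-described rendition of the video (extra narration mixed in), offered through a clearly labelled link or alternate player beside the original. Don't add it as a second <source> in the same <video>: the browser plays only the first source it supports, so viewers could never choose the described one.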

good-described.html
<video controls>
  <source src="demo.mp4" type="video/mp4">
  <track kind="captions" src="demo.en.vtt" srclang="en" default>
</video>
<p>
  <a href="demo-described.mp4">
    Watch the audio-described version of this demo
  </a>
</p>

Code

Where a described rendition isn't available, a full text alternative meets 1.2.3 at Level A, though not 1.2.5 at AA. This is a description document covering both the dialogue and the key visuals. The kind="descriptions" track exists for timed description text, but browsers and screen readers rarely speak it aloud, so don't rely on it alone.

text-alternative.html
<h3>Described transcript</h3>
<p>[Dana points to the second mock-up on the whiteboard.]</p>
<p><strong>Dana:</strong> And that’s the one we’ll ship.</p>
<p>[The screen cuts to the launch dashboard, all metrics green.]</p>
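A described transcript can be assembled from timed material by interleaving the dialogue from the caption file with the visual descriptions, sorted by start time. Below is a minimal Python sketch, assuming each line has a start time in seconds. The Cue class, the function name, and the timings in the usage are hypothetical. When a description shares a start time with dialogue, the description comes first, so an action such as pointing is read before the line that depends on it.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Optional


@dataclass
class Cue:
    start: float                   # seconds from the start of the video
    text: str
    speaker: Optional[str] = None  # None marks a description of a visual


def described_transcript(dialogue: List[Cue], visuals: List[Cue]) -> str:
    """Interleave visual descriptions and dialogue in time order as HTML."""
    # sorted() is stable, so listing visuals first puts a description ahead
    # of any dialogue that starts at the same moment.
    merged = sorted(visuals + dialogue, key=lambda cue: cue.start)
    lines = ["<h3>Described transcript</h3>"]
    for cue in merged:
        if cue.speaker is None:
            lines.append(f"<p>[{escape(cue.text)}]</p>")
        else:
            lines.append(f"<p><strong>{escape(cue.speaker)}:</strong> {escape(cue.text)}</p>")
    return "\n".join(lines)
```

Give the example's three lines hypothetical start times (12.0 s for the pointing and Dana's line, 14.5 s for the cut). The function then reproduces the text-alternative.html markup above.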

How to fix

  1. Identify every key visual not conveyed by the soundtrack — actions, on-screen text, charts, scene changes.
  2. Write description for those visuals to fit the natural pauses in the dialogue.
  3. Produce an audio-described rendition, which 1.2.5 requires at Level AA. A full text alternative that includes the visual information meets only 1.2.3 at Level A.
  4. Plan describable gaps when scripting and editing; retro-fitting description into wall-to-wall dialogue is far harder.
  5. Make the described version or text alternative easy to find, with a clear link right next to the original video.
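Steps 2 and 4 depend on knowing where the dialogue leaves room. Given the start and end times of the speech cues, the pauses long enough for a description can be listed automatically. This is a minimal sketch. The function name and the 2-second default threshold are assumptions, not a standard figure. Pass only spoken-dialogue cues, since a bracketed sound cue such as [door buzzes] may still leave room for narration.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def description_gaps(cues: List[Span], video_end: float, min_gap: float = 2.0) -> List[Span]:
    """Return silent stretches between speech cues lasting at least min_gap seconds."""
    gaps: List[Span] = []
    cursor = 0.0  # end of the latest speech heard so far
    for start, end in sorted(cues):
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        # max() keeps overlapping cues from moving the cursor backwards.
        cursor = max(cursor, end)
    if video_end - cursor >= min_gap:
        gaps.append((cursor, video_end))
    return gaps
```

With Dana's single line (4.5 to 7.2 s) in a 9-second clip, the only gap of 2 seconds or more is the opening 4.5 seconds. That gap is where a description of the setting would fit.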

Recap

  • Add synchronised captions to every video with audio using a <track kind="captions">; captions include speaker IDs and meaningful non-speech sounds, not just dialogue (1.2.2).
  • Give every audio-only file a complete, navigable transcript, on the page or behind a clearly labelled link next to the player. For a podcast or recorded call it is the required alternative (1.2.1).
  • Provide audio description of key visuals — actions, text on screen, scene changes — so blind users get the information sighted viewers see (1.2.5 at AA; a media alternative satisfies 1.2.3 at A).

Because EN 301 549, Section 508 and the ADA Title II rule all build on WCAG, the same captions, transcripts and descriptions count toward each of them at once.