Time-based media carries meaning in two channels at once: what you hear and what you
see. A person who can’t hear the audio needs the spoken words and meaningful sounds in
text; a person who can’t see the screen needs the important visual information conveyed
in audio; and many people simply prefer to read. When a video ships with no captions,
a podcast with no transcript, or a documentary whose on-screen events are never spoken
aloud, that content is closed to a large audience — and, increasingly, out of legal
compliance.
This lesson works through the three core failures of inaccessible media and the
standard, well-supported fixes for each: synchronised captions for
video, a full transcript for audio-only content, and
audio description of key visuals. None of them requires exotic
tooling: captions ride on the <track> element the browser already
understands, and transcripts and described alternatives are ordinary HTML and media files.
What you’ll learn
The difference between captions, subtitles, transcripts, and audio
description, and when each is required; how to attach a synchronised caption
<track> to a video; how to provide a complete, navigable
transcript for audio-only content; and how to add audio description so blind users
receive the key visual information a sighted viewer takes for granted.
Standards this lesson maps to
Standard | Criterion | Level | What it requires
WCAG 2.2 | 1.2.1 Audio-only and Video-only (Prerecorded) | A | Prerecorded audio-only content has a text transcript; prerecorded video-only content has a text or audio alternative.
WCAG 2.2 | 1.2.2 Captions (Prerecorded) | A | Synchronised captions are provided for all prerecorded audio in video content.
WCAG 2.2 | 1.2.3 Audio Description or Media Alternative (Prerecorded) | A | An audio description or full text alternative is provided for prerecorded video.
WCAG 2.2 | 1.2.4 Captions (Live) | AA | Captions are provided for live audio content in synchronised media.
WCAG 2.2 | 1.2.5 Audio Description (Prerecorded) | AA | Audio description is provided for all prerecorded video content.
EN 301 549 | 7.1 / 7.2 / 7.3 (WCAG itself is incorporated via clause 9) | - | European standard; clause 7 covers caption processing, audio description processing, and the user controls to reach them.
Section 508 | 502 / 503 (incorporates WCAG A & AA) | - | US federal ICT must meet WCAG 2.0 A and AA, including captions and audio description.
ADA Title II | WCAG 2.1 AA (DOJ rule) | AA | US state/local government web content must conform to WCAG 2.1 AA, including media.
The three problems we’ll fix
Each card below isolates one common media defect. For every issue you get a
plain-language statement of the problem, a Bad example (shown as
escaped, non-running code so it can’t harm this page), a Good example,
the copyable Code, and an ordered fix.
Video with no captions
WCAG 2.2 · 1.2.2 (A) · EN 301 549 · Section 508 · ADA Title II
A video with a soundtrack but no captions locks out
anyone who is deaf or hard of hearing, anyone in a sound-off environment, and many
people watching in a second language. Captions are not just the dialogue: they
carry speaker identification and meaningful non-speech sounds — [door slams],
[ominous music] — that a hearing viewer relies on. Auto-generated captions
are a starting point, not compliance: their accuracy is rarely high enough, and
they routinely miss punctuation, speaker turns, and sound effects. To satisfy
1.2.2 you need accurate, synchronised, human-checked captions.
Bad
The video element offers the media but no caption track. Deaf and hard-of-hearing
viewers get the picture and nothing else (1.2.2).
bad-no-captions.html
<video src="intro.mp4" controls></video>
Good
A <track kind="captions"> pointing to a hosted WebVTT file is attached
and marked as the default. The srclang and label attributes let
the player expose it in its caption menu.
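The markup described above might look like the following sketch. The file names intro.mp4 and intro.en.vtt come from this lesson's own examples; the label text and the type attribute are assumptions you should adapt to your media.

```html
<video controls>
  <source src="intro.mp4" type="video/mp4">
  <!-- Captions: dialogue plus speaker IDs and meaningful sounds -->
  <track kind="captions"
         src="intro.en.vtt"
         srclang="en"
         label="English"
         default>
</video>
```

If the caption file is served from a different origin than the page, the video element generally also needs a crossorigin attribute and the host must send CORS headers, otherwise the browser will not load the track.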
A WebVTT caption file is plain text: a WEBVTT header, then cues
with start/end timestamps. Identify speakers and include meaningful sounds in
square brackets — these are part of the captions, not optional extras.
intro.en.vtt
WEBVTT

00:00:01.000 --> 00:00:04.000
[upbeat music]

00:00:04.500 --> 00:00:07.200
<v Dana>Welcome to the team. Let me show you around.

00:00:07.500 --> 00:00:09.000
[door buzzes]
How to fix
Write or correct a caption file in WebVTT, the format <track> reads
natively (convert SRT files to WebVTT first); start from auto-captions
but edit them to full accuracy.
Include speaker labels and meaningful non-speech sounds, not just spoken
words.
Attach it with <track kind="captions">, setting
srclang, a human-readable label, and
default where appropriate.
Use kind="subtitles" only for translations of dialogue;
captions (which include sounds) are what satisfies 1.2.2.
Test that captions stay in sync and that the player’s caption toggle is
keyboard reachable.
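To make the captions-versus-subtitles distinction in the steps above concrete, here is a minimal sketch with both kinds of track on one video. The Spanish file intro.es.vtt and both label strings are hypothetical; only the English caption file appears in this lesson.

```html
<video controls>
  <source src="intro.mp4" type="video/mp4">
  <!-- Same-language captions: dialogue, speaker IDs, and sounds (meets 1.2.2) -->
  <track kind="captions" src="intro.en.vtt" srclang="en"
         label="English (captions)" default>
  <!-- Translated dialogue only; hypothetical file, does not replace captions -->
  <track kind="subtitles" src="intro.es.vtt" srclang="es"
         label="Español">
</video>
```

Only one track carries default here, so the player starts with the English captions and offers the translation as a menu choice.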
Audio or podcast with no transcript
WCAG 2.2 · 1.2.1 (A) · EN 301 549 · Section 508
For audio-only content — a podcast, an interview, a
recorded briefing — a full text transcript is the alternative, and at Level A it is
required (1.2.1). Without it the content is unavailable to deaf and hard-of-hearing
users, unreachable by anyone who can’t play sound, unsearchable, and unindexable.
A good transcript identifies who is speaking and captures meaningful non-speech
audio, so a reader gets the same information a listener does. It should live on the
page (or behind a clearly labelled link beside the player), not in a file the user
has to hunt for.
Bad
The audio player stands alone. There is no transcript and no link to one, so
the spoken content reaches only people who can hear it (1.2.1).
bad-no-transcript.html
<h2>Episode 12: Accessibility in practice</h2>
<audio src="ep12.mp3" controls></audio>
Good
A complete transcript follows the player on the same page, with speaker names
and meaningful sounds. It is real text, so it is selectable, searchable, and
readable by assistive technology.
good-transcript.html
<h2>Episode 12: Accessibility in practice</h2>
<audio src="ep12.mp3" controls></audio>
<h3>Transcript</h3>
<p><strong>Host:</strong> Welcome back to the show.</p>
<p><strong>Guest:</strong> Thanks for having me.</p>
<p>[recording pauses]</p>
Code
If the transcript is long, you can place it in a <details>
disclosure right beside the player, or link to a transcript page. Either way the
link or summary must be clearly associated with the specific audio.
transcript-details.html
<audio src="ep12.mp3" controls></audio>
<details>
<summary>Read the transcript for Episode 12</summary>
<p><strong>Host:</strong> Welcome back to the show.</p>
<!-- full transcript continues -->
</details>
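The other option mentioned above, linking to a separate transcript page, can be sketched as follows. The URL is a placeholder assumption; the point is that the link text names the specific episode and sits directly beside its player.

```html
<h2>Episode 12: Accessibility in practice</h2>
<audio src="ep12.mp3" controls></audio>
<!-- Hypothetical URL; link text identifies which audio it belongs to -->
<p>
  <a href="/episodes/12/transcript.html">
    Read the transcript for Episode 12
  </a>
</p>
```

A generic link such as "Transcript" repeated under several players is harder to tell apart in a screen reader's links list, which is why the episode is named in the link text.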
How to fix
Produce a complete transcript of every audio-only file — all speech plus
meaningful non-speech sounds.
Identify each speaker so a reader can follow the conversation.
Put the transcript as real text on the page, or in a
<details> disclosure, or behind a clearly labelled link
next to the player.
Don’t deliver the transcript as an image or a scanned PDF — it must be
selectable, machine-readable text.
Keep the transcript in sync whenever the audio is re-edited.
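For a long recording, one way to make the transcript navigable is to break it into headed sections, as in this sketch. The section titles and timestamps are illustrative assumptions, not part of the original episode; the speaker lines reuse the lesson's example.

```html
<section aria-labelledby="ep12-transcript">
  <h3 id="ep12-transcript">Transcript</h3>

  <!-- Hypothetical section heading with its start time -->
  <h4>Introductions (00:00)</h4>
  <p><strong>Host:</strong> Welcome back to the show.</p>
  <p><strong>Guest:</strong> Thanks for having me.</p>
  <p>[recording pauses]</p>

  <!-- Further headed sections follow the same pattern -->
</section>
```

Headings let screen-reader users jump between parts of the conversation, and the labelled section gives the transcript its own landmark on the page.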
Video with no audio description of key visuals
WCAG 2.2 · 1.2.5 (AA) · 1.2.3 (A) · EN 301 549 · Section 508
Captions cover what is heard; audio description
covers what is seen. When important information appears only on screen —
an action, a facial reaction, on-screen text, a chart, a scene change — a blind or
low-vision viewer misses it entirely unless it is narrated. Audio description fills
the natural pauses in the dialogue with a spoken account of those key visuals. At
Level AA, 1.2.5 requires audio description for all prerecorded video; at Level A,
1.2.3 can be met with either audio description or a full text alternative (a
description document) that conveys the same information. The best long-term fix is
to write scripts and shoot with describable pauses in mind.
Bad
The video has captions for the dialogue but the on-screen action is never
described. A blind viewer hears “…and that’s the one we’ll ship” with no idea
which option was pointed at (1.2.5).
Good
A described version is offered. The cleanest approach is a separate audio-described
rendition of the video (extra narration mixed in), exposed as a second source or
a clearly labelled alternate player.
good-described.html
<video controls>
<source src="demo.mp4" type="video/mp4">
<track kind="captions" src="demo.en.vtt" srclang="en" label="English" default>
</video>
<p>
<a href="demo-described.mp4">
Watch the audio-described version of this demo
</a>
</p>
Code
Where a described rendition isn’t available, a full text alternative — a
description document covering dialogue and key visuals — meets 1.2.3 at
Level A. The kind="descriptions" track exists for timed description text,
but browser and screen-reader support for speaking it aloud is still limited,
so it should not be the only route to the visual information.
text-alternative.html
<h3>Described transcript</h3>
<p>[Dana points to the second mock-up on the whiteboard.]</p>
<p><strong>Dana:</strong> And that’s the one we’ll ship.</p>
<p>[The screen cuts to the launch dashboard, all metrics green.]</p>
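For completeness, the kind="descriptions" track mentioned above attaches like any other text track. This is a sketch only: demo.desc.en.vtt is a hypothetical file of timed description cues, and because support for voicing it is limited, it would sit alongside the described rendition or text alternative rather than replace them.

```html
<video controls>
  <source src="demo.mp4" type="video/mp4">
  <track kind="captions" src="demo.en.vtt" srclang="en"
         label="English" default>
  <!-- Hypothetical description cues; many players will not voice these -->
  <track kind="descriptions" src="demo.desc.en.vtt" srclang="en"
         label="English descriptions">
</video>
```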
How to fix
Identify every key visual not conveyed by the soundtrack — actions,
on-screen text, charts, scene changes.
Write description for those visuals to fit the natural pauses in the
dialogue.
Produce an audio-described rendition, or — to meet 1.2.3 at Level A —
publish a full text alternative that includes the visual information.
Plan describable gaps when scripting and editing; retro-fitting description
into wall-to-wall dialogue is far harder.
Make the described version or text alternative easy to find, with a clear
link right next to the original video.
Recap
Add synchronised captions to every video with audio using a
<track kind="captions">; captions include speaker IDs and
meaningful non-speech sounds, not just dialogue (1.2.2).
Give every audio-only file a complete, navigable transcript in
the page — the only alternative for a podcast or recorded call (1.2.1).
Provide audio description of key visuals — actions, text on
screen, scene changes — so blind users get the information sighted viewers see
(1.2.5 at AA; a media alternative satisfies 1.2.3 at A).
Caption, transcribe, and describe your media and you meet the media requirements
of WCAG, EN 301 549, Section 508, and ADA Title II together: the same artefacts
serve them all.