1.2.2 Captions (Prerecorded)
Captions must be provided for all prerecorded audio content in synchronized media, enabling deaf and hard-of-hearing users to access the audio information.
What this rule means
WCAG 1.2.2 mandates that prerecorded synchronized media — videos with audio tracks — must include captions. Captions are text versions of the spoken dialogue and meaningful sound effects, synchronized with the media timeline. They differ from subtitles in that captions also describe non-speech audio cues like music, laughter, or environmental sounds.
This criterion applies to all prerecorded video content that includes an audio track. It does not apply to prerecorded audio-only or video-only content (covered by 1.2.1), nor to live synchronized media (covered by 1.2.4).
Why it matters
Approximately 466 million people worldwide have disabling hearing loss. Without captions, these users are completely excluded from video content. Captions also benefit people watching in sound-sensitive environments, non-native speakers, and users with cognitive disabilities who process written text more effectively.
From a legal standpoint, captioning is one of the most commonly cited accessibility requirements. Failure to provide captions has been the basis for numerous accessibility lawsuits, particularly in education and entertainment.
Captions are not optional — they are an essential part of any video content strategy and a legal requirement in most jurisdictions.
Related axe-core rules
- video-caption — Ensures <video> elements have a <track> element with kind="captions".
Automated scanning detects the absence of caption tracks but cannot evaluate caption quality, synchronization accuracy, or completeness. Always complement automated testing with manual review.
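As a minimal sketch of wiring this into an automated workflow: assuming you run axe-core and receive its standard results object (the shape `axe.run()` resolves with), you can filter for this rule by its id. The rule id video-caption is real; the helper name captionViolations and the mock data are ours.

```javascript
// Extract video-caption violations from an axe-core results object.
// `results.violations` mirrors the shape axe.run() resolves with;
// the helper name `captionViolations` is our own.
function captionViolations(results) {
  return results.violations
    .filter((v) => v.id === "video-caption")
    .flatMap((v) => v.nodes.map((n) => n.target.join(" ")));
}

// Example with a mock results object shaped like axe output:
const mockResults = {
  violations: [
    {
      id: "video-caption",
      nodes: [{ target: ["#hero-video"] }],
    },
  ],
};
console.log(captionViolations(mockResults)); // ["#hero-video"]
```

In a real pipeline you would pass the actual `axe.run()` results and fail the build when the returned list is non-empty.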
How to test
- Play each prerecorded video on the page and enable captions.
- Verify that captions are synchronized with the spoken audio within 1-2 seconds.
- Confirm that all dialogue is accurately transcribed, including speaker identification when multiple speakers are present.
- Check that meaningful non-speech sounds (e.g., [door slams], [phone rings], [soft music]) are described.
- Ensure captions do not obscure important visual content and are readable against the video background.
- Run axe-core or a similar automated tool to flag any <video> elements missing a caption track.
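The last step can also be approximated by hand. A minimal sketch, assuming a plain-object description of each video (the helper name and input shape are ours; in a real page you would build the input from `document.querySelectorAll("video")` and each element's `<track>` children):

```javascript
// Flag videos that lack a track of kind "captions".
// Each entry describes one <video>: { id, trackKinds: [...] }.
function videosMissingCaptions(videos) {
  return videos
    .filter((v) => !v.trackKinds.includes("captions"))
    .map((v) => v.id);
}

const pageVideos = [
  { id: "intro-video", trackKinds: ["captions"] },
  { id: "demo-video", trackKinds: [] }, // no caption track
];
console.log(videosMissingCaptions(pageVideos)); // ["demo-video"]
```

Remember that this only detects a missing track, not an inaccurate or incomplete one; the manual checks above are still required.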
How to fix
The standard approach is to use the HTML <track> element with a WebVTT caption file:
<video controls>
  <source src="/product-demo.mp4" type="video/mp4" />
  <track
    kind="captions"
    src="/product-demo-en.vtt"
    srclang="en"
    label="English Captions"
    default
  />
</video>
A WebVTT file follows this format — note that a blank line is required after the WEBVTT header and between cues:
WEBVTT

1
00:00:01.000 --> 00:00:04.500
[Upbeat intro music]

2
00:00:05.000 --> 00:00:08.200
Welcome to our product demo.
Today we will walk through the dashboard.

3
00:00:09.000 --> 00:00:12.800
<v Sarah>Let me show you the analytics panel.

4
00:00:13.500 --> 00:00:16.000
[Mouse click sound]
Here you can see real-time data.
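When generating or hand-editing VTT files, malformed timestamps are a common source of silently dropped cues. A small validation sketch (the function name is ours) that parses the HH:MM:SS.mmm form used above into seconds, returning null for strings that do not match the format:

```javascript
// Parse a WebVTT timestamp ("00:00:04.500", hours optional) into seconds.
// Returns null when the string does not match the expected format:
// two-digit minutes/seconds in 00-59 and exactly three fractional digits.
function parseVttTimestamp(ts) {
  const m = /^(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})$/.exec(ts);
  if (!m) return null;
  const hours = m[1] ? Number(m[1]) : 0;
  return hours * 3600 + Number(m[2]) * 60 + Number(m[3]) + Number(m[4]) / 1000;
}

console.log(parseVttTimestamp("00:00:04.500")); // 4.5
console.log(parseVttTimestamp("00:00:4.5"));    // null (invalid)
```

Running every start and end time in a generated file through a check like this catches formatting errors before the file ships.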
For dynamically loaded video players (e.g., custom React players), ensure the caption track is programmatically associated:
const video = document.querySelector("video");

// Create a caption track and make it visible by default
const track = video.addTextTrack("captions", "English", "en");
track.mode = "showing";

// Cue times are in seconds: VTTCue(start, end, text)
track.addCue(new VTTCue(0, 4.5, "[Upbeat intro music]"));
track.addCue(new VTTCue(5, 8.2, "Welcome to our product demo."));
Common mistakes
- Relying solely on auto-generated captions without human review — accuracy rates can be as low as 60-70%.
- Omitting speaker identification when multiple people are talking.
- Failing to caption non-speech audio such as music, sound effects, or silence used for dramatic effect.
- Using open captions burned into the video with poor contrast or tiny fonts.
- Providing captions only in one language when the video audience is multilingual.
- Setting caption timing too fast for comfortable reading speed (aim for 3 words per second maximum).
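The reading-speed guideline in the last point can be checked mechanically. A sketch that flags cues whose text outpaces the 3 words/second rate mentioned above (the function name and cue shape are ours; in the browser you could build the input from a track's cue list):

```javascript
// Flag caption cues that exceed a words-per-second reading rate.
// Each cue: { start, end, text } with times in seconds.
function tooFastCues(cues, maxWordsPerSecond = 3) {
  return cues.filter((cue) => {
    const words = cue.text.trim().split(/\s+/).length;
    const duration = cue.end - cue.start;
    return duration > 0 && words / duration > maxWordsPerSecond;
  });
}

const cues = [
  // 5 words over 3.2 s is roughly 1.6 words/second: fine
  { start: 5, end: 8.2, text: "Welcome to our product demo." },
  // 10 words in 1 s is 10 words/second: flagged
  { start: 9, end: 10, text: "This caption packs far too many words into one second." },
];
console.log(tooFastCues(cues).length); // 1
```

A check like this is a useful lint step for generated caption files, though flagged cues still need human judgment (splitting a cue, extending its duration, or condensing its text).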