1.2.5 Audio Description (Prerecorded)

Audio description must be provided for all prerecorded video content in synchronized media, describing important visual information not available from the audio track alone.

Principle: Perceivable | Level: AA | Impact: Serious | Introduced in WCAG 2.0; retained in WCAG 2.1 and 2.2

What this rule means

WCAG 1.2.5 is the stricter Level AA counterpart to criterion 1.2.3 (Level A). Where 1.2.3 accepts either an audio description or a full media alternative, 1.2.5 specifically requires an audio description track for prerecorded synchronized media. A text alternative alone does not satisfy this criterion.

Audio description inserts narration into natural pauses in the dialogue to describe significant visual content — settings, actions, body language, graphics, and scene changes — that is not otherwise communicated through the existing audio.

Why it matters

For blind and low-vision users, audio description transforms an incomplete audio experience into a comprehensive one. Without it, they may hear dialogue but miss the visual context that gives that dialogue meaning. For example, in a training video, a speaker might say "as you can see here" while pointing at a chart — without audio description, the chart content is lost.

Audio description also benefits users with cognitive disabilities who may find it easier to process information when visual content is reinforced verbally.

Related axe-core rules

No axe-core rules specifically test for audio description presence. Automated tools cannot determine whether important visual content exists that requires description. Manual testing is essential for this criterion.

How to test

  1. Identify all prerecorded videos with synchronized audio on the page.
  2. Determine whether the video contains important visual information not conveyed through dialogue or narration.
  3. Check for an audio description option in the video player, or for a linked version of the video that has the description mixed in.
  4. Play the video with audio description enabled and verify it describes key visual content during natural pauses.
  5. Confirm the audio description does not conflict with or obscure the primary audio track.
  6. Verify the audio description covers scene changes, on-screen text, charts, and significant actions.
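Steps 1 and 3 can be partly scripted to build an inventory for the manual review. The following is a minimal sketch using Python's standard `html.parser`; the class name, the `audit` helper, and the `/intro.mp4` sample are hypothetical. It only reports whether a `<track kind="descriptions">` is present in the markup. It cannot tell whether a video needs description, whether a separately linked described version exists, or whether the description is any good, so every result still needs a human check.

```python
from html.parser import HTMLParser


class VideoDescriptionAudit(HTMLParser):
    """Collect each <video> and whether it contains <track kind="descriptions">."""

    def __init__(self):
        super().__init__()
        self.videos = []      # one dict per <video>, in document order
        self._current = None  # the <video> currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "video":
            self._current = {"src": attrs.get("src"), "has_description_track": False}
            self.videos.append(self._current)
        elif self._current is not None:
            if tag == "source" and self._current["src"] is None:
                # Use the first <source> as the video's identifier.
                self._current["src"] = attrs.get("src")
            elif tag == "track" and (attrs.get("kind") or "").lower() == "descriptions":
                self._current["has_description_track"] = True

    def handle_endtag(self, tag):
        if tag == "video":
            self._current = None


def audit(html):
    """Return the list of videos found in the given HTML string."""
    parser = VideoDescriptionAudit()
    parser.feed(html)
    parser.close()
    return parser.videos


# Hypothetical page fragment: one video with a descriptions track, one without.
sample = """
<video controls>
  <source src="/demo-video.mp4" type="video/mp4" />
  <track kind="captions" src="/demo-captions.vtt" srclang="en" />
  <track kind="descriptions" src="/demo-audiodesc.vtt" srclang="en" />
</video>
<video controls src="/intro.mp4"></video>
"""

for video in audit(sample):
    status = "description track found" if video["has_description_track"] else "needs manual review"
    print(video["src"], "->", status)
```

A video flagged "needs manual review" is not automatically a failure: it may have no meaningful visual-only content, or it may link to a described version elsewhere on the page.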

How to fix

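Record narrated descriptions and deliver them either as an alternate audio track (the original audio plus narration) or as a separate described version of the video. Note that the <track kind="descriptions"> element carries timed text descriptions in WebVTT, not audio. Browser support for voicing these cues is limited, so they help only when the media player or assistive technology reads them aloud. Verify that users can actually hear the description in the players you support: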

<video controls>
  <source src="/demo-video.mp4" type="video/mp4" />
  <track
    kind="captions"
    src="/demo-captions.vtt"
    srclang="en"
    label="English Captions"
    default
  />
  <track
    kind="descriptions"
    src="/demo-audiodesc.vtt"
    srclang="en"
    label="Audio Description"
  />
</video>

<!-- Alternative: provide a separate version with baked-in description -->
<p>
  <a href="/demo-video-described.mp4">
    Watch version with audio description
  </a>
</p>

When creating audio descriptions, follow these scripting guidelines:

  • Describe only what is visually apparent — do not interpret or editorialize.
  • Use present tense: "Sarah walks to the whiteboard" not "Sarah walked to the whiteboard."
  • Identify speakers and new characters when they first appear.
  • Describe on-screen text verbatim or summarize if time is limited.
  • Prioritize information that is essential to understanding the content.

Example audio description script, written as WebVTT cues for the descriptions track:

WEBVTT

1
00:00:02.000 --> 00:00:05.000
A woman in a blue blazer stands at a podium
in a modern conference room.

2
00:00:12.000 --> 00:00:15.500
A slide reads: "Q3 Revenue: $4.2 million,
up 18% from Q2." A green upward arrow
is displayed next to the figure.

3
00:00:28.000 --> 00:00:31.000
She gestures to a pie chart showing
market share by region: Americas 45%,
Europe 30%, Asia 25%.
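A script like the one above can be sanity-checked for pacing before it is recorded. The sketch below parses the first two cues and estimates the speaking rate each would need, assuming the narration has to fit inside its cue window. The 180 words-per-minute ceiling and the helper names are assumptions for illustration, not WCAG requirements, and the parser handles only the `hh:mm:ss.mmm` timestamp form used in this example.

```python
import re

# Matches cue timing lines of the form "00:00:02.000 --> 00:00:05.000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(\d+):(\d{2}):(\d{2})\.(\d{3})"
)

ASSUMED_MAX_WPM = 180  # assumption for illustration, not a WCAG requirement


def to_seconds(hours, minutes, seconds, millis):
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_cues(vtt_text):
    """Return (start, end, text) for each cue that has an hh:mm:ss.mmm timing line."""
    cues = []
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = [line.strip() for line in block.splitlines()]
        for index, line in enumerate(lines):
            match = TIMING.match(line)
            if match:
                parts = match.groups()
                text = " ".join(lines[index + 1:])
                cues.append((to_seconds(*parts[:4]), to_seconds(*parts[4:]), text))
                break
    return cues


def words_per_minute(start, end, text):
    """Speaking rate needed to read the cue text within its time window."""
    return len(text.split()) / (end - start) * 60


script = """WEBVTT

1
00:00:02.000 --> 00:00:05.000
A woman in a blue blazer stands at a podium
in a modern conference room.

2
00:00:12.000 --> 00:00:15.500
A slide reads: "Q3 Revenue: $4.2 million,
up 18% from Q2." A green upward arrow
is displayed next to the figure.
"""

for start, end, text in parse_cues(script):
    rate = words_per_minute(start, end, text)
    verdict = "review pacing" if rate > ASSUMED_MAX_WPM else "ok"
    print(f"{start:>6.1f}s  {rate:5.0f} wpm  {verdict}")
```

Under these assumptions the two cues need roughly 300 and 360 words per minute, well above the assumed ceiling. In practice that would suggest widening the cue windows or trimming the text, though a word count is only a rough proxy for how long narration takes to speak.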

Common mistakes

  • Offering only a text transcript instead of an actual audio description track at Level AA.
  • Cramming too much description into short pauses, making the narration rushed and hard to follow.
  • Omitting description of on-screen text, assuming viewers can read it.
  • Using inconsistent terminology for recurring visual elements across the description.
  • Failing to describe who is speaking when camera angles change.
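The pacing mistake in the list above often shows up as description cues that run into dialogue. As a rough sketch, the time windows of description cues can be compared against dialogue timings (for instance, taken from the captions file) to find collisions. The function name and the dialogue timings below are hypothetical; the description windows reuse the example script's timings, in seconds.

```python
def overlaps(description_cues, dialogue_cues):
    """Return (description, dialogue) pairs whose time ranges intersect."""
    clashes = []
    for d_start, d_end in description_cues:
        for s_start, s_end in dialogue_cues:
            # Two intervals intersect when each starts before the other ends.
            if d_start < s_end and s_start < d_end:
                clashes.append(((d_start, d_end), (s_start, s_end)))
    return clashes


# Description windows from the example script, in seconds.
descriptions = [(2.0, 5.0), (12.0, 15.5), (28.0, 31.0)]
# Hypothetical dialogue timings, e.g. read from the captions file.
dialogue = [(5.5, 11.5), (15.0, 27.0)]

for described, spoken in overlaps(descriptions, dialogue):
    print(f"description {described} overlaps dialogue {spoken}")
```

With this sample data, only the second description window collides with speech. A collision is a prompt to listen to that passage, not proof of a failure, since brief overlap with non-essential audio may be acceptable.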

Resources