1.2.9 Audio-only (Live)
A text alternative must be provided for live audio-only content, delivering equivalent information in real time for users who cannot hear the audio.
What this rule means
WCAG 1.2.9 requires that live audio-only content — such as live radio broadcasts, audio-only podcasts streamed in real time, conference call audio, and emergency audio announcements — provide a real-time text alternative. This is typically delivered as live captions or a real-time text stream that conveys the spoken content as it happens.
This Level AAA criterion extends the concept of live captions (1.2.4, which covers synchronized media) to audio-only live content. The key difference is that there is no video component — only audio being broadcast in real time.
Why it matters
Live audio events exclude deaf and hard-of-hearing users entirely without a text alternative. Unlike prerecorded content where a transcript can be provided after the fact, live audio is time-sensitive: the information has immediate value that diminishes or disappears once the event is over.
Real-time text alternatives are critical for emergency communications, live news audio feeds, and interactive audio events like radio call-in shows where participation depends on understanding the content as it happens.
A post-event transcript is better than nothing, but it does not satisfy this criterion. The requirement is for real-time access during the live broadcast.
Related axe-core rules
No axe-core rules address live audio-only content. Automated tools cannot detect live audio streams or verify whether real-time text alternatives are being provided. This criterion requires testing during actual live events.
How to test
- Identify all live audio-only content on the platform (live radio, audio streams, conference calls).
- During a live audio event, check whether a real-time text alternative is displayed.
- Evaluate the latency — text should appear within a few seconds of the spoken content.
- Verify that the text accurately represents the spoken content, including speaker identification.
- Check that the text alternative is accessible via screen readers and other assistive technologies.
- Confirm that meaningful non-speech sounds are described in the text stream.
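The latency check above can be made concrete by timestamping when a phrase is spoken against when its caption appears. A minimal sketch of such a spot-check follows; the 5-second threshold is an illustrative target for "within a few seconds," not a number specified by WCAG:

```javascript
// Spot-check caption latency. Each sample pairs the time a phrase was
// spoken with the time its caption appeared (both in milliseconds).
// The 5000 ms default threshold is an assumption, not a WCAG value.
function captionLatencyReport(samples, thresholdMs = 5000) {
  const lags = samples.map((s) => s.captionShownAt - s.spokenAt);
  return {
    meanMs: lags.reduce((a, b) => a + b, 0) / lags.length,
    worstMs: Math.max(...lags),
    withinThreshold: Math.max(...lags) <= thresholdMs,
  };
}

// Example: three utterances captioned 2-4 seconds after being spoken.
const report = captionLatencyReport([
  { spokenAt: 0, captionShownAt: 2000 },
  { spokenAt: 10000, captionShownAt: 13000 },
  { spokenAt: 20000, captionShownAt: 24000 },
]);
```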
How to fix
Implement a live text stream alongside the audio player:

```html
<div role="region" aria-label="Live audio broadcast with real-time transcript">
  <audio controls autoplay>
    <source src="/live-stream" type="audio/mpeg" />
    Your browser does not support the audio element.
  </audio>
  <div
    id="live-transcript"
    role="log"
    aria-live="polite"
    aria-label="Real-time transcript"
    class="live-transcript-panel"
  >
    <!-- Transcript lines injected in real time -->
  </div>
</div>
```
Connect to a real-time captioning service via WebSocket:

```javascript
const transcriptEl = document.getElementById("live-transcript");
const ws = new WebSocket("wss://caption-service.example.com/stream");

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  const line = document.createElement("p");
  if (data.speaker) {
    const speaker = document.createElement("strong");
    speaker.textContent = `${data.speaker}: `;
    line.appendChild(speaker);
  }
  line.appendChild(document.createTextNode(data.text));
  transcriptEl.appendChild(line);
  // Auto-scroll to the latest line
  transcriptEl.scrollTop = transcriptEl.scrollHeight;
};
```
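One refinement worth considering: unconditionally scrolling to the bottom on every message yanks the view away from a reader who has scrolled up to reread earlier lines. A common fix is to follow new lines only when the reader is already at the bottom. A sketch of that predicate (the 24-pixel tolerance is an assumption to absorb sub-pixel rounding):

```javascript
// Decide whether to auto-scroll: follow the newest line only when the
// reader is already at (or within slackPx of) the bottom of the panel.
function shouldFollowLatest(scrollTop, clientHeight, scrollHeight, slackPx = 24) {
  return scrollHeight - (scrollTop + clientHeight) <= slackPx;
}

// In the onmessage handler, check before appending, then scroll after:
//   const follow = shouldFollowLatest(
//     transcriptEl.scrollTop, transcriptEl.clientHeight, transcriptEl.scrollHeight
//   );
//   transcriptEl.appendChild(line);
//   if (follow) transcriptEl.scrollTop = transcriptEl.scrollHeight;
```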
As a stopgap, the Web Speech API can produce a rough real-time transcript in supporting browsers. Note that unmonitored automatic recognition is unlikely to be accurate enough on its own (see Common mistakes below):

```javascript
// Browser support varies; Chromium-based browsers expose the
// prefixed constructor, so feature-detect both forms.
const SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.continuous = true;
recognition.interimResults = true;
recognition.lang = "en-US";

const transcriptEl = document.getElementById("live-transcript");
let currentParagraph = null;

recognition.onresult = (event) => {
  if (!currentParagraph) {
    currentParagraph = document.createElement("p");
    transcriptEl.appendChild(currentParagraph);
  }
  // Show the latest (possibly interim) result, replacing its text as
  // recognition firms up, then start a new paragraph once it is final.
  const result = event.results[event.results.length - 1];
  currentParagraph.textContent = result[0].transcript;
  if (result.isFinal) {
    currentParagraph = null;
  }
};
recognition.start();
```
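Recognition sessions also end on their own after silence or transient errors, so the fallback above needs to be restarted to keep the transcript flowing. One hedged approach is to restart from the recognition object's `onend` handler with a capped exponential backoff; the delay values below are illustrative assumptions:

```javascript
// Capped exponential backoff for restarting a dropped caption source
// (speech recognition session or WebSocket). Delays are illustrative.
function restartDelayMs(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Sketch of use in the browser (not runnable outside one):
//   let attempt = 0;
//   recognition.onend = () => {
//     setTimeout(() => recognition.start(), restartDelayMs(attempt++));
//   };
//   recognition.onresult = () => { attempt = 0; /* ...append text... */ };
```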
Common mistakes
- Providing only a post-event transcript instead of real-time text during the live broadcast.
- Using automated speech recognition without monitoring for errors, producing unreliable text.
- Not providing speaker identification in the real-time text stream.
- Placing the text alternative in a location that is difficult to find or not associated with the audio player.
- Failing to describe non-speech audio elements such as music, sound effects, or significant pauses.
- Not making the text stream accessible to assistive technologies (missing ARIA roles or live regions).
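The last two mistakes can be avoided at the rendering layer. A sketch of a line formatter that identifies speakers and conveys non-speech sounds in square brackets, a common captioning convention; the `speaker`/`text`/`sound` field names are assumptions about the caption payload, not a standard schema:

```javascript
// Render one transcript entry as plain text. Non-speech events
// (music, applause, long pauses) are conveyed in square brackets.
// The entry field names are an assumed payload shape, not a standard.
function transcriptLineText(entry) {
  if (entry.sound) return `[${entry.sound}]`;
  return entry.speaker ? `${entry.speaker}: ${entry.text}` : entry.text;
}

const spoken = transcriptLineText({ speaker: "Host", text: "Welcome back." });
// → "Host: Welcome back."
const sound = transcriptLineText({ sound: "station jingle plays" });
// → "[station jingle plays]"
```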