How to Build an Accessible Video Player (WCAG 1.2.x)

An accessible video player is one of the few components where four different disabilities collide in a single piece of UI. A Deaf user needs captions, a blind user needs audio description and a labeled play button, a keyboard user needs to reach every control without a mouse, and everyone needs the right not to be ambushed by sound. Most player libraries get one or two of these right and silently fail the rest.

This guide is for developers shipping the player, not the people producing the video. It maps the WCAG 1.2.x media criteria to concrete markup and behavior, shows where the popular shortcuts break, and gives you a checklist you can verify before merge. The examples assume HTML5 video, but the rules apply equally to a custom React or Web Component wrapper.

The five things an accessible video player must do

Strip away the styling and a conformant player has to satisfy five independent requirements. Each maps to specific success criteria, and you can fail conformance by missing any one of them even if the other four are perfect.

  • Captions for prerecorded audio (1.2.2, Level A) and live audio (1.2.4, Level AA) so dialogue and meaningful sound are readable.
  • A text alternative or audio description for prerecorded video (1.2.3, Level A), escalating to a full audio description at AA (1.2.5).
  • Controls that are fully keyboard operable (2.1.1, A) with no keyboard trap (2.1.2, A) and a visible focus indicator (2.4.7, AA).
  • An accessible name on every control (4.1.2, A) so a screen reader announces "Play, button" rather than "button".
  • No autoplay of audio longer than three seconds without a pause or mute mechanism (1.4.2, A).

Captions and audio description are different obligations, not alternatives. Captions serve people who cannot hear the audio; audio description serves people who cannot see the video. A clip that has captions but no described version of its on-screen-only information still fails 1.2.3 and 1.2.5.

Captions and transcripts: wire up a real track element

Captions must be a real, toggleable text track, not pixels burned into the frame. Burned-in text cannot be resized, restyled for contrast, translated, or turned off, and it collides with the player chrome on small viewports. Use a sidecar WebVTT file and let the browser render it:

<video controls>
  <source src="talk.mp4" type="video/mp4">
  <track kind="captions" src="talk.en.vtt" srclang="en" label="English" default>
</video>

  • Use kind="captions" (not kind="subtitles") so non-speech cues like [door slams] and speaker labels are conveyed; subtitles assume the user can hear and only translate dialogue.
  • Set srclang and a human-readable label so the captions menu lists each language correctly.
  • Provide a transcript below the player as well. If it also describes the important visual information, it satisfies 1.2.3 as a text alternative. It is indexable for search, and it is the only format usable by deafblind users on a refreshable Braille display.
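To make the sidecar file concrete, here is a minimal sketch of generating a WebVTT captions file from a list of cues. The CaptionCue shape and the function names are hypothetical, invented for this illustration; only the WEBVTT header, the hh:mm:ss.ttt timestamp form, and the <v Name> voice span come from the WebVTT format itself. Speaker names are assumed to be plain text.

```typescript
// Hypothetical cue shape for this sketch; it is not part of any library.
interface CaptionCue {
  start: number;    // seconds from the start of the video
  end: number;      // seconds from the start of the video
  text: string;     // dialogue, or a non-speech cue such as "[door slams]"
  speaker?: string; // optional speaker label (assumed to be plain text)
}

function zeroPad(value: number, width: number): string {
  let out = String(value);
  while (out.length < width) out = "0" + out;
  return out;
}

// WebVTT timestamps use the form hh:mm:ss.ttt
function toVttTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  return (
    zeroPad(Math.floor(totalSec / 3600), 2) + ":" +
    zeroPad(Math.floor(totalSec / 60) % 60, 2) + ":" +
    zeroPad(totalSec % 60, 2) + "." +
    zeroPad(ms, 3)
  );
}

// Escape characters that would otherwise be parsed as cue markup.
function escapeCueText(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function buildVtt(cues: CaptionCue[]): string {
  const blocks = cues.map((cue) => {
    const timing = toVttTimestamp(cue.start) + " --> " + toVttTimestamp(cue.end);
    const text = escapeCueText(cue.text);
    // A <v Name> voice span carries the speaker label.
    const payload = cue.speaker ? "<v " + cue.speaker + ">" + text : text;
    return timing + "\n" + payload;
  });
  // Header first, then cue blocks separated by blank lines.
  return ["WEBVTT"].concat(blocks).join("\n\n") + "\n";
}
```

Calling buildVtt with a non-speech cue and a labeled line of dialogue produces the kind of file that talk.en.vtt in the markup above would point to. A generator like this only formats cues; the accuracy of the text and timing still has to come from a human edit.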

If you build a custom captions renderer instead of using native rendering, confirm the caption text scales when the user zooms (1.4.4 Resize Text formally exempts captions, but users still need to enlarge them) and meets the 4.5:1 contrast minimum of 1.4.3 against its background. Auto-generated captions from an upload pipeline are a starting draft, never a finished track: they routinely miss punctuation, speaker changes, and homophones, and unedited they do not meet 1.2.2. For the production-side detail on caption quality and audio description scripting, see our video and audio accessibility guide.
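For a custom renderer, the 4.5:1 check can be computed rather than eyeballed. The sketch below uses the WCAG 2.x relative luminance and contrast ratio formulas; the function names and the Rgb tuple type are illustrative, and it assumes fully opaque colors, so a semi-transparent caption background would need to be composited over the video frame first.

```typescript
type Rgb = [number, number, number]; // 0-255 per channel, fully opaque

// Convert one sRGB channel to linear light, per the WCAG 2.x definition.
function linearizeChannel(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance(rgb: Rgb): number {
  const [r, g, b] = rgb;
  return (
    0.2126 * linearizeChannel(r) +
    0.7152 * linearizeChannel(g) +
    0.0722 * linearizeChannel(b)
  );
}

// Ratio runs from 1 (identical colors) to 21 (black on white).
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

function meetsCaptionContrast(text: Rgb, background: Rgb): boolean {
  return contrastRatio(text, background) >= 4.5;
}
```

White caption text on a solid black box sits at the maximum ratio, which is one reason that combination is such a common default.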

Keyboard-operable controls with real labels

The native controls attribute gives you keyboard support and labels for free. The moment you build a custom control bar you own all of it, and this is where most custom players fail an audit. Every button must be a real focusable control, reachable with Tab, operable with Enter and Space, and announced with a meaningful name.

  • Use <button> elements, not <div onclick>. A div is not in the tab order, does not fire on Space or Enter, and exposes no role to assistive tech.
  • Give icon-only buttons an accessible name with aria-label, e.g. aria-label="Play". When state changes, update the label (Play to Pause) or use aria-pressed for toggles like mute.
  • Make the scrubber a real slider: role="slider" with aria-valuemin, aria-valuemax, aria-valuenow, and aria-valuetext announcing a time like "1 minute 42 seconds" rather than a raw second count.
  • Ensure a visible focus ring on every control (2.4.7) and that focus order follows the visual layout. Never remove outlines without a replacement.
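The naming and slider rules above can be sketched as a few pure functions that compute the attributes a custom control bar would set. Everything here is illustrative: the function names, the "Seek" and "Mute" labels, and the 5 and 30 second step sizes are assumptions, not values taken from WCAG. The key mapping follows the usual slider convention of arrows for small steps, Page Up and Page Down for large steps, and Home and End for the extremes.

```typescript
// Spoken form for aria-valuetext, e.g. 102 -> "1 minute 42 seconds".
function formatTimeForSpeech(totalSeconds: number): string {
  const whole = Math.max(0, Math.floor(totalSeconds));
  const hours = Math.floor(whole / 3600);
  const minutes = Math.floor((whole % 3600) / 60);
  const seconds = whole % 60;
  const unit = (n: number, name: string) => n + " " + name + (n === 1 ? "" : "s");
  const parts: string[] = [];
  if (hours > 0) parts.push(unit(hours, "hour"));
  if (minutes > 0) parts.push(unit(minutes, "minute"));
  if (seconds > 0 || parts.length === 0) parts.push(unit(seconds, "second"));
  return parts.join(" ");
}

// Attributes for a custom scrubber element. The "Seek" label is an assumption.
function scrubberAttributes(currentTime: number, duration: number): Record<string, string> {
  return {
    role: "slider",
    tabindex: "0",
    "aria-label": "Seek",
    "aria-valuemin": "0",
    "aria-valuemax": String(Math.floor(duration)),
    "aria-valuenow": String(Math.floor(currentTime)),
    "aria-valuetext": formatTimeForSpeech(currentTime) + " of " + formatTimeForSpeech(duration),
  };
}

// Step sizes are illustrative choices for this sketch.
const SMALL_STEP_SECONDS = 5;
const LARGE_STEP_SECONDS = 30;

// Returns the new playback time, or null when the key is not a slider key
// and should be left to the browser (Tab, for example).
function nextTimeForKey(key: string, currentTime: number, duration: number): number | null {
  let target: number;
  switch (key) {
    case "ArrowRight":
    case "ArrowUp":
      target = currentTime + SMALL_STEP_SECONDS;
      break;
    case "ArrowLeft":
    case "ArrowDown":
      target = currentTime - SMALL_STEP_SECONDS;
      break;
    case "PageUp":
      target = currentTime + LARGE_STEP_SECONDS;
      break;
    case "PageDown":
      target = currentTime - LARGE_STEP_SECONDS;
      break;
    case "Home":
      target = 0;
      break;
    case "End":
      target = duration;
      break;
    default:
      return null;
  }
  return Math.min(duration, Math.max(0, target));
}

// Play/pause swaps its label; mute keeps one label and exposes state via aria-pressed.
function playButtonLabel(paused: boolean): string {
  return paused ? "Play" : "Pause";
}

function muteButtonAttributes(muted: boolean): Record<string, string> {
  return { "aria-label": "Mute", "aria-pressed": muted ? "true" : "false" };
}
```

In a real player these values would be applied with setAttribute inside the timeupdate, play, pause, and volumechange handlers, and a keydown handler would call preventDefault only when nextTimeForKey returns a number.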

Watch for keyboard traps (2.1.2): a custom fullscreen or settings menu that captures focus and offers no Escape route will strand keyboard users. WCAG 2.2 also adds 2.5.8 Target Size (Minimum, AA), which expects controls to be at least 24 by 24 CSS pixels unless an exception applies. The deeper patterns for sliders, menus, and focus management live in our guides on keyboard accessibility and ARIA best practices.
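One way to keep a custom settings menu from becoming a trap is to decide up front what every relevant key does, including the keys that leave. The sketch below is a hypothetical key-to-action mapping, not an API from any library: Escape closes the menu and returns focus to the button that opened it, Tab closes it and lets focus continue, and the arrow keys wrap within the items. The target-size helper checks only the 24 by 24 minimum and does not model the exceptions in 2.5.8.

```typescript
type MenuAction =
  | { kind: "focus"; index: number }
  | { kind: "close"; returnFocus: boolean }
  | { kind: "none" };

function settingsMenuKeyAction(key: string, activeIndex: number, itemCount: number): MenuAction {
  // The exits come first: a menu must always be closable from the keyboard.
  if (key === "Escape") return { kind: "close", returnFocus: true };
  // Tab closes the menu without moving focus back, so the browser can move on.
  if (key === "Tab") return { kind: "close", returnFocus: false };
  if (itemCount <= 0) return { kind: "none" };
  switch (key) {
    case "ArrowDown":
      return { kind: "focus", index: (activeIndex + 1) % itemCount };
    case "ArrowUp":
      return { kind: "focus", index: (activeIndex - 1 + itemCount) % itemCount };
    case "Home":
      return { kind: "focus", index: 0 };
    case "End":
      return { kind: "focus", index: itemCount - 1 };
    default:
      return { kind: "none" };
  }
}

// 2.5.8 minimum only; spacing and other exceptions are not modeled here.
function meetsMinimumTargetSize(widthPx: number, heightPx: number): boolean {
  return widthPx >= 24 && heightPx >= 24;
}
```

The calling code would apply a "focus" action by calling focus() on the matching menu item, and a "close" action by hiding the menu and, when returnFocus is true, focusing the trigger button again.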

No autoplay with sound, and audio description

WCAG 1.4.2 Audio Control is unambiguous: if audio plays automatically for more than three seconds, you must provide a way to pause or stop it, or to control its volume independently of the system. The reliable engineering answer is to never autoplay with sound. A screen reader user cannot find your hidden mute button while the auto-playing audio drowns out their speech synthesizer.

  • If a video must autoplay for design reasons (a muted hero loop), keep it muted and decorative, and still expose a pause control to respect 2.2.2 Pause, Stop, Hide for moving content.
  • Honor prefers-reduced-motion for any autoplaying background video so motion-sensitive users are not forced to watch it.
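The autoplay rules above reduce to a small decision that can live in one place. This is a sketch under stated assumptions: the AutoplayRequest and AutoplayDecision shapes and the decideAutoplay name are hypothetical, and the policy encoded here (autoplay only when muted, never when the user prefers reduced motion, always show a pause control) is one conservative reading of 1.4.2 and 2.2.2 rather than the only conformant one.

```typescript
interface AutoplayRequest {
  wantsAutoplay: boolean;        // the design asks for autoplay, e.g. a hero loop
  prefersReducedMotion: boolean; // the user's reduced-motion preference
}

interface AutoplayDecision {
  autoplay: boolean;
  muted: boolean;
  showPauseControl: boolean;
}

function decideAutoplay(request: AutoplayRequest): AutoplayDecision {
  // Reduced-motion users never get an autoplaying video.
  const autoplay = request.wantsAutoplay && !request.prefersReducedMotion;
  return {
    autoplay,
    // Anything that starts on its own starts silent.
    muted: autoplay,
    // A pause control is always exposed, whether or not the video autoplays.
    showPauseControl: true,
  };
}
```

In the browser, prefersReducedMotion would typically come from window.matchMedia("(prefers-reduced-motion: reduce)").matches, and the decision would be applied to the video element's autoplay and muted properties before playback starts.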

Audio description covers information shown only on screen: on-screen text, a chart pointed at silently, an actor's gesture. Where natural pauses in dialogue leave room, a standard described track (1.2.5, AA) is the goal; where they do not, only an extended description (1.2.7, AAA) or a full text alternative (1.2.8, AAA) can carry the missing information. In the player, surface description as a selectable audio track or a second video source, and expose it through the same labeled controls as captions so users can find and toggle it.
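If description ships as a second video source, the player needs a rule for which source to load and a toggle that reports its state. The following is a minimal sketch of that idea; the VideoVariant shape, the function names, the "Audio description" label, and the file names in the usage note are all hypothetical.

```typescript
interface VideoVariant {
  src: string;
  label: string;
  audioDescribed: boolean;
}

// Pick the variant matching the user's preference; fall back to the first
// available variant so the player never ends up with nothing to play.
function pickVariant(variants: VideoVariant[], wantsDescription: boolean): VideoVariant | null {
  const matches = variants.filter((v) => v.audioDescribed === wantsDescription);
  if (matches.length > 0) return matches[0];
  return variants.length > 0 ? variants[0] : null;
}

// The description toggle is a pressed-state button, like mute.
function descriptionToggleAttributes(on: boolean): Record<string, string> {
  return { "aria-label": "Audio description", "aria-pressed": on ? "true" : "false" };
}
```

With a standard variant at talk.mp4 and a described variant at talk.described.mp4, toggling description would swap the source and, for a standard (not extended) description of the same length, restore the previous playback position. Storing the preference lets the described version load by default on the next visit.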

Test it before you ship

Media accessibility cannot be fully automated, but a short manual pass catches the common regressions. Run this on every player you ship:

  • Tab into the player from the content that precedes it. Every control should receive focus in a logical order and show a visible ring, and you should be able to Tab out again without getting trapped.
  • With the mouse unplugged, play, pause, scrub, mute, change captions, and enter and exit fullscreen using only the keyboard.
  • Turn on a screen reader (VoiceOver or NVDA) and confirm each control announces a name and role, and that state changes are spoken.
  • Load the page and confirm nothing plays audio automatically. Toggle captions and the transcript and verify both match the audio.
  • Check caption and control contrast at 4.5:1 (3:1 for large text and UI components) and confirm captions still render when text is zoomed to 200%.

Automated tooling still earns its place by catching the cheap, high-volume mistakes: a missing accessible name, a div used as a button, insufficient contrast, a missing track element. Run a free scan with AccessScan to flag those across the whole page, then do the keyboard and screen-reader pass by hand. For how media fits alongside the rest of your obligations, see our accessibility checklist.
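To illustrate what the cheap automated checks look like, here is a toy audit over a simplified snapshot of a player. It is a sketch only: the ControlSnapshot and PlayerSnapshot shapes are hypothetical, the accessible name is assumed to have been computed elsewhere, and this does not represent how AccessScan or any other scanner works internally.

```typescript
interface ControlSnapshot {
  tagName: string;        // e.g. "button" or "div"
  role: string | null;    // explicit ARIA role, if any
  accessibleName: string; // computed name; empty string when there is none
}

interface PlayerSnapshot {
  controls: ControlSnapshot[];
  trackKinds: string[];   // kind attribute of each <track> child
}

// Elements that are interactive without needing an explicit role.
const NATIVE_CONTROL_TAGS = ["button", "input", "select", "a"];

function auditPlayer(snapshot: PlayerSnapshot): string[] {
  const issues: string[] = [];
  snapshot.controls.forEach((control, index) => {
    const tag = control.tagName.toLowerCase();
    if (control.accessibleName.trim() === "") {
      issues.push("control " + index + ": missing accessible name");
    }
    if (NATIVE_CONTROL_TAGS.indexOf(tag) === -1 && control.role === null) {
      issues.push("control " + index + ": <" + tag + "> used as a control without a role");
    }
  });
  if (snapshot.trackKinds.indexOf("captions") === -1) {
    issues.push('no <track kind="captions"> found');
  }
  return issues;
}
```

A check like this can confirm that a captions track exists, but not that the captions are accurate or that focus order makes sense, which is why the manual pass stays on the list.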

FAQ

What is the difference between captions and audio description in a video player?

Captions are time-synced text of the dialogue and meaningful non-speech audio, for users who cannot hear (WCAG 1.2.2 and 1.2.4). Audio description is an additional narration of on-screen-only visual information, for users who cannot see (1.2.3 and 1.2.5). They serve different disabilities, so a fully conformant player needs both, not one or the other.

Can I use autoplay if the video is muted?

Yes. WCAG 1.4.2 only restricts audio that plays automatically for more than three seconds. A muted autoplaying video is allowed, but you should still expose a pause control for moving content (2.2.2) and honor prefers-reduced-motion. Never autoplay with sound, because a screen reader user cannot easily find a control to stop it.

Are auto-generated captions enough to meet WCAG?

No, not unedited. Auto-generated captions are a useful first draft but routinely miss punctuation, speaker changes, technical terms, and non-speech sounds, so they do not meet 1.2.2 on their own. Treat them as a starting point and edit them for accuracy and timing before publishing.

Do I need a transcript if the video already has captions?

A transcript is strongly recommended and, when it also describes the important visual information, is often the cleanest way to satisfy 1.2.3 as a text alternative for prerecorded video. It is also the only format usable by deafblind users on a refreshable Braille display, and it is indexable for search. Captions cover hearing access during playback; a transcript covers text-alternative and Braille access.
