EquitableDocs Document Accessibility Guide

What automated checking can and cannot find

Overview

A clean automated result means no machine-found problems, not an accessible document

An accessibility tool ran over your document and returned a green result. That is good news, but it is a narrow piece of news. It means the tool looked for the problems it knows how to find by itself and did not find any. It does not mean a reader with a print disability can use your document.

Roughly two-thirds of the things that have to be checked in a PDF can be checked by software. The rest need a person to look and decide. So a clean automated result tells you that the part a machine can judge is in order. It says nothing about the part only a person can judge.

A "print disability" is any condition that makes standard printed or on-screen text hard to use, for example blindness, low vision, or dyslexia. A "screen reader" is software that reads a document aloud or sends it to a braille display.

The machine checks the plumbing, a person checks the meaning

Software is very good at checking whether the required parts of a document are present and correctly built. Is there a title? Does every image have alternative text attached? Are the paragraphs tagged? These are real, important checks, and a machine can answer them quickly and reliably.

What software cannot do is judge whether those parts mean the right thing. A title can exist and still not say what the document is. Alternative text can be present and still describe the wrong thing, or nothing useful at all. The machine confirms the parts are there. A person confirms the parts make sense to a reader.

A green light from a tool is the start of the check, not the end of it.

In depth

The 87, 47, and 2 split

The Matterhorn Protocol is a document from the PDF Association, a trade body for PDF software makers. It takes the rules of PDF/UA, the accessibility standard for PDF, and turns each rule into a specific failure condition that someone can actually test. For more on how the standards fit together, see the topic on PDF/UA and the Matterhorn Protocol.

Matterhorn version 1.1 lists 31 checkpoints and, inside them, 136 failure conditions. Each failure condition is one precise way a document can fall short. The protocol sorts these 136 conditions by who can realistically check them:

  • 87 are realistically machine-checkable. Software can open the file and give a clear yes or no.
  • 47 need human judgement. A person has to read the content and decide.
  • 2 have no defined test at all.

So when a tool reports a clean result, the most it can be telling you is that it found none of the 87 machine-checkable failure conditions. The 47 that need a person, and the 2 with no test, are still open. That is why two-thirds checked still leaves a third unchecked.
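The arithmetic behind "two-thirds" can be checked in a few lines. This is a minimal sketch that uses only the Matterhorn 1.1 counts quoted above; the names SPLIT, share, and still_open are illustrative and do not come from the protocol.

```python
# Matterhorn 1.1 failure-condition counts, as quoted in this guide.
SPLIT = {
    "machine-checkable": 87,
    "human judgement": 47,
    "no defined test": 2,
}

total = sum(SPLIT.values())  # 87 + 47 + 2


def share(count: int, whole: int) -> float:
    """Return count as a percentage of whole, rounded to one decimal place."""
    return round(100 * count / whole, 1)


shares = {group: share(count, total) for group, count in SPLIT.items()}

# Conditions a clean automated result says nothing about.
still_open = total - SPLIT["machine-checkable"]

for group, pct in shares.items():
    print(f"{group}: {SPLIT[group]} of {total} ({pct}%)")
print(f"Still open after a clean automated result: {still_open}")
```

The machine-checkable share comes to about 64 percent, which is where "roughly two-thirds" comes from, and 49 conditions remain open after any clean automated run.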

This split is best-practice guidance, not an absolute rule. A skilled auditor with advanced tools may automate part of a condition that is marked for human judgement, and some machine-marked conditions still deserve a human spot-check. The numbers describe the realistic boundary, not a fixed wall.

A title that exists versus a title that identifies the document

Here is the boundary in its simplest form, using the document title.

PDF/UA requires every document to have a title set in its metadata, the hidden information stored inside the file. Matterhorn checkpoint 06 covers this. Software can open your file and confirm that a title field is present and is not empty. That is a machine check, and it passes or fails cleanly.

Now suppose your document is a biology syllabus, and the title field contains the word "Document." The machine sees a non-empty string of text and reports success. A screen reader user opens the file and hears "Document," which tells them nothing. The title that should be there is something like "Introduction to Biology, Fall 2026." Only a person can look at the title and judge whether it clearly identifies the document. The machine checked that a title exists. A person has to check that the title means something.
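The gap between "a title exists" and "a title identifies the document" can be sketched in code. The example below is an assumption-laden illustration, not how any particular checker works: the function names and the PLACEHOLDER_TITLES list are hypothetical. The first function is the kind of check a machine can do. The second is only a heuristic that spots obvious placeholders; a result of False from it does not mean the title is good.

```python
from typing import Optional

# Hypothetical list of placeholder titles; tune it to your own documents.
PLACEHOLDER_TITLES = {"document", "untitled", "title", "new document"}

# File extensions that suggest the title is just a leftover file name.
FILE_EXTENSIONS = (".doc", ".docx", ".pdf", ".indd")


def title_is_present(title: Optional[str]) -> bool:
    """The machine check: a title exists and is not empty."""
    return bool(title and title.strip())


def looks_like_placeholder_title(title: str) -> bool:
    """Heuristic only: flag titles that are obviously placeholders.

    This cannot confirm that a title identifies the document.
    It can only catch a few well-known bad values.
    """
    cleaned = title.strip().lower()
    return cleaned in PLACEHOLDER_TITLES or cleaned.endswith(FILE_EXTENSIONS)


for candidate in ["Document", "Introduction to Biology, Fall 2026"]:
    print(
        candidate,
        "| present:", title_is_present(candidate),
        "| obvious placeholder:", looks_like_placeholder_title(candidate),
    )
```

Both sample titles pass the presence check. Only the heuristic separates them, and even that works only because "Document" happens to be on the list. A wrong but plausible title, such as the name of a different course, would pass both functions and still need a person to catch it.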

Alt text that is present versus alt text that is correct

The same boundary appears with images, and it is the example you will meet most often.

PDF/UA requires meaningful images to carry alternative text, a short written description that a screen reader reads aloud in place of the image. Matterhorn checkpoint 13 covers graphics. Software can confirm that a figure has alternative text attached to it. That is a machine check.

Suppose your document has a chart showing enrolment rising over five years, and its alternative text reads "image001.png," the original file name. The machine sees that alternative text is present and reports success. A screen reader user hears "image001.png" and learns nothing about the enrolment trend. The alt text is present but not correct. Only a person can read the description and judge whether it actually conveys what the chart shows. A machine cannot even tell whether the image is meaningful at all, or whether it is a decorative border that should have been marked as an artifact, the label for content a screen reader is meant to skip. That decision needs a human eye.
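The same idea applies to alt text. The sketch below is a hypothetical pre-review filter, not part of veraPDF, PAC, or any standard: the pattern, the word list, and the function names are assumptions made for illustration. It shows that a machine can cheaply flag alt text that is a file name or a single generic word, but it has no way to compare a description with the image itself.

```python
import re

# Assumed pattern for alt text that is really a file name, e.g. "image001.png".
FILENAME_PATTERN = re.compile(r"^\S+\.(png|jpe?g|gif|bmp|tiff?|svg)$", re.IGNORECASE)

# Hypothetical list of one-word descriptions that say nothing useful.
GENERIC_ALT = {"image", "picture", "graphic", "figure", "chart", "photo"}


def alt_text_is_present(alt):
    """The machine check: alternative text exists and is not empty."""
    return bool(alt and alt.strip())


def alt_text_looks_suspect(alt: str) -> bool:
    """Heuristic only: flag file names and single generic words.

    A False result does not mean the description is correct.
    """
    cleaned = alt.strip()
    return bool(FILENAME_PATTERN.match(cleaned)) or cleaned.lower() in GENERIC_ALT


samples = [
    "image001.png",
    "Line chart: enrolment rises each year over five years",
    "A photograph of a lecture hall",  # fluent, but wrong for an enrolment chart
]
for alt in samples:
    print(repr(alt), "| present:", alt_text_is_present(alt),
          "| suspect:", alt_text_looks_suspect(alt))
```

The third sample is the important one. It is present, it is not flagged, and it is wrong for the enrolment chart. Only a person looking at the image can see that.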

Where machine checking stops

Across the 47 human-judgement conditions, one idea comes up again and again: whether a tag fits the meaning of the content it wraps. A "tag" is a label in the document's structure that says what a piece of content is, for example a heading, a list, or a table. The machine can confirm that a tag exists and is the right kind of object. It cannot confirm that the tag belongs on that content.

Consider a heading. Software can confirm that a heading tag is present. It cannot tell whether the tag sits on a real heading like "Course Schedule," or on an ordinary sentence that happens to be in bold type. Nor can it tell when a real heading has been left as a plain paragraph and so is missing its tag. A reader navigating by headings will feel both mistakes at once: false headings that lead nowhere, and real headings that the screen reader never announces. The tag is present and valid to a machine, and wrong to a person.
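One way software helps here is by laying the heading tags out for a person to read, without trying to judge them. The sketch below assumes the structure tree has already been flattened into (tag, text) pairs in reading order; that input format, the sample data, and the function name are hypothetical, and for brevity it handles only the numbered tags H1 to H6.

```python
import re
from typing import List, Tuple

# Hypothetical flattened structure tree: (tag, text) pairs in reading order.
Element = Tuple[str, str]

HEADING_TAG = re.compile(r"^H([1-6])$")


def heading_outline(elements: List[Element]) -> List[str]:
    """Return one indented line per heading tag, for a person to review."""
    lines = []
    for tag, text in elements:
        match = HEADING_TAG.match(tag)
        if match:
            level = int(match.group(1))
            lines.append("  " * (level - 1) + f"{tag}: {text}")
    return lines


sample = [
    ("H1", "Introduction to Biology, Fall 2026"),
    ("H2", "Please bring your lab coat to every session."),  # bold sentence tagged as a heading
    ("P", "Course Schedule"),                                 # real heading left as a paragraph
    ("P", "Week 1 covers cell structure."),
]

outline = heading_outline(sample)
for line in outline:
    print(line)
```

The outline lists the lab-coat sentence as a heading and omits "Course Schedule" entirely. The code reports both facts faithfully and recognises neither as a mistake. A reviewer reading the outline sees the false heading at once, and finds the missing one only by comparing the outline with the page.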

The 2 conditions with no defined test sit even further out. The PDF Association could not write a single test that would reliably pass or fail every case, so a human auditor has to evaluate them from context. These are not failures of the tools. They are places where no general test exists.

Why veraPDF green and PAC green are different claims

Two tools come up constantly, and it helps to know exactly what each green result claims.

veraPDF is an open-source validation engine. By design, it implements only the machine-verifiable part of the standard. A veraPDF pass means "no machine-detectable PDF/UA failures." It does not check whether your alt text is accurate, whether your headings are real headings, or whether your reading order makes sense. veraPDF will happily pass a document whose every image has alt text reading "image001.png," because the required part is present and that is all veraPDF set out to check.

PAC, the PDF Accessibility Checker from axes4, runs the machine checks and then goes one step further: it gives a person tools to do the human checks. It includes a screen-reader preview that shows how the document would be read aloud, and a structure-tree view that shows the tag order a screen reader follows. PAC green is a stronger claim than veraPDF green, because PAC supports the human review as well as the machine pass. But even PAC cannot decide the 47 human-judgement conditions for you. It can only make them faster and clearer to evaluate. A person still has to look.

There is a further wrinkle. Different tools implement different amounts even of the 87 machine-checkable conditions. So a pass from one tool is not the same statement as a pass from another, and neither one, on its own, equals conformance. Conformance means the document triggers none of the 136 failure conditions, and some of those can only be judged by a person. When a report says "passed," the honest follow-up questions are: passed which checks, in which tool, and who checked the meaning?
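Those three follow-up questions can be built into how a result is recorded. The sketch below is one possible shape for such a record, not a format used by veraPDF or PAC; the class, its fields, the tool name "ExampleTool", the reviewer name, and the coverage figure of 60 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

MACHINE_CHECKABLE = 87  # Matterhorn 1.1 count quoted in this guide


@dataclass
class CheckReport:
    tool: str                              # in which tool?
    machine_conditions_run: int            # passed which checks?
    machine_failures: int                  # failures the tool found
    human_reviewer: Optional[str] = None   # who checked the meaning?


def honest_summary(report: CheckReport) -> str:
    """Describe a result without claiming more than it shows."""
    if report.machine_failures > 0:
        return f"{report.tool}: {report.machine_failures} machine-detected failure(s)"
    coverage = (
        f"{report.machine_conditions_run} of {MACHINE_CHECKABLE} "
        "machine-checkable conditions"
    )
    if report.human_reviewer is None:
        return f"{report.tool}: clean on {coverage}; meaning not yet reviewed by a person"
    return f"{report.tool}: clean on {coverage}; meaning reviewed by {report.human_reviewer}"


# Hypothetical figures, for illustration only.
print(honest_summary(CheckReport("ExampleTool", 60, 0)))
print(honest_summary(CheckReport("ExampleTool", 60, 0, human_reviewer="A. Reviewer")))
```

The design choice is that the word "passed" never appears alone. Every summary names the tool, states how much of the machine-checkable set it covered, and says whether a person has reviewed the meaning.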

Reference detail

The 136 failure conditions by who can check them

Group                            | Count | What it means for a result
Realistically machine-checkable  |    87 | Software can return a clear pass or fail without a person
Needs human judgement            |    47 | A person must read the content and decide
No defined test                  |     2 | No general test exists; a human auditor evaluates from context
Total failure conditions         |   136 | Across 31 checkpoints in Matterhorn 1.1

The split is best-practice guidance, not a rigid boundary. The Matterhorn 1.1 figures are 87 machine-checkable, 47 human-judgement, and 2 with no defined test.

The two paired examples side by side

Checkpoint               | What the machine can confirm               | What only a person can confirm
06 Metadata (the title)  | A title exists and is not empty            | The title clearly identifies the document
13 Graphics (alt text)   | Alternative text is present on the figure  | The alternative text correctly describes the figure

What a green result from each tool actually claims

Tool | What it runs | What a green result means | What it does not mean
veraPDF | The machine-verifiable subset only, by design | No machine-detectable PDF/UA failures | That the document is accessible
PAC (axes4) | The machine checks, plus a screen-reader preview and structure-tree view for the human checks | The machine checks pass and human-review tools are provided | That the document clears the 47 human-judgement conditions automatically

Different tools implement different amounts even of the 87 machine-checkable conditions, so a pass from one tool is not the same claim as a pass from another, and neither equals conformance.

The standards behind the split, by identifier

Standard | Identifier | Role in this topic
PDF 1.7 | ISO 32000-1:2008 | The PDF file format being checked
PDF/UA-1 | ISO 14289-1:2014 | The accessibility profile whose rules are tested
WCAG 2.0 | ISO/IEC 40500:2012 | The reader-outcome standard; the current WCAG version is 2.2 (2023), and the ISO mapping covers 2.0
Matterhorn Protocol 1.1 | PDF Association application note, 2021-04 | The source of the 31 checkpoints and 136 failure conditions

PDF/UA conformance covers WCAG for the PDF's own page content, but WCAG adds checks PDF/UA does not test, colour contrast being the main one. So even a full machine-and-human PDF/UA pass does not on its own confirm WCAG colour-contrast conformance.

Authoritative sources

  1. PDF Association, "The Matterhorn Protocol 1.1", 2021. https://pdfa.org/resource/the-matterhorn-protocol/
  2. PDF Association, "PDF/UA in a Nutshell", 2024. https://pdfa.org/resource/pdfua-in-a-nutshell/
  3. W3C, "Web Content Accessibility Guidelines (WCAG)", 2023. https://www.w3.org/WAI/standards-guidelines/wcag/
  4. W3C, "Understanding WCAG", 2023. https://www.w3.org/WAI/WCAG21/Understanding/
  5. veraPDF Consortium, "veraPDF Documentation", 2015. https://docs.verapdf.org/
  6. axes4, "PDF Accessibility Checker (PAC)", 2024. https://pac.pdf-accessibility.org/
  7. International Organization for Standardization, ISO 14289-1:2014 (PDF/UA-1), 2014.