EquitableDocs Document Accessibility Guide

PDF/UA and the Matterhorn Protocol

Overview

Four standards do four different jobs

Four standards sit behind an accessible PDF, and each does a different job.

ISO 32000 is the PDF file format itself. It defines how text, images, fonts, and structure are stored in a file. It makes a PDF possible, but it requires no accessibility features.

PDF/UA, which is ISO 14289, is the accessibility profile for PDF. It takes the raw format and says exactly how to build the file so it is usable by everyone: tag the headings, give images alternative text, set a real title, declare the language.

The Matterhorn Protocol turns the PDF/UA rules into testable failure conditions. Instead of "the document shall have a title," it asks "does a title exist, is it empty, is it just the file name," so a person or a piece of software can actually check.

WCAG, the Web Content Accessibility Guidelines from the W3C, defines what the result must achieve for a real reader. It is broader than PDF and applies to all digital content.

A house makes the relationship concrete. ISO 32000 is the catalogue of raw materials: what lumber, nails, and glass are. PDF/UA is the building code: every entrance must have a ramp, every staircase a handrail, every doorway wide enough for a wheelchair. The Matterhorn Protocol is the inspector's checklist: it turns the code into measured tests, a ramp slope of no more than one to twelve, checked with a level. WCAG is the livability standard: can a person in a wheelchair actually cook in the kitchen and reach the cabinets. It is not about houses specifically; it applies to apartments, offices, and parks alike.

A PDF can pass an automated check and still fail a reader

A university posts a PDF syllabus on its website. The automated checker returns a green light. The file has a title in its metadata, every image has alternative text, and every paragraph is tagged. But when a screen reader user opens it, the title reads as "Document." Every image is described as "image." The reading order jumps from the course name on page one to a footer on page three, then back to the description, because a floating text box confused the software that created the file.

The file passed the machine check. It failed the reader. Machine checking and accessibility are not the same thing.

In depth

Reading the chain from the file up to the reader

Think of the four standards as a chain, with the raw file at the bottom and the human reader at the top.

At the bottom is ISO 32000. This is the PDF file format itself. It defines how to store text, images, fonts, and structure in a file. It is like the grammar of a language. It tells you how to form a sentence, but it does not tell you what the sentence must mean, or whether it is true.

PDF/UA, which is ISO 14289, profiles the PDF format. It says: if you want to call this file accessible, you shall use the structure tree, you shall tag headings as headings, you shall provide alternative text for images, and so on. It is the accessibility dialect of the PDF language. PDF/UA is the PDF specific means of reaching an accessible outcome.

The Matterhorn Protocol takes every "shall" in PDF/UA and asks: how would you know if this requirement failed? It lists 31 checkpoints and 136 specific failure conditions. This is what makes the standard practical for auditors and software developers. Without Matterhorn, a standard that says "the document shall be readable" would be too vague to test. Matterhorn turns that into concrete questions about the file.

WCAG is the separate, broader standard that asks: can a real person with a disability actually use this content? PDF/UA conformance addresses much of what WCAG asks of a PDF's own page content, but WCAG adds a few things that PDF/UA does not test. Colour contrast is the main example. A PDF can be fully compliant with PDF/UA and still have text that is too faint for a reader with low vision to see, because WCAG's contrast thresholds are not part of the PDF/UA test set.

Imagine an invoice. ISO 32000 lets you place the words "Invoice," the table of items, and the total anywhere on the page, in any order, with no structure at all. PDF/UA says you must tag "Invoice" as a heading, the table as a table with header rows, and the total as a paragraph. Matterhorn says "if the table lacks header row tags, that is a failure condition." WCAG says "if the text is light grey on white, a person with low vision cannot read the total, even if the tags are perfect."

A PDF has two layers. The first layer is what you see on the screen. The second layer is the structure tree, sometimes called the tag tree. This tree is invisible to a sighted reader, but it is the only thing a screen reader sees. In the visual layer, a designer might place a sidebar note at the top of a page using a text box. In the tag tree, that text box might appear last, after the main article, because the software that created the PDF wrote it to the file in that order. A machine can verify that the tag tree exists and that every item is labeled. Only a person can verify that the order in the tag tree matches the logical reading order a human expects.
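A machine cannot decide what the logical reading order is, but it can point a reviewer at places where the tag tree and the page layout disagree. The sketch below is a minimal illustration under stated assumptions: the TaggedItem record, its fields, and the top to bottom comparison are hypothetical and are not part of PDF/UA, Matterhorn, or any tool's API. It assumes an earlier step has already extracted each tagged item's page number and vertical position.

```python
from dataclasses import dataclass


@dataclass
class TaggedItem:
    label: str   # short description shown to the reviewer
    page: int    # 1-based page number
    top: float   # distance from the top of the page (hypothetical unit: points)


def order_hints(items):
    """Return neighbouring pairs whose tag tree order runs against the
    visual page and top-to-bottom order. These are hints for a person,
    not failures: only a human can say what the logical order should be."""
    hints = []
    for prev, cur in zip(items, items[1:]):
        if (cur.page, cur.top) < (prev.page, prev.top):
            hints.append((prev.label, cur.label))
    return hints


# Tag tree order as written by the authoring tool (hypothetical data).
tree = [
    TaggedItem("Course name", 1, 72.0),
    TaggedItem("Footer", 3, 760.0),
    TaggedItem("Course description", 1, 120.0),
]

print(order_hints(tree))  # [('Footer', 'Course description')]
```

A heuristic like this produces false hints on multi-column pages and sidebars, which is exactly why the final judgement stays with a person.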

The three things PDF/UA actually constrains

PDF/UA constrains three things: the file itself, the software that reads it, and the assistive technology that translates it for the user.

The file must contain a correct structure tree, proper tags, readable fonts, and accurate metadata. A concrete file failure is a font that is embedded but subsetted incorrectly. The letter "e" might draw correctly on screen, but its internal character code might have no Unicode mapping. When a screen reader reads it, the word "electric" becomes a string of gibberish or silence. A machine can detect that the Unicode mapping is missing. That is a machine checkable failure.
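The missing mapping can be sketched in a few lines. This is an illustration only: the character codes, the to_unicode dictionary, and the function name are hypothetical stand-ins for what a real checker would read from the font's data, and no real PDF library is used.

```python
def unmapped_codes(used_codes, to_unicode):
    """Return the character codes that appear in the text but have no
    Unicode mapping, in sorted order."""
    return sorted(code for code in set(used_codes) if code not in to_unicode)


# Hypothetical subset font: code 0x03 draws the glyph "e" on screen,
# but it was never given a Unicode mapping.
to_unicode = {0x01: "l", 0x02: "c", 0x04: "t", 0x05: "r", 0x06: "i"}

# The codes that draw the word "electric" on the page.
used_codes = [0x03, 0x01, 0x03, 0x02, 0x04, 0x05, 0x06, 0x02]

print(unmapped_codes(used_codes, to_unicode))  # [3]

# What text extraction recovers: U+FFFD stands in for each unmapped code.
extracted = "".join(to_unicode.get(code, "\ufffd") for code in used_codes)
print(extracted)
```

The page still looks correct to a sighted reader, while the extracted text has lost both occurrences of "e".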

The reader software must respect those tags and present the content in the right order. A concrete software failure is a PDF viewer that ignores the tag tree entirely and reads text in the raw drawing order. The user hears the page footer before the page header because the footer was drawn last. This is not a failure of the file. It is a failure of the software. PDF/UA constrains the software by requiring that it support certain features, but the standard cannot force every software developer to comply.

The assistive technology must be able to extract the text and structure correctly. A concrete assistive technology failure is a screen reader that does not recognize table header attributes. The user can hear the table cells, but cannot ask "what is the header for this column?" because the screen reader never read the header tags. Again, this is outside the file itself.

If any one of these three fails, the reader with a print disability has a problem. A perfectly tagged file is still useless if the screen reader cannot read it. This is why PDF/UA is not just about the file. It is about the whole system.

Matterhorn turns "shall" into "here is how it fails"

PDF/UA is written in the language of standards. It says things like "the document shall contain a title." The Matterhorn Protocol turns that sentence into a testable question.

For example, PDF/UA says the document shall contain a title. Matterhorn checkpoint 06 asks: does the document's XMP metadata contain a title? Does that title clearly identify the document, or is it empty or just the file name? Each of these questions maps to a failure condition. Checkpoint 06 is one of 31 checkpoints.

Across all 31 checkpoints, Matterhorn defines 136 failure conditions. Of these, 87 are realistically machine checkable. Software can look at the file and give a clear yes or no. Another 47 need human judgement. A person must look at the content and decide. Two conditions have no defined test at all.
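The split can be checked with simple arithmetic. The short sketch below uses only the counts stated above; the dictionary keys are labels chosen for this illustration.

```python
# Counts as stated in the text.
counts = {"machine checkable": 87, "human judgement": 47, "no defined test": 2}

total = sum(counts.values())
assert total == 136  # the three groups account for every failure condition

for kind, n in counts.items():
    print(f"{kind}: {n} of {total} ({n / total:.0%})")
# machine checkable: 87 of 136 (64%)
# human judgement: 47 of 136 (35%)
# no defined test: 2 of 136 (1%)
```

Roughly a third of the conditions cannot be settled by software alone, so an automated pass covers at most about two thirds of the protocol.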

This split is best practice guidance, not an absolute rule. Some tools automate more than others, and some conditions that look machine checkable still need a person to verify the result. For example, a machine can check whether a required tag exists. But if the standard says the tag must be used correctly, the line between machine check and human judgement becomes blurry.

Where the machine stops and a person starts

A machine can check the plumbing. Only a person can check the meaning.

Consider checkpoint 06, the document title. Software can open the file and confirm that a title exists in the metadata. That is a plumbing check. But only a person can judge whether that title clearly identifies the document. If your document title is "Document," the machine sees a nonempty string and may report success. A human sees a syllabus that should be titled "Introduction to Biology, Fall 2026." The file passes the plumbing and fails the reader.
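The plumbing side of the title check can be sketched as follows. This is a minimal illustration, not how any particular checker works: the function name, the finding labels, and the placeholder word list are assumptions made for this example, and the title is passed in as a plain string rather than read from a real file.

```python
from pathlib import PurePath

# Illustrative list of placeholder titles (an assumption, not from any standard).
PLACEHOLDER_TITLES = {"document", "untitled", "title"}


def title_findings(title, file_name):
    """Return a list of mechanical problems with a document title.
    An empty list means only that the plumbing is present."""
    if title is None:
        return ["missing"]
    text = title.strip()
    if not text:
        return ["empty"]
    findings = []
    stem = PurePath(file_name).stem  # file name without its extension
    if text.lower() in (file_name.lower(), stem.lower()):
        findings.append("same as file name")
    if text.lower() in PLACEHOLDER_TITLES:
        findings.append("placeholder")
    return findings


print(title_findings("Document", "syllabus.pdf"))   # ['placeholder']
print(title_findings("syllabus", "syllabus.pdf"))   # ['same as file name']
print(title_findings("Introduction to Biology, Fall 2026", "syllabus.pdf"))  # []
```

An empty result does not mean the title is good. A word list can catch "Document," but only a person can confirm that the title clearly identifies this particular syllabus.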

Consider checkpoint 13, which governs figures and images. Software can confirm that a figure has alternative text attached. That is a plumbing check. But only a person can judge whether that text correctly describes the figure. A chart whose alternative text says "image" has alt text, but it fails the reader. A person must also judge whether the item is a meaningful figure at all, or just a decorative border that should have been marked as an artifact. A machine cannot know whether a line across the top of the page is a meaningful graphic or just visual decoration.
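Software can go slightly beyond "alt text exists" by flagging text that is obviously a placeholder, so that a reviewer looks there first. The sketch below is a hedged illustration: the word list, the file name pattern, and the function name are assumptions for this example and do not come from Matterhorn or from any tool.

```python
import re

# Illustrative generic words (an assumption, not an official list).
GENERIC = {"image", "picture", "graphic", "figure", "photo"}

# Matches text that is nothing but a file name with a common image extension.
FILE_NAME = re.compile(r"^\S+\.(png|jpe?g|gif|tiff?|bmp|svg)$", re.IGNORECASE)


def alt_text_needs_review(alt):
    """Return a reason string when the alt text looks like a placeholder,
    or None when no mechanical red flag was found."""
    text = alt.strip()
    if not text:
        return "empty"
    if text.lower() in GENERIC:
        return "generic word"
    if FILE_NAME.match(text):
        return "looks like a file name"
    return None


print(alt_text_needs_review("image"))          # generic word
print(alt_text_needs_review("image001.png"))   # looks like a file name
print(alt_text_needs_review("Bar chart of enrolment by year"))  # None
```

A result of None is not a pass. The text may still describe the wrong chart, and deciding that remains a human check.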

Consider checkpoint 14, which governs headings. Software can confirm that a heading tag exists. But only a person can judge whether the tag is on real heading text, or on a normal sentence that happens to be in bold type. A machine sees the tag "Heading" and is satisfied. A human sees the text "Click here for more information" tagged as a heading and knows it is wrong. Conversely, a human sees a real heading, "Course Schedule," formatted in bold but tagged as a normal paragraph, and knows the tag is missing.

Many of the 47 human judgement conditions come down to one central question: whether a tag is semantically appropriate for the content it wraps. A machine can see that a tag says "Heading." Only a person can decide whether that tag belongs on a real heading, or on a paragraph that just happens to be in large type. This question, "is this the right tag for the meaning," is why human checking never goes away.

Why veraPDF green and PAC green do not mean the same thing

Different tools implement different subsets of the 87 machine checkable conditions. This means a pass from one tool is not the same as a pass from another.

veraPDF implements only the machine verifiable subset by design. It is an open source validation engine. A veraPDF pass means "no machine detectable PDF/UA failures." It does not mean the document is accessible. It does not check whether your alternative text is accurate, whether your headings are real headings, or whether your reading order makes sense. veraPDF will pass a document where every image has alt text that reads "image001.png" because the plumbing is present.

PAC runs the machine checks and then gives a person tools for the human checks: a screen reader preview, a structure tree view, and other inspections. PAC green is a stronger claim than veraPDF green, but even PAC cannot automate the 47 human judgement conditions. It can only make them easier to evaluate. PAC might flag the "image001.png" alt text in its human review panel, though the machine check tab may still show green.

Neither tool, on its own, equals conformance. Conformance requires that none of the 136 failure conditions occurs, and some of those can only be judged by a person. When you see a report that says "passed," you must ask: passed which checks, in which tool, and who checked the meaning?
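The question "passed which checks" can be made concrete with set arithmetic. The sketch below uses placeholder identifiers only. It makes no claim about which conditions veraPDF, PAC, or any other tool actually implements, and the function name is hypothetical.

```python
def remaining_conditions(all_conditions, tool_checked, human_reviewed):
    """Return the conditions that neither the tool nor a person has
    looked at yet, in sorted order."""
    return sorted(set(all_conditions) - set(tool_checked) - set(human_reviewed))


# Placeholder identifiers, not real Matterhorn conditions or real tool coverage.
all_conditions = {"c1", "c2", "c3", "c4", "c5"}
tool_a_checked = {"c1", "c2"}         # one tool that reported "passed"
tool_b_checked = {"c1", "c2", "c3"}   # a different tool that also "passed"
human_reviewed = {"c4"}

print(remaining_conditions(all_conditions, tool_a_checked, human_reviewed))  # ['c3', 'c5']
print(remaining_conditions(all_conditions, tool_b_checked, human_reviewed))  # ['c5']
```

Both tools reported a pass, yet they leave different conditions unexamined. A conformance claim is only as strong as the union of what was actually checked.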

Reference detail

The four standards by their identifiers

| Standard | Identifier | Role |
|---|---|---|
| PDF 1.7 | ISO 32000-1:2008 | The file format itself |
| PDF/UA-1 | ISO 14289-1 | The accessibility profile for PDF |
| WCAG 2.0 | ISO/IEC 40500:2012 | The user outcome standard |
| Matterhorn Protocol 1.1 | PDF Association application note, April 2021 | Testable failure conditions for PDF/UA-1 |

Matterhorn Protocol 1.1, published April 2021, is not an ISO standard. It is an application note published by the PDF Association.

The 31 checkpoints, grouped by what they govern

| Group | Checkpoints | What they govern |
|---|---|---|
| Tagged content and structure | 01 Real content tagged, 02 Role mapping, 09 Appropriate tags, 20 Optional content, 30 XObjects | Whether content is tagged, with valid mapped tags in a sound structure tree |
| Document setup, metadata, language | 06 Metadata, 07 Dictionary, 11 Declared natural language, 27 Navigation | The document title and metadata, the declared language, and bookmarks |
| Text and fonts | 10 Character mappings, 12 Stretchable characters, 31 Fonts | Recoverable text, embedded fonts, and Unicode mapping |
| Content types | 13 Graphics, 14 Headings, 15 Tables, 16 Lists, 17 Mathematical expressions, 18 Page headers and footers, 19 Notes and references | The structure elements that carry meaning |
| Interactive content and media | 21 Embedded files, 22 Article threads, 23 Digital signatures, 24 Non-interactive forms, 25 XFA, 28 Annotations, 29 Actions | Forms, links and annotations, actions, attachments, and signatures |
| Perception and safety | 03 Flickering, 04 Colour and contrast, 05 Sound, 08 OCR validation, 26 Security | Flashing, colour, audio, scanned text, and security that must not block assistive technology |

The checkpoints cited most often in audits are 13 Graphics, 14 Headings, 15 Tables, 16 Lists, and 31 Fonts. Checkpoint 09, Appropriate tags, governs the soundness of the tag structure, including how tables and lists are nested. Checkpoint 04, Colour and contrast, exists in Matterhorn, but contrast itself is judged largely against the WCAG rules rather than by the PDF/UA machine checks.

How to read a Matterhorn failure condition reference

Each failure condition in Matterhorn 1.1 can be referenced in two ways.

The index form is a simple number: for example, 01-006. This means checkpoint 01, failure condition 006.

The clause reference form ties the failure directly to PDF/UA-1: for example, UA1:7.1-1. This means PDF/UA-1, clause 7.1, paragraph 1.
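Both reference forms are regular enough to parse mechanically. The sketch below is a minimal example whose patterns are inferred only from the two examples given here, 01-006 and UA1:7.1-1. Real references may take shapes these patterns do not cover, and the function name and returned keys are hypothetical.

```python
import re

# Index form: two digit checkpoint, hyphen, three digit condition number.
INDEX_FORM = re.compile(r"^(\d{2})-(\d{3})$")

# Clause form: "UA1:", a dotted clause number, hyphen, paragraph number.
CLAUSE_FORM = re.compile(r"^UA1:(\d+(?:\.\d+)*)-(\d+)$")


def parse_reference(ref):
    """Split a failure condition reference into its parts."""
    match = INDEX_FORM.match(ref)
    if match:
        return {
            "form": "index",
            "checkpoint": int(match.group(1)),
            "condition": int(match.group(2)),
        }
    match = CLAUSE_FORM.match(ref)
    if match:
        return {
            "form": "clause",
            "standard": "PDF/UA-1",
            "clause": match.group(1),
            "paragraph": int(match.group(2)),
        }
    raise ValueError(f"unrecognised reference: {ref!r}")


print(parse_reference("01-006"))
print(parse_reference("UA1:7.1-1"))
```

The clause number is kept as a string because "7.1" is a section label, not a decimal value.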

Each condition also carries a scope and a marking. The scope tells you where to look: the whole document, a single page, or a single object. The marking tells you whether the condition is realistically machine checkable, needs human judgement, or has no defined test.

The machine or human marking is guidance, not a rigid boundary. A skilled auditor with advanced tools may automate some conditions that are marked for human judgement, and some machine marked conditions still benefit from a spot check.
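An audit worksheet might record scope and marking alongside each condition so that the human workload is easy to list. The sketch below is one possible shape, not a schema defined by Matterhorn: the class names, the field names, and the sample rows, including their identifiers, are placeholders invented for this illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    DOCUMENT = "document"
    PAGE = "page"
    OBJECT = "object"


class Marking(Enum):
    MACHINE = "machine"
    HUMAN = "human"
    NO_TEST = "no defined test"


@dataclass(frozen=True)
class FailureCondition:
    index: str        # placeholder identifier in this sketch
    description: str
    scope: Scope
    marking: Marking


def needs_person(conditions):
    """Return the identifiers of conditions that software alone cannot settle."""
    return [c.index for c in conditions if c.marking is not Marking.MACHINE]


# Placeholder rows, loosely modelled on the title example in this guide.
worksheet = [
    FailureCondition("X-001", "Title is absent", Scope.DOCUMENT, Marking.MACHINE),
    FailureCondition("X-002", "Title does not identify the document", Scope.DOCUMENT, Marking.HUMAN),
    FailureCondition("X-003", "Figure alt text is inaccurate", Scope.OBJECT, Marking.HUMAN),
]

print(needs_person(worksheet))  # ['X-002', 'X-003']
```

Because the marking is guidance, a team could move a row from HUMAN to MACHINE once it trusts a tool for that condition, or keep a spot check on a MACHINE row.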

The conditions that can never be automated

Two of the 136 failure conditions have no defined test. One relates to digital signatures. The other relates to navigation.

These two conditions exist because the PDF Association could not define a single test that would reliably pass or fail every case. A human auditor must evaluate them based on context and intent. For example, whether a digital signature interface is accessible depends on how the specific signature technology interacts with the reader software, and no single file based test can cover all implementations.

For the other 47 human judgement conditions, the common thread is semantic appropriateness. A machine can confirm that a tag exists and is syntactically valid. It cannot confirm that the tag matches the meaning of the content it wraps. This is why the question "is this the right tag for the meaning" recurs throughout the protocol.

Authoritative sources


1. PDF Association, "The Matterhorn Protocol 1.1," 2021. https://pdfa.org/resource/the-matterhorn-protocol/
2. International Organization for Standardization, ISO 14289-1:2014 (PDF/UA-1), 2014.
3. International Organization for Standardization, ISO 32000-1:2008 (PDF 1.7), 2008.
4. W3C, "Web Content Accessibility Guidelines (WCAG) 2.0," 2008. https://www.w3.org/TR/WCAG20/
5. veraPDF Consortium, "veraPDF Documentation," 2015. https://docs.verapdf.org/
6. axes4, "PDF Accessibility Checker (PAC)," 2024. https://pac.pdf-accessibility.org/
7. PDF Association, "PDF/UA," 2024. https://www.pdfa.org/pdfua/