Accessible documents & PDF

A PDF carries two layers: the visible page you see, and an invisible tag tree underneath that tells assistive technology what each piece of the page actually is — a heading, a paragraph, a list, a figure, a table cell — and in what order to read them. The visual layer is for sighted users; the tag tree is the document for everyone else. PDF/UA (ISO 14289) is the standard that defines what that tag tree must contain, and it lines up with the same WCAG criteria you apply to web pages. When the tags are missing, wrong, or describe a picture instead of text, the document falls apart for anyone not reading it with their eyes.

This lesson works through the four defects behind the majority of real-world document failures: a PDF with no tags at all, images with no alternative text, a long document with no real heading structure, and a scanned page that is only a picture of text. Each one is fixed not by patching the PDF by hand but by authoring the source correctly — in Word, InDesign, or PowerPoint — so a proper tagged PDF exports from it.

What you’ll learn

  • How to export a fully tagged PDF with a correct reading order instead of an untagged jumble.
  • How to give every informative image alt text and mark decorative ones as artifacts.
  • How to build real H1–H6 heading tags and bookmarks from heading styles in the source.
  • Why a scanned, image-only PDF needs OCR plus tagging, or an accessible alternative, before anyone using assistive technology can read it.

Standards this lesson maps to
Standard | Criterion | Level | What it requires
PDF/UA-1 | ISO 14289-1 | - | A PDF must be fully tagged, with a logical structure tree and reading order, real text, and described images.
PDF/UA-2 | ISO 14289-2 (PDF 2.0) | - | The updated edition for PDF 2.0, with refined tag semantics, namespaces, and associated files.
WCAG 2.2 | 1.1.1 Non-text Content | A | Informative images carry a text alternative; decorative images are marked as artifacts so they’re ignored.
WCAG 2.2 | 1.3.1 Info and Relationships | A | Headings (including their nesting), lists, tables, and reading order are conveyed by tags, not by visual layout alone.
WCAG 2.2 | 2.4.1 Bypass Blocks | A | Heading tags and bookmarks let users skip ahead and navigate a long document by structure.
WCAG 2.2 | 2.4.6 Headings and Labels | AA | Headings describe the topic or purpose of their section.
WCAG 2.2 | 1.4.5 Images of Text | AA | Text is real, selectable text, not a picture of text, so it can be read, resized, and reflowed.
EN 301 549 | 10 Non-web documents | (incorporates WCAG) | European harmonised standard; clause 10 applies the WCAG A/AA set to documents such as PDF.
Section 508 | 502 / 504 | (incorporates WCAG A & AA) | US federal electronic documents must meet WCAG 2.0 Level A and AA, including tagging and alt text.
ADA Title II | WCAG 2.1 AA (DOJ rule) | AA | US state/local government web content and documents must conform to WCAG 2.1 AA.

The four problems we’ll fix

Each section below isolates one common document defect. Because a PDF isn’t HTML, the Bad and Good examples show the document’s underlying structure (the PDF logical tag tree, or the source settings that export to a tagged PDF) as illustrative pseudo-markup rather than running code. For every issue you get a plain-language statement of the problem, those examples, a Code listing, and an ordered fix.

Untagged PDF with no structure

PDF/UA-1 · WCAG 2.2 1.3.1 (A) · EN 301 549 · Section 508

An untagged PDF has no logical structure tree at all. The text is still there visually, but nothing tells assistive technology which run of characters is a heading, which is a paragraph, where a list starts, or what order to read the page in. A screen reader falls back to guessing reading order from the position of content on the page, which in a multi-column or boxed layout produces an unordered jumble — a footer read before a heading, two columns interleaved line-by-line. PDF/UA-1 requires a complete tag tree, and the fix is to export a tagged PDF from the source rather than printing a flat one.

Bad

There is no structure tree — the page is a bag of positioned text and graphics with no roles and no reading order. This is what “Print to PDF” or an untagged export produces.

untagged-tag-tree.txt
Tags panel: (No Tags Available)

Document content is only loose page objects:
  /Page
    BT … "Quarterly Report"  … ET   <!-- looks like a title, but no tag -->
    BT … "Revenue rose 8%…"  … ET   <!-- looks like a paragraph, no tag -->
    BT … "Page 1 of 12"      … ET   <!-- footer, may be read first -->

Reading order: inferred from x/y position → unreliable
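
To see why position-based inference fails, here is a minimal Python sketch. It assumes a simplified model, not a real PDF parser: each text run is reduced to a hypothetical `TextRun` with an `x` and a `y` measured from the top-left of the page, and the fallback heuristic is assumed to read top-to-bottom, then left-to-right. Real assistive technology uses more elaborate heuristics, but the failure mode on a two-column page is the same.

```python
from typing import NamedTuple


class TextRun(NamedTuple):
    """Hypothetical stand-in for one positioned run of text on a page."""
    text: str
    x: float  # left edge, measured from the left of the page
    y: float  # baseline, measured from the top of the page (a simplification)


def position_order(runs):
    """Naive fallback: read top-to-bottom, then left-to-right."""
    return [r.text for r in sorted(runs, key=lambda r: (r.y, r.x))]


# Two columns, listed column by column, which is the intended reading order.
runs = [
    TextRun("Left col, line 1", x=50, y=100),
    TextRun("Left col, line 2", x=50, y=115),
    TextRun("Right col, line 1", x=320, y=100),
    TextRun("Right col, line 2", x=320, y=115),
]

intended = [r.text for r in runs]
guessed = position_order(runs)

# The guess interleaves the two columns line by line.
print(guessed)
```

The guessed order alternates between the left and right columns, which is the interleaving described above. A tag tree removes the guesswork because the order is stored explicitly.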

Good

A tagged export builds a logical structure tree: a Document root with real H1, P, and L tags in the intended reading order. Now a screen reader reads heading, then body, then list — in order.

tagged-tag-tree.txt
<Document>
  <H1>Quarterly Report</H1>
  <P>Revenue rose 8% over the previous quarter.</P>
  <L>
    <LI><LBody>North region: up 12%</LBody></LI>
    <LI><LBody>South region: flat</LBody></LI>
  </L>
</Document>
<!-- "Page 1 of 12" is an Artifact, outside the reading order -->
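
As a rough illustration of how a reader consumes that tree, the Python sketch below walks a modelled copy of it depth-first and collects the text in order. The `Tag` class and `reading_order` function are hypothetical names for this example only. The model also folds the page-number artifact into the tree to keep things short; in a real PDF, artifacts are marked in the page content and sit outside the structure tree.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """Hypothetical model of one structure element (not a real PDF API)."""
    role: str                      # e.g. "Document", "H1", "P", "L", "LI", "LBody"
    text: str = ""
    children: list = field(default_factory=list)


def reading_order(tag):
    """Depth-first walk: tag order is reading order; artifacts are skipped."""
    if tag.role == "Artifact":
        return []
    out = [(tag.role, tag.text)] if tag.text else []
    for child in tag.children:
        out.extend(reading_order(child))
    return out


doc = Tag("Document", children=[
    Tag("H1", "Quarterly Report"),
    Tag("P", "Revenue rose 8% over the previous quarter."),
    Tag("L", children=[
        Tag("LI", children=[Tag("LBody", "North region: up 12%")]),
        Tag("LI", children=[Tag("LBody", "South region: flat")]),
    ]),
    Tag("Artifact", "Page 1 of 12"),  # modelled in-tree for brevity only
])

for role, text in reading_order(doc):
    print(f"{role}: {text}")
```

The walk yields heading, paragraph, then the two list items, and never reaches the page number.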

Code

You almost never write tags by hand — you author the source so the export tags it. Use real styles in Word and turn on tagged export; the structure carries across automatically.

source-to-tagged-pdf.txt
Word source (real styles, not manual formatting):
  Heading 1 → "Quarterly Report"
  Normal    → "Revenue rose 8%…"
  List Bullet → "North region…", "South region…"

Export (File ▸ Save As ▸ PDF):
  Word for Mac     → "Best for electronic distribution and accessibility"
  Word for Windows → Options…
    ☑ Document structure tags for accessibility   (tagged PDF)
    ☑ Create bookmarks using: Headings

InDesign: set the Articles panel order, then
  File ▸ Export ▸ Adobe PDF (Print) ▸ ☑ Create Tagged PDF

How to fix

  1. Author the source with real paragraph and list styles — never with manual spacing and font changes that carry no meaning.
  2. Export a tagged PDF: in Word choose the accessibility-preserving option; in InDesign tick “Create Tagged PDF”. Avoid plain “Print to PDF”.
  3. Open the Tags panel (or run a checker) and confirm the document is tagged and not “No Tags Available”.
  4. Verify the reading order in the tag tree matches the intended order, and that page numbers, headers, and footers are marked as artifacts.
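
For a quick first pass over many files before opening each one, step 3 can be roughly approximated in Python. This is a heuristic sketch, not a conformance check: it only searches the raw bytes for the `/StructTreeRoot` and `/MarkInfo … /Marked true` catalog entries that a tagged PDF carries. The function name `tagging_hints` is hypothetical. A negative result is not proof, because the catalog may be stored in a compressed object stream or `/MarkInfo` may be an indirect reference; a positive result says nothing about whether the tags are correct.

```python
import re

# Matches an inline MarkInfo dictionary such as: /MarkInfo << /Marked true >>
MARKED = re.compile(rb"/MarkInfo\s*<<[^>]*/Marked\s+true")


def tagging_hints(pdf_bytes: bytes) -> dict:
    """Heuristic scan of raw PDF bytes for tagged-PDF markers."""
    return {
        "has_struct_tree": b"/StructTreeRoot" in pdf_bytes,
        "marked": MARKED.search(pdf_bytes) is not None,
    }


# Synthetic catalog fragments for illustration (not complete PDF files).
tagged_sample = (
    b"1 0 obj << /Type /Catalog /MarkInfo << /Marked true >> "
    b"/StructTreeRoot 5 0 R >> endobj"
)
untagged_sample = b"1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj"

print(tagging_hints(tagged_sample))
print(tagging_hints(untagged_sample))

# With a real file, read it in binary mode first:
#   with open("report.pdf", "rb") as fh:   # hypothetical file name
#       print(tagging_hints(fh.read()))
```

Treat this as triage only. The Tags panel or a PDF/UA checker remains the real test, and reading order still has to be verified by a person.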

Images in the document with no alternative text

PDF/UA-1 · WCAG 2.2 1.1.1 (A) · EN 301 549 · ADA Title II

A figure tag with no alternative text is silent. When a chart, logo, photo, or infographic is placed in a document and exported as a Figure tag without an /Alt entry, a screen reader either skips it or announces a bare “graphic” — and any information carried only by that image is lost. Two cases need handling separately: an informative image needs alt text that conveys its meaning, while a decorative image (a divider, a background flourish) should be marked as an artifact so it is removed from the reading order entirely rather than announced as empty.

Bad

The image is tagged as a figure but has no alternative text. The chart’s data is conveyed only visually, so it is unavailable to anyone using a screen reader.

figure-no-alt.txt
<Figure>
  <!-- placed image of a bar chart, no /Alt entry -->
</Figure>

Screen reader announces: "graphic"   <!-- no meaning conveyed -->

Good

The figure carries an Alt attribute that conveys what the image communicates. For a data chart, the alt summarises the takeaway; the full data can also be offered as a real table nearby.

figure-with-alt.txt
<Figure Alt="Bar chart: revenue rose from $1.2M in Q1
              to $1.6M in Q4, up 33% across the year.">
  <!-- placed image of the bar chart -->
</Figure>

Screen reader announces the full alt text.

Code

A purely decorative image must be taken out of the reading order, not given empty alt. Tag it as an Artifact so assistive technology ignores it. In the source you set both from the image’s alt text pane (“View Alt Text” or “Edit Alt Text”, depending on the Office version), which also offers “Mark as decorative”.

decorative-artifact.txt
Decorative divider → Artifact (removed from reading order):
  <Artifact> … decorative rule … </Artifact>

In the source (Word / PowerPoint):
  Right-click image ▸ "View Alt Text" ("Edit Alt Text" in older versions)
    • Informative → type a concise description
    • Decorative  → ☑ "Mark as decorative"

InDesign: Object ▸ Object Export Options
  • Alt Text tab ▸ Alt Text Source: Custom → description, or
  • Tagged PDF tab ▸ Apply Tag: Artifact, for decoration

How to fix

  1. Decide for each image whether it is informative or decorative — that single choice drives everything else.
  2. Give every informative figure alt text that conveys its meaning, not its file name; summarise charts and offer the underlying data as real text or a table.
  3. Mark decorative images as artifacts (or “decorative” in the source) so they’re removed from the reading order rather than announced as empty.
  4. Don’t leave alt text as the image’s filename or “image” — that’s noise, not information.
  5. Run an accessibility check and confirm no figure is reported as missing alternate text.
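
The checks in the list above can be sketched in Python against a simplified model of the document's figures. Everything here is an assumption for illustration: the `Figure` class, the `alt_problem` function, and the short list of placeholder words are hypothetical, and a real checker inspects the PDF's Figure tags and their /Alt entries directly.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Figure:
    """Hypothetical model of one image placed in the document."""
    name: str
    alt: Optional[str] = None
    decorative: bool = False  # True means it is exported as an Artifact


# Alt text that is nothing but a file name, e.g. "IMG_0042.jpg".
FILENAME = re.compile(r"^\S+\.(png|jpe?g|gif|svg|tiff?|bmp)$", re.IGNORECASE)
PLACEHOLDERS = {"image", "picture", "graphic", "photo", "figure"}


def alt_problem(fig: Figure) -> Optional[str]:
    """Return a description of the alt-text problem, or None if there is none."""
    if fig.decorative:
        return None  # artifacts are outside the reading order; no alt needed
    alt = (fig.alt or "").strip()
    if not alt:
        return "missing alt text"
    if FILENAME.match(alt):
        return "alt text is a file name"
    if alt.lower() in PLACEHOLDERS:
        return "alt text is a placeholder"
    return None


figures = [
    Figure("chart"),
    Figure("logo", alt="IMG_0042.jpg"),
    Figure("photo", alt="image"),
    Figure("divider", decorative=True),
    Figure("chart-ok", alt="Bar chart: revenue rose from $1.2M in Q1 "
                           "to $1.6M in Q4, up 33% across the year."),
]

for fig in figures:
    print(fig.name, "->", alt_problem(fig))
```

A script like this can only catch mechanical failures. Whether the alt text actually conveys what the image communicates still needs a human reader.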

No real heading structure or bookmarks

PDF/UA-1 · WCAG 2.2 1.3.1 (A), 2.4.1 (A), 2.4.6 (AA) · EN 301 549

In a long document, headings are how everyone navigates, but only if they are real headings. Text that is merely made big and bold looks like a heading yet exports as an ordinary P tag, so a screen reader user can’t pull up a list of headings or jump between sections, and there are no bookmarks to move through the document. The page becomes one undifferentiated scroll. The fix is to apply real heading styles in the source so they export as nested H1–H6 tags and generate a bookmark tree, and to keep the levels in order without skipping.

Bad

What looks like a section heading is just a large, bold paragraph. It exports as a plain P, so it isn’t in the headings list and produces no bookmark.

fake-headings.txt
<P><Span style="18pt bold">1. Introduction</Span></P>
<P>Body text…</P>
<P><Span style="18pt bold">2. Methodology</Span></P>

Bookmarks panel: (empty)
Headings list:  (no headings found)

Good

Real heading styles export as nested heading tags, in order, with no skipped levels. The structure now drives both the headings list and the bookmark tree.

real-headings.txt
<H1>Annual Accessibility Report</H1>
  <H2>1. Introduction</H2>
    <P>Body text…</P>
  <H2>2. Methodology</H2>
    <H3>2.1 Sampling</H3>
    <H3>2.2 Tools</H3>

<!-- One H1 per document; no jump from H1 to H3 -->

Code

You get this from the source by using its heading styles and enabling bookmark generation on export — never by manually enlarging text.

headings-and-bookmarks.txt
Word source:
  Apply "Heading 1", "Heading 2", "Heading 3" styles
  (Home ▸ Styles) — not bold + bigger font.

Export to PDF:
  ☑ Create bookmarks using: Headings
  ☑ Document structure tags for accessibility

Result in the PDF:
  • Heading styles → H1…H6 structure tags
  • Bookmarks panel mirrors the heading outline (2.4.1)

How to fix

  1. Apply real heading styles in the source for every section title; don’t fake a heading with large bold text.
  2. Keep the levels nested and in order: one H1 for the document title, then H2, H3 without skipping a level (1.3.1, PDF/UA-1).
  3. On export, enable “Create bookmarks using headings” so a long document gains a navigable bookmark tree (2.4.1).
  4. Write headings that actually describe their section (2.4.6), then check the tag tree: the heading list and bookmarks should match the visible outline.
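
The nesting rules in the list above are mechanical enough to sketch in Python. The example assumes the tag roles have already been read out of the tag tree in reading order as plain strings; the function names are hypothetical, and the single-H1 rule reflects this lesson's convention.

```python
def heading_levels(roles):
    """Extract heading levels (1-6) from tag roles such as 'H2'; skip the rest."""
    return [
        int(r[1]) for r in roles
        if len(r) == 2 and r[0] == "H" and r[1] in "123456"
    ]


def heading_problems(roles):
    """Return a list of heading-structure problems; empty means none found."""
    levels = heading_levels(roles)
    if not levels:
        return ["no heading tags found"]
    problems = []
    if levels[0] != 1:
        problems.append(f"first heading is H{levels[0]}, expected H1")
    if levels.count(1) > 1:
        problems.append("more than one H1")
    # Going deeper by more than one level at a time skips a level.
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            problems.append(f"H{prev} is followed by H{cur} (skipped level)")
    return problems


good = ["H1", "H2", "P", "H2", "H3", "H3"]
fake = ["P", "P", "P"]          # big bold paragraphs, no real headings
skipped = ["H1", "P", "H3"]

print(heading_problems(good))
print(heading_problems(fake))
print(heading_problems(skipped))
```

Moving back up the outline (for example H3 followed by H2) is allowed and is not flagged. The check says nothing about whether a heading describes its section, which remains an editorial judgement.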

Scanned, image-only PDF

WCAG 2.2 1.4.5 (AA), 1.1.1 (A) · PDF/UA-1 · EN 301 549

A scanned document is a photograph of paper: each page is one big image, and the “text” on it is just pixels. There is no selectable, searchable, taggable text underneath at all, so a screen reader finds nothing to read, the content can’t be resized or reflowed without blurring, and it fails both Images of Text (1.4.5) and Non-text Content (1.1.1). Tagging alone can’t rescue it because there is no text to tag. The fix is to recover real text with OCR and then tag the result — or, where the scan is poor, to provide an accessible HTML or properly tagged alternative.

Bad

The whole page is a single scanned image. Selecting text selects nothing; a screen reader has only an undescribed graphic, so the entire document is unreadable.

image-only-scan.txt
/Page
  /XObject /Image  (full-page scan, e.g. 2480×3508 px)

No text layer. Select-all selects nothing.
Tag tree (if any): <Figure> with no /Alt → "graphic"
Reflow: unavailable (it's a picture)

Good

OCR recognises the characters and adds a real text layer, which is then tagged into a proper structure. Now the text is selectable, searchable, reflowable, and read aloud in order.

ocr-then-tagged.txt
After OCR (recognised text layer) + tagging:

<Document>
  <H1>Notice of Public Meeting</H1>
  <P>The council will meet on 14 March at 7 p.m.</P>
</Document>

Text is now selectable, searchable, and reflowable.

Code

Run recognition, fix what OCR got wrong, then tag — or give people a clean accessible alternative. Where the original source still exists, re-exporting a tagged PDF from it beats OCR every time.

recover-real-text.txt
Acrobat: Scan & OCR ▸ "Recognize Text"
  → adds a searchable text layer to the scan
Then: All tools ▸ Prepare for accessibility ▸ Autotag,
  and fix reading order + alt text by hand.

Always proofread OCR output (it misreads characters).

Better alternative when available:
  • Re-export a tagged PDF from the original source, or
  • Publish an accessible HTML version of the content.

How to fix

  1. If the original source still exists, re-export a tagged PDF from it instead of working from the scan — that gives the cleanest real text.
  2. Otherwise run OCR to add a real text layer, then proofread it: OCR routinely misreads characters and merges columns.
  3. Tag the recognised document and correct the reading order, headings, and alt text by hand.
  4. Where a scan is too poor to OCR reliably, publish an accessible HTML or freshly tagged alternative and link people to it.
  5. Confirm the result has selectable text and resizes without turning to mush (1.4.5).
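
Proofreading OCR output is manual work, but a small script can point a reviewer at likely trouble spots. The Python sketch below is one simple heuristic, offered as an assumption rather than an established method: it flags tokens that mix letters and digits, a common sign of confusions such as "0" for "o" or "1" for "l". It produces false positives on legitimate tokens such as "Q4" and misses errors that stay purely alphabetic (for example "rn" read as "m"), so it narrows the search without replacing a full read-through.

```python
import re

TOKEN = re.compile(r"[A-Za-z0-9]+")


def suspicious_tokens(text: str) -> list:
    """Return tokens that contain both letters and digits, in order of appearance."""
    flagged = []
    for tok in TOKEN.findall(text):
        has_alpha = any(c.isalpha() for c in tok)
        has_digit = any(c.isdigit() for c in tok)
        if has_alpha and has_digit:
            flagged.append(tok)
    return flagged


# Hypothetical OCR output with one misread character ("0" for "o").
ocr_text = "The c0uncil will meet on 14 March at 7 p.m."
clean_text = "The council will meet on 14 March at 7 p.m."

print(suspicious_tokens(ocr_text))
print(suspicious_tokens(clean_text))
```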

Recap

  • Tagged: export a fully tagged PDF from the source, so a logical structure tree describes every paragraph, list, and table (PDF/UA-1, 1.3.1).
  • Reading order: check that the tag order matches the intended reading order, not the order shapes happen to sit on the page (1.3.1).
  • Alt: give every informative figure a text alternative and mark decorative images as artifacts (1.1.1).
  • Headings: use real heading styles so they become H1–H6 tags and bookmarks; never fake them with big bold text (1.3.1, 2.4.1, 2.4.6).
  • Real text: ship selectable, taggable text; OCR and tag any scan, or provide an accessible alternative (1.4.5, 1.1.1).

The same structural fixes address the matching requirements in PDF/UA, WCAG, EN 301 549, Section 508, and ADA Title II at once: author the source correctly and the exported tagged PDF covers the tagging, alt text, heading, and real-text requirements they share.