STEM textbooks are the hardest documents to remediate. Standard tools cannot handle equations, multi-level tables, or multi-column research layouts. Our eight-stage pipeline is designed specifically for this content.
Standard auto-tagging tools fail on STEM content. Equations become unreadable strings. Complex tables lose their header associations. Multi-column layouts break reading order. Students using screen readers hear noise instead of mathematics.
A screen reader cannot interpret an image of an equation. Every equation needs three things: plain-language alt text, MathML markup for braille displays, and the original LaTeX source for re-use.
STEM tables often have multi-level headers, merged cells, and spanning rows. Without correct header-cell associations, a screen reader cannot tell a student which column or row a value belongs to.
Research papers, lab manuals, and many textbooks use two-column layouts. Auto-taggers frequently read across columns instead of down them, producing garbled output for screen reader users.
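The difference between reading across columns and reading down them can be sketched in a few lines of Python. This is an illustration only, not the pipeline's actual reading-order logic: the `Block` type, its coordinate fields, and the midline heuristic are assumptions made for the example, and full-width elements such as titles are not handled.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A text block with a bounding box (hypothetical structure)."""
    text: str
    x0: float  # left edge
    x1: float  # right edge
    y0: float  # top edge; smaller values are higher on the page


def naive_order(blocks):
    """Top-to-bottom, then left-to-right: this reads across columns."""
    return sorted(blocks, key=lambda b: (b.y0, b.x0))


def column_order(blocks, page_width):
    """Read down the left column first, then down the right column."""
    midline = page_width / 2

    def key(block):
        centre = (block.x0 + block.x1) / 2
        column = 0 if centre < midline else 1
        return (column, block.y0)

    return sorted(blocks, key=key)


page = [
    Block("left, paragraph 1", 50, 290, 100),
    Block("right, paragraph 1", 310, 550, 100),
    Block("left, paragraph 2", 50, 290, 200),
    Block("right, paragraph 2", 310, 550, 200),
]
```

With this sample page, `naive_order` interleaves the two columns, while `column_order(page, 600)` finishes the left column before starting the right one.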
Every STEM document passes through all eight stages. Automation handles bulk processing. Trained Community of Practice members with STEM expertise handle what automation cannot.
Development status: The STEM pipeline is in active development. We are accepting early university partner enquiries now, with full pipeline availability planned for later in 2026. Get in touch to discuss your institution's needs.
1. Intake and classification
Document enters the queue. The system detects STEM content via keyword and structure analysis and routes it to the STEM pipeline.
AI classifier, FastAPI
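As a rough illustration of keyword and structure analysis, the sketch below scores a text sample by counting STEM vocabulary and math-like symbols. The keyword list, the symbol pattern, the threshold, and the function names are all assumptions for the example; the real classifier may work quite differently.

```python
import re

# Hypothetical keyword list; a real classifier would use a far larger vocabulary.
STEM_KEYWORDS = {
    "theorem", "equation", "integral", "derivative",
    "molecule", "velocity", "matrix",
}

# Structural hints: common LaTeX commands and mathematical symbols.
MATH_PATTERN = re.compile(r"\\(?:frac|sqrt|sum|int)|[=≤≥±∑∫√]")


def stem_score(text: str) -> float:
    """Return keyword and math-symbol hits per word (0.0 for empty text)."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    keyword_hits = sum(1 for word in words if word in STEM_KEYWORDS)
    math_hits = sum(1 for _ in MATH_PATTERN.finditer(text))
    return (keyword_hits + math_hits) / len(words)


def route(text: str, threshold: float = 0.05) -> str:
    """Pick a pipeline name based on the score (threshold is illustrative)."""
    return "stem" if stem_score(text) >= threshold else "standard"
```

A density score rather than a raw count keeps long non-STEM documents with one stray equals sign from being misrouted.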
Scanned documents are processed with OCR. Complex layouts use a fallback engine for better accuracy on multi-column and table-heavy pages.
OCRmyPDF, Tesseract 5, PaddleOCR
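The engine choice could be expressed as a simple rule on page layout. In this sketch the `PageLayout` type, the `max_tables` cut-off, and the engine labels are hypothetical; it only shows the shape of the decision and does not call any OCR library.

```python
from dataclasses import dataclass


@dataclass
class PageLayout:
    """Layout features for one scanned page (hypothetical structure)."""
    columns: int  # number of text columns detected
    tables: int   # number of tables detected


def choose_ocr_engine(layout: PageLayout, max_tables: int = 1) -> str:
    """Return 'fallback' for multi-column or table-heavy pages, else 'primary'."""
    if layout.columns > 1 or layout.tables > max_tables:
        return "fallback"
    return "primary"
```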
All equations are identified and extracted. Each equation is converted to LaTeX and MathML. Display equations and inline equations are flagged separately.
Mathematical OCR engine, LaTeX, MathML
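Flagging display and inline equations separately might look like the following once equations exist as LaTeX. The sketch assumes the text already carries conventional delimiters (`$$...$$` or `\[...\]` for display, `$...$` for inline); the mathematical OCR step that produces this LaTeX is not shown, and the function name is hypothetical.

```python
import re

# Display delimiters are tried first so that "$$" is not read as two inline "$".
EQUATION = re.compile(r"\$\$(.+?)\$\$|\\\[(.+?)\\\]|\$(.+?)\$", re.DOTALL)


def extract_equations(text: str):
    """Return (kind, latex) pairs in document order; kind is 'display' or 'inline'."""
    found = []
    for match in EQUATION.finditer(text):
        display = match.group(1) or match.group(2)
        if display is not None:
            found.append(("display", display.strip()))
        else:
            found.append(("inline", match.group(3).strip()))
    return found
```

Keeping the flag alongside the LaTeX lets later stages place display equations as their own blocks while leaving inline ones inside the surrounding sentence.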
The document is auto-tagged with structural markup: headings, paragraphs, lists, tables, and figures. STEM-specific rules handle equation placement and reading order.
Auto-tag engine, AI alt-text
The system generates a work order and assigns tasks to Community of Practice members with STEM training: equation alt text, table markup, complex figure descriptions, and reading order review.
Task queue, STEM members
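A work order of this kind could be represented as a small data structure. The field names, task identifiers, and the `DocumentStats` type below are assumptions for illustration, not the system's actual schema.

```python
from dataclasses import dataclass


@dataclass
class DocumentStats:
    """Counts produced by earlier stages (hypothetical structure)."""
    equations: int
    tables: int
    complex_figures: int


def build_work_order(document_id: str, stats: DocumentStats) -> dict:
    """Create one task per content type present, plus a reading order review."""
    counts = [
        ("equation_alt_text", stats.equations),
        ("table_markup", stats.tables),
        ("figure_description", stats.complex_figures),
    ]
    tasks = [
        {"task": name, "items": count, "requires_stem_training": True}
        for name, count in counts
        if count > 0
    ]
    # Reading order is reviewed once per document regardless of content.
    tasks.append(
        {"task": "reading_order_review", "items": 1, "requires_stem_training": True}
    )
    return {"document_id": document_id, "tasks": tasks}
```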
A Community of Practice member with STEM training writes a plain-language description of every equation. This is what a screen reader will announce. MathML is attached for braille display rendering.
Human-written, MathML
Each document gets a full PDF/UA compliance check and a screen reader spot-test by an experienced member. University deliverables are also cross-validated.
veraPDF, NVDA, PAC
The remediated document is delivered via the portal, email, or WhatsApp. A compliance certificate is attached for university deliverables. EPUB and tactile diagram formats are available on request for STEM content.
Download, Email, WhatsApp, EPUB, Tactile
Mathematical meaning is too easy to misrepresent with automation alone. The plain-language description is always written by a Community of Practice member with STEM training.
The plain-language alt text is written by a Community of Practice member with STEM training. It describes what the equation says in natural language, and it is what screen readers announce.
Example: "The quadratic formula: x equals negative b plus or minus the square root of b squared minus 4ac, all divided by 2a."
The MathML markup is generated from the LaTeX output and embedded in the PDF. It allows advanced screen readers and braille displays to render the equation natively in mathematical notation.
The original LaTeX source is stored in document metadata and in The Accessible Documents Initiative database. It is available to institutions that want to re-use equations in other formats, LMS platforms, or accessible EPUB exports.
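To make the three layers concrete, the sketch below bundles them for the quadratic formula. The record layout and helper names are assumptions for the example; the MathML is built by hand with Python's standard `xml.etree.ElementTree` rather than converted from LaTeX, and the alt text is the human-written example quoted above.

```python
import xml.etree.ElementTree as ET


def el(tag, *children, text=None):
    """Build an XML element with optional text and child elements."""
    node = ET.Element(tag)
    node.text = text
    node.extend(children)
    return node


def mi(name):   # identifier, e.g. a variable
    return el("mi", text=name)


def mo(op):     # operator
    return el("mo", text=op)


def mn(num):    # number
    return el("mn", text=num)


def quadratic_formula_mathml() -> str:
    """Presentation MathML for the quadratic formula, as a string."""
    discriminant = el(
        "mrow", el("msup", mi("b"), mn("2")), mo("\u2212"), mn("4"), mi("a"), mi("c")
    )
    numerator = el("mrow", mo("\u2212"), mi("b"), mo("\u00b1"), el("msqrt", discriminant))
    denominator = el("mrow", mn("2"), mi("a"))
    math = el("math", el("mrow", mi("x"), mo("="), el("mfrac", numerator, denominator)))
    math.set("xmlns", "http://www.w3.org/1998/Math/MathML")
    math.set("display", "block")
    return ET.tostring(math, encoding="unicode")


def equation_record() -> dict:
    """One equation with all three layers (hypothetical record layout)."""
    return {
        "alt_text": (
            "The quadratic formula: x equals negative b plus or minus the square "
            "root of b squared minus 4ac, all divided by 2a."
        ),
        "latex": r"x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}",
        "mathml": quadratic_formula_mathml(),
    }
```

Each layer serves a different consumer: the alt text is spoken, the MathML is rendered or brailled, and the LaTeX is kept for re-use.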
A screen reader user navigates tables cell by cell. Without correct header associations, the data is meaningless. We apply different rules for different table types.
Trigonometric values, physical constants, periodic table sections. Row 1 headers marked with column scope. Column 1 headers marked with row scope. All blank cells tagged as empty.
Rule: TH Scope=Column and TH Scope=Row.
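The effect of row and column scope can be illustrated with a small lookup: for any data cell, the first row supplies the column header and the first column supplies the row header. This is a simplified model, assuming a plain list-of-lists table; actual screen reader announcements vary by product and settings, and the `announce` function is hypothetical.

```python
def announce(table, row, col):
    """Return the header context and value for one data cell.

    Row 0 holds column headers and column 0 holds row headers.
    Blank cells are reported as 'empty' rather than skipped.
    """
    column_header = table[0][col]
    row_header = table[row][0]
    value = table[row][col] or "empty"
    return f"{row_header}, {column_header}: {value}"


trig = [
    ["Angle", "sin", "cos", "tan"],
    ["0°", "0", "1", "0"],
    ["90°", "1", "0", ""],
]
```

Here `announce(trig, 2, 1)` gives `"90°, sin: 1"`, and the blank tangent cell at `announce(trig, 2, 3)` is reported as `"90°, tan: empty"`.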
Common in research papers and lab reports. Each header span gets correct ColSpan or RowSpan attributes. Complex associations are documented in the work order for the reviewing member.
Rule: ColSpan and RowSpan attributes.
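One way to reason about spans is to expand the header rows into a full grid, so that every column can be traced back to all the headers above it. The sketch below assumes each header cell is given as a `(text, colspan, rowspan)` tuple and that the spans are well formed; the function names and the sample lab-report header are illustrative only.

```python
def expand_headers(header_rows, n_cols):
    """Expand spanned header cells into a rows x n_cols grid of header text."""
    grid = [[None] * n_cols for _ in header_rows]
    for r, row in enumerate(header_rows):
        c = 0
        for text, colspan, rowspan in row:
            # Skip slots already filled by a RowSpan from an earlier row.
            while grid[r][c] is not None:
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[r + dr][c + dc] = text
            c += colspan
    return grid


def header_path(grid, col):
    """List the headers that apply to a column, top to bottom, without repeats."""
    path = []
    for row in grid:
        if not path or path[-1] != row[col]:
            path.append(row[col])
    return path


lab_headers = [
    [("Sample", 1, 2), ("Trial 1", 2, 1), ("Trial 2", 2, 1)],
    [("Mass", 1, 1), ("Volume", 1, 1), ("Mass", 1, 1), ("Volume", 1, 1)],
]
grid = expand_headers(lab_headers, n_cols=5)
```

For this header, `header_path(grid, 3)` returns `["Trial 2", "Mass"]`, which is the context a student needs to tell the two "Mass" columns apart.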
Answer grids, exercise boxes, and visual arrangements that look like tables but contain no data relationships. These are marked as artifacts or restructured as lists. Never tagged as data tables.
Rule: Artifact or restructured as list.
The STEM pipeline is available to university partners. It handles the content that standard remediation services cannot.
Mathematics, physics, chemistry, engineering, economics, and biology textbooks. Lab manuals. Research papers. Exam papers (UPSC, state boards, JEE, NEET). Any document with equations, complex tables, or multi-column layouts.
PDF/UA compliant output with MathML equations. EPUB export and tactile diagram formats for braille-ready delivery. Dedicated Community of Practice member assignment with STEM expertise. MathML delivery option for LMS integration (Moodle, Canvas, Blackboard). Annual STEM accessibility training included for partner institutions.
We are working with early university partners to shape the STEM pipeline. Get in touch to tell us about your content and we will tell you exactly how we can help.