
Why two PDF/A validators are better than one

Single-engine PDF/A conformance results are not audit-grade. Why dual-engine validation matters — and how to run it free at gpdf.com/validator/.

A PDF either conforms to PDF/A-3 or it doesn’t. So why would anyone run two validators on the same file?

Short answer: because the spec is large enough that two correct implementations can disagree on the edges, and an audit-grade workflow treats single-engine “Pass” as a yellow light, not a green one. Here’s the long version.

PDF/A is a stack of negotiated rules, not a single algorithm

PDF/A is defined across multiple ISO 19005 parts (PDF/A-1, PDF/A-2, PDF/A-3, PDF/A-4), each with its own conformance levels (b, a, u, e, f, depending on the part), and each builds on an underlying PDF specification: PDF 1.4 for PDF/A-1, ISO 32000-1 for PDF/A-2 and PDF/A-3, and ISO 32000-2 for PDF/A-4. The combined surface area is several thousand pages of normative text.

A few examples of where conforming implementations have historically diverged:

  • Transparency in PDF/A-2/3: allowed under specific conditions; the conditions are written procedurally and different validators implement the check differently.
  • Embedded color profiles: when is an ICC profile “required” vs “recommended”? Different validators have called the same file “Pass” and “Fail” on this axis.
  • Embedded file metadata in PDF/A-3: AFRelationship, /AF references, the embedded XMP — the rules are well-specified but the enforcement strictness varies.
  • Font subsetting: PDF/A requires fonts to be embedded, and permits subsets only when they cover every glyph the file references. Edge cases around CID-keyed fonts with partial subsets have caused validator disagreements.

These aren’t bugs. They’re the natural consequence of a complex specification being implemented by independent teams from a normative text. The conservative position — and the position taken by most regulated industries — is to require multiple independent confirmations.

The reference engine + the second opinion

veraPDF is the open-source, industry-supported PDF/A validator developed by the veraPDF consortium, which is led by the Open Preservation Foundation and the PDF Association. (PDF/A itself is published by ISO as ISO 19005.) Its rule set is reviewed by industry working groups and is widely treated as the reference interpretation of the ISO 19005 text. If veraPDF says “Pass”, that’s the strongest signal you can get from a single engine.

But “strongest single-engine signal” is not the same as “audit-passing evidence”. Auditors at regulated institutions — banks, healthcare records archives, government records offices — frequently require a second independent confirmation because:

  • veraPDF’s interpretation of a rule could differ from another validator the auditee uses internally, leading to a downstream rejection.
  • A bug in any single engine (even the reference) cannot be detected by running that same engine twice.
  • The procurement principle “two independent confirmations” is broadly applied across compliance domains; PDF/A inherits that expectation from its archival use cases.

The second-engine choice depends on what’s available:

  • Adobe Acrobat Preflight is paid and closed-source — fine as a confirmation engine but limits who can re-verify.
  • callas pdfaPilot is paid and the de-facto enterprise choice but again limits independent re-verification.
  • A second open-source engine — there are a few, mostly less complete than veraPDF.

What we did at gPdf was build our own engine in Rust + WebAssembly as a deliberate “independent re-implementation” — same spec, same rules, written from scratch by a different team. When both engines pass the same file, the conclusion is far stronger than either alone could provide. When they disagree, you have a clear bug to investigate (in one of them, or in the file).
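
As a minimal sketch of that decision rule (the names Verdict and combine_verdicts are hypothetical and not part of any gPdf API), two pass/fail results map onto three outcomes:

```python
from enum import Enum


class Verdict(Enum):
    GREEN = "both engines pass"
    INVESTIGATE = "engines disagree"
    BROKEN = "both engines fail"


def combine_verdicts(reference_pass: bool, independent_pass: bool) -> Verdict:
    """Map two independent pass/fail results onto one audit outcome."""
    if reference_pass and independent_pass:
        return Verdict.GREEN
    if reference_pass != independent_pass:
        # Exactly one engine passed: a bug in one engine, or in the file.
        return Verdict.INVESTIGATE
    # Neither engine passed: fix the file at the source.
    return Verdict.BROKEN
```

The point of the three-way split is that a disagreement is never collapsed into a plain "Pass" or "Fail"; it is kept as its own state so someone has to look at it.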

The validator that puts both on one URL

We host both at gpdf.com/validator/ — free, no login, runs the file through veraPDF AND our edge engine in parallel, returns both reports side-by-side. The use cases:

  • You generate a PDF/A and want to ship it: drop into the validator, both pass, attach the JSON reports as QA evidence. Done.
  • One engine fails, the other passes: you have a precise bug — diff the reports, find the offending field. Often it’s something subtle like a misaligned XMP timestamp or a missing /AF reference in a PDF/A-3.
  • Both fail: the file is genuinely broken; fix at the source.
  • Auditing an incoming archive batch: drop randomly-sampled PDFs in, log the report URLs, attach to the audit work paper. “We verified with veraPDF and an independent engine” is a stronger claim than “we ran our vendor’s checker.”
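
The “diff the reports” step can be sketched as a set comparison. This assumes each report has already been reduced to a dict with a failed_rules list of rule identifiers. That shape, the key name, and the sample rule IDs are illustrative assumptions, not the validator’s actual JSON schema. Two engines may also label the same clause differently, so a mapping step may be needed before comparing.

```python
def diff_reports(reference: dict, independent: dict) -> dict:
    """Split failed rules into those seen by one engine only and by both."""
    ref_failed = set(reference.get("failed_rules", []))
    ind_failed = set(independent.get("failed_rules", []))
    return {
        "only_reference": sorted(ref_failed - ind_failed),
        "only_independent": sorted(ind_failed - ref_failed),
        "both": sorted(ref_failed & ind_failed),
    }


# Hypothetical reports with made-up rule identifiers.
reference_report = {"failed_rules": ["xmp-date-mismatch"]}
independent_report = {
    "failed_rules": ["xmp-date-mismatch", "missing-af-reference"]
}

result = diff_reports(reference_report, independent_report)
print(result["only_independent"])
```

Rules that land in only one bucket are the ones worth investigating first, since they are where the two engines read the file (or the spec) differently.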

The file you upload never leaves the request — the engines run in-memory on Cloudflare Workers and the file is discarded after the report is rendered. No login, no persistence, no quota.

The same pattern, generalized

This isn’t just about PDF/A. The “two independent confirmations” pattern extends to:

  • Factur-X / ZUGFeRD e-invoices: gpdf.com/validator/ runs Mustang (mustangproject.org) for the embedded EN 16931 CII XML, alongside the PDF/A check above. (Our article “Validating ZUGFeRD with Mustang — what passes, what fails” covers that workflow.)
  • TLS certificates: Certificate Transparency logs are cross-checked by multiple independent monitors.
  • Build reproducibility: two independent rebuilds from the same source should produce byte-identical outputs.
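
For the reproducible-build case, the check reduces to comparing digests published by independent rebuilders. A minimal sketch, assuming SHA-256 as the digest and using hypothetical helper names:

```python
import hashlib


def artifact_digest(data: bytes) -> str:
    """Hex SHA-256 digest of a build artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def is_reproducible(digests: list[str]) -> bool:
    """True when at least two independent rebuilds report the same digest."""
    return len(digests) >= 2 and len(set(digests)) == 1
```

A single digest never counts as reproducible here, for the same reason a single validator run is not a confirmation.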

The compliance world has been doing this for decades. PDF/A is just catching up.

TL;DR

Single-engine “Pass” is a yellow light. Dual-engine “Pass” is green. Both engines are free at gpdf.com/validator/: drop your file, get two reports, and attach them to your QA evidence. If gPdf generated the file, the validator is the public receipt that the API delivered on its compliance claim.

If you’re building on the gPdf API, the E-invoice API reference (§5) shows how to emit Factur-X / ZUGFeRD PDF/A-3 directly. The validator then confirms it externally. Two engines, one upload — that’s the audit-grade pattern.