Choosing a PDF generation API looks easy until you start. There are ~40 vendors, the marketing pages all sound similar, and you’ll only learn the real tradeoffs after a few thousand documents in production.
This is a vendor-agnostic checklist you can take into an evaluation, pulled from the real incidents teams hit during procurement and in post-mortems. Eight questions; if you can’t get a clear answer to all of them, you don’t have enough information to choose.
1. What’s your input format — HTML, JSON, or a template DSL?
The single most important question. The answer determines what your team will write — and what they’ll be debugging at 2am.
- HTML/CSS (Puppeteer, DocRaptor, Prince): familiar, infinitely flexible, expensive at runtime, hard to make deterministic.
- JSON / structured data (gPdf): cheap to render, byte-identical, requires writing a small mapper from your data model to the document model.
- Template DSL or low-level library (PDFKit, ReportLab, Apache PDFBox): full control, full responsibility; you’ll be writing pagination, layout, and font fallback yourself.
There’s no wrong answer. There’s a wrong answer for your team. Ask your engineers which model they’d rather debug a 3-hour pagination bug in.
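If the JSON route is on your shortlist, it helps to see what that “small mapper” actually is. Here is a minimal TypeScript sketch; the document model below is invented for illustration and is not any vendor’s real schema:

```typescript
// Hypothetical shapes, invented for illustration only.
interface Invoice {
  number: string;
  customer: string;
  lines: { description: string; amount: number }[];
}

interface DocumentRequest {
  blocks: (
    | { type: "heading"; text: string }
    | { type: "table"; rows: string[][] }
  )[];
}

// The "small mapper": your data model in, the renderer's document model out.
function toDocumentRequest(invoice: Invoice): DocumentRequest {
  return {
    blocks: [
      { type: "heading", text: `Invoice ${invoice.number} for ${invoice.customer}` },
      {
        type: "table",
        rows: invoice.lines.map((l) => [l.description, l.amount.toFixed(2)]),
      },
    ],
  };
}
```

Ten to fifty lines of this, written once, is the whole cost of the structured-data route. The HTML route has no mapper, but every template becomes a page-layout problem instead.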
2. What’s the cold-start latency — and is it predictable?
Some renderers boot in milliseconds (anything WASM or a native binary). Some boot in seconds (Chromium-based). The difference is invisible until you hit a traffic spike.
What to ask the vendor:
- “What is your p99 latency for the first request to a cold worker?”
- “How long after my last request before a worker becomes ‘cold’ again?”
- “Do you publish a status page with cold-start data?”
If they can’t answer the first one with a number, assume it’s bad.
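You can also get your own number rather than relying on the sales deck. A rough TypeScript probe, assuming a generic POST-to-render endpoint (the URL and payload are placeholders; Node 18+ for global fetch):

```typescript
// Time one request after a long idle period, then a warm follow-up.
async function timeRender(url: string, body: unknown): Promise<number> {
  const start = performance.now();
  const res = await fetch(url, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  });
  await res.arrayBuffer(); // drain the body so we time the full render
  return performance.now() - start;
}

const url = "https://api.example-pdf-vendor.com/render"; // placeholder
const payload = { template: "smoke-test" };              // placeholder

const cold = await timeRender(url, payload); // run after >15 min of idle
const warm = await timeRender(url, payload);
console.log(`cold: ${cold.toFixed(0)} ms, warm: ${warm.toFixed(0)} ms`);
```

Run the cold measurement a few times across a day; a single sample tells you nothing about p99.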
3. How is per-render cost modeled?
Three flavors, in order of how often they bite you:
- Per-page pricing (Anvil at $0.10/PDF, DocRaptor at $89/100K): predictable, easy to budget, expensive at scale.
- Subscription tiers with overage (gPdf at $5–12/mo + $0.00005/page over): cheap at any volume, harder to project for usage you’ve never tested.
- Compute-based pricing (self-hosted Puppeteer on Lambda): you eat the compute bill directly, including cold-starts and Chromium memory.
Calculate your ACTUAL bill at three traffic levels (current, 5×, 50×) before signing. The shape of the cost curve matters more than the headline number.
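As a sketch of that exercise, here is the comparison in a few lines of TypeScript using the price points quoted above. The number of pages included in the subscription tier is an assumption; replace it with the vendor’s real quota:

```typescript
// Back-of-the-envelope cost curves at current, 5x, and 50x traffic.
// Assumes one page per document; adjust for your real page counts.
const monthlyDocs = 20_000; // replace with your actual volume

for (const multiplier of [1, 5, 50]) {
  const docs = monthlyDocs * multiplier;
  const perDoc = docs * 0.10;                 // $0.10/PDF flat
  const includedPages = 100_000;              // assumed tier quota -- verify!
  const tiered =
    12 + Math.max(0, docs - includedPages) * 0.00005; // $12/mo + overage
  console.log(
    `${docs.toLocaleString()} docs/mo -> per-doc: $${perDoc.toFixed(2)}, tiered: $${tiered.toFixed(2)}`
  );
}
```

Notice what the loop makes obvious: flat per-document pricing grows linearly with traffic, while the tiered model’s curve is nearly flat until you blow past the quota. That shape, not the headline price, is what you’re signing up for.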
4. Is the output deterministic?
Determinism — same input → same bytes — sounds academic until you need it.
You need it when:
- You diff PDFs in CI to catch unintended template changes.
- You retain documents under e-invoice / tax law (the PDF you store and the PDF you re-render must match).
- You hash the PDF for archival integrity.
- You version-control the rendered output for legal review.
Browser-based renderers (Puppeteer, anything Chromium) are NOT deterministic across patch versions. Native binary renderers (Prince, gPdf) usually are. Ask explicitly: “Will my output bytes change if you ship a renderer update?”
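The claim is also cheap to verify yourself. A minimal determinism check in TypeScript, assuming a generic render endpoint: render the same input twice and compare hashes (in CI you would compare against a stored golden digest instead):

```typescript
import { createHash } from "node:crypto";

// Hash the raw PDF bytes; byte-identical output means identical digests.
function sha256(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

async function render(body: unknown): Promise<Uint8Array> {
  const res = await fetch("https://api.example-pdf-vendor.com/render", { // placeholder
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  });
  return new Uint8Array(await res.arrayBuffer());
}

const input = { template: "invoice", data: { number: "INV-1001" } }; // placeholder
const a = await render(input);
const b = await render(input);
if (sha256(a) !== sha256(b)) {
  throw new Error("renderer is not byte-deterministic for identical input");
}
```

Run it once today and once after the vendor’s next release; the second run is the one that matters.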
5. How does the renderer handle fonts, especially CJK and RTL?
This is the question that has cost more careers than any other in PDF land.
The failure mode is consistent: you launch in your home market and fonts are fine. Six months later you expand to a market whose script your renderer has no glyphs for. PDFs start showing ▢▢▢▢ tofu boxes. A customer escalates. Your team spends two sprints adding fonts to a Dockerfile.
Questions to ask:
- “Which scripts are bundled at no extra config? (Latin, CJK, Cyrillic, Devanagari, Arabic, Hebrew?)”
- “What happens when an unknown glyph is encountered — fallback or tofu?”
- “Can I add custom fonts at request time, or do I have to deploy them ahead of time?”
- “Do you support RTL text shaping?”
A good answer: “We embed Noto Sans CJK and a Noto fallback set; unknown glyphs fall through to Noto Symbols.” A bad answer: “Yeah, we support fonts.”
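A cheap way to catch this before your customers do: keep a probe document with one sample string per script you might ever ship, and render it on every deploy. A sketch (the probe strings are illustrative, pick ones from your real data):

```typescript
// Script-coverage smoke test: render one document containing a sample
// string per script, then eyeball (or pixel-diff) the output for tofu.
const fontProbes: Record<string, string> = {
  latin: "Invoice total: $1,234.56",
  cjk: "請求書合計 / 发票总额",
  cyrillic: "Итого по счёту",
  devanagari: "चालान कुल राशि",
  arabic: "إجمالي الفاتورة", // also exercises RTL shaping
  hebrew: "סך הכל בחשבונית", // RTL again
};

const probeDocument = Object.entries(fontProbes)
  .map(([script, text]) => `${script}: ${text}`)
  .join("\n");
// Send probeDocument through your render pipeline before every market launch.
```

Two sprints of Dockerfile font surgery, avoided by one fixture file.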
6. What compliance profiles are supported?
If your business might ever:
- Issue invoices in the EU (Factur-X / ZUGFeRD / EN 16931, mandatory in DE/FR/IT/PL by 2026)
- Archive documents under SOX, HIPAA, or GDPR retention rules (PDF/A)
- Submit medical records (PDF/A-3 with attached XML)
- Embed digital signatures (PAdES)
…then ask which compliance profiles the renderer supports natively. The bad answer: “you can run another tool to convert afterward”. That’s a multi-step pipeline you now own.
The good answers usually look like a single flag — for example, gPdf takes settings.profile: "pdfa-3b" plus a settings.e_invoice block with standard: "factur_x" and an embedded CII XML. Built-in is dramatically less ops than bolt-on.
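For illustration, such a request might look like the sketch below. Only settings.profile and settings.e_invoice come from the description above; the rest of the request shape is invented, and the XML is a placeholder for your real CII document:

```typescript
// Hedged sketch of a single-flag compliance request. Verify the exact
// request shape against the vendor's documentation before relying on it.
const request = {
  content: [{ type: "heading", text: "Invoice INV-1001" }], // illustrative
  settings: {
    profile: "pdfa-3b",
    e_invoice: {
      standard: "factur_x",
      // Field name assumed; supply your actual EN 16931 CII XML here.
      xml: "<rsm:CrossIndustryInvoice>...</rsm:CrossIndustryInvoice>",
    },
  },
};
```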
7. Is rendering stateless? Where do my documents go after they’re rendered?
Two questions, related.
Stateless rendering means the request comes in, the PDF is emitted, nothing is stored. You handle persistence yourself (S3, your DB, whatever). This is what you want for compliance-heavy workloads — the renderer is never a custodian of your data.
Stateful rendering means the vendor stores the PDF (often on their CDN) and gives you a signed URL. Convenient for casual workflows (e.g. “send the customer a link”), problematic for regulated workflows (now there’s a third party with a copy of every document you ever rendered).
Ask:
- “Is rendering stateless by default?”
- “Where (geographically) is the document stored if you store it?”
- “How long is it retained?”
- “Can I get a written guarantee of stateless rendering for compliance review?”
If the answer is hand-wavy, your privacy/legal team is going to make this an issue 9 months in.
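The stateless pattern in code: the renderer hands you bytes, you do the persisting, and there is never a third-party copy. A TypeScript sketch with a placeholder endpoint and your own S3 bucket:

```typescript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "eu-central-1" });

async function renderAndStore(body: unknown, key: string): Promise<void> {
  const res = await fetch("https://api.example-pdf-vendor.com/render", { // placeholder
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`render failed: ${res.status}`);
  const pdf = new Uint8Array(await res.arrayBuffer());

  // Persistence stays in your account, under your retention policy.
  await s3.send(new PutObjectCommand({
    Bucket: "my-documents", // placeholder bucket
    Key: key,
    Body: pdf,
    ContentType: "application/pdf",
  }));
}
```

The point of the sketch: your compliance story is now “renderer sees the data in transit, we store it,” which is a one-sentence answer in a privacy review.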
8. What happens when the renderer fails — and how do I find out?
Every renderer fails sometimes. The questions are:
- How does failure surface? A 500 with a stack trace? A 4xx with a structured error? An empty PDF?
- What’s the retry policy? Is it idempotent? Are you charged for failed renders?
- What instrumentation does the vendor provide? A status page? Webhooks for incidents? p50/p99 dashboards by region?
- Is there a synthetic probe — does the vendor run their own monitoring against the public endpoint, or are they relying on you to file the ticket?
A quick test: visit the vendor’s status page right now. If it doesn’t exist, isn’t real-time, or shows “all systems operational” with no detail, that’s the level of reliability transparency you’ll get post-purchase.
(For reference: gPdf publishes /status with synthetic probe data + Cloudflare Analytics over the trailing 7 days.)
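On the client side, you can defend against most of these failure modes in one wrapper: retry 5xx with backoff, fail fast on structured 4xx, and attach an idempotency key so retries can’t double-render. A sketch; whether the vendor actually honors an Idempotency-Key header is exactly the kind of thing question #8 should surface:

```typescript
// Hedged client-side wrapper: generic endpoint, assumed idempotency header.
async function renderWithRetry(body: unknown, maxAttempts = 3): Promise<Uint8Array> {
  const idempotencyKey = crypto.randomUUID(); // same key across all retries
  for (let attempt = 1; ; attempt++) {
    const res = await fetch("https://api.example-pdf-vendor.com/render", { // placeholder
      method: "POST",
      headers: {
        "content-type": "application/json",
        "idempotency-key": idempotencyKey,
      },
      body: JSON.stringify(body),
    });
    if (res.ok) return new Uint8Array(await res.arrayBuffer());
    if (res.status < 500) {
      // Structured 4xx: the input is wrong; retrying will not help.
      throw new Error(`render rejected (${res.status}): ${await res.text()}`);
    }
    if (attempt === maxAttempts) {
      throw new Error(`render failed after ${attempt} attempts`);
    }
    await new Promise((r) => setTimeout(r, 2 ** attempt * 250)); // backoff
  }
}
```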
How gPdf scores against the eight questions
Since this is our blog and you’ll suspect we tilted the questions, here’s our honest scorecard:
| # | Question | gPdf answer |
|---|---|---|
| 1 | Input format | JSON DocumentRequest (structured data) |
| 2 | Cold start | 5–20 ms (V8 isolate, no browser) |
| 3 | Cost model | $0/$5/$8/$12 per month; $0.00005/page overage |
| 4 | Determinism | Byte-identical, guaranteed across the same engine version |
| 5 | Fonts | Noto Sans CJK + Latin fallback embedded |
| 6 | Compliance | PDF/A-1b/2b/3b/4 + Factur-X / ZUGFeRD attachment built-in |
| 7 | Stateless | Yes, contractually — no document storage anywhere |
| 8 | Failure & visibility | Public status page with 7-day trend; structured 4xx/5xx; idempotent |
Where we lose: Q1. If your input is genuinely HTML you can’t refactor (e.g. user-generated reports, legacy templates), DocRaptor or Prince is the right answer.
TL;DR
Don’t ask “which is the best PDF API”. Ask the eight questions, score the answers, and pick the vendor that lines up with your actual workload. Any team that picked the slightly cheaper vendor and got blindsided by question #5 nine months later will tell you the same thing.
If your workload happens to align with how gPdf is built, the Playground takes 30 seconds to evaluate. If it doesn’t, we’ll cheerfully point you at the right tool — usually DocRaptor for HTML-shaped problems, Prince for self-hosted, or Puppeteer if your input is truly arbitrary web pages.