If you render a few hundred PDFs a day on a single backend Lambda or a small Kubernetes pod, the architecture doesn’t really matter. Anything works. Anything is fast enough.
Things change at scale. Once you’re emitting tens of thousands of documents per day — which is roughly any e-commerce platform, logistics carrier, BNPL service, payroll provider, or invoicing platform with even modest traction — three numbers start hurting:
- Cold-start latency, because something is always cold somewhere.
- Regional latency, because your customers are not next to your origin.
- Per-render compute, because you’re paying for it tens of thousands of times per day.
This post walks through how each of these three numbers changes shape past ~10K renders/day, and why edge-deployed renderers like gPdf are a different category of solution rather than “the same thing, just faster”.
1. The cold-start tax compounds with concurrency
Cold starts are not just a nuisance — they’re a function of your concurrency curve. The way it usually plays out:
- You provision N=10 warm containers based on average traffic.
- A 3× traffic spike hits (Black Friday, payroll day, end of quarter).
- 20 new containers cold-start to absorb the spike. Each takes 1.5 to 2.5 seconds to boot Chromium / Prince / your runtime.
- For the first ~30 seconds of the spike, requests that land on those new containers take ~2 seconds, dragging the global p99 with them.
- Your downstream timeout budget (probably 5–10s for the whole order pipeline) is now eaten by PDF generation.
This is fine when your traffic is flat. It’s brutal when it’s spiky, and PDF traffic is always spiky — invoices fire at billing-cycle boundaries, labels fire when carriers pick up, statements fire at month-end.
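The compounding is easy to see in a toy model. This is a minimal sketch under the assumptions above (300 ms warm renders, a ~2 s boot paid by the first request to each cold container, uniform routing); the numbers are illustrative, not a benchmark:

```python
# Toy model of a cold-start storm: latency distribution while a 3x spike
# forces new containers to boot. All numbers are assumptions taken from
# the scenario in the text, not measurements.
RENDER_MS = 300    # warm-path render time
BOOT_MS = 2000     # Chromium/runtime boot on a cold container

def spike_latencies(warm: int, cold: int, requests_per_container: int = 10) -> list[int]:
    """First request to each cold container pays the boot; everything else is warm."""
    samples = []
    for _ in range(warm):
        samples += [RENDER_MS] * requests_per_container
    for _ in range(cold):
        samples += [BOOT_MS + RENDER_MS] + [RENDER_MS] * (requests_per_container - 1)
    return sorted(samples)

def percentile(samples: list[int], p: float) -> int:
    return samples[min(len(samples) - 1, int(p * len(samples)))]

lat = spike_latencies(warm=10, cold=20)
print(percentile(lat, 0.50))  # p50 stays at the warm render time
print(percentile(lat, 0.99))  # p99 jumps to boot + render
```

With 20 of 30 containers cold, only 1 request in 15 pays the full boot in this model, yet that is enough to pin p99 at boot-plus-render while p50 still looks perfectly healthy.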
Edge-deployed alternative: a Cloudflare Worker isolate cold-starts in 5–20 ms, not 1.5–2.5 seconds. There’s no container to spin up, no JVM/V8 to initialise, no browser to bootstrap — the WASM module loads into a process that’s already alive. Cold start ceases to be a thing you architect around.
For gPdf specifically, the worst cold start observed across our benchmarks is about 12 ms — and that’s only the first request to a freshly assigned isolate. Subsequent requests on the same isolate skip even that.
2. Regional latency is real, even for “fast” requests
Round-trip time from Sydney to a us-east-1 origin runs ~200 ms before your code does anything. São Paulo to eu-west-1 is ~190 ms; Mumbai to us-east, ~220 ms.
Half of that is paid in each direction. So a centralised PDF API doing a 300 ms server-side render looks like this from a Sydney customer’s perspective:
client → us-east : 100 ms
us-east render : 300 ms
us-east → client : 100 ms
total wall-clock : 500 ms
For an interactive flow (“preview your invoice before sending”) that’s painful. For a high-volume backend job it’s barely noticeable.
Edge-deployed alternative: Cloudflare runs in 300+ cities. The closest colo to your Sydney customer is roughly 5 ms away. The same render becomes:
client → SYD colo : 5 ms
SYD render : 4 ms
SYD → client : 5 ms
total wall-clock : 14 ms
That’s roughly a 35× improvement for interactive flows. For backend jobs it’s a wash, but interactive PDF previews (“show me what this looks like before I send”) go from janky to effectively free.
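As a sanity check on the arithmetic, here is a sketch; the per-direction latencies are assumptions (the quoted figures are round trips, so how much you attribute to each direction shifts the total, not the conclusion):

```python
# Client-observed wall clock for one render: network out + render + network back.
def wall_clock_ms(one_way_ms: float, render_ms: float) -> float:
    return 2 * one_way_ms + render_ms

edge = wall_clock_ms(one_way_ms=5, render_ms=4)   # Sydney client -> SYD colo
for one_way in (100, 200):                        # Sydney -> us-east-1, per direction
    origin = wall_clock_ms(one_way, render_ms=300)
    print(f"{one_way} ms one-way: {origin:.0f} ms vs {edge:.0f} ms at the edge "
          f"(~{origin / edge:.0f}x)")
```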
The hidden second-order benefit: if your PDF API runs at the edge, you can move adjacent logic there too — your checkout PDF preview, your rate limiting, your auth check. Each piece you push to the edge removes a round-trip from your hot path.
3. Per-render compute is the bill that compounds silently
Pricing math at 100K renders/day (≈3M renders/month):
- Puppeteer on Lambda, with a Chromium-realistic profile of ~2.4 s wall and 2048 MB memory: ~$240/month just for compute, before egress.
- DocRaptor at $89/100K page tier: ~$2,670/month at 100K/day (= 3M/month).
- gPdf at $5/100K page tier: ~$150/month at 100K/day. Or about $5/month if you happen to land at exactly 100K/month.
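The arithmetic behind those figures, as a sketch. The Lambda profile (~2.4 s wall at 2 GB, Chromium-realistic) and the ~$0.0000166667 GB-second price are assumptions; the per-100K tier prices are the ones quoted above:

```python
# Monthly cost at 100K renders/day under three pricing models.
# Assumptions: 30-day month; Lambda request fees and egress omitted.
RENDERS_PER_MONTH = 100_000 * 30          # 3M renders/month

GB_SECOND_PRICE = 0.0000166667            # assumed Lambda GB-second price
lambda_monthly = RENDERS_PER_MONTH * 2.4 * 2.0 * GB_SECOND_PRICE

docraptor_monthly = RENDERS_PER_MONTH / 100_000 * 89   # $89 per 100K pages
gpdf_monthly = RENDERS_PER_MONTH / 100_000 * 5         # $5 per 100K pages

print(round(lambda_monthly), round(docraptor_monthly), round(gpdf_monthly))
```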
The cost gap doesn’t go away as you scale up — it widens. At 1M renders/day:
- Puppeteer infra: ~$2,400/month + ops + on-call
- DocRaptor: ~$26,700/month
- gPdf: $1,500/month flat (10× the 100K/day tier, assuming volume pricing follows the public price grid)
A common reaction: “Surely the savings are smaller in practice — there’s something hidden.” In our experience, no. The cost driver in PDF generation is the renderer’s compute footprint. Once you swap a 600 MB Chromium process for a 4 MB WASM module, the per-render cost falls roughly 100×, and your bill follows.
The reason this works without us going broke: the underlying Cloudflare Workers Bundled price is ~$0.50/million requests. With our renderer using ~1.5 ms of CPU per call, the cost-of-goods-sold per render is genuinely sub-cent. We mark it up modestly to get to a sustainable business and you still see the 18× gap.
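The unit economics in one place, as a sketch using the ~$0.50/million Workers Bundled request price quoted above (on that plan, per-request CPU time is bundled into the request price):

```python
# Marginal cost per render at the Workers Bundled request price,
# versus the per-page prices quoted earlier in the post.
WORKERS_PRICE_PER_MILLION = 0.50                      # dollars per 1M requests
cogs_per_render = WORKERS_PRICE_PER_MILLION / 1_000_000

gpdf_per_render = 5 / 100_000                          # $5 per 100K pages
docraptor_per_render = 89 / 100_000                    # $89 per 100K pages

print(f"COGS per render: ${cogs_per_render:.7f}")      # well under a cent
print(f"price gap: ~{docraptor_per_render / gpdf_per_render:.0f}x")
```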
What “edge-deployed” actually buys you
Three things, none of them about marketing slides:
Predictable latency under any load
Because there’s no per-request boot cost, your p50 and p99 stay close to each other. We typically see p99 within 3× of p50 even at the height of a traffic spike — versus Puppeteer where p99 can hit 10× p50 during cold-start storms.
A single deployable artefact, anywhere
A .wasm module deploys identically to every Cloudflare colo. There’s no “is the Sydney pool warm?” question — every isolate boots the module within milliseconds and serves identically. This is genuinely simpler operationally than maintaining regional Lambda concurrency reservations.
A path to embedding
If you ever want to run gPdf inside a customer’s perimeter (their VPC, their isolated cluster, their air-gapped intranet), the same WASM module works. It’s the difference between “we host SaaS” and “we shipped technology that runs anywhere.”
Where this breaks down
Edge isn’t free magic — there are workloads where centralised wins:
- Multi-second renders. If a single PDF takes 30 seconds (huge financial statements, scientific reports), you’re better off on a long-running container with persistent state than fighting CPU caps on the edge.
- Renders that lean on your databases. If your render needs to JOIN three OLAP tables, you want the renderer next to the database, not at the edge. (The usual pattern: do the JOIN centrally, then fire the resulting JSON to the edge for the actual render.)
- Outputs that need post-processing. Watermarking, signing, archival — if your post-render pipeline is multi-step and stateful, the edge render’s “stateless” property becomes a tax instead of a feature.
For everything else — and that’s the vast majority of B2B invoice/label/receipt traffic — edge wins on every axis that matters.
When to stop tolerating your current setup
A simple checklist. If you can tick three of these, the migration math has tipped:
- Your monthly PDF infrastructure cost crosses $300.
- Your PDF p99 latency exceeds 800 ms during normal traffic.
- You’ve hit a cold-start incident that affected customers.
- You’ve spent more than 4 hours debugging missing CJK / RTL / emoji glyphs.
- You generate PDFs in an interactive flow (preview, on-screen download).
- You operate in more than one geographic region.
The first three of those items together mean you’re paying and hurting. The next three mean a centralised renderer is actively limiting product decisions you could otherwise make.
If any of that sounds familiar, the Playground renders a sample invoice in your browser in under 5 ms — let it speak for itself.