This is the companion to “PDF properties should show your brand, not someone else’s tool.” That post made the case for caring about PDF metadata. This one is the operating manual: what each field is for in the PDF specification, who reads it, the common mistakes, and how to verify that your output actually shipped what you intended.
gPdf exposes six standard document-level metadata fields from the PDF spec. They live under settings.metadata in the DocumentRequest JSON. Every field is optional: if you don’t set one, gPdf falls back first to the token’s default_metadata (an Enterprise policy feature) and then to a system default.
{
  "settings": {
    "metadata": {
      "title": "...",
      "language": "...",
      "author": "...",
      "subject": "...",
      "creator": "...",
      "producer": "..."
    }
  }
}
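As a rough illustration of the request shape above, this Python sketch assembles the settings.metadata fragment and leaves out any blank field so that the fallback chain applies to it. The helper name build_settings is hypothetical and not part of any gPdf client; only the six field names come from this post.

```python
import json

# The six field names come from the post; the helper itself is a hypothetical sketch.
METADATA_FIELDS = ("title", "language", "author", "subject", "creator", "producer")


def build_settings(**fields: str) -> dict:
    """Return a {"settings": {"metadata": {...}}} fragment with only known, non-blank fields."""
    unknown = set(fields) - set(METADATA_FIELDS)
    if unknown:
        raise ValueError(f"unknown metadata fields: {sorted(unknown)}")
    metadata = {
        name: fields[name].strip()
        for name in METADATA_FIELDS
        if fields.get(name, "").strip()
    }
    return {"settings": {"metadata": metadata}}


# The blank subject is dropped rather than sent as an empty string.
fragment = build_settings(title="Invoice INV-2026-3401", language="en", subject="  ")
print(json.dumps(fragment, indent=2, ensure_ascii=False))
```

Rejecting unknown keys early is a design choice for the sketch: a typo such as "tittle" fails loudly instead of silently falling back to a default.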
The rest of this post is one section per field. Each section follows the same shape: what the field is, where it surfaces, common mistakes, rule of thumb. The order is what to fill in first → second → … → last.
title — what the document is
The PDF spec describes this as the “document title.”
Where it surfaces:
- The title bar in PDF viewers (Adobe Reader, Preview, Foxit, Chromium PDF viewer all show it).
- The browser tab when the PDF opens inline (Content-Disposition: inline).
- Search indexes: Spotlight, Outlook, SharePoint, and Google Drive’s full-text indexer all read title and weight it heavily.
Common mistakes:
- ❌ Setting title to the filename. invoice-20260318.pdf is the filename. The title should be something a human reads, like Invoice INV-2026-3401. Filename and title are different concerns; the filename is for filesystems, the title is for viewers and search.
- ❌ Leaving title empty. Viewers fall back to filename. The result reads as auto-generated and machine-emitted.
- ❌ Adding the brand into title. “Acme Logistics — Invoice INV-2026-3401” clutters the title bar. The brand belongs in author, not title.
Rule of thumb: title should match the H1 of the rendered page. If your invoice template’s top line is “Invoice INV-2026-3401,” that’s the title.
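The three mistakes above are mechanical enough to lint for before a request goes out. The following is a minimal sketch under stated assumptions: the function name, the list of file extensions, and the brand list are all illustrative, not anything gPdf provides.

```python
import re

# Hypothetical lint; the extension list is an illustrative assumption.
_FILE_EXTENSION = re.compile(r"\.(pdf|docx?|xlsx?|html?)$", re.IGNORECASE)


def title_problems(title: str, brand_names: tuple[str, ...] = ()) -> list[str]:
    """Return human-readable problems for a candidate title; empty list means it passes."""
    problems = []
    stripped = title.strip()
    if not stripped:
        problems.append("empty title: viewers fall back to the filename")
    elif _FILE_EXTENSION.search(stripped):
        problems.append("title looks like a filename")
    for brand in brand_names:
        # Case-insensitive substring match against each brand the caller passes in.
        if brand.lower() in stripped.lower():
            problems.append(f"brand '{brand}' belongs in author, not title")
    return problems


print(title_problems("invoice-20260318.pdf"))
print(title_problems("Invoice INV-2026-3401"))
```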
language — for accessibility, search, and compliance
language is a BCP-47 language tag: en, de, zh-Hans, pt-BR, ar-SA. Set it for every document. Out of the six fields it has the most concrete downstream consequences and the smallest implementation cost — which is why it sits at position 2 rather than buried lower.
Where it surfaces:
- Screen readers: JAWS, NVDA, VoiceOver use it to pick the right phoneme set. An English screen reader reading a language: "de" PDF will pronounce German words correctly; without the tag it gets the prosody wrong.
- Search engines and indexers: affects which locale’s stemming and stopword list applies. A language: "zh-Hans" invoice gets indexed in Chinese segmentation; a missing tag often defaults to English and the index becomes unusable.
- PDF/A compliance: PDF/A-2a and PDF/A-3a (accessibility profiles) require the language tag. Without it, veraPDF validation fails.
Common mistakes:
- ❌ Leaving it unset. Default to “the recipient’s locale,” not “the platform’s default.” Most leaky stacks just don’t write the field; the result is screen readers that mispronounce and search indexes that mis-stem.
- ❌ Using a string that is not a BCP-47 tag, like "english", or one in unconventional casing, like "EN-US". BCP-47 (RFC 5646) tags are case-insensitive, so "EN-US" is technically well-formed, but the safe choice is the conventional form: en, en-US, de, pt-BR.
- ❌ Hard-coding the platform’s default (e.g. always "en") regardless of the document’s actual content language. A Portuguese invoice tagged "en" is worse than an untagged document: it actively misleads the indexer.
Rule of thumb: the tag should match the actual content language. For a customer in Brazil receiving an invoice in Portuguese, set "language": "pt-BR", not "en". For multilingual documents, pick the dominant language and use the Lang attribute on individual content elements for the rest — that’s a tagged-PDF accessibility feature beyond the document-level language field.
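One way to catch malformed tags before they reach the renderer is a small normaliser. The sketch below is deliberately simplified: it accepts only the language[-Script][-REGION] shape and rejects the variants, extensions, and private-use subtags that full RFC 5646 allows. The function name is hypothetical.

```python
import re

# Simplified shape: language[-Script][-REGION]. Full BCP-47 (RFC 5646) allows more
# subtag kinds (variants, extensions, private use), which this sketch rejects.
_TAG = re.compile(r"^([A-Za-z]{2,3})(?:-([A-Za-z]{4}))?(?:-([A-Za-z]{2}|[0-9]{3}))?$")


def normalise_language_tag(tag: str) -> str:
    """Return the tag in conventional casing, or raise ValueError if it is not a simple tag."""
    match = _TAG.match(tag.strip())
    if match is None:
        raise ValueError(f"not a simple BCP-47 tag: {tag!r}")
    language, script, region = match.groups()
    parts = [language.lower()]
    if script:
        parts.append(script.title())   # e.g. "hans" -> "Hans"
    if region:
        parts.append(region.upper())   # e.g. "br" -> "BR"
    return "-".join(parts)


print(normalise_language_tag("EN-US"))
print(normalise_language_tag("zh-hans"))
```

A string such as "english" fails the pattern and raises, which is the behaviour you want in a request-building layer: a loud error rather than a silently wrong tag.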
author — who owns the document
In the PDF spec, author is “the name of the person or organisation that created the document.” For business PDFs that ship to recipients, the answer is almost always the organisation — but the right shape genuinely varies by context.
Where it surfaces:
- Properties dialog in every PDF viewer, prominently labelled “Author.”
- DMS / archive indexers, often used as a filter.
- PDF/A XMP metadata stream, where it carries into long-term archives.
Common mistakes:
- ❌ "author": "[email protected]" accidentally leaks the operator’s email into every PDF; it ends up in every search index and becomes a long-term PII issue.
- ❌ "author": "PDF Generator Service" is an internal tool name and means nothing to the recipient.
- ❌ Empty: Preview and most viewers literally show “(no author)” in the properties dialog, which reads as “nobody owns this.”
Shapes that work:
- ✅ "author": "Acme Logistics, Inc." (straightforward organisation).
- ✅ "author": "Acme Logistics — Billing" (organisation + department, for documents that route to a specific desk).
- ✅ "author": "Bridge Capital Partners — Fund III" (useful in finance/legal where attribution is to a specific entity).
- ✅ "author": "Maria López, RICS Surveyor" (for single-author publishing such as reports, valuations, and legal opinions, where the individual IS the editorial attribution).
Rule of thumb: author is the entity the recipient should associate the document with. In a multi-tenant SaaS where the platform generates PDFs on behalf of customers, author should be the customer’s organisation name, not the platform’s name. (The platform’s name belongs in creator — see below.) For consultancy / publishing / legal contexts where individuals are the brand, individuals are fine.
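The email-address leak is the one author mistake with lasting consequences, so it is worth a guard. A minimal sketch, assuming a loose email pattern is good enough for a pre-flight check; the function name is hypothetical and the pattern is not a full address validator.

```python
import re

# Loose "something@something.tld" pattern; an assumption for illustration only.
_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def author_problems(author: str) -> list[str]:
    """Flag an empty author or one that contains an email-like string."""
    value = author.strip()
    if not value:
        return ["author is empty"]
    if _EMAIL.search(value):
        return ["author contains an email address (possible PII leak)"]
    return []


print(author_problems("ops@example.com"))
print(author_problems("Acme Logistics, Inc."))
```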
subject — what type of document this is
subject is a short description of the document. Viewers don’t surface it prominently: most users will never see it unless they open the Properties dialog. But document management systems, archive systems, and rules-based email/file routing use it.
Where it surfaces:
- Properties dialog, secondary position.
- DMS routing rules, archive bucketing logic.
- XMP metadata stream (PDF/A).
Common mistakes:
- ❌ "subject": "Invoice for Acme on 2026-03-18 for $4,532.10" is a document-instance description, not a type. It belongs in title.
- ❌ Empty: costs you a free routing hook for downstream systems.
- ❌ Mixing classes inconsistently ("Invoice" vs "Invoice/2026-03" vs "Monthly invoice"): DMS filters can’t bucket on a moving target.
Shapes that work:
- ✅ "subject": "Invoice"
- ✅ "subject": "Monthly account statement"
- ✅ "subject": "Shipping label — 4×6 thermal"
- ✅ "subject": "Q3 2026 board pack"
Rule of thumb: the right granularity is document class, not document instance. A DMS with thousands of incoming PDFs can route on subject if you give it a consistent vocabulary. Pick a finite set of classes for your platform and never deviate — every invoice your platform generates should have exactly "subject": "Invoice".
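A finite vocabulary is easiest to enforce when it lives in code rather than in convention. The sketch below uses a Python Enum for that; the class name and the particular members are illustrative, taken from the examples in this section rather than from any gPdf schema.

```python
from enum import Enum


class DocumentClass(Enum):
    """Hypothetical closed vocabulary for the subject field."""
    INVOICE = "Invoice"
    ACCOUNT_STATEMENT = "Monthly account statement"
    SHIPPING_LABEL = "Shipping label — 4×6 thermal"


def parse_document_class(subject: str) -> DocumentClass:
    """Look up a subject string; Enum raises ValueError for anything outside the vocabulary."""
    return DocumentClass(subject)


print(parse_document_class("Invoice").value)
```

Passing "Invoice/2026-03" raises ValueError, so an inconsistent class is caught where the request is built instead of in a DMS filter months later.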
creator vs producer — the most-mixed-up pair
This is where most teams stop reading the PDF spec and guess. The spec is precise; the two fields mean different things.
- creator: the application that produced the source content (the upstream system that decided what the document should say).
- producer: the application that produced the PDF bytes (the rendering engine that turned that content into a PDF file).
For a SaaS billing platform generating invoices through a JSON-to-PDF API like gPdf:
- creator = the SaaS billing platform with its version. That’s the application that decided this should be an invoice for Acme for $4,532.10.
- producer = the renderer. By default that’s “gPdf.” But because the rendering layer is infrastructure the SaaS chose, the SaaS can legitimately set producer to its own platform name: its platform did, in a real sense, produce the PDF bytes by delegating to gPdf as infrastructure.
{
  "creator": "Acme Billing Platform v7.2",
  "producer": "Acme Billing Platform"
}
Where they surface:
- Properties dialog, both labelled.
- pdfinfo output, side by side.
- PDF/A XMP stream (both fields are required to be non-empty in PDF/A).
Common mistakes:
- ❌ creator set to a Chromium / Mozilla user-agent string. Happens when a headless-browser PDF stack passes the User-Agent into creator automatically. It’s the browser version, not the source-of-truth system. Override it.
- ❌ producer left as the default renderer name. Most teams never override this, so every PDF says “Skia/PDF m120” or “wkhtmltopdf”; see the white-label post for why this matters for B2B.
- ❌ Putting the same value in both. Acceptable but wasteful: the two fields exist precisely so a viewer can tell “source app” from “render engine.” Use them.
Rule of thumb: creator is your application name with version (e.g. "Acme Billing Platform v7.2"); producer is your application’s brand or platform name without version (e.g. "Acme Billing Platform"). Both should be values the recipient would recognise.
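The rule of thumb reduces to deriving both values from one application name, which keeps them from drifting apart. A small sketch with hypothetical helper names; the user-agent check relies only on the convention that browser user-agent strings begin with "Mozilla/" and that headless Chrome identifies itself as "HeadlessChrome".

```python
def creator_and_producer(app_name: str, version: str) -> dict[str, str]:
    """creator carries the version, producer is the bare platform name."""
    return {"creator": f"{app_name} v{version}", "producer": app_name}


def looks_like_user_agent(value: str) -> bool:
    """Heuristic for a browser user-agent string that leaked into creator."""
    return value.startswith("Mozilla/") or "HeadlessChrome" in value


print(creator_and_producer("Acme Billing Platform", "7.2"))
```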
Empty fields, per-token defaults, downstream surprises
Three implementation details worth knowing before you ship:
- Empty or whitespace-only strings are treated as not provided. Sending "title": "" is the same as omitting title: it doesn’t write an empty string into the PDF, it walks the fallback chain (token default → system default). This is the cause of the most common “I set it, it didn’t take” bug report.
- Token policies can strip or default metadata fields. A multi-tenant SaaS using gPdf can set a default_metadata on each API token so every PDF that token generates carries the customer’s author and producer without trusting every developer to set them on each request. The token-level default is the right enforcement layer for “every Acme PDF must say Acme.”
- Downstream pipelines may rewrite your metadata. Tools that post-process PDFs after gPdf returns them (Ghostscript without explicit metadata-preservation flags, some enterprise DRM tools, some “PDF optimisers”) can overwrite Producer with their own name and undo the branding you just set. Verify against your actual production pipeline, not just the raw gPdf response.
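The fallback behaviour described in the first two points can be modelled in a few lines, which helps when reasoning about why a value "didn't take". This is a model of the behaviour as described here, not gPdf's actual implementation; the function name and the example defaults are assumptions.

```python
from typing import Optional


def resolve_field(
    name: str,
    request: dict,
    token_default: dict,
    system_default: dict,
) -> Optional[str]:
    """Walk request -> token default -> system default, skipping blank strings."""
    for source in (request, token_default, system_default):
        value = source.get(name)
        if isinstance(value, str) and value.strip():
            return value
    return None


# An empty string in the request behaves exactly like an omitted field.
request = {"title": "", "author": "Acme Logistics, Inc."}
token_default = {"title": "Untitled Acme document", "producer": "Acme Billing Platform"}
system_default = {"producer": "gPdf"}

print(resolve_field("title", request, token_default, system_default))
print(resolve_field("producer", request, token_default, system_default))
```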
Verify your metadata
After you implement the changes above, three quick ways to check the PDF actually shipped what you intended:
Command line (macOS / Linux, requires poppler-utils):
$ pdfinfo your-output.pdf | head -10
Title: Invoice INV-2026-3401
Subject: Monthly invoice — 2026-03
Author: Acme Logistics, Inc.
Creator: Acme Billing Platform v7.2
Producer: Acme Billing Platform
Language: en
Acrobat / Adobe Reader: File → Properties → Description tab. All six fields appear, with Title shown in the viewer’s title bar at the top.
macOS Preview: ⌘+I (Get Info). The “PDF” inspector pane shows the same fields.
If any field shows up empty, blank, or with a tool name you didn’t set, walk back through the request body — the most common cause is sending "" (empty string), which the API treats as “not provided” and walks the fallback chain to a default value. The second-most common cause is a downstream pipeline (Ghostscript, DRM, optimiser) overwriting the field after gPdf returned it; test against production, not just the raw render response.
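The command-line check is easy to automate in a test suite. The sketch below shells out to pdfinfo (from poppler-utils, as above) and compares its "Key: value" lines against what you expected to ship. The function names are hypothetical, and the parser assumes the one-field-per-line layout shown in the sample output.

```python
import subprocess


def parse_pdfinfo(output: str) -> dict[str, str]:
    """Split each 'Key: value' line on the first colon."""
    fields = {}
    for line in output.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


def read_pdfinfo(path: str) -> dict[str, str]:
    """Run pdfinfo on a file; raises CalledProcessError if pdfinfo exits non-zero."""
    result = subprocess.run(["pdfinfo", path], capture_output=True, text=True, check=True)
    return parse_pdfinfo(result.stdout)


def mismatches(actual: dict[str, str], expected: dict[str, str]) -> dict[str, tuple]:
    """Map each differing key to (expected, actual); actual is None when the key is absent."""
    return {
        key: (want, actual.get(key))
        for key, want in expected.items()
        if actual.get(key) != want
    }
```

Running read_pdfinfo against a file taken from the end of your production pipeline, rather than the raw render response, is what catches a post-processor that rewrote Producer.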
Metadata in PDF/A archival
If you’re rendering for long-term archival with settings.profile: "pdfa-2b" (or -2a, -3a, -3b), metadata stops being optional and becomes load-bearing:
- The producer field cannot be empty in a PDF/A-conformant file; at minimum the system default ships.
- language is required for the accessibility profiles (PDF/A-2a, PDF/A-3a). Without it, veraPDF validation fails outright.
- The XMP metadata stream PDF/A requires is generated automatically from the six fields above; you don’t need to construct it yourself.
- title, author, subject, creator, producer and language all ride into the XMP stream, so a downstream archive’s metadata indexer (Preservica, Archivematica) can build its catalog from them without re-parsing the document body.
For an archival document, branded metadata isn’t just brand polish — it’s part of the durability of the artefact. The German customs office, the Brazilian tax authority, or any long-term archive that opens your PDF in ten years will see whatever was in those fields the day you rendered it. Setting them deliberately at render time is the only chance you get.
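Because the archived values are fixed at render time, a pre-flight check on the settings object is cheap insurance. A minimal sketch, assuming the profile identifiers follow the "pdfa-2b" pattern used above ("pdfa-2a" and "pdfa-3a" for the accessibility profiles); the function name and message wording are hypothetical.

```python
ACCESSIBILITY_PROFILES = frozenset({"pdfa-2a", "pdfa-3a"})  # assumed identifiers
FIELDS = ("title", "language", "author", "subject", "creator", "producer")


def preflight(settings: dict) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a settings object that targets a PDF/A profile."""
    errors: list[str] = []
    warnings: list[str] = []
    profile = settings.get("profile", "")
    if not profile.startswith("pdfa-"):
        return errors, warnings
    metadata = settings.get("metadata", {})
    for name in FIELDS:
        if str(metadata.get(name, "")).strip():
            continue
        if name == "language" and profile in ACCESSIBILITY_PROFILES:
            errors.append(f"language is required for {profile}")
        else:
            warnings.append(f"{name} is unset; a fallback value will be archived")
    return errors, warnings
```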
What gPdf doesn’t expose (yet)
To stay honest about today’s surface: the PDF spec also defines Keywords (free-form search terms) and an XMP metadata stream that supports arbitrary custom key-value pairs. gPdf does not expose either of these in the current API.
If you need to stash arbitrary business data inside the PDF (order UUID, warehouse code, template version), the workarounds today are:
- Set subject to a structured short string that downstream systems parse.
- Keep the business data in your own database, keyed by filename or content hash.
- Wait — XMP custom fields are on the roadmap, and when they ship they’ll be the right answer for hidden machine-readable workflow context.
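The second workaround, keying business data by content hash, might look like the following. It is a sketch only: the in-memory dictionary stands in for a real database, and the class name and the example payload are hypothetical.

```python
import hashlib
from typing import Optional


def content_key(pdf_bytes: bytes) -> str:
    """SHA-256 hex digest of the exact bytes that were delivered."""
    return hashlib.sha256(pdf_bytes).hexdigest()


class SidecarStore:
    """In-memory stand-in for a database table keyed by content hash."""

    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    def put(self, pdf_bytes: bytes, data: dict) -> str:
        key = content_key(pdf_bytes)
        self._rows[key] = dict(data)
        return key

    def get(self, pdf_bytes: bytes) -> Optional[dict]:
        return self._rows.get(content_key(pdf_bytes))


store = SidecarStore()
store.put(b"%PDF-1.7 example bytes", {"template_version": "hypothetical-v3"})
print(store.get(b"%PDF-1.7 example bytes"))
```

One caveat follows from the downstream-rewrite point earlier: any tool that modifies the file after rendering changes the hash, so the key should be computed on the bytes as they leave your pipeline.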
Conflating “branded metadata” (the six standard fields, available now) with “custom business metadata” (XMP custom fields, future) is the easiest way to over-promise what’s possible today. Worth keeping them separate in your own planning.
A complete example
A SaaS billing platform (Acme Billing Platform) generating an invoice for a German customer (Müller Versand GmbH), ready to be archived as PDF/A:
{
  "settings": {
    "profile": "pdfa-3b",
    "metadata": {
      "title": "Rechnung RE-2026-0412",
      "language": "de",
      "author": "Müller Versand GmbH",
      "subject": "Monatsrechnung — März 2026",
      "creator": "Acme Billing Platform v7.2",
      "producer": "Acme Billing Platform"
    }
  }
}
pdfinfo against the resulting PDF:
$ pdfinfo invoice-2026-0412.pdf | head -10
Title: Rechnung RE-2026-0412
Subject: Monatsrechnung — März 2026
Author: Müller Versand GmbH
Creator: Acme Billing Platform v7.2
Producer: Acme Billing Platform
Language: de
The title is in German. author is Müller Versand (the customer’s GmbH entity, the recipient of the document), creator is Acme Billing Platform with its version (the editorial system that decided what to put on the page), and producer is Acme Billing Platform’s brand. language is tagged correctly for the German screen reader and for the German full-text indexer that will later pick this up in Müller’s DMS. The PDF/A-3b profile means this set of metadata also gets serialised into the XMP stream for long-term archival.
Nothing in the file properties names gPdf, Chromium, or any tool the customer didn’t choose. Which is exactly the point.
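That last property can be asserted mechanically once the metadata has been read back from the file. The sketch below scans field values for tool names; the marker list is an illustrative assumption drawn from the tools mentioned in this post, and the function name is hypothetical.

```python
# Lower-case substrings to look for; an illustrative list, extend it for your stack.
DEFAULT_TOOL_MARKERS = ("gpdf", "chromium", "skia", "wkhtmltopdf", "ghostscript")


def leaked_tool_names(
    fields: dict[str, str],
    markers: tuple[str, ...] = DEFAULT_TOOL_MARKERS,
) -> dict[str, str]:
    """Return the fields whose value mentions any marker, case-insensitively."""
    return {
        name: value
        for name, value in fields.items()
        if any(marker in value.lower() for marker in markers)
    }


example = {
    "Title": "Rechnung RE-2026-0412",
    "Author": "Müller Versand GmbH",
    "Creator": "Acme Billing Platform v7.2",
    "Producer": "Acme Billing Platform",
}
print(leaked_tool_names(example))
```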
The smallest possible upgrade
If you already POST to /api/v1/pdf/render and your current call has no settings.metadata, the smallest improvement is four lines added to the JSON you already send:
  {
    "pages": [...],
    "settings": {
+     "metadata": {
+       "author": "Your customer's organisation",
+       "producer": "Your platform"
+     }
    }
  }
Two fields, one new key. Verifiable with pdfinfo in seconds. Once these land, fill in title, language, subject and creator when you have time.
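In client code, the same upgrade is a small merge step before the POST. The sketch below uses only the Python standard library. The endpoint path comes from this post; the base URL, the bearer-token header, and both function names are assumptions for illustration, so check them against the API reference before relying on them.

```python
import copy
import json
import urllib.request


def with_metadata(body: dict, author: str, producer: str) -> dict:
    """Return a copy of the request body with author and producer filled in if absent."""
    updated = copy.deepcopy(body)
    metadata = updated.setdefault("settings", {}).setdefault("metadata", {})
    metadata.setdefault("author", author)
    metadata.setdefault("producer", producer)
    return updated


def build_render_request(base_url: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST; pass the result to urllib.request.urlopen."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/pdf/render",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )
```

Using setdefault means a value the caller already set is never overwritten, and the deep copy leaves the original body untouched for callers that reuse it.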
Where this lands
- §4.14.2 Metadata — the API reference for these fields.
- PDF white-labelling (companion post) — the why and the B2B SaaS case.
- PDF/A and Factur-X explained for engineers — relevant if your metadata story includes long-term archival.