Metadata and Taxonomy
Metadata describes content. Taxonomy organises it. Together they determine whether users can find what they are looking for — by browsing, filtering, or searching.
A library without a catalogue is a room full of books. The books may be excellent, but without a system for describing and organising them, they are accessible only to someone who already knows where to look. Metadata and taxonomy are the catalogue for content: the descriptions and classifications that make content findable.
Metadata: describing content
Metadata is data about data. For a piece of content, metadata includes everything that describes it without being the content itself: the title, the publication date, the author, the summary, the category, the tags, the reading time, the canonical URL.
Metadata serves two audiences: humans and machines. For humans, metadata is what appears in previews, search results, and navigation — the snippet of information that tells a reader whether this article is worth opening. For machines, metadata is what search engines, RSS readers, social sharing previews, and internal search indices use to understand and index content.
The most consequential metadata for web content is the set that controls how content appears in search engine results and social sharing previews: the page title, the meta description, the Open Graph tags. These are often written as an afterthought, if at all. They should be treated as primary content, because they are the first thing a potential reader sees.
Title and description as reader-facing content
The HTML page title and meta description are typically the reader’s first encounter with a piece of content. They appear in search results, in browser tabs, in link previews on social media. A title that is too generic (“Blog post”) or too long (search engines typically truncate at around 60 characters) fails to communicate what the content offers. A meta description that is also the first sentence of the article, selected by the CMS because no description was written, is a missed opportunity.
Writing metadata as primary content means:
- Page titles that include the key term first and fit within 60 characters
- Meta descriptions that summarise the specific value of the content in 120–160 characters, written as a pitch to a reader deciding whether to click
- Open Graph titles and descriptions that are written for the social context, where they compete with visual content for attention
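The length guidelines above are easy to check automatically before publishing. The following is a minimal sketch, not a definitive linter: `PageMeta`, `lint_meta`, and the `key_term` field are hypothetical names introduced for illustration, and the limits simply restate the approximate figures given in the list.

```python
from __future__ import annotations

from dataclasses import dataclass

TITLE_MAX = 60                  # approximate truncation point for page titles
DESC_MIN, DESC_MAX = 120, 160   # target range for meta descriptions


@dataclass
class PageMeta:
    title: str
    description: str
    key_term: str  # hypothetical field: the term the title should lead with


def lint_meta(meta: PageMeta) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    if len(meta.title) > TITLE_MAX:
        problems.append(f"title is {len(meta.title)} chars (max {TITLE_MAX})")
    if not meta.title.lower().startswith(meta.key_term.lower()):
        problems.append("title does not lead with the key term")
    if not DESC_MIN <= len(meta.description) <= DESC_MAX:
        problems.append(
            f"description is {len(meta.description)} chars "
            f"(want {DESC_MIN}-{DESC_MAX})"
        )
    return problems
```

A check like this could run in a CMS save hook or a build step. Character counts are only a proxy, since search engines truncate by rendered width rather than by a fixed number of characters.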
Open Graph images
The og:image tag specifies the image that appears when a link is shared on social platforms, messaging apps, and Slack. It is among the highest-leverage metadata fields: a compelling og:image can substantially improve link click-through rates compared to a generic fallback or an automatically cropped screenshot.
The og:image should be 1200×630px, a size that renders correctly across most platforms without cropping. It should be content-specific, not a generic site logo: include the article title in the image, ideally with a visual treatment that communicates the content’s subject. Many teams generate og:images programmatically, using a headless browser or an image generation service, so that the article title is included in the image at publish time.
Specify the og:image as an absolute URL including the full https:// origin, since crawlers that fetch link previews generally do not resolve relative URLs. Keep the file under 1 MB: many platforms impose size limits, and some silently fall back to a generic image when the limit is exceeded.
Twitter/X prefers twitter:image and falls back to og:image when no Twitter-specific tag is present. LinkedIn, Facebook, Slack, WhatsApp, and iMessage all use the og:image tag. A missing og:image leaves the platform to use whatever it finds on the page: often a poorly cropped thumbnail, a site logo, or nothing.
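As a rough illustration of the rules above, the sketch below renders the social tags for a page and rejects a relative image URL. The function name `social_meta_tags` and the example URLs are assumptions made for illustration; the tag names themselves (`og:title`, `og:description`, `og:image`, `og:image:width`, `og:image:height`, `twitter:image`) are the standard ones.

```python
from html import escape
from urllib.parse import urlparse


def social_meta_tags(title, description, image_url, twitter_image_url=None):
    """Render Open Graph tags, plus twitter:image only when one is supplied."""
    parsed = urlparse(image_url)
    # Reject relative or non-https URLs: the image must carry its full origin.
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"og:image must be an absolute https URL, got {image_url!r}")

    tags = [
        ("property", "og:title", title),
        ("property", "og:description", description),
        ("property", "og:image", image_url),
        ("property", "og:image:width", "1200"),
        ("property", "og:image:height", "630"),
    ]
    # Without this tag, Twitter/X falls back to og:image.
    if twitter_image_url:
        tags.append(("name", "twitter:image", twitter_image_url))

    return "\n".join(
        f'<meta {attr}="{key}" content="{escape(value, quote=True)}">'
        for attr, key, value in tags
    )


print(social_meta_tags(
    "Line height: a practical guide",
    "How to choose line height for body text.",
    "https://example.com/og/line-height.png",
))
```

Escaping the `content` values matters because titles often contain quotes or ampersands, which would otherwise break the attribute.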
Taxonomy: organising content
Taxonomy is the system of categories, subcategories, and tags that organises content into meaningful groups. A well-designed taxonomy makes content browsable: a reader who does not know the specific term they are looking for can navigate through categories until they find what they need.
The two most common taxonomic structures are hierarchical categories and flat tags. Categories represent mutually exclusive, nested groupings: an article belongs to one category, which in turn belongs to at most one parent category. Tags represent non-exclusive attributes: an article can have many tags, and a tag can apply to many articles.
The practical difference: categories are for navigation (“show me all articles about typography”), while tags are for filtering and related content (“show me other articles also tagged #line-height”). Most content systems benefit from both.
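One way to picture the two structures is as a small data model: each article holds a single category with an optional parent, and a set of tags. This is a sketch under assumed names (`Category`, `Article`, `in_category`, `related_by_tag` are all hypothetical), not a description of how any particular CMS stores its taxonomy.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Category:
    slug: str
    parent: Optional["Category"] = None  # at most one parent: a strict hierarchy

    def path(self) -> list[str]:
        """Slugs from the root category down to this one (useful for breadcrumbs)."""
        node, parts = self, []
        while node is not None:
            parts.append(node.slug)
            node = node.parent
        return parts[::-1]


@dataclass
class Article:
    title: str
    category: Category                            # exactly one category
    tags: set[str] = field(default_factory=set)   # any number of tags


def in_category(articles, slug):
    """Navigation: articles filed under a category or any of its descendants."""
    return [a for a in articles if slug in a.category.path()]


def related_by_tag(article, articles):
    """Related content: other articles sharing at least one tag."""
    return [a for a in articles if a is not article and a.tags & article.tags]
```

The asymmetry is visible in the types: `category` is a single value that implies a path, while `tags` is a set with no ordering or nesting.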
Taxonomy design pitfalls
Too many categories — when every article has its own category, the taxonomy is not a system, it is a list. Categories should be broad enough that multiple articles belong to each one.
Inconsistent tagging — if one editor tags an article “UI design” and another tags a similar article “user interface,” the tag system produces disconnected clusters rather than a coherent network. Taxonomies need governance: a controlled vocabulary, or at minimum, a canonical list of approved tags.
Categories that reflect the organisation, not the user — “Marketing,” “Product,” and “Engineering” are organisational divisions. Readers browsing for content about “responsive design” or “typography” do not think in terms of organisational divisions. Taxonomy should reflect how readers think about the subject matter, not how the organisation is structured.
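The inconsistent-tagging pitfall in particular lends itself to a small automated guard. The sketch below normalises whatever an editor types into a slug, maps known synonyms onto a canonical tag, and rejects anything outside the approved list. The vocabulary and alias table are invented examples, and `normalise_tag` is a hypothetical helper; a real controlled vocabulary would be maintained by whoever governs the taxonomy.

```python
# Hypothetical controlled vocabulary and synonym table, for illustration only.
CANONICAL_TAGS = {"ui-design", "typography", "line-height"}
ALIASES = {
    "user-interface": "ui-design",
    "ui": "ui-design",
}


def normalise_tag(raw: str) -> str:
    """Map free-form editor input onto a canonical tag, or raise ValueError."""
    # "  #UI design " -> "ui-design": trim, drop a leading '#', lowercase,
    # and collapse whitespace or underscores into single hyphens.
    cleaned = raw.strip().lstrip("#").lower().replace("_", " ")
    slug = "-".join(cleaned.split())
    slug = ALIASES.get(slug, slug)
    if slug not in CANONICAL_TAGS:
        raise ValueError(f"{raw!r} is not an approved tag")
    return slug


print(normalise_tag("UI design"))       # ui-design
print(normalise_tag("user interface"))  # ui-design
```

Raising an error on unknown tags is a deliberate choice in this sketch: it forces a decision about whether a new term belongs in the vocabulary instead of letting near-duplicates accumulate.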
Up next: content and SEO — how writing for humans and writing for search engines are more compatible than commonly assumed.