SEO & structured data – White Tree Digital Docs

SEO on this site is a small, non-negotiable floor wired into the shared Layout.astro plus three pieces of machinery — JSON-LD builders in src/lib/schema.ts, a Sanity-driven sitemap.xml.ts, and a host-aware robots.txt.ts — none of which an editor can accidentally bypass.

The site exists to generate leads, and organic search is the cheapest channel to feed that funnel, so every route ships a correct <title>, canonical, structured data, and the right HTTP status even when content is missing or Sanity is down. The rules below are enforced in code, not left to authoring discipline.

The SEO floor (don't bypass)

Every page must meet a baseline. Most of it lives in one place — src/layouts/Layout.astro — so it can't drift per-route:

One <h1> per page containing the primary keyword. Headings come from Sanity content; the design-best-practices docs (global-rules.md) require a single keyworded H1.
Unique <title> and <meta name="description"> per page — target 50–60 chars for the title and 140–160 for the description.
A <link rel="canonical"> on every page, always built from the normalized siteUrl.
<html lang="en"> always.
A skip link as the first focusable element in <body>: <a href="#main" class="skip-link">Skip to content</a>.
The logo links to / on every page (Header + Footer wordmark).
OG and Twitter meta on every page, with a sitewide og-default.png fallback.

The <head> of Layout.astro renders the title/description/canonical and the social tags directly from props:

<link rel="canonical" href={canonical} />
{noindex && <meta name="robots" content="noindex, nofollow" />}

<title>{resolvedTitle}</title>
{description && <meta name="description" content={description} />}

<meta property="og:title" content={resolvedTitle} />
<meta property="og:url" content={canonical} />
<meta property="og:type" content="website" />
<meta property="og:image" content={ogImageUrl} />

<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:title" content={resolvedTitle} />
<meta name="twitter:image" content={ogImageUrl} />

Title fallback chain

Layout.astro guarantees a <title> even when both the per-page Sanity seo and the site's defaultSeo are empty:

// SEO floor: every page ships a <title>, even when the route passed none.
const resolvedTitle = title ?? siteSettings?.siteName ?? 'White Tree Digital';

Each route resolves its own title and description before passing them down. The pattern across routes is seo = doc.seo ?? siteSettings?.defaultSeo, then seoTitle = seo?.metaTitle ?? doc.title. So the order is: per-document seo.metaTitle → site defaultSeo.metaTitle → the document's own title → (at the layout) siteName → the literal 'White Tree Digital'. Because the first step picks a whole seo object, defaultSeo.metaTitle is only consulted when the document has no seo object at all; a document whose seo exists but lacks a metaTitle goes straight to its own title. The <meta name="description"> and the Twitter/OG description tags only render when a description actually resolves, so an empty-string description is never emitted.
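The two-stage chain can be modelled as a pair of pure functions. This is a minimal sketch, not the code in the routes or in Layout.astro: the interfaces and the function names resolveRouteTitle and resolveLayoutTitle are hypothetical, and only the two ?? expressions are taken from the text above.

```typescript
// Hypothetical shapes: the field names follow the prose, the types are assumed.
interface Seo { metaTitle?: string }
interface Doc { title?: string; seo?: Seo }
interface SiteSettings { siteName?: string; defaultSeo?: Seo }

// Route level: pick one SEO object first, then read the title from it.
function resolveRouteTitle(doc: Doc, siteSettings?: SiteSettings): string | undefined {
  const seo = doc.seo ?? siteSettings?.defaultSeo;
  return seo?.metaTitle ?? doc.title;
}

// Layout level: the floor that always yields a title string.
function resolveLayoutTitle(title: string | undefined, siteSettings?: SiteSettings): string {
  return title ?? siteSettings?.siteName ?? 'White Tree Digital';
}

const settings: SiteSettings = {siteName: 'WTD', defaultSeo: {metaTitle: 'Default title'}};

// No seo object on the document: defaultSeo supplies the title.
const fromDefaultSeo = resolveLayoutTitle(resolveRouteTitle({title: 'About'}, settings), settings);
// An seo object without metaTitle: doc.title is used.
const fromDocTitle = resolveLayoutTitle(resolveRouteTitle({title: 'About', seo: {}}, settings), settings);
// Nothing set anywhere: the literal fallback.
const fromLiteral = resolveLayoutTitle(resolveRouteTitle({}), undefined);
```

Note that ?? only skips null and undefined, so an empty-string metaTitle would pass through unchanged in this sketch.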

Canonical origin

Canonicals (and og:url, and the URLs baked into JSON-LD) are all built as `${siteUrl}/path`. siteUrl comes from src/lib/site.ts and is normalized to never end in a slash:

export const siteUrl = (import.meta.env.PUBLIC_SITE_URL ?? '').replace(/\/+$/, '');

Always build URLs as ${siteUrl}/path

PUBLIC_SITE_URL has shipped with a trailing slash before (it was set that way in the Cloudflare dashboard), which produced https://host//path canonicals and og:urls. The strip in site.ts makes every consumer immune — keep using siteUrl rather than reading PUBLIC_SITE_URL directly. See Environment variables.
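To see why the strip matters, here is a small sketch that applies the same regex to a raw value passed as a parameter (import.meta.env is not available outside Astro). The helper names normalizeSiteUrl and canonicalFor are hypothetical and the example.com origin is a placeholder; canonicalFor also tolerates a leading slash on the path, which is an extra assumption beyond what site.ts does.

```typescript
// Same strip as site.ts, but taking the raw value as an argument.
function normalizeSiteUrl(raw: string | undefined): string {
  return (raw ?? '').replace(/\/+$/, '');
}

// Hypothetical helper: joins origin and path with exactly one slash.
function canonicalFor(siteUrl: string, path: string): string {
  return `${siteUrl}/${path.replace(/^\/+/, '')}`;
}

// A dashboard value with a trailing slash no longer doubles the separator.
const siteUrl = normalizeSiteUrl('https://example.com/');
const serviceCanonical = canonicalFor(siteUrl, 'services/seo');
const homeCanonical = canonicalFor(siteUrl, '');
```

With the raw value used directly, the same template would have produced https://example.com//services/seo.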

noindex on preview hosts

Two surfaces must never be indexed: any *.workers.dev host (it would compete with the canonical domain as duplicate content) and the staging build (a published-content mirror that only exists for Sanity's Presentation Tool). Layout.astro emits the noindex meta for either:

const noindex =
  import.meta.env.PUBLIC_NOINDEX === 'true' || Astro.url.hostname.endsWith('.workers.dev');

This is belt-and-suspenders with robots.txt (below): the per-page <meta name="robots" content="noindex, nofollow"> covers the case where a crawler reaches a page without reading robots.txt first. PUBLIC_NOINDEX is set to true only on the staging Cloudflare project; the public project leaves it unset. See Two-build split and Deployment.

schema.org JSON-LD builders

src/lib/schema.ts exports plain builder functions that each return a JSON-LD object. Routes assemble an array of these and pass it to Layout.astro as the schema prop; the layout serializes each block into its own <script type="application/ld+json">:

{
  schemaBlocks.map((block) => (
    <script is:inline type="application/ld+json" set:html={JSON.stringify(block)} />
  ))
}

There are five builders. Note that the "Article" builder actually emits @type: 'BlogPosting', and breadcrumbs are passed in by the route rather than derived:

Builder	`@type`	Applied on
`organizationSchema`	`Organization`	every page (home, `[slug]`, services, blog/portfolio index + detail, audit)
`serviceSchema`	`Service`	`/services/[slug]`
`articleSchema`	`BlogPosting`	`/blog/[slug]`
`breadcrumbListSchema`	`BreadcrumbList`	`/services/[slug]`, `/blog/[slug]`
`faqPageSchema`	`FAQPage`	home and `/services/[slug]` when an `faqSection` has items

Organization

organizationSchema({siteSettings, siteUrl}) pulls name/description/email/telephone from the siteSettings singleton and maps siteSettings.social[].url into sameAs. The postal address is hardcoded (Indianapolis, IN, US) — WTD is a one-person Indianapolis studio, so it's a constant, not an editable field:

address: {
  '@type': 'PostalAddress',
  addressLocality: 'Indianapolis',
  addressRegion: 'IN',
  addressCountry: 'US',
},

It is added on essentially every route — including the homepage, the catch-all [slug].astro, both index pages, and /free-website-audit — because Organization markup is valid sitewide.
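A builder of this shape might look like the sketch below. It is an illustration under assumptions, not the contents of schema.ts: the siteSettings field names (description, email, telephone), the url property, and the name fallback are guesses, while the hardcoded address and the social[].url to sameAs mapping come from the description above.

```typescript
// Assumed field names on the siteSettings singleton.
interface SocialLink { url?: string }
interface SiteSettings {
  siteName?: string;
  description?: string;
  email?: string;
  telephone?: string;
  social?: SocialLink[];
}

function organizationSchema({siteSettings, siteUrl}: {siteSettings?: SiteSettings; siteUrl: string}) {
  // Drop social entries that have no URL so sameAs holds only strings.
  const sameAs = (siteSettings?.social ?? [])
    .map((link) => link.url)
    .filter((url): url is string => Boolean(url));

  return {
    '@context': 'https://schema.org',
    '@type': 'Organization',
    name: siteSettings?.siteName ?? 'White Tree Digital', // fallback is an assumption
    url: siteUrl,
    description: siteSettings?.description,
    email: siteSettings?.email,
    telephone: siteSettings?.telephone,
    address: {
      '@type': 'PostalAddress',
      addressLocality: 'Indianapolis',
      addressRegion: 'IN',
      addressCountry: 'US',
    },
    // Omit sameAs entirely rather than emitting an empty array.
    ...(sameAs.length ? {sameAs} : {}),
  };
}
```

Properties left undefined are dropped by JSON.stringify, so unset siteSettings fields do not appear in the emitted script.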

Service

serviceSchema({service, siteSettings, siteUrl}) builds a Service node from the service doc: name/serviceType from service.title, description from service.summary, a url of `${siteUrl}/services/${service.slug.current}`, areaServed of the United States, and a provider Organization named from siteSettings.siteName.
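The mapping just described can be sketched as follows. The interfaces are hypothetical, and the exact areaServed shape (a Country node) is an assumption; the source only states that the area served is the United States.

```typescript
// Assumed shapes for the service document and site settings.
interface Service { title: string; summary?: string; slug: {current: string} }
interface SiteSettings { siteName?: string }

function serviceSchema({service, siteSettings, siteUrl}: {
  service: Service;
  siteSettings?: SiteSettings;
  siteUrl: string;
}) {
  return {
    '@context': 'https://schema.org',
    '@type': 'Service',
    name: service.title,
    serviceType: service.title,
    description: service.summary,
    url: `${siteUrl}/services/${service.slug.current}`,
    areaServed: {'@type': 'Country', name: 'United States'}, // shape is an assumption
    provider: {'@type': 'Organization', name: siteSettings?.siteName},
  };
}
```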

Article (BlogPosting)

articleSchema({post, canonical, siteSettings}) emits a BlogPosting: headline from post.title, description from post.excerpt, datePublished from post.publishedAt, and image via ogImageUrl(post.coverImage). The author (@type: Person) is omitted entirely when post.author?.name is absent, and the publisher Organization name falls back to 'White Tree Digital'.
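The conditional author and the publisher fallback can be expressed with object spreads. The sketch below is an approximation: it takes an already-resolved imageUrl instead of calling ogImageUrl(post.coverImage), and placing canonical under mainEntityOfPage is an assumption, since the source does not say which property receives it.

```typescript
// Assumed shapes; imageUrl stands in for ogImageUrl(post.coverImage).
interface Post {
  title: string;
  excerpt?: string;
  publishedAt?: string;
  author?: {name?: string};
}
interface SiteSettings { siteName?: string }

function articleSchema({post, canonical, siteSettings, imageUrl}: {
  post: Post;
  canonical: string;
  siteSettings?: SiteSettings;
  imageUrl?: string;
}) {
  const authorName = post.author?.name;
  return {
    '@context': 'https://schema.org',
    '@type': 'BlogPosting',
    headline: post.title,
    description: post.excerpt,
    datePublished: post.publishedAt,
    mainEntityOfPage: canonical, // property choice is an assumption
    ...(imageUrl ? {image: imageUrl} : {}),
    // No author name: the whole author node is left out, not emitted empty.
    ...(authorName ? {author: {'@type': 'Person', name: authorName}} : {}),
    publisher: {
      '@type': 'Organization',
      name: siteSettings?.siteName ?? 'White Tree Digital',
    },
  };
}
```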

BreadcrumbList

breadcrumbListSchema(items) takes an explicit [{name, url}] array and stamps position from the index. The route supplies the trail, e.g. on a service page:

breadcrumbListSchema([
  {name: 'Home', url: `${siteUrl}/`},
  {name: 'Services', url: `${siteUrl}/services`},
  {name: service.title, url: canonical},
]),
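For reference, a builder that stamps position from the array index could look like this. It is a sketch under the assumption that positions are 1-based (index + 1), which is the convention schema.org ListItem consumers expect; the real schema.ts may differ in detail.

```typescript
interface Crumb { name: string; url: string }

function breadcrumbListSchema(items: Crumb[]) {
  return {
    '@context': 'https://schema.org',
    '@type': 'BreadcrumbList',
    itemListElement: items.map((item, index) => ({
      '@type': 'ListItem',
      position: index + 1, // assumed 1-based
      name: item.name,
      item: item.url,
    })),
  };
}

// Placeholder origin and service, mirroring the route-supplied trail above.
const siteUrl = 'https://example.com';
const trail = breadcrumbListSchema([
  {name: 'Home', url: `${siteUrl}/`},
  {name: 'Services', url: `${siteUrl}/services`},
  {name: 'Local SEO', url: `${siteUrl}/services/local-seo`},
]);
```

Because the route passes the trail explicitly, the builder never has to infer hierarchy from the URL.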

FAQPage

faqPageSchema(items) maps an array of {question, answer} into Question / acceptedAnswer nodes. Routes only add it when a faqSection is present and has items, gathering them off the page's sections[]:

const faqItems =
  service.sections?.flatMap((s) => (s._type === 'faqSection' ? (s.items ?? []) : [])) ?? [];
// ...
...(faqItems.length ? [faqPageSchema(faqItems)] : []),
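Put together, the gather-then-build flow might be sketched as below. The Section and FaqItem types are hypothetical, and answer is assumed to be a plain string; if the real field is rich text it would need flattening first.

```typescript
interface FaqItem { question: string; answer: string }
interface Section { _type: string; items?: FaqItem[] }

function faqPageSchema(items: FaqItem[]) {
  return {
    '@context': 'https://schema.org',
    '@type': 'FAQPage',
    mainEntity: items.map(({question, answer}) => ({
      '@type': 'Question',
      name: question,
      acceptedAnswer: {'@type': 'Answer', text: answer},
    })),
  };
}

// Collect items from every faqSection; other section types contribute nothing.
function collectFaqItems(sections: Section[] | undefined): FaqItem[] {
  return sections?.flatMap((s) => (s._type === 'faqSection' ? (s.items ?? []) : [])) ?? [];
}

// Returns zero or one block, so it can be spread into a route's schema array.
function faqBlocks(sections: Section[] | undefined) {
  const faqItems = collectFaqItems(sections);
  return faqItems.length ? [faqPageSchema(faqItems)] : [];
}
```

A page with an empty faqSection therefore emits no FAQPage block at all, rather than one with an empty mainEntity.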

og:image and JSON-LD images come from urlFor — guard on .asset

ogImageUrl() in src/lib/image.ts returns undefined when an image field has no resolvable asset, and the lower-level urlFor() throws on an asset-less ghost image (e.g. an image whose upload was cleared but whose alt remains). An unhandled throw in a section's frontmatter blanks the whole page. Always guard on ?.asset, never on the truthiness of the image object itself, because a ghost image is still a truthy object. See Images & fonts.
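The difference between the two guards is easy to demonstrate with a stand-in. In this sketch urlForStub is a hypothetical replacement for the real urlFor() that only imitates the throwing behaviour described above, and the CDN URL is a placeholder.

```typescript
interface SanityImage { alt?: string; asset?: {_ref: string} }

// Stub: throws on an asset-less image, as urlFor() is described to do.
function urlForStub(image: SanityImage): string {
  if (!image.asset) throw new Error('image has no asset');
  return `https://cdn.example.invalid/${image.asset._ref}`;
}

// Correct guard: checks the asset, so a ghost image resolves to undefined.
function safeImageUrl(image: SanityImage | null | undefined): string | undefined {
  return image?.asset ? urlForStub(image) : undefined;
}

// Incorrect guard: a ghost image is a truthy object, so this still throws.
function unsafeImageUrl(image: SanityImage | null | undefined): string | undefined {
  return image ? urlForStub(image) : undefined;
}

// An image whose upload was cleared but whose alt text remains.
const ghost: SanityImage = {alt: 'Team photo'};
const safeResult = safeImageUrl(ghost);

let unsafeThrew = false;
try {
  unsafeImageUrl(ghost);
} catch {
  unsafeThrew = true;
}
```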

Dynamic SSR sitemap

src/pages/sitemap.xml.ts is a server endpoint, not a static file. @astrojs/sitemap was removed because it can only enumerate routes known at build time, and this site's content routes are SSR — published slugs aren't visible at build. Instead, the endpoint queries published Sanity slugs at request time so the sitemap stays current with every publish, matching the site's no-rebuild content model:

{
  "pages": *[_type == "page" && defined(slug.current)]{"slug": slug.current, _updatedAt},
  "services": *[_type == "service" && defined(slug.current)]{"slug": slug.current, _updatedAt},
  "posts": *[_type == "post" && defined(slug.current)]{"slug": slug.current, _updatedAt},
  "portfolioPosts": *[_type == "portfolioPost" && defined(slug.current)]{"slug": slug.current, _updatedAt},
  "homepageUpdated": *[_id == "homepage"][0]._updatedAt,
  "hasBlogIndex": defined(*[_id == "blogIndex"][0]),
  "hasPortfolioIndex": defined(*[_id == "portfolioIndex"][0])
}

It always uses the published sanity client — drafts never belong in a sitemap. The output includes /, /free-website-audit, the two index pages (only when their singletons exist), and one <url> per published page / service / post / portfolioPost, each stamped with <lastmod> from _updatedAt. The response sets Cache-Control: public, max-age=0, s-maxage=3600 (one hour at the edge).
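Given a result shaped like the query above, assembling the XML is mostly string building. The following is a rough sketch, not the endpoint itself: the type names, the escapeXml helper, the Content-Type header, and the choice to omit <lastmod> on entries without a timestamp are assumptions, while the URL prefixes follow the route files named on this page.

```typescript
interface Entry { slug: string; _updatedAt: string }
interface SitemapData {
  pages: Entry[];
  services: Entry[];
  posts: Entry[];
  portfolioPosts: Entry[];
  homepageUpdated?: string | null;
  hasBlogIndex: boolean;
  hasPortfolioIndex: boolean;
}

// Escape the five XML special characters so slugs cannot break the document.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

function urlEntry(loc: string, lastmod?: string | null): string {
  const mod = lastmod ? `<lastmod>${escapeXml(lastmod)}</lastmod>` : '';
  return `<url><loc>${escapeXml(loc)}</loc>${mod}</url>`;
}

function buildSitemapXml(data: SitemapData, siteUrl: string): string {
  const urls = [
    urlEntry(`${siteUrl}/`, data.homepageUpdated),
    urlEntry(`${siteUrl}/free-website-audit`),
    // Index pages are listed only when their singletons exist.
    ...(data.hasBlogIndex ? [urlEntry(`${siteUrl}/blog`)] : []),
    ...(data.hasPortfolioIndex ? [urlEntry(`${siteUrl}/portfolio`)] : []),
    ...data.pages.map((p) => urlEntry(`${siteUrl}/${p.slug}`, p._updatedAt)),
    ...data.services.map((s) => urlEntry(`${siteUrl}/services/${s.slug}`, s._updatedAt)),
    ...data.posts.map((p) => urlEntry(`${siteUrl}/blog/${p.slug}`, p._updatedAt)),
    ...data.portfolioPosts.map((p) => urlEntry(`${siteUrl}/portfolio/${p.slug}`, p._updatedAt)),
  ];
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    '</urlset>',
    '',
  ].join('\n');
}

// Headers for the response; the Cache-Control value is the one quoted above.
const sitemapHeaders = {
  'Content-Type': 'application/xml; charset=utf-8',
  'Cache-Control': 'public, max-age=0, s-maxage=3600',
};
```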

Host-aware robots.txt

src/pages/robots.txt.ts is likewise a dynamic SSR endpoint that varies its body by host. Preview hosts (the staging build with PUBLIC_NOINDEX=true, or any *.workers.dev host) disallow everything; the canonical public build allows crawlers and advertises the sitemap:

const isPreviewHost =
  import.meta.env.PUBLIC_NOINDEX === 'true' || url.hostname.endsWith('.workers.dev');

const body = isPreviewHost
  ? `User-agent: *\nDisallow: /\n`
  : `User-agent: *\nAllow: /\n\nSitemap: ${siteUrl}/sitemap.xml\n`;

In the static public build this prerenders to a file; in the staging build it runs per request. Either way the disallow is correct for that deployment, and it pairs with the per-page noindex meta in Layout.astro.
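Since the layout and the robots endpoint share one predicate, it can help to see how it classifies different hosts. This sketch lifts the predicate into a pure function so it runs outside Astro; the function names and all hostnames are illustrative placeholders.

```typescript
// Same expression as in Layout.astro and robots.txt.ts, with inputs as parameters.
function isPreviewHost(hostname: string, publicNoindex: string | undefined): boolean {
  return publicNoindex === 'true' || hostname.endsWith('.workers.dev');
}

function robotsBody(preview: boolean, siteUrl: string): string {
  return preview
    ? 'User-agent: *\nDisallow: /\n'
    : `User-agent: *\nAllow: /\n\nSitemap: ${siteUrl}/sitemap.xml\n`;
}

const siteUrl = 'https://example.com';

// Public build on the canonical domain: crawlable, sitemap advertised.
const publicBody = robotsBody(isPreviewHost('example.com', undefined), siteUrl);
// Staging build: the env flag alone is enough, whatever the hostname.
const stagingBody = robotsBody(isPreviewHost('staging.example.com', 'true'), siteUrl);
// Any workers.dev subdomain: blocked even with the flag unset.
const workersBody = robotsBody(isPreviewHost('wtd.example.workers.dev', undefined), siteUrl);
```

The flag comparison is a strict string match, so a value such as 'TRUE' or '1' would not mark a build as a preview in this sketch.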

503, not 404, on a Sanity outage

A CDN blip must never get pages dropped from Google's index. So every content route returns 503 with a Retry-After header when the Sanity fetch throws — distinct from a genuine not-found, which returns the branded 404. The homepage is representative:

try {
  data = (await getHomepage(await getRouteClient(Astro.cookies))) ?? EMPTY;
} catch (err) {
  // Upstream outage, not a missing page — 503 keeps Googlebot from
  // de-indexing the homepage during a Sanity blip.
  console.warn('[homepage] Sanity fetch failed:', err);
  return new Response(null, {status: 503, headers: {'Retry-After': '60'}});
}

The same 503 + Retry-After: 60 guard wraps the fetch in index.astro, [slug].astro, services/[slug].astro, blog/index.astro, blog/[slug].astro, portfolio/index.astro, and portfolio/[slug].astro. The sitemap endpoint uses the same pattern. A real not-found (a slug with no matching published doc) instead falls through to 404.astro — branded, with funnel-recovery CTAs — returned with a 404 status. See Routing & pages and Launch & operations.

Distinguish the outage path from the not-found path

The 503 only fires on a thrown fetch (network/CDN failure). A successful fetch that returns no document is a real 404 and must stay a 404: returning 503 for genuinely missing slugs would tell crawlers the page is only temporarily unavailable, so dead URLs would never be dropped from the index. Keep the try/catch (503) and the if (!doc) return 404 checks separate, exactly as the routes do today.
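The separation of the two paths can be captured in a small helper that returns a status rather than a Response, so it can run anywhere. This is an illustrative sketch only: loadDoc and RouteResult are hypothetical names, and the real routes inline the try/catch as shown earlier.

```typescript
type RouteResult<T> =
  | {status: 200; doc: T}
  | {status: 404}
  | {status: 503; headers: {'Retry-After': string}};

async function loadDoc<T>(fetchDoc: () => Promise<T | null | undefined>): Promise<RouteResult<T>> {
  let doc: T | null | undefined;
  try {
    doc = await fetchDoc();
  } catch {
    // Thrown fetch: upstream outage, ask crawlers to come back later.
    return {status: 503, headers: {'Retry-After': '60'}};
  }
  // Fetch succeeded but nothing matched: a genuine not-found.
  if (doc == null) return {status: 404};
  return {status: 200, doc};
}

// Three outcomes from three kinds of fetcher (results are promises).
const found = loadDoc<string>(async () => 'published doc');
const missing = loadDoc<string>(async () => null);
const outage = loadDoc<string>(async () => {
  throw new Error('CDN unreachable');
});
```

The 404 check sits outside the try block on purpose, so a missing document can never be caught and mislabelled as an outage.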

OG / Twitter defaults

Every page emits og:title, og:url, og:type="website", og:image, and the Twitter summary_large_image card with title and image. The image resolves per page to seo.ogImage (via ogImageUrl()), falling back to a sitewide static asset:

const ogImageUrl = ogImage ?? `${siteUrl}/og-default.png`;

public/og-default.png is the WTD-branded fallback so social shares never break, even on a page with no seo.ogImage. The OG/Twitter description tags render only when a description exists (same as the meta description), so an empty SEO field never produces an empty social description.
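As a compact model of these rules, the sketch below builds the social tags as plain data. The MetaTag shape and the socialMeta function are hypothetical; the image fallback and the truthiness check on description mirror the expressions quoted on this page.

```typescript
interface MetaTag { key: string; content: string }

function socialMeta(opts: {
  title: string;
  canonical: string;
  siteUrl: string;
  description?: string;
  ogImage?: string;
}): MetaTag[] {
  // Per-page image when present, otherwise the sitewide static fallback.
  const image = opts.ogImage ?? `${opts.siteUrl}/og-default.png`;
  return [
    {key: 'og:title', content: opts.title},
    {key: 'og:url', content: opts.canonical},
    {key: 'og:type', content: 'website'},
    {key: 'og:image', content: image},
    {key: 'twitter:card', content: 'summary_large_image'},
    {key: 'twitter:title', content: opts.title},
    {key: 'twitter:image', content: image},
    // Description tags exist only when a description resolved (empty string counts as absent).
    ...(opts.description
      ? [
          {key: 'og:description', content: opts.description},
          {key: 'twitter:description', content: opts.description},
        ]
      : []),
  ];
}
```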

A few SEO-adjacent footguns live in neighboring systems:

Stega breaks string equality. In draft preview, Sanity string fields carry invisible characters; any string that drives logic (a variant switch, an object-key lookup) must be stegaClean()'d first. This matters wherever SEO-relevant rendering branches on a Sanity string. See Visual editing.
GTM is the only analytics injection point. The GA4 + HubSpot tags load through the hardcoded GTM container in Layout.astro — never add a direct GA4/HubSpot script tag. See HubSpot & lead capture.
Performance is a separate page. Page-weight, request-count, and Core Web Vitals budgets that affect ranking are documented under Performance budgets. Heading order and the broader accessibility floor are on Accessibility.

Where this lives

Concern	File
SEO floor, title fallback, canonical/OG/Twitter meta, noindex flag, JSON-LD `<script>` emission	`website/src/layouts/Layout.astro`
Normalized canonical origin (`siteUrl`)	`website/src/lib/site.ts`
JSON-LD builders (Organization/Service/BlogPosting/BreadcrumbList/FAQPage)	`website/src/lib/schema.ts`
og:image / JSON-LD image URLs (asset-guarded)	`website/src/lib/image.ts`
Dynamic SSR sitemap	`website/src/pages/sitemap.xml.ts`
Host-aware robots.txt	`website/src/pages/robots.txt.ts`
Per-route schema assembly + 503-on-outage	`website/src/pages/index.astro`, `[slug].astro`, `services/[slug].astro`, `blog/[slug].astro`, `portfolio/[slug].astro`, the two `index.astro`s
Branded not-found	`website/src/pages/404.astro`
SEO floor & launch checklist	`website/CLAUDE.md` ("SEO floor"), `_PRODUCTION LAUNCH ROADMAP.md` (Phase B)