AI Automation
Internal Linking on Autopilot: The Payload + Claude Pipeline We Wire for Archives Past 400 Articles
Most editorial teams stop linking internally after article 50. Here is the Payload afterChange hook, pgvector retrieval, and Claude scorer we wire so suggestions land in the editor — never auto-published — with one-click insert into Lexical.

*Internal linking dies at article 50. Below is the publish-hook pipeline we wire so it does not — without taking the editor out of the loop.*
Every Payload content team we onboard past the 200-article mark has the same problem, and they almost never name it out loud: internal linking is broken. The first fifty articles got hand-linked by the founding editor. Articles 51 through 180 got linked when someone remembered. Articles 181 onward are essentially orphans — they exist in the archive, they rank for nothing because nothing points at them, and the editor who would have linked them has long since moved on to producing the next 20 pieces this month.
The cost is real and measurable: orphaned articles bleed organic traffic, internal PageRank stays concentrated on the same six pillar pages, and editors waste 15–30 minutes per draft hunting for relevant links in a CMS search bar. At 40 articles a month that is 10–20 hours, well over one full editorial day, every month, on a task a retrieval system can do in 800ms. We have wired this pipeline six times now on Payload projects between 300 and 4,000 articles, and it has paid for itself inside the first quarter every time.
This post is the architecture, the schema, the hook signature, the SQL we ship, the Claude prompt shape, the token math at three archive sizes, and — the part most AI-on-CMS posts skip — the editor experience and the boundary we refuse to cross.
The boundary: suggestions in the editor, never auto-published links
Before any architecture, the rule. Every team that asks us for this pipeline asks the same follow-up: "can it just insert the links itself on publish?" The answer is no. Not because the model cannot do it — Claude Sonnet can absolutely produce anchor text and pick a target URL — but because the failure mode is invisible and compounds.
What we do automate aggressively: the retrieval, the ranking, the anchor-text proposals, the inline UI inside the Lexical editor, and the metrics on what gets accepted. The editor's job shrinks from "hunt for links" to "approve or reject six pre-scored candidates per draft" — usually under 90 seconds of work.
Architecture in one pass
The pipeline has five moving parts, all running next to the Payload install — no external vector DB, no separate inference service, no Pinecone subscription:
Payload afterChange hook on the `articles` collection fires when a draft is saved or updated.
BullMQ job debounces and queues an embedding + suggestion task (so saving a draft 12 times in 5 minutes does not run the pipeline 12 times).
Embedding worker generates a vector for the draft body, upserts it into the `article-embeddings` collection backed by pgvector.
Retrieval layer runs cosine similarity in Postgres with a recency boost and topic-cluster filter, returns 8 candidates.
Claude scorer receives the draft excerpt + 8 candidates, returns 3–6 ranked suggestions with proposed anchor text, which a custom Payload field renders inline.
All persistent data lives in the same Postgres database Payload already uses. The vector column is a pgvector `vector(1536)` next to the article metadata. The only extra service is the Redis instance BullMQ needs; there is no separate vector store to monitor, back up, or pay for.
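As a rough sketch of how those five parts hand off to each other inside the worker, the stages can be written as injected functions. Every name here (`PipelineDeps`, `runPipeline`, the stage signatures) is hypothetical, and each dependency stands in for the real embedding call, SQL query, or Claude call described in the sections that follow.

```typescript
// Hypothetical shapes; the real ones depend on your content model.
interface Candidate { id: string; title: string; slug: string; excerpt: string }
interface Suggestion { candidateId: string; anchorText: string; confidence: number; rationale: string }

// Each stage is injected so the orchestration can be tested without Postgres, Redis, or an API key.
interface PipelineDeps {
  embed: (text: string) => Promise<number[]>
  upsertEmbedding: (articleId: string, vector: number[]) => Promise<void>
  retrieve: (vector: number[], excludeId: string) => Promise<Candidate[]>
  score: (excerpt: string, candidates: Candidate[]) => Promise<Suggestion[]>
  saveSuggestions: (articleId: string, suggestions: Suggestion[]) => Promise<void>
}

// First `limit` words of the draft; the scorer only sees an excerpt, not the whole body.
function firstWords(text: string, limit: number): string {
  return text.split(/\s+/).filter(Boolean).slice(0, limit).join(' ')
}

async function runPipeline(
  articleId: string,
  bodyText: string,
  deps: PipelineDeps,
): Promise<Suggestion[]> {
  const vector = await deps.embed(bodyText)
  await deps.upsertEmbedding(articleId, vector)
  const candidates = await deps.retrieve(vector, articleId)
  if (candidates.length === 0) {
    // Nothing to score: store an empty result so the editor UI can stop polling.
    await deps.saveSuggestions(articleId, [])
    return []
  }
  const suggestions = await deps.score(firstWords(bodyText, 600), candidates)
  await deps.saveSuggestions(articleId, suggestions)
  return suggestions
}
```

Keeping the stages injectable is a design choice rather than a requirement: it lets the scorer or the retrieval query be swapped out during prompt tuning without touching the worker.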
Schema: keeping embeddings next to Payload, not in a vector DB
Half the AI-on-CMS architectures we audit start with "we put the embeddings in Pinecone." For an archive under 100k documents, that is operational overhead in exchange for nothing. Postgres with pgvector handles 5M+ rows of 1536-dim vectors comfortably on a single 4-vCPU instance. The collection looks like this:
import type { CollectionConfig } from 'payload'
export const ArticleEmbeddings: CollectionConfig = {
slug: 'article-embeddings',
admin: { hidden: true },
access: { read: () => true, create: () => false, update: () => false },
fields: [
{
name: 'article',
type: 'relationship',
relationTo: 'articles',
required: true,
index: true,
},
{
name: 'embedding',
type: 'json', // pgvector column added via migration; Payload sees jsonb
required: true,
},
{
name: 'contentHash',
type: 'text',
required: true,
index: true, // skip re-embedding if hash unchanged
},
{
name: 'topicCluster',
type: 'select',
options: ['product', 'engineering', 'opinion', 'tutorial', 'news'],
index: true,
},
{
name: 'wordCount',
type: 'number',
},
{
name: 'lastEmbeddedAt',
type: 'date',
},
],
}
Next comes the migration that adds the actual `pgvector` column and index. Payload does not natively know about vector types, so we run this as a custom migration alongside the generated ones:
-- migrations/20250112_add_pgvector.sql
CREATE EXTENSION IF NOT EXISTS vector;
ALTER TABLE article_embeddings
ADD COLUMN embedding_vec vector(1536);
-- HNSW outperforms IVFFlat for our read pattern (frequent inserts,
-- small-k cosine queries). m=16, ef_construction=64 is the sweet
-- spot for archives between 1k and 50k documents.
CREATE INDEX article_embeddings_vec_hnsw
ON article_embeddings
USING hnsw (embedding_vec vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
CREATE INDEX article_embeddings_cluster_recent
ON article_embeddings (topic_cluster, last_embedded_at DESC);
The afterChange hook: debouncing is the part most posts skip
The naive version of this hook re-embeds and re-scores on every save. Editors save constantly — autosave, manual saves, draft-as-they-type. On one project, we counted 47 saves on a single 1,200-word draft before publish. Without debouncing, that is 47 embedding calls and 47 Claude calls per article. The hook queues; a debounced worker actually runs:
import type { CollectionAfterChangeHook } from 'payload'
import { Queue } from 'bullmq'
import { createHash } from 'node:crypto'
const suggestionQueue = new Queue('link-suggestions', {
connection: { url: process.env.REDIS_URL! },
})
export const queueLinkSuggestions: CollectionAfterChangeHook = async ({
doc,
previousDoc,
operation,
req,
}) => {
// Only run on drafts; published articles are frozen and skipped.
if (doc._status === 'published') return doc
if (operation === 'create' && !doc.body) return doc
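const bodyText = extractPlainText(doc.body) // strip Lexical to text
const contentHash = createHash('sha256').update(bodyText).digest('hex')
// Skip if the body text did not change. contentHash is stored on the
// article-embeddings collection, not on the article, so hash the previous body here.
const previousHash = previousDoc?.body
? createHash('sha256').update(extractPlainText(previousDoc.body)).digest('hex')
: null
if (previousHash === contentHash) return doc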
await suggestionQueue.add(
`article-${doc.id}`,
{ articleId: doc.id, contentHash, tenantId: req.user?.tenant },
{
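// Dedupe: BullMQ ignores an add whose jobId already exists in the queue,
// so saves during the 30s delay collapse into the job queued by the first one.
jobId: `article-${doc.id}`,
delay: 30_000,
// Remove finished jobs immediately. A retained job keeps its id reserved,
// and later saves of the same article would be silently ignored.
removeOnComplete: true,
removeOnFail: true,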
attempts: 3,
backoff: { type: 'exponential', delay: 5_000 },
},
)
return doc
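}
The `jobId` collision is the load-bearing detail. BullMQ ignores an add whose id matches a job that already exists in the queue, so 47 saves in 5 minutes become a handful of jobs instead of 47. Two consequences follow. The surviving job runs 30 seconds after the first save of a burst, not the last, so the worker should load the current draft by id rather than trust the queued payload. And a finished job keeps its id reserved for as long as it is retained, so the retention settings decide when the same article can be queued again. We learned the cost side the hard way on a project where the first version of this hook generated $340 in Claude calls in a single afternoon during an editorial sprint.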
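One way to make sure the job only does work once the editor has actually paused is a quiet-window check inside the worker. This is a minimal sketch, assuming the hook also records a `lastSavedAt` timestamp for the draft and that the worker can re-queue itself with a delay. `DraftState`, `decideJob`, and both fields are hypothetical and not part of the hook above.

```typescript
type DebounceDecision =
  | { action: 'run' }
  | { action: 'requeue'; delayMs: number }
  | { action: 'skip' }

interface DraftState {
  lastSavedAt: number           // epoch ms of the most recent save (hypothetical field)
  currentHash: string           // hash of the draft body as it is right now
  lastScoredHash: string | null // hash the scorer last ran against, if any
}

// Decide what the worker should do when a queued job fires.
function decideJob(now: number, draft: DraftState, quietMs = 30_000): DebounceDecision {
  // Body unchanged since the last scoring run: nothing to do.
  if (draft.currentHash === draft.lastScoredHash) return { action: 'skip' }
  // Editor saved again while the job was waiting: wait out the rest of the quiet window.
  const quietFor = now - draft.lastSavedAt
  if (quietFor < quietMs) return { action: 'requeue', delayMs: quietMs - quietFor }
  return { action: 'run' }
}
```

With this check the embedding and scorer calls run only after 30 seconds without a save, regardless of which save queued the job.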
Retrieval: pgvector cosine + recency + topic cluster
Cosine similarity alone gives you semantically near articles, which is mostly what you want — but "semantically near" includes the article you wrote two years ago that has since been superseded. We layer two filters on top: a recency boost (decay over 18 months) and a topic-cluster constraint (an engineering post should not be suggesting links to opinion pieces). The SQL we actually ship:
-- Returns 8 candidate articles for a given draft embedding.
-- $1 = draft embedding vector, $2 = topic cluster, $3 = draft article id
SELECT
a.id,
a.title,
a.slug,
a.excerpt,
ae.topic_cluster,
-- Cosine distance (lower = more similar)
(ae.embedding_vec <=> $1::vector) AS distance,
-- Recency boost: articles within 18 months score better
EXTRACT(EPOCH FROM (NOW() - a.published_at)) / 86400.0 AS days_old,
-- Composite score: similarity + recency decay
(ae.embedding_vec <=> $1::vector) +
LEAST(0.15, EXTRACT(EPOCH FROM (NOW() - a.published_at)) / 86400.0 / 3650.0)
AS composite_score
FROM article_embeddings ae
JOIN articles a ON a.id = ae.article_id -- Payload's Postgres adapter stores the relationship as article_id
WHERE a._status = 'published'
AND a.id != $3
AND ae.topic_cluster IN ($2, 'tutorial') -- allow cross-link to tutorials
AND a.published_at > NOW() - INTERVAL '36 months'
ORDER BY composite_score ASC
LIMIT 8;
On a 4,200-article archive with the HNSW index, p95 query time is 18ms. We feed those 8 candidates to Claude and ask it to pick 3–6, propose anchor text, and rank them. The model is allowed to reject candidates; that is deliberate.
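Two small helpers make the query easier to call and to reason about. This is a sketch with hypothetical names: `toVectorLiteral` builds the text form pgvector accepts for the `$1` parameter, and `compositeScore` mirrors the SQL scoring expression so the ranking can be unit-tested without a database.

```typescript
// pgvector parses a text literal such as '[0.1,0.2,0.3]' when it is cast with ::vector.
function toVectorLiteral(embedding: number[]): string {
  if (embedding.some((x) => !Number.isFinite(x))) {
    throw new Error('embedding contains a non-finite value')
  }
  return `[${embedding.join(',')}]`
}

// Same arithmetic as the SQL: cosine distance plus an age penalty of
// days / 3650, capped at 0.15. The cap is reached at 547.5 days, about 18 months.
function compositeScore(distance: number, daysOld: number): number {
  return distance + Math.min(0.15, daysOld / 3650)
}

// Lower is better. A recent article can outrank an older one that is slightly closer.
const recent = compositeScore(0.30, 100) // about 0.327
const older = compositeScore(0.25, 900)  // 0.40, penalty capped
const recentWins = recent < older
```

The cap is what keeps the recency term a tiebreaker: past 18 months every article carries the same penalty, so similarity decides the order again.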
The Claude prompt: structured outputs, anchor constraints, and the rejection slot
We send the draft excerpt (first 600 words) and the 8 candidates as a structured input. The prompt asks for structured JSON output using the Anthropic tool-use schema, with explicit constraints on anchor text. The constraints matter more than the prompt poetry:
Anchor text must appear verbatim in the draft body (the editor will click-insert; the anchor must already exist as a phrase).
Anchor text must be 2–7 words, not a full sentence, not a single common word.
Anchor text must not be promotional ("learn more", "click here", "our pricing page") — banned list of 40 phrases.
The model may return 0–6 suggestions. Returning fewer is fine; padding is worse than silence.
Each suggestion includes a confidence score (0–1) and a one-line rationale the editor sees on hover.
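A JSON schema cannot express "appears verbatim in the draft", so these constraints are worth re-checking on the model's output before anything reaches the editor. The validator below is a minimal sketch, not the shipped code: `validateSuggestions` and `RawSuggestion` are hypothetical names, and the banned list holds only the three phrases named above rather than the full 40.

```typescript
interface RawSuggestion {
  candidateId: string
  anchorText: string
  confidence: number
  rationale: string
}

// Only the three phrases named in the article; the full banned list is not reproduced here.
const BANNED_ANCHORS = ['learn more', 'click here', 'our pricing page']

function validateSuggestions(
  draftBody: string,
  candidateIds: string[],
  raw: RawSuggestion[],
): RawSuggestion[] {
  const known = new Set(candidateIds)
  const seen = new Set<string>()
  return raw
    .filter((s) => {
      const anchor = s.anchorText.trim()
      const words = anchor.split(/\s+/).filter(Boolean).length
      // Target must be one of the retrieved candidates, and used at most once.
      if (!known.has(s.candidateId) || seen.has(s.candidateId)) return false
      // 2 to 7 words: not a single common word, not a full sentence.
      if (words < 2 || words > 7) return false
      // Must appear verbatim (case-sensitive) so click-to-insert can find it.
      if (!draftBody.includes(anchor)) return false
      if (BANNED_ANCHORS.includes(anchor.toLowerCase())) return false
      if (!(s.confidence >= 0 && s.confidence <= 1)) return false
      seen.add(s.candidateId)
      return true
    })
    .slice(0, 6)
}
```

Anything the validator drops is also worth counting, since a rising rejection count is an early sign of a prompt regression.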
import Anthropic from '@anthropic-ai/sdk'
const client = new Anthropic()
export async function scoreSuggestions(
draftExcerpt: string,
candidates: Candidate[],
): Promise<Suggestion[]> {
const response = await client.messages.create({
model: 'claude-sonnet-4-5',
max_tokens: 1500,
tools: [
{
name: 'propose_internal_links',
description:
'Propose 0–6 internal link suggestions. Anchor text MUST appear verbatim in the draft. Reject candidates that are off-topic, outdated, or only superficially similar.',
input_schema: {
type: 'object',
properties: {
suggestions: {
type: 'array',
maxItems: 6,
items: {
type: 'object',
properties: {
candidateId: { type: 'string' },
anchorText: { type: 'string', minLength: 4, maxLength: 60 },
confidence: { type: 'number', minimum: 0, maximum: 1 },
rationale: { type: 'string', maxLength: 140 },
},
required: ['candidateId', 'anchorText', 'confidence', 'rationale'],
},
},
},
required: ['suggestions'],
},
},
],
tool_choice: { type: 'tool', name: 'propose_internal_links' },
messages: [
{
role: 'user',
content: buildPrompt(draftExcerpt, candidates),
},
],
})
return extractToolUse(response, 'propose_internal_links').suggestions
}
Token economics at three archive sizes
The single question every CTO asks before signing off: what does this cost per article, and where does it break? Measured on our last three Payload builds, all using Claude Sonnet 4.5 for scoring and text-embedding-3-small for vectors:
400-article archive, ~25 new drafts/month: roughly $0.04–$0.07 per draft scored. Monthly Claude + embeddings spend lands $4–$8.
2,000-article archive, ~40 new drafts/month + occasional re-scoring on edits: $0.06–$0.11 per draft. Monthly spend $10–$22.
10,000-article archive, ~60 new drafts/month + quarterly full re-embed: $0.08–$0.14 per draft live, plus a one-off ~$30 re-embed run every 3 months when we tune the prompt or upgrade the embedding model.
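Embedding cost is negligible at every size; the scorer (the 8-candidate Claude call per debounced draft) dominates. Read the per-draft figures as per scoring run: a draft revised across several sessions is scored several times, which is why the monthly totals exceed drafts × per-run cost.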
The economics never get scary. They scale with publishing velocity far more than with archive size: per-draft cost roughly doubles between the 400-article and 10,000-article archives, while a team shipping 500 articles a month would pay 10× a team shipping 50 on the same back catalog. That is the right shape for a content-ops budget line.
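The shape of that budget line is easy to check with a few lines of arithmetic. The sketch below is illustrative only: the token counts, the runs-per-draft figure, and the per-million-token prices are all assumptions passed in as parameters, not measurements from the builds above, so substitute your own usage numbers and the current price list.

```typescript
interface RunUsage { inputTokens: number; outputTokens: number }
interface Pricing { inputPerMTok: number; outputPerMTok: number } // USD per million tokens

// Cost of one scorer call.
function costPerRun(usage: RunUsage, pricing: Pricing): number {
  return (
    (usage.inputTokens * pricing.inputPerMTok + usage.outputTokens * pricing.outputPerMTok) /
    1_000_000
  )
}

// Archive size does not appear anywhere: spend is drafts x runs x cost per run.
function monthlyCost(draftsPerMonth: number, runsPerDraft: number, perRunCost: number): number {
  return draftsPerMonth * runsPerDraft * perRunCost
}

// Assumed inputs: 4,000 input tokens, 600 output tokens, 3 scoring runs per draft,
// and prices of $3 / $15 per million input / output tokens.
const examplePerRun = costPerRun(
  { inputTokens: 4000, outputTokens: 600 },
  { inputPerMTok: 3, outputPerMTok: 15 },
)
const fiftyDrafts = monthlyCost(50, 3, examplePerRun)
const fiveHundredDrafts = monthlyCost(500, 3, examplePerRun)
```

Under these assumptions the second figure is exactly ten times the first, which is the velocity scaling described above.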
The editor experience: a custom Payload field in the Lexical sidebar
Architecture is half the work. The other half is making the suggestions land somewhere an editor will actually use them. We ship a custom Payload field component — registered as a sidebar field on the `articles` collection — that polls the suggestion job status, renders the 3–6 candidates inline, and on click inserts an anchor link into the Lexical body at the exact location where the anchor phrase already exists:
'use client'
import { useDocumentInfo } from '@payloadcms/ui'
import { useEffect, useState } from 'react'
import { useLexicalComposerContext } from '@lexical/react/LexicalComposerContext'
export const LinkSuggestionsField: React.FC = () => {
const { id } = useDocumentInfo()
const [editor] = useLexicalComposerContext()
const [suggestions, setSuggestions] = useState<Suggestion[]>([])
const [status, setStatus] = useState<'idle' | 'pending' | 'ready'>('idle')
useEffect(() => {
if (!id) return
const poll = setInterval(async () => {
const res = await fetch(`/api/link-suggestions/${id}`)
const data = await res.json()
setStatus(data.status)
if (data.status === 'ready') {
setSuggestions(data.suggestions)
clearInterval(poll)
}
}, 3000)
return () => clearInterval(poll)
}, [id])
const insertLink = (s: Suggestion) => {
editor.update(() => {
// Find the anchor phrase in the editor state, wrap it in a LinkNode
// pointing to s.targetUrl. Skip if already linked.
wrapAnchorInLink(editor, s.anchorText, s.targetUrl)
})
// Fire telemetry: which suggestion was accepted, confidence, position.
void recordAcceptance(id, s)
}
if (status === 'pending') return <p>Scoring suggestions…</p>
if (!suggestions.length) return <p>No suggestions yet — save the draft.</p>
return (
<ul className="link-suggestions">
{suggestions.map((s) => (
<li key={s.candidateId}>
<button onClick={() => insertLink(s)}>
Insert "{s.anchorText}" → {s.targetTitle}
</button>
<span title={s.rationale}>{(s.confidence * 100).toFixed(0)}%</span>
</li>
))}
</ul>
)
}
The acceptance telemetry is not optional. We log which suggestions get inserted, which get ignored, and which get explicitly dismissed; that becomes the feedback loop for prompt tuning every quarter.
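The `wrapAnchorInLink` helper is not shown in the component above. Its first half is a plain search problem: find which text node holds the anchor phrase and at what offsets. The sketch below covers only that half, under the assumption that the caller has already collected the text of each unlinked text node in document order. `locateAnchor` and `AnchorLocation` are hypothetical names, and the Lexical side (splitting the node and wrapping the range in a link node) is left out.

```typescript
interface AnchorLocation {
  nodeIndex: number // index into the array of text nodes
  start: number     // offset of the first character of the anchor
  end: number       // offset just past the last character
}

// textNodes: plain text of each unlinked text node, in document order.
// Returns the first match, or null when the phrase is absent or spans two nodes.
function locateAnchor(textNodes: string[], anchorText: string): AnchorLocation | null {
  if (!anchorText) return null
  for (let i = 0; i < textNodes.length; i++) {
    const text = textNodes[i] ?? ''
    const start = text.indexOf(anchorText)
    if (start !== -1) {
      return { nodeIndex: i, start, end: start + anchorText.length }
    }
  }
  return null
}
```

A `null` result is a useful signal for the UI: if the editor has rewritten the sentence since scoring, the insert button can be disabled instead of failing silently.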
Production monitoring: drift, regressions, and the Sentry alert that pays for itself
What we watch in production after this ships, and what alerts in Sentry:
Suggestion acceptance rate — healthy range is 35–55%. Below 25% for two consecutive weeks triggers a prompt review.
Average suggestions per draft — should stabilise around 4. If it drops to 1–2 consistently, the model is being too conservative or the topic cluster filter is too tight.
Embedding drift — quarterly cosine-distance distribution check; if the mean shifts more than 0.08 between quarters, the editorial style has changed enough that we re-prompt the scorer.
Claude API errors and tool-use validation failures — alerted at first occurrence (these are nearly always a prompt regression after we change the schema).
Job queue depth — if BullMQ backlog exceeds 20 jobs, we autoscale the worker; sustained backlog usually means Redis is undersized.
We hit a memorable regression on one project: a Payload version bump silently changed the shape of the Lexical JSON body, and our `extractPlainText` helper started returning empty strings. The hook still fired, the worker still ran, the embedding still got generated — for an empty document. Suggestions cratered to zero. The Sentry alert on "average suggestions per draft below 1" caught it within 36 hours. Without it, three weeks of drafts would have shipped unlinked before anyone noticed. Now every project ships with that alert on day one.
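The thresholds above translate directly into a small rule function that a scheduled job can evaluate before reporting to Sentry. This is a sketch using the numbers from this section; the function names, the alert keys, and the weekly aggregation are assumptions, and the Sentry call itself is omitted.

```typescript
type SuggestionOutcome = 'inserted' | 'ignored' | 'dismissed'

// Share of shown suggestions that the editor inserted.
function acceptanceRate(outcomes: SuggestionOutcome[]): number {
  if (outcomes.length === 0) return 0
  return outcomes.filter((o) => o === 'inserted').length / outcomes.length
}

interface WeeklyMetrics {
  acceptanceRate: number         // 0 to 1
  avgSuggestionsPerDraft: number
}

// weeks is ordered oldest to newest; queueDepth is the current BullMQ backlog.
function alertsFor(weeks: WeeklyMetrics[], queueDepth: number): string[] {
  const alerts: string[] = []
  const lastTwo = weeks.slice(-2)
  // Below 25% for two consecutive weeks triggers a prompt review.
  if (lastTwo.length === 2 && lastTwo.every((w) => w.acceptanceRate < 0.25)) {
    alerts.push('acceptance-rate-low')
  }
  // The alert that caught the empty-body regression.
  const latest = weeks[weeks.length - 1]
  if (latest && latest.avgSuggestionsPerDraft < 1) alerts.push('suggestions-per-draft-low')
  if (queueDepth > 20) alerts.push('queue-backlog')
  return alerts
}
```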
When this is the wrong tool
We do not pitch this pipeline to every Payload client. Three situations where we tell teams to skip it:
Newsrooms under 50 articles. A human can hand-link the whole archive in a morning. The pipeline costs more in setup than it saves for the next year.
Highly regulated verticals (medical, financial, legal) where every internal link is a content-compliance question. The model-proposed anchor text becomes a review burden, not a time saver.
Teams without an editor in the loop. This pipeline assumes someone reviews the suggestions. If the plan was "set it and forget it," we recommend a different tool — and a different conversation about what your content team actually needs.
What a 3-week rollout looks like on an existing Payload install
Week one: schema migration, pgvector install, backfill embeddings for the existing archive (a one-time job, typically 20–90 minutes wall-clock for archives under 5,000 articles). Week two: the hook, the BullMQ worker, the scorer, the prompt tuning against a sample of 30 drafts the editorial team flags as "this is what good links look like." Week three: the custom field component, the acceptance telemetry, the Sentry alerts, editor training (45 minutes — the UI is one panel).
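The week-one backfill is mostly a batching loop. Here is a minimal sketch, assuming each archive article already has a content hash and that `embedBatch` (a hypothetical callback) embeds and stores one batch. The batch size of 100 is an arbitrary default, not a recommendation.

```typescript
// Split a list into fixed-size batches.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error('size must be at least 1')
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

interface ArchiveArticle { id: string; contentHash: string }

// Keep only articles whose stored hash is missing or stale, so a rerun resumes cheaply.
function needsEmbedding(
  articles: ArchiveArticle[],
  storedHashes: Map<string, string>,
): ArchiveArticle[] {
  return articles.filter((a) => storedHashes.get(a.id) !== a.contentHash)
}

async function backfill(
  articles: ArchiveArticle[],
  storedHashes: Map<string, string>,
  embedBatch: (batch: ArchiveArticle[]) => Promise<void>,
  batchSize = 100,
): Promise<number> {
  const pending = needsEmbedding(articles, storedHashes)
  for (const batch of chunk(pending, batchSize)) {
    await embedBatch(batch) // sequential on purpose, to stay under API rate limits
  }
  return pending.length
}
```

Because unchanged hashes are skipped, the same function can serve the quarterly re-embed after clearing the stored hashes.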
The whole thing slots into an existing Payload install without touching the rest of the content model. We have done it on Payload 3 projects running on Vercel and on self-hosted Postgres on Hetzner; the architecture does not change.
If you are evaluating Payload for a content team and want the full picture (collections, hooks, editorial UX, and the AI workflows we layer on top), see how we ship Payload CMS builds end-to-end.
If you are sitting on a 400+ article archive and watching organic traffic flatten while editors burn hours hunting for links — tell us what you are wiring up. Send us the content model, the rough archive size, and the publishing velocity; we will tell you whether this pipeline fits, what we would change for your setup, and what we would skip.
On every Payload project past 300 articles we now ship this pipeline by default — afterChange hook, BullMQ debouncing, pgvector retrieval with topic-cluster filter, Claude scorer with structured outputs, Lexical sidebar field, acceptance telemetry, the Sentry alerts. It pays for itself the first month against editor hours saved, and the second month against the orphaned-article traffic it un-strands. The boundary stays the same on every install: suggestions in the editor, never auto-published links.
Questions operators ask next
Does this pipeline work with Payload Local API writes, or only when editors save through the admin?
Both. The afterChange hook fires on any write — admin UI, Local API, or REST. We add an `if (req.context?.skipLinkSuggestions)` short-circuit so bulk imports and migrations can opt out, otherwise a 10,000-article import would queue 10,000 scorer jobs.
How does it handle multi-tenant Payload installs where suggestions must stay within a tenant?
The retrieval SQL takes a `tenantId` parameter and filters at the WHERE clause. The embedding table has a tenant-scoped index. We have shipped this on a 3-tenant publishing platform; the only gotcha is making sure the BullMQ job carries the tenant id, since the worker runs outside Payload's request context.
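Because the worker has no request to read the tenant from, it helps to validate the job payload before any query runs. This is a sketch under stated assumptions: `parseJobData` is a hypothetical helper, and the `tenant_id` column and `$4` placeholder are illustrative names for the extra predicate, not the shipped schema.

```typescript
interface SuggestionJobData {
  articleId: string
  contentHash: string
  tenantId: string
}

// Fail fast if the job was queued without a tenant: an unscoped query would leak across tenants.
function parseJobData(data: {
  articleId?: unknown
  contentHash?: unknown
  tenantId?: unknown
}): SuggestionJobData {
  const { articleId, contentHash, tenantId } = data
  if (typeof articleId !== 'string' || typeof contentHash !== 'string') {
    throw new Error('malformed job payload')
  }
  if (typeof tenantId !== 'string' || tenantId.length === 0) {
    throw new Error('job is missing tenantId')
  }
  return { articleId, contentHash, tenantId }
}

// Hypothetical extra predicate appended to the WHERE clause of the retrieval query.
const TENANT_PREDICATE = 'AND a.tenant_id = $4'

// Parameters in placeholder order: $1 vector, $2 topic cluster, $3 draft id, $4 tenant.
function retrievalParams(job: SuggestionJobData, vectorLiteral: string, cluster: string): string[] {
  return [vectorLiteral, cluster, job.articleId, job.tenantId]
}
```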
Will pgvector hold up at 50,000+ articles, or do we need a dedicated vector DB?
Yes, comfortably. With an HNSW index (m=16, ef_construction=64) on a 4-vCPU Postgres instance, p95 cosine queries on 50k 1536-dim vectors stay under 30ms. We would consider a dedicated vector DB above ~2M documents or if you need c
Pull quote
Suggestions in the editor, never auto-published links. The day a model writes anchor text into your live archive without a human approving it is the day your editorial brand starts decaying in ways you cannot see.