Citelayer® Content-Signal Headers — Control How AI Uses Your Content


Updated Mar 19, 2026

What Content-Signal Headers Are

When an AI crawler reads your content, it doesn’t automatically know your preferences. Should it use your articles to train future AI models? Should it cite your content in AI-generated search results? Should it include your pages in retrieval-augmented generation (RAG) systems that assemble answers on the fly?

The contentsignals.org standard provides a way to express those preferences in machine-readable form, as an HTTP header that travels with every page response. Content-Signal Headers don’t replace copyright law or licensing agreements — but they do give AI systems that choose to respect them a clear, unambiguous statement of your intent.

Citelayer® implements this standard directly. You configure your preferences in the plugin settings, and the plugin adds the appropriate Content-Signal header to every frontend response.

The Three Signals

The Content-Signal header carries up to three distinct signals, each corresponding to a specific use case:

AI Training (`ai-train`)

This signal controls whether AI companies may use your content as training data for building or improving their models.

ai-train: no — You do not grant permission for AI training use.
ai-train: yes — You explicitly permit AI training use.

Citelayer® default: no

The default reflects a reasonable, conservative stance: training an AI model on your content creates a potentially permanent, largely untrackable use. Unless you have a specific reason to allow it, the default keeps your options open.

Search Indexing (`search`)

This signal addresses AI-powered search systems — platforms like Perplexity, ChatGPT’s web browsing feature, and similar tools that index content to answer user queries.

search: no — You do not want AI search systems to index your content.
search: yes — You allow AI search indexing.

Citelayer® default: yes

For most sites, AI search indexing is desirable. It’s how your content gets cited when users ask AI assistants questions relevant to what you’ve published. The default reflects this.

AI Responses and RAG (`ai-input`)

This signal applies to retrieval-augmented generation — situations where an AI system retrieves relevant content from its index and incorporates it when composing a response to a user prompt.

ai-input: no — You prefer your content not be used directly in AI-generated responses.
ai-input: yes — You allow your content to be used as input for AI responses.

Citelayer® default: yes

Being cited in AI responses is generally what site owners want. The default allows it.
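
The three toggles and their defaults can be pictured as a small function that assembles the header value. This is only an illustrative sketch, not Citelayer®'s own code: the function name build_content_signal and its parameters are hypothetical, and the output format simply follows the name=value, comma-separated form used in this article.

```python
def build_content_signal(ai_train: bool = False,
                         search: bool = True,
                         ai_input: bool = True) -> str:
    """Assemble a Content-Signal header value from three on/off preferences.

    Hypothetical helper. The defaults mirror the Citelayer® defaults
    described in this article: ai-train=no, search=yes, ai-input=yes.
    """
    signals = {"ai-train": ai_train, "search": search, "ai-input": ai_input}
    # Each preference becomes "name=yes" or "name=no", joined with ", ".
    return ", ".join(
        f"{name}={'yes' if allowed else 'no'}"
        for name, allowed in signals.items()
    )


if __name__ == "__main__":
    print("Content-Signal:", build_content_signal())
```

Calling build_content_signal(ai_train=True) would flip only the training signal and leave the other two at their defaults.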

What the Header Looks Like

With Citelayer®’s defaults configured, every frontend page response includes a header like this:

Content-Signal: ai-train=no, search=yes, ai-input=yes

You can verify this using your browser’s developer tools (Network tab → select a request → response headers) or a tool like curl:

curl -sI https://yoursite.com/ | grep -i content-signal
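
On the receiving side, a client that wants to act on the header first has to split it into individual signals. The sketch below shows one way that could look. The function name parse_content_signal is hypothetical, and the decision to silently skip unknown or malformed entries is an assumption made for illustration, not something this article specifies.

```python
def parse_content_signal(value: str) -> dict[str, bool]:
    """Parse a Content-Signal value such as 'ai-train=no, search=yes'.

    Returns a dict mapping each signal name to True (yes) or False (no).
    Entries that are not 'name=yes' or 'name=no' are skipped; that
    leniency is an assumption of this sketch.
    """
    signals: dict[str, bool] = {}
    for part in value.split(","):
        name, sep, setting = part.strip().partition("=")
        setting = setting.strip().lower()
        if sep and setting in ("yes", "no"):
            signals[name.strip().lower()] = setting == "yes"
    return signals


if __name__ == "__main__":
    print(parse_content_signal("ai-train=no, search=yes, ai-input=yes"))
```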

Changing Your Settings

Navigate to Citelayer® → Settings → Markdown (the Content-Signal settings live under the Markdown tab in the admin). You’ll see three toggles corresponding to the three signals. Enable or disable each according to your preferences.

Changes take effect immediately for new requests. If your site uses server-side or CDN caching, cached responses may still carry the old header until caches are purged.

Relationship to robots.txt and Meta Tags

Content-Signal Headers work alongside other AI-visibility signals, not instead of them. Here is how the pieces relate:

robots.txt controls whether AI bots are allowed to fetch your pages at all. If you block GPTBot in robots.txt, it won’t read your pages regardless of what Content-Signal headers say.

Meta robots tags (like <meta name="robots" content="noindex">) tell crawlers not to index specific pages. A noindex page with permissive Content-Signal headers presents a contradiction; well-behaved crawlers are likely to honor the meta tag.

Content-Signal Headers operate at a higher level: they express usage permissions for content that has already been fetched and indexed. Think of the three layers as: robots.txt (access), meta tags (indexing), Content-Signal (usage rights).
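
One way to picture the three layers is as a sequence of checks that a cooperative crawler could run in order. The sketch below is hypothetical: the function may_use, its parameters, and the choice to return None when no preference is stated are all assumptions for illustration. It also simplifies by letting noindex override every purpose. It does not describe how any particular crawler behaves.

```python
from typing import Optional


def may_use(purpose: str,
            robots_allows_fetch: bool,
            page_is_noindex: bool,
            signals: dict[str, bool]) -> Optional[bool]:
    """Return True or False for a stated preference, or None if none exists.

    purpose is a signal name such as 'ai-train', 'search', or 'ai-input'.
    """
    # Layer 1 (access): a robots.txt block means the page is never fetched.
    if not robots_allows_fetch:
        return False
    # Layer 2 (indexing): a noindex page is kept out of the index.
    if page_is_noindex:
        return False
    # Layer 3 (usage): look up the Content-Signal preference, if any.
    return signals.get(purpose)
```

For example, may_use("ai-train", True, False, {"ai-train": False}) returns False because the usage layer says no, while a robots.txt block returns False before the signals are ever consulted.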

For a complete technical audit of how these settings interact on your site, run the AI Readiness Scanner. The scanner’s Content-Signal check verifies both that the feature is configured and that the header appears in actual live responses.

How Citelayer® Verifies the Header is Working

The AI Readiness Scanner includes a specific Content-Signal check that goes beyond checking your settings. It makes a live HTTP HEAD request to your homepage using wp_remote_head() and inspects the actual response headers. This catches situations where a caching layer, CDN, or server configuration strips headers before they reach crawlers.

If the scanner shows a warn status for Content-Signal (configured but not confirmed in live response), check your CDN or server caching configuration. Most CDNs can be configured to pass through custom headers, but some require explicit allow-lists.
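
A similar check can be approximated from outside WordPress. The following is a rough sketch, not the scanner's implementation: it uses Python's standard library in place of wp_remote_head(), the function names are hypothetical, and the "pass" and "off" labels are assumptions (only the warn status is named in this article).

```python
import urllib.request


def classify_content_signal(configured: bool, headers: dict[str, str]) -> str:
    """Compare the plugin setting with the headers seen in a live response."""
    # HTTP header names are case-insensitive, so normalize before comparing.
    present = any(name.lower() == "content-signal" for name in headers)
    if not configured:
        return "off"  # hypothetical label: feature not enabled
    # Configured and visible is a pass; configured but missing is a warn.
    return "pass" if present else "warn"


def fetch_headers(url: str) -> dict[str, str]:
    """Send a HEAD request and return the response headers as a dict."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return dict(response.headers.items())
```

A call such as classify_content_signal(True, fetch_headers("https://yoursite.com/")) would report "warn" if something between the origin server and the client strips the header.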

Honest Caveats

Compliance with Content-Signal headers is voluntary. There is no enforcement mechanism. A company that doesn’t implement support for the standard, or that ignores it, can still crawl and use your content regardless of what your headers say.

That said, the standard is gaining traction. Major AI labs have publicly acknowledged emerging content permission standards, and honoring a header like this costs little to implement. The contentsignals.org specification is open and documented.

Setting these headers costs you nothing and takes effect immediately. On balance, it’s worth doing.