Forward-Link Manifest (FLM)

Technical Specification for an Extended llms.txt Manifest

Introduction

A Forward-Link Manifest (FLM) is an extension to the llms.txt file that explicitly lists external URLs that a website owner endorses as trustworthy and authoritative for LLM-powered crawlers.

Core Concept: "Think of an extended llms.txt manifest that not only tells an LLM-powered crawler where it may (or may not) go, but also hands it a pre-approved set of outbound URLs—'forward links'—that the site owner vouches for."

Purpose and Benefits

The FLM provides several key technical benefits:

  • Authority & Provenance: Large-language-model search agents have difficulty deciding which external sources are trustworthy. If the site author enumerates "forward links," the agent can treat them as higher-confidence citations.
  • Improved Attribution: The crawler can quote or paraphrase from endorsed pages knowing the site owner explicitly vouched for them. Grounding answers in such sources can reduce, though not eliminate, hallucinated or misattributed citations.
  • Fine-Grained Crawl Control: Where llms.txt guides LLM crawlers around a site, an FLM adds a second signal: "these URLs are recommended next hops; treat them as canonical or related."
  • Efficient Crawling: A single machine-readable file is lighter to fetch than crawling every page just to extract anchors.

FLM File Structure

The FLM file (typically named flm.txt) uses robots.txt-style "Directive: value" lines, adding directives for forward links:

```
# flm.txt (default name; see File Location for other ways to advertise it)
User-agent: llm-search-bot
# Normal robots semantics
Disallow: /drafts/
Allow: /

# Forward links block
Forward: https://example.org/whitepaper.pdf
Forward: https://partner.example.com/api-spec
Forward: https://doi.org/10.1234/some-journal-article

# Hashes (optional integrity check)
Digest-SHA256: https://example.org/whitepaper.pdf 517f2e...
Digest-SHA256: https://partner.example.com/api-spec 9bafcd...
```
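
A crawler has to turn this file into structured data before acting on it. The Python sketch below shows one minimal way to do that, assuming "Directive: value" lines, full-line "#" comments, and case-insensitive directive names. The `FlmManifest` and `parse_flm` names are hypothetical, and for brevity the sketch flattens robots-style User-agent groups into a single rule set.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FlmManifest:
    """Directives collected from one flm.txt file (field names are hypothetical)."""
    user_agents: list[str] = field(default_factory=list)
    allow: list[str] = field(default_factory=list)
    disallow: list[str] = field(default_factory=list)
    forwards: list[str] = field(default_factory=list)
    digests: dict[str, str] = field(default_factory=dict)  # URL -> lowercase hex SHA-256
    includes: list[str] = field(default_factory=list)
    expire: str | None = None
    skipped: list[tuple[str, str]] = field(default_factory=list)  # unknown or malformed lines


def parse_flm(text: str) -> FlmManifest:
    """Parse 'Directive: value' lines; unknown directives go to `skipped`, not an error."""
    m = FlmManifest()
    for raw in text.splitlines():
        line = raw.strip()
        # Only full-line comments are dropped, so URL fragments ('#...') survive.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a directive line
        name, value = name.strip(), value.strip()
        key = name.lower()
        if key == "user-agent":
            m.user_agents.append(value)
        elif key == "allow":
            m.allow.append(value)
        elif key == "disallow":
            m.disallow.append(value)
        elif key == "forward":
            m.forwards.append(value)
        elif key == "digest-sha256":
            parts = value.split()
            if len(parts) == 2:  # expected form: "URL HASH"
                m.digests[parts[0]] = parts[1].lower()
            else:
                m.skipped.append((name, value))
        elif key == "include":
            m.includes.append(value)
        elif key == "expire":
            m.expire = value  # in this sketch the last Expire line wins
        else:
            m.skipped.append((name, value))
    return m
```

Keeping unknown lines in `skipped` rather than raising lets the parser tolerate directives added by later versions of the format.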

File Location

The FLM can be referenced in multiple ways:

  • As a standalone file at a fixed path in the site root (e.g., /flm.txt)
  • Referenced from llms.txt with a directive like:
    Forward-manifest: /flm.txt
  • Via an HTTP response header (this proposal does not yet name the header)
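
The lookup order a crawler might use can be sketched as follows: prefer a Forward-manifest line in llms.txt, otherwise fall back to /flm.txt at the site root. The function name `locate_flm` is hypothetical, and the HTTP-header route is omitted because its header name is not specified.

```python
from __future__ import annotations

from urllib.parse import urljoin

DEFAULT_FLM_PATH = "/flm.txt"  # default location suggested by this proposal


def locate_flm(site_root: str, llms_txt: str | None) -> str:
    """Return the absolute URL of a site's FLM file (hypothetical helper)."""
    if llms_txt:
        for raw in llms_txt.splitlines():
            name, sep, value = raw.partition(":")
            if sep and name.strip().lower() == "forward-manifest" and value.strip():
                # Relative values are resolved against the site root.
                return urljoin(site_root, value.strip())
    return urljoin(site_root, DEFAULT_FLM_PATH)
```

For example, `locate_flm("https://example.com/", None)` resolves to `https://example.com/flm.txt`.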

Directive Specification

Forward

Specifies external URLs explicitly endorsed by the site owner.

Forward: absolute-URL
  • Must be an absolute URL
  • Indicates that the site owner vouches for this URL as relevant and authoritative
  • Can appear multiple times
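
A validator for Forward values might start with a check like the one below. Restricting schemes to http and https is an assumption of this sketch; the directive itself only requires an absolute URL. The name `is_valid_forward_url` is hypothetical.

```python
from urllib.parse import urlsplit


def is_valid_forward_url(value: str) -> bool:
    """True if value is an absolute http(s) URL with a host (assumed scheme policy)."""
    try:
        parts = urlsplit(value.strip())
    except ValueError:  # e.g. a malformed IPv6 literal such as "http://[::1"
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)
```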

Digest-SHA256

Optional cryptographic integrity check for linked content.

Digest-SHA256: URL HASH
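  • Lets a crawler detect whether the linked content has changed since the site owner hashed it
  • Does not remove the need to fetch: the crawler must download the resource (or use a cached copy) and hash it to check the digest
  • The hash is the hex-encoded SHA-256 digest of the resource content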
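
Checking a digest means hashing the fetched bytes and comparing the result with the declared value. The sketch below uses Python's standard `hashlib`; the helper names are hypothetical, and comparing hex strings case-insensitively is an assumption.

```python
import hashlib


def sha256_hex(body: bytes) -> str:
    """Hex-encoded SHA-256 of the exact bytes of a fetched resource."""
    return hashlib.sha256(body).hexdigest()


def digest_matches(body: bytes, declared_hex: str) -> bool:
    """Compare fetched content against a Digest-SHA256 value (case-insensitive hex)."""
    return sha256_hex(body) == declared_hex.strip().lower()
```

A match only shows that the bytes are identical to what the site owner hashed; any change to the page, even a trivial one, produces a mismatch.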

Include

References existing sitemap files to avoid duplication.

Include: sitemap.xml

Expire

Indicates the date (in YYYY-MM-DD form, as in the example) after which the crawler should re-verify or discard the endorsement.

Expire: 2024-12-31
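
An expiry check can be as small as the sketch below. It assumes the YYYY-MM-DD form shown above and treats the endorsement as still valid on the expiry date itself; both are assumptions, and `is_expired` is a hypothetical name.

```python
from __future__ import annotations

from datetime import date


def is_expired(expire_value: str, today: date | None = None) -> bool:
    """True once the Expire date has passed; raises ValueError on malformed dates."""
    expires_on = date.fromisoformat(expire_value.strip())
    return (today or date.today()) > expires_on
```

A crawler could treat a malformed Expire value as already expired, so that a broken manifest fails closed.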

Standard llms.txt Directives

All standard llms.txt directives remain valid and can be included in the FLM file for completeness.

Crawler Behavior with FLM

When an LLM-powered search bot encounters an FLM file, it performs the following steps:

  1. Fetch FLM File: Retrieves the flm.txt file first.
  2. Parse llms.txt Rules: Respects standard llms.txt directives to understand the site's preferences for LLM crawlers.
  3. Process Forward Directives: Reads the Forward directives, recognizing these URLs as explicitly endorsed by the site owner.
  4. Verify Digests: Where a Digest-SHA256 value is provided, the crawler fetches the resource (or uses a cached copy), computes its SHA-256 hash, and compares it with the declared value. A match shows only that the content is unchanged since the site owner hashed it. A digest is not a certificate: it does not spare the crawler a fetch, and it establishes neither authenticity nor trustworthiness beyond the endorsement itself.
  5. Citation Preference: When answering user queries, the search system prefers citing endorsed links, particularly those whose digests verified, which can improve the accuracy and reliability of its responses.
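
Steps 3 to 5 can be sketched as a small pipeline. It assumes the Forward URLs and digests are already parsed and that the caller supplies a `fetch` function returning body bytes or `None`. The function names and the status labels (`verified`, `mismatch`, `unreachable`, `unhashed`) are hypothetical.

```python
from __future__ import annotations

import hashlib
from typing import Callable, Optional

Fetcher = Callable[[str], Optional[bytes]]  # returns body bytes, or None on failure


def check_forward_links(forwards: list[str], digests: dict[str, str],
                        fetch: Fetcher) -> dict[str, str]:
    """Classify each endorsed URL; only URLs with a declared digest are fetched."""
    status: dict[str, str] = {}
    for url in forwards:
        declared = digests.get(url)
        if declared is None:
            status[url] = "unhashed"  # endorsed, but no integrity claim
            continue
        body = fetch(url)  # a digest can only be checked against fetched bytes
        if body is None:
            status[url] = "unreachable"
        elif hashlib.sha256(body).hexdigest() == declared.lower():
            status[url] = "verified"
        else:
            status[url] = "mismatch"
    return status


def citation_order(status: dict[str, str]) -> list[str]:
    """Verified links first, then unhashed ones; mismatched or unreachable are dropped."""
    rank = {"verified": 0, "unhashed": 1}
    eligible = [u for u, s in status.items() if s in rank]
    return sorted(eligible, key=lambda u: rank[status[u]])  # stable within each rank
```

Injecting `fetch` keeps network access, retries, and robots rules out of the sketch and makes it easy to test with a dictionary's `get` method.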

Trust Metadata

Crawlers should maintain metadata about forward links including:

  • Source domain that vouched for the link
  • Timestamp of endorsement
  • Integrity verification status
  • Expiration date (if specified)
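
One way to hold this metadata is a small immutable record per forward link, as sketched below. The class, field, and enum names are hypothetical, and `endorsed_at` records when the crawler observed the endorsement, since the manifest itself carries no timestamp.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, datetime
from enum import Enum


class Integrity(Enum):
    NOT_DECLARED = "not_declared"  # no Digest-SHA256 line for this URL
    PENDING = "pending"            # digest declared, not yet checked
    VERIFIED = "verified"
    MISMATCH = "mismatch"


@dataclass(frozen=True)
class ForwardLinkRecord:
    url: str
    source_domain: str              # domain whose FLM vouched for the link
    endorsed_at: datetime           # when the crawler first saw the endorsement
    integrity: Integrity
    expires_on: date | None = None  # from an Expire directive, if any

    def is_active(self, today: date) -> bool:
        """Usable as an endorsement: not past expiry and no digest mismatch."""
        not_expired = self.expires_on is None or today <= self.expires_on
        return not_expired and self.integrity is not Integrity.MISMATCH
```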

Implementation Guidelines

For Site Owners

  1. Create an flm.txt file in your site's root directory
  2. List only external URLs you genuinely endorse
  3. Consider adding SHA-256 hashes for critical resources
  4. Update the manifest when endorsed resources change
  5. Keep the file size reasonable (suggested limit: 1MB)
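
Site owners could generate the manifest rather than edit it by hand. The sketch below assumes the owner has already downloaded each endorsed resource and passes its bytes (or `None` to endorse without a digest). `build_flm` is a hypothetical name, and reading "1MB" as 1,000,000 bytes is an assumption.

```python
from __future__ import annotations

import hashlib

MAX_FLM_BYTES = 1_000_000  # suggested 1 MB ceiling (decimal megabyte assumed)


def build_flm(user_agent: str, endorsed: dict[str, bytes | None]) -> str:
    """Render an flm.txt body; URLs mapped to None get no Digest-SHA256 line."""
    lines = [f"User-agent: {user_agent}", "Allow: /", "", "# Forward links block"]
    lines += [f"Forward: {url}" for url in endorsed]
    hashed = {url: body for url, body in endorsed.items() if body is not None}
    if hashed:
        lines += ["", "# Hashes (optional integrity check)"]
        lines += [f"Digest-SHA256: {url} {hashlib.sha256(body).hexdigest()}"
                  for url, body in hashed.items()]
    text = "\n".join(lines) + "\n"
    if len(text.encode("utf-8")) > MAX_FLM_BYTES:
        raise ValueError("FLM exceeds the suggested 1 MB limit; trim the list")
    return text
```

Pages whose content changes often are better endorsed without a digest, since any change would make the digest stale.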

For Crawler Developers

  1. Check for FLM file existence before standard crawling
  2. Parse and validate all directives
  3. Implement hash verification for resources with digests
  4. Track trust relationships in your index
  5. Weight forward-linked resources higher in relevance scoring
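
Because spammers can also publish forward links, a crawler might scale the boost by the endorsing domain's own reputation and cap it. The sketch below is one such scheme; the `endorsement_weight` name, the `reputation` scores in [0, 1], and the default `boost` of 0.2 are all hypothetical tuning choices, not part of this proposal.

```python
from __future__ import annotations


def endorsement_weight(base_score: float, endorsed_by: list[str],
                       reputation: dict[str, float], boost: float = 0.2) -> float:
    """Raise a relevance score for endorsed resources, capped at (1 + boost).

    Each distinct endorsing domain adds boost * reputation; unknown domains add 0,
    so many endorsements from unreputable sites change nothing.
    """
    bonus = sum(boost * reputation.get(d, 0.0) for d in set(endorsed_by))
    return base_score * (1 + min(boost, bonus))
```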

Relationship to Existing Standards

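| Aspect           | llms.txt             | sitemap.xml             | FLM                       |
|------------------|----------------------|-------------------------|---------------------------|
| Purpose          | LLM crawler guidance | Internal page discovery | External link endorsement |
| Scope            | Own site             | Own site                | External sites            |
| Trust signals    | No                   | No                      | Yes                       |
| Integrity checks | No                   | No                      | Yes (optional)            |
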
Important: FLM is NOT a replacement for llms.txt or sitemap.xml. It complements these existing standards by addressing the specific needs of LLM-powered crawlers for external link endorsement.

Technical Considerations

Security Considerations

  • Abuse Prevention: Spammers might "forward" to low-quality sites. Search engines will still need reputation scoring.
  • Trust Verification: Crawlers should verify that the FLM file was served over HTTPS from the domain it claims to speak for, including after any redirects.
  • Hash Validation: When digests are provided, crawlers should validate them to ensure content integrity.
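
The origin check can be sketched as a comparison between the final URL (after redirects) and the domain being evaluated. Exact host matching is a deliberate simplification here, so it would also reject, for example, www.example.com for example.com; `manifest_origin_ok` is a hypothetical name.

```python
from urllib.parse import urlsplit


def manifest_origin_ok(final_url: str, claimed_domain: str) -> bool:
    """True if the FLM was finally served over HTTPS by exactly the claimed host."""
    parts = urlsplit(final_url)
    host = (parts.hostname or "").rstrip(".")  # urlsplit lowercases the hostname
    return parts.scheme == "https" and host == claimed_domain.lower().rstrip(".")
```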

Performance Considerations

  • File Size: Large FLM files defeat the efficiency goal. Consider pagination or limiting entries.
  • Caching: FLM files should be cacheable with appropriate HTTP headers.
  • Update Frequency: Balance between fresh data and crawler efficiency.
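
On the crawler side, cacheability could be honored with a freshness check like the one below. It reads only max-age, no-cache, and no-store from Cache-Control and ignores the Age header and validators such as ETag, so it is a simplification of HTTP caching; `is_fresh` and the one-day `default_ttl` are hypothetical.

```python
from __future__ import annotations

import re

_MAX_AGE = re.compile(r"(?:^|,)\s*max-age\s*=\s*(\d+)", re.IGNORECASE)


def is_fresh(fetched_at: float, cache_control: str | None, now: float,
             default_ttl: float = 86_400.0) -> bool:
    """Decide whether a cached FLM (timestamps in seconds) can be reused."""
    directives = (cache_control or "").lower()
    if "no-store" in directives or "no-cache" in directives:
        return False  # refetch or revalidate every time
    match = _MAX_AGE.search(directives)
    ttl = int(match.group(1)) if match else default_ttl
    return now - fetched_at < ttl
```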

Versioning and Evolution

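  • Content Changes: When a linked page changes, its declared digest no longer matches. Digests detect the change but must be regenerated by the site owner, and pages whose content differs on every request cannot carry a stable digest.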
  • Format Evolution: Consider versioning the FLM format for future extensions.
  • Backward Compatibility: Ensure older crawlers can safely ignore unknown directives.
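
If the format were versioned, a crawler could reject only manifests whose major version it does not understand and merely report unknown directives. The sketch assumes a hypothetical `FLM-Version` directive (absent means version 1); neither that directive nor the `check_version` helper is defined by this proposal.

```python
from __future__ import annotations

SUPPORTED_MAJOR = 1  # hypothetical: highest FLM-Version major this crawler understands
KNOWN = {"user-agent", "allow", "disallow", "forward", "digest-sha256",
         "include", "expire", "flm-version"}


def check_version(text: str) -> tuple[bool, list[str]]:
    """Return (usable, unknown directive names); unknown names are skipped, not fatal."""
    major = 1
    unknown: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue
        key = name.strip().lower()
        if key == "flm-version":
            try:
                major = int(value.strip().split(".", 1)[0])
            except ValueError:
                return False, unknown  # unreadable version: refuse the file
        elif key not in KNOWN:
            unknown.append(name.strip())
    return major <= SUPPORTED_MAJOR, unknown
```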

Standardization Path

To gain traction, the following steps are recommended:

  1. Major LLM search vendors should agree on directive names and semantics
  2. Publish a formal specification through the IETF or a similar body
  3. Provide reference implementations and validators
  4. Establish best practices for different use cases

Note: The FLM specification is currently a proposal. Implementation details may evolve based on community feedback and real-world usage.