## Introduction

A Forward-Link Manifest (FLM) is a proposed extension to the `llms.txt` file that explicitly lists external URLs the site owner endorses as trustworthy and authoritative sources for LLM-powered crawlers.
## Purpose and Benefits

The FLM provides several key technical benefits:
- Authority & Provenance: Large-language-model search agents have difficulty deciding which external sources are trustworthy. If the site author enumerates "forward links," the agent can treat them as higher-confidence citations.
- Improved Attribution: The crawler can quote or paraphrase from whitelisted pages knowing the site owner explicitly endorsed them, reducing the likelihood of LLM-generated hallucinations.
- Fine-Grained Crawl Control: While `llms.txt` provides basic guidance for LLM crawlers, an FLM also says "these URLs are recommended next hops; treat them as canonical or related."
- Efficient Crawling: A single machine-readable file is lighter to fetch than crawling every page just to extract anchors.
## FLM File Structure

The FLM file (typically named `flm.txt`) follows a structure similar to `llms.txt`, with additional directives for forward links:
```
# "flm.txt" (or any name you choose; user-agent string makes it discoverable)
User-agent: llm-search-bot

# Normal robots semantics
Disallow: /drafts/
Allow: /

# Forward links block
Forward: https://example.org/whitepaper.pdf
Forward: https://partner.example.com/api-spec
Forward: https://doi.org/10.1234/some-journal-article

# Hashes (optional integrity check)
Digest-SHA256: https://example.org/whitepaper.pdf 517f2e...
Digest-SHA256: https://partner.example.com/api-spec 9bafcd...
```
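The structure above can be consumed with a small line-oriented parser. Below is a minimal sketch, assuming the `Directive: value` grammar shown in the example; the `Manifest` container and its field names are illustrative, not part of any standard:

```python
# Minimal sketch of an flm.txt parser (directive names as proposed above;
# the exact grammar is not yet standardized, so this is illustrative only).
from dataclasses import dataclass, field

@dataclass
class Manifest:
    forwards: list = field(default_factory=list)  # endorsed external URLs
    digests: dict = field(default_factory=dict)   # URL -> SHA-256 hex digest

def parse_flm(text: str) -> Manifest:
    m = Manifest()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                              # skip blanks and comments
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "forward":
            m.forwards.append(value)
        elif key == "digest-sha256":
            url, _, digest = value.partition(" ")
            m.digests[url] = digest.strip()
        # Unknown directives are ignored for forward compatibility.
    return m
```

Ignoring unrecognized directives, rather than failing, keeps the format open to future extensions (see "Backward Compatibility" below).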
## File Location

The FLM can be referenced in multiple ways:
- As a standalone file at a well-known URI (e.g., `/flm.txt`)
- Referenced from `llms.txt` with a directive like `Forward-manifest: /flm.txt`
- Via HTTP headers
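For the HTTP-header option, one conceivable form is a `Link` response header; note that the `forward-manifest` relation shown here is hypothetical, not a registered link relation:

```
Link: </flm.txt>; rel="forward-manifest"
```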
## Directive Specification

### Forward

Specifies external URLs explicitly endorsed by the site owner.

Syntax: `Forward: absolute-URL`

- Must be an absolute URL
- Indicates the site owner certifies this URL as relevant/authoritative
- Can appear multiple times
### Digest-SHA256

Optional cryptographic integrity check for linked content.

Syntax: `Digest-SHA256: URL HASH`

- Lets crawlers detect whether the linked content has been tampered with
- Allows crawlers to verify a fetched copy against the owner-declared digest
- The hash should be the hex-encoded SHA-256 digest of the resource content
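Verification itself is a one-liner in most languages. A minimal Python sketch (the function name is illustrative); note that the crawler must hold the resource bytes, so a digest detects tampering after a fetch rather than replacing the fetch:

```python
import hashlib

def verify_digest(content: bytes, expected_hex: str) -> bool:
    """True if the fetched bytes hash to the digest declared in flm.txt."""
    return hashlib.sha256(content).hexdigest() == expected_hex.lower()
```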
### Include

References existing sitemap files to avoid duplication.

Syntax: `Include: sitemap.xml`
### Expire

Indicates when the crawler should re-verify or discard the endorsed links.

Syntax: `Expire: 2024-12-31`
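A crawler can apply this directive with a simple date comparison. The sketch below assumes the ISO 8601 `YYYY-MM-DD` form used in the example; the function name is illustrative:

```python
from datetime import date

def endorsement_active(expire_value: str, today: date) -> bool:
    """True while the endorsement has not passed its Expire date."""
    return today <= date.fromisoformat(expire_value)
```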
## Standard llms.txt Directives

All standard `llms.txt` directives remain valid and can be included in the FLM file for completeness.
## Crawler Behavior with FLM

When an LLM-powered search bot encounters an FLM file, it performs the following steps:

1. Fetch FLM File: Retrieves the `flm.txt` file first.
2. Parse llms.txt Rules: Respects standard `llms.txt` directives to understand the site's preferences for LLM crawlers.
3. Process Forward Directives: Reads the `Forward` directives, recognizing these URLs as explicitly endorsed by the site owner.
4. Verify Integrity: Where a `Digest-SHA256` entry is present, the crawler fetches the resource and checks that its hash matches the declared digest. A digest is not a certificate: it confirms the fetched bytes are what the site owner endorsed, but the crawler must still retrieve the content to perform the check.
5. Citation Preference: When responding to user queries, the crawler prioritizes citing these pre-verified, trusted links, enhancing the accuracy and reliability of its responses.
## Trust Metadata

Crawlers should maintain metadata about forward links, including:
- Source domain that vouched for the link
- Timestamp of endorsement
- Integrity verification status
- Expiration date (if specified)
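One way to hold this metadata is a small record per endorsed link. The sketch below is illustrative; the class and field names are assumptions, not part of any specification:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ForwardLinkTrust:
    url: str                          # the endorsed external URL
    source_domain: str                # domain that vouched for the link
    endorsed_at: date                 # when the endorsement was observed
    integrity_verified: bool = False  # digest check passed, if one was given
    expires: Optional[date] = None    # re-verify or discard after this date
```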
## Implementation Guidelines

### For Site Owners

- Create an `flm.txt` file in your site's root directory
- List only external URLs you genuinely endorse
- Consider adding SHA-256 hashes for critical resources
- Update the manifest when endorsed resources change
- Keep the file size reasonable (suggested limit: 1 MB)
### For Crawler Developers
- Check for FLM file existence before standard crawling
- Parse and validate all directives
- Implement hash verification for resources with digests
- Track trust relationships in your index
- Weight forward-linked resources higher in relevance scoring
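The last two points can be sketched as a trust boost applied during relevance scoring; the boost value and the additive model here are illustrative choices, not a prescribed algorithm:

```python
def rank_citations(candidates: dict, endorsed: set, boost: float = 0.3) -> list:
    """candidates maps URL -> base relevance in [0, 1]; returns URLs best-first."""
    def score(url: str) -> float:
        # Forward-listed URLs receive a fixed additive trust boost.
        return candidates[url] + (boost if url in endorsed else 0.0)
    return sorted(candidates, key=score, reverse=True)
```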
## Relationship to Existing Standards

| Aspect | llms.txt | sitemap.xml | FLM |
|---|---|---|---|
| Purpose | LLM crawler guidance | Internal page discovery | External link endorsement |
| Scope | Own site | Own site | External sites |
| Trust signals | No | No | Yes |
| Integrity checks | No | No | Yes (optional) |

The FLM is not a replacement for `llms.txt` or `sitemap.xml`. It complements these existing standards by addressing the specific needs of LLM-powered crawlers for external link endorsement.
## Technical Considerations

### Security Considerations
- Abuse Prevention: Spammers might "forward" to low-quality sites. Search engines will still need reputation scoring.
- Trust Verification: Crawlers should verify that the FLM file is served from the claimed domain.
- Hash Validation: When digests are provided, crawlers should validate them to ensure content integrity.
### Performance Considerations
- File Size: Large FLM files defeat the efficiency goal. Consider pagination or limiting entries.
- Caching: FLM files should be cacheable with appropriate HTTP headers.
- Update Frequency: Balance between fresh data and crawler efficiency.
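For caching, ordinary HTTP freshness and validation headers are sufficient; a site might serve its manifest with headers like the following (values are illustrative):

```
Cache-Control: max-age=86400
ETag: "flm-2024-06-01"
```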
### Versioning and Evolution
- Content Changes: What happens when a linked page changes? Hash digests help but require maintenance.
- Format Evolution: Consider versioning the FLM format for future extensions.
- Backward Compatibility: Ensure older crawlers can safely ignore unknown directives.
## Standardization Path
To gain traction, the following steps are recommended:
- Major LLM search vendors should agree on directive names and semantics
- Publish a formal specification with IETF or similar body
- Provide reference implementations and validators
- Establish best practices for different use cases