AI Optimization - an update
Introduction
Nine months ago I reviewed the Boye & Co CMS Kickoff 2024 in St. Petersburg, Florida:
https://cmscritic.com/a-cms-consultants-takeaways-from-cms-kickoff-2024
I also wrote about the dawning of AI Optimization (https://www.linkenavigating-cms-future-mastering-ai-optimization), and the landscape has since evolved more rapidly than I anticipated.
In summary, the original articles stated:
The way we manage online content is changing. Instead of just thinking about how AI can create content, we now need to focus on how AI consumes it. This means big changes in three key areas:
1. Content Strategy:
- AI is better at reading and understanding content than creating it. This is because it's still difficult for AI to consistently match a brand's voice and style, and to meet legal and ethical guidelines.
- Websites need to be redesigned with AI in mind. This means using structured data, schema markup, and clear organization to make it easier for AI to understand the content.
- Content needs to be appealing and useful for both humans and AI.
2. Technical Changes:
- Websites need to be built with structured data, using formats like JSON-LD.
- Schema markup needs to be improved and expanded to help AI understand the meaning of content.
- Websites need to be organized in a logical way that makes sense to both humans and machines.
- AI needs to be able to easily access and process website content.
- Websites need to work well with AI analytics and search tools.
3. Business Changes:
- Companies need to hire people with AI expertise.
- Traditional online advertising and user tracking are becoming less effective.
- Delivering content through apps is becoming more important.
- New job roles are emerging, like "AI Evangelist," to help companies navigate these changes.
The bottom line: To succeed in this new era of AI Optimization (AIO), businesses need to create a digital presence that works for both human users and AI systems.
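To make the structured-data point from the summary concrete, here is a minimal sketch of generating JSON-LD for a page. The helper names (buildArticleJsonLd, toJsonLdScriptTag) and the sample values are hypothetical placeholders; only the @context, @type and property names come from schema.org.

```javascript
// Hypothetical helper: builds a schema.org Article object for a page.
function buildArticleJsonLd({ headline, authorName, datePublished, url }) {
  return {
    '@context': 'https://schema.org',
    '@type': 'Article',
    headline,
    author: { '@type': 'Person', name: authorName },
    datePublished,
    mainEntityOfPage: url,
  };
}

// Wraps the object in the <script> element that carries JSON-LD in HTML.
function toJsonLdScriptTag(data) {
  // Escape "<" so a value containing "</script>" cannot close the tag early.
  const json = JSON.stringify(data, null, 2).replace(/</g, '\\u003c');
  return `<script type="application/ld+json">\n${json}\n</script>`;
}

// Sample values are placeholders, not real page data.
const tag = toJsonLdScriptTag(
  buildArticleJsonLd({
    headline: 'AI Optimization - an update',
    authorName: 'Tom',
    datePublished: '2024-11-01',
    url: 'https://example.com/ai-optimization-update',
  })
);
console.log(tag);
```

The resulting tag would normally be placed in the page head, where crawlers and AI clients can read it without rendering the page.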
New feature: llms.txt, helping AI to understand your site
With the introduction of the llms.txt standard and major developments in AI-web interaction, we're seeing a fundamental shift in how digital experiences are created and consumed.
Executive Summary
The integration of AI into web experiences isn't just an optimization strategy anymore; it's a fundamental requirement for digital success. Beyond SEO and AIO, we're now seeing the emergence of "AI-First Architecture" (AIA), where digital experiences are designed with AI systems as primary consumers alongside humans.
The New Frontier: LLMs.txt Standard
Jeremy Howard's proposed llms.txt standard (https://llmstxt.org/#proposal) represents a key moment in AI-web interaction. This development perfectly aligns with the need for structured, AI-readable content, taking it further with a standardized approach.
The proposal makes two recommendations: 1) serve an llms.txt file from the site root by default, and 2) offer Markdown versions of web pages at the same URL as the original page, with .md appended.
Readers of my previous blog posts will know that I am a fan of Adobe Edge Delivery Services (https://aem.live). This technology automatically serves .md versions of pages, and it is trivial to add llms.txt.
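As a rough illustration of the second recommendation, the sketch below maps a page URL to its Markdown counterpart. It assumes the convention as I read it in the proposal: append .md to the path, or index.html.md when the URL has no file name. The function name markdownUrlFor is my own, and a given platform may use a different mapping.

```javascript
// Maps a page URL to the Markdown URL suggested by the llms.txt proposal:
// append ".md" to the path, or "index.html.md" when the path has no file name.
function markdownUrlFor(pageUrl) {
  const url = new URL(pageUrl);
  if (url.pathname.endsWith('/')) {
    url.pathname += 'index.html.md';
  } else {
    url.pathname += '.md';
  }
  return url.toString();
}

console.log(markdownUrlFor('https://example.com/docs/start.html'));
// prints "https://example.com/docs/start.html.md"
console.log(markdownUrlFor('https://example.com/docs/'));
// prints "https://example.com/docs/index.html.md"
```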
Key Features of llms.txt:
- Simple, standardized format starting with project name and summary
- Markdown-based structure for easy AI processing
- Integration with existing standards (robots.txt, sitemap.xml)
- Support for chunked content processing
- URL-to-Markdown conversion capability
- Error handling integration through 404 and API failures
LLMs.txt: Comprehensive Implementation Guide
Core Structure and Purpose
The standard leverages Markdown for optimal language model comprehension, with the file located at /llms.txt in a website's root.
Mandatory Elements
- Project Title (H1)
  - Required as the first element
  - Clear, descriptive project name
  - Sets the context for all following content
- Project Summary (Blockquote)
  - Concise overview of key information
  - Essential context for understanding the project
  - Foundation for detailed sections
  - Strictly, the proposal requires only the H1; I recommend treating the summary as mandatory in practice
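A minimal sketch of checking these two leading elements might look like the following. The function name checkLlmsTxtHeader and the wording of the messages are my own; the check only looks at the first two non-empty lines and is not a full Markdown parser.

```javascript
// Checks the two leading elements described above: an H1 title first,
// then a blockquote summary. Returns a list of problems (empty when none).
function checkLlmsTxtHeader(text) {
  const lines = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== '');
  const problems = [];
  if (!/^# \S/.test(lines[0] || '')) {
    problems.push('first element must be an H1 title ("# Name")');
  }
  if (!/^>\s*\S/.test(lines[1] || '')) {
    problems.push('a blockquote summary ("> ...") should follow the title');
  }
  return problems;
}

console.log(checkLlmsTxtHeader('# My Project\n\n> One-line summary.'));
console.log(checkLlmsTxtHeader('My Project\n> One-line summary.'));
```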
Optional Details Section
The optional details section provides comprehensive context:
- Purpose and Role
  - Deeper project insights
  - Structural explanation
  - Resource relationships
  - Taxonomical organization
- Key Components to Include
  - Detailed contextual information
  - How-to guidance
  - Categorical organization
  - Technical specifications
  - Domain-specific details
  - Best practices
  - Edge cases and exceptions
  - Supplementary resources
Implementation Example
# Enterprise API Platform

> Comprehensive API management platform providing authentication,
> monitoring, and integration capabilities for enterprise systems.

Our platform includes several key components:

- **API Gateway**: Central access point for all services
- **Authentication Service**: OAuth2 and JWT implementation
- **Monitoring Dashboard**: Real-time metrics and alerts
- **Integration Hub**: Pre-built connectors for common services

For LLMs and automated systems, prioritize the API documentation
section for most queries. Authentication flows should be referenced
first when handling integration requests.

## Core Documentation

- [API Reference](https://platform.example.com/api): Complete API specification
- [Auth Flows](https://platform.example.com/auth): Authentication documentation
- [Integration Guide](https://platform.example.com/integrate): Implementation guidelines

## Optional

- [Case Studies](https://platform.example.com/cases): Implementation examples
- [Community Forum](https://platform.example.com/community): User discussions
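To show how a consumer might read a file like the example above, here is a simplified parsing sketch. The function name parseLlmsTxt and the shape of the returned object are assumptions of mine, and the line-based regular expressions cover only the patterns used in the example rather than Markdown in general.

```javascript
// Parses llms.txt text into { title, summary, sections }.
// Each H2 starts a section; list items of the form
// "- [name](url): notes" become links in that section.
function parseLlmsTxt(text) {
  const result = { title: null, summary: [], sections: [] };
  let current = null;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    let m;
    if (!result.title && (m = /^# (.+)$/.exec(line))) {
      result.title = m[1];
    } else if ((m = /^## (.+)$/.exec(line))) {
      current = { name: m[1], links: [] };
      result.sections.push(current);
    } else if (!current && (m = /^>\s?(.*)$/.exec(line))) {
      // Blockquote lines before the first H2 form the summary.
      result.summary.push(m[1]);
    } else if (current && (m = /^- \[([^\]]+)\]\(([^)]+)\)(?::\s*(.*))?$/.exec(line))) {
      current.links.push({ name: m[1], url: m[2], notes: m[3] || '' });
    }
  }
  return { ...result, summary: result.summary.join(' ') };
}

// A shortened copy of the example file, used only to exercise the parser.
const sample = [
  '# Enterprise API Platform',
  '',
  '> Comprehensive API management platform providing authentication,',
  '> monitoring, and integration capabilities for enterprise systems.',
  '',
  '## Core Documentation',
  '',
  '- [API Reference](https://platform.example.com/api): Complete API specification',
  '',
  '## Optional',
  '',
  '- [Case Studies](https://platform.example.com/cases): Implementation examples',
].join('\n');

const parsed = parseLlmsTxt(sample);
console.log(parsed.title, parsed.sections.map((s) => s.name));
```

Free-form paragraphs and lists between the summary and the first H2 are ignored here; a fuller parser would keep them as the details section.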
Integration with Existing Standards
The llms.txt standard complements existing web standards:
- Relationship with robots.txt
  - robots.txt: Controls automated access permissions
  - llms.txt: Provides contextual information for inference
  - Different purposes but complementary roles
- Relationship with sitemap.xml
  - sitemap.xml: Comprehensive indexable content listing
  - llms.txt: Curated, LLM-optimized content guide
  - Key differences:
    - llms.txt includes LLM-readable versions
    - Supports external resource linking
    - Optimized for context window limitations
User 404 Error Handling, provide a meta-tag
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <!-- Non-standard hint suggested in this article: points AI clients at /llms.txt -->
  <meta name="llms-section" content="/llms.txt">
  <title>404 Not Found</title>
  <style>
    body {
      display: flex;
      justify-content: center;
      align-items: center;
      height: 100vh;
      margin: 0;
      font-family: Arial, sans-serif;
      background-color: #f0f0f0;
    }
    .container {
      text-align: center;
    }
    h1 {
      font-size: 3em;
      margin: 0;
    }
    p {
      font-size: 1.2em;
      color: #666;
    }
    a {
      color: #007BFF;
      text-decoration: none;
    }
    a:hover {
      text-decoration: underline;
    }
  </style>
</head>
<body>
  <div class="container">
    <h1>404</h1>
    <p>Page Not Found</p>
    <p><a href="/">Go back to Home</a></p>
  </div>
</body>
</html>
API 404 Error Handling Implementation
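# Enhanced Nginx configuration for contextual error handling: add header to 404 page.
# Assumes a server block whose root contains llms.txt.
error_page 404 @llms_fallback;
location @llms_fallback {
    # Serve the .txt file as Markdown; add_header cannot replace Content-Type.
    types { }
    default_type text/markdown;
    # "always" makes add_header apply to 404 responses too.
    add_header X-Content-Section "optional-details" always;
    try_files /llms.txt =404;
}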
Server Side Access, provide both
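const express = require('express');
const path = require('path');

const app = express();

// 404 handler: this should be after all other routes
app.use((req, res, next) => {
  res.status(404)
    // X-llms-Section is this article's own, non-standard header
    .set('X-llms-Section', '/llms.txt')
    .sendFile(path.join(__dirname, '404.html')); // or render your 404 page
});

// General error handler should be last
app.use((err, req, res, next) => {
  // Handle other errors
  res.status(500).send('Internal Server Error');
});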
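On the consuming side, an AI client could look for either hint when it receives a 404. The sketch below is hypothetical: findLlmsHint is my own name, the X-llms-Section header and llms-section meta tag are the conventions suggested in this article rather than part of any standard, and the meta tag is matched with a simple regular expression instead of an HTML parser.

```javascript
// Hypothetical client-side helper. Given a 404 response (URL, headers, HTML body),
// returns the absolute llms.txt URL the server pointed to, or null.
function findLlmsHint(requestUrl, headers, html) {
  // Header names are case-insensitive, so normalise them first.
  const lower = Object.fromEntries(
    Object.entries(headers).map(([key, value]) => [key.toLowerCase(), value])
  );
  let hint = lower['x-llms-section'];
  if (!hint && html) {
    // Fall back to the meta tag; assumes name comes before content.
    const m = /<meta\s+name=["']llms-section["']\s+content=["']([^"']+)["']/i.exec(html);
    if (m) hint = m[1];
  }
  // Resolve relative hints such as "/llms.txt" against the requested URL.
  return hint ? new URL(hint, requestUrl).toString() : null;
}

console.log(
  findLlmsHint('https://example.com/missing/page', { 'X-llms-Section': '/llms.txt' }, '')
);
```

Checking the header first means an API client never has to parse HTML, while browsers and HTML-only crawlers can still find the meta tag.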
Technical Implementation Framework
Core Requirements
- Structured Data Layer
  - Comprehensive schema markup
  - llms.txt implementation
  - Custom vocabularies for industry-specific concepts
  - Knowledge graph integration
  - Error response strategy for AI clients
- AI-Ready Architecture
  - Markdown versions of HTML content
  - Semantic HTML implementation
  - Chunked content strategy
  - Component-level metadata
  - Intelligent error handling for AI clients
- Testing Infrastructure
  - AI interaction testing protocols
  - Content comprehension validation
  - llms.txt verification
  - Cross-platform AI accessibility testing
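One possible reading of the "chunked content strategy" item is to split Markdown at heading boundaries so that each chunk stays a coherent unit. The sketch below assumes that approach; the function name chunkMarkdown and the character-based size limit are illustrative choices of mine, not something the proposal prescribes.

```javascript
// Splits Markdown into chunks at heading boundaries, then merges adjacent
// pieces while they stay under maxChars. A single oversized section is kept
// whole rather than cut mid-sentence.
function chunkMarkdown(markdown, maxChars = 2000) {
  // Split before every line that starts with one to six "#" and a space.
  const pieces = markdown.split(/\n(?=#{1,6} )/).filter((p) => p.trim() !== '');
  const chunks = [];
  let current = '';
  for (const piece of pieces) {
    if (current && current.length + piece.length + 1 > maxChars) {
      chunks.push(current);
      current = piece;
    } else {
      current = current ? `${current}\n${piece}` : piece;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

const doc = '# A\nintro\n## B\nbody b\n## C\nbody c';
console.log(chunkMarkdown(doc, 25));
```

Note that this simple splitter does not understand fenced code, so a comment line starting with "# " inside a code sample would also be treated as a heading.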
Implementation Best Practices
File Structure Organization
- Core Content
  - Mandatory H1 project identifier
  - Essential summary blockquote
  - Key implementation details
- Resource Linking
  - Organized H2 sections
  - Clear link descriptions
  - Context-appropriate grouping
- Optional Content Strategy
  - Secondary information marking
  - Context window optimization
  - Resource prioritization
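The optional content strategy can be sketched as a filter: when context is tight, a client skips links listed under a section named "Optional". The function name selectLinks, the includeOptional flag and the section objects below are hypothetical, chosen only to illustrate the idea.

```javascript
// Picks which links to load. Sections named "Optional" are dropped when
// includeOptional is false; all other sections are always kept.
function selectLinks(sections, { includeOptional = true } = {}) {
  return sections
    .filter((s) => includeOptional || s.name.trim().toLowerCase() !== 'optional')
    .flatMap((s) => s.links.map((link) => link.url));
}

// Sample data mirroring the section names used in the example file.
const exampleSections = [
  { name: 'Core Documentation', links: [{ url: 'https://platform.example.com/api' }] },
  { name: 'Optional', links: [{ url: 'https://platform.example.com/cases' }] },
];

console.log(selectLinks(exampleSections));
console.log(selectLinks(exampleSections, { includeOptional: false }));
```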
When describing a variety of website types in a structured and meaningful way for inclusion in an llms.txt file, it's useful to define categories that capture the website's purpose, functionality, and content focus. Below is my proposed set of categories:
- Purpose
  - API-Driven: Websites primarily designed to serve or consume APIs.
  - Content-Driven: Sites focused on delivering informational or editorial content (e.g., blogs, news portals).
  - E-Commerce/Sales-Driven: Websites with the main goal of selling products or services.
  - Document-Driven: Sites that host, manage, and provide access to documents or resources (e.g., research databases, white papers).
  - Informative: Platforms aimed at educating or informing users (e.g., encyclopedias, learning hubs).
  - Humorous/Entertainment: Sites created for entertainment, humor, or leisure purposes (e.g., meme sites, online comics).
- Functionality
  - Static: Websites with fixed content, typically HTML/CSS-based without dynamic features.
  - Dynamic: Content changes based on user interaction or backend updates.
  - Interactive: Emphasis on user interaction (e.g., quizzes, forms, calculators).
  - Transactional: Sites where users perform transactions (e.g., banking, e-commerce checkout).
  - Community-Driven: Platforms focused on user-generated content or forums (e.g., Reddit, social networks).
- Target Audience
  - B2B (Business-to-Business): Websites designed for professional or corporate users.
  - B2C (Business-to-Consumer): Sites catering directly to general consumers.
  - Niche/Interest-Specific: Targeting specialized audiences (e.g., enthusiasts, hobbyists).
- Content Format
  - Text-Heavy: Predominantly text-based content.
  - Visual-Heavy: Focused on images, infographics, or galleries.
  - Video-Centric: Sites prioritizing video content (e.g., YouTube).
  - Audio-Focused: Audio-first platforms (e.g., podcasts, music streaming).
  - Mixed Media: Combines multiple content types seamlessly.
- Technology Stack
  - API-Driven Frameworks: Websites built using frameworks like React, Angular, or Vue.js.
  - Headless CMS: Platforms decoupled from presentation layers, powered by APIs (e.g., Strapi, Contentful).
  - Traditional CMS: Built using integrated systems like WordPress, Drupal, or Joomla.
- User Goals
  - Explore: Users browse to discover information or products.
  - Learn: Education-focused (e.g., tutorials, courses).
  - Transact: Enabling purchases or other transactions.
  - Socialize: Facilitating user interaction and connection.
  - Entertain: Providing amusement or fun.
- Domain/Industry
  - Corporate: Websites for businesses or enterprises.
  - Educational: For schools, universities, or learning platforms.
  - Healthcare: Medical or health-related websites.
  - Technology: Focused on software, hardware, or tech trends.
  - Nonprofit/Charity: Supporting causes and initiatives.
  - Retail: For online or brick-and-mortar stores.
- User Interaction Model
  - One-Way: Information is provided without much user interaction (e.g., brochure sites).
  - Two-Way: Users can interact and contribute (e.g., blogs with comments, feedback forms).
  - Real-Time: Interaction or updates happen in real time (e.g., chat applications, live dashboards).
- Monetization Model
  - Ad-Supported: Revenue from advertisements.
  - Subscription-Based: Access behind paywalls or memberships.
  - Freemium: Basic access is free; premium features are paid.
  - E-Commerce: Revenue through product/service sales.
  - Donation-Based: Funded through voluntary contributions.
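As one way to put these categories to work, a site could keep its classification as data and render it into the details section of llms.txt. The sketch below is illustrative only: the profile values are made up, and renderSiteProfile is a hypothetical helper, not part of the proposal.

```javascript
// Hypothetical site profile using the category names proposed above.
const siteProfile = {
  Purpose: ['Content-Driven', 'Informative'],
  Functionality: ['Dynamic'],
  'Target Audience': ['B2B'],
  'Content Format': ['Text-Heavy'],
  'Technology Stack': ['Headless CMS'],
};

// Renders the profile as a Markdown list suitable for the details section.
function renderSiteProfile(profile) {
  return Object.entries(profile)
    .map(([category, values]) => `- **${category}**: ${values.join(', ')}`)
    .join('\n');
}

console.log(renderSiteProfile(siteProfile));
```

Because the output is a plain list with no headings, it can sit between the summary blockquote and the first H2 section of the file.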
Team Structure Evolution
The AI-First Team
- AI Evangelist: Strategic leadership for AI integration
- AI Content Architects: Specialists in AI-readable content structure
- Standards Implementation Specialist: Focus on llms.txt and related standards
- AI Integration Engineers: Technical implementation experts
Implementation Strategy
Phase 1: Foundation
- Audit current AI readiness
- Implement llms.txt standard
- Establish basic structured data
- Create Markdown alternatives for key content
- Implement AI-friendly error handling
- Configure 404 and API error responses
- Test AI client error scenarios
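The audit step in Phase 1 could start from a few automated checks. The following is a deliberately small sketch under my own assumptions: auditAiReadiness is a hypothetical function that takes text you have already fetched (the llms.txt body and the homepage HTML) and reports three coarse signals using regular expressions, not a complete readiness assessment.

```javascript
// Hypothetical Phase 1 audit: takes already-fetched text and reports simple signals.
function auditAiReadiness({ llmsTxt, homepageHtml }) {
  const html = homepageHtml || '';
  return {
    // llms.txt exists and starts with an H1 title.
    hasLlmsTxt: typeof llmsTxt === 'string' && llmsTxt.trimStart().startsWith('# '),
    // The homepage embeds at least one JSON-LD block.
    hasJsonLd: /<script[^>]+type=["']application\/ld\+json["']/i.test(html),
    // The homepage uses at least one semantic sectioning element.
    usesSemanticHtml: /<(main|article|nav|header|footer)\b/i.test(html),
  };
}

console.log(
  auditAiReadiness({
    llmsTxt: '# Example Site\n\n> Short summary.',
    homepageHtml: '<main><script type="application/ld+json">{}</script></main>',
  })
);
```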
Phase 2: Optimization
- Enhance content structure
- Develop custom vocabularies
- Implement advanced testing
- Optimize chunking strategies
Phase 3: Innovation
- Explore emerging AI capabilities
- Develop custom AI interactions
- Create industry-specific solutions
- Integrate with evolving standards
Future Considerations
As LLMs evolve, the Optional Details section may become increasingly important for:
- Training data organization (though not the primary purpose)
- Context window optimization
- Resource prioritization
- Relationship mapping
- Error recovery
Conclusion
The Age of AI Optimization has entered a new phase with the introduction of standards like llms.txt. Success in this evolving landscape requires:
- Embrace of new standards and protocols
- Balance between human and AI optimization
- Strong technical foundation in structured data
- Team evolution and skill development
- Innovation in content strategy and delivery
- Careful consideration of rights and monetization
Organizations must view AI optimization not as a separate initiative but as a fundamental aspect of their digital strategy. The winners in this new landscape will be those who can effectively implement these new standards while maintaining flexibility for future evolution.
About the Author
Tom is a seasoned CMS consultant and an Adobe Experience Manager (AEM) expert, renowned for steering some of the most significant digital transformations in the tech world. Affectionately known as "The AEM Guy," Tom has made his mark spearheading AEM strategies at EE – the UK's telecom giant – and at Twitter, where he showcased his versatility and high demand in tech circles. He was also the architect behind the world's largest AEM implementation for automotive giant Nissan/Renault, juggling 200+ websites in many languages. With Adobe Experience Cloud, Tom navigated the complexities of cloud and on-prem services, translating intricate requirements into seamless solutions.
Tom also worked at Netcentric, where he played a crucial role in Ford Europe's digital footprint, proving his blend of technical know-how and business savvy. He also spent time at MediaMonks and DigitasLBi, where he drove performance and fostered enterprise-level initiatives. Now, as a principal consultant at Digital Domain Technologies, he's the go-to for designing and delivering large-scale projects that hit the mark every time.
In the world of software innovation, Tom is a trailblazer, constantly pushing the envelope. He continues to set the bar high in the CMS field, and consistently delivers excellence. His knack for distilling tech speak into winning pitches and proposals makes him a standout during pre-sales and client presentations. Tom's work ethic and innovative solutions have bagged him numerous awards for creative, cost-saving digital solutions. His leadership mantra: mentor, guide, and represent, ensuring every team member shines. Tom is also a member of the Boye & Company CMS Experts group.
This article reflects the state of AI Optimization as of November 2024, incorporating detailed implementation guidance from llmstxt.org. For the complete specification and latest implementation guidelines, visit https://llmstxt.org/#proposal.