I recently saw a live demo of Agentic AI being used to make a flight reservation on a very popular airline site. Just like at an improv show, the crowd was asked to shout out random locations and dates to give the Agent. The presenter was very engaging, making sure to pull us into a side conversation as a necessary distraction while we all waited, watching in real time the behind-the-scenes action of an AI agent attempting to guess its way through specifying locations and dates. After about 5 minutes, the crowd interrupted the presenter with gasps, then exuberant applause, as the AI Agent finally got past the very first step of making a reservation. Out of the corner of my eye, I saw one of the people who worked on the Agent pump their fist in victory. The presenter acknowledged that it was pretty slow, but was hopeful they could reach human-like speeds within 6 months to a year. Everyone cheered the Agent's success at recognizing the date fields and invoking some date pickers. It was like celebrating a child pooping in a proper toilet for the first time. Definitely a joyous occasion to be commended.
As an assistive technology, AI Agents offer hope of performing tasks that usually present barriers to people with disabilities. There is promise here, a worthy cause. But something makes me feel that the AI agents of the future are actually too late.
AI as an accelerator
If there is one true promise of AI, it is that it should function as an accelerator for human creativity, discovery, and productivity instead of merely replacing humans. That may not hold for every case or intention, but in one such case of acceleration, it did not take long for AI to discover that web content is pretty bad and hard to understand. Take, for example, the proposal to standardize on a /llms.txt file that provides information to help LLMs use a website at inference time. Here is the problem statement as described:
Large language models increasingly rely on website information, but face a critical limitation: context windows are too small to handle most websites in their entirety. Converting complex HTML pages with navigation, ads, and JavaScript into LLM-friendly plain text is both difficult and imprecise.
While websites serve both human readers and LLMs, the latter benefit from more concise, expert-level information gathered in a single, accessible location. This is particularly important for use cases like development environments, where LLMs need quick access to programming documentation and APIs.
This proposal acknowledges that the web is complex, with excessive navigation, ads, and JavaScript. It also states that only LLMs (not humans) would benefit from more concise information in a single, accessible location to improve content ingestion. The solution is to create additional markdown files to help describe content to LLMs. Fascinating!
The proposal goes on to list some guidelines for creating effective llms.txt files (a sketch of such a file follows the list):
- Use concise, clear language.
- When linking to resources, include brief, informative descriptions.
- Avoid ambiguous terms or unexplained jargon.
- Run a tool that expands your llms.txt file into an LLM context file and test a number of language models to see if they can answer questions about your content.
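For illustration, here's a minimal sketch of what such a file might look like, following the structure the proposal suggests (an H1 title, a blockquote summary, then H2 sections of links). The airline, URLs, and descriptions here are invented:

```markdown
# Example Airline

> Example Airline sells flights. This file points LLMs at concise,
> LLM-friendly versions of our key documentation.

## Docs

- [Booking API](https://example.com/docs/booking.md): How to search
  flights and create a reservation, with request and response examples.
- [Fare rules](https://example.com/docs/fares.md): Plain-language
  summary of fare classes, change fees, and refunds.

## Optional

- [Company history](https://example.com/about.md): Background that can
  be skipped when the context window is tight.
```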
Robustness of the web
Imagine how much easier and faster delivering Agentic AI features could be if we didn't have to use up massive amounts of energy, melting AI chips, just so an agent can perceive and guess its way through what tasks can be accomplished on the web, and how. Or if we didn't have to incur the cost of providing extra metadata and information to explain everything. What if there was a way for AI to easily glean relevant controls, expected interactions, and predictable outcomes? What if content was robust enough to be interpreted by a wide variety of (AI) agents? Perhaps some tags or hints to provide additional context?
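To be concrete, here's what some of those "tags or hints" can look like in plain HTML today. A minimal sketch of a departure-date field like the one from the demo; the ids, names, and hint text are hypothetical:

```html
<!-- A date field any user agent, human-driven or AI, can interpret
     without guesswork: the purpose (label), the control type, and
     the machine-readable value format are all declared up front. -->
<label for="departure-date">Departure date</label>
<input
  type="date"
  id="departure-date"
  name="departure"
  min="2025-01-01"
  required
  aria-describedby="departure-hint"
/>
<p id="departure-hint">Choose the date of your outbound flight.</p>
```

No screenshot parsing, no five minutes of guessing: the semantics are right there in the markup.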
This is what all the accessibility old-heads have been screaming about. Robustness is an important accessibility principle, with the goal of being able to "Maximize compatibility with current and future user agents, including assistive technologies." (Source: Web Content Accessibility Guidelines 2.2, W3C Recommendation, 12 December 2024, Guideline 4.1 Compatible. https://www.w3.org/TR/WCAG22/#compatible)
Everything old is new again
The problems that AI and LLMs are running into are not new; they have been known for a long time now. Really smart humans (let's assume well-intentioned *wink wink*) created these complex HTML pages with navigation, ads, and JavaScript that the really smart AI can't seem to handle. So now the really smart humans are clamoring for simplicity and clarity to help the really smart AI deal with the overly-complex internet. Is that what they call job security?
AI fanboys might not want to hear it, but HTML and ARIA (when implemented responsibly) have long been able to make things easier for humans, while still providing flexibility, privacy, and autonomy. In fact, revisiting the guidelines for "effective" /llms.txt files, you may notice that they read like loose interpretations of some very key WCAG success criteria:
- Use concise, clear language, AKA WCAG 3.1.5 Reading Level (Level AAA)
- When linking to resources, include brief, informative descriptions, AKA WCAG 2.4.4 Link Purpose (In Context) (Level A)
- Avoid ambiguous terms or unexplained jargon, AKA WCAG 3.1.3 Unusual Words (Level AAA)
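To make the parallel concrete, here's a small sketch of the link guidance expressed once in HTML for every kind of agent, instead of repeated in a separate file just for LLMs (the URL and link text are made up):

```html
<!-- Ambiguous: fails humans, assistive technologies, and LLMs alike. -->
<a href="/docs/booking">Click here</a>

<!-- Descriptive: the link text states its purpose in context
     (WCAG 2.4.4), giving any agent a usable hint. -->
<a href="/docs/booking">Booking API reference with request examples</a>
```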
Like Bill Murray trapped in Punxsutawney, it seems technology keeps experiencing the same problems over and over again. Every new generation stumbles over itself as it races towards (sometimes perceived) innovation, ignoring the elders along the way because "it's different this time!" Some things don't change. Some things are not new; they may just be new to you. I'll accept that details, capabilities, and maybe even usage might be different, but long-standing principles still apply. 🤔
So the intelligence really is artificial, huh?