Measurement · 15 min read

Mentions Are the New Page Two

Raw mentions are the new page two. In the AI search era, recommendation-weighted visibility measures commercial influence far better than misleading share-of-voice counts.

Why AI SEO Needs Better Metrics Than Share of Voice

The marketing industry is moving quickly to rebrand SEO for the AI search era.

That is understandable. Buyers are asking ChatGPT, Gemini, Perplexity, Copilot, and Google AI Overviews for product recommendations, vendor comparisons, pricing guidance, and trust signals before they ever visit a website. Agencies are trying to help clients adapt, and the language of the moment is “AI SEO,” “GEO,” “AEO,” “LLM visibility,” and “AI optimization.”

But there is a measurement problem hiding underneath much of this new language.

A lot of AI SEO reporting is still built around the easiest thing to count: mentions.

  1. How often did the brand appear?
  2. What was its AI share of voice?
  3. Which sources were cited?
  4. Did the brand show up more often than last month?

Those are useful inputs. They are not enough.

The mistake is treating LLMs like search result pages when they behave more like recommendation engines. AI systems do not just list brands. They summarize, compare, rank, qualify, warn, cite, and exclude. They can mention a company because they recommend it. They can also mention a company because they are explaining why a buyer should choose someone else.

That distinction changes everything.

In LLM answers, a mention can be the new page-two ranking: technically visible, but commercially weak. In some cases, it can be worse than weak. It can be a negative signal dressed up as visibility.

The SEO industry learned to trust tool-provided KPIs

For more than two decades, SEO agencies have built their operating systems around third-party data platforms. That was rational. Traditional SEO became a mature discipline with established workflows: keyword research, ranking position, backlinks, referring domains, technical audits, content gaps, traffic, conversions, and competitive visibility.

Platforms like Ahrefs and Semrush helped standardize that work. Ahrefs now describes Brand Radar as an AI visibility tool that monitors brand visibility across AI answers, YouTube, Reddit, and other surfaces, with a large database of search-backed prompts and metrics such as AI Share of Voice, mentions, citations, search demand, and visibility gaps. Semrush has also moved aggressively into AI visibility, with tools for AI mentions, citations, prompt tracking, competitor gaps, brand sentiment, and AI visibility benchmarking across platforms.

That work is valuable. The industry needs broad prompt databases, visibility tracking, citation mapping, and competitive monitoring.

But the next layer is different.

Traditional SEO reporting often starts from the platform’s available KPIs. That habit can become dangerous in LLM discovery because the easiest KPI to export is not always the most important thing to know.

Counting whether a brand appears in an AI answer is useful. But the commercial question is deeper:

  • Was the brand actually recommended?
  • Was the mention positive, neutral, or negative?
  • Was the brand ranked first, included in the top three, or merely named in passing?
  • Was it used as a trusted option, a comparison anchor, a legacy reference, or a cautionary example?
  • Which competitor captured the recommendation?
  • What buyer-intent cluster did the prompt belong to?
  • What monthly value was actually captured?

If those questions are not answered, the report is not measuring AI recommendation performance. It is measuring AI presence.

Those are not the same thing.

The prompt set is the strategy

The first problem with many AI visibility reports starts before the first mention is counted.

It starts with the prompt set.

In traditional SEO, keyword research is the foundation of strategy. In LLM discovery, prompt research plays the same role, but with higher stakes. A weak prompt set can make a brand look strong while measuring almost nothing of commercial value.

A prompt like:

“What is [Brand]?”

does not carry the same business meaning as:

  • “Best medical alert system for seniors”
  • “Compare medical alert systems”
  • “Cheapest medical alert system with fall detection”
  • “Which provider has the lowest monthly fee?”
  • “What is the best alternative to [Brand]?”

Those are very different buyer moments.

A serious AI optimization report should separate vanity prompts from high-intent buying clusters. For example, in one medical alert systems analysis, the free report's three clusters were Discovery & Ranking, Head-to-Head Comparison, and Pricing / Cost Evaluation. The pricing cluster was explicitly decision-stage, focused on cost, monthly fees, subscription terms, affordability, plan value, and contract considerations.

That matters because pricing prompts are not neutral awareness moments. They are buying moments. If an AI answer mentions a brand in a pricing prompt because the brand is expensive, opaque, or less flexible than competitors, that mention should not be counted as a win.

The first question should not be:

“How often were we mentioned?”

The first question should be:

“Were we measuring the right prompts in the first place?”

Share of voice only becomes useful after the prompt universe is clean.
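
As a rough illustration of that triage, here is a minimal sketch in Python. The cluster names, trigger phrases, and brand name are illustrative assumptions, not a production taxonomy:

```python
# A minimal sketch of prompt-set triage before any mention counting.
# Cluster names and trigger phrases are illustrative assumptions only.

HIGH_INTENT_TRIGGERS = {
    "discovery": ["best", "top"],
    "comparison": ["compare", " vs ", "versus", "alternative to"],
    "pricing": ["cheapest", "price", "cost", "monthly fee", "subscription"],
}

def classify_prompt(prompt: str) -> str:
    """Return a buying-moment cluster, or 'vanity' if no trigger fires."""
    text = prompt.lower()
    for cluster, triggers in HIGH_INTENT_TRIGGERS.items():
        if any(trigger in text for trigger in triggers):
            return cluster
    return "vanity"

prompts = [
    "What is Acme Alert?",                               # vanity prompt
    "Best medical alert system for seniors",             # discovery
    "Cheapest medical alert system with fall detection", # pricing
]
for p in prompts:
    print(classify_prompt(p), "-", p)
```

Only the prompts that survive this filter should feed the visibility metrics that follow.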

Mentions are not recommendations

AI share of voice is often treated as the headline number in AI visibility reporting. It usually tells you how often a brand appears compared with competitors.

That can be helpful. It can show whether a brand is included in AI-generated answers at all. It can reveal competitive gaps. It can show whether a brand is part of the model’s category memory.

But mention share does not tell you whether the mention helped.

A brand mention can mean:

  • “This is one of the best options.”
  • “This is a known company in the space.”
  • “This company is more expensive.”
  • “This company is often compared with better alternatives.”
  • “This company lacks an important feature.”
  • “Users may want to consider competitors instead.”
  • “This is the legacy brand people ask about before choosing someone else.”

All of those are mentions. Only some are commercially valuable.

This is where raw AI share of voice can mislead executives. It compresses very different forms of visibility into one number. It can make a brand look stronger simply because the brand is frequently discussed, even when the discussion is neutral or negative.

In search, ranking on page two is technically visibility. But no serious SEO team would celebrate page-two rankings as market leadership. In AI answers, mentions can function the same way. A brand may be present, but not persuasive. Present, but not recommended. Present, but below the real shortlist.

That is why “mentions are the new page two” is more than a metaphor. It is a warning.

The Life Alert lesson: when the same data told two different stories

During methodology testing, we saw how dangerous this can become.

A medical alert systems report built on ARS / share-of-voice-style logic made Life Alert look strong. It showed the brand as #2 overall and described it as leading the Head-to-Head Comparisons and Pricing Evaluation clusters. The report assigned modeled monthly value to those clusters and treated pricing as a strong defensive position.

But the same report contained a warning sign. Its sentiment section said Life Alert’s sentiment profile was overwhelmingly neutral-to-negative, with minimal positive positioning and an absence of positive sentiment in the data.

That contradiction revealed the problem.

The methodology had counted presence as performance. It also allowed the order in which tracked companies were first mentioned to stand in for recommendation order whenever no explicit ranking was present. In a category where a legacy brand is often mentioned first as context, that creates false positives.
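
The flawed fallback is easy to sketch. In the snippet below the function and competitor names are hypothetical, but the logic mirrors the failure described above:

```python
# Sketch of the flawed heuristic: when an answer contains no explicit
# ranked list, tracked brands are ordered by first-mention position and
# that order is treated as a recommendation order.

def naive_recommendation_order(answer: str, tracked: list[str]) -> list[str]:
    positions = {b: answer.lower().find(b.lower()) for b in tracked}
    mentioned = {b: i for b, i in positions.items() if i >= 0}
    return sorted(mentioned, key=mentioned.get)

answer = ("Life Alert is the name most people recognize, but reviewers "
          "consistently steer buyers toward Competitor A and Competitor B.")
print(naive_recommendation_order(
    answer, ["Life Alert", "Competitor A", "Competitor B"]))
# -> ['Life Alert', 'Competitor A', 'Competitor B']
# The legacy brand "wins" purely because it was mentioned first as context.
```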

The deeper recommendation analysis told a very different story. Across 919 high-intent observations spanning six AI platforms and 10 buying-moment clusters, Life Alert recorded a 0% AI recommendation rate and a 0% Top 3 ranking rate. When present, it was frequently framed as a cautionary benchmark rather than the preferred choice.

That was not a small scoring adjustment.

It changed the entire business conclusion.

The ARS-style report suggested:

Life Alert is strong in comparison and pricing prompts.

The corrected interpretation was:

Life Alert is visible in comparison and pricing prompts, but that visibility is often not positive recommendation capture.

That difference matters because a company reading the first report could believe everything is fine. In reality, AI systems may be using the company’s brand recognition to steer buyers toward competitors.

That is the dangerous part.

Pricing prompts expose the weakness fastest

Pricing is where the flaw becomes obvious.

In a discovery prompt, a mention may indicate awareness. In a comparison prompt, a mention may indicate consideration. But in a pricing prompt, sentiment and framing become critical.

A pricing mention can mean:

Positive:

  • “This company offers strong value and transparent monthly pricing.”

Neutral:

  • “This company provides medical alert services, and pricing depends on the plan.”

Negative:

  • “This company is often considered expensive or less transparent; buyers may prefer these alternatives.”

All three are mentions. Only the first should receive recommendation value.

If a report counts all three as positive visibility, it creates false confidence. The brand appears to be “winning” pricing because it appears often, even though the model may be discussing pricing concerns, hidden fees, contract limitations, or better alternatives.

In pricing prompts, a negative mention is not weak visibility.

It is buyer leakage.
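
A tiny worked example makes the gap concrete, using the illustrative -1 / 0 / +1 weights introduced later in this article:

```python
# Three pricing-prompt mentions of the same brand, scored two ways.
# The weights are illustrative assumptions, not a calibrated model.

mentions = ["positive", "neutral", "negative"]

naive_share_of_voice = len(mentions)                         # 3 "wins"
weights = {"positive": 1, "neutral": 0, "negative": -1}
recommendation_weighted = sum(weights[m] for m in mentions)  # 1 + 0 - 1 = 0

print(naive_share_of_voice, recommendation_weighted)  # 3 vs 0
```

The naive count reports three pricing wins; the weighted view reports none.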

The platform layer is useful, but incomplete

This is where the industry needs nuance.

The problem is not that Ahrefs, Semrush, or similar platforms are bad. They are doing important work. They are building the first large-scale infrastructure layer for AI visibility.

Ahrefs says Brand Radar tracks AI visibility across hundreds of millions of search-backed prompts and supports analysis across AI Overviews, AI Mode, ChatGPT, Copilot, Gemini, Perplexity, and other surfaces. Semrush says its AI Visibility Toolkit can benchmark brand visibility, analyze mentions and sentiment, discover prompts and topics, track prompts, audit AI readiness, and identify competitive gaps. That is useful.

But agencies should not confuse the platform layer with the full strategy layer.

At broad scale, the easiest thing to compute is whether a brand appeared. The harder layer is interpreting what the answer means. That requires:

  • prompt classification
  • sentiment scoring
  • framing analysis
  • recommendation validity
  • ranking inside the answer
  • citation interpretation
  • commercial value modeling

Some platforms are already moving into parts of this. Semrush, for example, describes AI sentiment and brand perception features in its AI visibility tools. But even when platforms provide some of these signals, agencies still need to design the right prompt clusters, validate the outputs, interpret the answer context, and connect the metrics to the client’s actual buying journey.

The major tools can help agencies collect and organize the data. They cannot replace the agency’s responsibility to understand what the data means.

The next generation of AI optimization will not be built on who can export the cleanest share-of-voice chart.

It will be built on who can interpret the answer.

The hidden cost of real LLM measurement

There is another reason this gap exists: real LLM measurement is computationally and analytically heavy.

Prompt collection is only step one.

To produce recommendation-grade intelligence, every prompt answer may need to be evaluated for:

  1. prompt intent
  2. prompt polarity
  3. company presence
  4. company sentiment
  5. company framing
  6. whether the company was actually recommended
  7. where the company ranked in the recommendation set
  8. whether the brand was included as a comparison anchor
  9. whether the brand was used as a cautionary reference
  10. which citations shaped the answer
  11. which competitor captured the recommendation
  12. what the prompt was worth commercially

That is a very different workload from counting mentions.
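
One way to make the workload concrete is a per-answer evaluation record with one field per dimension above. A sketch, where the field names and types are assumptions rather than a fixed schema:

```python
# A sketch of a per-answer evaluation record, one field per dimension
# in the list above. Field names and types are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class AnswerEvaluation:
    prompt_intent: str             # e.g. "pricing", "comparison"
    prompt_polarity: str           # e.g. "neutral", "brand-negative"
    company_present: bool
    company_sentiment: str         # "positive" | "neutral" | "negative"
    company_framing: str           # e.g. "leader", "cautionary example"
    recommended: bool
    recommended_rank: int | None   # None when not in the recommendation set
    comparison_anchor: bool
    cautionary_reference: bool
    citations: list[str] = field(default_factory=list)  # domains shaping the answer
    winning_competitor: str | None = None
    prompt_value_usd: float = 0.0  # modeled commercial value of the prompt
```

Filling twelve fields per answer, across thousands of answers, is the real cost that simple mention counting avoids.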

At the scale of hundreds of millions of prompts, deeper semantic classification becomes a major product, infrastructure, and cost problem. Large platforms may get there over time. Some are already adding pieces. But agencies cannot wait for the perfect dashboard to exist.

Today, the agency has to do the heavy lifting.

That means building high-intent prompt clusters, separating vanity visibility from buying-moment visibility, validating sentiment, checking whether recommendation rank is real, and translating the results into business implications.

The agencies that skip this step will produce attractive reports that may be directionally wrong.

What better AI optimization reporting should include

Better AI optimization reporting starts with prompt discipline.

Before agencies count mentions, they need to separate vanity prompts from high-intent buying clusters. Once the prompt set is commercially meaningful, the report should measure multiple layers.

1. Prompt filtering and high-intent cluster design as the foundation

Not every prompt deserves equal weight.

“Tell me about this brand” is not the same as:

  • “best provider”
  • “compare providers”
  • “lowest monthly cost”
  • “most trusted option”
  • “alternative to this brand”

The prompt set determines whether the report measures market influence or brand trivia.

2. Prompt intent as the context layer

Each prompt should be classified by buyer stage and purpose:

  • Discovery
  • Comparison
  • Pricing
  • Trust
  • Alternatives
  • Complaints
  • Implementation
  • Support
  • Brand-specific defense

A mention in a pricing prompt does not mean the same thing as a mention in a general awareness prompt.

3. Mention share as a visibility input

Mentions still matter. They show whether the brand is present in AI answers. But presence is the starting point, not the outcome.

Mention share should answer:

“Are we in the conversation?”

It should not automatically answer:

“Are we winning?”

4. Sentiment and framing as quality controls

Every mention should be classified as positive, neutral, or negative.

Then it should be framed by role:

  • Leader
  • Strong option
  • Acceptable option
  • Alternative
  • Legacy reference
  • Comparison anchor
  • Cautionary example
  • Brand to replace

This is the layer that prevents a report from counting a negative pricing mention as a win.
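
A minimal sketch of that gate, assuming the framing labels above and a gating rule of our own choosing:

```python
# Quality-control gate: only certain sentiment/framing combinations pass
# mention credit through to recommendation credit. The gating rule is an
# assumption; the framing roles mirror the list above.

CREDIT_WORTHY_FRAMINGS = {"leader", "strong option", "acceptable option"}

def earns_recommendation_credit(sentiment: str, framing: str) -> bool:
    return sentiment == "positive" and framing in CREDIT_WORTHY_FRAMINGS

print(earns_recommendation_credit("positive", "leader"))             # True
print(earns_recommendation_credit("negative", "cautionary example")) # False
print(earns_recommendation_credit("neutral", "comparison anchor"))   # False
```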

5. Recommendation rank as the commercial signal

The most important question is not:

“Were we mentioned?”

It is:

“Were we actually recommended, and where did we rank?”

Rank 1, Top 3, and average recommended rank are much closer to commercial influence than raw share of voice.

If a company is not in the recommendation set, it should not receive recommendation value just because it appeared in the answer.
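
These three metrics are straightforward to compute once the recommendation set is validated. A sketch, where a rank of None means the brand never made the recommendation set for that answer:

```python
# Rank 1 rate, Top 3 rate, and average recommended rank over a set of
# answers. None means the brand was absent from the recommendation set.

def rank_metrics(ranks: list[int | None]) -> dict[str, float | None]:
    n = len(ranks)
    recommended = [r for r in ranks if r is not None]
    return {
        "rank_1_rate": sum(r == 1 for r in recommended) / n,
        "top_3_rate": sum(r <= 3 for r in recommended) / n,
        "avg_recommended_rank": (sum(recommended) / len(recommended)
                                 if recommended else None),
    }

# Ten answers: recommended four times, twice at rank 1.
print(rank_metrics([1, None, 3, None, 1, None, None, 5, None, None]))
# {'rank_1_rate': 0.2, 'top_3_rate': 0.3, 'avg_recommended_rank': 2.5}
```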

6. Citation architecture as the source layer

AI answers are shaped by the sources models retrieve, cite, and synthesize. Agencies should map which domains, pages, reviews, forums, editorial sources, and third-party discussions are reinforcing or weakening the brand’s position.

This part matters because LLM visibility is not built only on the company’s website. It is built on the broader evidence layer surrounding the brand.

Our own report framework points in the same direction: AI systems recommend, rank, compare, frame, and exclude companies, so reporting has to measure more than broad visibility.

7. Monthly value as the executive layer

Executives do not only need to know whether the brand appeared. They need to know what the recommendation position is likely worth.

That does not mean pretending AI visibility can be attributed with perfect revenue precision. It means estimating directional monthly recommendation value based on prompt demand, commercial intent, rank position, and value-per-query assumptions.

The best commercial reporting should separate:

  • Captured recommendation value
  • Competitor-captured value
  • Neutral visibility
  • Negative visibility
  • At-risk demand
  • Full-report opportunity

That is how AI visibility becomes boardroom-relevant.
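
A directional sketch of that value model, where the demand counts, the value-per-query figure, and the rank-decay weights are all assumptions to be replaced with client-specific inputs:

```python
# Directional monthly value bucketing. This covers a subset of the buckets
# above; demand, value per query, and rank weights are assumptions.

RANK_WEIGHT = {1: 1.0, 2: 0.6, 3: 0.4}   # assumed capture decay by rank

def bucket_value(demand: int, value_per_query: float,
                 sentiment: str, rank: int | None) -> tuple[str, float]:
    value = demand * value_per_query
    if rank in RANK_WEIGHT and sentiment == "positive":
        return "captured", value * RANK_WEIGHT[rank]
    if sentiment == "negative":
        return "negative_visibility", value
    if rank is None and sentiment == "neutral":
        return "neutral_visibility", value
    return "at_risk", value

# 1,200 monthly queries at an assumed $4 each, recommended at rank 2:
print(bucket_value(1200, 4.0, "positive", 2))  # ('captured', 2880.0)
```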

A better standard: recommendation-weighted visibility

The industry does not need to throw away SEO. It does not need to throw away Ahrefs, Semrush, citation analysis, content strategy, technical audits, or share-of-voice tracking.

It needs to stop treating the first measurable layer as the final truth.

A better standard is recommendation-weighted visibility.

That means a brand earns commercial credit only when it is positively recommended in a meaningful buyer-intent prompt.

A simple version looks like this:

  • Negative mention = -1
  • Neutral mention = 0
  • Positive recommendation = +1

Only positive, valid recommendations should receive rank credit or captured value.
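
A minimal implementation of that standard, with the validity gate applied. The normalization to total prompt count is an added assumption:

```python
# Recommendation-weighted visibility: -1 / 0 / +1 per mention, with only
# genuine recommendations earning positive credit.

WEIGHTS = {"negative": -1, "neutral": 0, "positive": 1}

def recommendation_weighted_visibility(mentions: list[dict],
                                       total_prompts: int) -> float:
    score = 0
    for m in mentions:
        w = WEIGHTS[m["sentiment"]]
        if w > 0 and not m["recommended"]:
            w = 0  # positive tone without an actual recommendation earns nothing
        score += w
    return score / total_prompts

mentions = [
    {"sentiment": "positive", "recommended": True},
    {"sentiment": "positive", "recommended": False},  # praised, never recommended
    {"sentiment": "neutral",  "recommended": False},
    {"sentiment": "negative", "recommended": False},
]
print(recommendation_weighted_visibility(mentions, total_prompts=10))  # 0.0
```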

That shift prevents the most dangerous reporting error in AI SEO: telling a company it is winning because it is frequently mentioned, when AI systems are actually using those mentions to recommend competitors.

What companies should ask their AI SEO agency

Companies buying AI optimization services should ask better questions.

Not just:

“How often are we mentioned?”

But:

  • “Which prompts are included in the analysis?”
  • “Are those prompts high-intent buying moments or vanity prompts?”
  • “Are mentions classified as positive, neutral, or negative?”
  • “Are we actually recommended, or just mentioned?”
  • “What is our Rank 1 rate?”
  • “What is our Top 3 recommendation rate?”
  • “What is our average recommended rank?”
  • “Which competitors are being recommended when we are not?”
  • “Which citations shape the answer?”
  • “Where are we visible but negatively framed?”
  • “How much monthly recommendation value are competitors capturing?”

Those questions separate AI reporting from AI theater.

What agencies should do next

Agencies do not need to pretend the old SEO playbook is useless. It is not.

The best AI optimization work will still need many SEO fundamentals:

  • Technical accessibility
  • Indexable content
  • Helpful comparisons
  • Strong brand entities
  • Authoritative third-party coverage
  • Reviews
  • Community discussion
  • Citation-worthy pages
  • Consistent positioning
  • Clear product and pricing information

But the measurement model has to evolve.

AI optimization cannot be judged only by whether a brand appears. It has to be judged by whether the brand is trusted, recommended, ranked, and supported by credible evidence inside the buying moment.

The irony is that many AI optimization agencies tell clients that LLMs reward clarity, credibility, evidence, and honesty.

Agencies will have to apply the same standard to their own reporting.

If they sell vanity metrics, the market will eventually notice. Clients will compare reports against actual AI answers. They will ask why “high share of voice” did not translate into recommendations. They will ask why pricing visibility did not produce pricing advantage. They will ask why a brand with strong mentions is still losing the shortlist.

That is when the agencies with better measurement will separate from the agencies that simply put “AI” in front of SEO.

The future of AI SEO is not more dashboards

The future of AI SEO is not just more mentions, more charts, or more visibility scores.

It is better interpretation.

The next phase will be won by the teams that can answer:

  1. Which prompts matter?
  2. Which brands are recommended?
  3. Which brands are merely mentioned?
  4. Which brands are being warned against?
  5. Which sources shape the recommendation?
  6. Which competitors capture the value?
  7. What should the company do about it?

That is the difference between a visibility report and a recommendation intelligence report.

The first tells you whether you showed up.

The second tells you whether AI helped you win the buyer.

And in the LLM era, that is the metric that matters.

Want Recommendation Intelligence Without Building the Whole System?

If this sounds like a lot to build, that is because it is.

LLM Authority Index was created to provide this measurement layer for agencies and companies that want more than basic AI visibility metrics. We analyze high-intent prompt clusters, separate mentions from recommendations, score sentiment and framing, track Rank 1 and Top 3 recommendation position, map citation architecture, and estimate monthly recommendation value.

Agencies can use LLM Authority Index as a white-label intelligence layer behind their own AI optimization services. Companies can use it directly to understand where they are being recommended, where competitors are capturing value, and where visibility may be misleading.

The goal is not another vanity dashboard. It is a clearer answer to the question that matters most:

When buyers ask AI what to choose, does your brand win?

See how the framework applies to your market.

Get an AI Market Intelligence Report and see how AI is shaping consideration, comparison, and recommendation in your category.