GEO Auditor

Reference

GEO Glossary

Authoritative definitions of Generative Engine Optimization terms — the vocabulary of AI search visibility.

DEF

Generative Engine Optimization (GEO)

The practice of making a website visible and citable to AI-powered search engines that generate direct answers — such as ChatGPT, Perplexity, Google Gemini, Claude, and Bing Copilot.

Unlike traditional SEO, which optimizes for a ranked list of links, GEO optimizes for a citation decision: whether an AI engine includes your content in its synthesized answer, and whether it attributes that content to your brand by name. GEO was formally named in a 2023 paper by researchers at Princeton University and IIT Delhi, which found that specific writing and structural techniques increase AI citation frequency by up to 40%. The signals that drive GEO performance — AI crawler access, brand entity recognition, content citability, and structured schema — are almost entirely distinct from traditional ranking factors like domain authority or keyword density.

DEF

GEO Score

A composite metric from 0 to 100 that measures a website's overall visibility to AI-powered search engines across six dimensions: AI crawler access, content citability, brand authority, content E-E-A-T, technical health, and schema markup.

A GEO score is not a single measurement — it aggregates performance across all the signals that determine whether AI search engines can read, understand, trust, and cite a site. A score of 0–30 is Critical: the site is effectively invisible to AI search. 31–50 is Poor: significant barriers exist. 51–70 is Fair: the site appears occasionally but inconsistently. 71–85 is Good: the site is well-optimized and cited regularly. 86–100 is Excellent: the site is a strong, consistent AI citation source. GEO score is independent of Google PageRank, domain authority, or any traditional SEO metric.
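The five rating bands reduce to a small classifier. A minimal sketch (the function name is illustrative; band boundaries are taken from the definition above):

```python
def geo_band(score: int) -> str:
    """Map a 0-100 GEO score to its rating band."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score <= 30:
        return "Critical"   # effectively invisible to AI search
    if score <= 50:
        return "Poor"       # significant barriers exist
    if score <= 70:
        return "Fair"       # appears occasionally but inconsistently
    if score <= 85:
        return "Good"       # well-optimized, cited regularly
    return "Excellent"      # strong, consistent AI citation source
```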

DEF

AI Crawler (AI Bot)

A web crawler operated by an AI search platform that fetches and reads web pages to build the knowledge base used for generating AI search answers.

Each major AI platform operates its own crawler. The 13 most important are: GPTBot and OAI-SearchBot (OpenAI/ChatGPT), ClaudeBot and anthropic-ai (Anthropic/Claude), PerplexityBot (Perplexity), Google-Extended (Google Gemini), Applebot-Extended (Apple Intelligence), Amazonbot (Amazon/Alexa), FacebookBot (Meta AI), Bytespider (ByteDance), CCBot (Common Crawl), Googlebot, and Bingbot. Well-behaved AI crawlers obey robots.txt access rules, though compliance varies by operator. Blocking them — accidentally or intentionally — prevents the corresponding AI platform from reading or citing your content.

DEF

Citability

A measure of how likely an AI search engine is to extract and quote a specific passage from a web page when constructing an answer.

Citable content has four characteristics: it makes a specific, verifiable claim (not a vague generalization); it is self-contained (the passage makes sense without surrounding context); it has high statistical density (numbers, percentages, concrete data); and it is structurally clear (short paragraphs, question-based headings, definition patterns). Research from Princeton University and IIT Delhi quantified the effect: content optimized for these signals is cited by AI engines up to 40% more frequently than unoptimized content of equivalent quality.

DEF

Brand Entity Recognition

The degree to which AI models recognize a brand as a distinct, verified entity in their knowledge graph — enabling named attribution when citing that brand's content.

AI language models are trained on structured knowledge graphs (Wikidata, Wikipedia, Google Knowledge Graph) in addition to unstructured web text. When an AI model cites content, it can either quote it anonymously or attribute it by name. Named attribution — "According to [Brand]…" — requires the model to have enough entity confidence to link the content to a known organization. That confidence comes from: presence in Wikidata, a complete LinkedIn company page, consistent name/URL/description across the web, and a sameAs property in the site's Organization schema pointing to these profiles.

DEF

Knowledge Graph

A structured database of entities — people, organizations, places, concepts — and the relationships between them, used by AI models to resolve entity identities and verify factual claims.

The most important knowledge graphs for GEO are Wikidata (open, community-maintained), Google Knowledge Graph (powers Google Search and Gemini), and Wikipedia. When an AI model is asked about a topic, it cross-references its training data against these graphs to confirm entity identities and relationships. A brand that appears in Wikidata with a verified website URL, founding date, and industry classification is treated as a known, verifiable entity. A brand that appears only as unstructured text on web pages is treated as an unknown quantity.

DEF

sameAs

A Schema.org property on an Organization or Person that links a website's identity to the same entity in external authoritative databases — most importantly Wikidata, Wikipedia, and LinkedIn.

The sameAs property is the most direct way to connect your website to your knowledge graph presence. It tells AI engines: "This website and this Wikidata entry refer to the same organization." Without sameAs links, an AI model must infer the connection from unstructured text, which is less reliable. A complete sameAs array might include links to Wikidata, Wikipedia, LinkedIn, Crunchbase, GitHub, and social profiles. Each additional link increases entity confidence. Adding sameAs to an existing Organization schema takes approximately five minutes.
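A minimal Organization schema with a sameAs array might look like the following. All names and URLs here are placeholders for illustration, not real profiles:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://example.com",
  "sameAs": [
    "https://www.wikidata.org/wiki/Q0000000",
    "https://en.wikipedia.org/wiki/Example_Co",
    "https://www.linkedin.com/company/example-co",
    "https://github.com/example-co"
  ]
}
```

This block belongs in a server-rendered `<script type="application/ld+json">` tag, typically on the homepage or every page.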

DEF

llms.txt

A plain-text file at the root of a website (yourdomain.com/llms.txt) that provides a structured, Markdown-formatted summary of the site's content, purpose, and key pages — designed specifically for AI language models to read.

Proposed by Jeremy Howard (fast.ai) in 2024, llms.txt is analogous to robots.txt but serves a different purpose: instead of access rules, it provides AI models with a clean, noise-free description of what the site is and which pages matter most. A basic llms.txt includes an H1 heading (site name), a blockquote summary, a short description paragraph, and a set of categorized links with descriptions. Adoption is growing among AI platforms and developer tools. It takes under 10 minutes to create and signals AI-readiness to crawlers.
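Following that structure, a skeletal llms.txt might look like this (site name, summary, and links are hypothetical):

```markdown
# Example Co

> Example Co audits websites for AI search visibility and scores them across six GEO dimensions.

Example Co checks AI crawler access, content citability, schema markup, and brand entity signals.

## Key pages

- [GEO Glossary](https://example.com/glossary): Definitions of core GEO terms
- [Audit Tool](https://example.com/audit): Run a free GEO audit
- [Docs](https://example.com/docs): Setup and methodology
```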

DEF

E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness)

A quality framework originally defined by Google for human search quality raters, now applied algorithmically by AI search engines to evaluate whether content comes from a credible, knowledgeable source.

Experience refers to first-hand knowledge demonstrated through specific examples, original data, and first-person accounts. Expertise is shown through accurate domain terminology, author credentials, and substantive coverage depth. Authoritativeness is earned through topic cluster depth, external citations, and a recognized brand presence. Trustworthiness is signaled by visible contact information, a privacy policy, terms of service, and an About page. For AI search specifically, E-E-A-T signals correlate strongly with citation frequency — AI engines prefer content that demonstrates genuine expertise over content that summarizes what others have already said.

DEF

robots.txt (AI context)

A plain-text file at the root of a website that specifies which web crawlers are allowed or disallowed from accessing its pages. For GEO, it determines which AI search platforms can read and index the site.

robots.txt uses User-agent directives to target specific crawlers and Allow/Disallow directives to control access. A wildcard group (User-agent: * with Disallow: /) blocks every crawler that lacks its own, more specific User-agent group — under the Robots Exclusion Protocol, a crawler obeys the most specific group that matches its name, regardless of where the groups appear in the file. This pattern is the single most common cause of complete AI search invisibility — many sites set it years ago to prevent scraping and have never revisited it. The fix is straightforward: add a dedicated User-agent group with Allow: / for each AI crawler you want to admit.
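A robots.txt applying that fix might look like the following sketch (only a few crawlers shown; repeat the pattern for each bot you want to admit):

```txt
# Dedicated groups: each named crawler matches its own group,
# so the catch-all group below never applies to it.
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

# Catch-all: blocks all other crawlers.
User-agent: *
Disallow: /
```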

DEF

Schema Markup (Structured Data)

Machine-readable metadata embedded in a web page that describes the type, authorship, and content of that page to search engines and AI crawlers. JSON-LD is the preferred format.

Schema.org is the shared vocabulary for structured data, supported by Google, Bing, and AI platforms. The schemas most relevant to GEO are: Organization (establishes brand identity and sameAs links), Article and BlogPosting (authored content with publication dates), FAQPage (Q&A content that AI engines frequently quote verbatim), HowTo (instructional content), and Person (author entities). All schema should be server-rendered — markup injected via JavaScript after page load is invisible to AI crawlers that do not execute JavaScript.
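For instance, a server-rendered FAQPage schema that AI engines can quote verbatim might look like this (question and answer text are illustrative):

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is a GEO score?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "A GEO score is a composite 0-100 metric measuring a website's visibility to AI-powered search engines across six dimensions."
      }
    }
  ]
}
```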

DEF

Speakable

A Schema.org property (SpeakableSpecification) that marks specific sections of a page as particularly suitable for text-to-speech synthesis by AI assistants and voice search platforms.

Speakable tells AI systems: "This CSS selector points to the content that best summarizes this page — use it for voice responses or AI answer synthesis." It can target headings, summary paragraphs, or any other element via CSS selectors. While originally designed for voice search, speakable has taken on new relevance for AI answer generation: it provides a direct signal about which content the author considers most authoritative and quotable. Most sites do not have speakable markup; adding it is a low-effort differentiator.
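A sketch of speakable markup, using a hypothetical `.summary` selector for the page's summary paragraph:

```json
{
  "@context": "https://schema.org",
  "@type": "WebPage",
  "name": "What Is Generative Engine Optimization?",
  "url": "https://example.com/what-is-geo",
  "speakable": {
    "@type": "SpeakableSpecification",
    "cssSelector": ["h1", ".summary"]
  }
}
```

The cssSelector values must match elements that actually exist in the rendered page; an xpath property is an alternative targeting mechanism.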

DEF

GPTBot

OpenAI's primary web crawler, used to fetch web content for training OpenAI's models; real-time search in ChatGPT is handled by the separate OAI-SearchBot crawler.

GPTBot was first publicly disclosed by OpenAI in August 2023. It is identified in HTTP requests by the User-agent string "GPTBot". A separate crawler, OAI-SearchBot, handles real-time search queries in ChatGPT. A third user-agent, ChatGPT-User, represents browsing requests made by ChatGPT during live conversations. For full ChatGPT visibility, all three should be permitted in robots.txt. OpenAI publishes the IP ranges used by GPTBot; sites that block by IP rather than User-agent should verify these ranges are not blocked.

DEF

AI Overviews (Google)

A feature in Google Search that displays an AI-generated summary answer at the top of the results page, synthesized from multiple web sources, before the traditional ranked link list.

AI Overviews (formerly Search Generative Experience) use Google's Gemini model to generate direct answers from web content. The sources cited in AI Overviews receive a small link display, but the primary value is brand authority: appearing in AI Overviews signals that Google's AI considers your content trustworthy for the query. To appear in AI Overviews, sites need: an accessible Googlebot (AI Overviews are part of Google Search; the separate Google-Extended token governs Gemini training and grounding rather than Search itself), strong E-E-A-T signals, relevant and well-structured content, and ideally FAQPage or HowTo schema for the relevant query type.

Related: Google-Extended, E-E-A-T, Citability

DEF

Perplexity AI

An AI-powered search engine that generates cited, conversational answers to queries by reading and synthesizing current web content in real time.

Perplexity differs from ChatGPT search in that every answer includes numbered citations linking directly to the sources used. This makes Perplexity citations highly valuable — they are visible, clickable, and drive direct referral traffic. Perplexity uses its own crawler (PerplexityBot) for real-time indexing. It is generally considered the hardest AI platform to score well on for GEO, because it applies strict citability filters: content must be well-structured, specific, and clearly authoritative to appear in answers. Perplexity also checks llms.txt where available.

Related: AI Crawler, PerplexityBot, Citability

DEF

Wikidata

A free, open knowledge graph maintained by the Wikimedia Foundation that stores structured data about entities — organizations, people, places, and concepts — and serves as the primary reference database used by AI models for entity resolution.

Wikidata is the most important external signal for brand entity recognition in GEO. Each entity in Wikidata is assigned a unique Q-number (e.g., Q12345), which acts as a universal identifier across the web. When an AI model encounters your brand name, it checks knowledge graphs — primarily Wikidata — to confirm that the name refers to a specific, verified organization with known attributes. A Wikidata entry with your organization type, founding date, official website URL, and industry classification gives AI models the structured evidence they need to attribute content to your brand by name. Creating a Wikidata entry is free, typically takes 20–30 minutes, and is one of the highest-ROI GEO improvements available to any site.

DEF

Answer Block

A self-contained passage of content — typically 50–200 words — that directly answers a specific question, makes a verifiable claim, and can be quoted by an AI engine without requiring surrounding context to make sense.

The answer block is the fundamental unit of AI-citable content. AI engines like Perplexity and ChatGPT extract passages at the paragraph level, not the page level. A passage qualifies as a strong answer block when it: leads with the answer in the first one or two sentences (answer-first structure), names its subject explicitly rather than using pronouns ("GEO score" not "it"), makes a specific and verifiable claim, and is short enough to be quoted intact. Pages built around question-based headings followed by answer blocks consistently score higher on citability audits than pages with equivalent information presented as flowing prose.

DEF

Statistical Density

The number of specific, quantified data points per 500 words of content — a measurable signal that correlates strongly with AI citation frequency.

AI engines preferentially cite content that makes specific, verifiable numerical claims over content that makes vague qualitative assertions. Statistical density is measured as the count of percentages, dollar amounts, dates, counts, and named measurements per 500-word passage. A density of 5 or more statistics per 500 words is considered excellent for AI citability; fewer than 2 is poor. Common examples of low-density writing: "many sites block AI crawlers" (no count), "GEO is growing rapidly" (no data). High-density rewrites: "38% of sites we've audited block at least one Tier-1 AI crawler" and "AI search queries on Perplexity grew 10x between 2023 and 2024." The improvement is often achievable by adding a single data point per paragraph.
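A rough density counter can be sketched in a few lines of Python. The regex patterns here are assumptions for illustration — the signals an actual auditor counts may differ:

```python
import re

# Heuristic patterns, matched in order; each match is removed before the
# next pattern runs so a number is never counted twice.
STAT_PATTERNS = [
    r"\d+(?:\.\d+)?%",          # percentages: 38%, 2.5%
    r"\$\d[\d,]*(?:\.\d+)?",    # dollar amounts: $1,200
    r"\b(?:19|20)\d{2}\b",      # years: 2023
    r"\b\d+(?:\.\d+)?x\b",      # multipliers: 10x
    r"\b\d[\d,]*(?:\.\d+)?\b",  # bare counts: 500 or 1,000
]

def statistical_density(text: str) -> float:
    """Return the number of quantified data points per 500 words."""
    remaining = text
    count = 0
    for pattern in STAT_PATTERNS:
        count += len(re.findall(pattern, remaining))
        remaining = re.sub(pattern, " ", remaining)
    words = len(text.split())
    return count / max(words, 1) * 500
```

Under the document's rubric, a return value of 5 or more is excellent; below 2 is poor.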

DEF

Server-Side Rendering (SSR)

A web rendering approach in which the full HTML content of a page — including text, schema markup, and internal links — is generated on the server and delivered in the initial HTTP response, before any JavaScript executes in the browser.

Server-side rendering is critical for GEO because most AI crawlers do not execute JavaScript. A page that renders its main content, structured data, or internal links via client-side JavaScript will appear nearly empty to AI crawlers — even if it loads and displays correctly in a browser. This is one of the most common and most impactful technical failures in GEO audits. Frameworks that are SSR-safe by default include Next.js (with App Router), Astro, SvelteKit, and Nuxt. A React SPA (create-react-app) or any client-side-only rendering approach will fail this check. The SSR audit verifies that main content, schema markup, and navigation links are all present in the raw HTML response before JavaScript runs.
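One way to approximate the SSR audit is to inspect the raw HTML response before any JavaScript runs. The thresholds and patterns below are illustrative heuristics, not GEO Auditor's actual checks:

```python
import re

def ssr_check(raw_html: str) -> dict:
    """Heuristically verify that key GEO elements exist in raw, pre-JavaScript HTML."""
    return {
        # JSON-LD schema present in the initial response?
        "has_jsonld": bool(re.search(
            r'<script[^>]+application/ld\+json', raw_html, re.IGNORECASE)),
        # Enough anchor tags to suggest server-rendered navigation?
        "has_nav_links": len(re.findall(r'<a\s[^>]*href=', raw_html, re.IGNORECASE)) >= 5,
        # Enough visible text to suggest server-rendered main content?
        "has_body_text": len(re.sub(r'<[^>]+>', ' ', raw_html).split()) >= 100,
    }
```

Running this against the response of `curl` (which never executes JavaScript) shows roughly what a non-rendering AI crawler sees.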

DEF

Topical Authority

A measure of how comprehensively a website covers a subject domain, assessed by the depth of individual pieces, the breadth of topic coverage, and the degree to which related pages are interlinked into coherent content clusters.

AI engines prefer to cite sources that demonstrate sustained, comprehensive expertise on a topic over sources that publish a single page on that topic. Topical authority is composed of three dimensions: depth (does any individual piece of content cover its topic end-to-end, in sufficient detail to be a standalone reference?), breadth (does the site cover the full topic space, or only isolated fragments?), and clustering (are related pages linked to one another in a logical structure that signals coherent expertise?). A site with one well-optimized homepage and no supporting content is topically shallow. A site with a pillar page, several supporting articles, and a glossary — all internally linked — demonstrates topical authority.

DEF

AI Crawler Tiers

A classification of AI web crawlers into three groups based on their direct impact on AI search visibility: Tier 1 (active AI search crawlers), Tier 2 (AI ecosystem and training crawlers), and Tier 3 (training-only crawlers with no direct search impact).

Tier 1 crawlers have the most immediate impact on whether your site appears in AI search answers today. They include: GPTBot and OAI-SearchBot (OpenAI/ChatGPT), ChatGPT-User (user-initiated browsing), ClaudeBot (Anthropic/Claude), and PerplexityBot (Perplexity). Blocking any Tier 1 crawler means the corresponding AI platform cannot read or cite your content. Tier 2 crawlers feed AI training pipelines and secondary platforms: Google-Extended (Gemini), Applebot-Extended (Apple Intelligence), Amazonbot, and FacebookBot. Blocking these has less immediate but meaningful long-term impact. Tier 3 crawlers — CCBot, anthropic-ai, Bytespider, cohere-ai — are used for model training only; blocking them has minimal current visibility impact but may affect future model awareness.
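The tier lists above can be encoded directly — for example, to generate explicit robots.txt allow groups per tier. A sketch (crawler names are taken from the list above):

```python
# Crawler tiers as listed in this glossary entry.
TIER_1 = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot"]
TIER_2 = ["Google-Extended", "Applebot-Extended", "Amazonbot", "FacebookBot"]
TIER_3 = ["CCBot", "anthropic-ai", "Bytespider", "cohere-ai"]

def allow_rules(crawlers: list[str]) -> str:
    """Generate a robots.txt group explicitly allowing each crawler."""
    return "\n\n".join(f"User-agent: {c}\nAllow: /" for c in crawlers)
```

For most sites, allowing Tier 1 is the priority; Tiers 2 and 3 are a policy decision about training-data exposure versus future visibility.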

DEF

knowsAbout

A Schema.org property on an Organization or Person entity that explicitly lists the topics, domains, or areas of expertise the entity is knowledgeable about — helping AI engines classify and trust the entity as a credible source for specific subjects.

The knowsAbout property is an underused but GEO-specific signal that directly communicates your brand's areas of expertise to AI engines. Where sameAs links prove entity identity, knowsAbout declares topical relevance: "This organization has recognized knowledge in these domains." It accepts either a text value or a Thing entity. Example: an Organization schema for a GEO auditing tool might list knowsAbout values of "Generative Engine Optimization", "AI search visibility", "schema.org markup", "robots.txt", and "brand entity recognition". AI engines that encounter your content on these topics have stronger confidence that your site is an authoritative source when knowsAbout is declared.
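The example in the paragraph above renders in JSON-LD like this (organization name and URL are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://example.com",
  "knowsAbout": [
    "Generative Engine Optimization",
    "AI search visibility",
    "schema.org markup",
    "robots.txt",
    "brand entity recognition"
  ]
}
```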

DEF

Brand Authority Score

A composite metric (0–100) measuring how well a brand is recognized across the platforms AI models most commonly use as entity reference and corroboration sources, weighted by each platform's correlation with AI citation frequency.

The Brand Authority Score aggregates presence across five platform categories, each weighted by empirical correlation with AI citation frequency: YouTube (25%) — brand channel subscribers, third-party video mentions, and video transcript presence; Reddit (25%) — brand mention count, official community presence, and sentiment; Wikipedia/Wikidata (20%) — article existence and completeness, Wikidata entry quality; LinkedIn (15%) — company page completeness and follower range; Other sources (15%) — news mentions, GitHub stars, Quora presence, Crunchbase listing. YouTube's high weighting reflects research showing it has the strongest correlation (~0.74) with AI citation among all platforms. Unlinked brand mentions — citations of a brand name without a hyperlink — are weighted alongside linked mentions, reflecting how AI models actually process brand signals.
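The weighting scheme reduces to a weighted sum. A sketch, assuming each category is itself scored 0–100 (the category keys are shorthand for the five platform groups above):

```python
# Category weights from the Brand Authority Score definition.
WEIGHTS = {
    "youtube": 0.25,
    "reddit": 0.25,
    "wiki": 0.20,      # Wikipedia / Wikidata
    "linkedin": 0.15,
    "other": 0.15,     # news, GitHub, Quora, Crunchbase
}

def brand_authority(sub_scores: dict[str, float]) -> float:
    """Composite 0-100 score; missing categories count as 0."""
    return sum(weight * sub_scores.get(category, 0)
               for category, weight in WEIGHTS.items())
```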

DEF

Passage Self-Containment

A content quality attribute that describes whether a passage of text makes complete sense when read in isolation — without requiring the reader to have read surrounding paragraphs or the broader page context.

AI engines extract content at the passage level, not the page level. A passage that begins "As mentioned above, this means that…" or relies on a previously defined term is not self-contained — an AI engine cannot quote it cleanly. A self-contained passage names its subject explicitly in every instance, makes a claim that is interpretable without context, and can be read as a standalone statement. In practice: replace pronouns with explicit names, restate the key subject at the start of each paragraph, and avoid cross-reference language. The optimal passage length for AI citability is 50–200 words. GEO Auditor measures the percentage of key passages that are self-contained as one of five citability sub-scores.
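A crude self-containment filter can be expressed as a heuristic. The phrase lists below are assumptions for illustration; a real audit would be considerably more nuanced:

```python
# Passages opening with a bare pronoun depend on prior context.
PRONOUN_OPENERS = {"it", "this", "that", "these", "those", "they"}

# Cross-reference language signals dependence on surrounding text.
CROSS_REFS = ("as mentioned above", "as noted earlier", "see below",
              "the former", "the latter")

def is_self_contained(passage: str) -> bool:
    """Return False if the passage likely needs surrounding context to parse."""
    words = passage.strip().lower().split()
    if not words or words[0] in PRONOUN_OPENERS:
        return False
    lowered = " ".join(words)
    return not any(ref in lowered for ref in CROSS_REFS)
```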

DEF

INP (Interaction to Next Paint)

A Core Web Vitals metric that measures the responsiveness of a page to user interactions — specifically the time from a user's click, tap, or keypress to the next visual update on screen. INP replaced FID (First Input Delay) as a Core Web Vitals metric in March 2024.

For a single page load, INP is effectively the longest interaction latency observed (ignoring a small number of outliers on pages with many interactions); field tools report the 75th percentile across page loads. A value of 200ms or below is considered Good; above 200ms and up to 500ms needs improvement; above 500ms is Poor. Unlike FID, which only measured the delay before the browser began processing input, INP measures the complete time from input to visual feedback — making it a more accurate indicator of perceived responsiveness. For GEO, Core Web Vitals including INP contribute to Google AI Overviews inclusion signals and to the overall technical health dimension of a GEO score. Poor INP is most commonly caused by long JavaScript tasks that block the main thread, excessive DOM size, or inefficient event handlers.
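The thresholds map to a trivial classifier (boundary values follow the Core Web Vitals definitions):

```python
def inp_rating(inp_ms: float) -> str:
    """Classify an INP measurement in milliseconds per Core Web Vitals thresholds."""
    if inp_ms <= 200:
        return "good"
    if inp_ms <= 500:
        return "needs improvement"
    return "poor"
```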


Apply this knowledge

See your GEO score in 45 seconds

Free audit across all 6 dimensions. No signup required.

Run a free audit →