10,589 pages analyzed · 3 industries · 100+ content signals · 4 statistical methods · R² = 0.64
In GDPR compliance content, the top-ranked pages have a median of 1 image. In AI and SEO content, top-ranked pages have 11. Same Google. Same algorithm. Same year.
That's not a rounding error. It's an 11x gap, and it's not the only one.
We measured 7 widely-followed SEO "best practices" across 10,589 top-ranking pages in three completely unrelated industries: legal compliance, online gambling, and AI marketing. Most of them reversed — and the rest turned out to be irrelevant. What predicts a top-3 ranking in one niche is often irrelevant or actively harmful in another.
The explanation is both obvious and uncomfortable for an industry built on universal rules: Google has learned what each audience wants. It doesn't reward "good SEO." It ranks content that serves the person searching — and each audience wants something completely different.
Here's what the SEO industry tells you to do — and here's what the data actually shows.
How the Same "Best Practice" Flips Between Industries
| "Best Practice" | GDPR Compliance | iGaming | AI & SEO |
|---|---|---|---|
| Add more images | Hurts | Neutral | Helps |
| Build author bios | Hurts | Helps | Hurts |
| Link to external sources | Hurts | Hurts | Helps |
| Add social sharing buttons | Hurts | Neutral | Neutral |
| Mention the current year | Hurts | Helps | Hurts |
| Optimize keywords in title | No effect | No effect | Helps |
| Write longer content | No effect | No effect | No effect |
Based on 10,589 pages and 4 statistical methods. Effects are reported only where |Cohen's d| >= 0.3 or p < 0.001.
This single table is the study's core finding. Every column tells a different story — because every column represents a different audience.
Google Is a Mirror of Your Audience
When we started this study, we expected to find universal ranking factors — a set of content attributes that predict higher Google positions regardless of industry. We measured over 100 content signals on every page: readability, structure, visual elements, links, citations, engagement widgets, semantic alignment, and more.
What we found instead was the opposite. Almost every metric reversed between niches.
At first it looked like statistical noise. Then the pattern became impossible to ignore: the "winning" content profile in each niche is a near-perfect reflection of how that audience actually consumes information.
What Ranks in GDPR Compliance (Legal/Regulatory)
The audience: Data Protection Officers, privacy lawyers, compliance managers. Professionals who need precision and authority. They distrust fluff.
What the data shows about top-3 ranked GDPR content:
- Dense text, minimal visuals. Median 1 image. Pages with more images trend toward lower positions.
- High reading complexity. Flesch-Kincaid grade 14.6, which is college level. Winners also write slightly longer sentences than competitors (22.9 vs 22.0 words per sentence).
- No engagement widgets. Winners have 47% lower odds of carrying social sharing buttons (OR = 0.53). They are 42% less likely to have newsletter signups and 34% less likely to have a table of contents. Every distraction correlates with lower ranking.
- Fewer citations than competitors. 0.5 cited sources on average vs 1.2 for competitive pages (-59% gap). Compliance content written by law firms doesn't cite external sources — it is the source.
- Word count is irrelevant. Winners and competitors are essentially the same length (1,413 vs 1,373 median words, Cohen's d = -0.007 — statistically zero). Length simply doesn't factor into GDPR ranking.
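For readers who want to reproduce the reading-level figures, here is a minimal sketch of the standard Flesch-Kincaid grade formula. It takes raw counts as inputs; how words, sentences, and syllables are counted (and the function name `flesch_kincaid_grade`) are assumptions for illustration, since the study does not describe its tokenization.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw counts of words, sentences and syllables."""
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59

# Hypothetical counts for one page (illustration only, not study data).
grade = flesch_kincaid_grade(words=229, sentences=10, syllables=412)
```

Longer sentences and more syllables per word both push the grade up, which is why dense legal prose scores higher than tutorial-style writing.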
Google ranks the content that reads like a legal memo — because that's what compliance professionals trust. Clean pages. Dense reasoning. No distractions. No selling.
In practice: gdpr-info.eu ranks #1 for multiple competitive keywords with 300-450 words, zero images, and zero CTAs. Meanwhile, pages following "best practices" — like a compliance SaaS site with 3,650 words, 21 images, and 14 CTAs — sit at position 19. The sparse legal reference wins. The optimized marketing page loses.
What Ranks in iGaming (Entertainment/Commercial)
The audience: Bettors, casino players, sports fans. They want fast answers, current odds, and zero friction.
What the data shows about top-3 ranked iGaming content:
- Zero external links. Median 0 outbound links for winners. Sending a gambler to another site means losing a customer.
- Current year references help. Winners reference the current year 36% more than competitors (mean 2.5 vs 1.6). iGaming odds and promotions change daily — "2026" signals you're current.
- Objective tone wins. Winners use 57% less first-person language. Players trust facts and comparisons over personal opinions.
- Author bios help here. +59% gap (5.6 words vs 2.6 for competitive). Gamblers want to know who tested the casino. Personal credibility matters.
- More visual than compliance, less than AI. Median 4 images for winners: 4x as many as GDPR, but still nearly 3x fewer than AI-SEO.
Google ranks the content that gets to the point. Current data. Named reviewers. No outbound links draining the player away.
In practice: voluum.com ranks #1 for "what is igaming" with 2,377 words and zero external links. altenar.com ranks #1 for "online gambling laws" with zero external links. Meanwhile, pages heavy on outbound links (100+) consistently sit at position 15+.
What Ranks in AI & SEO Content (Technology/Marketing)
The audience: Marketers, content creators, small business owners. They want scannable, visual, step-by-step guidance.
What the data shows about top-3 ranked AI/SEO content:
- Most image-heavy. Median 11 images for winners — an 11x gap with GDPR. Screenshots, UI demos, step-by-step visuals. Every tutorial needs proof.
- Most hierarchical structure. H3-to-H2 ratio of 0.89 for winners vs 0.18 in iGaming. Content is deeply nested — marketers expect scannable structure with clear sub-sections.
- Easiest reading level. Flesch-Kincaid grade 11.1 vs 14.6 in GDPR, about three and a half grade levels easier. The audience is marketers, not lawyers. Simpler writing ranks higher.
- Citations help. 1.9 cited sources vs 1.3 for competitors (+62% gap). Linking to studies and tools builds credibility in this niche.
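- Keyword optimization actually matters. Keyword in title: r = -0.086 (p < 0.0001). Because position 1 is the best rank, a negative r means pages with the keyword in the title tend to sit closer to the top. The effect is small, but this is the only niche where keyword placement shows a statistically significant correlation with ranking. GDPR and iGaming: near-zero.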
- Author bios hurt. -33% gap (5.4 words vs 13.9 for competitors). Readers care about the content, not the author's credentials.
Google ranks the tutorial-style content — because marketers learn by doing, not by reading academic papers.
In practice: surferseo.com ranks #1 with 127 images. keywordinsights.ai ranks #1 with 107 images. Every step is shown, not told. These aren't decorated articles — they're visual walkthroughs.
The Winner Profile, Visualized
What a #1-ranked page looks like in each niche — same axes, completely different shapes:
| Axis | GDPR (Legal) | iGaming (Gambling) | AI-SEO (Marketing) |
|---|---|---|---|
| Images | 1 | 4 | 11 |
| Reading level | grade 14.6 | grade 13.8 | grade 11.1 |
| External links | 1 | 0 | 2 |
| Heading depth (H3:H2) | 0.25 | 0.18 | 0.89 |
| Author bio (words) | 12.9 | 5.6 | 5.4 |
| Citations | 0.5 | 0 | 1.9 |
| Current year refs | 0.4 | 2.5 | low |
| Engagement widgets | stripped | some | some |
If these were overlaid on a radar chart, the three profiles would barely overlap. A GDPR winner looks nothing like an AI-SEO winner. And yet most SEO advice treats them as identical.
Every One of These 7 "SEO Best Practices" Is Wrong
The SEO industry gives the same advice regardless of niche: write longer content, add images, build author bios, link to authority sites, add social sharing, mention the current year, and optimize for keywords.
We tested every one. Here is what we measured across 10,589 pages.
Myth 1: "Write longer, more comprehensive content"
The rule. Google rewards thorough content. Aim for 2,000+ words.
The data. This one didn't reverse — it simply doesn't matter. Content length is one of the weakest signals we measured. In GDPR, the effect size is essentially zero (Cohen's d = -0.007). In AI-SEO, winners are marginally shorter (1,407 vs 1,458 words). In iGaming, somewhat longer (1,530 vs 1,352). The direction is inconsistent and the magnitude is tiny across all niches — well below our threshold of |d| >= 0.3 for a practically meaningful effect.
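To make the |d| >= 0.3 threshold concrete, here is a minimal sketch of Cohen's d. It assumes the pooled-standard-deviation variant (the study does not state which variant it used), and the word counts are made-up values, not study data.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Hypothetical word counts (illustration only, not study data).
winners = [1200, 1500, 1400, 1600, 1300]
competitors = [1250, 1450, 1350, 1650, 1300]

d = cohens_d(winners, competitors)
is_actionable = abs(d) >= 0.3  # the study's cutoff for a practically meaningful effect
```

In this toy example both groups have the same mean, so d is zero and the difference would not be reported as actionable.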
What this means. Of 100+ signals we measured, word count ranks near the bottom. The SEO industry's obsession with "comprehensive, long-form content" is optimizing for one of the least predictive signals in our entire dataset. Google doesn't care how long your content is. It cares whether it serves the searcher — and that has nothing to do with word count.
Myth 2: "Add images and visual content"
The rule. Visual content improves engagement and ranking.
The data. GDPR winners: 1 image. iGaming winners: 4. AI-SEO winners: 11. That's an 11x range across niches. In GDPR, more images trend toward lower ranking: the correlation with position is positive, and a higher position number means a lower rank. In AI-SEO, more images clearly correlate with higher ranking.
Why it makes sense. A DPO reading about data processing agreements doesn't want stock photos of padlocks. They want text. A marketer learning about ChatGPT prompts wants screenshots of every step. "Add more images" is correct advice for exactly one of the three niches we studied.
Myth 3: "Link to authoritative external sources"
The rule. Outbound links to authority sites signal credibility and help ranking.
The data. iGaming winners have zero external links (median). GDPR winners cite fewer sources than competitors (0.5 vs 1.2, -59% gap). AI-SEO winners cite more sources (1.9 vs 1.3, +62% gap). The exact same practice helps in one niche and hurts in another.
Why it makes sense. An iGaming site sending users to another site is literally losing customers — the business model is containment. A GDPR article by a law firm citing five external sources looks like it aggregated other people's work rather than authored its own. An AI tutorial citing studies and tools builds credibility because the audience expects evidence. Link out when your audience values evidence. Don't when they value containment.
Myth 4: "Build strong author bios for E-E-A-T"
The rule. Google's own E-E-A-T guidelines emphasize demonstrating author expertise.
The data. Author bios hurt ranking in GDPR (-62% gap: winners average 12.9 words vs competitors' 34.6) and AI-SEO (-33% gap: 5.4 vs 13.9). Only iGaming shows author bios helping (+59% gap: 5.6 vs 2.6).
Why it makes sense. GDPR readers trust the institution, not the individual. An article from a law firm carries weight regardless of who wrote it. iGaming readers want a named expert — "who tested this casino? Can I trust their experience?"
A nuance. When we control for other variables in the regression model, the author bio signal becomes noisy — the gap may partly reflect that institutional sites (which rank well for GDPR) simply don't use author bios. The univariate gap is dramatic; the causal story is murkier. What's clear is that the audience's trust mechanism differs: institutions vs. individuals.
The irony. Google published the E-E-A-T framework. Our data shows that E-E-A-T signals (like author bios) don't universally help. What helps is actually being worthy of trust — which manifests completely differently depending on what the audience considers trustworthy.
Myth 5: "Add social sharing buttons and engagement widgets"
The rule. Social signals help ranking. Make content easy to share.
The data. In GDPR, winners have 47% lower odds of carrying social sharing widgets (odds ratio 0.53, statistically significant). They are 42% less likely to have newsletter signups and 34% less likely to have a table of contents. Every engagement widget we measured correlates with lower ranking in this niche.
Across all three niches, social sharing shows no positive signal. It is either neutral or negative.
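An odds ratio compares odds rather than probabilities, so OR = 0.53 describes 47% lower odds of having the widget. A minimal sketch with hypothetical counts (picked only to keep the arithmetic easy to follow; they are not study data, and `odds_ratio` is our own helper name):

```python
def odds_ratio(winners_with, winners_without, others_with, others_without):
    """Odds ratio from a 2x2 table: odds of the widget among winners vs. other pages."""
    return (winners_with / winners_without) / (others_with / others_without)

# Hypothetical counts per 100 pages (illustration only).
winners_with, winners_without = 20, 80   # 20% of winners have share buttons
others_with, others_without = 32, 68     # 32% of other pages have them

or_value = odds_ratio(winners_with, winners_without, others_with, others_without)

# Ratio of plain probabilities, for comparison with the odds ratio.
relative_risk = (winners_with / 100) / (others_with / 100)
```

With these counts the odds ratio is about 0.53, while the ratio of plain probabilities is 0.625. The two measures move together but are not interchangeable.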
Why it makes sense. A DPO researching GDPR data breach notification requirements isn't going to tweet about it. They want a clean, distraction-free reference page. The widgets are clutter, and Google has learned that clutter drives away the professional audience these pages serve.
Myth 6: "Mention the current year for freshness signals"
The rule. Adding "2026" to your title or headings signals freshness. Google rewards up-to-date content.
The data. Current year references hurt ranking in GDPR (-37% gap: winners average 0.4 mentions vs competitors' 0.6) and even more in headings (-68% gap in heading_with_year_count). But they help in iGaming (+36%: 2.5 vs 1.6).
Why it makes sense. The GDPR was enacted in 2018. Its core requirements haven't changed. Putting "2026 Guide" in the title signals shallow, recycled content — the same article republished each January with a new year in the heading. iGaming odds, promotions, regulations, and platforms change constantly — "2026" signals you're genuinely current.
Freshness signals only help when your audience's world actually changes frequently.
Myth 7: "Follow the same SEO playbook for every niche"
The rule. SEO best practices are universal.
The data. Keyword optimization, the foundational practice of the SEO industry, shows no statistically significant correlation with ranking in 2 of 3 niches. Only AI-SEO shows keyword placement signals (keyword in title: r = -0.086, p < 0.0001; keyword in H1: r = -0.078). GDPR and iGaming keyword correlations are not statistically significant.
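The sign convention is easy to misread: position 1 is the best rank, so a helpful signal produces a negative correlation with position. The sketch below computes a Spearman correlation in plain Python on hypothetical pages (not study data); the helper names are ours.

```python
from math import sqrt
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)

# Hypothetical pages: 1 = keyword in title, 0 = not; position 1 is the best rank.
keyword_in_title = [1, 1, 1, 0, 1, 0, 0, 0]
position = [1, 2, 3, 4, 5, 6, 7, 8]

r = spearman(keyword_in_title, position)  # negative: keyword pages sit nearer the top
```

The toy data is far more clear-cut than the study's r = -0.086; it only illustrates the direction of the sign.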
Why it makes sense. Google's language models have advanced far enough that keyword matching is unnecessary for most established topics. Only in newer, more specific query spaces (AI tools, ChatGPT workflows) — where Google has less behavioral data — does keyword placement still provide additional signal.
An entire industry built on "put the keyword in your title" is practicing a technique that literally doesn't register in most of the data.
The Three Universal Ranking Signals (That Held Across All Niches)
After showing everything that reverses, here is what actually held consistent across all three industries:
1. Semantic alignment with top-ranking content.
Your content must cover the same conceptual territory as the pages that currently rank. Not keyword stuffing, but topical alignment. Winners score approximately 0.85 on our semantic similarity measure; trailing pages average 0.70-0.73. The gap is large: Cohen's d ranges from 0.98 to 1.12 across niches, above the conventional 0.8 cutoff for a large effect, although the two distributions still overlap. This was the single strongest predictor in our model across all three industries, with elastic net coefficients of -0.31 to -0.37.
2. Information gain.
Content that adds something new — information that competitors' top-3 results don't cover — ranks dramatically better. This signal showed a Cohen's d of approximately 1.1 in all three niches, making it one of the most consistent findings in the study. Critically, this signal didn't appear in our earlier studies using only 60 keywords per niche. It only emerged at 300+ keywords — smaller studies don't have the statistical power to detect it.
3. Topic completeness.
Winners cover all relevant subtopics. Trailing pages average only 67% topic coverage in AI-SEO. Missing even one key subtopic measurably hurts ranking. The elastic net coefficient for topic authority completeness was +0.21 to +0.24 across all niches.
The pattern: Align first (cover what people expect), differentiate second (add what nobody else has), and be complete (don't leave gaps). These three held regardless of whether the audience was lawyers, gamblers, or marketers.
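One way to read the effect sizes quoted for these universal signals is to translate Cohen's d into the share of two distributions that overlaps. The sketch below assumes both groups are normally distributed with equal variance, which is a simplification the study does not claim.

```python
from statistics import NormalDist

def overlap_coefficient(d):
    """Overlap of two equal-variance normal distributions whose means differ by d SDs."""
    return 2 * NormalDist().cdf(-abs(d) / 2)

# Effect sizes at the study's actionability threshold and at its reported extremes.
for d in (0.3, 0.98, 1.12):
    print(f"d = {d:.2f} -> overlap = {overlap_coefficient(d):.0%}")
```

Under those assumptions, the function makes it easy to check how separated two groups are at a given effect size.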
Two Bonus Signals Worth Knowing
While digging through the data, two additional patterns emerged that we didn't expect:
Code blocks are ranking poison. Across all three niches, pages with more code blocks ranked significantly worse. In the AI & SEO niche — where you'd most expect code to help — winners averaged just 0.6 code blocks compared to 4.4 for competitive pages, a -59% gap. Cross-study, this was the single largest negative gap of any metric (-57.5%). The data suggests Google's top slots reward strategic, accessible content over technical implementation details.
First-person writing is an anti-pattern. Content marketing advice loves to say "write in first person" and "use personal stories." Our data from 10,589 pages disagrees. First-person count is the 9th most important predictive feature in the cross-study elastic net, and its coefficient is negative: more first-person writing predicts worse rankings. In iGaming, winners use 57% fewer first-person references than competitive pages (5.1 vs 7.2 mean). The authoritative, objective voice outperforms personal-brand writing across all three niches.
Under the Hood: The Prediction Model
We trained a gradient boosting model on all 10,589 pages across all three niches. Cross-validated R-squared = 0.64 +/- 0.02 (SD across folds): the model explains roughly 64% of ranking variance from content features alone, tested on data it never saw during training. Folds were split by keyword (grouped, so that pages from the same keyword never appear in both training and test sets) to prevent data leakage.
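The leakage guard described here can be sketched as a grouped k-fold split: whole keywords are assigned to folds, so no keyword straddles train and test. This is an illustrative plain-Python version with hypothetical keywords; the study does not say which implementation it used (scikit-learn's `GroupKFold` is a common library equivalent).

```python
from collections import defaultdict

def group_k_fold(groups, n_splits=5):
    """Yield (train_idx, test_idx) pairs where no group appears on both sides."""
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    # Assign whole groups to folds round-robin, largest first, to roughly balance sizes.
    ordered = sorted(by_group, key=lambda g: (-len(by_group[g]), str(g)))
    folds = [[] for _ in range(n_splits)]
    for i, g in enumerate(ordered):
        folds[i % n_splits].extend(by_group[g])
    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train_idx, test_idx

# Hypothetical keyword for each of six pages (illustration only).
page_keywords = ["gdpr fines", "gdpr fines", "dpo role", "dpo role", "data breach", "data breach"]
splits = list(group_k_fold(page_keywords, n_splits=3))
```

Each page lands in exactly one test fold, and the pages ranking for its keyword always travel with it.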
Per-niche performance varies: GDPR R-squared = 0.64 +/- 0.04, AI-SEO R-squared = 0.61 +/- 0.05, iGaming R-squared = 0.50 +/- 0.04. The iGaming model performs weakest — likely because gambling content competes more heavily on domain authority and brand trust signals that a content-only model cannot capture.
The remaining ~36% of ranking variance likely includes backlinks, domain authority, Core Web Vitals, and user engagement signals — factors we deliberately did not measure. This is a content-only study.
Where the Model's Attention Goes
To understand what the model actually uses, we grouped our 100+ features into 12 categories and measured each category's total contribution to ranking prediction. The results validate the article's thesis: half of all feature categories flip direction between niches.
| Category | Share of Model | Same Direction Everywhere? |
|---|---|---|
| Semantic alignment | 29.2% | Yes — always helps |
| Content quality (LLM-scored) | 18.9% | Yes — always helps |
| Topic coverage | 16.0% | Yes — always helps |
| Evidence & authority | 7.3% | Yes — but magnitude varies |
| SEO technical signals | 5.9% | Flips — helps in GDPR, hurts in AI-SEO |
| Engagement widgets | 4.9% | Flips — hurts in GDPR/iGaming, helps in AI-SEO |
| Structure & formatting | 4.5% | Flips — hurts in GDPR, helps elsewhere |
| Document basics | 3.9% | Consistent but weak |
| Links | 3.8% | Flips — hurts in GDPR/AI-SEO, helps in iGaming |
| Intro quality | 2.5% | Flips — helps in GDPR/AI-SEO, hurts in iGaming |
| Visual content | 2.3% | Flips — hurts in GDPR, helps elsewhere |
| Readability | 0.5% | Consistent but negligible |
The top 3 categories — semantic alignment, content quality, and topic coverage — account for 64% of the model's predictive power and hold consistent across all niches. These are the universals.
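The six categories that flip direction between niches (SEO technical signals, engagement widgets, structure and formatting, links, intro quality, and visual content) account for about 24% of the model. These are the audience-dependent signals, the exact features that generic SEO advice gets wrong by treating them as universal.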
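The category arithmetic can be checked directly from the table. The sketch below copies the shares and the flip/no-flip label for each category and sums them; only the variable names are ours.

```python
# (share of model in %, flips direction between niches?) copied from the table above.
categories = {
    "Semantic alignment": (29.2, False),
    "Content quality (LLM-scored)": (18.9, False),
    "Topic coverage": (16.0, False),
    "Evidence & authority": (7.3, False),
    "SEO technical signals": (5.9, True),
    "Engagement widgets": (4.9, True),
    "Structure & formatting": (4.5, True),
    "Document basics": (3.9, False),
    "Links": (3.8, True),
    "Intro quality": (2.5, True),
    "Visual content": (2.3, True),
    "Readability": (0.5, False),
}

top3_names = ["Semantic alignment", "Content quality (LLM-scored)", "Topic coverage"]
top3_share = round(sum(categories[name][0] for name in top3_names), 1)
flipping_share = round(sum(share for share, flips in categories.values() if flips), 1)
total_share = round(sum(share for share, _ in categories.values()), 1)
```

The shares total 99.7% rather than 100% because the table values are rounded.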
Our model explains 64% of ranking variance by reading only the content. No backlink data. No domain authority. Just what's on the page and whether it serves its audience correctly.
The Bigger Question: Is This Good or Bad for the Web?
The finding that Google has built per-niche audience models raises a question that goes beyond SEO.
The optimistic reading: Google has become remarkably good at understanding what different audiences actually want. Compliance officers get dense legal references because that's what serves them. Marketers get visual tutorials because that's how they learn. The algorithm isn't imposing a universal format — it's adapting to human information-consumption patterns per domain. This is arguably what search should do.
The uncomfortable reading: If Google rewards content that looks like existing winners, it creates a convergence pressure. Every new GDPR article must look like gdpr-info.eu. Every new AI tutorial must have 11 screenshots. The "audience mirror" becomes an audience mold, and content that breaks the template gets buried, regardless of quality. Our data shows that the top-3 and positions 4-10 in each niche already share largely the same format. A single snapshot cannot prove that the template narrows over time, but that is the direction the pressure points.
The implications are real. In the ongoing debate about Google's role in shaping web content — from the DOJ antitrust case to the "Search is getting worse" discourse — our data offers an empirical anchor. Google's ranking function is not niche-neutral. It has learned, per vertical, what "good content" looks like based on user behavior. That makes it both more useful and more powerful than most discussions assume.
Whether that power is well-exercised is a question for a different study. But the data is clear: the universal web that SEO practitioners imagine — where the same rules apply everywhere — does not exist in Google's ranking system. Each niche is its own ecosystem, with its own physics.
What This Means for Anyone Creating Content
1. Generic SEO checklists are, at best, inconsistently right. They're averages of averages, built from cross-niche studies that smooth out the very differences that matter. Our data shows that for most "best practices," the advice is actively wrong or irrelevant for at least one of the three niches we studied.
2. The winners didn't win because they "optimized" better. They won because they understood how their audience reads, what format they trust, and what distracts them. In GDPR, trust looks like a clean legal document. In iGaming, trust looks like a named reviewer with current data. In AI content, trust looks like a visual tutorial with citations.
3. Google is further ahead than most content creators realize. The search engine has effectively learned audience psychology at a per-niche level. It doesn't need your keyword stuffing or your engagement widgets. It needs your content to serve the person searching.
4. Every piece of SEO advice should come with a warning label: "Results vary by industry — and they will."
The Niche Content Diagnostic
Use this to audit any content strategy against the data:
Step 1: Which audience archetype?
| If your audience... | They're closest to... | Content should feel like... |
|---|---|---|
| Needs precision, distrusts fluff, values institutional authority | The Careful Professional (GDPR) | A legal memo — dense, clean, no widgets |
| Wants fast answers, current info, distrusts outbound links | The Impatient User (iGaming) | A product spec — current data, named reviewers, contained |
| Learns by doing, expects visual walkthroughs | The Structured Learner (AI-SEO) | A tutorial — screenshots, hierarchy, citations |
Step 2: Which myths flip for you?
| "Best Practice" | Careful Professional | Impatient User | Structured Learner |
|---|---|---|---|
| Add images | Hurts | Neutral | Helps significantly |
| Build author bios | Hurts | Helps | Hurts |
| Link externally | Hurts | Hurts | Helps |
| Add social sharing | Hurts | Neutral | Neutral |
| Mention current year | Hurts | Helps | Hurts |
| Optimize keywords | No effect | No effect | Helps |
Step 3: Audit your top 5 pages. Compare your top-performing pages against the niche-specific winner profile above — not against a generic SEO checklist. If your pages follow advice meant for a different audience archetype, you've found your problem.
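For teams that want to script the audit, the Step 2 table can be encoded as a simple lookup. This is only a sketch: the table values are copied from above, while the structure and the `practices_to_drop` helper are hypothetical.

```python
# Direction of each practice per archetype, copied from the Step 2 table.
MYTH_TABLE = {
    "Add images": {"Careful Professional": "Hurts", "Impatient User": "Neutral", "Structured Learner": "Helps significantly"},
    "Build author bios": {"Careful Professional": "Hurts", "Impatient User": "Helps", "Structured Learner": "Hurts"},
    "Link externally": {"Careful Professional": "Hurts", "Impatient User": "Hurts", "Structured Learner": "Helps"},
    "Add social sharing": {"Careful Professional": "Hurts", "Impatient User": "Neutral", "Structured Learner": "Neutral"},
    "Mention current year": {"Careful Professional": "Hurts", "Impatient User": "Helps", "Structured Learner": "Hurts"},
    "Optimize keywords": {"Careful Professional": "No effect", "Impatient User": "No effect", "Structured Learner": "Helps"},
}

def practices_to_drop(archetype):
    """Return the practices the table marks as hurting for this archetype."""
    return [practice for practice, effects in MYTH_TABLE.items() if effects[archetype] == "Hurts"]

drop_for_igaming = practices_to_drop("Impatient User")
```

Running the helper for each archetype gives a short list of habits to question before touching anything else on the page.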
The Raw Numbers
Every claim in this article is backed by specific measurements. Here are the complete comparison tables so you can verify — or cite — any finding directly.
Winner vs. Competitor Content Profiles (Medians)
| Metric | GDPR Winners | GDPR Competitive | iGaming Winners | iGaming Competitive | AI-SEO Winners | AI-SEO Competitive |
|---|---|---|---|---|---|---|
| Word count | 1,413 | 1,373 | 1,530 | 1,352 | 1,407 | 1,458 |
| Images | 1 | 1.5 | 4 | 3 | 11 | 7 |
| External links | 1 | 2 | 0 | 0 | 2 | 2 |
| Flesch-Kincaid grade | 14.6 | 14.3 | 13.8 | 13.4 | 11.1 | 11.3 |
| H3:H2 ratio | 0.25 | -- | 0.18 | -- | 0.89 | 1.0 |
| Reading time (min) | 5.7 | 5.5 | 6.1 | 5.4 | 5.6 | -- |
Effect Sizes for Key Findings (Cohen's d)
| Signal | GDPR | iGaming | AI-SEO | Cross-niche |
|---|---|---|---|---|
| Semantic alignment (top-3) | +1.06 | +0.98 | +1.12 | Universal |
| Information gain | +1.11 | +1.10 | +1.11 | Universal |
| Cosine sim to rank-1 | +1.18 | +1.23 | +1.72 | Universal |
| Topic completeness | +0.29 | moderate | +0.48 | Consistent |
| Content length | ~0.01 | <0.01 | ~0.05 | Not significant |
Prediction Model Performance (Cross-Validated R-squared)
| Model | GDPR | iGaming | AI-SEO | Cross-Study Pooled |
|---|---|---|---|---|
| Elastic Net | 0.45 | 0.37 | 0.39 | 0.40 |
| Gradient Boosting | 0.64 | 0.50 | 0.61 | 0.64 |
Winners = positions 1-3. Competitive = positions 4-10. P-values from Mann-Whitney U tests (non-parametric). Effect sizes reported as Cohen's d (standardized mean difference).
Download the Data
We're publishing the aggregate study data so anyone can verify or cite any finding in this article:
- Study Summary Data (CSV) — Model performance, category-level feature importance, effect sizes, gap analysis, and study metadata. All aggregate-level; no individual page data or feature-level detail exposed.
- Study Summary (PDF) — One-page visual summary of the 7 myths, 3 universal signals, and model performance. Designed for citation, printing, or embedding in presentations.
If you use these numbers in your own content, we'd appreciate a link back to this study.
Full Methodology Details
Study Design
- Niches: 3, chosen for maximum diversity — GDPR compliance (legal/regulatory), iGaming (entertainment/commercial), AI & SEO content (technology/marketing)
- Keywords: ~300 per niche (899 total), stratified by search intent: awareness, consideration, and decision queries
- Pages: Top 20 Google results per keyword, crawled and analyzed — yielding 10,589 pages after filtering
- Content filtering: Editorial content only. We removed Reddit, Wikipedia, government portals, forums, and pages under 200 words. We wanted to study content that competes on its own merit, not content that ranks on domain authority alone.
Measurement
- Content signals: Over 100 signals extracted per page, covering: document metrics, readability, heading structure, visual content, internal/external links, SEO elements, evidence and citations, engagement widgets, semantic similarity, and content quality scoring.
- Semantic measurement: Embedding-based cosine similarity to top-3 ranked content and to rank-1 content per keyword.
- Quality measurement: LLM-based scoring for coverage, depth, specificity, engagement, information gain, and intent alignment.
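The semantic measurement reduces to cosine similarity between embedding vectors. A minimal sketch follows, using toy 3-dimensional vectors in place of real embeddings. Averaging the similarity over the three top-ranked pages is an assumption; the study does not say how the top-3 comparison is aggregated.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_similarity_to_top3(page_vec, top3_vecs):
    """Average cosine similarity between one page and the top-ranked pages (assumed aggregation)."""
    return sum(cosine_similarity(page_vec, t) for t in top3_vecs) / len(top3_vecs)

# Toy vectors standing in for embeddings (illustration only).
page = [0.2, 0.7, 0.1]
top3 = [[0.25, 0.65, 0.10], [0.10, 0.80, 0.10], [0.30, 0.60, 0.20]]
score = mean_similarity_to_top3(page, top3)
```

A score near 1 means the page points in almost the same direction as the current winners in embedding space.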
Statistical Methods
Four independent methods that must converge before we report a finding:
- Spearman rank correlations with Bonferroni correction — does each metric correlate with ranking position?
- Mann-Whitney U tests — do winners (positions 1-3) differ from the rest? Non-parametric, no distribution assumptions.
- Cohen's d effect sizes — is the difference practically meaningful? Our threshold for "actionable": |d| >= 0.3.
- Regularized regression — Elastic Net (L1+L2, cross-validated alpha) to identify which metrics predict rank together, controlling for multicollinearity. Plus Gradient Boosting regression for non-linear patterns.
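When many metrics are tested at once, the Bonferroni correction divides the significance level by the number of tests. The sketch below assumes a family-wise alpha of 0.05 (the study does not state its alpha) and uses hypothetical p-values. The count of 158 matches the number of metrics mentioned in the FAQ.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests

def significant_after_bonferroni(p_values, n_tests, alpha=0.05):
    """Flag which p-values stay significant once corrected for n_tests comparisons."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return {name: p < threshold for name, p in p_values.items()}

# Hypothetical p-values for three metrics (illustration only, not study data).
p_values = {"keyword_in_title": 0.00005, "image_count": 0.004, "word_count": 0.4}
flags = significant_after_bonferroni(p_values, n_tests=158)
```

With 158 tests the per-metric threshold drops to roughly 0.0003, so a p-value of 0.004 that looks convincing on its own no longer counts as significant.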
Model Validation
- Cross-validation: Stratified k-fold, out-of-sample testing. Models were tested on data they were never trained on.
- Cross-study pooling: We pooled all 10,589 pages across niches and trained a unified model. Cross-validated R-squared = 0.6449 (gradient boosting).
- Per-niche validation: GDPR R-squared = 0.64, AI-SEO R-squared = 0.61, iGaming R-squared = 0.50.
Limitations
We want to be transparent about what this study does not cover:
- Content-only analysis. We did not measure backlinks, domain authority, Core Web Vitals, or user engagement signals. Content explains ~64% of ranking variance — the rest comes from factors outside this study's scope.
- Temporal snapshot. Data collected in 2025-2026. Rankings change.
- Three niches. More industries are needed to claim truly universal findings. Our three niches were chosen for diversity, but they are not exhaustive.
- Embedding model limitations. Our semantic similarity measure uses a model with a context window of approximately 200 words. This means semantic scores are most sensitive to titles and opening sections.
- Correlation, not causation. We can identify what top-ranking pages have in common and where they differ. We cannot prove that changing these attributes will cause a ranking change.
Frequently Asked Questions
How accurate is a content-only ranking prediction model?
Our cross-validated R-squared = 0.6449, meaning content features explain about 64% of ranking variance on data the model has never seen. The remaining ~36% likely includes domain authority, backlinks, and behavioral signals. For comparison, a model that always predicts the average position would score R-squared = 0.
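As a reference point for reading these numbers, R-squared compares a model's squared errors with those of a baseline that always predicts the mean. A minimal sketch with hypothetical positions (not study data):

```python
from statistics import mean

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical ranking positions (illustration only).
positions = [1, 3, 5, 8, 13]

baseline = [mean(positions)] * len(positions)  # always predict the average position
r2_baseline = r_squared(positions, baseline)   # no better than the mean
r2_perfect = r_squared(positions, positions)   # every position predicted exactly
r2_reversed = r_squared(positions, positions[::-1])  # worse than the mean, so negative
```

Predictions that are worse than the mean baseline produce a negative R-squared, so the scale is not bounded below by zero.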
Do ranking factors really differ between industries?
Dramatically. In our data, virtually every content signal we measured showed different — and often opposite — correlations with ranking across the three niches. Of 158 metrics measured, 66 showed significant correlations in GDPR, 55 in iGaming, and 37 in AI-SEO — and the sets overlap only partially.
How many pages do you need for a reliable niche study?
Our per-niche studies used 300 keywords (2,800-3,400 pages each). Our earlier studies using only 60 keywords missed the information gain signal entirely — it emerged only at the 300-keyword scale. Smaller studies lack the statistical power to detect weaker but important signals.
What signals had the biggest impact across all niches?
Semantic alignment (cosine similarity to top-3 results), information gain (unique content not present in competitors), and topic completeness were the only signals that held consistently across all three niches. Everything else — images, links, readability, author bios, keyword placement — varied by niche.