Social Preview Reliability Index
Benchmark Methodology

How the public benchmark is produced

The public Preview Index reports aggregated benchmark statistics from a frozen ecommerce cohort. It is designed for repeatable monitoring and publication, not domain ranking or shaming.

Frozen benchmark cohort

The public benchmark uses a frozen cohort with a defined universe and active Tier 1 set. Snapshot aggregates and histograms are derived from benchmark-eligible Tier 1 domains only, using a fixed snapshot data window.

  • Cohort universe and active set are imported from a bounded Tranco-based run.
  • Cohorts can be frozen before validation, pilot, and publication runs.
  • Ordering and staged crawls follow Tranco rank ascending for repeatability.
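The cohort rules above can be sketched in a few lines. This is an illustrative model, not the benchmark's actual schema: the `CohortDomain` fields and helper names are assumptions chosen to mirror the described behavior (Tier 1 filtering, Tranco-ascending ordering, eligibility-gated aggregation).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CohortDomain:
    """One row of a frozen cohort (hypothetical schema)."""
    domain: str
    tranco_rank: int
    tier: int          # 1 = active Tier 1 set
    eligible: bool     # passed benchmark eligibility gates in the snapshot window

def staged_crawl_order(cohort: list[CohortDomain]) -> list[str]:
    """Tier 1 domains in Tranco rank ascending, so staged crawls are repeatable."""
    tier1 = [d for d in cohort if d.tier == 1]
    return [d.domain for d in sorted(tier1, key=lambda d: d.tranco_rank)]

def benchmark_rows(cohort: list[CohortDomain]) -> list[CohortDomain]:
    """Snapshot aggregates and histograms use benchmark-eligible Tier 1 rows only."""
    return [d for d in cohort if d.tier == 1 and d.eligible]
```

Because the cohort is frozen before a run, both helpers are pure functions of the imported snapshot, which is what makes repeated publication runs reproducible.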

HTML-only crawl model

Crawler analysis is HTML-only: the crawler does not execute JavaScript and applies conservative network controls (timeouts, retries, rate limits, and basic robots.txt disallow checks).

  • A homepage probe establishes the seed host and enforces strict host-locking during discovery.
  • URL discovery and sampling are bounded and deterministic.
  • Image checks store metadata only (HTTP status and dimensions), never image bytes.
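The conservative network controls can be expressed as a small fetch wrapper. This is a sketch of the policy only: `fetch_fn` is injected so the retry/backoff and disallow logic can be shown (and tested) without network access, and the parameter defaults are placeholders, not the crawler's real limits.

```python
import time

def conservative_fetch(url, fetch_fn, *, retries=2, backoff_s=0.5,
                       allowed=lambda u: True):
    """HTML-only fetch with conservative controls (sketch).

    - `allowed` stands in for a basic robots.txt disallow check.
    - Bounded retries with exponential backoff cover transient failures.
    - `fetch_fn(url)` is the injected transport; it should enforce its own
      timeout and must never download image bytes, only metadata.
    """
    if not allowed(url):
        return None  # respect a robots-style disallow
    for attempt in range(retries + 1):
        try:
            return fetch_fn(url)
        except OSError:
            if attempt == retries:
                raise
            time.sleep(backoff_s * (2 ** attempt))  # crude rate limiting
```

Injecting the transport keeps the crawl policy separate from the HTTP client, which matches the metadata-only rule: the same wrapper can front an HTML fetcher or a HEAD-style image check that records status and dimensions without storing bytes.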

Classification and scoring

URL checks are classified as Stable, Degraded, or Unreliable using conservative rules. Domain scores combine homepage and product reliability signals with proprietary weighting. The benchmark documents which signals are used, but not the exact weights.

  • Classification reasons are stored as a compact bitmask for aggregation.
  • Product unreliability is weighted heavily in the overall domain score.
  • No scoring changes are made by the public publication layer.
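A reason bitmask and a conservative classifier might look like the following. The flag values and thresholds here are purely illustrative; the benchmark's actual reasons, weights, and cutoffs are not published.

```python
from enum import IntFlag

class Reason(IntFlag):
    """Compact bitmask of classification reasons (illustrative values)."""
    NONE = 0
    TIMEOUT = 1
    HTTP_ERROR = 2
    MISSING_PREVIEW_IMAGE = 4
    IMAGE_TOO_SMALL = 8

def classify(reasons: Reason) -> str:
    """Conservative mapping from stored reasons to a URL-check class.

    Hard failures force Unreliable; any remaining soft reason degrades;
    a clean check is Stable. The hard/soft split is an assumption.
    """
    hard = Reason.TIMEOUT | Reason.HTTP_ERROR | Reason.MISSING_PREVIEW_IMAGE
    if reasons & hard:
        return "Unreliable"
    if reasons:
        return "Degraded"
    return "Stable"
```

Storing reasons as an integer bitmask keeps per-check rows small and makes cohort-level aggregation a matter of bitwise counting rather than string parsing.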

Benchmark eligibility and diagnostics

Domains must pass benchmark eligibility gates within the snapshot window to be included in public benchmark aggregates. Eligibility diagnostics are tracked to support calibration and crawl-quality validation.

  • Minimum product URL checks and minimum total URL checks are enforced.
  • Discovery failures and fetch failures are tracked separately for diagnostics.
  • Validation and pilot stages are run before full benchmark crawls.
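The eligibility gate and failure diagnostics reduce to simple checks and counters. The minimums below are placeholder defaults (the real thresholds are not stated), and the two-bucket failure tracking mirrors the discovery/fetch split described above.

```python
from collections import Counter

def is_benchmark_eligible(total_checks: int, product_checks: int,
                          *, min_total: int = 20, min_product: int = 5) -> bool:
    """Gate a domain into public aggregates (sketch; minimums are placeholders)."""
    return total_checks >= min_total and product_checks >= min_product

def record_failure(diag: Counter, stage: str) -> None:
    """Track discovery failures and fetch failures in separate buckets."""
    if stage not in ("discovery", "fetch"):
        raise ValueError(f"unknown failure stage: {stage}")
    diag[stage] += 1
```

Keeping the two failure buckets separate is what lets calibration distinguish "we could not find product URLs" from "we found them but could not fetch them", which point at different crawl-quality problems.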

Strict separation from on-demand analysis

On-demand (for example, Slack-triggered) domain analyses can produce a current domain score for lookup pages, but they are isolated from cohort-level benchmark statistics.

  • On-demand analyses do not modify cohort tiers.
  • On-demand analyses do not affect SnapshotAggregate or public histograms.
  • Public benchmark artifacts use benchmark-eligible rows only.
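The isolation rule can be made concrete by having public aggregation read only from the benchmark-eligible set, with on-demand scores living in a separate store that this function never sees. The function and field names are hypothetical.

```python
import math

def snapshot_aggregate(cohort_scores: dict[str, float],
                       eligible: set[str]) -> float:
    """Mean score over benchmark-eligible rows only (sketch).

    On-demand analyses write to a different store and are never passed in,
    so they cannot move this aggregate or the derived public histograms.
    """
    vals = [s for domain, s in cohort_scores.items() if domain in eligible]
    return sum(vals) / len(vals) if vals else math.nan
```

Enforcing the separation at the data-access layer, rather than by convention, is what guarantees that a burst of Slack-triggered lookups leaves the published cohort statistics untouched.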

Conservative publication policy

Public pages present aggregated statistics, distribution summaries, and platform-level aggregates only. They do not publish domain leaderboards, “worst domain” lists, or per-domain revenue estimates.

The goal is benchmark quality improvement and methodological transparency, not public shaming of individual domains.