Website crawler for structured analysis

The idea is a crawler that walks a website, visits each internal page, and collects structured data for analysis. The goal is not just to scrape raw HTML but to build a usable map of the site.

For each page, it could capture the URL, page title, meta description, headings, main content blocks, internal links, images, forms, calls to action, schema markup, and page performance signals. These per-page records could then be aggregated into a website-level report.
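
As a rough illustration of what a per-page record might look like, here is a minimal sketch using only the Python standard library. The names PageRecord, PageExtractor, and extract_page are hypothetical, and the sketch covers only a subset of the fields listed above (title, meta description, headings, internal links, images, and a form count). Content blocks, calls to action, schema markup, and performance signals are left out.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


@dataclass
class PageRecord:
    """Hypothetical per-page record; field names are illustrative only."""
    url: str
    title: str = ""
    meta_description: str = ""
    headings: list = field(default_factory=list)        # (level, text) pairs
    internal_links: list = field(default_factory=list)  # absolute URLs, same host
    images: list = field(default_factory=list)          # (src, alt) pairs
    form_count: int = 0


class PageExtractor(HTMLParser):
    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self, url):
        super().__init__()
        self.record = PageRecord(url=url)
        self._host = urlparse(url).netloc
        self._capture = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title" or tag in self.HEADING_TAGS:
            self._capture, self._buffer = tag, []
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.record.meta_description = (attrs.get("content") or "").strip()
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links and drop #fragments before comparing hosts.
            target = urldefrag(urljoin(self.record.url, attrs["href"])).url
            if urlparse(target).netloc == self._host:
                self.record.internal_links.append(target)
        elif tag == "img" and attrs.get("src"):
            src = urljoin(self.record.url, attrs["src"])
            self.record.images.append((src, attrs.get("alt") or ""))
        elif tag == "form":
            self.record.form_count += 1

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._capture:
            text = " ".join("".join(self._buffer).split())  # collapse whitespace
            if tag == "title":
                self.record.title = text
            else:
                self.record.headings.append((int(tag[1]), text))
            self._capture = None


def extract_page(url, html):
    """Parse one page's HTML into a PageRecord."""
    parser = PageExtractor(url)
    parser.feed(html)
    parser.close()
    return parser.record
```

Calling extract_page("https://example.com/", html_text) returns one record per page. A real implementation would probably need a more forgiving parser for messy markup, so treat this as a starting point rather than a finished extractor.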

Build a crawl map and page inventory
Collect structured page metadata and content signals
Highlight weak pages, duplicate patterns, and missing content
Surface navigation issues and broken internal links
Support export for further analysis or LLM-based summarisation
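
To make the crawl map, broken-link, and export ideas above more concrete, here is a minimal sketch of a breadth-first crawl over same-host pages. It assumes a hypothetical fetch callable that takes a URL and returns a (status_code, html_text) pair, so the HTTP client is left open. The function names crawl and export_json and the report keys are also illustrative, not a fixed design.

```python
import json
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects href values from anchor tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.hrefs.append(href)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl of pages on the same host as start_url.

    `fetch` is a hypothetical callable: url -> (status_code, html_text).
    """
    host = urlparse(start_url).netloc
    queue, seen = deque([start_url]), {start_url}
    crawl_map, status = {}, {}
    while queue and len(status) < max_pages:
        url = queue.popleft()
        code, html = fetch(url)
        status[url] = code
        if code != 200:
            continue  # record the status but do not parse error pages
        parser = LinkParser()
        parser.feed(html)
        links = []
        for href in parser.hrefs:
            target = urldefrag(urljoin(url, href)).url
            if urlparse(target).netloc == host and target not in links:
                links.append(target)
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
        crawl_map[url] = links
    # A link counts as broken if its target was fetched and returned 4xx/5xx.
    broken = [
        {"from": src, "to": dst, "status": status[dst]}
        for src, dsts in crawl_map.items()
        for dst in dsts
        if dst in status and status[dst] >= 400
    ]
    return {"pages": status, "crawl_map": crawl_map, "broken_links": broken}


def export_json(report, path):
    """Write the report as JSON for later analysis or summarisation."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2)
```

Passing fetch in as a parameter keeps the crawl logic testable against an in-memory fake site. A real fetch function would also need to respect robots.txt, rate limits, and timeouts, none of which this sketch handles.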

This would be useful for understanding a website quickly, auditing information architecture, spotting SEO/content gaps, and analysing competitors or client sites in a repeatable way.