A crawler that can walk through any website, visit each internal page, and collect structured data for analysis. The goal is not just scraping raw HTML but building a usable map of the site.
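Walking every internal page comes down to a breadth-first traversal that starts at a seed URL and follows only same-host links. Below is a minimal sketch using only the Python standard library. `crawl`, `LinkParser`, `max_pages` and the injected `fetch` callable are hypothetical names. `fetch` is assumed to return a page's HTML as a string, or `None` on failure; in practice it might wrap `urllib.request.urlopen`.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect raw href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(seed_url, fetch, max_pages=500):
    """Breadth-first crawl of pages on the same host as seed_url.

    fetch(url) is a hypothetical callable returning HTML or None.
    Returns a dict mapping each visited URL to its internal outlinks,
    or to None when the URL could not be fetched.
    """
    host = urlparse(seed_url).netloc
    queue = deque([seed_url])
    seen = {seed_url}
    site_map = {}
    while queue and len(site_map) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            site_map[url] = None  # unreachable: a candidate broken link
            continue
        parser = LinkParser()
        parser.feed(html)
        links = []
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments so /a and /a#x match.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc != host:
                continue  # skip external, mailto: and similar links
            links.append(absolute)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
        site_map[url] = links
    return site_map
```

Injecting `fetch` keeps the traversal testable against an in-memory dict such as `crawl("https://example.com/", site.get)`. A production crawler would also need to honour robots.txt, rate-limit its requests, and normalise URLs more carefully (trailing slashes, query parameters).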
For each page it could capture the URL, page title, meta description, headings, main content blocks, internal links, images, forms, calls to action, schema markup (such as JSON-LD), and page performance signals (such as response time and page weight). These per-page records could then be aggregated into a site-level report.
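Extracting a subset of these fields can be sketched with the standard library's `html.parser`. This illustration makes several assumptions and is not a complete extractor. `PageRecord`, `PageParser` and `extract_page` are hypothetical names. Only `h1` to `h3` headings and top-level JSON-LD `@type` values are captured. Main content blocks, calls to action and performance signals are left out, because detecting them requires heuristics or timing data beyond the raw HTML.

```python
import json
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class PageRecord:
    # Hypothetical record holding only a subset of the fields listed above.
    url: str
    title: str = ""
    meta_description: str = ""
    headings: list = field(default_factory=list)      # (tag, text) pairs
    links: list = field(default_factory=list)         # raw href values
    images: list = field(default_factory=list)        # raw src values
    form_count: int = 0
    schema_types: list = field(default_factory=list)  # top-level JSON-LD @type values


class PageParser(HTMLParser):
    """Fill a PageRecord while streaming through one HTML document."""

    CAPTURED = ("title", "h1", "h2", "h3")

    def __init__(self, url):
        super().__init__()
        self.record = PageRecord(url=url)
        self._capture = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.record.meta_description = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.record.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.record.images.append(attrs["src"])
        elif tag == "form":
            self.record.form_count += 1
        elif tag in self.CAPTURED or (
            tag == "script" and attrs.get("type") == "application/ld+json"
        ):
            self._capture = tag
            self._buffer = []

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._capture:
            return  # ignore closing tags we are not tracking
        raw = "".join(self._buffer)
        if tag == "script":
            try:
                data = json.loads(raw)
            except ValueError:
                data = None  # malformed JSON-LD is skipped
            if isinstance(data, dict) and "@type" in data:
                self.record.schema_types.append(data["@type"])
        else:
            text = " ".join(raw.split())  # collapse whitespace
            if tag == "title":
                self.record.title = text
            else:
                self.record.headings.append((tag, text))
        self._capture = None


def extract_page(url, html):
    parser = PageParser(url)
    parser.feed(html)
    parser.close()
    return parser.record
```

A streaming parser like this avoids third-party dependencies. A library with CSS selectors would make the extraction rules easier to extend to fields such as calls to action.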
- Build a crawl map and page inventory
- Collect structured page metadata and content signals
- Highlight weak or thin pages, duplicate patterns, and missing content
- Surface navigation issues and broken internal links
- Support export (for example, to JSON or CSV) for further analysis or LLM-based summarisation
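The aggregation step could look roughly like the following. It assumes two inputs: per-page data reduced to plain dicts, and a crawl result that maps each URL to its internal outlinks (or to `None` when the URL could not be fetched). `build_report`, `export_json` and the report keys are hypothetical. The function covers only some of the checks listed above: missing titles and meta descriptions, duplicate titles, and broken internal links. JSON is used as the export format.

```python
import json
from collections import defaultdict


def build_report(pages, site_map):
    """Aggregate per-page data and the crawl map into a site-level report.

    pages: hypothetical mapping of URL -> {"title": str, "meta_description": str}
    site_map: hypothetical mapping of URL -> list of internal outlinks,
              or None when that URL could not be fetched.
    """
    unreachable = {url for url, links in site_map.items() if links is None}

    # Group URLs by title so that repeated titles can be reported.
    urls_by_title = defaultdict(list)
    for url, page in pages.items():
        if page.get("title"):
            urls_by_title[page["title"]].append(url)

    return {
        "page_count": len(pages),
        "missing_title": sorted(u for u, p in pages.items() if not p.get("title")),
        "missing_meta_description": sorted(
            u for u, p in pages.items() if not p.get("meta_description")
        ),
        "duplicate_titles": {
            title: sorted(urls)
            for title, urls in urls_by_title.items()
            if len(urls) > 1
        },
        # (source, target) pairs where a crawled page links to an unreachable URL.
        "broken_internal_links": sorted(
            (source, target)
            for source, links in site_map.items()
            for target in links or []
            if target in unreachable
        ),
    }


def export_json(report):
    """Serialise the report; (source, target) tuples become JSON arrays."""
    return json.dumps(report, indent=2, sort_keys=True)
```

Keeping the report as plain dicts and lists means the same structure can be written to JSON, flattened to CSV, or pasted into an LLM prompt without further conversion.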
This would be useful for understanding a website quickly, auditing information architecture, spotting SEO/content gaps, and analysing competitors or client sites in a repeatable way.