May 8, 2021

Auditing Sites With Archive.org

Hey all, I wanted to share a quick tip for anyone auditing sites or working through site migrations, etc.

I’ve found that historic URLs can get lost over time, and with them some opportunities (external sites linking to pages that now 404, for example). So what I like to do is use this tool to get a broad view of all URLs that have been crawled by archive.org at some point:

https://web.archive.org/cdx/search/cdx?url=dom&matchType=domain&fl=original&collapse=urlkey (replace dom with the website domain)
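If you'd rather pull the list with a script than paste the URL into a browser, something like the rough Python sketch below should work. It just builds the same query string as above and reads the response one URL per line. The function names (build_cdx_url, fetch_archived_urls) and the example.com domain are mine, made up for illustration, and the timeout is a guess, so treat it as a starting point rather than a finished tool.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(domain):
    """Build the same CDX query as the URL above for a given domain."""
    params = {
        "url": domain,
        "matchType": "domain",   # include subdomains
        "fl": "original",        # only return the original URL field
        "collapse": "urlkey",    # one row per distinct URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def fetch_archived_urls(domain, timeout=120):
    """Download the list and return it as a list of URL strings.

    The timeout is an assumption; big sites can take a while to respond.
    """
    with urlopen(build_cdx_url(domain), timeout=timeout) as response:
        text = response.read().decode("utf-8", errors="replace")
    # The plain-text response has one URL per line; skip blank lines.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Calling fetch_archived_urls("example.com") and writing the result to a text file, one URL per line, gives you something you can upload straight into a crawler.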

This gives you a huge list of URLs. There may be some crap in there, but for the most part it's super useful. I then take these URLs and whack them into Screaming Frog in list mode to check what their status is. You’d be surprised at how often there are some quick wins in here.

You can also de-dupe against any other URL lists you might have, which again is really useful.
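Here's a small Python sketch of that de-dupe step. It compares the archive.org list against another list (a current crawl export, say) and keeps only the URLs you don't already have. The normalisation rules are my own assumptions, not anything official: it ignores http vs https, ports, host casing and trailing slashes, and it expects every URL to include a scheme. The names normalise and missing_from are made up for the example.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key (assumed rules, adjust to taste)."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()   # drops the port and lowercases
    path = parts.path.rstrip("/") or "/"    # treat /page and /page/ as equal
    query = f"?{parts.query}" if parts.query else ""
    return f"{host}{path}{query}"


def missing_from(archived, known):
    """Return archived URLs whose key is not in the known list.

    Duplicates within the archived list are dropped too; the first
    spelling seen is the one kept, and the original order is preserved.
    """
    known_keys = {normalise(u) for u in known}
    seen = set()
    result = []
    for url in archived:
        key = normalise(url)
        if key not in known_keys and key not in seen:
            seen.add(key)
            result.append(url)
    return result


archived = [
    "http://example.com/old-page/",
    "https://example.com/old-page",
    "https://example.com/about",
    "http://example.com:80/contact",
]
known = ["https://example.com/about/"]
leftovers = missing_from(archived, known)
```

With the sample lists above, leftovers holds the old-page and contact URLs, which are the ones worth checking for 404s.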
