LinkChecker - Detect Link Rot on Your Website

Posted:
Updated:

Source Code: GitLab

What Was the Problem?

I first encountered the idea of avoiding link rot while reading an answer on Stack Overflow - someone was asked to provide code directly in their answer instead of just linking to an external website. The reason given was that if the external website were to go down, the answer would become less useful (or even useless).

I think about this a lot when I work on my website - if I am going to provide a link to an external resource, I want to be sure that it is accessible.

The Solution

With that in mind, I wrote a little (122-line) Python script that does a few things:

  1. Takes in an XML sitemap
  2. Opens every URL in that sitemap
  3. Requests the link target of every <a> element on every page
  4. Keeps track of the HTTP status code returned for every link
  5. Makes a little report listing, for every page, its broken links (anything that returned a status code in the 4xx or 5xx range)

To my delight, it was immediately useful - pointing out three links that had been dead for who knows how long!
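
The five steps above can be sketched roughly like this. To be clear, this is a minimal illustration using only the Python standard library, not the actual LinkChecker source: the function names (`sitemap_urls`, `page_links`, `fetch`, `broken_links`) and the user-agent string are made up for the example, and it assumes a standard sitemap using the sitemaps.org namespace. Connection failures (DNS errors, timeouts) are not handled here and would simply raise.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin

# Namespace used by standard XML sitemaps (assumed for this sketch).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Step 1: return the page URLs listed in the sitemap's <loc> elements."""
    root = ET.fromstring(sitemap_xml)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locs if loc.text]


class AnchorCollector(HTMLParser):
    """Collects the href attribute of every <a> element fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def page_links(page_url, html):
    """Step 3: return absolute http(s) link targets found in a page."""
    parser = AnchorCollector()
    parser.feed(html)
    absolute = [urljoin(page_url, href) for href in parser.hrefs]
    # Skip schemes that cannot be checked over HTTP (mailto:, tel:, ...).
    return [url for url in absolute if url.startswith(("http://", "https://"))]


def fetch(url, timeout=10):
    """Return (status_code, body); HTTP errors are returned, not raised."""
    request = urllib.request.Request(url, headers={"User-Agent": "linkcheck-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, ""


def broken_links(sitemap_xml, fetch=fetch):
    """Steps 2-5: map each page URL to its (link, status) pairs with status >= 400."""
    report = {}
    statuses = {}  # remember each link's status so it is only requested once
    for page_url in sitemap_urls(sitemap_xml):
        _, html = fetch(page_url)
        for link in page_links(page_url, html):
            if link not in statuses:
                statuses[link] = fetch(link)[0]
            if statuses[link] >= 400:
                report.setdefault(page_url, []).append((link, statuses[link]))
    return report
```

Calling `broken_links` with the text of a sitemap returns a dictionary keyed by page URL, which is all the "report" really needs to be. Passing `fetch` in as a parameter is just a convenience for the sketch: it lets you swap in a fake that returns canned responses, so the logic can be tried out without hitting a real website.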

Learnings

Getting Started

If you are interested in having a lightweight script scrape your website, on demand or on a regular basis, to check all your links for link rot, head over to the GitLab page and follow the instructions to get it set up on your website!

File an issue if it doesn’t work, or send me an email if you get really stuck!

– Finn 👋