404 Errors… What the bots find

You click a link to that must-read article someone has shared, only to discover that, ARGH!!!!, it’s a 404 error page. Somewhere down the line the URL was copied wrong, the site changed its permalinks and the redirect isn’t working properly, or the article was removed completely. We all have them on our sites, and you will be surprised at how many. Last week I looked at finding the source of users hitting 404 pages, which gave you a list of URLs. This week we’re going to look into the depths of the site and discover where bots are finding those 404s.

Discover where links to your site produce a 404 error page when crawled, and the difference between Not Found and Soft 404s.

In this article, I’ll cover

What are bots?

Why do bots find more 404 errors?

How to find where the bots are finding your 404 pages

What Soft 404 and Not Found errors are, and why one is more critical than the other

What are bots?

Before I go any further: what are bots? You might call them crawlers, creepers, web spiders or bots. They crawl the web, going from site to site by following the links that form it. They aren’t real people; although there are people who check and monitor them, the bots themselves are programs set up to do the work.

As they crawl from one site to the next through those links, they record information about the pages, sites, links and content, which is then used to produce reports, run analyses and make decisions based on that content.
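If you’re curious what “following the links” looks like in practice, here is a minimal sketch of the first job a crawler does on every page: collecting the links it will visit next. It uses only Python’s standard library; the function name `extract_links` and the example URLs are made up for illustration, and real search engine bots do a great deal more than this.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Turn relative links like "/crafts/" into full URLs.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the links a crawler would queue up to visit next (hypothetical helper)."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


page = '<p><a href="/crafts/">Crafts</a> <a href="https://example.com/old-post">Old post</a></p>'
print(extract_links(page, "https://example.com/"))
# ['https://example.com/crafts/', 'https://example.com/old-post']
```

Every one of those links gets visited in turn, which is why a bot will find a dead link that no human has clicked yet.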

They can be used for all sorts of things, but in this instance we’re going to look at search engine bots, and Google’s in particular. Why Google’s bots? Because Google is generally the biggest search engine, and it also gives us the most information most easily through two of its tools: Google Analytics and Search Console.

Why do bots find more 404 errors than users?

If you followed the steps from last week, you probably have a list of your user-found errors and can easily see where the problems are coming from. Some are easier to fix than others, but we’ll come to that in part 4 of the series.

Now, when you look at the errors the bots have found, you will probably see a LOT more. Why?

On my site, Rainy Day Mum, I had 100 different ways people had reached the 404 error page, but the bots had found 672 Not Found errors and 81 Soft 404s. Don’t worry: I’ll explain the difference between Not Found and Soft 404s later in this post, and in a future post we’ll look at how you can deal with them.

So why the difference between 100 and 672? That’s 572 more points on the site that result in a 404 error, and it happens because the bots crawl every link, whereas users only follow the links that look useful to them. Many of those links probably aren’t useful and may have been redundant on your site for a long time, but if a bot found one, there is always a chance that at some point a user will find it too. When that happens the error moves from the bots’ list to the users’ list and starts to affect your bounce rate.

So although fixing the routes that send users to your 404 page is important, the next step in your funnel to reduce your bounce rate is to fix the bots’ finds.
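One way to prioritise is to compare the two lists: paths the bots have hit but users haven’t (yet) are the ones you can fix before they cost you a reader. A minimal sketch, assuming you’ve copied the paths from last week’s user report and from Search Console into two Python lists; the function name and the paths shown are made-up examples, not real report data.

```python
def bot_only_404s(user_paths, bot_paths):
    """Paths the bots found returning 404 that no user has hit yet (hypothetical helper)."""
    seen_by_users = set(user_paths)
    # sorted() gives a stable, easy-to-scan worklist
    return sorted(set(bot_paths) - seen_by_users)


# Hypothetical example paths
user_404s = ["/old-craft-post/", "/tag/paint/"]
bot_404s = ["/old-craft-post/", "/2014/06/recipe/", "/tag/paint/", "/printable.pdf"]

print(bot_only_404s(user_404s, bot_404s))
# ['/2014/06/recipe/', '/printable.pdf']
```

Working through the bot-only list means fixing problems before a reader ever lands on them.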

How to find where the bots are finding your 404 pages!

Instead of Google Analytics, we are going to use Search Console to find the errors. If you are not familiar with Search Console, I suggest you sign up for our FREE Sitemaps and Search Console Course to get to know it and make sure it is set up correctly.

 

Go to Search Console and open your website property.

  1. Click on Crawl -> Crawl Errors
  2. Now click on Not Found
  3. Below the line graph, there will be a table listing the path of every URL on your site where the bots have found a 404 error page.

This list is organised in priority order, based on how important Google considers fixing each issue to be. Google explains the ranking like this:

We’ve ranked the errors so that those at the top of the priority list will be ones where there’s something you can do, whether that’s fixing broken links on your own site, fixing bugs in your server software, updating your Sitemaps to prune dead URLs, or adding a 301 redirect to get users to the “real” page. We determine this based on a multitude of factors, including whether or not you included the URL in a Sitemap, how many places it’s linked from (and if any of those are also on your site), and whether the URL has gotten any traffic recently from search. – From Google Webmaster Blog

Now you know which pages on your site are giving the bots 404 errors, but how are the bots getting there?

  1. Click on one of the paths in the table below the line graph
  2. A box will pop up with 3 options
  3. Click on Linked from: this gives you a list of the places where the page is linked from

If you want to work through this list methodically, download it as a CSV file or to Google Docs and tackle it entry by entry.
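Working methodically is easier if you group the broken URLs by the page that links to them, because one edit to that page can clear several errors at once. A minimal sketch in Python, assuming you have a CSV with one row per broken URL and the page it’s linked from; the column names `URL` and `Linked from` and the function name `group_by_source` are assumptions, so rename the columns to match whatever your file actually uses.

```python
import csv
import io
from collections import defaultdict


def group_by_source(csv_text, url_col="URL", source_col="Linked from"):
    """Map each linking page to the broken URLs it points at.

    The column names are assumptions; change them to match your own headers.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[source_col]].append(row[url_col])
    # Pages with the most broken links first: one fix there clears several errors.
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)


# Hypothetical example rows
sample = """URL,Linked from
/old-post/,https://example.com/craft-ideas/
/summer-fun/,https://example.com/craft-ideas/
/old-post/,https://another-blog.example/roundup/
"""

for source, urls in group_by_source(sample):
    print(f"{source}: {len(urls)} broken link(s) -> {', '.join(urls)}")
```

To run it on a real file, open it with `open("crawl-errors.csv", newline="", encoding="utf-8")` and pass `f.read()` to `group_by_source`; the filename is just an example.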

What is the difference between Soft 404s and Not Found 404s?

In Search Console, you will have seen two different types of crawl errors (there are actually three, but the third usually relates to your site being down, so you would already be aware if that error pops up). The Not Found errors are the ones we have been focusing on. These return a 404 status code when the page is accessed, along with an error page like the one below from our site.

[Image: our site’s 404 error page]

A Soft 404, though, doesn’t. An error page may appear, but the server isn’t returning a 404 status code (typically it returns 200 OK instead), so it isn’t giving the bots the correct information.
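A rough way to check your own site is to request a URL you know doesn’t exist and look at the status code that comes back. Here is a minimal sketch using Python’s standard library; the function names and the phrase-matching heuristic are assumptions for illustration, not how Google actually detects Soft 404s.

```python
import urllib.error
import urllib.request

# Hypothetical phrases that suggest "this page doesn't exist"
NOT_FOUND_PHRASES = ("not found", "doesn't exist", "no longer available")


def classify(status, body):
    """Label a response using the terms from this post."""
    if status == 404:
        return "Not Found"  # correct: the server says 404
    text = body.lower()
    if status == 200 and any(p in text for p in NOT_FOUND_PHRASES):
        return "Soft 404"  # error-page wording, but a success status code
    return "OK" if status == 200 else f"HTTP {status}"


def check(url):
    """Fetch url and classify the response (needs network access)."""
    try:
        with urllib.request.urlopen(url) as resp:
            return classify(resp.status, resp.read().decode("utf-8", "replace"))
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for error responses such as 404
        return classify(err.code, err.read().decode("utf-8", "replace"))
```

For example, `check("https://example.com/this-page-does-not-exist/")` (a made-up URL; use one on your own site) should come back as "Not Found"; if it says "Soft 404", your error page is being served with the wrong status code.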


These do affect your SEO and need to be looked into in more detail and fixed. That process is more technical but I will cover it at a later date.
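The details depend on how your site is built, so they’ll wait for that later post, but the core idea for a page that really is gone is to serve your friendly error page with a 404 status code rather than 200. A minimal sketch as a Python WSGI app; the `PAGES` dictionary and the page contents are hypothetical stand-ins for a real site.

```python
# Hypothetical site content: path -> HTML
PAGES = {"/": "<h1>Home</h1>"}
ERROR_PAGE = "<h1>Sorry, we couldn't find that page</h1><p><a href='/'>Go home</a></p>"


def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path in PAGES:
        status, body = "200 OK", PAGES[path]
    else:
        # Same friendly page visitors see, but the status tells bots it's missing.
        # Sending "200 OK" here instead is what produces a Soft 404.
        status, body = "404 Not Found", ERROR_PAGE
    data = body.encode("utf-8")
    start_response(status, [("Content-Type", "text/html; charset=utf-8"),
                            ("Content-Length", str(len(data)))])
    return [data]
```

To try it locally you could serve it with the standard library’s `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` and visit a made-up path: the browser shows the friendly page, while the status code bots see is 404.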

Next Step….

Fixing your 404 errors from the User data and from the bots

Cerys Parker

Cerys is the founder of Rainy Day Mum, a top UK parenting blog. Prior to having children, she taught digital media and web development. She supports other bloggers to develop, grow and expand their blogs through actionable tasks that aren’t as terrifying as they seem!


Sarah Nenni-Daher - June 16, 2017

I have an example I’d like to ask about. If we schedule articles to links not yet published (‘cuz we forget so we do it ahead of time), will the errors resolve on their own when the site is crawled again?

🙂
