404 Errors… What the bots find
You click a link to that must-read article someone has shared, only to discover that, ARGH!!!!, it’s a 404 error page. Somewhere down the line the URL was copied wrong, the site changed its permalinks and the redirect isn’t working properly, or the article was removed completely. We all have them on our sites, and you will be surprised at how many. Last week I looked at finding the source of users hitting the 404 pages, which gave you a list of URLs. This week we’re going to look into the depths of the site and discover where bots are finding those 404s.
In this article, I’ll cover:
What are bots?
Why do bots have a higher number of 404 errors than users?
How to find where the bots are finding your 404 pages
What Soft 404 and Not Found errors are, and why one is more critical than the other
What are bots?
Before I go any further, what are bots? You could call them crawlers, creepers, web spiders or bots. They crawl the web, going from site to site by following the links that form the web. These aren’t real people, although there are people who check and monitor them. They are programs set up to do the work.
As they crawl from one site to the next through the links, they record information about the pages, sites, links and content, and then use this to produce reports, analyse and make decisions based on the content.
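To give a feel for what "following the links" means, here is a minimal sketch of the link-collecting step of a crawler, written in Python. It is only an illustration and not how Google's bots actually work. The `LinkCollector` class, the `extract_links` function and the example.com URLs are all made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href="..."> found in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links such as "/recipes/" into full URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return every link on the page, in the order it appears."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical page content, used only to demonstrate the function.
sample_html = (
    '<p><a href="/recipes/">Recipes</a> and '
    '<a href="https://example.org/crafts">Crafts</a></p>'
)
print(extract_links("https://example.com/blog/", sample_html))
```

A real bot would then request each of those links in turn and note the response it gets back, which is how a broken link on one page ends up as a 404 in its records.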
They can be used for all sorts of things, but in this instance we’re going to look at the search engine bots, and in particular Google’s. Why Google’s bots? Because Google is generally the biggest search engine, and it also gives us the most information easily through two of its tools: Google Analytics and Search Console.
Why do bots have a higher number of 404 errors than users?
If you followed the steps from last week, you probably have a list of the errors your users found and can easily see where the problems are coming from. Some are easier to fix than others, but we’ll come to that in part 4 of the series.
Now when we look at the errors that the bots have found, you will probably see a LOT more. Why?
On my site, Rainy Day Mum, I had 100 different ways people had got to the 404 error page, but looking at the bots there are 672 Not Found errors and 81 Soft 404s. Don’t worry, I’ll explain the difference between Not Found and Soft 404s later in this article, and in another post we’ll look at how you can deal with them.
But why the difference between 100 and 672? That’s 572 more points on the site that result in a 404 error. It’s because the bots crawl every link they find, while users only follow the links that are useful to them. What the bots find is probably not useful and may be long redundant on your site, but if a bot found it then there is always a chance that a user will find it at some point in the future. When that happens the error moves from the bots’ list to the users’ list and starts to affect your bounce rate.
So although fixing the routes users take to your 404 page is important, the next step in reducing your bounce rate is to fix what the bots find.
How to find where the bots are finding your 404 pages
Instead of Google Analytics, we are going to use Search Console to find the errors. If you are not familiar with Search Console, I suggest you sign up for our FREE Sitemaps and Search Console Course to become familiar with it and make sure that it is set up correctly.
- Go to Search Console and select your website property
- Click on Crawl -> Crawl Errors
- Now click on Not Found
- Below the line graph, there will be a table listing the path of every URL on your site where the bots have found a 404 error page.
This list is organised in priority order. That priority reflects how important Google considers it that you fix each issue. In Google’s words:
We’ve ranked the errors so that those at the top of the priority list will be ones where there’s something you can do, whether that’s fixing broken links on your own site, fixing bugs in your server software, updating your Sitemaps to prune dead URLs, or adding a 301 redirect to get users to the “real” page. We determine this based on a multitude of factors, including whether or not you included the URL in a Sitemap, how many places it’s linked from (and if any of those are also on your site), and whether the URL has gotten any traffic recently from search. – From Google Webmaster Blog
Now that you have that data, you know which pages on your site are giving the bots 404 errors. But how are the bots getting there?
- Click on one of the paths in the table below the line graph
- A box will pop up with 3 options
- Click on Linked from. This will give you a list of the places that link to the page
If you want to work through this list methodically, download it as a CSV or to Google Docs and work through it from there.
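If you are comfortable with a little Python, the downloaded CSV can also be summarised with a short script. The sketch below counts the broken URLs under each top-level section of the site so you can see where they cluster. The column names `URL` and `Response Code` and the sample rows are assumptions made up for illustration, so check the header row of your own export and adjust `url_column` to match.

```python
import csv
import io
from collections import Counter

# Made-up sample data standing in for a downloaded export.
# The column names are an assumption; your file may use different ones.
SAMPLE_EXPORT = """URL,Response Code
/2014/05/old-craft/,404
/2014/06/old-recipe/,404
/tag/retired-tag/,404
/2014/07/summer-list/,404
"""


def count_by_section(csv_text, url_column="URL"):
    """Count the broken URLs under each top-level section of the site."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        path = row[url_column].strip("/")
        # "/2014/05/old-craft/" belongs to the "2014" section.
        section = path.split("/")[0] if path else "(home)"
        counts[section] += 1
    return counts


for section, total in count_by_section(SAMPLE_EXPORT).most_common():
    print(section, total)
```

To run it on a real file, read the file's text with `open("your-export.csv", newline="").read()` and pass that in instead of `SAMPLE_EXPORT`. The file name here is a placeholder.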
What is the difference between Soft 404s and Not Found 404s?
In Search Console, you will have seen two different types of crawl errors (there is a third, but it usually relates to your site being down, so you would be aware if that error popped up). The Not Found errors are the ones that we have been focusing on. These return a 404 status code when the page is accessed, along with an error page like the one on our site.
A Soft 404 doesn’t. An error page may appear, but the page isn’t returning a 404 status code, so it’s not giving the correct information to the bots.
Soft 404s do affect your SEO and need to be looked into in more detail and fixed. That process is more technical, but I will cover it at a later date.
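To make the distinction concrete, here is a rough sketch in Python of how the two cases differ once you know a page's status code and its text. It is a simple heuristic for illustration only and not how Google decides what counts as a Soft 404. The function name `classify_response` and the phrases it looks for are my own assumptions.

```python
# Phrases that suggest the visitor is looking at an error page.
# This list is an assumption for the example, not an official one.
NOT_FOUND_PHRASES = ("page not found", "error 404", "nothing found")


def classify_response(status_code, body_text):
    """Label a response as 'not found', 'soft 404', 'ok' or 'other'."""
    if status_code == 404:
        # The server is telling bots the truth: the page does not exist.
        return "not found"
    if status_code == 200:
        text = body_text.lower()
        if any(phrase in text for phrase in NOT_FOUND_PHRASES):
            # Looks like an error page, but the code says everything is fine.
            return "soft 404"
        return "ok"
    return "other"


print(classify_response(404, "Page not found"))        # a true Not Found
print(classify_response(200, "Oops! Page Not Found"))  # a Soft 404
print(classify_response(200, "Welcome to the blog"))   # a normal page
```

The second case is the problem one: the visitor sees an error, but the bot is told the page loaded successfully.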
Fixing your 404 errors from the User data and from the bots