Search engines use “crawlers” to discover pages on websites. These pages are then scored by hundreds of signals to determine if they should be “indexed” and where they should be “ranked” in the index for a given keyword.
Most technical SEO audits don't analyze your website's crawl, as off-the-shelf SEO tools don't gather the proper data. This is a huge mistake, particularly for larger websites.
Mismanaging your website’s crawl can lead to:
Decrease in your average daily crawl allocation
Decrease in your indexation rate, and ultimately
Decrease in your rankings and organic traffic
This webinar will explore how to audit and manage your crawl budget for increased visibility in search engines.
What is a Crawl Budget?
Google’s Gary Illyes noted there’s no “official” definition of crawl budget in a Google Webmaster blog post: “Recently, we’ve heard a number of definitions for ‘crawl budget’, however, we don’t have a single term that would describe everything that ‘crawl budget’ stands for externally.”
He goes on to break the budget itself into three components:
1. Crawl rate limit
The number of simultaneous connections Googlebot may use and also the time it waits between each crawl.
2. Crawl health
If a site responds quickly to Googlebot, the bot will see the site as healthy and can increase the crawl rate. Conversely, if the site responds poorly, the crawl rate can go down.
3. Crawl demand
If a site doesn't have much content, isn't adding new content, or doesn't have content being linked to, Google's demand to crawl that site drops.
Hopefully you can see that there are things here we can influence, and that some of the activity you are already doing has a bigger impact on your ability to rank than you might have thought.
Google also calls out two factors that play a significant role in crawl demand:
Popularity – URLs that are more popular tend to be crawled more often to maintain freshness
Staleness – Google's attempt to prune pages that no longer provide user value
What is my site's crawl budget?
We can use Google Search Console (or Bing Webmaster Tools) to get the data we need.
Just log in to Search Console and go to Domain > Crawl > Crawl Stats to see how many pages Google is crawling on your site per day.
How to read Crawl Stats reporting in GSC
Google doesn’t give any guidance on what these reports mean, so let’s take a look at 3 client websites to give some context:
1. Website with a stable crawl
2. Website with a crawl that dropped suddenly
3. Website with an increasing crawl
What factors affect my site’s crawl budget?
Any of the following factors could have a negative impact on your crawl budget.
a. Faceted Navigation and Session Identifiers
For larger sites, this is a way to filter and display certain results. While great for users, it's not so great for Googlebot: it creates lots of duplicate content, sends Google mixed signals, and can mean Googlebot crawls pages we don't want it to.
b. On-site duplicate content
This can occur pretty easily on custom built eCommerce websites. For example:
https://example.com/product.php?item=swedish-fish
https://example.com/product.php?category=gummy-candy&item=swedish-fish&affiliateid=1234
https://example.com/product.php?item=swedish-fish&trackingid=334&sort=price&sessionid=1234
All three URLs serve the same product page. The solution is canonicalization or configuring how Google handles the dynamic URL parameters (covered below).
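To illustrate the idea, here is a minimal sketch in Python (not tied to any particular SEO tool) that collapses parameterized URLs like those above into one canonical form by dropping tracking and session parameters. The parameter names it strips are assumptions based on the example URLs; adjust them for your own platform.

# Minimal sketch: collapse parameterized URLs into a canonical form by
# dropping tracking/session parameters, so you can see which URLs should
# share a single rel=canonical target. TRACKING_PARAMS is an assumption
# based on the example URLs above.
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PARAMS = {"affiliateid", "trackingid", "sessionid", "sort"}

def canonicalize(url):
    parts = urlsplit(url)
    kept = sorted(
        (k, v) for k, v in parse_qsl(parts.query)
        if k.lower() not in TRACKING_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

print(canonicalize(
    "https://example.com/product.php?item=swedish-fish&trackingid=334&sort=price&sessionid=1234"
))
# -> https://example.com/product.php?item=swedish-fish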
c. Soft error pages
These occur when a page that doesn't exist returns a 200 (OK) status instead of a 404 error. Google doesn't want to waste time crawling these pages, and over time this can negatively impact how often Google visits your site.
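A quick way to spot-check this, assuming the Python requests library is installed, is to request a URL that should not exist and confirm the server answers with a 404 rather than a 200. The probe path below is made up for illustration.

import requests

# Request a deliberately nonexistent page and check the status code.
resp = requests.get("https://example.com/this-page-should-not-exist-12345", timeout=10)
if resp.status_code == 200:
    print("Possible soft 404: a missing page returned 200 OK")
else:
    print("Server returned", resp.status_code, "as expected for a missing page")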
d. Infinite spaces
These happen when a site can generate a practically unlimited number of links. Google gives the example of a calendar with a "Next Month" link: Googlebot could theoretically follow those links forever.
e. Low quality and spam content
Google wants to show users the best search results possible. Having a lot of low quality, thin or spammy content will cause Google to lose trust in your website.
How can I audit my website’s crawl budget?
The best way to audit your site's crawl budget is to analyze your log files. Your server logs should be readily available; they can normally be accessed via cPanel, Plesk, or FTP.
Once you have downloaded your log files, you can use a tool such as Botify to understand where Googlebot is spending most of its time on your site.
You are looking to find out:
Which pages get the most hits/requests
Which pages return errors
Which pages get the least hits/requests
Once you have identified these pages and errors, you can start to look at increasing your crawl budget.
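If you want a first pass without a commercial tool, a minimal Python sketch can answer those three questions from a standard combined-format access log. The log filename is a placeholder, and the simple "Googlebot" string match is an assumption; in practice you would verify the bot via reverse DNS.

import re
from collections import Counter

# Pull the request path and status code out of a combined-format log line.
LOG_LINE = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+) HTTP/[^"]*" (?P<status>\d{3})')

hits = Counter()
errors = Counter()
with open("access.log") as log:          # placeholder path to your server log
    for line in log:
        if "Googlebot" not in line:      # crude filter; verify via reverse DNS in practice
            continue
        match = LOG_LINE.search(line)
        if not match:
            continue
        hits[match["path"]] += 1
        if match["status"].startswith(("4", "5")):
            errors[match["path"]] += 1

print("Most requested:", hits.most_common(10))
print("Least requested:", hits.most_common()[-10:])
print("URLs returning errors:", errors.most_common(10))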
How can I increase my website’s crawl budget?
There are a number of things you can do to increase your crawl budget, covering both onsite and offsite elements. Work through each of the methods below.
a. Reduce crawl errors
The most common website errors are:
Broken pages – 404 error pages
Pages that time out
Misused or missing directive tags – noindex / canonical / redirect
Blocked pages
Faulty mobile device sniffing – serving desktop pages to mobile users
How to find them
Use a tool such as Screaming Frog, SEMrush, or Ahrefs to find your broken site links and check for any other errors you can repair.
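If you would rather script the check, here is a minimal sketch assuming the Python requests library and a urls.txt file exported from your sitemap or crawler; it flags broken pages and timeouts for follow-up.

import requests

with open("urls.txt") as f:                      # placeholder export of your URLs
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    try:
        resp = requests.head(url, allow_redirects=True, timeout=10)
        if resp.status_code >= 400:
            print(resp.status_code, url)         # broken page
    except requests.Timeout:
        print("TIMEOUT", url)                    # page timed out
    except requests.RequestException as exc:
        print("ERROR", url, exc)                 # connection or other failure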
How to fix them
Visit timed-out pages to see what is causing them to fail – a third-party embed, oversized images – then repair or remove the culprit.
Check your directive tags at a page level to ensure you have the right tags on the right pages. If you are using a plugin such as Yoast SEO, make sure it is configured correctly.
Regenerate your sitemaps and resubmit them in Google Search Console.
Check that your site renders correctly on mobile devices.
b. Reduce Redirect Chains
Redirect chains stop the flow of link equity around a site and dramatically reduce Google's crawl rate.
Matt Cutts, in one of his Webmaster Help videos back in 2011, recommended 1–2 hops as ideal, three if needed, but suggested that once you got into the 4–6 range, the success rate of Googlebot following those redirect hops was very low.
How to find
Use your log analyzer to find redirects and see where they redirect to.
Create a list of URLs that redirect to another URL that then also redirects. Botify has a great tool for doing this.
How to fix
Once you have this list, you will be able to remove the chains by changing the redirect on the first URL to the last URL in the chain.
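If you want to verify a chain by hand, a minimal sketch assuming the requests library follows each redirect and prints the hops; the starting URL is a placeholder.

import requests

def redirect_chain(url):
    # Follow redirects and return every URL visited, ending at the final page.
    resp = requests.get(url, allow_redirects=True, timeout=10)
    return [r.url for r in resp.history] + [resp.url]

chain = redirect_chain("https://example.com/old-page")       # placeholder URL
if len(chain) > 2:                                            # more than one hop
    print(" -> ".join(chain))
    print("Fix: redirect", chain[0], "directly to", chain[-1])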
c. Better use of Robots.txt
Make sure you are using your robots.txt file to block parts of your site you don't want Google to waste its crawl budget on; a quick way to test your rules follows the list below.
These could include:
Query parameters
Subdomain or development pages
Low-value pages – PPC pages, user forum profile pages.
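Before deploying new rules, you can sanity-check them with Python's built-in robots.txt parser. Note that this parser only does simple path-prefix matching (it does not evaluate Google's wildcard syntax), and the rules and URLs below are illustrative assumptions, not a recommended configuration for every site.

from urllib.robotparser import RobotFileParser

# Illustrative rules: block a development area, forum profile pages,
# and internal search results for Googlebot.
rules = """
User-agent: Googlebot
Disallow: /dev/
Disallow: /forum/profile/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

for url in (
    "https://example.com/product.php?item=swedish-fish",
    "https://example.com/forum/profile/user123",
    "https://example.com/dev/staging-homepage",
):
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(verdict, url)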
d. Increase your website’s speed
As Google touched on above, site speed is a factor in your crawl health – the faster your site, the better your crawl health and the more Google will crawl your site. Google has highlighted slow sites as a major reason for reduced crawl budget.
How to find issues
Use a tool such as SEMrush, Pingdom, or Google's PageSpeed Insights to figure out how you can make your site respond faster.
How to fix issues
Some of the major elements to investigate will be:
Browser caching – you can use a plugin such as W3 Total Cache, WP-Optimize, or WP Super Cache
Image and code compression – optimize images before uploading (or run a plugin such as WP Smush) and minify your HTML, CSS, and JavaScript
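As a quick way to verify improvements, and assuming the requests library is installed, you can spot-check server response times for a few key templates; the URLs below are placeholders. This is no substitute for a full speed audit, but consistently slow responses here are the ones that drag down crawl health.

import requests

for url in (
    "https://example.com/",                                # placeholder URLs
    "https://example.com/category/gummy-candy",
    "https://example.com/product.php?item=swedish-fish",
):
    resp = requests.get(url, timeout=30)
    # elapsed measures time from sending the request to receiving the response headers
    print(f"{resp.elapsed.total_seconds():.2f}s  {url}")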
e. Dynamic URL parameter settings
If your eCommerce platform uses dynamic URLs, Googlebot treats every dynamic URL that leads to the same page as a separate page. This means you may be wasting your crawl budget by having Google crawl the same page over and over because it thinks each URL variation is a different page.
How to find
With most big eCommerce sites, you will see that your site has parameter settings – just look at the URLs in the address bar.
You might not be able to work around that from a display point of view, but you can tell Google what to ignore and how to handle dynamic pages.
How to fix
You can manage your URL parameters by going to Google Search Console and clicking Crawl > URL Parameters.
From here, you can let Googlebot know when your CMS adds parameters to your URLs that don't change the page's actual content.
f. Get more Backlinks
Backlinks not only help to improve your organic rankings, they also bring search crawlers to your site more often.
Why bother?
Based on our data (an average across FTF client data):
55% increase in organic traffic
87% increase in keyword rankings
38% increase in leads and revenue
Wrapping it Up
While crawl management is a part of technical SEO, it is something that gets overlooked by lots of SEOs. As you can now see, crawl budget plays a big part in making sure your site gets the crawl it needs. If you don't have enough budget, or you are using it up sending Google to the wrong pages, your site will never drive the traffic it should. And as the numbers above show, this stuff works – it can be a great way to unblock a site's ability to drive organic search traffic.